Cingulata: Run C++ code over encrypted data with fully homomorphic encryption

xsamgreen · on June 6, 2020

FHE is undergoing standardization right now. The leading algorithm candidates are based on lattice crypto which is currently quantum-safe: https://homomorphicencryption.org/

Here are some mature and actively-developed HE libraries:

https://github.com/homenc/HElib (By IBM, employed the inventor of FHE)

https://github.com/Microsoft/SEAL (Best tutorials)

https://palisade-crypto.org/software-library (High-quality codebase)

https://github.com/tfhe/tfhe (Fast boolean logic, used by Cingulata)

Good introductory material is hard to come by, but I like the first half of this talk: https://simons.berkeley.edu/talks/intro-fhe-and-tfhe

entelechy · on June 6, 2020

Great work! This allows to perform arbitrary computation on untrusted devices.

However last time I checked computation in this scheme is ridiculously slow: on modern machines, cutting edge implementation of FHE manage to get around 100 integer operations per second.

Never the less there have been some brave startups trying to commercialise this technology:

https://venturebeat.com/2020/02/18/enveil-raises-10-million-...

Other interesting things build on top of FHE:

sql database where data and queries are fully encrypted: https://github.com/zerodb/zerodb

fully encripted brainfuck vm: https://github.com/f-prime/arcanevm

rhindi · on June 6, 2020

There are two main approaches to FHE: homomorphic boolean circuits and homomorphic numerical processing.

In the former (eg Cingulata), you convert a program into a boolean circuit, and evaluate each gate homomorphically. While this is general purpose, it also means you decompose functions that could be done in one instruction into multiple binary operations (so very slow). That’s usually what people refer to when they say FHE is slow.

The other approach consists of operating directly on encrypted integers or reals, and finding ways to do more complex computations (like a square function) in one step. While this is obviously much faster, it is also limited to whatever operations is supported by the scheme. This is what people refer to when they say FHE can only do certain things.

For years, the tradeoff has basically been slow and general purpose, or fast and limited. But there are new scheme being worked on that will be published soon that enable to go way beyond what’s currently done, such as doing efficient deep learning over encrypted data and other complex numerical processing.

Lots is coming out of labs and will be on the market within 2 years!

entelechy · on June 6, 2020

ohh interesting! Are there any opensource implementations of homomorphic numerical processing?

Were there any efforts of combining both approaches?

hedora · on June 6, 2020

Note that most useful homomorphic numerical encryption schemes are easily breakable. Once you have equality you can usually de-anonymize user data. Many companies have been burnt by this.

With a less than operator and the ability to encrypt chosen plaintext values, you can decrypt arbitrary messages in a linear (in message size) number of steps.

Arithmetic operations can often be used to build gadgets that bootstrap comparison operations. For instance, with addition and equality you can implement a comparison operation for low-medium cardinality fields.

The field is littered with negative results that are being sold as secure, practical systems. Be careful when using them on important data.

rhindi · on June 6, 2020

All FHE schemes today add tiny random noise to the ciphertext so that encrypting the same data twice give different results. The noise is then kept to a nominal level as you compute homomorphically using a special operation called bootstrapping. Then when you decrypt, you just ignore the noise and get your result. If you do that well and dont let the noise grow too big, you get very strong security.

Fwiw, bootstrapping is actually what makes FHE slow, not the actual addition/multiplication etc

bondarchuk · on June 6, 2020

>and the ability to encrypt chosen plaintext values

Isn't this a big assumption? The way I envision it is

1. client encrypts data with their key

2. server computes on data without decrypting and without needing the key

3. client decrypts computation output with their key.

Or is it always required at step 2 that the server also has the key needed for encryption (but not decryption obviously)?

littlestymaar · on June 6, 2020

> Isn't this a big assumption?

The standard resilience criteria for modern multi-purpose encryption suppose that your scheme should be resistant to adaptive chosen-cipher attack. Chosen plaintext is a way weaker attack (the hierarchy being: known plaintext < chosen plaintext < chosen cipher < adaptative chosen cipher).

It may be OK for some situations, but it requires to be much more cautious than with regular crypto (which is already error-prone…).

rhindi · on June 6, 2020

The server doesn’t need the decryption key, ever. Thats the whole point in fact. FHE is end to end encryption for compute. However there is sometimes a public key used, called an evaluation key.

lumost · on June 6, 2020

Is there a good standards body to refer to for progress on these alternate encryption methods?

ssmiler · on June 7, 2020

You can refer to https://homomorphicencryption.org for more details about security, libraries and standardization. Although for the moment it's in early stages of standardization.

Ar-Curunir · on June 6, 2020

FHE schemes don’t allow comparing equality of plaintexts without decryption

ssmiler · on June 7, 2020

Research paper https://eprint.iacr.org/2018/758 introduces an unified view for HE plaintext spaces. It allows switching between integral, approximate numbers and bit level representations (different HE schemes). But I'm not aware of a HE library implementing this. For the IDASH 2018 competition we have used 2 different libraries (TFHE and CKKS) in the same computation, although the scheme switching procedure was done manually.

rhindi · on June 6, 2020

Yes, turns out you can convert ciphertexts from one scheme to another, so you can go back and forth between them depending on what type of computation you are trying to do. However the cost of transciphering is high, so in practice it doesn’t work well. But give it a few years and it’ll work!

bargle0 · on June 6, 2020

HEAAN and HEmat are two libraries for numerical processing that you can find on github. They’re not perfect, and require work to get in to shape for real distributed computation.

rhindi · on June 6, 2020

I suggest looking at TFHE and SEAL for starters

Jyaif · on June 6, 2020

Where did you get the "100 integer operations per second"?

I thought the speed was in the order of minutes for a single operation.

entelechy · on June 6, 2020

Indeed I didn't remember it correctly:

According to https://tfhe.github.io/tfhe/ states:

> Each binary gate takes about 13 milliseconds single-core time to evaluate, which improves [DM15] by a factor 53, and the mux gate takes about 26 CPU-ms

Addition of two bits can be implemented using 5 binary gates (fulladder) Hence to add 2 32 bit numbers ~416ms => 2 additions per second

EDIT: Shame I cannot edit my original post

joshuamorton · on June 7, 2020

This would mean that ~2500 CPUs could do 5000 operations per second, which is amusingly close to the same compute-per-square-foot as the ENIAC.

speedgoose · on June 6, 2020

How encrypted and private is the encrypted data in such systems?

When I looked at encrypted databases, the real ones, not the encrypted at rest databases, I read comments saying that the crypto was relatively too weak to have any use outside research. That it is a neat research topic, it will be great eventually, but it's not ready for production.

So I went with the classic and simple solution : encrypt with aes256gcm, and decrypt and reencrypt if I manipulate the data.

Does a system like cingulata offers encryption as strong or better than aes256gcm?

qayxc · on June 6, 2020

Systems like Cingulata are true end-to-end encrypted systems. No data is decrypted at any point in the process.

All data manipulation and -processing takes place on encrypted data, as opposed to the encrypted database you mentioned, which still decrypts its contents in memory prior to processing.

The reason homomorphic encryption is far from being ready for production, is that all operations (e.g. all your algorithms and programs) need to be transformed to a virtual circuit that operates on cipher text encrypted by a specific algorithm.

This is akin to translating your software into an inefficient byte code that's then dynamically executed by an obnoxiously slow interpreter.

The great part is that you can simply encrypt your data on a local (and trusted) machine, send the cipher text into the cloud for indexed storage or processing and do your queries or operations on encrypted data. At no point will your data ever be decrypted on the remote machine.

So there's great potential there w.r.t. privacy and cloud computing (and especially AI where training data is often the "magic sauce" that gives your company an edge over the competition) and SaaS.

speedgoose · on June 6, 2020

Thanks for the explanation. The byte code analogy made me understand the technology a lot better.

jonahbenton · on June 6, 2020

Super helpful explanation, +100.

rhindi · on June 6, 2020

All modern FHE is lattice based, so pretty strong if you chose the right parameters. But of course if you dont chose secure parameters.. well, it wont be secure :)

There are tools to measure the security level of FHE schemes: https://bitbucket.org/malb/lwe-estimator/

speedgoose · on June 6, 2020

I guess only experts should chose the parameters.

bawolff · on June 6, 2020

FHE encryption is bleeding edge crypto. You probably shouldn't use it in real systems with hard security requirements at all without input from experts.

rhindi · on June 6, 2020

You have formulas to calculate the security level given a threat model, so the compiler could in theory do it automatically. Just specify that you wants 128 bits of security or whatever, and it will do the rest.

ssmiler · on June 7, 2020

Cingulata automatically chooses secure parameters using https://github.com/CEA-LIST/CinguParam module.

It's a separate project because our intention is to provide an easier/faster way to chose HE parameters than lwe-estimator. You need to provide only the multiplicative depth (or the circuit describing the computation for example) and CinguParam will automatically generate the code snippet/parameter file for the HE scheme you want. Also as CinguParam contains a database of HE parameters the actual parameter generation is really faster than using lwe-estimator. There is a lot of work to be done on this project in order to automatize parameter database update, generate HE parameters more precisely using circuit representation instead of multiplicative depth, take into account HE libraries implementation details (RNS, NTT), etc.

bawolff · on June 6, 2020

Note that encrypted db stuff generally tries to use weaker notions of security in exchange for performance. Generally research revealed that it was even weaker than the authors thought. FHE is generally something different.

>Does a system like cingulata offers encryption as strong or better than aes256gcm?

One of the security garuntees of aes-gcm is non malleablity. Any attempt to modify the ciphertext is detected. This is the key reason why GCM is popular over older methods like CBC. In homomorphic encryption, the entire point is to be able to able to do arbiteary computations on ciphertexts without decrypting. So even theoretically it would be impossible for homomorphic encryption to give same/better security properties as aes-gcm.

aruss · on June 6, 2020

That's true that it's theoretically impossible, but you can get close by using verifiable computation protocols like Pinocchio to restrict the computations that the adversary can do. That is to say that before blindly accepting the result from the adversary and decrypting it, you first ensure that they performed a computation you specified beforehand.

salawat · on June 8, 2020

...so if you're going to redo the computation you just sent off to the adversary to do in order to check that the adversary did the computation you wanted them to, why mess with outsourcing the outsourcing of the computation at all?

ssmiler · on June 8, 2020

The idea behind verifiable computation is that the verification phase should be less expensive than the computation phase itself.

hansdieter1337 · on June 6, 2020

Reminds me of my work with CryptDB. A framework to run SQL queries over encrypted data: https://github.com/CryptDB/cryptdb

One important fact about homomorphic and other encryption schemes you can calculate on: It leaks information! E.g., if enc(x) + enc(y) = enc(z), you gain the knowledge that x + y = z. With enough data, it’s easy to obtain the unencrypted data without the secret(s).

aruss · on June 6, 2020

I did part of a PhD in this field; your comment about information leakage is not true. FHE schemes are IND-CPA secure, which means you can't test for equality in the encrypted space (Note that in the standard definition of homomorphic encryption, you're only guaranteed that Dec(Enc(x+y)) = Dec(Enc(x) + Enc(y)) = x + y). Your comment only applies to schemes where you can test for equality in the encrypted space, which are not IND-CPA secure and should never be used in practice (except in some extreme scenarios where you can/need to make some strong assumptions).

gautamcgoel · on June 6, 2020

I think you meant homomorphic, not homophobic...

hansdieter1337 · on June 6, 2020

ah, hehe, yes indeed. Thank you, auto-correct

dekhn · on June 6, 2020

During the COVID freakout, I opined that all the CPU/GPU spent on Folding@Home was unlikely to create any sort of interesting breakthrough and it used an absurd amount of CPU/GPU to not make breakthroughs. In the past, I've been kind of opposed to using FHE on health data (seems wasteful; instead, should be run on trusted compute with a decrypt and encrypt key). But for the amount of wasted compute in CPU/GPU for molecular dynamics... coudl we instead process encyrpted health data on consumer devices (I'm not aware of any health data analytics problems that need to scale over millions of consumer devices).

lacker · on June 6, 2020

For “big data” type tasks, the cost of shuffling around the data to thousands of different consumer devices probably makes it ineffective. You’re better off just paying for one centralized data warehouse, rather than all the custom coding and decentralized bandwidth to make some consumer-based strategy work. This seems more like it could be useful for building things like decentralized encryption protocols.

pimlottc · on June 6, 2020

> Cingulata (pronounced "tchingulata")

As an English speaker, this makes me more confused about the pronunciation, not less.

pmelendez · on June 6, 2020

As a non-native English speaker living in a English speaking country, I appreciate the phonetic guidance of a fairly uncommon word

tzs · on June 6, 2020

Did that really provide any phonetic guidance? All it appears to say is that it is just like "cingulata" except the "c" should be pronounced "tch".

As you note "cingulata" is fairly uncommon. I have no idea how to pronounce that, and so their guidance still leaves me in the dark about everything but the first letter.

For phonetic guidance something using the International Phonetic Alphabet seems like the way to go. I think that would give something like this: tʃiŋgyəˈlātə, -ätə

That's just a guess taking the IPA for cingulata from Merrian-Webster (siŋgyəˈlātə, -ätə) and replacing the "s" with the "tʃ" that the Macmillan dictionary gives for the "tch" in "latch", which is the first word that came to mind that has a "tch".

(Also, I believe many publications use their own variants on the IPA, so I'm not sure that what I gave is actually proper. If MW and/or Macmillian have their own, the above could be some weird bastardization with little meaning).

zimpenfish · on June 6, 2020

Same. You can't just steal words[1] and change the pronunciation!

[1] https://en.wikipedia.org/wiki/Cingulata

asjw · on June 6, 2020

Cingulata comes from the Latin words cingŭla or cingŭlum and it should be pronounced with the soft C (like in Tchaikovsky)

> History and Etymology for Cingulata New Latin, from Latin cingulum, cingula girdle + New Latin -ata

https://www.merriam-webster.com/dictionary/Cingulata#:~:text...

It's a common mistake English speakers do all the times

The rule is quite simple: a C followed by the vocals e and i is always pronounced tch in Latin.

Latins used K for "hard C" like in "corn" and S for the s sound like in "cent"

gpderetta · on June 6, 2020

C followed by i and e is soft (as in english ch) in ecclesiatical latin and modern Italian, but my understanding is that romans actually pronounced C as hard C everywhere. I.e. Cicero called himself Kikero.

asjw · on June 6, 2020

Correct, in ancient Rome C,K and Q had the same hard sound

We are talking about classic Latin, born around the IV century b.c.

But it changed a lot over the centuries influenced by the Greeks, to the point that the Byzantin Roman empire chose Medieval Greek as official language at the end of the VI century and even before, their Latin was different from the classic Roman one. I think they abandoned Latin as official language in the third century to use Koinè Greek (sorry don't know the name in English) basically a Greek dialect, the first common form of Greek, used among all Hellenistic cities which eventually became modern Greek, still in use nowadays.

Most of the Latin found in science (like in the animal taxonomy) comes from "modern" ecclesiastic Latin which incorporated traits of vulgar and neighbors languages (French, German and Italian), it simplified the alphabet but reduced the symbols available so C was mad an affricate consonant.

In practice we have dealt with modern Latin or some form of it for the past 15 centuries, longer than Rome existed.

zimpenfish · on June 6, 2020

> a C followed by the vocals e and i is always pronounced tch in Latin.

MW (from your link) says it's pronounced "siŋgyəˈlātə" which is definitely not "tchingulata", no matter what Latin says.

asjw · on June 6, 2020

Yes, in English

It also says that bus is pronounced ˈbəs, but it's a contraction of French omnibus, that comes from Latin omnibus, which is pronounced ɔmnibus (both in French and Latin), like goose, but with a much shorter oo sound.

Anyway the point was is English language that stole words from other languages and changed their pronunciation, not the other way around.

ReactiveJelly · on June 6, 2020

If Tchaikovsky is Russian and the translation of his name into written English is arbitrary and it's pronounced "Chaikovsky", what is the "T" for?

asjw · on June 6, 2020

That's also very simple: it's the transliteration of the phonetic symbol 'tʃ which represents the soft C sound as in cheese.

That's what happens when a language steals a foreign word (Latin in this case) and changes its pronunciation

In Latin the way groups of letters are pronounced it's (almost) unambigous, it's a phonetic alphabet itself

If you change the pronunciation, that part is lost and you have to rely on recollection instead of recognition

It also means that if you don't already know a word you can't be sure on how it is pronounced

otabdeveloper4 · on June 6, 2020

The 't' is from French. That's how they code the sound in the French language.

asjw · on June 6, 2020

P.s. Tchaikovsky in Russian is Чайко́вский and it is pronounced Cheekousky and not Chaikowsky

The correct transliteration is Čajkovskij and the correct phonetic one is tɕɪˈkofskʲɪj

fabiospampinato · on June 6, 2020

It's a slippery thing, how should the word "lead" be pronounced? Did I mean lead as in leader or lead the metal? In the English language words don't always have a unique pronunciation associated with them.

ssmiler · on June 7, 2020

Author here. We use the romanian language pronunciation for Cingulata. The first sound from Tchaikovsky is the right way as someone already mentioned in this thread.

asdfaoeu · on June 6, 2020

47 upvotes, no comments. I'm guessing no one understands this.

jacobush · on June 6, 2020

Yep. Was hoping for someone to debunk or explain. My (probably deeply flawed) hunch was that one would need some kind of specialised virtual machine or something to make homomorphic stuff. I'm babbling now.

Yes, someone please explain Cingulata.

qayxc · on June 6, 2020

Cingulata is a system for secure, privacy preserving applications based on homomorphic encryption.

Homomorphic encryption is a form of cryptography that allows for operations on encrypted content to yield the same results (though encrypted) as if applied to plain text input.

Basically, the framework takes algorithms implemented in C++ into a virtual boolean circuit that operates on encrypted data. This means you can run your database, AI training, page ranking, etc. on encrypted data for a truly end-to-end encrypted processing or computation on untrusted devices or environments (cloud computing!).

This comes at a price, though, as the virtual circuit is basically a software interpreter for your original algorithm and thus is abysmally slow compared to the original code...

At least that's my understanding of the system.

marius_k · on June 6, 2020

Are these operations encrypted also? Is this method more efficient rather than returning a mapping of all possible unencrypted inputs and outputs for a given operation.

[edit] I just did the math. 1. mapping of adding two bytes would make ~64kB list of values. 2. mapping of adding two i32 would make ~73786976294838.2MB list of values (looks like encryption would do a better job)

aruss · on June 6, 2020

It's a requirement of these algorithms that they run in polynomial time with respect to the size of the input. They're large polynomials with big constants, but they're dwarfed by the exponentials you're considering.

radres · on June 6, 2020

So I can give an executable to anyone in the world and they cannot get to my source code?

aruss · on June 6, 2020

Not quite. The data is encrypted, the computation is not. Through the use of universal circuits (circuits that run other circuits) you could give an encrypted executable to an untrusted party to run on encrypted data, but note that the result of this computation is encrypted as well (so, the executable would be useless for the other party -- only you would be able to use the results). In fact, this idea underpins the crucial idea behind FHE: bootstrapping.

layoutIfNeeded · on June 6, 2020

No.

You have a function F which maps inputs x to outputs y. You transform this function into new one F', that will map the encrypted inputs encrypt(x) to the encrypted outputs encrypt(y), without knowing how to decrypt.

   F(x) = y 
   F'(encrypt(x)) = encrypt(y)

russfink · on June 6, 2020

This is correct. Additionally, the entity performing the computation does not get to learn the answer. This is an important distinction between a technique such as this, and software obfuscation.