Hacker News new | past | comments | ask | show | jobs | submit login
Cingulata: Run C++ code over encrypted data with fully homomorphic encryption (github.com/cea-list)
177 points by p4bl0 on June 6, 2020 | hide | past | favorite | 62 comments



FHE is undergoing standardization right now. The leading algorithm candidates are based on lattice crypto which is currently quantum-safe: https://homomorphicencryption.org/

Here are some mature and actively-developed HE libraries:

https://github.com/homenc/HElib (By IBM, employed the inventor of FHE)

https://github.com/Microsoft/SEAL (Best tutorials)

https://palisade-crypto.org/software-library (High-quality codebase)

https://github.com/tfhe/tfhe (Fast boolean logic, used by Cingulata)

Good introductory material is hard to come by, but I like the first half of this talk: https://simons.berkeley.edu/talks/intro-fhe-and-tfhe


Great work! This allows to perform arbitrary computation on untrusted devices.

However last time I checked computation in this scheme is ridiculously slow: on modern machines, cutting edge implementation of FHE manage to get around 100 integer operations per second.

Never the less there have been some brave startups trying to commercialise this technology:

https://venturebeat.com/2020/02/18/enveil-raises-10-million-...

Other interesting things build on top of FHE:

sql database where data and queries are fully encrypted: https://github.com/zerodb/zerodb

fully encripted brainfuck vm: https://github.com/f-prime/arcanevm


There are two main approaches to FHE: homomorphic boolean circuits and homomorphic numerical processing.

In the former (eg Cingulata), you convert a program into a boolean circuit, and evaluate each gate homomorphically. While this is general purpose, it also means you decompose functions that could be done in one instruction into multiple binary operations (so very slow). That’s usually what people refer to when they say FHE is slow.

The other approach consists of operating directly on encrypted integers or reals, and finding ways to do more complex computations (like a square function) in one step. While this is obviously much faster, it is also limited to whatever operations is supported by the scheme. This is what people refer to when they say FHE can only do certain things.

For years, the tradeoff has basically been slow and general purpose, or fast and limited. But there are new scheme being worked on that will be published soon that enable to go way beyond what’s currently done, such as doing efficient deep learning over encrypted data and other complex numerical processing.

Lots is coming out of labs and will be on the market within 2 years!


ohh interesting! Are there any opensource implementations of homomorphic numerical processing?

Were there any efforts of combining both approaches?


Note that most useful homomorphic numerical encryption schemes are easily breakable. Once you have equality you can usually de-anonymize user data. Many companies have been burnt by this.

With a less than operator and the ability to encrypt chosen plaintext values, you can decrypt arbitrary messages in a linear (in message size) number of steps.

Arithmetic operations can often be used to build gadgets that bootstrap comparison operations. For instance, with addition and equality you can implement a comparison operation for low-medium cardinality fields.

The field is littered with negative results that are being sold as secure, practical systems. Be careful when using them on important data.


All FHE schemes today add tiny random noise to the ciphertext so that encrypting the same data twice give different results. The noise is then kept to a nominal level as you compute homomorphically using a special operation called bootstrapping. Then when you decrypt, you just ignore the noise and get your result. If you do that well and dont let the noise grow too big, you get very strong security.

Fwiw, bootstrapping is actually what makes FHE slow, not the actual addition/multiplication etc


>and the ability to encrypt chosen plaintext values

Isn't this a big assumption? The way I envision it is

1. client encrypts data with their key

2. server computes on data without decrypting and without needing the key

3. client decrypts computation output with their key.

Or is it always required at step 2 that the server also has the key needed for encryption (but not decryption obviously)?


> Isn't this a big assumption?

The standard resilience criteria for modern multi-purpose encryption suppose that your scheme should be resistant to adaptive chosen-cipher attack. Chosen plaintext is a way weaker attack (the hierarchy being: known plaintext < chosen plaintext < chosen cipher < adaptative chosen cipher).

It may be OK for some situations, but it requires to be much more cautious than with regular crypto (which is already error-prone…).


The server doesn’t need the decryption key, ever. Thats the whole point in fact. FHE is end to end encryption for compute. However there is sometimes a public key used, called an evaluation key.


Is there a good standards body to refer to for progress on these alternate encryption methods?


You can refer to https://homomorphicencryption.org for more details about security, libraries and standardization. Although for the moment it's in early stages of standardization.


FHE schemes don’t allow comparing equality of plaintexts without decryption


Research paper https://eprint.iacr.org/2018/758 introduces an unified view for HE plaintext spaces. It allows switching between integral, approximate numbers and bit level representations (different HE schemes). But I'm not aware of a HE library implementing this. For the IDASH 2018 competition we have used 2 different libraries (TFHE and CKKS) in the same computation, although the scheme switching procedure was done manually.


Yes, turns out you can convert ciphertexts from one scheme to another, so you can go back and forth between them depending on what type of computation you are trying to do. However the cost of transciphering is high, so in practice it doesn’t work well. But give it a few years and it’ll work!


HEAAN and HEmat are two libraries for numerical processing that you can find on github. They’re not perfect, and require work to get in to shape for real distributed computation.


I suggest looking at TFHE and SEAL for starters


Where did you get the "100 integer operations per second"?

I thought the speed was in the order of minutes for a single operation.


Indeed I didn't remember it correctly:

According to https://tfhe.github.io/tfhe/ states:

> Each binary gate takes about 13 milliseconds single-core time to evaluate, which improves [DM15] by a factor 53, and the mux gate takes about 26 CPU-ms

Addition of two bits can be implemented using 5 binary gates (fulladder) Hence to add 2 32 bit numbers ~416ms => 2 additions per second

EDIT: Shame I cannot edit my original post


This would mean that ~2500 CPUs could do 5000 operations per second, which is amusingly close to the same compute-per-square-foot as the ENIAC.


How encrypted and private is the encrypted data in such systems?

When I looked at encrypted databases, the real ones, not the encrypted at rest databases, I read comments saying that the crypto was relatively too weak to have any use outside research. That it is a neat research topic, it will be great eventually, but it's not ready for production.

So I went with the classic and simple solution : encrypt with aes256gcm, and decrypt and reencrypt if I manipulate the data.

Does a system like cingulata offers encryption as strong or better than aes256gcm?


Systems like Cingulata are true end-to-end encrypted systems. No data is decrypted at any point in the process.

All data manipulation and -processing takes place on encrypted data, as opposed to the encrypted database you mentioned, which still decrypts its contents in memory prior to processing.

The reason homomorphic encryption is far from being ready for production, is that all operations (e.g. all your algorithms and programs) need to be transformed to a virtual circuit that operates on cipher text encrypted by a specific algorithm.

This is akin to translating your software into an inefficient byte code that's then dynamically executed by an obnoxiously slow interpreter.

The great part is that you can simply encrypt your data on a local (and trusted) machine, send the cipher text into the cloud for indexed storage or processing and do your queries or operations on encrypted data. At no point will your data ever be decrypted on the remote machine.

So there's great potential there w.r.t. privacy and cloud computing (and especially AI where training data is often the "magic sauce" that gives your company an edge over the competition) and SaaS.


Thanks for the explanation. The byte code analogy made me understand the technology a lot better.


Super helpful explanation, +100.


All modern FHE is lattice based, so pretty strong if you chose the right parameters. But of course if you dont chose secure parameters.. well, it wont be secure :)

There are tools to measure the security level of FHE schemes: https://bitbucket.org/malb/lwe-estimator/


I guess only experts should chose the parameters.


FHE encryption is bleeding edge crypto. You probably shouldn't use it in real systems with hard security requirements at all without input from experts.


You have formulas to calculate the security level given a threat model, so the compiler could in theory do it automatically. Just specify that you wants 128 bits of security or whatever, and it will do the rest.


Cingulata automatically chooses secure parameters using https://github.com/CEA-LIST/CinguParam module.

It's a separate project because our intention is to provide an easier/faster way to chose HE parameters than lwe-estimator. You need to provide only the multiplicative depth (or the circuit describing the computation for example) and CinguParam will automatically generate the code snippet/parameter file for the HE scheme you want. Also as CinguParam contains a database of HE parameters the actual parameter generation is really faster than using lwe-estimator. There is a lot of work to be done on this project in order to automatize parameter database update, generate HE parameters more precisely using circuit representation instead of multiplicative depth, take into account HE libraries implementation details (RNS, NTT), etc.


Note that encrypted db stuff generally tries to use weaker notions of security in exchange for performance. Generally research revealed that it was even weaker than the authors thought. FHE is generally something different.

>Does a system like cingulata offers encryption as strong or better than aes256gcm?

One of the security garuntees of aes-gcm is non malleablity. Any attempt to modify the ciphertext is detected. This is the key reason why GCM is popular over older methods like CBC. In homomorphic encryption, the entire point is to be able to able to do arbiteary computations on ciphertexts without decrypting. So even theoretically it would be impossible for homomorphic encryption to give same/better security properties as aes-gcm.


That's true that it's theoretically impossible, but you can get close by using verifiable computation protocols like Pinocchio to restrict the computations that the adversary can do. That is to say that before blindly accepting the result from the adversary and decrypting it, you first ensure that they performed a computation you specified beforehand.


...so if you're going to redo the computation you just sent off to the adversary to do in order to check that the adversary did the computation you wanted them to, why mess with outsourcing the outsourcing of the computation at all?


The idea behind verifiable computation is that the verification phase should be less expensive than the computation phase itself.


Reminds me of my work with CryptDB. A framework to run SQL queries over encrypted data: https://github.com/CryptDB/cryptdb

One important fact about homomorphic and other encryption schemes you can calculate on: It leaks information! E.g., if enc(x) + enc(y) = enc(z), you gain the knowledge that x + y = z. With enough data, it’s easy to obtain the unencrypted data without the secret(s).


I did part of a PhD in this field; your comment about information leakage is not true. FHE schemes are IND-CPA secure, which means you can't test for equality in the encrypted space (Note that in the standard definition of homomorphic encryption, you're only guaranteed that Dec(Enc(x+y)) = Dec(Enc(x) + Enc(y)) = x + y). Your comment only applies to schemes where you can test for equality in the encrypted space, which are not IND-CPA secure and should never be used in practice (except in some extreme scenarios where you can/need to make some strong assumptions).


I think you meant homomorphic, not homophobic...


ah, hehe, yes indeed. Thank you, auto-correct


During the COVID freakout, I opined that all the CPU/GPU spent on Folding@Home was unlikely to create any sort of interesting breakthrough and it used an absurd amount of CPU/GPU to not make breakthroughs. In the past, I've been kind of opposed to using FHE on health data (seems wasteful; instead, should be run on trusted compute with a decrypt and encrypt key). But for the amount of wasted compute in CPU/GPU for molecular dynamics... coudl we instead process encyrpted health data on consumer devices (I'm not aware of any health data analytics problems that need to scale over millions of consumer devices).


For “big data” type tasks, the cost of shuffling around the data to thousands of different consumer devices probably makes it ineffective. You’re better off just paying for one centralized data warehouse, rather than all the custom coding and decentralized bandwidth to make some consumer-based strategy work. This seems more like it could be useful for building things like decentralized encryption protocols.


> Cingulata (pronounced "tchingulata")

As an English speaker, this makes me more confused about the pronunciation, not less.


As a non-native English speaker living in a English speaking country, I appreciate the phonetic guidance of a fairly uncommon word


Did that really provide any phonetic guidance? All it appears to say is that it is just like "cingulata" except the "c" should be pronounced "tch".

As you note "cingulata" is fairly uncommon. I have no idea how to pronounce that, and so their guidance still leaves me in the dark about everything but the first letter.

For phonetic guidance something using the International Phonetic Alphabet seems like the way to go. I think that would give something like this: tʃiŋgyəˈlātə, -ätə

That's just a guess taking the IPA for cingulata from Merrian-Webster (siŋgyəˈlātə, -ätə) and replacing the "s" with the "tʃ" that the Macmillan dictionary gives for the "tch" in "latch", which is the first word that came to mind that has a "tch".

(Also, I believe many publications use their own variants on the IPA, so I'm not sure that what I gave is actually proper. If MW and/or Macmillian have their own, the above could be some weird bastardization with little meaning).


Same. You can't just steal words[1] and change the pronunciation!

[1] https://en.wikipedia.org/wiki/Cingulata


Cingulata comes from the Latin words cingŭla or cingŭlum and it should be pronounced with the soft C (like in Tchaikovsky)

> History and Etymology for Cingulata New Latin, from Latin cingulum, cingula girdle + New Latin -ata

https://www.merriam-webster.com/dictionary/Cingulata#:~:text...

It's a common mistake English speakers do all the times

The rule is quite simple: a C followed by the vocals e and i is always pronounced tch in Latin.

Latins used K for "hard C" like in "corn" and S for the s sound like in "cent"


C followed by i and e is soft (as in english ch) in ecclesiatical latin and modern Italian, but my understanding is that romans actually pronounced C as hard C everywhere. I.e. Cicero called himself Kikero.


Correct, in ancient Rome C,K and Q had the same hard sound

We are talking about classic Latin, born around the IV century b.c.

But it changed a lot over the centuries influenced by the Greeks, to the point that the Byzantin Roman empire chose Medieval Greek as official language at the end of the VI century and even before, their Latin was different from the classic Roman one. I think they abandoned Latin as official language in the third century to use Koinè Greek (sorry don't know the name in English) basically a Greek dialect, the first common form of Greek, used among all Hellenistic cities which eventually became modern Greek, still in use nowadays.

Most of the Latin found in science (like in the animal taxonomy) comes from "modern" ecclesiastic Latin which incorporated traits of vulgar and neighbors languages (French, German and Italian), it simplified the alphabet but reduced the symbols available so C was mad an affricate consonant.

In practice we have dealt with modern Latin or some form of it for the past 15 centuries, longer than Rome existed.


> a C followed by the vocals e and i is always pronounced tch in Latin.

MW (from your link) says it's pronounced "siŋgyəˈlātə" which is definitely not "tchingulata", no matter what Latin says.


Yes, in English

It also says that bus is pronounced ˈbəs, but it's a contraction of French omnibus, that comes from Latin omnibus, which is pronounced ɔmnibus (both in French and Latin), like goose, but with a much shorter oo sound.

Anyway the point was is English language that stole words from other languages and changed their pronunciation, not the other way around.


If Tchaikovsky is Russian and the translation of his name into written English is arbitrary and it's pronounced "Chaikovsky", what is the "T" for?


That's also very simple: it's the transliteration of the phonetic symbol 'tʃ which represents the soft C sound as in cheese.

That's what happens when a language steals a foreign word (Latin in this case) and changes its pronunciation

In Latin the way groups of letters are pronounced it's (almost) unambigous, it's a phonetic alphabet itself

If you change the pronunciation, that part is lost and you have to rely on recollection instead of recognition

It also means that if you don't already know a word you can't be sure on how it is pronounced


The 't' is from French. That's how they code the sound in the French language.


P.s. Tchaikovsky in Russian is Чайко́вский and it is pronounced Cheekousky and not Chaikowsky

The correct transliteration is Čajkovskij and the correct phonetic one is tɕɪˈkofskʲɪj


It's a slippery thing, how should the word "lead" be pronounced? Did I mean lead as in leader or lead the metal? In the English language words don't always have a unique pronunciation associated with them.


Author here. We use the romanian language pronunciation for Cingulata. The first sound from Tchaikovsky is the right way as someone already mentioned in this thread.


47 upvotes, no comments. I'm guessing no one understands this.


Yep. Was hoping for someone to debunk or explain. My (probably deeply flawed) hunch was that one would need some kind of specialised virtual machine or something to make homomorphic stuff. I'm babbling now.

Yes, someone please explain Cingulata.


Cingulata is a system for secure, privacy preserving applications based on homomorphic encryption.

Homomorphic encryption is a form of cryptography that allows for operations on encrypted content to yield the same results (though encrypted) as if applied to plain text input.

Basically, the framework takes algorithms implemented in C++ into a virtual boolean circuit that operates on encrypted data. This means you can run your database, AI training, page ranking, etc. on encrypted data for a truly end-to-end encrypted processing or computation on untrusted devices or environments (cloud computing!).

This comes at a price, though, as the virtual circuit is basically a software interpreter for your original algorithm and thus is abysmally slow compared to the original code...

At least that's my understanding of the system.


Are these operations encrypted also? Is this method more efficient rather than returning a mapping of all possible unencrypted inputs and outputs for a given operation.

[edit] I just did the math. 1. mapping of adding two bytes would make ~64kB list of values. 2. mapping of adding two i32 would make ~73786976294838.2MB list of values (looks like encryption would do a better job)


It's a requirement of these algorithms that they run in polynomial time with respect to the size of the input. They're large polynomials with big constants, but they're dwarfed by the exponentials you're considering.


So I can give an executable to anyone in the world and they cannot get to my source code?


Not quite. The data is encrypted, the computation is not. Through the use of universal circuits (circuits that run other circuits) you could give an encrypted executable to an untrusted party to run on encrypted data, but note that the result of this computation is encrypted as well (so, the executable would be useless for the other party -- only you would be able to use the results). In fact, this idea underpins the crucial idea behind FHE: bootstrapping.


No.

You have a function F which maps inputs x to outputs y. You transform this function into new one F', that will map the encrypted inputs encrypt(x) to the encrypted outputs encrypt(y), without knowing how to decrypt.

   F(x) = y 
   F'(encrypt(x)) = encrypt(y)


This is correct. Additionally, the entity performing the computation does not get to learn the answer. This is an important distinction between a technique such as this, and software obfuscation.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: