The firedancer team at one of the better HFT firms wrote an AVX-512-optimized implementation of Ed25519 and X25519 that’s significantly faster than OpenSSL’s.
I laughed a little at calling Firedancer contributors "a team at a HFT firm".
Not that you are technically wrong, not at all, that's where Jump came from. It's just that this is all completely blockchain-driven optimization, but the b-word is so dirty now that we've gotta go back to using TradFi for the rep.
It's hard to separate from the sea of grifters, con men, cranks, and scammers that infest the domain. Just using the word is a yellow flag that you might be some kind of whacko, even if all you really want to talk about is the math.
People have to forever be on guard that you might at any point pivot to "all taxation is theft", or to how you have formed your own micro-nation consisting entirely of yourself and thus have diplomatic immunity from all prosecution. Because it happens. Or maybe you have a once-in-a-lifetime deal to buy this receipt-like object for some hideous art that is guaranteed to appreciate in value by millions of percent. It's just that the crowd that has aggregated around cryptocurrencies includes a lot of untrustworthy people.
Why do people need to be on guard for those beliefs? People should be critical thinkers and not thought police.
Granted, there are all kinds of whackos in crypto, but we should only be concerned about the immoral ones trying to scam us out of our money: SBF, Do Kwon, and the like.
People are legitimately buying farmland in the US and currently suing farmers for "anti-trust" for refusing to sell them their land, so that they can quite literally create a crypto-based sovereign micro-nation of wealthy tech VCs. [1] I think that is a selfish, vile, and delusional thing to do. It has nothing to do with "thought police"; it's as simple as looking at the impact of their actions and beliefs and making the decision to reject that way of thinking and way of life.
This directly implies that all the people who did useful stuff (improving cancer survivability, new vaccines, renewable energy, and others) are "below" the "greatest minds of our generation".
Not to mention it also suggests there is a way to "compare" minds. I would not choose to do some things myself, but that does not mean I automatically despise people who do choose to.
It doesn't seem wasteful and unproductive, given that the result of the HFT industry is smaller bid/ask spreads (lowering costs for all trades) and payment for order flow which is the mechanism that eliminated retail commissions and provides price improvement on many retail trades. And even so, HFT firms are making money.
It might not seem like real work, but making money by reducing costs of market participants sounds like a good thing. I admit though, block trades might be harder now than before the rise of HFT.
If you could do warehousing/distributing/coordinating fresh foods in a way that reduced the difference in price between the farmer and the consumer and make money doing it, that would clearly be good work.
I'll never be able to figure out what people get from repeating the same thing over and over. I've seen this same exact comment 1000 times on hn and I'm 100% sure you have too (indeed I believe the reason you repeat is because you've seen it and agree with it).
I see they learned clang’s dirty little secret about intrinsics, viz. that in lowering them to IR it deviates (sometimes dramatically where AVX-512 is concerned) from the documented opcodes, and the results are inevitably detrimental.
This is why ffmpeg uses assembly, and people get extremely mad when you say it's done for a reason, because they always want to come up with a fancier abstraction (usually cross-platform) which then defeats the purpose because it doesn't actually work.
nb those abstractions do make sense when you can only afford to write a single implementation of the algorithm; then you're just talking about a high level programming language. But they frequently fail to achieve their goal when you're writing a second implementation for the sole purpose of being faster.
It's much more than just performance they've thought about. Here are some of the secure programming practices that have been implemented:
/* All the functions in this file are considered "secure", specifically:
- Constant time in the input, i.e. the input can be a secret[2]
- Small and auditable code base, incl. simple types
- Either, no local variables = no need to clear them before exit (most functions)
- Or, only static allocation + clear local variable before exit (fd_ed25519_scalar_mul_base_const_time)
- Clear registers via FD_FN_SENSITIVE[3]
- C safety
*/
libsodium[4] implements similar mechanisms, and Linux kernel encryption code does too (example: use of kfree_sensitive)[5]. However, firedancer appears to better avoid moving secrets outside of CPU registers, and [3] explains that libraries such as libsodium have inadequate zeroisation, something which firedancer claims to improve upon.
These are table stakes for core cryptographic code, and state-of-the-art crypto code --- like the Amazon implementation this story is about --- at this point tends to be derived from formal methods.
As an example, the Amazon implementation doesn't use gcc's[1] and clang's[2] "zero_call_used_regs" to zeroise CPU registers on return or exception from functions working on crypto secrets. OpenSSL doesn't either.[3] firedancer _does_ use "zero_call_used_regs" to have gcc/clang zeroise the used CPU registers.[9]
As another example, the Amazon implementation also doesn't use gcc's "strub" attribute, which zeroises the function's stack on return or exception from functions working on crypto secrets.[4][5] OpenSSL doesn't either.[3] firedancer _does_ use the "strub" attribute to have gcc zeroise the function's stack.[9]
Is there a performance impact? [6] puts the overhead of implementing CPU register and stack zeroisation at 0% for X25519. Compiling the Linux kernel with "CONFIG_ZERO_CALL_USED_REGS=1" for x86_64 (impacting all kernel functions) was found to result in a 1-1.5% performance penalty.[7][8]
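To make the two compiler features concrete, here is a minimal sketch of how a function handling key material can opt into them. The macro names SENSITIVE_REGS/SENSITIVE_STACK and the function mix_secret are hypothetical, not firedancer's code (firedancer wraps similar attributes in its FD_FN_SENSITIVE macro); the feature tests let the code degrade to a no-op on compilers that lack the attributes:

```c
#include <stdint.h>
#include <string.h>

#if defined(__has_attribute)
/* Zero call-used (caller-saved) registers on return: gcc >= 11, clang >= 15. */
#  if __has_attribute(zero_call_used_regs)
#    define SENSITIVE_REGS __attribute__((zero_call_used_regs("used")))
#  endif
/* Scrub the function's own stack frame on return: gcc >= 14 only. */
#  if __has_attribute(strub)
#    define SENSITIVE_STACK __attribute__((strub))
#  endif
#endif
#ifndef SENSITIVE_REGS
#  define SENSITIVE_REGS   /* compiler too old: degrade gracefully */
#endif
#ifndef SENSITIVE_STACK
#  define SENSITIVE_STACK
#endif

static void SENSITIVE_REGS SENSITIVE_STACK
mix_secret(uint8_t out[32], const uint8_t secret[32]) {
  uint8_t tmp[32]; /* secret-dependent scratch data on the stack */
  for (int i = 0; i < 32; i++) tmp[i] = (uint8_t)(secret[i] ^ 0x5a);
  memcpy(out, tmp, 32);
  /* on return: "strub" scrubs tmp's stack slot, and "zero_call_used_regs"
     clears the registers this function used, so neither leaks the secret */
}
```

The "used" argument clears only registers the function actually touched; "all" is more thorough but costlier, which is part of why the measured overhead varies.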
Zeroizing a register seems pretty straightforward. Zeroizing any cache that it may have touched seems a lot more complex. I guess that's why they work so hard to keep everything in registers. Lucky for them we aren't in the x86 era anymore and there are a useful number of registers. I'll need to read up on how they avoid context switches while their registers are loaded.
That looks really neat, but I still don't understand what firedancer actually is - what is a validator client for Solana, and why does it need its own crypto library?
It’s a new, from-scratch implementation of a validator for Solana, the fastest blockchain by several orders of magnitude. The slowest part is signature verification, so they sped up hashing to improve performance of the entire system.
They follow a first principles approach (the lead has a few physics degrees) and opted to speed up the cryptography. The beauty of this, despite the bad views on blockchain, is that they freaking sped up the cryptography of commonly used algorithms more than anything open or closed source that I personally am aware of.
It’s a win in cryptography, much like this Amazon post is, except that Amazon’s implementation is slower than firedancer’s.
Off topic - is Firedancer going to survive Jump winding down its crypto arm?
Kanav left, they liquidated a huge staked ETH position a few months ago (+ a bunch of other coins), and the SEC/CFTC is all over them for the Terra Luna fiasco.
You will see a half dozen or so talks about firedancer and probably 35-40 or so of us total (I’m at the company that does security for firedancer, Asymmetric Research. We were founded by former jumpers).
You can make the determination on your own, but there will be an obvious large showing of firedancer folks and some exciting updates for the project.
> The beauty of this, despite the bad views on blockchain, is that they freaking sped up the cryptography of commonly used algorithms more than anything open or closed source that I personally am aware of.
For users that have AVX-512, which isn't widely available (AMD Zen 4 / Zen 5, Sapphire Rapids)...
Sure, and CPUs supporting it will proliferate. Shockingly to no one reading Hacker News, both software and hardware generally continue to improve with time. This was a huge software improvement on hardware that supports that functionality. It is a huge win for anyone wanting to use these algorithms who can afford hardware that supports it.
We should celebrate Amazon's improvements and we should celebrate these improvements. Both are great for the future of technology, regardless of why they were initially developed. Improving tech and keeping it open source is good for all.
The formal methods nerd in me is happy to see HOL Light being used to formally verify this implementation. I'm curious to see how closely their abstract machine models follow specific machine implementations. OOO, speculation, and deep pipelining have non-trivial impacts on potential side channels, and these vary quite a bit by stepping and architecture.
Even worse: Each new CPU generation will need a new machine model and a reevaluation. Because OOO, speculation and all the timing behaviour are non-functional properties that frequently change due to new optimizations, different internal structuring, etc.
> The x25519 algorithm also plays a role in post-quantum safe cryptographic solutions, having been included as the classical algorithm in the TLS 1.3 and SSH hybrid scheme specifications for post-quantum key agreement.
Really though? This mostly-untrue statement is the line that warrants adding hashtag #post-quantum-cryptography to the blogpost?
My (probably naive) understanding is that 25519 already provided better performance than other algorithms used for similar purposes (e.g. RSA) when tuned for a roughly similar level of security; anecdotally, generating 2048-bit or larger RSA keys for me tends to be a lot slower than ed25519. At times I've run into places that require me to use RSA keys though (ironically, I seem to remember first experiencing this with AWS years back, although I honestly can't recall if this is still the case or not).
If this further improvement becomes widely used, it would be interesting to see if it's enough to tip the scales towards ed25519 being more of the de facto "default" ssh key algorithm. My experience is that a decent number of people still use RSA keys most of the time, but I don't feel like I have nearly enough of a sample size to conclude anything significant from that.
> My experience is that a decent number of people still use RSA keys most of the time, but I don't feel like I have nearly enough of a sample size to conclude anything significant from that.
I wouldn't be surprised if a lot of people still use RSA for SSH keys for one or more of the following reasons:
1. A lot of tutorials about generating SSH keys were written before ed25519 existed, so people following an old tutorial will probably end up generating an RSA key.
2. Older versions of OpenSSH, that you'd find on CentOS 7 and below, would default to RSA if you didn't specify a key type when running ssh-keygen.
3. There are some systems out there that don't support ed25519, though they are becoming rarer. If you have to deal with those systems then you're forced to use RSA (at least for that system).
4. Some of us have been using SSH keys from way before OpenSSH added support for ed25519 keys in 2014, so any long-lived SSH keys won't be ed25519 keys (wow, ed25519 has now been around in OpenSSH for over 10 years).
5. a lot of people (especially older people I suspect) think "RSA" when they hear "public key cryptography".
I'm in my twenties and still have that reaction. I know elliptic curves exist, I even sort-of-kind-of have an awareness of how they work, but if I was asked to name one cryptosystem that used public and private keys, I'd definitely say RSA first and not elliptic curves.
This is likely in no small part due to CS education only really teaching the mechanics of RSA (modular arithmetic, Fermat's little theorem, etc), or at least, that still seems to be the case at Berkeley. I'd guess because elliptic curve crypto requires more advanced math to reason about (more advanced group theory, at least) and doesn't map as cleanly to existing concepts that non-math-major undergrads have.
cryptopals.com also doesn't cover any elliptic curve crypto until you get into the last set.
I would think that (non-EC) Diffie-Hellman would be easy enough to teach as well: exponentiation and the discrete log problem aren't any more complicated to explain than factorization.
> 3. There are some systems out there that don't support ed25519, though they are becoming rarer. If you have to deal with those systems then you're forced to use RSA (at least for that system).
> If you interact with government or some large entities that do business with government, they have to comply with FIPS 140-2, and cannot use ed25519.
Not even when FIPS 140-3 was (finally) finalized in 2019, and testing began in 2020?
(I guess the problem is that various crypto implementations need to get recertified under the new standard...)
edit: it looks like AWS-LC [0] and boringcrypto [1] have both been validated under FIPS 140-3. Azure's OpenSSL crypto [2] has only been validated under FIPS 140-2 as far as I can tell.
When I run `ssh-keygen`, I can remember the options `-t rsa` or `-t dsa`. I simply cannot remember the flag `-t ed25519`. I have to look it up every time.
I just remember the flag as being vaguely similar to the name of the monster robot from RoboCop.
As of OpenSSH 9.5 the default has changed, so you don't have to specify anything:
* ssh-keygen(1): generate Ed25519 keys by default. Ed25519 public keys
are very convenient due to their small size. Ed25519 keys are
specified in RFC 8709 and OpenSSH has supported them since version 6.5
(January 2014).
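In practice that means either of these works, depending on your OpenSSH version (the comment string is just an example):

```shell
# OpenSSH >= 9.5: Ed25519 is already the default key type
ssh-keygen

# Older releases: name the type explicitly
ssh-keygen -t ed25519 -C "you@example.com"
```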
> My (probably naive) understanding is that 25519 already provided better performance than other algorithms used for similar purposes (e.g. RSA) when tuned for a roughly similar level of security; anecdotally, generating 2048-bit or larger RSA keys for me tends to be a lot slower than ed25519.
My also naive (and possibly out of date) understanding is that key generation is much faster with ECC, and that signing is faster too, but verifying is faster for RSA. So switching from an RSA to an ECC server certificate saves bytes on the wire, because keys are smaller, and saves server CPU because signing is faster, but may increase client CPU because verification is slower. The byte savings may make up for the increase in CPU though.
> My also naive (and possibly out of date) understanding is that key generation is much faster with ECC, and that signing is faster too, but verifying is faster for RSA. So switching from an RSA to an ECC server certificate saves bytes on the wire, because keys are smaller, and saves server CPU because signing is faster, but may increase client CPU because verification is slower. The byte savings may make up for the increase in CPU though.
Interesting! I wonder if this new algorithm is intended to help with that. I'm super curious if the smaller payload does indeed make a difference (with the current algorithm) like you mention; I know that with databases and filesystems, compression is commonly used to shift the balance from I/O to CPU due to disk writes being slow (with reduced storage size being a side benefit but not usually the main motivation), but I also know that cryptographic verification being too slow can be an anti-feature if it makes brute forcing feasible, so the amount of CPU work needed might be pretty high still.
It's 11k verifies/s for ECDSA vs 39k verifies/s for RSA-2048. A TLS handshake needs at least one sign and one verify for the server cert, plus some verifies for the signatures on the cert chain (but those signatures are used over and over).
> anecdotally, generating 2048-bit or larger RSA keys for me tends to be a lot slower than ed25519
That’s not really anecdotal. Generating an ed25519 key is barely more than generating a random 256-bit value. Generating an RSA key is significantly more work.
I pretty quickly realized in college when learning about this stuff that the math was well over my head, and I shifted my focus more to understanding how to properly use cryptography rather than implement it (which turned out to be more important as a software engineer anyhow). In retrospect, I really appreciate how the professor I had in a security-focused course explicitly told us it was okay if we didn't understand the math and wouldn't be tested on it when going over how it worked.
Counterpoint: it's not OK to skip the math with cryptography. You may not need to power through all of Silverman's curve book (though: I don't know for sure that's true, which is why I don't call myself a cryptography engineer), but you have to get as deep into the math as you can in order to safely use cryptographic algorithms.
If you're math-avoidant, stick with high-level abstractions like NaCl and TLS. There's nothing wrong with that!
A professor talking about and demonstrating cryptography at the level of individual algorithms is doing their class a disservice if they say "none of the math will be on the test". The algorithms are enough to put something together that seems like it works; the math is what you need to find out if your resulting system actually does work. It's where many of the fun bug classes live.
I'm not sure if you're reading more into what I said than I intended, but I'm not convinced by this argument. You might have missed that this course was on security in general, not cryptography; not everything in the course was cryptographic related.
That said, I'd argue that for the vast majority of software engineers, the type of stuff they're dealing with can be handled without knowing the math. For example, you don't need to understand the math behind the algorithms to know that bcrypt is a reasonable password hashing algorithm and that sha1 and md5 are not, or that salts are used to mitigate issues when users reuse passwords. These are principles you can understand at a high level without fully grasping the underlying details. If anything, I think that overemphasis on requiring people to learn and understand the math leads to over-focusing on simpler algorithms that aren't actually what people want to be using in practice, simply because they're easier to teach and are foundational for the concepts needed to understand the more complicated algorithms.
If using cryptographic algorithms directly requires knowing the math, then I'd agree that most people shouldn't be using them directly, but I'd go further and say that a lack of libraries that are safe for people to use for software engineering without understanding the implementation is a failing of the ecosystem; as much as "regular" software engineering people (like myself!) can struggle with the math behind cryptography, I think that a lot of people developing cryptographic libraries struggle with building reasonable abstractions and making user-friendly APIs (which is a skill I think in general is not emphasized enough for most software engineers, to the detriment of everyone).
Sure. It's a failing of the ecosystem. That observation, a cup of coffee, and 1-3 years will get you a Kenny Paterson paper stunt-breaking your system. I feel where you're coming from, but, respectfully: it does not matter.
My thing here is just: learn the math! Or do something else. I did! There is so much to do in our industry.
> My thing here is just: learn the math! Or do something else. I did! There is so much to do in our industry.
I'm not sure I understand what you mean here by "something else in our industry". Are you arguing that I'm not qualified to be a software engineer because I don't understand the math behind elliptic curves, or did you miss my repeated use of phrases like "the vast majority of software engineers" rather than some specialty where cryptography implementation details are more important? If the latter, I can reassure you that I don't work in cryptography, work on any cryptographic libraries, or have any specific responsibilities related to security beyond the general idea that all software should be secure. If the former, I'll have to respectfully disagree. Even if you aren't willing to consider that you're wrong about the math being a hard requirement for being a qualified software engineer, it's worth considering that you almost certainly don't have enough information to judge whether a stranger on the internet is qualified based on reading a few of their comments.
RSA signature verification is already very fast and TLS doesn't use RSA for encryption anymore so the problem reduces to optimizing signing operations.
I was aware of s2n-bignum which is a very cool project, but apparently there is a larger sister project, aws-lc, that aims for broader set of APIs including OpenSSL compatibility, while retaining the general approach and vibe (lots of formal verification + performance work): https://github.com/aws/aws-lc
That's pretty sweet. I'm currently using BoringSSL in a project as a supplement to OpenSSL (mostly because it is much easier to build for Windows users than requiring them to fiddle with msys2/vcpkg etc; the alternative is to rely on the Windows CNG API, but it lacks features like ed25519 support.) I wonder how much effort it would take to use aws-lc instead... Not that I'm that interested, BSSL is pretty good, but free performance and heavy automated verification is always nice :)
Related: one of the authors of this post, John Harrison, wrote a really good book about automated theorem proving about 15 years ago while working on floating point verification at Intel -- there's still no other book quite like it, I think: https://www.cl.cam.ac.uk/~jrh13/
Holy shit these claims are wild!
It's not just a percent more performance here and there, the graphs look more like 50% more throughput on the same hardware (depending on the cpu architecture).
My immediate fear was that they optimized away the security features like absence of timing side channels, but they say they still have those.
They also claim to have a formal proof of correctness, which is even more amazing, because they are not doing it at a symbolic level but at the machine instruction level. Apparently they taught their reasoning system the semantics of all the CPU instructions used in the assembly implementation.
I'll still wait to see what djb has to say about this, but it looks freaking amazing to me.
I'm assuming that when they say this improves user experience, the use case is primarily TLS. In which case store-now-decrypt-later attacks are already considered an urgent threat with regard to post-quantum crypto. With FIPS 203 released and Chrome already using an implementation based on the draft standard, it seems like this algo (at least for TLS) should be on its way out.
Thanks, I forgot about that. So if I understand it right, the idea is to provide some insurance in case these relatively young algorithms are broken as they get exposed to more and more cryptanalysis.
No one other than NIST is recommending phasing out pre-quantum crypto. Everyone else is using a combination of pre-quantum and post-quantum because trust in the security and robustness of the post-quantum ecosystem is fairly low.
Curve25519 is designed to be resistant to timing attacks; for example, bit 254 of X25519 scalars is clamped to 1 so that implementors cannot optimize away a multiplication round.
That doesn't mean that this implementation doesn't have timing attacks, but the implementors claim they chose mechanisms which should be constant-time.
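For the curious, the clamping step is tiny. Per RFC 7748, a 32-byte X25519 scalar is masked like this before use (an illustrative sketch, not code from any of the implementations discussed here):

```c
#include <stdint.h>

/* RFC 7748 scalar "clamping" for X25519. Setting bit 254 gives every
 * scalar the same bit length, so a Montgomery-ladder implementation has
 * no incentive to skip leading-zero rounds (a timing hazard). Clearing
 * the low three bits makes the scalar a multiple of 8, which clears the
 * curve's cofactor. */
static void x25519_clamp(uint8_t k[32]) {
  k[0]  &= 248;  /* clear bits 0-2 */
  k[31] &= 127;  /* clear bit 255  */
  k[31] |= 64;   /* set bit 254    */
}
```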
> Does 25519 suffer from key/data-dependant execution time?
I mean, when implemented naively, yes, but the industry has been aware of timing attacks for decades such that this is table stakes for any crypto implementations.
From the article:
> We also do our best to execute the algorithms in constant time, to thwart side-channel attacks that infer secret information from the durations of computations.
https://github.com/awslabs/s2n-bignum (where most of the heavy lifting is done, per the article) further explicitly states that "Each function is moreover written in a constant-time style to avoid timing side-channels."
The next paragraph makes a slightly stronger statement about its constant-time'ness:
> Our implementations of x/Ed25519 are designed with constant time in mind. They perform exactly the same sequence of basic CPU instructions regardless of the input values, and they avoid any CPU instructions that might have data-dependent timing.
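The "constant-time style" these quotes describe boils down to avoiding data-dependent branches and memory accesses. As a toy illustration (my own sketch, not code from s2n-bignum), comparing two authentication tags without an early exit looks like:

```c
#include <stddef.h>
#include <stdint.h>

/* Compare n bytes without branching on the data: the loop always runs to
 * completion, accumulating differences with OR, so the running time does
 * not reveal the index of the first mismatch. */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t n) {
  uint8_t diff = 0;
  for (size_t i = 0; i < n; i++) diff |= (uint8_t)(a[i] ^ b[i]);
  return diff == 0; /* 1 if equal, 0 otherwise */
}
```

A naive memcmp-style loop that returns on the first mismatch leaks how many leading bytes matched; this variant does the same work on every input.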
[Widely used] Cryptographic Rust crates offering "constant time" operations in "pure Rust" — but Rust has no primitives for doing constant time operations, so it's only through hopes and prayers that it might actually work, and with no guarantee anywhere that it actually should.
(Other, less timing attack related stuff, but e.g., major companies still not supporting anything beyond RSA.)
https://github.com/firedancer-io/firedancer/pull/716
Ditto for sha256: https://github.com/firedancer-io/firedancer/pull/778
And sha512: https://github.com/firedancer-io/firedancer/pull/760
If you’re an optimization nerd, this codebase is wild.