Is something bugging you? (antithesis.com)
1179 points by wwilson 9 months ago | 417 comments



> The biggest effect was that it gave our tiny engineering team the productivity of a team 50x its size.

I feel like the idea of the legendary "10x" developer has been bastardized to just mean workers who work 15 hours a day 6.5 days a week to get something out the door until they burn out.

But here's your real 10x (or 50x) productivity. People who implement something very few people even considered or understood to be possible, which then gives amazing leverage to deliver working software in a fraction of the time.


It seems like the industry would get a lot more 10x behavior if it was recognized and rewarded more often than it currently is. Too often, management will focus more on the guy who works 12 hour days to accomplish 8 hours of real work than the guy who gets the same thing accomplished in an 8 hour day. Also, deviations from 'normal' are frowned upon. Taking time to improve the process isn't built into the schedule; so taking time to build a wheelbarrow is discouraged when they think you could be hauling buckets faster instead.


>It seems like the industry would get a lot more 10x behavior if it was recognized and rewarded more often than it currently is

I'd be happier if the industry cared more about team productivity - I have witnessed how rewarding "10x" individuals may lead to perverse results on a wider scale, a la Cobra Effect. In one insidious case, our management-enabled, long-tenured "10x" rockstar fixed all the big customer-facing bugs quickly, but would create multiple smaller bugs and regressions for the 1x developers to fix while he moved to the next big problem worthy of his attention. Everyone else ended up being 0.7x - which made the curse of an engineer look even more productive comparatively!

Because he was allowed to break the rules, there was a growing portion of the codebase that only he could work on - while it wasn't Rust, imagine an org has a "No Unsafe Rust" rule that is optional to 1 guy. Organizations ought to be very careful how they measure productivity, and should certainly look beyond first-order metrics.


> In one insidious case, our management-enabled, long-tenured "10x" rockstar fixed all the big customer-facing bugs quickly, but would create multiple smaller bugs and regressions for the 1x developers to fix while he moved to the next big problem worthy of his attention. Everyone else ended up being 0.7x - which made the curse of an engineer look even more productive comparatively! Because he was allowed to break the rules,

bingo, well said. Worked on a team like this with a “principal” engineer who’d work very fast with bug-ridden work like this simply because he had the automatic blessing from on high to do whatever he wanted. My unfortunate task was to run along behind him and clean up, which to my credit I think I did a pretty good job at, but of course these types can only very rarely acknowledge/appreciate that.

Eventually he got super insecure/threatened and attempted to push me out along with whoever else he felt was a threat to his fiefdom.


What happened to him in the end?


I don’t know, eventually the working environment got too toxic and I along with two other senior engineers quit. As far as I know he’s still running his little kingdom.


I try to look at these things through the lens of “software literacy” - software is a form of literacy, and this story might be better viewed as “a bunch of illiterate managers are impressed with one good writer at the encyclopedia publishers; now it turns out this guy makes mistakes, but hey, what do you expect when the management cannot read or write!”


Sure. A cohesive team of employees who stick together for more than 2 years is so rare. They can be so much more productive than a group of talented people who rotate a new person in and out every 3-6 months.


This reminds me of the "Parable of the Two Programmers." [1] A story about what happens to a brilliant developer given an identical task to a mediocre developer.

[1] I preserved a copy of it on my (no-advertising or monetization) blog here: https://realmensch.org/2017/08/25/the-parable-of-the-two-pro...


I had an idea once but when I tried to explain it people didn't understand.

I revisited an earlier thought: communication is a two-man job; one's job is to not make an effort to understand while the other explains things poorly. It always manages to never work out.

Periodically I thought about the puzzle and was eventually able to explain it such that people thought it was brilliant ~ though much too complex to execute.

I thought about it some more, years went by and I eventually managed to make it easy to understand. The response: "If it was that simple someone else would have thought of it." I still find it hilarious decades later.

It pops to mind often when I rewrite some code and it goes from almost unreadable to something simple and elegant. Ah, this must be how someone else would have done it!


> Ah, this must be how someone else would have done it!

This is a good exclamation :D

And it's a poignant story. Thanks for sharing.


That’s pretty good. It needs an Athena poster :-)


“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”

― Abraham Lincoln

I have started to follow this 'lately' (for a decade) and it has worked miracles. As for the anxious managers/clients, I keep them updated on the design/documentation/thought process, mentioning the risks of the path-not-taken, and that maintains their peace of mind. But this depends heavily on the client and the managers.


I can't seem to find it in a google search, maybe I'm just recalling entirely the wrong terms.

In the early computing era there was a competition. Something like take some input and produce an output. One programmer made a large program in (IIRC) Fortran with complex specifications documentation etc. The other used shell pipes, sort, and a small handful or two of other programs in a pipeline to accomplish the same task in like 10 developer min.


The Knuth link in the sibling comment is an original, but you're probably thinking of "The Tao of Programming"

http://catb.org/~esr/writings/unix-koans/ten-thousand.html

"""“And who better understands the Unix-nature?” Master Foo asked. “Is it he who writes the ten thousand lines, or he who, perceiving the emptiness of the task, gains merit by not coding?”"""


There was also the "Hadoop vs. unix pipeline running on a laptop"-story a few years back, a more modern take: https://adamdrake.com/command-line-tools-can-be-235x-faster-...



Sounds like "Knuth vs McIlroy", which has been discussed on hn and elsewhere before, and the general take is that it was somewhat unfair to Knuth.

[1] https://homepages.cwi.nl/~storm/teaching/reader/BentleyEtAl8... [2] https://www.google.com/search?q=knuth+vs+mcilroy


This is the competition I was thinking of. I must have read it in a dead-image PDF version some other time on HN. This paper isn't the one I recall but the solution is exactly the sort I vaguely recalled.

I'm trying to copy-in the program as it might have existed, with some obvious updates to work in today's shells ...

  #!/bin/sh
  # usage: $0 [count] [file] -- print the [count] most frequent words (default 100, stdin)
  tr -cs A-Za-z '\n' < "${2:-/dev/stdin}" |
  tr A-Z a-z |
  sort |
  uniq -c |
  sort -rn |
  sed "${1:-100}q"
Alternately, as a one-liner: $ tr -cs A-Za-z '\012' < "${INPUTFILEHERE:-/dev/stdin}" | tr A-Z a-z | sort | uniq -c | sort -rn | sed "${MAXWORDSHERE:-100}q"

Edited: Removed some errors likely induced by OCR / me not catching that in the initial transcription from the browser view of the file.
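
If the script above is saved as, say, wordfreq.sh (that name and the input file below are just placeholders of mine), invoking it would look roughly like:

  $ sh wordfreq.sh 10 somebook.txt     # ten most frequent words in somebook.txt
  $ cat somebook.txt | sh wordfreq.sh  # read stdin instead; defaults to the top 100

The first positional argument is the line count handed to sed, the second is the input file; both fall back to the defaults (100 and /dev/stdin).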


Just to be clear, it was not a competition. For more, please follow the links from some of the previous HN discussions, e.g. https://news.ycombinator.com/item?id=31301777.

[For those who may not follow all the links: Bentley asked Knuth to write a program in Pascal (WEB) to illustrate literate programming—i.e. explaining a long complicated program—and so Knuth wrote a beautiful program with a custom data structure (hash-packed tries). Bentley then asked McIlroy to review the program. In the second half of the review, McIlroy (the inventor of Unix pipes) questioned the problem itself (the idea of writing a program from scratch), and used the opportunity to evangelize Unix and Unix pipes (at the time not widely known or available).]


I was both of those developers at different times, at least metaphorically.

I drank from the OO koolaid at one point. I was really into building things up using OOD and creating extensible, flexible code to accomplish everything.

And when I showed some code I'd written to my brother, he (rightly) scoffed and said that should have been 2-3 lines of shell script.

And I was enlightened. ;)

Like, I seriously rebuilt my programming philosophy practically from the ground up after that one comment. It's cool having a really smart brother, even if he's younger than me. :)


This is unrelated to the excellent story, but it's annoying that the repost has the following "correction":

> The manager of Charles has by now [become] tired of seeing him goof off.

"The manager has tired of Charles" is as correct as "the manager has become tired of Charles". To tire is a verb. The square bracket correction is unnecessary and arguably makes the sentence worse.


Sure enough. Presumably my brain was switched off if I added that "correction" myself.

Or it was already there in whatever source I managed to copy it from. No idea.


Without more backup I can only describe that as being fiction. Righteous fiction, where the good guy gets downtrodden and the bad guy wins to fuel the reader's resentment.


It's practically my life experience.

Sometimes I'm appreciated, and managers actually realize what they have when I create something for them. Frequently I accomplish borderline miracles and a manager will look at me and say, "OK, what about this other thing?"

My first job out of college, I was working for a company run by a guy who said to me, "Programmers are a dime a dozen."

He also said to me, after I quit, after his client refused to give him any more work unless he guaranteed that I was the lead developer on it, "I can't believe you quit." I simply shrugged and thought, "Maybe you shouldn't have treated me like crap, including not even matching the other offer I got."

I've also made quite a lot of money "Rescuing Small Companies From Code Disasters. (TM)" ;) Yes, that's my catch phrase. So I've seen the messes that teams often create.

The "incompetent" team code description in the story is practically prescient. I've seen the results of exactly that kind of management and team a dozen times. Things that, given the same project description, I could have created in 1/100 the code and with much more overall flexibility. I've literally thrown out entire projects like that and replaced them with the much smaller, tighter, and faster code that does more than the original project.

So all I can say is: Find better teams to work with if you think this is fiction. This resonates with me because it contains industry Truth.


To me it is a story about managers clueless about the work. You can make all the effort in the world to imagine doing something, but the taste of the soup is in the eating. I do very simple physical grunt work for a living; there it is much more obvious that it is impossible. It's truly hilarious.

They probably deserve more praise when they do guess correctly but would anyone really know when it happens?


That’s because most executives can’t understand technology deeply enough to know the difference.


Even when they are smart enough to know, they seem to have very short memories. While I don't consider myself to be a 10x engineer, I have certainly done a number of 10x things over my career.

I worked for a company where I almost single handedly built a product that resulted in tens of millions of dollars in sales. I got a nice 'atta boy' for it, but my future ideas were often overridden by someone in management who 'knew better'. After the management changed, I found myself in a downsizing event once I started criticizing them for a lack of innovation.


This is the sad part of it: many people without core competence end up in "leadership" positions and remove any "perceived" threats to their authority. I believe part of it is due to the absence of leadership training in the engineering curriculum. Colleges should encourage engineers to take up a few leadership courses and get them trained on things like Influence and Power.


Knowing the difference between an overly ambitious or technically wrong proposal and when to listen to the engineer seems impossible at times. Perhaps it requires a consultant.


In my experience working for several companies, people who know and are not merely acting out of fear can call things out and explain why something is wrong or ambitious. And more often than not sensible engineers get it. Consultants are usually called in by the leadership when they can't deal with the backlash or have no core competence in the first place.


Reminds me of the inventor of the blue LED (see the recent Veritasium video)


Did you go build your own company? You totally should with a story like that.


As a matter of fact, I did.

https://www.didgets.com


And this is the single biggest reason why good developers become managers.


It's almost impossible to get executives to think in return on equity (“RoE”) for the future instead of “costs” measured in dollars and cents last quarter.

Which is weird, since so many executives are working in a VC-funded environment, and internal work should be “venture funded” as well.


> It seems like the industry would get a lot more 10x behavior if it was recognized and rewarded more often than it currently is.

I don't agree with that; there are a _lot_ of completely crap developers, and they get put into positions where even the ones capable of doing so aren't allowed to because it's not on a ticket.

I've seen some things.


"Don't confuse motion with action", in other words. I think a lot of people aren't good at it because they themselves are rewarded for the opposite. This seems rife in the "just above individual contributor" management layer, but that's a biased take.


Honestly? You work at a place a manager hasn't heard "impact" yet? I thought managers at this point just walk around the office saying "impact".


When I was in college, I met a few people who coded _a lot_ faster than me. Typically, they had started when they were 12 instead of 21 (like me). That's how 10x engineers exist, by the time they are 30, they have roughly 20 years of programming experience behind their belt instead of 10.

Also, their professional experience is much greater. Sure, their initial jobs at 15 are the occasional weird gig for the uncle/aunt or cousin/nephew, but they get picked up by professional firms at 18 and do a job next to their CS studies.

At least, that's how it used to be. Not sure if this is still happening due to the new job environment, but this was the reality from around 2004 to 2018.

For 10x engineers to exist, all it takes is a few examples. To me, everyone is in agreement that they seem to be rare. I point to a public 10x engineer. He'd never say it himself, but my guess is that this person is a 10x engineer [1].

If you disagree, I'm curious how you'd disagree. I'm just a blind man touching a part of the elephant [2]. I do not claim to see the whole picture.

[1] https://bellard.org/ (the person who created JSLinux)

[2] https://en.wikipedia.org/wiki/Blind_men_and_an_elephant - if you don't know the parable, it's a fun one!


Yup, that's been my experience as someone who asked for a C++ compiler for my 12th birthday, worked on a bunch of random websites and webapps for friends of the family, and spent some time at age 16-17 running a Beowulf cluster and attempting to help postdocs port their code to run on MPI (with mixed success). All thru my CS education I was writing tons of toy programs, contributing (as much as I could) toward OSS, reading lots of stuff on best practices, and leaning on my much older (12 years) brother who was working in the industry. He pointed me to Java and IntelliJ, told me to read Design Patterns (Gang of Four) and Refactoring (Fowler). I read Joel on Software religiously, even though he was a Microsoft guy and I was a hardcore Linux-head.

By the time I joined my first real company at age 21, I was ready to start putting a lot of this stuff into place. I joined a small med device software company which had a great product but really no strong software engineering culture: zero unit tests, using CVS with no branches, release builds were done manually on the COO's workstation, etc.

As literally the most junior person in the company I worked through all these things and convinced my much more senior colleagues that we should start using release branches instead of "hey everybody, please don't check in any new code until we get this release out the door". I wrote automated build scripts mostly for my own benefit, until the COO realized that he didn't have to worry about keeping a dev environment on his machine, now that he didn't code any more. I wrote a junit-inspired unit testing framework for the language we were using (https://en.wikipedia.org/wiki/IDL_(programming_language) - like Matlab but weirder).

Without my work as a "10x junior engineer", the company would have been unable to scale to more than 3 or 4 developers. I got involved in hiring and made sure we were hiring people who were on board with writing tests. We finally turned into a "real" software company 2 or 3 years after I joined.


This sounds similar to the best programmer I personally know, and he was an intern working on LLVM at the time. It's funny how companies treat that part of his life as "no experience". Then suddenly he goes into the HFT space and within a couple of years he has a similar rank to people who are twice his age.

10x engineers exist. To be fair, it does depend which software engineer you see as "the standard software engineer", but if I take myself as a standard (as an employed software engineer with 5 years of experience), then 10x software engineers exist.


Yeah similar story here. Basically if you start programming 10y before your peers of the same age then it’s logical you’ll be more productive and knowledgeable. Maybe a lot of “10x engineers” are just guys who gave themselves a huge head start for whatever reason


Working for the aunt/uncle or friend of the family etc. described earlier in the thread, or having an older brother helping… 10x programmers have a bunch of people giving them a head-start, not just themselves. Sure, the person needs drive and an early start, but it’s moot without the support to actually convert that early start into a career. Maybe not moot, but for sure a whole lot harder.

I was coding before age 10 and delivering homework in website form any chance I got. Coding was just another video game to me. And yet, I didn’t even end up being a programmer, let alone a 10x one. I was given bad advice by high school career advisors, lacked the family connections that knew better to give me odd tech jobs, and had disinterested teachers at best and ones that insulted students who asked clarification questions at worst. I took an engineering class in high school where I loved the content, but I felt like I didn’t belong and dropped it, convinced tech was not for me after all (and then proceeded to make games in Unreal in my free time). For every person supporting my interest in CS there were 10 telling/nudging me to give up, so I was eventually worn out and gave it up. I still code 25 years later, but I am far behind now instead of ahead.

Anyway, this is a longwinded way to say it takes privilege to be a 10x programmer, not everyone who gives themselves a head start will succeed.


I started learning how to write a Makefile when I was around 15, and I learned all about using tabs. My colleagues didn't touch Make until they were 25 and were extremely confused.


I'm not even sure that coding _much_ faster is required to give a 3-5x multiple on "average", let alone "worst case", developers. Some of the biggest productivity wins can be had by being able to look at requirements, knowing what's right or wrong about them, and getting everyone on the same page so the thing only needs to be made once. Being good at test and debug so problems are identified and fixed _early_ are also big wins. Lots of that is just having the experience to recognize what sort of problem you're dealing with very quickly.

Being a programming prodigy is nice, but I don't think you even really need that.


All of the things you list are the product of the experience that OP is talking about. Anyone can get there with 20 years of (sufficiently rigorous) experience by ~40, but people who start as a child have a head start and it does show.

It's probably especially obvious in the child-prodigy types because we as an industry have a tendency to force people out of IC roles by 40, so the child prodigies are the only ones who have enough time to develop 20 years of experience working directly with code.


The other factor I noticed, from the days I was programming for fun vs. these days where I'm programming for pay:

In those early years, the tasks that you take on are probably above your skill set and you fight through it; you're not accountable to anyone.

In a job you're usually hired for what you already know, +/- some margin for more gradual learning; there's not really that much room for moonshots. The work needs to be divided into bite-sized parts that you can justify to the higher-ups when needed. You have less room for exploring really non-linear paths toward the solution, which can be harder to explain but where you learn more.

So in the end this sometimes ends up amounting to 10 years of experience outside work being more impactful than 10 years at work.


It doesn't have to be that way. Sure, if you can get a reputation for being sharp, reliable, quick to learn, and happy to work on things you haven't seen before, it's not so hard to move into completely new tech areas.

People will hire you more for your problem solving ability, reliability, and ability to work with people than your experience with any particular tech. If you've demonstrated you can pick up a new tech area easily, it's not too hard to do it again.

That sort of thing will let you build deep experience in a lot of areas and keep you current with the latest technologies. Keeps things interesting, too. :)


The challenge becomes editing down the resume. Mine is something like 5 or 6 pages even after some paring down. That's still probably way too long.


It's not just that people get pushed out of IC work, but everyone tends to have less energy as they get older + other life demands accumulate.

The combination of 10-15 years of experience and the energy/time of 20s is very powerful.


Underrated comment


Last year, we had 2 new hires... one fresh out of college (and not one of the top ones), the other with 15 years of experience on their resume in our industry.

I am not sure there is a 10x difference, but there is at least a 5x difference in performance, in favor of the fresh college grad, and they are now working on the more complex tasks too.

The sad part is our hiring is still heavily in the "senior engineer with lots of experience" phase, and the internship program has been canceled.


I also had a lot of luck with interns.

At my previous company I had 2 first-job juniors and 4 or 5 interns that were outstanding.

(however the last one was totally terrible and totally killed my hiring credibility hah, but it was an 80% success rate still)

I find that there are too many pretenders, though. I get way too many people with 15 years of experience and ChatGPT resumes that just can't code at all :/


I have hired five interns and five experienced developers. Some of the interns exceeded expectations, while some of the experienced developers also performed well.

The top-performing experienced developers outshone all the graduates. However, the less effective experienced developers were on par with the graduates, showing no significant difference in performance.

The takeaway for me is that simple anecdotes are not very informative. Over time and with a larger sample size, experienced individuals tend to perform better. Nonetheless, some graduates will also become exceptionally skilled.

Graduates are more cost-effective. Experienced professionals require less oversight. If they need substantial guidance, they don't truly qualify as senior, no matter what their resume says (e.g., that at 25 they have been a CTO for 10 years).


I’m not complaining about the quality of seniors I hire or comparing them to interns.


The new core question for interviewers to answer is "why should I hire you over ChatGPT?" If you are just going to take the requirements and copy paste them into ChatGPT, I could do that myself just as fast and you aren't adding any value.


I am not convinced that just starting early is all there is to it. I started Math, Sports, and Piano at like 6 years old but there are still plenty of "10x <insert activity here>" people that figuratively and literally run circles around me. Talent is a real thing.


The intensity with which you did it matters, though. You probably didn't spend that many years on a specific sport, for instance.

And when we're talking about sports, genetics matter as well (depending on each one)

When we're talking brains, while genetics also matter, assuming a normal (whatever that is) brain, plasticity changes a lot about how it operates.

So, the 10 years thing is definitely a big if not the biggest part. In my opinion. Would love to see studies if any exist out there on this


I did spend years on a specific sport starting as a kid. I was average. There were people that first played the sport as teenagers and within a year were competitive nationally.

I was in the same math classes as some of my peers for a decade+. Some people were great, some were bad and most were somewhere in between. The kids who were exceptional at 9 were exceptional at 17.

Obviously time matters but genetics play a huge role as well. I have a family friend with 2 adopted kids and 2 biological kids. The adopted ones are average but the biological ones are very smart. Just like their parents.


It's possible there are plenty of individuals 10x better than you while you are 10x better than most, due to early exposure. I wouldn't say this of sports and math necessarily, but I definitely would say it of your example of piano, language acquisition, and I would not be surprised if programming patterned with them, at least partially.


That may be true of individual activities, but you trained in multiple. A fairer comparison would require the same people who best you in athletics to at least be comparable at math etc.


Some people organize their time and focus their efforts more efficiently than others. They also use tools that others might not even know or care about.

You probably surf the internet 10x faster than your parents. Yes you've probably had more exposure than them, but you could probably teach them how to do it just as fast. But would they want to learn and would they actually adapt what you taught them?


With motivation and repetition (and those depend on how plastic your brain is, and thus on age), yes!


Nick with Antithesis here with a funny story on this.

I became friends with Dave our CTO when I was 5 or 6, we were neighbors. He'd already started coding little games in Basic (this was 1985). Later in our friendship, like when I was maybe 10, I asked him if he could help me learn to code, which he did. After a week or two I had made some progress but compared what I could do to what he was doing and figured "I guess I just started too late, what's the point?".

I found out later that most people didn't start coding till late HS or college! It worked out though - I'm programmer adjacent and have taken care of the business side of our projects through the years :)


> That's how 10x engineers exist, by the time they are 30, they have roughly 20 years of programming experience behind their belt instead of 10.

This is a relatively small part of it.

The majority of developers who have been programming for 20 years maybe learned a few tricks along the way and then got stuck in a local maximum.

There are a few who learn deep computer science principles and understand how to apply them to novel problems. I'm thinking of techniques like in this book:

https://www.everand.com/book/282526076/Paradigms-of-Artifici...

(mainly because Peter Norvig is my go-to as the paradigmatic 10x developer)

For example, in the Efficiency Issues chapter about how to optimize programs, Norvig lists writing a compiler from one language into a more efficient one. Most developers who have been working 20 years either won't think of that or won't understand how to implement it. But these are the kinds of things that can result in really outsize productivity gains.


Yes: Programmers who start at twelve are often the 10x programmers who can really program faster than the average developer by a lot.

No: It's not because they have 10 more years of experience. Read "The Mythical Man Month." That's the book that popularized the concept that some developers were 5-25x faster than others. One of the takeaways was that the speed of a developer was not correlated with experience. At all.

That said, the kind of person who can learn programming at 12 might just be the kind of person who is really good at programming.

I started learning programming concepts at 11-12. I'm not the best programmer I know, but when I started out in the industry at 22 I was working with developers with 10+ years of (real) experience on me...and I was able to come in and improve on their code to an extreme degree. I was completing my projects faster than other senior developers. With less than two years of experience in the industry I was promoted to "senior" developer and put on a project as lead (and sole) developer and my project was the only one to be completed on time, and with no defects. (This is video game industry, so it wasn't exactly a super-simple project; at the time this meant games written 100% in assembly language with all kinds of memory and performance constraints, and a single bug meant Nintendo would reject the image and make you fix the problem. We got our cartridge approved the first time through.)

Some programmers are just faster and more intuitive with programming than others. This shouldn't be a surprise. Some writers are better and faster than others. Some artists are better and faster than others. Some architects are better and faster than others. Some product designers are better and faster than others. It's not all about the number of hours of practice in any of these cases; yes, the best in a field often practices an insane amount. But the very top in each field, despite having similar numbers of hours of practice and experience, can vary in skill by an insane amount. Even some of the best in each field are vastly different in speed: You can have an artist who takes years to paint a single painting, and another who does several per week, but of similar ultimate quality. Humans have different aptitudes. This shouldn't even be controversial.

I do wonder if the "learned programming at 12" has anything to do with it: Most people will only ever be able to speak a language as fluently as a native speaker if they learn it before they're about 13-14 years old. After that the brain (again, for most people; this isn't universal) apparently becomes less flexible. In MRI studies they can actually detect differences between the parts of the brain used to learn a foreign language as an adult vs. as a tween or early teen. So there's a chance that early exposure to the right concepts actually reshapes the brain. But that's just conjecture mixed with my intuition of the situation: When I observe "normal" developers program, it really feels like I'm a native speaker and they're trying to convert between an alien way of thinking about a problem into a foreign language they're not that familiar with.

AND...there may not be a need to explicitly PROGRAM before you're 15 to be good at it as an adult. There are video games that exercise similar brain regions that could substitute for actual programming experience. AND I may be 100% wrong. Would be good for someone to fund some studies.


That childhood native-fluency analogy is insightful! Your experience matches mine.

I started programming at age 7 and it's true that the way code forms in my head feels similar to the way words form when I'm writing or speaking in English. In the same way that I don't stop and consciously figure out whether to use the past or present tense while I'm talking, I usually don't consciously think about, say, what kind of looping construct I'm about to use; it's just the natural-feeling way to express the idea I'm trying to convey. The idea itself is kind of already in the form of mental code in the same way that my thoughts are kind of already in English if I'm speaking.

But... maybe that's how it is for everyone, even people who learned later? I only know how it is in my own head.


I totally get the same sense that I'm just "communicating" using code. I just write out the code that expresses the concepts I have in my head.

And at least some people clearly don't. I was talking to one guy who said that even for a simple for-each loop it was way faster for him to "Google the code he needs and modify it" than to write it. This boggled me. I couldn't imagine being able to Google and parse results and find the one I wanted and copy and paste it and modify it being faster than just writing the code.

Even famous developers brag about their inability to code. DHH (RoR developer) has a tweet where he brags that he couldn't code a bubble sort without Googling it. A nested loop with a single compare and swap...and he's "proud" of the fact that he needs to Google it?

I have no words.
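
For what it's worth, the whole algorithm really does fit in a handful of lines. Here is a minimal bash sketch, purely as an illustration (the values and variable names are mine, not anyone's actual code):

  #!/usr/bin/env bash
  # Bubble sort: nested loops with one compare-and-swap of adjacent elements inside.
  a=(5 3 8 1 4)
  n=${#a[@]}
  for ((i = 0; i < n - 1; i++)); do
    for ((j = 0; j < n - 1 - i; j++)); do
      if ((a[j] > a[j+1])); then              # compare adjacent pair
        t=${a[j]}; a[j]=${a[j+1]}; a[j+1]=$t  # swap them
      fi
    done
  done
  echo "${a[@]}"   # prints: 1 3 4 5 8

Each pass bubbles the largest remaining element to the end; the inner body is exactly the compare and swap described above.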


The association with video games in your last paragraph makes a lot of sense to me. This is how I feel solving problems.

I always thought that people who start at 12 and keep at it are good because they really love it. I see people who struggle a lot with learning, and it's because they hate it but are doing it for other reasons.


People are also prone to love doing things that they're good at, so it's hard to know which came first. :)


That's true!!!


> Most people will only ever be able to speak a language as fluently as a native speaker if they learn it before they're about 13-14 years old.

Very few people both have a ton of exposure to a language and actually study the grammar and stuff as adults. If you don't learn the grammar you will still speak like a dog after living in a country for 20 years. A lot of people in an average company don't write hard things at their job, didn't read any textbooks etc. and spend loads of time in meetings etc.


> Very few people both have a ton of exposure to a language and actually study the grammar and stuff as adults.

Very few people actually learn to speak a language as a native speaker by "studying the grammar."

I remember people trying to learn what was and what wasn't a run-on sentence in junior high school, and being shocked that they had a hard time telling the difference.

And studying language explicitly doesn't change the brain regions used to the same that are used by a native speaker.

And that's my point. I didn't really "study" programming explicitly as much as understanding it intuitively. When exposed to a new concept, I just immediately internalize it; I don't need to use it a bunch of times and intentionally practice it. I just need to see it and it's obvious and becomes part of my tool-set.


The real problem is the measurement: speed of coding (or doing any other job) or volume of work done. Those two are actually really bad productivity measures.


I'm tired of hearing about 10x engineers. I just want to be a good 1x engineer. Or good at anything in life really.


The truest 10x engineer I ever encountered was a memory firmware guy with ASIC experience who absolutely made sure to log off at 5 every day after really putting in the work. Go-to guy for all parts of the codebase, even that which he didn't expressly touch.


> I'm tired of hearing about 10x engineers.

"The truest 10x engineer I ever encountered was..."


Once you have a few years of experience, you don't need to be 10x to have success. You can be a reliable 1.3x, a little bit better than your teammates.

In the end it doesn't matter, whole team could be laid off at once.


Spend less time on HN and you might get more done.


Do you want to read hacker news or be hacker news?


Or stay.

It's not about the hours.


I feel you. I had a semi-traumatic experience at a previous job, combined with a number of other factors which have left me feeling like a shell of my former self. Now I'm not sure if I'm working at .5x or if I just feel like .5x. Such is life, at times.


The “10x engineer” comes from the observation that there is a 10x difference in productivity between the best and the worst engineers. By saying that you want to be a 1x engineer, you’re saying you want to be the least productive engineer possible. 1x is not the average, 1x is the worst.


I'm not sure your math works.

What we do know is that the worst engineers provide negative productivity. If 1x is the worst engineer, then let's for the sake of discussion denote x as -1 in order for the product to be negative. Except that means the 10x engineer provides -10 productivity, actually making them the worst engineer. Therein lies a conflict.

What we also know is that best engineer has positive productivity, so that means the multiplicand must always be positive. Which means that it is the multiplier that must go negative, meaning that a -1x and maybe even a -10x engineer exists.


Thank you. This sounds so trivial at first, but your reductio ad absurdum at the beginning of your comment really nails it.

Throw into the mix the fact that productivity is hard to measure as soon as more than one person works on something, and that doesn't even begin to consider the economic aspects of software.

And even when ignoring this point, there's that pesky short-term vs long-term thing.

Also, how do you define the term "productivity"? I was assuming that you mean something along the lines of (indirect, if employed) monetary output.


You are arguing against the idea that there is a factor of ten difference in productivity between the best and the worst engineers. That’s fine if you want to do that, but that’s explicitly where the term “10x engineer” comes from and what defines its meaning. So if you disagree with the underlying concept, there is no way for you to use terms like “[n]x engineer” coherently since you disagree with its most fundamental premise. You certainly shouldn’t reinvent different meanings for these terms.


You're not wrong, but I think you may be treating something as literal math, when it is in fact idiomatic labels used to express trends.


The problem here is the introduction of productivity.

The 10x developer originated from a study that measured performance. The 10x developer being able to do a task in a 10th of the time is quite conceivable and reflects what the study found. I'm sure we've all seen a developer take 10 hours to do a job that would take another developer just 1 hour. Nobody is doing it in negative hours, so the math works.

But performance is not the same as productivity.


Measuring productivity like that in technology makes no sense because our work is not fungible; what and how we do it matters as much as how fast we do it. Time-based productivity measurement is for factory workers stamping out widgets. So in our revenue-based world, negative productivity makes sense.


> productivity

Performance. That is what the study that found a 10x performance difference observed. There is no mention of productivity in the study. If anyone has tried to study productivity, they most certainly have not come up with a 10x moniker. It seems productivity was mentioned in this thread only because it also happens to start with the letter 'p' and someone got confused.


Productivity in this case is performance on contrived tasks that don't represent day-to-day work well and for some of which negative score isn't possible.


Engineers with negative productivity are vanishingly rare, soon to be terminated, and reasonable to exclude for the purpose of the comparison.


Okay, but it still doesn't work. The world's worst engineer who somehow managed to successfully contribute one line of code to something like GPT is way more productive than a great engineer who designed from top to bottom the best laid software ever conceived but was thrown away before seeing the light of day because the business changed direction.

Of course, that doesn't actually matter as the original study found a 10x difference in measuring performance, not productivity. There is nothing out there to suggest that some developers are 10x more productive outside of those who mixed up their p words. We're not actually talking about productivity; that was just a mistake. If one were to study productivity, I expect they would find that some engineers are many orders of magnitude more productive than the least productive engineers.


> The world's worst engineer who somehow managed to successfully contribute one line of code to something like GPT is way more productive than a great engineer who designed from top to bottom the best laid software ever conceived but was thrown away before seeing the light of day because the business changed direction.

I reject that assertion.

> performance, not productivity

You keep saying this like it's a slam dunk refutation, but performance and productivity are highly related.


> I reject that assertion.

Because you don't believe the worst developer contributed anything to GPT? Sure, in reality that's no doubt true, but it was only ever meant to be illustrative.

> but performance and productivity are highly related.

Not in any meaningful way. The study found that the fastest developer can perform a set of defined tasks in a 10th of the time of the slowest developer. That is what the 10x developer refers to. But being fast doesn't mean being productive.

Come to my backyard and we will each dig a hole of equal size. Let's assume that you can dig the hole in a 10th of the time I can – that you are the 10x hole digger. But, no matter how fast you are, neither of us will be productive.


the worst engineer certainly has negative productivity, so I'm not sure that your explanation can possibly be the correct one.


I’m explaining what the terms “10x” and “1x” mean, not asserting that the original observation is correct under all circumstances.


Except you haven't explained it at all. Sackman, Erickson, and Grant found that some developers were able to complete what was effectively a programming contest in a 10th of the time of the slowest participants. This is the origin of the 10x developer idea.

You, on the other hand, are claiming that 10x engineers are 10 times more productive than the worst engineers. Completing a programming challenge in a 10th of the time is not the same as being 10 times more productive, and obviously your usage can't be an explanation, even as one you made up on the spot, as the math doesn't add up.


That was designed as a repeatable experiment, which seems entirely reasonable when you want to conduct a study. Why are you characterising that as “a programming contest”? That seems like an uncharitably distorted way of describing a study.

That study also does not exist in isolation:

https://www.construx.com/blog/the-origins-of-10x-how-valid-i...


> Why are you characterising that as “a programming contest”?

Because it was? Do you have a better way to repeatedly test performance? And yes, the study's intent was to look at performance, not productivity. It's even right in the title. Not sure where you dreamed up the latter.


I believe the original was for an entire organization's performance, and was also done in 1977. Since they are averages, it makes "sense" to conclude that the best of a good team is 10x better than the average of the worst team. Not really what the experiment concludes, but what can you do.


The first was 1968, but there have been more studies since.

https://www.construx.com/blog/the-origins-of-10x-how-valid-i...


Hmm, I never thought of it that way. I just heard 10x employees and fit it to what I knew. Which is that 90% of the work is accomplished by about 10% of workers. The other 90% really only get 10% done. So most developers are somewhere on a scale of 0.1 - 1. With 1 being a totally competent and good developer. The 10x people are just different though, it's like a pro-athlete to a regular player. It's not unique to software development, though it may stand out and be sought after more. I've noticed it in pretty much every industry. Some people are just able to achieve flow state in their work and be vastly more productive than others, be it writing code or laying sod. I don't find that there's a lot of in between 1 and 10 though.


Even if this was the origin of the term, it still doesn't make sense, because the best engineers can solve problems the worst would never be able to solve at all. The difference between the best and worst is much more than 10x the worst. Maybe the worst who meets certain minimums at a company, but then the best would also be limited by those willing to work for what the company pays, and I hypothesize that the minimums of the lower bound and the maximums of the upper bound are correlated.


It sounds like you disagree with the concept of a 10x engineer then. In which case you should avoid using the term, rather than making up a new definition.


Concepts and words change meaning and sometimes we all need to accept that the popular meaning is not the definition we use.

This is especially common when dealing with historical or academic definitions versus common modern usage. "Evolution" particularly annoys me.

You should avoid using the term, rather than using a definition at odds with common usage. Your usage is confusing - and that is why you are getting push-back.

The definition you have given is nonsensical - it can't be consistent over time or between companies because it depends on finding a minimum in a group. And a value that is strongly dependent on the worst developer is useless because it mostly measures how bad the worst developer is - it doesn't say anything about how good the best developer is.


So it’s like saying “this soap cleans 10x better than if you washed the thing with poop”? That’s not nearly as interesting to debate.


I think getting something worthwhile done is a better focus (actually quite hard!), and naturally increases your productivity as a side-effect.

Productivity has no inherent value - like efficiency and perfection, it is necessarily of something else. Its value is entirely derived.


It depends on the day if I feel like a 2x or a 0.1x engineer. Keep at it. You are not alone!


Do 10x engineers get 10x the wages? Somehow I feel being exceptionally better than other engineers is just unfair to both you and the ones worse than you. I wouldn't want to be a 10x either; I'd rather just be a normal engineer.


Meta compensates 10x types very well. 3x bonus multipliers, additional equity that can range from 100k-1m+, and level increases are a huge bump to comp (https://www.levels.fyi/)


Meta compensates all SWEs very well. To suppose arguendo that 10x types exist, I don't think they're really compensated linearly 10x more than everyone else. But yeah, certainly, if you are great at your job and want to make a bunch of money, Meta is a great employer for that.

3x bonus multipliers (Redefines Expectations) are extremely uncommon. Level increases certainly help but like, L7 only makes ~3x what L5 does -- not 10x. And there are few L7s and very few L8+.


I have many meta colleagues I've worked with in the past. All of them are well compensated but none of them were outstanding, or 10x.


You took the words right out of my mouth


Your definition is also vague. Someone still needs to do the legwork. One man armies who can do everything themselves don't really fit in standardized teams where everything is compartmentalized and work divided and spread out.

They work best on their own projects with nobody else in their way, no colleagues, no managers, but that's not most jobs. Once you're part of a team, you can't do too much work yourself no matter how good you are, as inevitably the other slower/weaker team members will slow you down as you'll fight dealing with the issues they introduce into the project or the issues from management, so every team moves at the speed of the lowest common denominator no matter their rockstars.


That rings true and is probably why the 10x engineers I have seen usually work on devops or modify the framework the other devs are using in some way. For example, an engineer who speeds up a build or test suite by an order of magnitude is easily a 10x engineer in most organizations, in terms of man hours saved.


> For example, an engineer who speeds up a build or test suite by an order of magnitude is easily a 10x engineer in most organizations, in terms of man hours saved.

Yeah but this isn't something scalable that can happen regularly as part of your job description. Like most jobs/companies don't have so many low hanging fruits to pick that someone can speed up builds by orders of magnitude on a weekly basis. It's usually a one-time thing. And one-time things don't usually make you a 10x dev. Maybe you just got lucky once to see something others missed.

And often times at big places most people know where the low hanging fruits are and can fix them, but management, release schedules and tech debt are perpetually in the way.

IMHO what makes you a 10x dev is that you always know how to unblock people no matter the issue so that the project is constantly smooth sailing, not chasing order-of-magnitude-improvement unicorns.


Does anyone else feel like people follow these sort of industry pop-culture terms a bit too intensely? What I mean is that the existence of the term tends to bring out people trying to figure who that might be, as if it has to be 100% true.

I personally think that some people can provide “10x” (arbitrary) the value on occasion, like the low hanging fruit you said. I also believe some people are slightly more skilled than others, and get more results out of their work. That said, there are so many ways for somebody to have an impact that doesn’t have to be immediate, that I find the term itself too prevalent.


"Does anyone else feel like people follow these sort of industry pop-culture terms a bit too intensely? "

Agreed, there is too much effort going into the "superstars" theme, but there are definitely people who get 10x done in the same time as others.


Yep. No matter what you're doing, some people are more productive than others. Often it's a matter of experience and practice, sometimes ability to focus, sometimes motivation, rarely it's a lack or surplus of inherent ability. Using people effectively in the context of a team all depends on the skill of the manager though.


I think a lot of people who complain about the 10x chatter on HN should take some kind of carpentry course, or some other kind of handiwork like that, with a real master.

Some of those people not only get things done much quicker, but they also get it done with better quality than an amateur, with fewer mistakes, throwing away less material, sometimes with more safety.

This is definitely more than 10x better. And there are some real hacks doing those kinds of jobs. I find programming to be not different than that.


Well sure, if you compare a master to a novice, there is almost always a great difference. But between masters of carpentry, there is usually not so much difference. But here with the 10x trope it is supposed to be different and I would say indeed, but it is not as common as many would like to think.


Perhaps there aren’t that many non-master carpenters (I don’t think that’s true, there’s plenty of professional incompetents), but I am 100% sure that not all professional developers are “masters”.


> Like most jobs/companies don't have so many low hanging fruits to pick that someone can speed up builds by orders of magnitude on a weekly basis

You and I have worked at very different organizations. Everywhere I've been has had insane levels of inefficiency in literally every process.


>insane levels of inefficiency in literally every process.

In processes yes, not in code, and solo 10x devs alone can't fix broken processes, as those are the effect of broken management and engineering culture.

People know where the inefficiencies are, but management doesn't care.


same here - it is especially bad in huge companies, the inefficiencies and waste are legendary.


It really does depend on where you work. The order of magnitude improvements I'm describing involved interdisciplinary expertise involving both bespoke distributed build systems and assembly language. They're not unicorns, they do exist, but they are very rare and most engineers just aren't going to be able to find them, even with infinite time. Hence why a 10x engineer is so valuable and not everyone can be one. I myself am certainly not one, in most contexts.


> Like most jobs/companies don't have so many low hanging fruits to pick that someone can speed up builds by orders of magnitude on a weekly basis.

But then you just move on to the next highest leverage task.


Nothing wrong with "one man armies" in the team context. There is a long list of tasks that needs to be done... over the same time period, one person will do 5 complex tasks (with tests and documentation), while the other will do just 1 task, and then spend even more time redoing it properly.

Over time this produces funny effects, like a super-big 20-point task done in a few days because the wrong person started working on it.


On my team, one of the main multipliers is understanding the need behind the requested implementation, and proposing alternative solutions - minimizing or avoiding code changes altogether. It helps that we work on internal tooling and are very close to the process and stakeholders.

"Hmmm, there's another way to accomplish this" being the 10x. Doing things faster is not it.


Exactly this. It’s why it’s so frustrating when product managers who think they’re above giving background run the show (the ones who think they’re your manager and are therefore too important to share that with you)


I've always thought a x10 is one who sits back and sees a simpler way - like some math problems have an easy solution, if you can see it. Also: change the question; change the context (Alan Kay)

(And absolutely not brute-force grinding themselves away)


Agreed. You can brute force, but not for long.



Literally the opposite of what makes a car go fast :-)


Is it?

How fast would you drive a car if I gave you the keys and told you „everything works perfectly fine, however the brakes have been removed“?


This is the kind of half-correct pithy quote which falls down at the edge cases, e.g. if you're writing an algorithm that selects the lowest-speed car for new learner drivers and for some reason a blissfully unaware learner ends up in a McLaren GT with no brakes for lesson number 1.


Was it 50x productivity due to 10x engineers, or 50x productivity due to optimized company structure? (edit: obviously, these do not need to be mutually exclusive - it's a sum of all the different parts)

It's easy to bog down even the best Nx engineers if you keep them occupied with endless bullshit tasks, meetings, (ever) changing timelines, and all that.

Kind of like having a professional driver drive a sportscar through a racetrack, versus the streets of Boston.


This is a perceptive observation. In my experience, so-called "10x" engineers are as productive as they are because they have a process for developing software that anticipates future problems. As a result, when they check something in, they spend very little time "debugging" or "fixing bugs"; the code already does what they need it to do.

It is always very useful as an engineer to log your time: what are you working on "right now", and is it "new work", "maintenance work", or "fixing work"? Then, for each log entry that isn't "new work", think about what you could have done that would have caught that problem before it was committed to the code base.

I find it is much better to evaluate engineers based on how often they are solving the same problem that they had before vs creating new stuff. That ratio, for me, is the essence of the Nx engineer (for 0.1 < N < 10)

The point that Wilson makes is that having infrastructure/tools that push that ratio away from "repair" work and toward "new work" is hugely empowering to an organization.


> People who implement something very few people even considered or understood to be possible, which then gives amazing leverage to deliver working software in a fraction of the time.

I agree with the first part of your statement, but what really happens to such people?

In my experience (sample size greater than one), they receive some kudos, but remain underpaid, never promoted, and are given more work under tight deadlines. At least until some of them are laid off along with lower performers.

As for those who say that hard things are impossible, they seem to get along just fine. They merely declare such things out of scope or lie about their roadmap.


> In my experience (sample size greater than one), they receive some kudos, but remain underpaid, never promoted, and are given more work under tight deadlines. At least until some of them are laid off along with lower performers.

100% agree, I've seen plenty of the best of the best get treated like trash and laid off at first sight of trouble on the horizon


Anyone can be a 10x engineer when they write something similar/identical to what they've written before. Other jobs are not like this. A plumber may only be 20% faster on the best days of their career.


In my experience it often comes down to business processes. We have a guy in my extended team who knows everything about his side of the company. When I work with him I accomplish business-altering deliveries in a very short amount of time, and after a week or two the work rarely needs to be touched again unless something in the business changes. He’s not a PO and we don’t do anything too formally, because it’s just him, me and another developer + whatever business manager will benefit from the development (and a few testers from their team). In many ways the way we work on these projects is very akin to Team Topologies.

At other times I’ll be assigned projects with regular POs, Architects and business employees who barely know what it is they are doing themselves, with poorly defined tasks and all sorts of bureaucratic nonsense “agile” process methods, and we’ll spend forever delivering nothing.

So sometimes I’m a 50x developer delivering business altering changes. At other times I’m a useless cog in a sea of pseudo workers. I don’t particularly care, I get paid, but if management actually knew what was going on, and how to change it… well…


How many organisations - of any kind, startups or enterprise or unicorns or whatever - will invest that much effort in something that doesn't even touch the product? Before the product even exists!

I think the reluctance to invest effort in something that will give devs super-powers 6 months in the future is why we don't get all those 10x devs.


The 10x developer I know basically DOES seem to do 10x more than anyone else on our team. But they are working on a team that does relatively simple work and are by far the most senior person on that team.

It's like how an NBA player would be a 10x player on a college basketball team. Great to work with them, but if I was in their shoes I don't know how enjoyable/engaging the work would be.


Yep. Often I find our most accelerative work is stuff that makes testing changes easy (a very simple to bootstrap staging environment) or creates a lot of guarantees (typescript).


No one reading this during the hours of 9-5 is a 10x.


Or is. If a 1x puts in an 8 hour day, a 10x only has to put in a 48 minute day. That leaves plenty of time to read this.


His point is that smart and productive people are generally hard working, focused and diligent, which is how they get to be so experienced and productive.

Hence not wasting time on social networks.

> a 10x only has to put in a 48 minute day

Nobody would call this person "10x".


> His point is that smart and productive people are generally hard working, focused and diligent

I don't think that tracks. Smart, productive, hard working people don't work 9-5. They work every hour they can, breaking only when they have pushed themselves to the limit. The limit can be hit at any hour. There is no magical property of the universe that gives people unlimited stamina during the hours of 9-5.

> Nobody would call this person "10x".

I'm not sure they would call anyone that, to be fair. A "10x developer" who also puts in 8 hours alongside the 1x developers isn't a 10x developer, he would be called a sucker.


Hacker News is hardly a waste of time though. A 10x is probably curious about topics mentioned on Hacker News.


That’s a bad take because you’re assuming that developer is capable of replicating that * 10


That's exactly the fundamental flaw of the Nx developer ethos. No individual will benchmark reliably against any other person of the same trade/craft over time. The mythical Nx developer is oversimplified to the point of being a meaningless concept. Hiring a "unicorn" does not guarantee amazing results. They probably just have a better-than-average chance of making a higher impact, which is good enough for companies that are willing to pay Nx the average salary to acquire them.


I know it's meant to be funny, but the tech people who spend zero time learning about "what's out there" are usually not the most effective developers. You won't find better solutions to existing or even new problems without an interest in the industry. Maybe this particular article isn't "industry valuable", fair enough, but having zero interest in refining and enhancing your craft beyond the work in front of you is almost guaranteed to end with worse outcomes.


Hard agree.

Another flaw in his thinking: brain cycles and sub-conscious processing.

I'm in the middle of a hard problem right now. I ran out of ideas, and opened HN about half an hour ago. In that time, without "trying", I've had two new ideas - one sent me back to my notes, which revealed that my original thinking was flawed; the second sent me to documentation, which suggested a new route to pursue. I'm digesting the implications of that while I write this.

Beating my head against the problem directly for thirty minutes would have been less productive. (Though if I wasn't WFH I would have, and also been miserable, and learned less about the industry than I have from this thread. So there's that.)

I'm far from a 10x anything, but I don't have the only brain which works this way.


Not disagreeing with you, but just as a thought exercise: what if you had spent that time going for a walk outside rather than on HN?


If I'd done that I couldn't have context-switched back to the documentation so easily!

Nah. I get what you're saying, and it's a great idea. Sometimes I do go for a walk. More often I do dishes. Those are both, however, higher-commitment, more time-consuming activities than flipping to a different window. If I remember correctly, my sticking point that day felt like a small one, and so the distraction / break I felt like I needed was correspondingly small.


Not true. I've known some very ADHD developers who are constantly context shifting and are able to fuck around on Hackernews for a while and then suddenly knock out a huge amount of work. The problem is that (speaking from personal experience) everybody with ADHD thinks they can do this and 99% cannot.


10x developer is just a buzzword people throw around when they're trying to sell you something.


6.5 x 15 is only 97.5 hours per week, not even close to the 400 hrs (10 x 40) per week of programming a 10x Rust programmer can provide. I jest, but all this 10x stuff is getting ridiculous. They stayed in "stealth" mode because they didn't have anything worth showing for 5 years. Doesn't sound all that productive to me. More likely, what they were trying to do was hard and complicated and took a while to figure out.


They're not boasting about their current productivity; they're boasting about the productivity they achieved at FoundationDB when they implemented this testing, which gave them the idea to build Antithesis.


This might be the best introduction post I've read.

Lays the foundation (get it?) for who the people are and what they've built.

Then explains how the current thing they are building is a result of the previous thing. It feels that they actually want this problem solved for everyone because they have experienced how good the solution feels.

Then tells us about the teams (pretty big names with complex systems) that have already used it.

All of these wrapped in good writing that appeals to developers/founders. Landing page is great too!


It seems like marketing copy. Not a technical blog post.

It would be nice to see some actual use cases and examples.

Instead, the writer just name-dropped a few big companies and claimed to have a revolutionary product that works magically, then included the typical buzzwords like '10x programmer' and 'stealth mode'. The latter doesn't make sense because they also name-drop clients.


I'm assuming you aren't aware of FoundationDB: https://www.foundationdb.org/files/fdb-paper.pdf

Having that context puts the post in a much better perspective. It's definitely an introduction post (the company has been developing this in stealth mode for the past few years), but it is most certainly _not_ a marketing post. These people developed extremely novel testing techniques for FoundationDB and are now generalizing them to work with any containerized application.

It's a big deal.


I have heard of it but know very little about it.

I'm reading the paper. It's very intriguing but so is the marketing material for the new Tesla. I need to work with things like this before I believe the claims.

Too many people here on HN are getting caught up in a barely-tested new technology and hailing it as some revolution.


It absolutely doesn’t read like typical marketing copy, and yes it’s not a dense technical blog post either. I’m sure the use cases and examples will come, but putting them in this post would have been overkill.

Also, stealth mode just means your company isn’t public, you can still have clients.


Except it doesn't actually explain what it does: Is it fuzzing? Do you supply your own test cases? Is it testing hardware non-determinism?


Post author here. Sorry it was vague, but there's only so much detail you can go into in a blog post aimed at general audiences. Our documentation (https://antithesis.com/docs/) has a lot more info.

Here's my attempt at a more complete answer: think of the story of the blind men and the elephant. There's a thing, called fuzzing, invented by security researchers. There's a thing, called property-based testing, invented by functional programmers. There's a thing, called network simulation, invented by distributed systems people. There's a thing, called rare-event simulation, invented by physicists (!). But if you squint, all of these things are really the same kind of thing, which we call "autonomous testing". It's where you express high-level properties of your system, and have the computer do the grunt work to see if they're true. Antithesis is our attempt to take the best ideas from each of these fields, and turn them into something really usable for the vast majority of software.

We believe the two fundamental problems preventing widespread adoption of autonomous testing are: (1) most software is non-deterministic, but non-determinism breaks the core feedback loop that guides things like coverage-guided fuzzing. (2) the state space you're searching is inconceivably vast, and the search problem in full generality is insolubly hard. Antithesis tries to address both of these problems.

So... is it fuzzing? Sort of, except you can apply it to whole interacting networked systems, not just standalone parsers and libraries. Is it property-based testing? Sort of, except you can express properties that require a "global" view of the entire state space traversed by the system, which could never be locally asserted in code. Is it fault injection or chaos testing? Sort of, except that it can use the techniques of coverage guided fuzzing to get deep into the nooks and crannies of your software, and determinism to ensure that every bug is replayable, no matter how weird it is.
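
To make that "global view" point concrete, here's a rough sketch (illustrative only, not our SDK or property language): a property evaluated over the entire recorded history of a run, which no single assert placed in the code could check locally.

    // Illustrative only: a "global" property evaluated over a whole run's
    // event history. Each individual write or read looks fine locally; the
    // property only makes sense across the full trace.
    use std::collections::HashSet;

    #[derive(Clone, Copy)]
    enum Event {
        Acked(u64),      // a write to this key was acknowledged
        Read(u64, bool), // a read of this key, and whether it was found
    }

    fn no_acked_write_ever_lost(history: &[Event]) -> bool {
        let mut acked = HashSet::new();
        history.iter().all(|e| match *e {
            Event::Acked(k) => { acked.insert(k); true }
            Event::Read(k, found) => found || !acked.contains(&k),
        })
    }

    fn main() {
        let history = [Event::Acked(1), Event::Read(1, true), Event::Read(2, false)];
        assert!(no_acked_write_ever_lost(&history));
    }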

It's hard to explain, because it's hard to wrap your arms around the whole thing. But our other big goal is to make all of this easy to understand and easy to use. In some ways, that's proved to be even harder than the very hard technological problems we've faced. But we're excited and up for it, and we think the payoff could be big for our whole industry.

Your feedback about what's explained well and what's explained poorly is an important signal for us in this third very hard task. Please keep giving it to us!


I remember watching the Strange Loop video on your testing strategy, and now I need to go back and relearn how it differed from model checking (ie Promela or TLA+). Model checking is probably the big QA story that tech companies ignore because it requires dramatically more education, especially from QA departments typically seen as "inferior" to SWE.



This is interesting - it is kind of picking a fight with SaaS/cloud providers though, as that is the one kind of software you won't be able to import into your environment: not because it can't do the job, but because you don't have the code. So this would create an incentive to go back to PaaS.

It's definitely true though that a big problem with backend is that you can't easily treat it as a whole system for test purposes.


> it is kind of picking a fight with SaaS/cloud providers

or starting a bidding war


how so?


By selling cloud providers the chance to be the only cloud provider supported by Antithesis.


Ambitious. I doubt if antithesis's moat is big enough for that.


> turn them into something really usable for the vast majority of software

Would it work for debugging, say, Notepad on Windows?


Is there more info on how Antithesis solves problem number 2 (large state spaces)? I understand the fuzzing / workload generation part well, but there's so many different state space reduction techniques that I don't know what Antithesis is doing under the hood to combat that.


> most software is non-deterministic

Doesn't Antithesis rely on the fact that software is always deterministic? Reproducibility appears to be its top selling feature – something that wouldn't be possible if software were non-deterministic.


We can force any* software to be deterministic.

* Offer only good for x86-64 software that runs on Linux whose dependencies you can install locally or mock. The first two restrictions we will probably relax someday.


That point about dependencies -- how well does this play with, or how easily does it integrate with, a build system like Bazel or Buck?


Aren't you just 'forcing' determinism in the inputs, relying on the software to be always deterministic for the same inputs?


Nope. We’re emulating a deterministic computer, so your software can’t act nondeterministically if it tries.


Right, by emulating a deterministic computer you can ensure that the inputs to the software are always deterministic – something traditional computing environments are unable to offer for various reasons.

However, if we pretend that software was somehow able to be non-deterministic, it would be able to evade your deterministic computer. But since software is always deterministic, you just have to guarantee determinism in the inputs.


[I work at Antithesis]

>But since software is always deterministic, you just have to guarantee determinism in the inputs.

This is technically correct, but that's a very load-bearing "just". A lot of things would have to count as inputs. Think about execution time, for example. CPUs don't execute at the same speed all the time because of automatic throttling. Network packets have different flight times. Threads and processes get scheduled a little differently. In distributed/concurrent systems, all this matters. If you run the same workload twice, observable events will happen at different times and in different orders because of tiny deviations in initial conditions.
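
As a tiny illustration (ordinary Rust, nothing Antithesis-specific), the following program has identical explicit inputs every run, yet which line prints first depends on the OS scheduler and CPU timing, not on anything in the source:

    use std::thread;

    fn main() {
        // Two threads race to print. The interleaving depends on scheduling
        // and timing, so observable behavior differs from run to run even
        // though the program and its explicit "inputs" never change.
        let a = thread::spawn(|| println!("A"));
        let b = thread::spawn(|| println!("B"));
        a.join().unwrap();
        b.join().unwrap();
    }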

So yes, if you consider the time it takes to run every single machine instruction as an "input", then software is deterministic given the same inputs. But in the real world that's not actionable. Even if you had all those inputs, how are you going to pass them in? For all intents and purposes most software execution is non-deterministic.

The Antithesis simulation is deterministic in this way though. It is in charge of how long everything takes in "simulated time", right down to the running times of individual CPU instructions. Everything observable from within the simulation happens the exact same way, every time. You can compare a memory dump at the same (simulated) instant across two different runs and they will be bit-for-bit identical.


> Think about execution time, for example.

Sure. A good example. Execution time – more accurately, execution speed – isn't a property of software. For example, as you point out yourself, you can alter the execution speed without altering the software. It is, indeed, an input.

> Even if you had all those inputs, how are you going to pass them in?

Well, we know how to pass them in non-deterministically. That's how software is able to do anything.

Perhaps one could create a simulated environment that is able to control all the inputs? In fact, I'm told there is a company known as Antithesis working on exactly that.


Oh, that sounds like a challenge…

Is the challenge here the same as with digital simulations of electronic circuits? That is, at the end of the day analog physics becomes confounding? Or are you doing deterministic simulation of random RF noise as well?


Do you emit deterministic sequences from things like RDRAND? I guess you'd have to.


Yes, they said they do


Has any thought been given to repurposing this deterministic computer for more than just autonomous testing/fuzzing? For example, given an ability to record/snapshot the state, resumable software (i.e. durable execution)?


Somebody once suggested to me that this could be very handy for the reproducible builds folks. I'm sure that now that we're out in the open, lots of people will suggest great applications for it.

Disclosure: Antithesis co-founder.


My favourite application for "deterministic computer" is creating a cluster in order to have a virtual machine which is resilient to hardware failure. Potentially even "this VM will keep running even if an entire AWS region goes down" (although that would add significant latency).


This vaguely reminds me of Jefferson's "Virtual Time" paper from 1985[1]. The underlying idea at the time didn't really take off because it required, like Zookeeper, a greenfield project: except that it kinda doesn't and today you could imagine instrumenting an entire Linux syscall table and letting any Linux container become a virtual time system -- but Linux didn't exist in 1985 and wouldn't be standard until much later.

So Jefferson just says: let's take your I/O-ful process, split it into a message-passing actor model, and monitor all the messages going in and coming out. The messages coming out won't necessarily do what they're supposed to do yet; they'll just be recorded with a plus sign and a virtual timestamp, and by assumption eventually you'll block on some response. So we have a bunch of recorded message timestamps coming in, and we have your recorded messages going out.

Well, there's a problem here, which is that if we have multiple actors we may discover that their timestamps have traveled out-of-order. You sent some message at t=532 but someone actually sent you a message at t=231 that you might have selected instead of whatever you actually selected to send the t=532 message. (For instance in the OS case, they might have literally sent a SIGKILL to your process and you might not have sent anything after that.) That's what the plus sign is for, indirectly: we can restart your process from either a known synchronization state or else from the very beginning, we know all of its inputs during its first run so we have "determinized" it up past t=231 to see what it does now. Now, it sends a new message at say t=373. So we use the opposite of +, the minus sign, to send to all the other processes the "undo" message for their t=532 message, this removes it from their message buffer: that will never be sent to them. And if they haven't hit that timestamp in their personal processing yet, no further action is needed, otherwise we need to roll them back too. Doing so you determinize the whole networked cluster.

The only other really modern implementation of these older ideas that I remember seeing was Haxl[2], a Haskell library which does something similar but rather than using a virtual time coordinate, it just uses a process-local cache: when you request any I/O, it first fetches from the cache if possible and then if that's not possible it goes out, fetches the data, and then caches it. As a result you can just offer someone a pre-populated cache which, with these recorded inputs, will regenerate the offending stack trace deterministically.
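
The mechanism is roughly this (my own sketch of the idea, not Haxl's actual API): every I/O request is routed through a cache of recorded responses, so a captured run can be replayed deterministically from a pre-populated cache.

    use std::collections::HashMap;

    // Sketch of a record/replay cache: requests are keyed strings, and a
    // recorded response is returned before any real I/O is attempted.
    struct ReplayIo {
        cache: HashMap<String, String>, // request -> recorded response
    }

    impl ReplayIo {
        fn fetch(&mut self, request: &str) -> String {
            if let Some(hit) = self.cache.get(request) {
                return hit.clone(); // replay: no real I/O happens
            }
            // Stand-in for real I/O; the result is recorded so a later
            // replay with this cache pre-populated is deterministic.
            let response = format!("live response for {request}");
            self.cache.insert(request.to_string(), response.clone());
            response
        }
    }

    fn main() {
        let mut io = ReplayIo { cache: HashMap::new() };
        let first = io.fetch("GET /user/42");
        let second = io.fetch("GET /user/42"); // served from the cache
        assert_eq!(first, second);
    }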

1: https://dl.acm.org/doi/10.1145/3916.3988

2: https://github.com/facebook/Haxl


> Your feedback about what's explained well and what's explained poorly is an important signal for us in this third very hard task. Please keep giving it to us!

It's hard to understand these complex concepts via language alone.

Diagrams would be a huge help to understand how this system of testing works compared to existing testing concepts


thanks, I'll dig in. I'm a very visual person and charts/diagrams/flows always help my grasp of something more than a wall of text. Maybe include some of those in there when you get the time?


Sure, it doesn't go into details. And that is exactly why I termed it an excellent introduction and a sales pitch.

I haven't heard of deterministic testing before. Nor have I heard of FoundationDB or the related things. And I went from knowing zero things about them to getting impressed and interested. This led me to go into their docs, blogs, landing page, etc. to know more.


Yeah. I could figure out the global idea, but then the mechanics of how it would actually work were very sparse.


The entire testing system they describe feels like something I can strive towards too. They make you want their solution because it offers a way of life and thinking and doing like you've never experienced before


Did you read a different article than me?

The linked article is 3/4 about some history and rationale before it actually tells you what they build.

It's like those pesky recipe blogs that tell you about the authors childhood, when you just want to make vegan pancakes.


This is a great pitch, and I don't want to come across as negative, but I feel like a statement like "we found all bugs" can only be true with a very narrow definition of bug.

The most pernicious, hard-to-find bugs that I've come across have all been around the business logic of an application, rather than it hitting an error state. I'm thinking of the category where you have something like "a database is currently reporting a completed transaction against a customer, but no completed purchase item, how should it be displayed on the customer recent transactions page?". Implementing something where "a thing will appear and not crash" in those cases is one thing, but making sure that it actually makes sense as a choice, given all the context of everyone else's choices everywhere else in the stack, is a lot harder.

Or to take a database, something along the lines of "our query planner produces a really suboptimal plan in this edge-case".

Neither of those types of problems could ever be automatically detected, because they aren't issues of the program reaching an error state - the issue is figuring out in the first place what "correct" actually is for your application.

Maybe I'm setting the bar too high for what a "bug" is, but I guess my point is, it's one thing to fantasize about having zero bugs, and another to build software in the real world. I'd probably still settle for 0 runtime errors though, to be fair...


I do think that it was a mistake to use the word "all" and imply that there are absolutely no bugs in FoundationDB. However, FoundationDB is truly known as having advanced the state of the art for testing practices: https://apple.github.io/foundationdb/testing.html.

So in normal cases this would reek of someone being arrogant / overconfident, but here they really have gotten very close to zero bugs.


The other issue I would point out is that building a database, while impressive with their quality, is still fundamentally different than an application or set of applications like a larger SaaS offering would involve (api, web, mobile, etc). Like the difference between API and UI test strategies, where API has much more clearly defined and standardized inputs and outputs.

To be clear, I am not saying that you can't define all inputs and outputs of a "complete SaaS product offering stack", because you likely could, though if it's already been built by someone that doesn't have these things in mind, then it's a different problem space to find bugs.

As someone who has spent the last 15 years championing quality strategy for companies and training folks of varying roles on how to properly assess risk, it does indeed feel like this has a more narrow scope of "bug" as a definition, in the sort of way that a developer could try to claim that robust unit tests would catch "any" bugs, or even most of them. The types of risk to a software's quality have larger surface areas than at that level.


There's a lot of assertions that I throw into business applications that would be very useful to test in this way. So I don't think this only applies to testing databases.

Also, when properties are difficult to think of, that often means that a model of the behavior might be more appropriate to test against, e.g. https://concerningquality.com/model-based-testing/. It would take a bit of design work to get this to play nicely with the Antithesis approach, but it's definitely doable.
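
As a toy sketch of what "testing against a model" means (my own illustration, nothing to do with the Antithesis API): drive the real implementation and a trivially correct model with the same random operations and check that they agree at every step.

    use rand::Rng;

    // The "real" implementation under test (deliberately trivial here).
    struct Counter { n: i64 }
    impl Counter {
        fn bump(&mut self, by: i64) { self.n = self.n.wrapping_add(by); }
        fn get(&self) -> i64 { self.n }
    }

    fn main() {
        let mut rng = rand::thread_rng();
        let mut sut = Counter { n: 0 };
        let mut model: i64 = 0; // the model defines what "correct" means
        for _ in 0..10_000 {
            let by = rng.gen_range(-5..=5);
            sut.bump(by);
            model = model.wrapping_add(by);
            assert_eq!(sut.get(), model);
        }
    }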


Just to clarify, I am definitely not saying this is only useful or only applies to databases.

The point was more that, I don't see how this testing approach (at the level that it functions) would catch all of the bugs that I have seen in my career, and so to say "all of the bugs" or even "most of the bugs" is definitely a stretch.

This is certainly useful, just like unit tests, assertions, etc are all very useful. It's just not the whole picture of "bugs".


Yes, there are plenty of non-functional logic bugs, e.g. performance issues. I think this starts to drastically hone in on the set of "all" bugs though, especially by doing things like network fault injection by default. This will trigger complex interactions between dependencies that are likely almost never tested.

They should clarify that this is focused on functional logic bugs though, I agree with that.


I’d go so far as to say it is actually impossible to do UI testing in some kind of web based product unless it came from the browser makers themselves.

I’d settle for decent heap debugging.


I think the reference to "all the bugs" here basically means that our insanely brutal deterministic testing system was not finding any more bugs after hundreds of thousands of runs. Can't prove a negative, obviously, but the fact that we'd gotten to that "all green" status gave us a ton of confidence to push forward in feature development, believing we were building on something solid - which, time has shown, we were.


Thanks -- that's very clarifying! But isn't this circular? The lack of bugs is used as evidence of the effectiveness of the testing approach, but the testing approach is validated by...not finding any more bugs in the software?


Yeah but if your software is running in an environment that controls for a lot of non-determinism and can simulate various kinds of failures and degradations at varying rates, and do it all in accelerated time and your software is still working correctly; I think it’d be somewhat reasonable to assert that maybe the testing setup has done a pretty good job.


Agreed, the approach sounds very interesting and I can see how it could be very effective! I'd love to try it on my own stuff. That's why it's so surprising (to me) to claim that the approach found nearly every bug in something as complicated as a production distributed database. My career experience tells me (quite strongly) that can't possibly be true.


I consider a "bug" to be "it was supposed to do something and failed".

Issues around business logic are not failures of the system, the system worked to spec, the spec was not comprehensive enough and now we iterate.


What do you call it when the spec is wrong? Like clearly actually wrong, such as when someone copied a paragraph from one CRUD-describing page to the next and forgot to change the word "thing1" to "thing2" in the delete description.

Because I'd call that a bug. A spec bug, but a bug. It's no feature request to make the code based on the newer page delete thing2 rather than thing1, it's fixing a defect


There’s the distinction between correctness and fitness for purpose which I think is helpful for clarifying the issues here.

Correctness bug: it didn’t do what the spec says it should do.

Fitness for purpose bug: it does what the spec says to do, but, with better knowledge, the spec isn’t what you actually want.

Edit: looks like this maps, respectively, to failing verification and failing validation. https://news.ycombinator.com/item?id=39359673

Edit2: My earlier comment on the different things that get called "bugs", before I was aware of this terminology: https://news.ycombinator.com/item?id=22259973


Ya, I would like a word for this as well. I naturally refer to this category of error as bug, but this occasionally leads to significant conflict with others at work. I now default to calling _almost everything_ a feature request, which is obviously dumb but less likely to get me into trouble. If there is a better word for "it does exactly what we planned, but what we planned was wrong" I would love to adopt it.


I reported such a bug to some software my company uses (Tempo). Vendor proceeds to call it a feature request because the software successfully fails to show public information (visible in the UI, but HTTP 403 in the API unless you're an admin).

Instead of changing one word in the code that defines the access level required for this GET call, it gets triaged as not being a bug, put on a backlog, and we never heard from it again obviously

We pay for this shit


Successful failure is my favorite kind, I like to think that all my failures are successful


Systems Engineering has terminology for this distinction.

Verification is "does this thing do what I asked it to do".

Validation is "did I ask it to do the right thing".




They're fairly standard terms from "old style" project management - they show up in the usual V Model of Waterfall vein.

E.g. see Wikipedia: https://en.m.wikipedia.org/wiki/Verification_and_validation


A spec bug is just as bad as a code bug! Declaring a system free of defects because it matches the spec is sneaky sleight-of-hand that ignores the costs of having a spec.

The actual testing value is the difference between the cost of writing and maintaining the code, and the cost of writing and maintaining the spec.

If the spec is similar in complexity to the code itself, then bugs in the spec are just as likely as bugs in the code, thus verification to spec has gained you nothing (and probably cost you a lot).


I agree they are separate, but in my long experience, spec bugs are at least as common as your first definition.


...And now we could probably start debating your narrow definition of "system". ;-)


Most of the software I've built doesn't have "a spec", but let me zoom in on specs around streaming media. MPEG DASH, CMAF or even the base media file format (ISO/IEC 14496-12) can at times be pretty vague. In practice, this frequently turns up as actual interoperability issues where it's pretty difficult to point out which of two products follows the spec and which one has a bug.

So yes, I totally agree with GP and would actually go further: a phrase like "we found all the bugs in the database" is nonsense and makes the article less credible.


The best definition I've heard for "bug" is "software not working as documented". Of course, a lot of software is lacking documentation -- and those are doc bugs. But I like this definition because even when the docs are incomplete, the definition guides you to ask: would I really document that the software behaves like this or would I change the behavior [and document that]? It's harder (at least for me) to sweep goofy behavior under the rug.


To be fair, the line right after that is "I know, I know, that's an insane thing to say."


I feel like business logic bugs live on a separate layer, the application layer, and it's not fair to count those against the database itself.

I agree that suboptimal query planning would be a database-layer bug, a defect which could easily be missed by the bug-testing framework.


Good summary of the hard part of being a software developer that deals with clients.


What software developer does not deal with clients (and makes a living)?


lots of software developers never deal with clients (clients as in the people who will actually use the software) - most of them in fact, in any of the big companies I have worked for anyway...and that is probably not a good thing.

I myself prefer to work with the people who will actually use what I build - you get a better product that way.


I've been super interested in this field since finding out about it from the `sled` simulation guide [0] (which outlines how FoundationDB does what they do).

Currently bringing a similar kind of testing in to our workplace by writing our services to run on top of `madsim` [1]. This lets us continue writing async/await-style services in tokio but then (in tests) replace them with a deterministic executor that patches all sources of non-determinism (including dependencies that call out to the OS). It's pretty seamless.

The author of this article isn't joking when they say that the startup cost of this effort is monumental. Dealing with every possible source of non-determinism, re-writing services to be testable/sans-IO [2], etc. takes a lot of engineering effort.

Once the system is in place though, it's hard to describe just how confident you feel in your code. Combined with tools like quickcheck [3], you can test hundreds of thousands of subtle failure cases in I/O, event ordering, timeouts, dropped packets, filesystem failures, etc.
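
For flavor, a quickcheck property looks roughly like this (a toy example; the real value comes from running properties like this under the deterministic executor while faults are injected):

    use quickcheck::quickcheck;

    fn reverse<T: Clone>(xs: &[T]) -> Vec<T> {
        let mut r = xs.to_vec();
        r.reverse();
        r
    }

    quickcheck! {
        // quickcheck generates many random inputs, checks the property for
        // all of them, and shrinks any failing case to a minimal example.
        fn prop_double_reverse(xs: Vec<u32>) -> bool {
            reverse(&reverse(&xs)) == xs
        }
    }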

This kind of testing is an incredibly powerful tool to have in your toolbelt, if you have the patience and fortitude to invest in it.

As for Antithesis itself, it looks very very cool. Bringing the deterministic testing down the stack to below the OS is awesome. Should make it possible to test entire systems without wiring up a harness manually every time. Can’t wait to try it out!

[0]: https://sled.rs/simulation.html

[1]: https://github.com/madsim-rs/madsim?tab=readme-ov-file#madsi...

[2]: https://sans-io.readthedocs.io/

[3]: https://github.com/BurntSushi/quickcheck?tab=readme-ov-file#...


> you can test hundreds of thousands of subtle failure cases in I/O, event ordering, timeouts, dropped packets, filesystem failures, etc.

As cool as all this is, I can't help but wonder how often the culture of micro-services and distributed computing is ill advised. So much complexity I've seen in such systems boils down to the fact that calling a "function" is: async, dependent on the OS, executed at some point or never, and always returning a bunch of strings that need to be parsed to re-enter the static type system, which comes with its own set of failure modes. This makes the seemingly simple task of abstracting logic into a named component, aka a function, extremely complex. You don't need to test for any of the subtle failures you mentioned if you leave the logic inside the same process and just call a function. I know monoliths aren't always a good idea or fit; at the same time I'm highly skeptical whether the current prevalence of service-based software architectures is justified and pays off.


> I can't stop but wonder how often the culture of micro-services and distributed computing is ill advised.

You can't get away from distributed computing, unless you get away from computing. A modern computer isn't a single unit, it's a system of computers talking to each other. Even if you go back a long time, you'll find many computers or proto-computers talking to each other, but with a lot stricter timings, as the computers are less flexible.

If you save a file to a disk, you're really asking the OS (somehow) to send a message to the computer on the storage device, asking it to store your data, and it will respond with success or failure and it might also write the data. (Sometimes it will tell your OS success and then proceed to throw the data away, which is always fun.)

That said, keeping things together where it makes sense, is definitely a good thing.


I see your point. Even multithreading can be seen as a form of distributed programming. At the same time, in my experience these parts can often be isolated. You trust your DB to handle such issues, and I'm very happy we are getting a new era of DBs like Tigerbetle, FoundationDB and sled that are designed to survive Jepsen. But how many teams are building DBs? That point is a bit ironic, given I'm currently building an in-memory DB at work. But it's a completely different level of complexity. And your example with writing a file, that too is a somewhat solved problem, use ZFS. I'd argue there are many situations where the fault tolerant distributed requirements can be served by existing abstractions.


> Dealing with every possible source of non-determinism, re-writing services to be testable/sans-IO [2], etc. takes a lot of engineering effort.

Are there public examples of what such a re-write looks like?

Also, are you working at a rust shop that's developing this way?

Final Note, TigerBeetle is another product that was written this way.


TigerBeetle is actually another customer of ours. You might ask why, given that they have their own, very sophisticated simulation testing. The answer is that they're so fanatical about correctness, they wanted a "red team" for their own fault simulator, in case a bug in their tests might hide a bug in their database!

I gotta say, that is some next-level commitment to writing a good database.

Disclosure: Antithesis co-founder here.


Sure! I mentioned a few orthogonal concepts that go well together, and each of the following examples has a different combination that they employ:

- the company that developed Madsim (RisingWave) [0] [1] tries the hardest to eliminate non-determinism, with the broadest scope (stubbing out syscalls, etc.)

- sled [3] itself has an interesting combo of deterministic tests combined with quickcheck+failpoints test case auto-discovery

- Dropbox [2] uses a similar approach but they talk about it a bit more abstractly.

Sans-IO is more documented in Python [4], but str0m [5] and quinn-proto [6] are the best examples in Rust I’m aware of. Note that sans-IO is orthogonal to deterministic test frameworks, but it composes well with them.

With the disclaimer that anything I comment on this site is my opinion alone, and does not reflect the company I work at —— I do work at a rust shop that has utilized these techniques on some projects.

TigerBeetle is an amazing example and I’ve looked at it before! They are really the best example of this approach outside of FoundationDB I think.

[0]: https://risingwave.com/blog/deterministic-simulation-a-new-e...

[1]: https://risingwave.com/blog/applying-deterministic-simulatio...

[2]: https://dropbox.tech/infrastructure/-testing-our-new-sync-en...

[3]: https://github.com/spacejam/sled

[4]: https://fractalideas.com/blog/sans-io-when-rubber-meets-road...

[5]: https://github.com/algesten/str0m

[6]: https://docs.rs/quinn-proto/0.10.6/quinn_proto/struct.Connec...


Does something like madsim / Deterministic Simulation Testing exist for Java applications?


The writing is really enjoyable.

> Programming in this state is like living life surrounded by a force field that protects you from all harm. [...] We deleted all of our dependencies (including Zookeeper) because they had bugs, and wrote our own Paxos implementation in very little time and it _had no bugs_.

Being able to make that statement and back it by evidence must be indeed a cool thing.


The earliest that I've seen the attitude that one should eliminate dependencies because they have more bugs than internally written code was this book from 1995: https://store.doverpublications.com/products/9780486152936

pp. 65-66:

> The longer I have computed, the less I seem to use Numerical Software Packages. In an ideal world this would be crazy; maybe it is even a little bit crazy today. But I've been bitten too often by bugs in those Packages. For me, it is simply too frustrating to be sidetracked while solving my own problem by the need to debug somebody else's software. So, except for linear algebra packages, I usually roll my own. It's inefficient, I suppose, but my nerves are calmer.

> The most troubling aspect of using Numerical Software Packages, however, is not their occasional goofs, but rather the way the packages inevitably hide deficiencies in a problem's formulation. We can dump a set of equations into a solver and it will usually give back a solution without complaint - even if the equations are quite poorly conditioned or have an unsuspected singularity that is distorting the answers from physical reality. Or it may give us an alternative solution that we failed to anticipate. The package helps us ignore these possibilities - or even to detect their occurrence if the execution is buried inside a larger program. Given our capacity for error-blindness, software that actually hides our errors from us is a questionable form of progress.

> And if we do detect suspicious behavior, we really can't dig into the package to find our troubles. We will simply have to reprogram the problem ourselves. We would have been better off doing so from the beginning - with a good chance that the immersion into the problem's reality would have dispelled the logical confusions before ever getting to the machine.

I suppose whether to do this depends on how rigorous one is, how rigorous certain dependencies are, and how much time one has. I'm not going to be writing my own database (too complicated, multiple well-tested options available) but if I only use a subset of the functionality of a smaller package that isn't tested well, rolling my own could make sense.


In the specific case in question, the biggest problem was that dependencies like Zookeeper weren't compatible with our testing approach, so we couldn't do true end to end tests unless we replaced them. One of the nice things about Antithesis is that because our approach to deterministic simulation is at the whole system level, we can do it against real dependencies if you can install them.

I was a co-founder of both FoundationDB and Antithesis.


That tracks well (both the quotes and your thoughts).

One example that comes to mind where I want to roll my own thing (and am in the process of doing so) is replacing our CI/CD usage of Jenkins, which is solely for running QA automation tests against PRs on GitHub. Jenkins does way, way more than we need. We just need GitHub PR interaction/webhooks, secure credentials management, and spawning ECS tasks on AWS...

Every time I force myself to update our jenkins instance, I buckle up because there is probably some random plugin, or jenkins agent thing, or ... SOMETHING that will break and require me to spend time tracking down what broke and why. 100% surface area for issues, whilst we use <5% of what Jenkins actually provides.


I have proved my code has no bugs according to the spec.

I do not make the claim my spec has no bugs.


With formal proof systems, you can also claim that for your spec.


A formal proof is only as good as what-you-are-proving maps to what-you-intended-to-prove.


I've written formal proofs with bugs more than once. Reality is much messier than you can encode into any proof and there will ultimately be a boundary where the real systems you're trying to build can still have bugs.

Formal verification is incredibly, amazingly good if you achieve it, but it's not the same as "perfect".


No you can't.

You can claim that your spec doesn't violate some invariants within a finite number of steps; you can't claim that the spec contains all the invariants the real system must have, or that it won't violate them at step N+1.


This doesn't track with the real world, though.

If you are writing software, it is almost always trying to accomplish a goal outside of itself. It is trying to solve a problem for someone, and how that problem can or should be solved is rarely perfectly clear.

The spec is supposed to map to a real world problem, and there is never going to be a way to formalize that mapping.


"Its not a bug, its a feature"


Three thoughts:

1. It's a brilliant idea that came at the right time. It feels like people are finally losing patience with flaky software, see developer sentiment on: fuzzers, static typing, memory safety, standardized protocols, containers, etc.

2. It's meant to be niche. $2 per hour per CPU (or $7000 per year per CPU if reserved), no free tier for hobby or FOSS, and the only way to try/buy is to contact them. Ouch. It's a valid business model, I'm just sad it's not going for maximum positive impact.

3. Kudos for the high quality writing and documentation, and I absolutely love that the docs include things like (emphasis in original):

> If a bug is found in production, or by your customers, you should demand an explanation from us.

That's exactly how you buy developer goodwill. Reminds me of Mullvad, who I still recommend to people even after they dropped the ball on me.


Thanks for your kind words! As I mention in this comment (https://news.ycombinator.com/item?id=39358526) we are planning to have pricing suitable for small teams, and perhaps even a free tier for FOSS, in the future.

Disclosure: Antithesis co-founder.


There are a few FOSS projects I'd love to set this up for if you ever get to the free tier. :)


"It's meant to be niche. $2 per hour per CPU (or $7000 per year per CPU if reserved), no free tier for hobby or FOSS, and the only way to try/buy is to contact them. Ouch. It's a valid business model, I'm just sad it's not going for maximum positive impact."

This is the sort of thing that, if it takes off, will start affecting the entire software world. Hardware will start adding features to support it. In 30 years this may simply be how computing works. But the pioneers need to recover the costs of the arrows they got stuck with before it can really spread out. Don't look at this an event, but as the beginning of a process.


I think their target audience is teams who already have mature software and comprehensive tests. From the docs, the kinds of bugs their platform is designed to find are the wild “unreproducible” kind that only happens rarely in production. Most teams have much bigger problems and obvious bugs to fix.

Heck, most software in production today barely has unit tests.


$2 per hour per CPU could be expensive or inexpensive, depending on how long it takes to fuzz your program. I wonder how that multiplies out in real use cases?


I met Antithesis at Strange Loop this year and got to talk to employees about the state of the art of automated fault injection that I was following when I worked at Amazon, and I cannot overstate what a leap forward their product is compared to many of the formal verification systems being used today.

I actually got to follow their bug tracking process on an issue they identified in Apache Spark streaming - going off of the docs, they managed to identify a subtle and insidious correctness error in a common operation that would've caused headaches in low-visibility edge cases for years. In the end the docs were incorrect, but after that showing I can only imagine how critical tools like Antithesis will become inside companies building distributed systems.

I hope we get some blog posts that dig into the technical weeds soon, I'd love to hear what brought them to their current approach.


I'm trying to avoid diving into the hype cycle about this immediately - but this sounds like the holy grail right? Use your existing application as-is (assuming it's containerized), and simply check properties on it?

The blocker in doing that has always been the foundations of our machines: non-deterministic CPUs and operating systems. Re-building an entire vertical computing stack is practically impossible, so they just _avoid_ it by building a high-fidelity deterministic simulator.

I do wonder how they are checking for equivalence between the simulator and existing OS's, as that sounds like a non-trivial task. But, even still, I'm really bought in to this idea.


You still have to use their SDKs to write lots of integration tests (they call them “workloads”).

Then they run those tests while injecting all sorts of failures like OS failures, network issues, race and timing conditions, random number generator issues, etc.

It’s likely the only practical way today of testing for those things reliably, but you still have to write all of the tests and define your app state.


Does it even need to be containerized? According to the post, it sounds like Antithesis is a solution at the hypervisor layer.


Yes it looks like containerization is required: https://antithesis.com/docs/getting_started/setup.html#conta...


Containers are doing two jobs for us: they give our customers a convenient way to send us software to run, and they give us a convenient place to simulate the network boundary between different machines in a distributed system. The whole guest operating system running the containers is also running inside the deterministic hypervisor and under test (and it's mostly just NixOS Linux, not something weird that we wrote).

I'm a co-founder of Antithesis.


Oh, cool to hear you're using NixOS. The Nix philosophy totally gels with the philosophy described in the post.

But it's also probably fair to describe NixOS as something weird that somebody else wrote :)


> a platform that takes your software and hunts for bugs in it

Ok but, what actually IS it?

It seems like it is a cloud service that will run integration tests. I have to figure out how to deploy to this special environment and I still have to write those integration tests using special libraries.

But even after all that integration refactoring, how is this supposed to help me find actual bugs that I wouldn't already have found in my own environment with my own integration tests?


I'd suggest taking a dive into the docs - there is quite a lot there that should address some of these questions.

That said, Antithesis doesn't require you to write manual tests, integration or otherwise. It requires your software system to be packaged in containers, which is fairly straightforward, and then requires a workload to be written which will emulate the normal functioning of the software system. So for example an e-commerce store would have product views, cart adds, checkouts, etc.
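
To make that concrete, a workload can be as simple as a loop of randomized operations with properties checked along the way. The code below is purely illustrative (made-up names, not our SDK), just to show the shape:

    use rand::Rng;

    // A stand-in for whatever client your real system exposes.
    struct Store { cart: Vec<u32>, orders: Vec<Vec<u32>> }

    impl Store {
        fn view_product(&self, _id: u32) {}
        fn add_to_cart(&mut self, id: u32) { self.cart.push(id); }
        fn checkout(&mut self) {
            let items = std::mem::take(&mut self.cart);
            if !items.is_empty() { self.orders.push(items); }
        }
    }

    fn main() {
        let mut rng = rand::thread_rng();
        let mut store = Store { cart: Vec::new(), orders: Vec::new() };
        // The workload just exercises normal behavior; the platform supplies
        // the input variation, fault injection, and scheduling around it.
        for _ in 0..1_000 {
            match rng.gen_range(0..3) {
                0 => store.view_product(rng.gen_range(0..100)),
                1 => store.add_to_cart(rng.gen_range(0..100)),
                _ => store.checkout(),
            }
            // A custom property asserted continuously: no order is ever empty.
            assert!(store.orders.iter().all(|o| !o.is_empty()));
        }
    }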

With this, Antithesis can start testing (running your workload, varying inputs, injecting faults, etc) the software and looking for violations of test properties. There are many (60+) test properties that come "out of the box" such as crashes, out of memory, etc. You can (and should) also define custom properties that are unique to your system, as this will surface more problems.

As your tests run, violations of test properties are reported, with lots of useful debug information included. Test runs that are particularly interesting can have a lot of extra analysis done, due to our ability to "rewind" and change inputs, get artifacts, add logging, etc.


"Workloads" seem to be effectively equivalent to integration tests.

I don't mean to poke holes but I'm having trouble seeing the value add here.

If I have to deploy to some new environment anyways and I have to tailor the "Workloads" anyways why would I pay extra for vendor lock-in?

The type of devious bug this is promising to find would be something like:

"The DB silently drops timezone from Dates because of the column type. This results in unexpected data being returned for users in different timezones from the server"

I just don't see how repeatably calling the API with an expanding set of random inputs helps find something like that.


IMO "Read the docs" is not a reasonable response to "what is it?"

Typically someone wants to know the most basic question before devoting time to diving into the docs.


The article says they created a deterministic hypervisor that runs all pseudorandom behavior from a starting seed to enable perfect re-playability.

But that's all we know so far. I'm assuming there'll be some sort of fuzz testing, and static analysis or some defining actions that your software can perform.

Honestly it sounds a lot like it has a lot of crossover with what the Vale language is trying to solve: https://vale.dev/, but focused on trying to get existing software to that state instead of creating a new language to make new software already be at that state by default.


I came away with the same questions.


Reading their docs it seems you got it pretty much correctly. You write integration tests (aka “workloads”) and then they run it under different scenarios.

This means they use their hypervisor to change random seeds, make http requests fail or take too long, break connections between servers, change the order of server responses, and all sorts of wild things you don’t usually control, but that happen in the real world. Then they compare the expected workload responses and figure out which conditions break your systems.
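
Conceptually, the reproducibility angle is seed-driven: the same seed replays the same pattern of injected failures and delays. A toy sketch to convey the idea (Antithesis does this below the OS rather than in your code):

    use rand::{rngs::StdRng, Rng, SeedableRng};

    // Toy, seed-driven fault injection around a pretend network call.
    fn flaky_call(rng: &mut StdRng, payload: &str) -> Result<String, &'static str> {
        if rng.gen_bool(0.1) {
            return Err("injected: connection reset");
        }
        let _delay_ms: u64 = rng.gen_range(1..500); // injected latency
        Ok(format!("ok: {payload}"))
    }

    fn main() {
        let mut rng = StdRng::seed_from_u64(42); // same seed => same "run"
        for i in 0..10 {
            let _ = flaky_call(&mut rng, &format!("request {i}"));
        }
    }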

That’s why they sell yearly contracts - you’re supposed to pay them to keep your workloads running continuously all year to try all sorts of different combinations of failures.


I got really excited about this, and I spent a little time looking through the documentation, but I can't figure out how this is different than randomizing unit tests? It seems if I have a unit test suite already, then that's 99% of the work? Am I misunderstanding? I am drawing my conclusions from reading the Getting Started series of the docs, especially the Workloads section: https://antithesis.com/docs/getting_started/workload.html


Antithesis here - curious what part of the Getting Started doc gave you that impression? If you take a look at our How Antithesis Works page, it might help answer your question as to how Antithesis is different from just bundling your unit tests.

https://antithesis.com/docs/introduction/how_antithesis_work...

In short though, unit tests can help to inform a workload, but we don't require them. We autonomously explore software system execution paths by introducing different inputs, faults, etc., which discovers behaviors that may have been unforeseen by anyone writing unit tests.


Thanks for the response. The linked introduction does help. The workload page does give me that impression (and based on upvotes of my post it does to others as well)...so perhaps disambiguating that the void test*() examples on the workloads page are not unit tests might help!

Congrats on the launch and I'll consider using it for some of my projects.


This is that, and the exact same vibe, except: it promises to keep being that simple even after you add threads, and locks, and network calls, and disk accesses and..

With this, if you write a test for a function that makes a network call and writes the result to disk, your test will fail if your code does not handle the network call failing or stalling indefinitely, or the disk running out of space, or the power going out just before you close the file, or..

So it’s: yes, but it expands the space where testing is as easy as unit testing to cover much more interesting levels of complexity.


Great read. Great product. I've been an early user of Antithesis. My background is dependability and formal distributed systems.

This thing is magic (or rather, it's indistinguishable from magic ;-)).

If they told me I could test any distributed system without a single line of code change, do things like step-by-step debugging, even roll back time at will, I would not believe it. But Antithesis works as advertised.

It's a game-changer for distributed systems that truly care about dependability.


I don’t want to sound silly, but there are 24 open and 37 closed bugs on the FoundationDB Github page. Could it perhaps be that bug-free is somewhat exaggerated?

Antithesis looks very promising by the way :-)

Edit: perhaps Apple didn’t continue the rigorous testing while evolving the FoundationDB codebase.


FoundationDB is an impressive achievement, quite possibly the only distributed database out there that lives up to its strict serializability claims (see https://jepsen.io/consistency/models/strict-serializable for a good definition). The way they wrote it is indeed very interesting and a tool that does this for other systems is immediately worth looking at.


Is it that good? I've been tasked with deploying it for some time and it has always bitten me in the ass for one reason or another. And I'm not the one who uses it, so I don't know if it's actually good. For now I much prefer Redis.


It's great, but operationally there are lots of gotchas and little guidance.

We got bitten _hard_ in production when we accidentally allowed some of the nodes to get above 90% of the storage used. The whole database collapsed into a state where it could only do a few transactions a second. Then the ops team, thinking they were clever, doubled the size of the cluster in order to give it the resources it needed to get the average utilization down to 45%; this was an unforced error as that pushed the size of the cluster outside the fdb comfort zone (120 nodes) which is itself a problem. The deed was done though and pulling nodes was not possible in this state, so slowly, slooooowly... things got fixed.

We ended up spending an entire weekend slowly, slowly getting things back into a good place. We did not lose data, but basically prod was down for the duration, and we found it necessary to _manually_ evict the full nodes one at a time over the period.

Now, this was a few years ago, and fdb has performed wickedly fast, with utter, total reliability before that and since, and to this day the ops team is butthurt about fdb.

From an engineering perspective, if you aren't using Java, fdb is pretty rough, since the very limited number of abstraction layers that exist are all Java-centric. There are many, many issues with the maximum transaction time, the maximum key size, value size, and total transaction size, the lack of pushdown predicates (e.g., filtered scans can't be done in place, which means that in AWS they cost a lot in inter-AZ network charges and are also gated by the network performance of your instances), and so on.

What ALL of these issues have in common is that they bite you late in the game. The storage issue bites you when you're hitting the DB hard in production and have a big data set; the lack of abstractions means that even something as simple as finding leaked junk keys turns out to be impossible unless you were diligent enough to manually frame all your values so you could identify things as more than just bytes; the transaction time limit is very weird to deal with, as you tend to have creeping-crud aspects and the lack of libraries that instrument transactions to give you early warning is an issue; likewise, for certain kinds of key-value pairs there's a creeping size problem - hey, this value is an index of other values; if you're not very careful up front, you _will_ eventually hit either the txn size limit or the key limit. The usual workaround for those is to do separate transactions - a staging transaction, then essentially a swap operation, and then a garbage collection transaction - but that has lots of issues over time when coupled with application failure.
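
To make that last workaround concrete, here is a rough sketch of the staged-write pattern using the Python bindings (the key layout, API version, and chunk size are purely illustrative, and it assumes a running cluster - this is not what any real deployment looks like):

    import fdb
    import fdb.tuple
    import uuid

    fdb.api_version(630)  # illustrative; must match your installed client
    db = fdb.open()       # assumes a reachable cluster file

    CHUNK = 90_000  # stay comfortably under the ~100 KB value-size limit

    def staged_write(db, name, blob):
        version = uuid.uuid4().bytes

        # 1. Staging: write chunks in separate transactions, each far below
        #    the transaction size and transaction time limits.
        for i in range(0, len(blob), CHUNK):
            @fdb.transactional
            def write_chunk(tr, i=i):
                tr[fdb.tuple.pack(("blob", name, version, i // CHUNK))] = blob[i:i + CHUNK]
            write_chunk(db)

        # 2. Swap: one tiny transaction flips the "current version" pointer.
        @fdb.transactional
        def swap(tr):
            tr[fdb.tuple.pack(("blob-current", name))] = version
        swap(db)

        # 3. Garbage collection (not shown): clear the previous version's
        #    chunks in yet another transaction. If the application dies before
        #    this step runs, you get exactly the "leaked junk keys" problem
        #    described above.

    # staged_write(db, "big-report", b"\x00" * 1_000_000)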

There are answers to ALL of these, manual ones. For the popular languages other than Java - Go, Python, maybe Ruby - there _should_ be answers for them, but there aren't. These are very sharp edges. Those Java layers are _also_ _not_ _bug_ _free_. So yeah, one has a reliable storage layer (a topic that has come up over and over again in the last few years), but it's the layer on top of that where all the bugs are, now with constraints and factors that are harder to reason about than the usual storage layer.

One might say, hey, SQL has all of these problems too, except no. You can bump into transaction limits, but the limits are vastly higher than fdb's, and transaction sluggishness will flag the problem long before you run into the "your transaction is rejected, spin retrying something that will _never_ recover" sort of issue that your average developer will eventually encounter in fdb.

That said, I love fdb as a software achievement. I just wish they had finished it. For my current project, I have designed it out. I might be able to avoid all of the sharp edges above at this point, but since we are not a java shop, I also can't rely on all the engineers to even know they exist.


It depends how you define "good". I care mostly about my distributed database being correct, living up to its consistency claims, and providing strict serializability.

(see also https://aphyr.com/posts/283-jepsen-redis)

I care much less about how easy it is to use or deploy, but "good" is a subjective term, so other people might see things differently.


> quite possibly the only distributed database out there that lives up to its strict serializability claims

Jepsen has never tested FoundationDB, not sure why you claim this and link to Jepsen's site.


FDB co-founder here.

Aphyr / Jepsen never tested FDB because, as he tweeted "their testing appears to be waaaay more rigorous than mine." We actually put a screen cap of that tweet in the blog post linked here.


Just a heads-up for anyone diving deeper into this thread - I dug into the original tweet and managed to track down the parent tweet right here: [1]. Moreover, there's a snapshot on archive.org [2] capturing the reply along with the quote in question. Interestingly, there's also a snapshot from foundationdb.com [3] that discusses the outcomes of running Jepsen tests on FDB. Worth checking out for those interested in the technical nitty-gritty.

[1]: https://twitter.com/obfuscurity/status/405016890306985984

[2]: https://web.archive.org/web/20220805112242/https://twitter.c...

[3]: https://web.archive.org/web/20150325003526/http://blog.found...


> not sure why you claim this and link to Jepsen's site.

They link to the website for a definition of the term they are using.


I am really intrigued by organizations that build effective test cultures. I am not interested in people who have testing teams (a la how it was done before, say, 2004) or teams that simply do unit tests and integration tests. I am interested in people who realized that building the right testing culture is key to their success. Before reading this article, SQLite would probably be my top reference. I don't have the article handy, but the SQLite developers spent something like a year building a test framework to make incredibly bulletproof software. I wasn't aware of FoundationDB before, but the idea of the simulation engine - that's exactly what most distributed systems folks need.

Disclaimer - I work at AWS. And we have a combination of TLA+, fuzz, and simulation testing. When I first started it was obvious my team had a huge testing gap. It pains me to say this, but for a big part of AWS testing is sort of an afterthought. It comes from the "we don't hire test engineers" mentality, I suppose - but this likely differs wildly by team. Over the years we've tried to backfill the gap with simulators. But it is really hard to do this culturally, because it is really hard while you are trying to build new stuff, fix bugs, etc. And your entire team (and leaders) have to be bought into the foundational value of this infrastructure. And because we don't have it, we write COEs, we fix the obvious bugs, but do we take the 1 year it would take to avoid all problems in the future - no. So yeah, I am super jealous of your "fully-deterministic event-based network simulation".


Congratulations to the Antithesis team!

I actually interviewed with them when they were just starting, and outside of being very technically proficient, they are also a great group of folks. They flew my wife and me out to DC on what happened to be the coldest day of the year that year (we are from California), so we didn’t end up following through, but I’d like to think there is an alternative me out there in the multiverse hacking away on this stuff.

I highly recommend Will’s talks (which I believe he links in the blog post):

https://m.youtube.com/watch?v=4fFDFbi3toc

https://m.youtube.com/watch?v=fFSPwJFXVlw


Checking their bug report which should contain "detailed information about a particular bug" I am not sure I can fully understand those claims: https://public.antithesis.com/report/ZsfkRkU58VYYW1yRVF8zsvU...

To my untrained eye I get: logs, a graph of when in time the bug happened over multiple runs, and a statistical analysis of which parts of the application code could be involved. The statistical analysis is nice, but it is completely flat, without any hierarchical relationships, making it quite hard to parse mentally.

I kind of expected more context to be provided about the inputs, steps, and systems that led to the bug. Is it expected that you then start adding all the logging/debugging that might be missing from the logs and re-run it to track it down? I had hoped that, given the deterministic systems and inputs, more initial hints could be provided.


On mobile, the "Let's talk" button in the top right corner is cut off by the carousel menu overlay. Seems like CSS is still out of scope of the bug fixing magic for now.

On a more serious note, it's an interesting blog post, but it comes off as veeery confident about what is clearly an incredibly broad and complex topic. Curious to see how it will work in production.


Aww... crap, you're right. I knew we should have finished the UI testing product and run it on ourselves before launching.

Disclosure: Antithesis co-founder.


Designer here - sorry, it is intentional. I thought a horizontally scrollable menu was more straightforward than a full-screen expander.


Yeah, if only there were some scientific way to ensure that elements don't overlap - let's call it "constraints" maybe - so one could test layouts by simply solving, idk... something like a set of linear equations? Hope some day CSS will stop being "awesome" and become nothing, in favor of a useful layout system.


Not directly related to this post, but clicking around the webpage I chuckled seeing Palantir's case study/testimonial:

https://antithesis.com/solutions/who_we_help


I got similar productivity boosts after learning TLA+ and Alloy.

Simulation is an interesting approach, but I am curious: if they ever implemented the simulation wrong, would it report errors that don't happen on the target platform, or fail to find errors that the target platform reports? How wide that gap is will matter... and how many possible platforms and configurations will the hypervisor cover?


> We thought about this and decided to just go all out and write a hypervisor which emulates a deterministic computer.

Huh. Yes, that would work. It's in the category of obvious in hindsight. That is a very convincing sales pitch.


Along similar lines, Mario Carneiro wrote a formalisation of a subset of x86 in MetaMath Zero (https://github.com/digama0/mm0/blob/master/examples/x86.mm0) with the ultimate goal of proving that the MetaMath Zero verifier itself is sound. https://arxiv.org/pdf/1910.10703.pdf

(And of course Permutation City is a fiction book all about emulating computers with sound properties!)


To me this is very reminiscent of time travel debugging tools like the one used for Firefox’s C++ code, rr / Pernosco: https://pernos.co/


Seems more like a fuzzer for Docker images.

Like this: https://docs.gitlab.com/ee/user/application_security/coverag...

It won't tell you whether the software works correctly, it will just tell you if it raises an exception or crashes.

Put a fuzzer on Chrome, for example, and you won't catch most of the issues it has (Chrome actually has tons of bugs and issues), but you may find security issues if you devote a big enough budget to run your fuzzer long enough to cover all the branches.

So it's good in the case where you use "exceptions as tests", where any minor out-of-scope behavior raises an exception and all the cases are pre-planned (a bit like baked-in runtime checks that the fuzzer explores).


The similarity is about obtaining determinism through something like a hypervisor. The way rr works is that it writes down the results of all the system calls and so on - basically everything that ended up on the Turing machine’s tape - so you can rewind and replay.


I was really hoping this would be rr but more general purpose.


I really like antithesis' approach: it's non-intrusive as all the changes are on a VM so one can run deterministic simulation without changing their code. It's also technically challenging, as making a VM suitable for deterministic simulation is not an easy feat.

On a side, I was wondering how this approach compares to Meta's Hermit(https://github.com/facebookexperimental/hermit), which is a deterministic Linux instead of a VM.


> The biggest effect was that it gave our tiny engineering team the productivity of a team 50x its size.

49 years ago, a man named Fred Brooks published a book, wherein he postulated that adding people to a late software project makes it later. It's staggering that 49 years later, people are still discovering that having a larger engineering team does not make your work more productive (or better). So what does make work more productive?

Productivity requires efficiency. Efficiency is expensive, complicated, nuanced, curt. You can't just start out from day 1 with an efficient team or company. It has to be grown, intentionally, continuously, like a garden of fragile flowers in a harsh environment.

Is the soil's pH right? Good. Is it getting enough sun? Good. Wait, is that leaf a little yellow? Might need to shade it. Hmm, are we watering it too much? Let's change some things and see. Ok, doing better now. Ah, it's growing fast now. Let's trim some of those lower leaves. Hmm, it's looking a little tall, is it growing too fast? Maybe it does need more sun after all.

If you really pay attention, and continue to make changes towards the goal of efficiency, you'll get there. No need for a 10x developer or 3 billion dollars. You just have to listen, look, change, measure, repeat. Eventually you'll feel the magic of zooming along productively. But you have to keep your eye on it until it blooms. And then keep it blooming...


I was mentally hijacked into clicking the jobs link (despite recently deciding I wasn’t going to go down that rabbit hole again!) but fortunately/unfortunately it is in-person and daily, so flying out from Chicago a week out of the month won’t work and I don’t even have to ask!

More to the point of the story (though I do think the actual point was indeed a hiring or contracting pitch), this reminds me a lot of the internal tests the SQLite team has. I would love to hear from someone with access to those if they feel the same way.


> I was mentally hijacked into clicking the jobs link (despite recently deciding I wasn’t going to go down that rabbit hole again!) but fortunately/unfortunately it is in-person and daily, so flying out from Chicago a week out of the month won’t work and I don’t even have to ask!

given their PLTR connection, probably not


Oh, suddenly I'm not interested, either! Thanks!


"At FoundationDB, once we hit the point of having ~zero bugs and confidence that any new ones would be found immediately, we entered into this blessed condition and we flew. Programming in this state is like living life surrounded by a force field that protects you from all harm. Suddenly, you feel like you can take risks"

When this state hits, it really is a thing to behold. It's very empowering to trust your system to this extent, and to know that if you introduce a bug, a test will save you.


the palantir testimonial on the landing page is funny


Even funnier if you manage to click "Declassify" :)


your IP address is probably in the Palantir databases anyway :o


And if you highlight the redactions, it reads:

REDACTED REDACTED REDACTED REDACTED REDACTED REDACTED and REDACTED REDACTED? REDACTED REDACTED Antithesis REDACTED REDACTED REDACTED REDACTED, REDACTED REDACTED REDACTED REDACTED. REDACTED REDACTED Palantir REDACTED REDACTED REDACTED REDACTED REDACTED REDACTED REDACTED.

:-)


This sort of awkward joke made to cover for capitalist illogic makes us all dumber.


We've done something similar for our medical device; totally deterministic simulations that cover all sorts of real world scenarios and help us improve our product. When you have determinism, you can make changes and just rerun the whole thing to make sure you actually addressed the problems you found.

Another nice side effect is that if you hang on to the specification for the simulation, you only have to hang on to core metrics from the simulation, since the entire program state can be reproduced in a debugger by just using the same specification on the same code version.


Reminds me of my work in the blockchain space. I built a Decentralized Exchange which gets all of its inputs from blockchain data. Since all the inputs are deterministic (and it's all based on block time instead of real time), I can replay any scenario any number of times... But it goes one step further than what is mentioned in the article, because I can replay any bug that ever happens, exactly as it happened, IN PRODUCTION.

The output is also deterministic so all operations are idempotent; they can be replayed any number of times, going as far back in the past as you want and end up with exactly the same state.

But, ironically, the DEX hasn't had any bugs since it started operating 3 years ago... It makes debugging so easy that I was able to resolve every possible issue I could imagine before the launch.

I think I haven't updated the code in a year but I've been running it this whole time and it's been processing trades.

Another unusual feature is that you could technically trade between 2 blockchains while the DEX is offline... Even if the DEX is only restarted 1 year or decades later, it will resolve the trades as though the DEX had been online the entire time. Only difference is that the disbursement of funds will be in the future after the DEX is restarted.


In my career I learned two powerful tools for getting bug-free code: Design by Contract and randomized testing.

I had to roll this myself for each project I did. Antithesis seems to systematize it and has created great tooling around it. That's great!!!

However, looking at their docs they rely on assertion failures to find bugs. I believe Antithesis has a missed opportunity here by not properly pushing for Design by Contract instead of generic use of assertions. They don't even mention Design by Contract. I suspect the vast majority of people here on HN have never heard of it.

They should create a Design by Contract SDK for languages that don't have one (think most languages) that interacts nicely with tooling, and only fall back to generic assertions when their SDK is not available. A Design by Contract SDK would provide better error messages than generic assertions, further helping users solve bugs. In fact, their testing framework is useless without contracts being diligently used. It requires different training and a different mindset from engineers. Teaching them Design by Contract puts them in that frame of mind.

They have an opportunity to teach Design by Contract to a new generation of engineers. I'm surprised they don't even mention it.


I've never gotten anything more out of DbC than it being assertions and if-statements, but described using fancy English. I even worked with the creator of C4J a few years ago.


The primary benefits imo are:

* Way of thinking and discipline. Instead of ad hoc assertions, you deliberately state in code "these are the preconditions, invariants, and postconditions" of this function/module.

* Better error messages.

* Better documentation (can automate extracting the contracts as documentation).

* Better tooling. Can automate creating tests from preconditions. You can sample the function's input space and make sure invariants and postconditions hold.

It's like, do you name all your functions 'func a1, func a2, func a3' or do you provide better names?
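
To make the first bullet concrete, here's a toy sketch (purely illustrative, not any particular DbC library) of stating contracts explicitly instead of scattering ad hoc asserts:

    import functools

    def contract(pre=None, post=None):
        """Tiny illustrative DbC decorator: named precondition/postcondition
        checks with error messages that say what was violated and where."""
        def decorate(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                if pre is not None:
                    assert pre(*args, **kwargs), f"precondition of {fn.__name__} violated: args={args}"
                result = fn(*args, **kwargs)
                if post is not None:
                    assert post(result, *args, **kwargs), f"postcondition of {fn.__name__} violated: {result!r}"
                return result
            return wrapper
        return decorate

    @contract(pre=lambda xs: all(isinstance(x, int) for x in xs),
              post=lambda out, xs: out == sorted(out) and sorted(out) == sorted(xs))
    def my_sort(xs):
        return sorted(xs)  # stand-in for the real implementation

    my_sort([3, 1, 2])       # fine
    # my_sort([3, "1", 2])   # fails with a message naming the precondition

A random tester can then sample inputs that satisfy pre and check post automatically, which is the tooling point above.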


What’s your favourite writing on best practices for randomised testing?


I wonder if they are working on a time travel debugger. If it is truly deterministic, presumably you could visit any point in time after a recording is made and replay it.


Exactly - that's what we've already built for web development at https://replay.io :)

I did a "Learn with Jason" show discussion that covered the concepts of Replay, how to use it, and how it works:

- https://www.learnwithjason.dev/travel-through-time-to-debug-...

Not only is the debugger itself time-traveling, but those time-travel capabilities are exposed by our backend API:

- https://static.replay.io/protocol/

Our entire debugging frontend is built on that API. We've also started to build new advanced features that leverage that API in unique ways, like our React and Redux DevTools integration and "Jump to Code" feature:

- https://blog.replay.io/how-we-rebuilt-react-devtools-with-re...

- https://blog.isquaredsoftware.com/2023/10/presentations-reac...

- https://github.com/Replayio/Protocol-Examples


No comment. :-)

Disclosure: I am a co-founder of Antithesis.


It looks amazing, nice work!

Do you have any plans to let small open source teams use the project for free? Obviously you have bills to pay and your customers are happy to do that, but I was wondering if you'd allow open source projects access to your service once a week or something.

Partly because I want to play with this and I can't see my employer or client paying for it! But also it fits neatly into "DX", the Developer Experience, i.e. making the development cycle as friction free for devs as possible. I'm DevOps with a lifelong interest in UX, so DX is something I'm excited about.


Pricing suitable for small teams, and perhaps even a free tier, is absolutely on the roadmap. We decided to build the "hard", security-obsessed version of the infrastructure first -- single-tenant, with dedicated and physically isolated hardware and networking for every customer. That means there's a bit of per-customer overhead that we have to recoup.

In the future, we will probably have a multi-tenant offering that's easier for open source projects to adopt. In the meantime, if your project is cool and would benefit from our testing, you can try to get our research team interested in using it as part of the curriculum that makes our platform smarter.

Disclosure: I'm an Antithesis co-founder.


We've actually done quite a bit of testing on open source projects as we've built this, and have discussed doing an on-going program of testing open source projects that have interested contributors. We'd probably find some interesting things and could do some write-ups. Reach out to us via our contact page or contact@antithesis.com and let's chat.


That's exactly what Tomorrow Corporation uses for their hand written game engine and compiler: https://www.youtube.com/watch?v=72y2EC5fkcE


[I work at Antithesis]

The system can certainly revisit a previous simulated moment and replay it. And we have some pretty cool things using that capability as a primitive. Check out the probability chart in the bug report linked from the demo page: https://antithesis.com/product/demo


Now I want a simulation-run replay scrubbing slider MIDI-connected to my Pioneer DJ rig to scratch through our troublesome tests as my homies push patched containers.

Seriously: impressive product revelation.


Let's do it.


That’s what rr-project does essentially?


I can see how determinism can be achieved (not easy, but possible), and I can see how describing a few important system invariants can match hundreds or thousands of hand-rolled tests, but I'm having a hard time understanding how it's possible to intelligently explore the space of inputs to generate.

E.g., if I wrote a compiler, how would Antithesis generate mostly-valid source code for it? Simply fuzzing UTF-8 inputs wouldn't get very far.
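
(My naive guess is that you'd need a grammar- or AST-level generator rather than raw byte fuzzing - a toy sketch of the idea, with a made-up expression grammar:)

    import random

    # Toy grammar-directed generator: build random but syntactically valid
    # expressions, so the compiler's later stages get exercised too.
    # Purely illustrative - real tools like Csmith go much, much further.
    def gen_expr(rng, depth=0):
        if depth > 4 or rng.random() < 0.3:
            return str(rng.randint(0, 9))
        op = rng.choice(["+", "-", "*"])
        return f"({gen_expr(rng, depth + 1)} {op} {gen_expr(rng, depth + 1)})"

    rng = random.Random(0)
    programs = [gen_expr(rng) for _ in range(5)]
    print(programs)  # feed these (or much richer programs) to the compiler under test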


The blog post has some impressive copy but is lacking details on how you actually integrate their product.

I am highly skeptical of any claims that something 'magically just works' without much configuration or setup.


(Disclosure: I’m an Antithesis employee.)

The blog post is meant as a high-level introduction for a general audience. The documentation (https://antithesis.com/docs/) goes into considerably more detail about what kind of configuration and setup you need to start testing with Antithesis.


I don't know how they'd do compiler testing, but I know how I do it (testing Common Lisp), and can talk about that if you're interested.

But it would be cool to hear how they'd do it.


There’s a straightforward way to reach this testing state for optimization problems. Write two implementations of the code: one that is simple/slow and one that is optimized. Generate random inputs and assert that the outputs match.

I’ve used this for leetcode-style problems and have never failed on correctness.
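
A tiny made-up example of the pattern (the problem and names are just for illustration):

    import random

    def sum_of_two_largest_slow(xs):
        # Reference implementation: obviously correct, no cleverness.
        return sum(sorted(xs)[-2:])

    def sum_of_two_largest_fast(xs):
        # "Optimized" single-pass version: the kind of code that's easy to get
        # subtly wrong, e.g. around duplicates or negative numbers.
        best, second = float("-inf"), float("-inf")
        for x in xs:
            if x > best:
                best, second = x, best
            elif x > second:
                second = x
        return best + second

    for _ in range(100_000):
        xs = [random.randint(-50, 50) for _ in range(random.randint(2, 30))]
        assert sum_of_two_largest_fast(xs) == sum_of_two_largest_slow(xs), xs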

It is liberating to code in systems that test like this for the exact reasons mentioned in the article.


Non-overlapping problem spaces.

Leet-code ends in unit-testing land, this product begins in system-testing land.


Hmmm, so this excited me a great deal, since I spent some time building a property-based testing system for a trading execution engine (which really can't have any issues). It's an agent-based system with many possible race conditions.

Now, having built such a system (and being incredibly happy with it). I do not understand where antithesis comes in.

Broadly there were 2 components:

1. A random strategy that gives random commands that are distributed in a very specific way.

2. Properties to test, observing the system (by listening to the messages) and checking them for some properties (like positions always being correctly protected etc..)

Now, there is not much else (besides random trading data, which is very easy to generate, or you can use actual live data). And all of it depends 99% on domain knowledge. So where is there space for a company/generic framework to come in? I'm genuinely curious.


There are situations where "no bugs" is an important requirement, if it means no bugs that cause a noticeable failure - things such as planes, submarines, nuclear reactors. For those there is provably correct code. That takes a long time to write, and I mean a really long time. Applying that to all software doesn't make sense from a commercial perspective. There are areas where improvements can have a big impact, though, such as language safety improvements (Rust) and cybersecurity requirements regarding private data protection. I see those as being the biggest win.

I don't see no bugs in a distributed database as important enough to delay shipping for 5 years, but (a) it's not my baby; (b) I don't know what industries/use cases they are targeting. For me it's much more important to ship something with no critical bugs early, get user feedback, iterate, then rinse and repeat continually.


A lot of people underestimate the power of QA. Yeah, it would be great if we could just perfectly engineer something out of the gate. But you can also just take several months to stare at something, poke at it, jiggle it, and fix every conceivable problem, before shipping it. Heresy in the software world, but in every other part of the world it's called quality.


> I don't see no bugs in a distributed database as important enough to delay shipping for 5 years

The marketplace has enough distributed databases with bugs. There's a nice catalogue of them at jepsen.io.

> For me it's much more important to ship something with no critical bugs early, get user feedback, iterate, then rinse and repeat continually.

* You can't really choose which bugs are critical if you're selling a database. A lost write is as critical as the customer deems it is.

* You're not limited to your own users' feedback. There's plenty of users out there who disapprove of a buggy database, so you can probably take their views onboard before release.


This is a false dichotomy though. The proposed approach here has a (theoretically) great cost-to-value ratio. Spending time on a workload generation process and adding some asserts to your code is much lower cost than hand-writing tens of thousands of test cases.

So it's not that this approach is only useful for critical applications, it's that it's low-cost enough to potentially speed up "regular" business application testing.


I love this idea. In the early days of computing, computers were claimed to always be deterministic: give it the same inputs and you get the same outputs. Little by little that disappeared - with interrupts, with multithreading, with human-derived inputs, with multitasking, with distributed processing - until today computers and applications are often not deterministic at all, and it does indeed make them very difficult to test. Bringing back the determinism may not only be good for testing, it seems likely to improve reliability. While I see how this is great for distributed databases, I wonder if it has application when inputs are inherently non-deterministic (e.g., human input, sensor-derived inputs).


Was an eaaaaaaaarly tester for this. Pretty neat stuff.


I appreciated this post. Separately from what they are talking about, I found this bit insightful:

// This limits the value of testing, because if you had the foresight to write a test for a particular case, then you probably had the foresight to make the code handle that case too.

I often felt this way when I saw developers feel a sense of doing good work and creating safe software because they wrote unit tests like expect add(2,2) = 4. There is basically a 1-1 correlation between the cases you thought to test and the cases you coded for, which leaves you really no better off in terms of unexplored scenarios.

I get that this has some incremental value in catching blatant miscoding and regressions down the road so it's helpful, it's just not getting at the main thing that will kill you.

I felt similarly about human QA back in my finance days that asked developers for a test plan. If the dev writes a test plan, it also only covers what the dev already thought about. So I asked my team to write the vaguest/highest level test plan possible (eg, "it should now be possible to trade a Singaporean bond" rather than "type the Singaporean bond ticker into the field, type the amount, type the yield, click buy or sell") - the vagueness made more room for the QA person to do something different (even things like tabbing vs clicking, or filling the fields out of sequence, or misreading the labels) than how the dev saw it, which is the whole point.


This is how I have always developed hard-to-make-bug-free software. Write a simulator first that will deterministically give your software the worst possible environment to run in with failures happening all the time. I have delivered complicated software into production using this method with zero bugs ever reported. It works brilliantly.


Reminds me of the clever hack of playing back TCP dump logs from prod on a test network, but dialed up. Neat.

Naturally I’d prefer professional programmers learn the cognitive tools for manageably reasoning about nondeterminism, but they’ve been around over half a century and it hasn’t happened yet.

What’s really interesting to me is that the simulation adequately replicates the real network. One of the more popular criticisms of analytical approaches is some variant of: yeah, but the real network isn’t going to behave like your model. Which, by the way, is an entirely plausible concern for anyone who has messed with that layer.


> Naturally I’d prefer professional programmers learn the cognitive tools for manageably reasoning about nondeterminism

It’s not an either-or here, though. Part of the challenge is you’re not always thinking about all the non-determinisms in your code, and the interconnections between your code and other code (whose behavior you can sometimes only assume) can make that close to impossible. Part of that is the “your model of the network” critique, but also part of that is “your model of how people will use your software” isn’t necessarily correct either.


What is interesting here is that the solution could fuzz-test anything, including the network model, leading to failures even more implausible than reality.


This is really exciting.

I am an absolute beginner at TLA+ but I really like this possible design space.

I have an idea for a package manager that combines type system with this style of deterministic testing and state space exploration.

Imagine knowing that your invocation of

   package-manager install <tool name>
will always work, because file system and OS state are part of the deterministic model.

Or a next-gen Helm with a type system and state space exploration, where

   kubectl apply <yaml>
will always work when it comes up, because the whole configuration state space has been explored and tested, thanks to types.


Coincidentally, I'm reading this while thinking about test harnesses for my package manager idea, which is really just a thin wrapper around nix, designed under the assumption that the network might partition at any moment: keep the data nearest where it's needed, refer by hash not by name, gossip the metadata necessary to find the hash for you, no single points of failure.

Tell me more about yours?


I am thinking about state machine progressions and TLA+ style specifications which are invariants over a progression of variables.

If your package manager knows your operating system's current state, and the state space that the control flow graph of the program and configuration together can reach, it can verify that everything lines up and there will be no error when executed - a bit like a compiler, but without running into the halting problem.

In TLA+ you can dump a state graph as a dot file, which I turn into an SVG and view with a TLA+ graph visualiser.

Types verify that possible control flow is valid at every point. We just need to add types to the operating system and file system, and represent their state space for deterministic verification.

You could hide packages that won't work.

The package manager would have to look up precached state spaces or download them as part of the verification process.


>It’s pretty weird for a startup to remain in stealth for over five years.

Not really. I have friends who work for a startup that's been in "stealth" for 20 years. Stealth is a business model not a phase.


This sounds quite cool. Although it doesn't say so, I imagine the name is a riff on Hypothesis, the testing tool that performs automatic test case simplification in a general way.


(I’m an early employee of what was, on my start date, just called “Void Star.”)

As I recall it, the name meant two things:

1. Our “autonomous testing” approach is the opposite, or the antithesis, of flaky and unreliable testing methodologies.

2. You can think of our product as standing in dialectical opposition to buggy customer software, pointing out its internal contradictions (bugs) and together synthesizing a new, bug-free software product. (N.b.: I’ve never actually read Hegel.)

We did note the resonance with Hypothesis (a library I like a lot!) at the time, but it was just an added bonus :).


interesting, this kind of responsive environment is dear but rare

i can't recall the last time i went to a place and people even considered investing in such setups

i assume that except for hard problems and teams seeking challenges, most people will revert to the mean and refuse any kind of infrastructure work because it's mentally more comfortable piling features and fixing bugs later

ps: i wish there was a meetup of teams like this, or even job boards :)


We'll be starting some meetups, attending conferences, etc. this year. Also hop into our Discord if you want to chat, lots of us are in there regularly. discord.gg/antithesis


oh, that's cool, thanks


This is really impressive, but still, if you're working on a piece of software where this can work, count yourself lucky. Most software I've worked on (boring line-of-business stuff) would need as many lines of code to test a behavior as to implement the behavior.

It's not very often that you have a behavior that's very hard to make correct but very easy to check for correctness.


Looks like this coincides with seed funding[1], congrats folks! Did you guys just bootstrap through the last 5 years of development?

[1] https://www.saltwire.com/cape-breton/business/code-testing-s...


What is described in this post is the gold standard of software reliability testing. A world where all critical and foundational systems are tested to this level would be a massive step forward for technology in general.

I'm skeptical of their claims but inspired by the vision. Even taking into account my skepticism, I would prefer to deploy systems tested to this standard over alternatives.


This looks awesome, but what if my stack has a lot of buy-in to the AWS infrastructure and uses Lambda, SNS, SQS, etc.... how do I containerise this??

EDIT: Oh nice: https://antithesis.com/docs/using_antithesis/environment.htm...


"I love me a powerful type system, but it’s not the same as actually running your software in thousands and thousands of crazy situations you’d never dreamed of."

Would not trust. Formal software verification is badly needed. Running thousands of tests means almost nothing in the software world. Don't fool beginners with your test hero stories.


That'll work great for your Distributed QSort Incorporated startup, where the only product is a sorting algorithm.

Formal software verification is very useful. But what can be usefully formalized is rather limited, and what can be formalized correctly in practice is even more limited. That means you need to restrict your scope to something sane and useful. As a result, in the real world running thousands of tests is practically useful. (Well, it depends on what those tests are; it's easy to write 1000s of tests that either test the same thing, or only test the things that will pass and not the things that would fail.) They are especially useful if running in a mode where the unexpected happens often, as it sounds like this system can do. (It's reminiscent of rr's chaos mode -- https://rr-project.org/ linking to https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo... )


https://en.wikipedia.org/wiki/L4_microkernel_family they are the real heroes, not someone starting the sentence with "i love me", and continuing accordingly.


Formal verification requires a formal statement of what the software is supposed to do.

But if you have that, you have a recipe for doing property-based testing: generate inputs that satisfy the conditions specified in this formal description, then verify that the behavior satisfies the specification too.

And then run for millions and millions of inputs.

Is it really going to be worth proving the program correct, when you could just run an endless series of tests? Especially if the verifier takes forever solving NP-hard theorem-proving problems at every check-in. Use that compute time to just run the tests.
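
Concretely, with a library like Hypothesis the formal statement becomes an executable property (the function and properties here are just an illustration):

    from hypothesis import given, settings, strategies as st

    def dedupe_preserving_order(xs):
        # Implementation under test.
        seen, out = set(), []
        for x in xs:
            if x not in seen:
                seen.add(x)
                out.append(x)
        return out

    # The "formal statement of what the software is supposed to do", written
    # as properties and checked against generated inputs rather than proved.
    @settings(max_examples=10_000)
    @given(st.lists(st.integers()))
    def test_spec(xs):
        out = dedupe_preserving_order(xs)
        assert set(out) == set(xs)                      # same elements
        assert len(out) == len(set(xs))                 # no duplicates
        assert all(xs.index(a) < xs.index(b)            # first-occurrence order kept
                   for a, b in zip(out, out[1:]))

    test_spec()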


I remember a colleague writing his first computer program:

x = 1 // set x to "1"

x = 1 // set x to "1", again, just to make sure.


Great, but formal software verification is not yet broadly applicable to most day-to-day app development.

Good type systems (a pretty decent chunk of formal software dev) are absolutely necessary and available.

But things get tricky moving past that.

I've tried out TLA+/PlusCal, and one or more things usually happen:

1) The state space blows up and there's simply too much to simulate, so you can't run your proof.

2) With regard to race-detection, you yourself have to choose which sections of code are atomic, and which can be interleaved. Huge effort, source of errors, and fills the TLA file with noise.

3) Anything you want to run/simulate needs an implementation in TLA+. By necessity it's a cut-down version, or 'model'. But even when I'm happy to pretend all-of-Kafka is just a single linkedlist, there's still so much (bug-inviting) coding to model your critical logic in terms of your linked list.

Ironically, TLA+ is not itself typed (deliberately!). In a toy traffic light example, I once proved that cars and pedestrians wouldn't be given "greenLight" at the same time. Instead, the cars had "greenLight" and the pedestrians had "green"!


"Colorless green ideas sleep furiously"


Gosh, I know it's a bit late, but I wish they'd called the product _The Prime Radiant_

Fans of Asimov's _Foundation_ series will appreciate the analogy to how this system aims to predict every eventuality based on every possible combination of events, a la psychohistory.

P.S. amazing intro post. Can't wait to try the product.


It would be the opposite of the product:

For software not interacting with the real world, there is only one possibility for frame N+1 if you know the state of the system.

https://en.wikipedia.org/wiki/Determinism

PRNGs are illusions, just misunderstood by humans.


Feels like I may have brought a spoon to a gun fight, but I would have considered psychohistory to be the ultimate extrapolation of determinism, and the fact that the prime radiant is able to predict _which_ version of events will happen is because it (somehow) knows the state of the system.

Of course, to argue against myself, it would surely be based on layers of probabilities, and they say several times in the series that it can't predict low-level specific things, just high-level things. And perhaps the whole underlying question posed by the series is whether the universe really is deterministic. But anyway I don't think it's all off-base.



No, it is a reference to https://en.wikipedia.org/wiki/Laplace%27s_demon - Pierre Simon Laplace, A Philosophical Essay on Probabilities

(if you know the state of a system + all the rules then you can know the next state)

and determinism actually applies way beyond just algorithms.


This looks to be an incredible tool that was years in the making. Excited to see where it goes from here!


I'm sure I've heard of something similar being built, but specific to the JVM (ie, a specialised JVM that tests your code by choosing the most hostile thread switching points). Unfortunately that was mentioned to me at least 10 years ago, and I can't find it.


Sounds a bit like jockey applied to qemu. Very neat indeed.

https://www.cs.purdue.edu/homes/xyzhang/spring07/Papers/HPL-...


There's indeed a connection between record/replay and deterministic execution, but there's a difference worth mentioning, too. Both can tell you about the past, but only deterministic execution can tell you about alternate histories. And that's very valuable both for bug search (fuzzing works better) and for debugging (see for example the graphs where we show when a bug became likely to occur, seconds before it actually occurred).

(Also, you won't be able to usefully record a hypervisor with jockey or rr, because those operate in userspace and the actual execution of guest code does not. You could probably record software cpu execution with qemu, but it would be slow)

I'm a co-founder of Antithesis.


I have been down this road a little bit, applying the ideas from jockey to write and ship a deterministic HFT system, so I have some understanding of the difficulties here.

We needed that for fault tolerance, so we could have a hot synced standby. We did have to record all inputs (and outputs for sanity checking) though.

We did also get a good taste of the debugging superpowers you mention in your blog article. We could pull down a trace from a days trading and replay on our own machines, and skip back and forth in time and find the root cause of anything.

It sounds like what you have done is something similar, but with your own (AMD64) virtual machine implementation, making it fully deterministic and replayable, and providing useful and custom hardware impls (networking, clock, etc).

That sounds like a lot of hard but also fun work.

I am missing something though, in that you are not using it just for lockstep sync or deterministic replays, but you are using it for fuzzing. That is, you are altering the replay somehow to find crashes or assertion failures.

Ah, I think perhaps you are running a large number of sims with a different seed (for injecting faults or whatnot) for your VM, and then just recording that seed when something fails.
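
i.e., conceptually something like this toy loop, where the seed is the only source of nondeterminism and is therefore a complete, replayable reproduction of any failure:

    import random

    def run_simulation(seed):
        # Toy stand-in for "run the whole system in the deterministic VM with
        # this seed driving every fault injection and scheduling decision".
        rng = random.Random(seed)
        balance = 100
        for _ in range(50):
            amount = rng.randint(1, 30)
            if rng.random() < 0.1:    # injected fault: a "retry" double-applies
                balance -= amount      # the withdrawal
            if amount <= balance:
                balance -= amount
        return balance >= 0            # the invariant we care about

    failing_seeds = [s for s in range(10_000) if not run_simulation(s)]
    print(failing_seeds[:5])  # each seed replays its failure exactly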


I assume deterministic execution also lets you do failing test case reduction.

I've found this sort of high-volume random testing with test case reduction is just a game changer for compiler testing, where it has much the same effect of quickly flushing out newly introduced bugs.

I like the subtle dig at type systems. :)
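
For anyone who hasn't seen it, the reduction loop itself is pleasingly simple - a toy, non-compiler-specific sketch:

    def reduce_case(case, still_fails):
        """Greedy test-case reduction: repeatedly try dropping chunks of the
        failing input and keep any smaller version that still fails."""
        chunk = len(case) // 2
        while chunk >= 1:
            i, shrunk = 0, False
            while i < len(case):
                candidate = case[:i] + case[i + chunk:]
                if candidate and still_fails(candidate):
                    case, shrunk = candidate, True
                else:
                    i += chunk
            if not shrunk:
                chunk //= 2
        return case

    # e.g. shrink a 100-element failing input down to the two elements that matter
    print(reduce_case(list(range(100)), lambda c: 37 in c and 73 in c))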


that sounds like "automated advanced chaos monkey" to me https://en.wikipedia.org/wiki/Chaos_engineering#Chaos_Monkey


Depends on how far you mean with "advanced" here. We specifically cover the differences between Antithesis and Chaos Engineering in our "How It's Different" page:

https://antithesis.com/product/how_is_antithesis_different/

Here's the relevant text though:

Antithesis testing resembles chaos testing, in that it injects faults to trigger and identify problems. But Antithesis runs these tests in a fully deterministic simulated environment, rather than in production. This means Antithesis testing never risks real-world downtime. This in turn allows for much more aggressive fault injection, which finds more bugs, and finds them faster. Antithesis can also test new builds before they roll out to production, meaning you find the bugs before your customer does.

Finally, Antithesis can perfectly reproduce any problem it finds, enabling quick debugging. While chaos testing can discover problems in production, it is then unable to replicate them, because the real world is not deterministic.


Pricing doesn't make sense.

What does a CPU hour mean for this framework? How many do I need?


So this is a Valgrind for containers? "If" it works well, and doesn't flag false positives, this is pretty useful.

You might want to sell this as a bug finder. But it also could be sold as a security hardening tool.


I think there is a lot of opportunity for integrating simulation into software development. I'm surprised it isn't more common though I suppose the upfront investment would scare many away.


I kept cringing when I read the words “no bugs.”

This is hubris in the classic style - it’s asking for a literal thunderbolt from the heavens.

It may be true, but…come on.

Everyone who has ever written a program has thought they were done only to find one more bug. It’s the fundamental experience of programming to asymptotically approach zero bugs but never actually get there.

Again, perhaps the claim is true but it goes against my instincts to entertain the possibility.


Yeah, same. It suggests that you must be employing one of the time-honored approaches to getting zero bugs:

* Redefine all bugs as features

* Redefine "bug" to conveniently only apply to the things your system prevents

* Don't write software

This reminds me of bugzilla's "Zarro Boogs" phrase that pointedly avoids saying "Zero Bugs" because it's such a deceptive term, see https://en.wikipedia.org/wiki/Bugzilla

Being able to say "no bugs" with justifiable confidence, even when restricting it to some class of bugs, is truly a great and significant thing. cf Rust. But claiming to have no bugs is cringeworthy.


I think there is something interesting about the fact that someone writing "no bugs" makes us all uncomfortable.

If they really did have a complex product running in production with a sizeable userbase, and had 2 bug reports ever, then I think it's a reasonable thing to say.

The fact that it isn't a reasonable thing to say for the most other software is a little sad.


Right, the claim may be true, but I have a visceral reaction to it. And tbh I'd be hesitant to work with someone who made a zero-bugs claim about their own work.


It's a marketing blog post, not a technical post. Something about the whole thing feels icky.


Nope. CompCert is an example of software that was mostly proven correct (all the important stuff), and zero bugs have ever been found in the proven-correct parts. So yes, you can indeed write zero-bug software.


I'm trying to figure out the name. Is it simply a play on Hypothesis, or am I missing something clever about this being the opposite of property-based testing?


This "no bugs" maximalism is counterproductive. There are many classes of bugs that this cannot hope to handle. For example, let's say I have a transaction processing application that speaks to Stripe to handle the credit card flow. What happens if Stripe begins send a webhook showing that it rejected my transactions but report them as completed successfully when I poll them? The need to "delete all of our dependencies" (I presume they wrote their own OS kernel too?) in FoundationDB shows that upstream bugs will always sneak through this tooling.


Happy customer here -- maybe the first or second? Distributed systems are hard; #iykyk.

Antithesis makes them less hard (not in an NP-hard sense, but still!).


It sounds great, but let’s see how it works, what the constraints are, and what the UX looks like.


The UI is exposed on the website, you just have to poke around a bit. There's a link to a live report from this documentation page: https://antithesis.com/docs/reports/triage.html


This sounds amazing, but I wonder how long it would take to set up for any reasonably complex system.


First reaction: "Yes, your site's weird font is bugging me!"


(Antithesis employee here.)

We’re using Inter, which our designer assures me is pretty popular. But this isn’t the first time we’ve heard this complaint. Also, we had an issue in testing on a different part of our site where the font was getting computed as something weird and ugly on our development NixOS machines. Would you mind replying with what the browser console says your font is getting computed as? Thanks!

Edit: The other person who mentioned this seems to think that it’s caused by their JavaScript blocker—we’re trying to figure out why, but in the meantime, enabling JS might help if you haven’t.


(It's JS blocking - I told NoScript to allow antithesis.com, and that completely changed the font in FireFox.)


Why are all the cool people working on DBs and talking about Paxos?


This reminds me of Java Pathfinder, but for distributed systems.


Could this work for embedded C projects? Bare metal or RTOS?


I need to follow this example to build software faster


Tactical 10x engineers leave a lot of mess and tech debt; eventually that makes them a 0.1x engineer. A 0.5x strategic engineer will turn the whole surrounding team into 2x engineers.


Looks promising!!


Reading this article, I want the same now for js code that involves web-workers...

How can I write code that involves a webworker in a way that lets me simulate every possible CPU scheduling between the main thread and the webworker (given they communicate via postMessage and no SharedArrayBuffer)? Is it possible to write such a brute-force test in pure JS, without having to simulate the entire computer?
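
(Conceptually it's just enumerating the interleavings of the two ordered message streams and running the handlers in each order - a toy sketch, in Python only because it's compact:)

    from itertools import combinations

    def interleavings(a, b):
        # Yield every interleaving of event lists a and b, preserving the
        # order within each list (messages arrive in order per channel).
        n, m = len(a), len(b)
        for positions in combinations(range(n + m), n):
            pos, ai, bi = set(positions), iter(a), iter(b)
            yield [next(ai) if i in pos else next(bi) for i in range(n + m)]

    # Toy model: each "delivery order" is one possible scheduling.
    for schedule in interleavings(["main: x = 1"], ["worker: x = 2"]):
        x = None
        for event in schedule:                   # stand-in for running handlers
            x = 1 if event.startswith("main") else 2
        print(schedule, "->", x)                 # one order ends with x == 1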


Use TLA+/PlusCal for this. It's what it's there for.


alternative name for the product: Laplace's Demon for Your Code


The kids dying in Gaza while our governments close one eye.


Imagine being proud of working for Palantir.


Your life depends on lots of unsavory tasks.


Yes, like sewage pipe maintenance. Not data mining to figure out who to assassinate without trial.

Using the “unsavory” euphemism for unethical and illegal violence is somewhat of a deception, is it not?


No, I did mean to say potentially "unethical" tasks. Much of national defense is deemed "unethical" by many, yet a critical function that allows you to live a peaceful life.


[flagged]


What a bizarre stance. It's literally easier to https-everything than it is to split your content up between http and https. And there are zero reasons to not use https.

You come off as someone who couldn't figure out how to run certbot.


What symptoms did you run into?


> and found all of the bugs in the database

This is when I stopped reading


one of the best applications yet of AI in cyber


https://antithesis.com/images/people/will.jpg the look of the CEO is selling the software to me automatically. reliable and nice


Business value is a good way to think about it:

> As a software developer, fixing bugs is a good thing. Right? Isn’t it always a good thing?

> No!

> Fixing bugs is only important when the value of having the bug fixed exceeds the cost of the fixing it.

https://www.joelonsoftware.com/2001/07/31/hard-assed-bug-fix...


If you have high volume automated testing, you want to fix all the bugs, even ones not directly affecting customers, otherwise they keep showing up and may keep you from finding other more customer-relevant bugs.


Talk about bad writing. If I don't know what the hell your thing is in the first paragraph, I'm not going to read your whole blog post to find out. Homepage is just as bad.


The article is more of a history lesson and context than it is an ad. I see what you mean, but clicking "Product -> What Is Antithesis?" shows a clear description of what it does. Perhaps that could also be added to the article or the home page?


Could you give an example of good writing, from your perspective?



