
> I see a lot of negativity in this thread, which mirrors my experience: in most places, you'll be paid like a researcher with the respect of a lab assistant – unless you have a PhD and a postdoc.

As an outside observer to this area, something doesn't add up. It sounds like software tooling is desperately needed to advance the entire field across the board, yet it seems that few startups or founders are attempting to tackle this problem or, if they exist, aren't having much of an impact (perhaps, yet?).

One would imagine that all of this inefficiency, suffering and bottlenecking of incredible therapies to cure diseases and advance human knowledge would be a siren call for capital allocators to unlock value by solving this pain point -- but here we are, in some cases still 20 years and counting.

I can buy the argument that FAANGs have had amazing compensation packages over the past 2 decades, but this still doesn't explain why nobody else has bothered, or been able, to "disrupt" (yes, air quotes) the industry in this regard and harvest such seemingly low-hanging fruit.

I see a few comments talking about the PI and grant-funding model -- but if the promised value were sufficiently large, I find it hard to believe that this wouldn't have been a competitive candidate alongside other recent buzzword-laden investment trends such as blockchain & AI that pulled down so much VC funding over the past decade.

Clearly, I'm missing a piece of the value puzzle as to why so few founders and startups specifically address the dire straits that biological software engineering (computational biology, bioinformatics, systems biology, etc.) finds itself in.




> As an outside observer to this area, something doesn't add up. It sounds like software tooling is desperately needed to advance the entire field across the board

This is hardly the only place in society where X is desperately needed, but people don't want to pay for it, and continue to suffer through the underprovision of X.

Here, software engineering is simply not viewed as prestigious or important, or often, even as valuable. The science is viewed as valuable. A lack of bugs or rigorous software engineering practices... nobody cares.

Some of it is that a PhD is a prestige competition, and dirty engineers being comped on par with people who spent (cough, wasted, cough) a decade or more of their lives in college and post-docs just won't do.

And a piece of it is the scale of the investment needed. Imagine a couple million LOCs with only manual testing. Your two weeks of writing tests is a tiny drop in the bucket. Retrofitting reasonable software dev standards on these projects is enormously expensive.

Finally, there's an inescapable volume issue. Suppose I build a hot new confocal microscope and sell 300 of them for the low hundreds of thousands each. I put 2 teams of devs ($2M/year/team) on analysis code for 2 years, an $8M investment. That's about $27k/machine. That's real tough math to make work. Whereas Google pays Gmail engineers really well, in part because they spread those costs over a billion users.
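
To make the amortization concrete, here's the arithmetic as a tiny Python sketch (the team cost, headcount, years and unit count are just the illustrative figures above, not real data):

    # Back-of-the-envelope: amortizing analysis-software cost per instrument.
    # All figures are the illustrative ones from this comment, not real data.
    teams = 2
    cost_per_team_per_year = 2_000_000   # $2M per team per year
    years = 2
    machines_sold = 300

    total_software_cost = teams * cost_per_team_per_year * years   # $8M
    cost_per_machine = total_software_cost / machines_sold         # ~$26,667

    print(f"total software investment: ${total_software_cost:,.0f}")
    print(f"amortized per machine:     ${cost_per_machine:,.0f}")

On a machine selling in the low hundreds of thousands, that's roughly 10-25% of the sale price going to analysis code alone; spread the same spend over a billion Gmail users and it comes to under a cent per user.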


> Here, software engineering is simply not viewed as prestigious or important, or often, even as valuable. The science is viewed as valuable.

Specifically, publishing is viewed as valuable. And publishing is a frozen-in-time snapshot.

Ergo, all the things that drive quality software elsewhere (SRE, maintainability, interpretability) simply don't exist as incentives.

> [Low product sales count is] real tough math to make work. Whereas Google pays gmail engineers really well, in part because they spread those costs over a billion users.

Also a great observation. Most cutting-edge work is, by definition, mostly custom. It's really hard to amortize even incredibly valuable things over tiny population counts and still pay market software engineering wages.


I have a practical example to explain what, at first glance, doesn't seem to add up. Many genomics analyses contain a step to clean up DNA sequences. For this "Read Trimming" step there exist no fewer than 40 open-source tools. Say you need this step in your project and use one of these tools. You find an issue the original creator is unwilling to fix: just choose another tool, problem solved. You find that the tool doesn't perform well enough on your data: choose another tool, problem solved. So while the article's points are true, in practice they often don't lead to real pain.
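
To show how cheap that swap is in practice, here's a minimal Python sketch of a pipeline where the trimming step sits behind one small wrapper; the command templates are rough approximations of real tools' CLIs (cutadapt, fastp) and would need checking against their docs:

    # Minimal sketch: the read-trimming step behind a uniform wrapper, so the
    # underlying tool can be swapped without touching the rest of the pipeline.
    # Command templates are approximate, not verified flag-for-flag.
    import subprocess

    TRIMMERS = {
        "cutadapt": "cutadapt -q 20 -o {out} {inp}",
        "fastp": "fastp -i {inp} -o {out}",
    }

    def trim_reads(tool: str, inp: str, out: str) -> None:
        """Run the chosen trimmer on a FASTQ file."""
        cmd = TRIMMERS[tool].format(inp=inp, out=out)
        subprocess.run(cmd.split(), check=True)

    # If one trimmer misbehaves on your data, the "fix" is a one-word change:
    # trim_reads("fastp", "sample.fastq", "sample.trimmed.fastq")

The forty-odd tools are interchangeable precisely because the step has such a narrow, well-understood interface: FASTQ in, FASTQ out.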


I recall once hearing from a VC about why they hardly invest in biotech (or it might have been reading it somewhere, memory is fuzzy). It boiled down to: way too much non-replicable research, often with suspicions of fraud by the original labs. It can easily be the case that a biotech startup burns through millions setting up a lab from scratch, then attempting to replicate some academic paper that they thought they could commercialize, only to discover that the effect doesn't really exist. This problem doesn't affect the software industry, so that's where the money goes.

Why so few tooling companies? Is there actually a market for good software in science? For there to be such a market, most scientists would have to care about the correctness of their results, and care enough to spend grant money on improvements. They all claim to care, but observation of actual working practices points to the opposite too much of the time (of course there are some good apples!).

In 2020 I got interested in research about COVID, so over the next couple of years I read a lot of papers and source code coming out of the health world. I also talked to some scientists and a coder who worked alongside scientists. He'd worked on malaria research before deciding to change fields because it was so corrupt. He also told me about an attempt to recruit a coder who'd worked on climate models; that person turned out to be quitting science entirely, for the same reason. The same anti-patterns would crop up repeatedly:

- Programs would turn out to contain serious bugs that totally altered their output when fixed, but this would be ignored because nobody wants to retract papers. Instead scientists would lie or BS about the nature of the errors, e.g. claiming huge result changes were actually small and irrelevant.

- Validation would often be non-existent or based on circular reasoning. As a consequence, there would either be no tests or the tests would be meaningless.

- Code is often write-once, run-once. Journals happily accept papers that propose an entirely ad-hoc, situation-specific hypothesis that doesn't generalize at all, so very similar code is constantly being written and then thrown away by hundreds of isolated, competing groups.

These issues will sooner or later cause honest programmers to doubt their role. What's the point in fixing bugs if the system doesn't care about incorrect results? How do you know your refactoring was correct if there are no unit tests and nobody can even tell you how to write them? How do you get people to use tools with better error checking if the only thing users care about is convenience of development? How do you create widely adopted abstractions beyond trivial data wrangling if the scientists are effectively being paid by LOC written?
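
For the refactoring question specifically, about the only handle a coder gets in that situation is a golden-output regression test: freeze today's output on a small fixed input, then require every refactor to reproduce it. It proves nothing about the science, only that behaviour didn't silently drift. A minimal pytest-style sketch, where run_analysis and the data path are hypothetical stand-ins:

    # Golden-output regression test: capture the current output once, then any
    # refactor must reproduce it within tolerance. This says nothing about
    # scientific correctness -- only that behaviour hasn't silently changed.
    # run_analysis and the golden file path are hypothetical stand-ins.
    import numpy as np

    def run_analysis(counts: np.ndarray) -> np.ndarray:
        # Stand-in for whatever pipeline step is being refactored.
        return np.log2(counts + 1.0)

    def test_refactor_matches_golden_output():
        counts = np.arange(10, dtype=float)              # small fixed input
        golden = np.load("tests/golden_output.npy")      # captured pre-refactor
        np.testing.assert_allclose(run_analysis(counts), golden, rtol=1e-9)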

The validation issue is especially neuralgic. Scientists will check whether a program they wrote works by simply eyeballing the output and deciding that it looks right. How do they know it looks right? Based on their expertise; you wouldn't understand, it's far too complicated for a non-scientist. Where does that expertise come from? From reading papers with graphs in them. Where do those graphs come from? More unvalidated programs. Missing in a disturbing number of cases: real-world data, or acceptance that real data takes precedence over predicted data. Example from [1]: "we believe in checking models against each other, as it's the best way to understand which models work best in what circumstances". Another [2]: "There is agreement in the literature that comparing the results of different models provides important evidence of validity and increases model credibility".

There are a bunch of people in this thread saying things like: oh, I'd love to help humanity, but I don't want to take the pay cut. To anyone thinking of going into science I'd strongly suggest you start by taking a few days to download papers from the lab you're thinking of joining and carefully checking them for mistakes, logical inconsistencies, absurd assumptions or assertions, etc. Check the citations and ensure they actually support the claims being made. That sort of thing. If they have code on GitHub, go read it. Otherwise you might end up taking a huge pay cut only to discover that the lab, or even the whole field, you've joined has simply become a self-reinforcing exercise in grant applications, in which the software exists mostly for show.

[1] https://github.com/ptti/ptti/blob/master/README.md

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3001435/



