Hacker News
Consider working on genomics (claymcleod.dev)
372 points by clmcleod on Nov 19, 2022 | 290 comments



I'm a software engineer who works on genomics. I see a lot of negativity in this thread, which mirrors my experience: in most places, you'll be paid like a researcher with the respect of a lab assistant – unless you have a PhD and a postdoc.

That said, it's possible to find work that's respected and pays well. Most of that kind of work is happening in the context of startups or freelancing. My favorite example of this is Robert Edgar: he's a freelance computational biologist with over 100k citations who has made a living for the past 20 years by selling licenses to his bioinformatics software (https://scholar.google.com/citations?user=RzVMRc0AAAAJ).

To find those kinds of jobs, I'd try YC's Work at a Startup, Flagship Pioneering's portfolio companies, and emailing founders of companies that have a bioinformatics component (my email is in my profile!).

I think the issues with the field are because it's a new and growing space. We do need better tooling, respect for engineering, and established best practices, but that seems to have been the case in the past for other domains that moved from research to industry – including software engineering itself.


> I see a lot of negativity in this thread, which mirrors my experience: in most places, you'll be paid like a researcher with the respect of a lab assistant – unless you have a PhD and a postdoc.

As an outside observer to this area, something doesn't add up. It sounds like software tooling is desperately needed to advance the entire field across the board, yet it seems that few startups or founders are attempting to tackle this problem or, if they exist, aren't having much of an impact (perhaps, yet?).

One would imagine that all of this inefficiency, suffering and bottlenecking of incredible therapies to cure diseases and advance human knowledge would be a siren call for capital allocators to unlock value by solving this pain point -- but here we are, in some cases still 20 years and counting.

I can buy the argument that FAANGs have had amazing compensation packages over the past 2 decades, but this still doesn't address the reason why nobody else has bothered or been able to "disrupt" (yes, air quotes) the industry in this regard and harvest such seemingly low-hanging fruit.

I see a few comments talking about the PI and grant-funding model -- but if the promised value was sufficiently large then I find it hard to believe that this wouldn't have been a competitive candidate alongside other recent buzzword-laden investment trends such as blockchain & AI that pulled down so much VC funding over the past decade.

Clearly, I'm missing a piece of the value puzzle as to why founders and startups are few and far between to specifically address the dire straits that biological software engineering (computational biology, bioinformatics, systems biology, etc.) finds itself in.


> As an outside observer to this area, something doesn't add up. It sounds like software tooling is desperately needed to advance the entire field across the board

This is hardly the only place in society where X is desperately needed, but people don't want to pay for it, and continue to suffer through the underprovision of X.

Here, software engineering is simply not viewed as prestigious or important, or often, even as valuable. The science is viewed as valuable. A lack of bugs or rigorous software engineering practices... nobody cares.

Some of it is that a PhD is a prestige competition, and dirty engineers being comped on par with people who spent (cough wasted cough) a decade or more of their life in college/post-docs just won't do.

And a piece of it is the scale of the investment needed. Imagine a couple million LOCs with only manual testing. Your two weeks of writing tests is a tiny drop in the bucket. Retrofitting reasonable software dev standards on these projects is enormously expensive.

Finally, there's an inescapable volume issue. Suppose I build a hot new confocal microscope and I sell 300 of them for low hundreds of thousands each. I have 2 teams of devs ($2M/year/team) for 2 years on analysis code, for an $8M investment. That's $27k/machine. That's real tough math to make work. Whereas Google pays gmail engineers really well, in part because they spread those costs over a billion users.
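To make the arithmetic in the comment above concrete, here is a small sketch in Python. The team count, cost per team-year, duration, and unit sales are the figures from the comment; the function name `amortized_cost` is just illustrative, and the "same spend over a billion users" line is a hypothetical comparison, not a claim about what Google actually spends.

```python
def amortized_cost(teams: int, cost_per_team_year: float, years: float, units: int):
    """Return (total development cost, development cost per unit sold)."""
    total = teams * cost_per_team_year * years
    return total, total / units


# Figures from the comment: 2 teams at $2M/year/team for 2 years, 300 machines sold.
total, per_machine = amortized_cost(teams=2, cost_per_team_year=2_000_000, years=2, units=300)
print(f"total: ${total:,.0f}, per machine: ${per_machine:,.0f}")

# Hypothetical comparison: the same spend spread over one billion users.
_, per_user = amortized_cost(teams=2, cost_per_team_year=2_000_000, years=2, units=1_000_000_000)
print(f"per user: ${per_user:.3f}")
```

The per-machine figure comes out just under $27k, matching the rounded number in the comment, while the same spend over a billion users is under a cent each.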


> Here, software engineering is simply not viewed as prestigious or important, or often, even as valuable. The science is viewed as valuable.

Specifically, publishing is viewed as valuable. And publishing is a frozen-in-time snapshot.

Ergo, all the things that drive quality software elsewhere (SRE, maintainability, interpretability) simply don't exist as incentives.

> [Low product sales count is] real tough math to make work. Whereas Google pays gmail engineers really well, in part because they spread those costs over a billion users.

Also a great observation. Most cutting edge is, by definition, mostly custom. It's really hard to amortize even incredibly valuable things over tiny population counts and still pay market software engineering wages.


I have a practical example to explain what, at first glance, doesn't seem to add up. Many genomics analyses contain a step to clean up DNA sequences. For this "Read Trimming" step there exist no fewer than 40 open source tools. Let's say you need this step in your project and use one of these tools. You find an issue the original creator is unwilling to fix: just choose another tool, problem solved. You find that the tool does not perform well enough on your data: choose another tool, problem solved. So while the article's points are true, in practice they often don't lead to real pain.


I recall once hearing from a VC about why they hardly invest in biotech (or it might have been reading it somewhere, memory is fuzzy). It boiled down to: way too much non-replicable research, often with suspicions of fraud by the original labs. It can easily be the case that a biotech startup burns through millions setting up a lab from scratch, then attempting to replicate some academic paper that they thought they could commercialize, only to discover that the effect doesn't really exist. This problem doesn't affect the software industry, so that's where the money goes.

Why so few tooling companies - is there actually a market for good software in science? For there to be such a market most scientists would have to care about the correctness of their results, and care enough to spend grant money on improvements. They all claim to care, but observation of actual working practices points to the opposite too much of the time (of course there are some good apples!).

In 2020 I got interested in research about COVID, so over the next couple of years I read a lot of papers and source code coming out of the health world. I also talked to some scientists and a coder who worked alongside scientists. He'd worked on malaria research, before deciding to change field because it was so corrupt. He also told me about an attempt to recruit a coder who'd worked on climate models who turned out to be quitting science entirely, for the same reason. The same anti-patterns would crop up repeatedly:

- Programs would turn out to contain serious bugs that totally altered their output when fixed, but it would be ignored because nobody wants to retract papers. Instead scientists would lie or BS about the nature of the errors e.g. claiming huge result changes were actually small and irrelevant.

- Validation would be often non-existent or based on circular reasoning. As a consequence there are either no tests or the tests are meaningless.

- Code is often write-once, run-once. Journals happily accept papers that propose an entirely ad-hoc and situation specific hypothesis that doesn't generalize at all, so very similar code is constantly being written then thrown away by hundreds of different isolated and competing groups.

These issues will sooner or later cause honest programmers to doubt their role. What's the point in fixing bugs if the system doesn't care about incorrect results? How do you know your refactoring was correct if there are no unit tests and nobody can even tell you how to write them? How do you get people to use tools with better error checking if the only thing users care about is convenience of development? How do you create widely adopted abstractions beyond trivial data wrangling if the scientists are effectively being paid by LOC written?

The validation issue is especially neuralgic. Scientists will check if a program they wrote works by simply eyeballing the output and deciding that it looks right. How do they know it looks right? Based on their expertise; you wouldn't understand, it's far too complicated for a non-scientist. Where does that expertise come from? By reading papers with graphs in them. Where do those graphs come from? More unvalidated programs. Missing in a disturbing number of cases - real world data, or acceptance that real data takes precedence over predicted data. Example from [1]: "we believe in checking models against each other, as it's the best way to understand which models work best in what circumstances". Another [2]: "There is agreement in the literature that comparing the results of different models provides important evidence of validity and increases model credibility".

There are a bunch of people in this thread saying things like, oh, I'd love to help humanity but don't want to take the pay cut. To anyone thinking of going into science I'd strongly suggest you start by taking a few days to download papers from the lab you're thinking of joining and carefully checking them for mistakes, logical inconsistencies, absurd assumptions or assertions etc. Check the citations, ensure they actually support the claim being made. That sort of thing. If they have code on github go read it. Otherwise you might end up taking a huge pay cut only to discover that the lab or even whole field you've joined has simply become a self-reinforcing exercise in grant application, in which the software exists mostly for show.

[1] https://github.com/ptti/ptti/blob/master/README.md

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3001435/


If you want to find good jobs like this, you should check out the monthly HN "Who is hiring" threads. There aren't that many biotechs, but there are a few (like ours). I agree about the need for better tooling. There's also a need for a stronger conceptual understanding of cell biology generally, and an ability to build an ecosystem of APIs that work well together. My email is in my profile if you'd like to talk more about this space.


Genomics today is the internet of the early 90s or programming of the late 70s. There is going to be an enormous boom in genomics-derived technologies in the coming decade which has been driven by the exponential decay in data generation costs. You're absolutely right that places with a "startup" mentality are where to be right now.

On that note :) I'm starting a forward-looking research lab at UofT to advance massive-scale (think petabytes) genetic analyses and am looking to find the right few individuals who have a similar vision. It's difficult to find passionate engineers with a solid CS and HPC background who are willing to meet halfway and work _together_ with biologists in getting the analysis right. Robert does this _very_ well, and that's why we recently co-wrote a landmark Nature paper: https://www.nature.com/articles/s41586-021-04332-2.

Job post: https://jobrxiv.org/job/university-of-toronto-27778-full-sta...


Usually, scientifically oriented companies or organizations have little regard for software as a domain, craft, etc. It’s just a thing that gets in the way, despite being vital. It’s almost just a utility to them rather than a differentiator and active component of the advanced work going on.

For example, the Broad Institute is super interesting, but having applied there several times, they are esoteric, to say the least, in their hiring. They pay well below market, and their process is opaque and slow and sometimes downright non-communicative. They are also not really open to remote work, so you gotta move there and commute to the heart of Cambridge. Budgets are set by folks maybe a couple years out of a PhD program, who will also make technical decisions in terms of the software design (the latter an assumption given my experience in similar places).

These organizations are also pretty traditional in their selection of stacks. Good luck trying to use a functional-first language, aside from maybe Scala (usually lots of Java stacks), and be prepared to write lots of Python, the only language that exists to many scientists. I once saw a Python signature (function name and arguments) spill over 10-20 lines, in a file over 10,000 lines long. They had given up on another software stack because “it wasn’t working for them”.

This is all painting with broad strokes, of course. But I think scientific organizations that would embrace software as a major component of their technological and scientific development would do well. There’s a lot of opportunity.


> Good luck trying to use a functional-first language

Good luck trying to use a functional-first language at any company (be it in bioinformatics or otherwise).


it happens :)

and the coming years will be interesting, rust is placing a lot of functional bits on the map, just like closures were an obscure thing 10 years ago, there might be a rise in abstraction in the mainstream


> I once saw a Python signature (function name and arguments) spill over 10-20 lines,

Quote I liked (can't find attribution; maybe Alan Perlis?):

"If your function has 10 arguments, you're missing some."


> Good luck trying to use a functional-first language, aside from maybe Scala

While they've moved away from it in the last few years, the Broad Institute had a huge investment in Scala. It's been in use there since at least 2010 and I believe longer. The primary software department was almost entirely Scala based for several years. That same department had pockets of Clojure as well.


Current Broad SWE with 5 years’ tenure. Feel free to ask any questions.

I’m in the “bunch of software people together” department so it’s not as insular or PI driven as working in a lab.

I still mostly like the role but it has become more generic over the years as the department acquiesced to the working ways & programming languages of outside private funders.


1. Could you share a bit about your stack? I'd be especially interested in the data engineering side of things if possible.

2. As a SWE, how deep into biology/genetics concepts have you had to go during your tenure?


1. I'm on the product engineering side rather than data, but from what I know, Scala is still heavily entrenched. New projects are all in Java; other languages are effectively disallowed.

2. I learned a fair amount in the 2017-2019 time frame but with the pandemic and increased specialization my pace has decreased. Lunchtime talks are just not as fun to attend on Zoom. Another possible explanation is that my curiosity has been satisfied, and someone with more curiosity could get more out of it.


I live next to Broad's offices and see people leaving/entering the office at odd hours on Saturday and Sunday. That (and the fact that they pay about 75% what I made as a new grad) prevented me from ever applying there.


Keep in mind that there are wetlabs with experiments being conducted in them. Lab techs will be coming and going at all hours.


Yeah, I’d love to work in scientific computing and write Elixir, but it seems non-existent.


I've worked on genomics, at the Broad. Can confirm there is a ton of toxicity. There are also a lot of smart and great people; if you can find a team specifically dedicated to software, I would recommend it. Also the pay is good...for a non-profit.

More broadly...shortages like this aren't because SWEs just love ad-tracking and hate health improvements. People need to be willing to pay for these services (case in point: the jobs link for AWS has 3 links which appear unrelated to genomics, for Microsoft there are 2 for interns, and for Google it's empty).

There are a lot of opportunities in biotech for SWEs, and many firms (though not all) really do respect the power of software. Worth looking around if you're interested in the area.


Hmm as an ex-Broad employee (and now in another genomics center), what did you find really toxic about the Broad?

FWIW, I really loved the people at the Broad: my direct line manager and co-workers. The management was horrible, and the management at DSP (not the line folks/managers) was the worst.


Since my post was upvoted a tad... I'll give more feedback about the Broad.

When I joined, it was running really like an academic center. Like literally in my lab, if I wanted to go into the lab and pipet and do library prep, the wet lab scientist would teach me and vice versa. It was literally a place where anybody could pivot their career to anything. We worked on NIAID/NIH grants and went to conferences even as SWEs and I felt we were doin' important research - not just pipeline monkeys but actually performed important analysis like RNASeq differential analysis, ChIPSeq peak calling, metagenomics etc. on publications along with PhD scientists even if I didn't have the academic credentials. Groups within the Broad were running courses for Software Engineers to learn Biology... and you could literally take off in the middle of the day to go across the street to the Stata Center to attend lectures on ML or audit Comp Bio at MIT. Nobody would bat an eye. 75% of my group got a Masters degree on the job, where some months we spent more time on classes than actual work.

The culture somewhere shifted around 2018-2019... where they brought in new management to run DSP (Data Science Platform, where most SWEs likely end up). The DSP management (not the chief guy, but the lieutenants) ran the "tech playbook"... get PMs and Scrum/Agile coaches in; make software engineers and comp biologists line workers. A lot of my fav. people either left or got pushed out. A lot of institutional knowledge about sequencing and biology got lost; self-driven people left, and line workers came in to work on Portal web development and data pipeline management. To the point where I presented once to the software engineers of DSP and nobody in the room even knew the basics, like what long or short reads are. I left soon afterwards for a place where I wouldn't be silo'd. I wouldn't recommend the Broad to anybody now... unless you're working for an academic group. Avoid platform groups (DSP mostly; other platforms are still good) if you want to learn & grow.


This is a great read. Thanks for the information. What you originally described is basically my dream job: software engineers working alongside scientists and engineers, where the software engineers become domain knowledgeable if not experts in certain areas.

I had a job similar to that at a similar place (actually places), but I ended up leaving because I was a one man team and got burnt out. Writing software for scientific purposes and true R&D is very fun and interesting, and I think there is a lot of untapped potential for doing some interesting things there. But there is a balance between the wild west, then what you first described, and then what you later described. Keeping things organized enough to not be chaos but loose enough to not get siloed.


A really hard aspect to this is that there's a massive impedance mismatch between the research & production side of things. Working in the research side is pretty straightforward - although software development practices are going to be a lot looser & faster. Working in a production environment is straightforward, it's like any other software job. But - working at the confluence of those two states is incredibly difficult.


Fwiw I acknowledged the good people directly from the Cromwell team on a presentation recently due to the incredible support/help somebody & their team provided my team. The WDL/Cromwell community has grown and I've heard people mention it everywhere now (far away from the Broad) and it's in no small part due to that team and its former leadership.


Hey, that's my project! (And geoffjentry is my former boss.)

Nice to hear the praise, thank you. The project has changed a lot over time and inevitably left some disappointed people filing Github issues (CWL, non-cloud backends, etc).

It's really unique and enjoyable working on OSS that has a strong community, it is the highlight of my career.


Current DSPer since 2017 and broadly agree with OP.


Yea I liked almost everybody in DSP.

I disagree strongly, however, with the management's tech approach: the Broad cannot compete with the other companies in the Boston area in terms of comp. What worked in the past is that smart people came to the Broad for the fun, autonomy and intellectual challenge over pay.

If you reduce people to just worker bees finishing CRUD tickets on an Agile board - you'll get efficiency for the 1st 2 years but you'll lose so much institutional/domain knowledge and collaboration; the PMs will get promoted in 1 year while projects on a 5 year timeframe get uber-delayed; the smart people will get bored and leave, and even the good efficient engineers in the system will also leave for the better-paid tech jobs that the Broad cannot compete with. I've seen it in every company that has run this playbook.


> respect of a lab assistant – unless you have a PhD and a postdoc.

This applies even to those with Biology backgrounds. As an undergrad who entered the Industry after an expensive and precarious 5 years of University, exiting during the aftermath of the financial crisis with tons of debt, I knew within the first months that I was never going to enjoy or like my time there.

I had aspirations to be a CLS (you need to be sponsored by a corporation for the training/licensing process) but the truth is the Industry is rife with petty political rivalries that you can get sucked into for no other reason than being assigned to the lab of someone who didn't cite them years back--you and your career can easily become collateral damage as a result of that or some other bitter rivalry.

I found most in that Industry to be passive-aggressive cowards who would never confront an issue with anyone or anything and would rather create and foster this toxic atmosphere where it's typical that unless you did a PhD or a Post-doc you might as well be a mindless drone who carries out the edicts of your superiors who graduated in the 70s or 80s.

I will offer this advice: don't enter the Industry unless you get paid extraordinarily more to do so than any other offer you get, and if you love the life/health sciences (as I once did) please find some other outlet because the Industry will quickly steal any passion and leave you without much recourse.

Work in Genomics is promising, as is most Health Sciences in the 21st Century, but it is in DIRE need of a cultural shift (most boomer aged researchers need to die or retire already) and since the best ones are bio-hackers for a reason despite the lack of funding, there are other options albeit not lucrative ones.

> The culture somewhere shifted around 2018-2019

Your experience sounds like the brochure version of what we were sold as undergrads in the Health Sciences: the ability to get cross-discipline experience on the job. The reality was way more toxic. We didn't have agile or PMs back then, but we had Lab Directors and the thumb of corporate, which in my view was way more hostile towards such an environment. Anything that deviated from your workload was seen as an unnecessary distraction and misuse of company resources.

I'm glad I made the pivot to tech when I did despite the turmoil to get there, but sadly now that I'm focused on AI/ML in order to come back to the tech industry outside of my narrow discipline, it's now imploding on itself with mass layoffs or hiring freezes, and it seems that the recession will be used as a reason to up-end the many reasons why tech was better than the health sciences, where apparently it's already becoming more normal for even a role as an intern for a YC backed company to require a Masters/PhD student!


Yea if there's an unsolicited take-away from me... don't ever become a lifer or think a company/field will never change. Nobody, even if it's GOOG circa 2010's, IBM circa 1970's, MSFT 1990's, ever offers a good combination of intellectual challenge and comp forever. Only you and maybe your mother care about your well-being... I don't feel sad anymore that the good jobs/research opps are now gone. I realize I was lucky, and I aim to always try to adapt to get into those environments while accepting that things are always changing.


"From my experience, what works incredibly well is a partnership between biologists and software engineers: the biologists first come up with the first concept of the tool, which is purely focused on ensuring good results. After this first iteration is completed, engineers then come in and rewrite the tool using modern engineering practices with things like speed and reliability in mind."

Like others have pointed out, this really makes the engineer's end of the bargain sound like janitorial work. There's no lack of fields where researchers and engineers both sit at the table from the beginning to pick which projects to pursue and how to implement them.


> this really makes the engineer's end of the bargain sound like janitorial work

I don't think you should interpret it that way. Another take would be that it's like collaborating with a domain expert outside your specialization.

What's important is that your potential impact as an engineer can grow as you become more knowledgeable in the relevant bio. Most of the scientists I've worked with were happy to teach background (and some were just exceptional, fun times if you also found the field interesting as I did!). Obviously some allowance must be made for differences in culture from org to org, and that likely accounts for some of the disappointed voices - but I'm not convinced this is endemic to the field as opposed to organization specific. Just like with an opportunity with any particular company, do your research.

Incidentally, working on a well defined engineering+optimization problem, if you are lucky enough to bump into one, is just candy for lots of engineering types. Ok quick & simple one: a scientist I worked with was doing some analysis that involved intersecting piles of genomic intervals with each other, which was taking many hours for a single run - super painful to tweak and re-execute. Our team showed them how to use interval-trees and made these available integrated in our internal tools, and the problem transformed into ~10 min execution runs. See, a wee bit of comp-sci where suddenly you're the domain expert. And appropriately appreciated!
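For readers who haven't met the data structure mentioned above, here is a minimal sketch of an interval tree in Python. It is not the commenter's internal tooling: the names (`Node`, `build`, `overlaps`), the half-open `[start, end)` coordinates, and the example intervals are all assumptions made for illustration. The tree is a balanced binary search tree keyed on interval start, with each node also storing the largest end in its subtree so whole subtrees can be skipped during a query.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end); an assumption for this sketch


@dataclass
class Node:
    interval: Interval
    max_end: int  # largest end coordinate anywhere in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(sorted_intervals: List[Interval]) -> Optional[Node]:
    """Build a balanced tree from intervals already sorted by start."""
    if not sorted_intervals:
        return None
    mid = len(sorted_intervals) // 2
    left = build(sorted_intervals[:mid])
    right = build(sorted_intervals[mid + 1:])
    max_end = sorted_intervals[mid][1]
    for child in (left, right):
        if child is not None:
            max_end = max(max_end, child.max_end)
    return Node(sorted_intervals[mid], max_end, left, right)


def overlaps(node: Optional[Node], start: int, end: int, out: List[Interval]) -> None:
    """Append every stored interval overlapping [start, end) to out."""
    if node is None or node.max_end <= start:
        return  # nothing in this subtree reaches past the query start
    overlaps(node.left, start, end, out)
    s, e = node.interval
    if s < end and start < e:
        out.append(node.interval)
    if s < end:  # the right subtree only holds starts >= s
        overlaps(node.right, start, end, out)


# Hypothetical features on one chromosome.
features = sorted([(100, 200), (150, 300), (400, 500), (600, 700)])
tree = build(features)
hits: List[Interval] = []
overlaps(tree, 180, 450, hits)
print(hits)
```

Comparing every interval in one pile against every interval in the other costs time proportional to the product of the two pile sizes. With a tree like this, each query takes roughly logarithmic time plus the number of hits, which is the kind of change that can turn hours into minutes.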


Yeah I think this is fair enough after reading it back. However, that was not exactly my intention here, and I think this is a case of me needing to be more careful in my wording.

When I said that software engineers add in the speed and reliability, I didn't mean they _only_ add in the speed and reliability: just that these two tenets of good software engineering were accounted for in this "correct" way of doing things (as opposed to the state of most genomics software that I described above).

However, I can see how my phrasing can give the wrong impression about the contributions an engineer makes when the biologist and engineer sit down to create the real thing together. In a positive environment, both sides (biologists and software engineers) share enough information with one another that either can make contributions to the scientific/software engineering domain.


software engineer provides/develops the appropriate level of abstraction for the non-software engineer to make use of.

Which, if there's no standard for the field and you're working outside of a given field, makes it necessary to pair up with someone who can develop field standards to include in the grant(s) you write. Hard to find/compete for scarce applicants using limited resources.

aka startups vs. big company funding for pure research lab (Bell Labs, Xerox PARC, etc)


This rings all kinds of alarm bells and flies all the red flags.

How many of us have heard from some guy who has said, essentially, "I have an idea for the next <fill in the blank with whatever is hot>, and I've already sketched out a prototype. There's just the small matter of programming and we'll make $LARGE_SUM"?

Imagine this happening in the business world. A partnership between SMEs and software engineers. Oh, we do this all the time, that's why software engineers get paid well: we turn ideas into working code. Anyone ever heard of a product manager "banging out a prototype" and then handing it off to the software engineers to rewrite?

The more I re-read the passage from the article, the worse it sounds.


I’m more familiar with chemistry, but a lot of times the scientist is the one who needs to make the first iteration to prove their idea. It’s often the case you really don’t understand the problem until you actually program/run the idea in at least a quick and dirty way.

But the role of the software engineer after that is invaluable in making that idea accessible and reproducible.


Janitors maintain, not create or optimize. It's literally an engineer's job to plan and realize a concept.


Call me when non-tech fields learn to treat engineers as equal partners rather than disposable labor.

Academia? Yeah they're going to be one of the last to realize, PIs don't want to cede any power in their little fiefdom. Very familiar with the dynamics there.


Scope of PI interest / funding defined by research grant.

So, unless research is ground breaking / exploratory across disciplines, supporting disciplines tend to be extremely limited by PI research interest.

aka (wording sanitized a bit) tends to come across as being tight wad / ham fisted


side note: Biological sciences tend to have way too many applicants for available positions. Typically not enough software engineers for available positions.

Treating a position with limited available applicants as if there were too many available applicants is always a recipe for issues.


> the biologists first come up with the first concept of the tool, which is purely focused on ensuring good results. After this first iteration is completed, engineers then come in and rewrite the tool using modern engineering practices with things like speed and reliability in mind.

I think that is already accepted as good practice, and the way most people in the field work, which is part of the reason why the field is in this shoddy state right now. Because in reality, most of the time those engineers don't exist and it will never advance to the second stage, but will still be used regardless. And even if you manage to find an engineer for your team, the same problem exists in many layers down your stack.

As with most other kinds of software, the biologists should be treated as customers (or trained up to be skilled-enough engineers), as it is done in other disciplines. To create good accounting software you also wouldn't propose to have the accountant write the initial version of the software, would you?

> Many of the projects that are critical to the foundation of genomics are reaching or have eclipsed the ten-year mark. How much longer can we expect these individuals to single-handedly maintain these code bases?

What you propose sounds more like "hey, be the next idiot that commits to maintaining critical software for nothing", rather than any systemic change. The ugly secret of bioinformatics is the same one as in broader tech: Most of it runs on the backs of unpaid OSS maintainers (in this case a handful of motivated PIs that carve out some of their time for that).

If you want to have good software in the sciences, you first have to solve the OSS funding problem.

PS: the `user-select: none;` on your page is really annoying


> As with most other kinds of software, the biologists should be treated as customers (or trained up to be skilled-enough engineers), as it is done in other disciplines. To create good accounting software you also wouldn't propose to have the accountant write the initial version of the software, would you?

Accounting is a bit different, because it has already been invented. There are standards and best practices for it. In bioinformatics, writing software is often a research activity. You write software to determine what the software should do, and then you adjust your ideas and rewrite it. The person writing the first version(s) of the software is a researcher – at least in practice if not by job title.


Accounting is not a static thing, and is also constantly changing with new legislation and financial instruments popping up. Most bioinformatics tasks nowadays are not any more "creative" in their research. Specifically in the last few years a good chunk of the research is just okayish application of ML research to their field of research.

For many specific problem sets in the natural science informatics disciplines, you can just stay up-to-date on ML trends and release a new paper that applies them every few years, in an almost automatable way.


There is a good chunk of research like that, but there is also a good chunk of research where the "biologist as a customer" model does not work. In research like that, it's the job of the person writing the software to figure out which biological problems they are trying to solve and how.


Anecdata, but I did a master’s in bioinformatics a few years ago, and as part of that I spent about a year in a human genomics lab.

I fully agree that the software used is really bad in general, but what is worse is the level of IT literacy among the PhDs and postdocs from the biology side. (Also statistics, I guess a lot of p-hacking is the result of authors simply being clueless…)

After finishing my thesis, I was offered to stay and work at the lab. After thinking about it, and accepting, I was told that funding wasn’t secured yet, but that it should come “any day now”…

Thanks, but no thanks.

Anyway, I fully see the need for professional software engineers in this field, but job security and even job availability (aside from the low salaries) in academia is abysmal, so I don’t think the current situation will change any time soon.


Honestly, I think people like you are the right kind of people to start private enterprises and bring along professional software engineers to help you build something incredible in the space.


There might be an opportunity for a company to enter this space, but then again maybe not.

When everything is a mess of ad hoc Perl, Python and R scripts to solve unique one-off problems, you might well find that there isn't a sufficiently common subset of functionality that people are prepared to pay money for. That is, while the need may be there, the business case may not be. It might be that most of the field are quite content with the status quo.

It's easier and cheaper to get some poorly-trained PhD students to wrangle badly-written and poorly-maintainable scripts than it is to pay a company to provide a robust and well-written solution instead. The "indentured labour" also distorts the supporting ecosystem. [I say this after having done a PhD in biomedical science.]

I remember one of my colleagues asking me to help him get some special software from a particular group working [for DNA methylation analysis]. They wanted $10K for it IIRC. It was a complete mess, wouldn't work, had no documentation, and it was in such a state that I didn't trust it was genuinely functional. For a one-off, maybe $10K was worth it, but if you only have 2-3 customers worldwide who will pay, it's not a viable business if the product works perfectly, let alone if it's a fragile disaster that barely works at all.


When considering software roles in science organizations, forget assumptions you might make about a typical tech job, joining a bunch of other software and hardware people -- or you'll risk accidentally ending up on the other side of a distorted status system (not the side that normally pampers techbros).

You need to feel out the particular person you'll be reporting to on how well they personally respect and understand the role, and also whether they'll have clout/funding and have your back if the org turns out to be rough (think AMZN). And also try to feel out respect within the organization, and some of the people/teams with whom you'll be collaborating.

You also need to check compensation, so you don't wind up a low-paid person who later discovers they're competing for local house offers with others in the org who are getting big-bucks TC (plus consulting on the side).

You also probably have to be OK with never being the star (like you hypothetically could someday be in a software company). Supporting actors should still get respect and get paid.

Find the right science situation, and you might have much more positive impact on the world than you could have in a software company, while also being happy and comfortable.

Some more quick off-the-cuff comments about this (sorry for run-ons, but I need to get back to my weekend)...

* RESPECT -- Whether or not the organization is university-affiliated, a lot of the researchers and administrators might have only worked in academia-like environments before. Academia is very hierarchical, software engineering might be considered commodity technician or support staff, and the high-status people almost certainly don't understand your discipline, though they might think they do. (They often think software is relatively easy grunt work, and that software people just have oversized egos, which has some truth to it, but not that much.)

(Some real-life instances of this I've heard of include: someone with no understanding overriding software engineering technical decisions, because a colleague from their academic caste made an offhand comment, and they assume an academic who hasn't even looked at the system knows more than an experienced practitioner developing it; not wanting to include people who made key software contributions as coauthors on a paper for a software system, but making sure professors who had near-zero involvement were included; scientists openly speaking of the software people as having commodity interchangeable skillsets, in a way they'd never speak about peers in their domain; getting an unsalvageable monstrosity of pasted-together incompatible frameworks and Stack Overflow posts done by a summer intern, dumped on a software engineer to "clean up" or "extend", and being unable to convince anyone that this is orders of magnitude harder to fix than to just make a viable system in the same time the intern took; in an academic environment, a grad student being higher status than key software people, and bossing them around with bad decisions, while treating their own obligations like homework they were trying to sneak past a grader rather than as a system that has to actually work.)

* COMPENSATION -- Related to the above. If you're very experienced and marketable in tech, and would be making key enabling contributions, are you getting paid like it?

(In the most recent life sciences software engineering opportunity I discussed, with a high-profile organization, they needed FAANG-like Staff/Principal experience in multiple areas, all in one person, for key bespoke computational infrastructure on which a lot was riding. When we got to salary, it was capped at less than new grads were getting offered elsewhere, despite being in a top HCOLA city. The recruiter half-heartedly argued about it being for the science, etc. I said, if they're thinking of this as an academic non-profit, that would be OK, so long as everyone there is making this level of money. But that wasn't the case: the science domain people were considered the valuable assets, making good money, and software was seen as more of a commodity support skill by whoever set the pay grade. Maybe within a decade that will agree with the market, everyone will decide that someone who can learn organic chemistry should get paid more than someone who doesn't seem to do much more than fingerpaint in a Web framework builder and type nonsense in Jira, :) and maybe then most software people will be thankful for any job at all, but not yet.)

(I did actually look at a science company with a strong software tech company influence. But, though they claimed to be rethinking how the tech company did things, they seemed to carbon-copy the single most obvious bad side of that company. Talking with colleagues after I withdrew my application, the gossip was that they were getting lots of software people who'd burnt out on the tech company. So I guess maybe the rethinking was on what had been bothering those people, who were already at the tech company, and so who weren't entirely representative of the talent pool that included people for whom the tech company had showstoppers.)


your parenthetical paragraphs are bigger than your paragraph paragraphs


I work in this field at a large medical research institution. There is a significant amount of genomics analysis that occurs here on a day-to-day basis. The genomic processing pipeline work all falls directly into my group.

There is next to zero demand for tool development internally. I do it on the side of "normal" IT data management because I love high performance computing, algorithms, and multithreaded hackery. But even at my large, well-funded institution, there isn't a specific role where that is all that you do by design.

I do suck at marketing - meaning, despite having some success with big improvements in research tools that folks have definitely appreciated, no one comes to me asking for help with better engineering of genomic applications. Partly that is because many researchers may only know R, so they will default to whatever packages are already available in Bioconductor, install those, and throw the resulting mash-up for their current research effort onto the compute cluster and simply wait for hours or days for the jobs to finish.

PIs are often insulated from software engineering problems too - if work is completed before the next bi-weekly meeting and update session, well, it must be ok.


Great post - which contains the answer to many of the questions raised in this thread. I am working in this area as well. There is "next to zero demand for tool development" because there are great open source tools. Only in rare cases (e.g. Illumina DRAGEN) does commercial software add significant value that the audience is willing to pay for.


Ah, sounds so much like history of programming. At the 50's stage of straight up statistical manipulation.

DNA bases aren't viewed as a base-4 number system that can be transformed into an abstract software language, where you can select the abstraction level you want to use. Much like musical notation isn't viewed as a numeric system.

Although most software engineers don't view systems as numerical language development, either.
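
The base-4 view mentioned above can be made concrete with a small sketch. This is only an illustration, not something proposed in the thread: the mapping A=0, C=1, G=2, T=3 and the function names `encode` and `decode` are assumptions, and ambiguity codes such as N are deliberately not handled.

```python
# Hypothetical mapping: each base becomes one base-4 digit (2 bits).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in seq.upper():
        # Raises KeyError for anything other than A, C, G, T.
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def decode(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 3])
        value >>= 2
    # Bases were read from the least significant end, so reverse them.
    return "".join(reversed(bases))
```

The length has to be passed to `decode` because leading A bases are zeros and would otherwise be lost.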


difference in view between qualitative & quantitative usage; NP vs. P type problem(s).


The author completely neglects the downsides:

- The compensation absolutely does not match the workload and education required.
- The sheer number of disreputable PIs and their unrealistic goals for software.
- The data is likely questionable and often underpowered.
- Institutional politics. Everywhere.
- Marketing ("Curing Cancer"). The role is actually just juggling various bioinformatics file formats.


> just juggling various bioinformatics file formats

Your other points are spot on. This one I want to address specifically. The file formats. Academics love their incredibly over-engineered file formats. MARC. SGML. DICOM. HL7. RDF. Those are just the ones I know. Universally, they try to cover every corner case that anyone could ever imagine. Academics absolutely love their ontologies. Just implementing one of them is a nightmare. Going from one to another is an exercise in the philosophies of ontologies.


Actually I think genomics / bioinformatics is a counterpoint there. One of the things I like about the field is nearly every file format is under-engineered. It's TSV all the way down and if you need compression gzip it. If you need to index that, sort it (literally often with unix sort command) and block-gzip it. Anything more engineered arose specifically because the above failed and something more is actually needed.

The downside is it's a giant hellscape of unstructured, poorly specified formats where data types are barely specified at all or if they are most of the schema is published on some rambling blog post by some rando scientist. You will spend most of your time understanding it by empirical reverse engineering of the data that you are trying to deal with.
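
As a rough sketch of the "sort it, then gzip it" workflow described above, using only the Python standard library. The column layout (chromosome name in column 0, integer position in column 1) and the function names are assumptions for illustration. Block-gzip and indexing are normally done with separate tools and are not reproduced here; this writes plain gzip.

```python
import csv
import gzip
import io


def sort_tsv_to_gzip(tsv_text: str, out_path: str) -> None:
    """Sort TSV rows by (chromosome, position) and write them gzipped."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    # Assumed layout: column 0 is a chromosome name, column 1 an integer.
    # Names sort lexicographically, so "chr10" comes before "chr2".
    rows.sort(key=lambda r: (r[0], int(r[1])))
    with gzip.open(out_path, "wt", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


def read_gzip_tsv(path: str) -> list[list[str]]:
    """Read a gzipped TSV back into a list of rows."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```

Every value comes back as a string, which is the point made above: the format itself carries no types, so the reader has to know or guess them.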


Oh, then eventually they'll get a committee together and after a few years they'll produce a unified file format that somehow manages to cover all the cases in the different existing formats (or at least the ones used by well-funded PIs) and is a hellscape of optional properties and required elements so poorly specified that it's impossible for any two implementations to communicate.


HL7 isn't technically an academic file format, it's an industry standard interchange format for health data.

DICOM is for radiology.

RDF and SGML, well, they're from the same era as XML, so yeah.


> Just implementing one of them is a nightmare. Going from one to another is an exercise in the philosophies of ontologies.

Good thing there are lots of competing implementations! It would be a shame if these files were actually portable.


> The role is actually just juggling various bioinformatics file formats.

I need an advanced degree for that?


You do in academia. Otherwise you might as well be washing dirty labware for all the respect you get.


it is a meme that bioinformatics is just about converting different file formats but it's a shallow take


- Marketing ("Curing Cancer")

Nothing like putting that boilerplate pablum on research grant proposals. Either that or something about green energy. Some PIs just want to play with ligands, man.


PIs?


Principal investigator: https://en.wikipedia.org/wiki/Principal_investigator

The person that runs a research lab, which at a university is usually a (tenured) Professor.


Principal Investigators. Basically, the academics who get the grant funding and run their own research groups.


Related HN discussion (May 2022) on similar article:

https://news.ycombinator.com/item?id=31577376

https://www.nature.com/articles/d41586-022-01516-2

> "Fundamentally, RSEs build software to support scientific research. They generally don’t have research questions of their own — they develop the computer tools to help other people to do cool things."


This does not have to be true. You can certainly pursue interesting biology research questions informed by a software engineering POV.


20 years ago I got interested in "bioinformatics." I loved learning something about molecular biology, after all those years of hearing about DNA and not understanding it. And "Molecular Biology of the Cell" is, hands down, the greatest textbook ever written.

That said: a lot of the comments are spot on. You're working in a field where the hard scientists and business people rule and you're a helper. Maybe they're grateful for your help OR maybe they regard you as an overpaid lab assistant. After all, they have PhD's and postdocs, and you don't.

I've never actually worked in that field. I'd guess that it might be very satisfying, despite the low pay. Or not.


I have worked in a genomics lab after finishing a bioinformatics master's.

It was my first full-time job, and by far the most chill. People were great. The PI was laid back, the whole lab went out for beers every now and then - and not because of a mandatory startup-style 'bonding' event. We genuinely enjoyed each other's company and hung out outside of work. I never had that in any other job, which were/are all commercial operations.

The vibe and the power structure felt very different. More level. There weren't any purely managerial roles, everyone was doing at least a bit of 'science'. And even junior ICs like me got to coach undergrads every now and then. Most of the operational budget comes from grants, on which you have to deliver. The pay is not amazing, so most employees really are in it for the science.

Or I was still young and naive and was lucky that both layers of management were nice people.

Ultimately I left, as the grant money couldn't keep up with offers I was getting.

It is still the job I am most proud of. I love talking about it, and it really sucks that even a well funded lab can't really afford market engineering rates.


I'm in a role that is very similar (different field though). However, I know enough about academia to know that a lot depends on the culture the PI fosters. Also, I spend a lot of time learning the field.


I wish more fields would just start adopting the product/engineer partnership that Software companies have perfected. Engineers are very good at what they do. Product people are very good at what they do. They need each other to build things. Sure, engineers might know enough about product to get by and product people might know enough about coding to get by, but the reason it works is because each one is an expert in what they do and are equal.

It's no different in finance, healthcare, genomics etc. I'd love to work in a setting where I'm paired with an SME product manager in a domain I have no clue about and they respect my work and I respect theirs and we are partners.

This is one of the biggest factors that made software/internet companies explode. They respected people who build software. They didn't need to. A bunch of MBAs could have easily just decided that the best way to run the company was to treat the people building the product as a cost center. Many did. I think that's probably one of the reasons for the lack of innovation and downfall in many old tech companies like HP/IBM.

The ones that treated SWEs properly and valued them accordingly did very well.


I have heard from a friend who's a doctor that in hospitals there's a very adversarial relationship between doctors and MBAs. The MBAs see the doctors as a cost center, and the doctors resent people without MDs being above them.

Your comment reminds me to be thankful that at many software companies engineering, product, and design do respect each other as equal partners. I totally agree that to do otherwise is business suicide.


Two very opposing philosophies:

MD's -> patient interest comes first

MBA's -> company interest comes first


Having worked (as a consultant/contractor) for a few businesses in the field, I can say that my experience was closer to "grateful for your help" than to "an overpaid lab assistant". I even recall once, in a meeting, being referred to (by a senior staff scientist with a Ph.D.) as "the technical guy", causing me to wonder at how someone who does gene sequencing thinks of programming as being more technical.

But, YMMV.


> causing me to wonder at how someone who does gene sequencing thinks of programming as being more technical

Everything you don't understand looks complicated from the outside.


Molecular Biology of the Cell got me extremely excited about genetics and bioinformatics, highly, highly recommend this book to any software person I meet who is interested in biology.

As to the work environment, it seems to be extremely varied depending on the lab and team you're on. I came from a number of years doing web development in marketing and finance before joining an R1 university research lab, and in many ways the day-to-day is quite similar in both fields. You are not the 'go-to' person for most things, but with that said, even as an individual contributor I feel my voice is heard on technical decisions where appropriate. As for pay, it's the biggest aspect that will make me leave at some point. If you do not have a PhD, or even a degree in my case, you can't expect to get paid a lot. As to the speculation on the satisfaction of the work, it is indeed deeply satisfying!

I got to have a conversation with one of the hero donors that gave a kidney biopsy after a life-saving transplant. It's hard to overstate just how impactful your work feels when talking to someone like that. Even as a small cog in the larger machine (our lab is around 50 strong with many people being at the top of their sub-fields), the end results of the effort will be massive improvements in individuals' quality of life; this alone makes it quite easy to get out of bed in the morning.


Any particular edition of Molecular Biology of the Cell you’d recommend? I just looked up the 7th edition on Amazon (seems like the latest) and it’s $300 USD. Oof.


I've still got my 3rd edition copy (from 1999 when I was an undergrad molecular biologist). Most of the basic biochemistry and molecular biology will be exactly the same--it hasn't changed much if at all. While there have been lots of additional details added over the last two decades, the fundamentals are unchanged for the most part.

This wouldn't apply to other fields such as Immunology (Janeway's Immunobiology) where I have purchased multiple copies over the years due to the field changing so fast.


I'm on #3.

An awful lot has changed since 2000. RNA is now a Thing, where it was just a poor stepchild before. Protein folding, of course.

But yeah. The pictures are shining examples of what a scientific diagram can be.


Go on eBay and buy the "international" edition


What level of chemistry do you need to know in order to benefit from reading the text?


> That said: a lot of the comments are spot on. You're working in a field where the hard scientists and business people rule and you're a helper

This definitely was the culture when I started working in the field 6 years ago. However, the culture has shifted (at least where I work) to where biologists and engineers are equal partners that work together on solving these problems. For those organizations that are not this way, I think they’re going to have to change if they want to innovate.


Agreed. Huge change over the last 10-15 years. My first job in the space had a view that obviously a mere software developer wouldn't be paid more than even a postdoc scientist. And as postdocs weren't paid all that well, you see where this is going.

These days more biotech companies are computationally/software focused. They understand that to pull in strong talent they're not operating in the same academic science world.


That may be the case for engineers with PhDs and scientific credentials, but I'm not so sure that is true of normal developers who did not play the academic game. I'm not going to take a job based on the eventuality of a culture shift, and I don't think you should either.

This isn't just genomics, by the way. Scientific computing folks are very similar.


That's always been my impression, but it does sound like "software eats the world" has had some effect. At least in some places.

Looking at it from their point of view: CS people tend to think that "everything is just information, and now that we're here you're all going to be working for us."

You can see why a PhD in mol bio would resent that. Everything is not just information.


I worked in the field. Leaving to work on ads immediately tripled my salary, and gave me more room to grow.

Everyone who says you're the hired help and treated about as well as a secretary that the organization dislikes is dead on. At best, you're viewed as an overpaid cost center.

Which is sad, because I'd love to work in these areas... but I'm not giving up 66% to 75% of my income as charity to private corporations.


>Molecular Biology of the Cell

Tangential, but what are the chemistry prereqs to grasp this book?


MBC is readable by someone with an undergraduate background in science. You'd probably want basic knowledge of biology, general chemistry and organic chemistry.

It's essentially an upper-level undergraduate textbook.


Fwiw, I studied humanities, do a lot of pop science reading in my spare time, and I'm able to appreciate it. There are more detailed and technical sections I skim or skip, but the overviews are fantastic. Incredible description of, for example, the sheer wonder we should all experience at the fact that all life starts as a single cell.


The chemistry requirements are minimal. You should understand the difference between ionic and covalent bonds, how van der Waals forces work, hydrophobicity, solubility, and the effects of catalysts on reaction transition states. It will also be important to understand what reaction kinetics are and what pH means. An understanding of buffers might be useful.

I would argue that, to understand the book, you specifically don't need to know electrochemistry, organic chemistry, analytical chemistry, organometallics, spectroscopy, or even physical chemistry.


Probably just a college level gen chem class. Pretty accessible, albeit technical, textbook from what I remember of reading it for a course a few years ago.


I did get a book or two out of the library, plus I had Chem 101 in college, but really, not very much.


Is it feasible to do any meaningful work in this field without joining a team? (e.g. as a solo hobbyist/entrepreneur)


If you have a software background and can get some basic domain knowledge, there's lots of open source projects that could use your contribution.

Doing fundamental research is a taller order. But lots of software, tools, pipelines etc need maintainers, optimizations...


Which projects? That seems like a good place to start.


I contribute to nf-core (https://nf-co.re/). It's more of a collection of Nextflow pipelines than traditional software, but there are users all around the world and a good community.

Most of the packages on bioconda (https://bioconda.github.io/) are open source. But you probably want to find a sub-field that interests you most before finding a project.

In grad school, we also had an ex-google software engineer volunteer with us one day a week. It was very impactful for many members of the lab to learn good engineering practices, and it wasn't at all like the sentiment others in this thread are expressing where engineers were "janitors".


https://github.com/scverse But this is mostly about transcriptomics (RNA), not genomics.


Difficult but possible. For example, Robert Edgar [1] works alone and is one of the most productive developers in this field.

[1] http://drive5.com


I worked with Bob some ~20 years ago at Berkeley. He showed up one day to check out the seminars and see if he could "help out" after having sold his database company to Intel. He said he'd been trained as a physics guy in the 80s but there were no real jobs so he started a software company instead. He joined my advisor's group (it helped a lot, because at the time most journals wouldn't publish a paper submitted from a home address).

He proceeded to completely understand hidden Markov models and protein sequence alignment and was immediately hacking improvements to HMMER. However, Sean Eddy couldn't understand his optimizations (Sean has to know how HMMER works at all times) and so Bob went off and made his own tools like MUSCLE.

One of the reasons he can do this is, well, he's a programmer/math genius, and the other reason is that HMMs and protein alignments are a fairly well understood and programmable thing these days.

Still blows me away we train up all these people to be scientists when there are no jobs for them in that role.


I don't work in this space anymore, but just want to say kseq (and the rest of klib) is such an awesome time saver. Thank you.


I’ve had a growing interest in the power of DNA and what the data can be used for since discovering no less than 3 family secrets (one of which pertaining to me) after taking an Ancestry DNA test. Did I know I was going to find 18 half siblings the moment my results came in? Nope, but yet there they are, listed in order of most shared DNA.

Despite my interest, I’ve found that landing a job in this field at my desired compensation level is very difficult, especially if you do not have the “correct” academic background. Who does a double degree in computer science and forensic genealogy? I’m sure some people do, but for $75k/yr you’d think the companies need to at least adjust their expectations.


Yeah, I agree, I looked into this before, and the pay doesn’t come close to other swe jobs, as far as I have seen whenever I look. It is usually like 2x less, it’s hard to want to choose that just to work on something a bit more interesting. I even have a background in bioinformatics, but I never found anything that compensates it as much as pure swe roles.


Yeah the low pay is one thing, but in my experience a lot of the academic jobs seem to want a domain scientist who can do programming, not the other way around.


Not always true - but finding a very good programmer who knows the domain well enough to make a significant impact is challenging.


I've been a software engineer in this space. I just want to say that there is exactly 1 (non-intern) job between Microsoft, Google and Amazon listed according to the search links provided in the article.


What is the significance of that observation?


> Often, it's not required to know the domain before you join a group, and they will teach you on the job.

I looked. There are zero full-time, remote roles that don't require previous genomics experience at any of the companies listed.


I know there are at least a few, because positions on my team offer remote and don't require any previous genomics experience.


Perhaps you and the GP could compare notes about where they searched and where you advertise.


https://talent.stjude.org/careers/jobs?keywords=software%20e... , i.e., taking your search and entering my zip code, shows only 3 positions at St. Jude, none of which mention the word genomics at all.


Science programming jobs suck. You get all the bad parts of academia, including less money, plus you're seen as a janitor rather than an engineer, and you get to deal with scientists all day.

Tooling roles in SWE in every other field are highly regarded. Why not here?


Because it's not what sells. It's literally a tool, and if you don't deliver the level of perfection they're used to getting from sequencing, NMR or assay testing machines, you're the PITA. You really have to bring something very interesting to the table to earn some status, and software engineering just doesn't. It's too far from the core business. Think of the attitude SWEs have towards sales people...


The thing is many will pay big bucks to contractors/consultants/IT services/LIMS systems, but if you’re an employee, nope.

They have a hard time having someone with a BS or MS making 50-75k more than a freshly-minted PhD.

I just left a job in pharma because I cannot do it anymore (salary being a big one, but my experiences reflect many in this thread).

They spent 500k on a consulting company to build a few NGS processing pipelines. These were built using a framework I was unfamiliar with. I refactored one of them and was able to cut its runtime by 60% in a couple of weeks. I was paid in the low 100s.

They would rather contract out the high-paid work and pay orders of magnitude more for it.


it depends. I've been in science support roles where people are genuinely grateful for the help, and it's _really_ interesting to get to peek in on people's research.

it depends on the role. it worked out really well for me when I got to drop in and do piece work on lots of projects in different fields. working on a larger software development project can be really painful and demoralizing because the people running it don't really understand how the sausage gets made.


Well, the math / computational power required for simple, static protein modeling is horrendous.


Look, I did this at multiple places for a number of years. The issue is that you often form an adversarial relationship with the scientists. They don't really want you there. They are perfectly happy just organizing everything by hand with Post-its and Excel spreadsheets. They do not want you to mess with their flow with your software, even if it would help them be more efficient.


Can you elaborate with some anecdotes? Why is their current workflow wrong? Why would an organization hire someone to build software if they can achieve goals with spreadsheets?


The fact they renamed human genes because they were importing it in Excel in a way it thought they were dates and was changing them says a lot.

Can you really trust the scientific results if they depend on software made by people who don't care about code quality?
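
For anyone who hasn't run into this: symbols such as SEPT2 or MARCH1 look like dates to a spreadsheet's auto-formatting, so they get silently rewritten on import. Here is a minimal Python sketch for flagging at-risk symbols before a gene list goes anywhere near a spreadsheet. The prefix list, the regex and the function names are my own illustrative assumptions, not an official nomenclature rule, and the pattern is deliberately rough.

```python
import re

# Month-like prefixes that spreadsheet auto-formatting tends to read as dates.
# This list and the pattern below are assumptions made for illustration.
_MONTH_PREFIXES = ("JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
                   "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC")
_DATE_LIKE = re.compile(
    r"^(%s)-?\d{1,2}$" % "|".join(_MONTH_PREFIXES), re.IGNORECASE
)


def looks_like_date(symbol: str) -> bool:
    """Return True if a gene symbol resembles a month name plus a day number."""
    return bool(_DATE_LIKE.match(symbol.strip()))


def flag_risky_symbols(symbols):
    """Keep only the symbols that a spreadsheet might turn into dates."""
    return [s for s in symbols if looks_like_date(s)]


if __name__ == "__main__":
    print(flag_risky_symbols(["SEPT2", "MARCH1", "TP53", "BRCA1", "DEC1"]))
```

A check like this only detects the problem; the safer habit is to import such columns explicitly as text.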


>renamed human genes because they were importing it in Excel in a way it thought they were dates

Just at your shop, or the field in general?



The whole field! This was actually a thing(!)


Why indeed.


Even large corporations in this space pay relatively little for software engineers, and treat them with little importance.

I also experienced "software engineers" who had no idea what they were doing being given more credence because they had a PhD in some bio-related field. Oh, you got a PhD in some molecular aspect of some tiny piece of biology, and that makes you qualified to build big data systems? It did not. Apparently what that gives you is an adherence to reading decades-old textbooks about database design. It was like working with a first-year software engineering undergrad from twenty years ago.

To be fair, it looks like the same can be said for machine learning. Many software engineers I know are in the "machine learning space", but report that they are just operations support for data scientists, and don't actually get to learn about, let alone be involved with creating, the models they support.

If you are a software engineer, work in a software company, where engineering is the value proposition.


Google already axed all job offers, Microsoft and AWS are searching for student interns...

I used to work in genomics and computational biology. It was incredibly interesting. But it's university research and gets paid as such. 2-year time-limited contracts, lots of interns and students, extremely low salaries.


The AWS jobs aren’t even related to genomics. They just have genomics in the description of types of workloads performed by customers of AWS. The jobs are hard core CS automated reasoning jobs.


Shameless self-promotion incoming.

I'm interested in contributing to this field. I have significant experience in 3D graphics, game engines, compilers and language runtimes. I'm a competent low-level engineer.

There are a lot of red flags in this thread about adverse working conditions, but I'm running under the assumption that there are a handful of companies out there that work with a software-minded approach, i.e. respect SWEs for who they are and what they do. If you represent one such company, and are looking for engineers who have a keen eye for performance and architecture, I'd love to hear from you.

jesse@scallywag.software

https://scallywag.software/resume.html

EDIT: Largely interested in remote roles, but could relocate for the right offer.


Sorry, no. This was my dream area to work in and I obsessed over degree programs in bioinformatics many years ago. Then I realized it’s incredibly low paying for the work, finding work in the area was a chore, and a masters might not even get your foot in the door. Nothing communicated that you would be valued. The harsh reality of the world won out in the end.


I hope this FAANG downturn will push software engineers to new industries, and bring some cross-pollination.

What happens when the world's most brilliant minds do something other than making us click on more ads?


If academics embraced software and software developers as heartily as advertisers did, you'd see that result. Until they do, I expect you'll continue to see a bunch of skepticism from developers.


Exactly my thoughts, and what I hope this post makes some consider.


Can you provide a list of the top problems in that space? Much rather try to understand them deeply myself and build a company solving them than just getting a job.


This please. I would love to start working on (or create from scratch) some software that helps people in that field.


Creating pipelines is still a problem. Typically one needs to call a bunch of other tools in order to get to the final result. There could be map/reduce behavior in the middle where chunks of data are processed in parallel in order to gain speed. And you need some kind of data management/tracking as well (putting samples in groups, ingesting raw data, exporting results). And sane monitoring especially if something breaks/fails.

There are probably 100s of tools written for this but no clear winner so far. The traditional software engineering approaches like git, ci/cd seem too heavyweight (or rather too low-level) especially during development. IMHO there could be space for a fully remote/cloud solution where one would code/debug/deploy from the browser optimized for writing/maintaining pipelines.
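
To make the map/reduce shape described above concrete, here is a minimal Python sketch of the scatter/gather pattern: split the input into chunks, process the chunks in parallel, then merge the partial results. Everything in it is a stand-in chosen for illustration (the function names, the base-counting step, the chunk size); a real pipeline would call external tools per chunk and would add the tracking and monitoring mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Scatter step: cut the input into fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_bases(chunk):
    """Per-chunk work (the 'map'): tally bases across the reads in one chunk."""
    counts = {}
    for read in chunk:
        for base in read:
            counts[base] = counts.get(base, 0) + 1
    return counts


def merge_counts(partials):
    """Gather step (the 'reduce'): combine the per-chunk tallies."""
    total = {}
    for partial in partials:
        for base, n in partial.items():
            total[base] = total.get(base, 0) + n
    return total


def run_pipeline(reads, chunk_size=2, workers=4):
    """Scatter the reads, process the chunks in parallel, gather the results."""
    chunks = split_into_chunks(reads, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_bases, chunks))
    return merge_counts(partials)


if __name__ == "__main__":
    print(run_pipeline(["ACGT", "AACC", "GG"]))
```

Threads are used here only to keep the sketch self-contained; for CPU-bound steps a process pool or a cluster scheduler would be the more likely choice.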


I also found the quality & proliferation of data pipeline tools to be baffling. Somehow always more painful to put these together than it seemed like it ought to be.

At one point we wrote an internal tool (I think lots of organizations do this, since all the 100s of existing tools somehow don't fit, so you invent #101) and while it was tremendously satisfying getting batch jobs with 1000s of CPUs churning away, that kind of data infrastructure needs to be standardized. I think some companies are doing this, e.g. I saw a presentation about Arvados/Curii that seemed interesting (but I haven't used it, so I'm not sure). Maybe CWL will turn out to be the way forward here?


Protein structure prediction was a huge deal, which is why AlphaFold received so much fanfare. It is actually pretty good. The next step is to predict where multi-protein complexes would interact, which is not as simple as predicting the structure of two proteins independently and then trying to fit them together like a puzzle, because the interactions can also change the structure. While it's not as hard as it used to be to experimentally determine protein targets of, for example, a protein kinase, it's still not an arbitrary or cheap experiment, and to do that for the many thousands of such proteins, across different conditions (stress, presence of co-factors, etc) and in different organisms would be rather a lot of work. Something like AlphaFold that makes reasonable predictions and can be used to help you focus on what's most likely to be relevant to your disease or process of interest helps quite a bit.

There's also more need for integrating "multi-omics" data, where you have data from multiple assays (gene expression, phospho-proteomics, lipidomics, epigenetics, small RNA expression, etc etc) with the goal of somehow combining all these different assay results from various levels of gene regulation, to get closer to figuring out actual mechanism for complex processes. Building on that, we can also do single-cell multi-omics to some extent- where you have results from different sequencing-based assays on the level of the same individual cell. This is still pretty limited, but it's exciting and advancing pretty quickly. This will eventually be combined with things like spatial transcriptomics, which is useful for mapping out what's going on in heterogeneous tissue samples like tumors, for example, so we'll end up with spatial single-cell multi-omics, at which point you're looking at 1) some quantitative trait for multiple genes/loci/molecules, and often 10k+ of such features at the same time per assay, 2) multiple assays, such as DNA accessibility and gene expression, in 3) single-cells, of which you might have 10k of in a single sample, 4) across a physical tissue sample where individual cells are spatially mapped, and where you probably want to figure out how cells might influence the state of those around them, and 5) in multiple different samples, where you might want to compare disease vs control, or look for correlation to heterogeneity of results within one group.

There's a lot of public data already available for single-cell gene expression projects if you want to get a feel for how these things are structured and how (passable but not amazing) the existing tooling is. One of the main repositories for this data is the NCBI's SRA https://www.ncbi.nlm.nih.gov/sra but you'll quickly note that searching and browsing is not as easy as you might think it would be, because one of the main limiting factors in bioinformatics is how bad everyone is at keeping terminology consistent. For many bioinformaticians, a majority of time is spent in the data cleaning phase. It's awful. Sometimes the experimental parameters make it into SRA or GEO, but sometimes you have to read through the associated paper to pull that out. Often only large consortium projects like The Cancer Genome Atlas (TCGA) or the Genotype-Tissue Expression project (GTEx), which have enough funding for staff dedicated to data management, end up publishing datasets that are easy to "consume" without having to jump through a whole bunch of hurdles to figure out how the data was produced.

I have a BS/MS in bioinformatics and I'm presently a PhD candidate in genetics and computational biology defending in February.


So if I understood you correctly then further lowering the cost of experimentally determining protein targets could be a viable way forward that is completely orthogonal to computational methods?


I'd like to hear about this too!


I am a career SW engineer that has worked on genomics in a startup. The field is genuinely exciting.

The endemic disease of the field is the leadership. A leadership made up of Principal Investigators forged in academia appears simply incapable of producing anything other than articles (or equivalents thereof).


Do you think that's true of pharmaceuticals/biotechs as well? Or just academia?


Decades ago my very very bright HCI prof commissioned a psyc study for a database we were building for some biologists next door, you know so we could better address their needs in ways that would be useful to them. Details are pretty fuzzy anymore but they proved correct many times over.

Things the study said would not work never worked, i.e. biologists wanted "temporarily" private data, say until publication; as the psyc study predicted, they would never freely share it.

but the biggest thing I will try to paraphrase:

Biology is an observational science; the work is in interpreting, which lends itself to group dynamics and politics, leaders, etc.

Which is at odds with Math/CS, which is constructive: if something can be proved, then that is that.

So when a CS person states a fact from their perspective, a biologist might see it as just another opinion subject to hierarchical ranking.

So I would argue it is a function of the individual's proclivities and correlated training in the cultural environment they end up in.

So a healthy work environment could value both fact and opinion where each has a complementary role whether academic or industry.

But as a longtime academic, I am now sadly looking towards industry.


The people staffing senior leadership in pharma/biotech are typically either former PIs from academia, or people who could have been PIs but chose to go straight into industry.

They have more cash to play with, but their leadership fails in the same pattern.


My first job offer out of college for compsci was from a genomics research company that desperately needed software engineers. At the time they were storing sequences as ATGC strings in an Oracle database using Perl scripts. It was really below even undergraduate-level basic stuff.

The offer was $38k a year. About two days later, I got my second offer, $50k from a game company, and then a real offer, $60k, which I took. This was in the late 1990s.

That was 20+ years ago, of course, but I sort of wonder if things have changed. I frankly think a lot of SWEs work for fundamentally evil, socially destructive companies, and I honestly don't think you have to do that to earn a good living, but you also don't have to work for companies that deliriously underpay you.


Genomics is still predominantly a research field. In research, software development and hence software engineers are not valued much, because technologies change rapidly, new ideas come every day, so it is about being able to hack together a workable solution enough to write a paper or get funding.

Software development becomes important when certain data processing methods have been standardised, e.g. mapping sequencing data to the mouse or human genome, differential expression analysis, PCA visualisations.


This is very true and I loved working with bioinformaticians but the pay is so much lower than a normal SWE role which is why SWEs will pick tech over genomics companies.


Not quite a decade ago, I took some work for a lab to replace some aging software (circa 1990) used to do peptide synthesis.

It was an enlightening experience. While I was the programming expert with a CS degree, I wasn't trusted for anything, because I didn't have a PhD or a background in bioinformatics. However, I did get to work with lots of smart people, and fixed and improved the code and processes that the PhD-level statisticians and bioinformaticians used.

It is a real joy to work in hard science, with brilliant people who love their work. I learned a ton and gained a healthy respect for the people that do this kind of work.

However, the downsides are pretty bad. Pay and compensation is awful. Most people, myself included, could have made as good if not better pay waiting tables. There end up being different levels of people: administrators, principal investigators, and lab workers (peons). Unless you are an admin or a high-level PI, you're not gonna be getting much money.

Everybody lives and dies by the grant. If funding dries up, you will be out of a job.

Ethics. Us CS people are woefully undereducated on ethics. You will find yourself asking why we can't simply do something; often the answer will be ethics.

Regulations. As with ethics, you will have to bend to regulations. It's not a bad thing, just a different thing.

Unless you find yourself in an admin role, you will just be another lab peon. It's not a bad place to be, but you will never be at the top of the totem pole.

Loads and loads of ego. You will work with very smart and sometimes unreasonable people. Learning to navigate this with tact is important.


I don't have any funding to hire right now, but I'm always happy to chat about the industry and my experience building Hail (https://hail.is, https://github.com/hail-is/hail), a tool widely used by folks with large collections of human sequences.

The other posters are not wrong about compensation. Total compensation is off by a factor of two to three.

However, it is absolutely possible to work with a group of top-notch engineers on serious distributed systems & compilers in service of an excellent scientific-user experience. I know because I do. We are lucky to have a PI who respects and hires a diversity of expertise within his lab.

I enjoy being deeply embedded with our users. I do not have to guess what they need or want because I help them do it every day.

I also enjoy enmeshing engineering with statistics, mathematics, and biology. Work is more interesting when so many disciplines conspire towards the end of improved human health.


Yes, Genomics may be important, but are there really that many jobs for software developers? (same could be said for many other important fields - I recently saw an article about how software engineers should move to green energy - but who is going to pay them?)


There are a lot more positions for people with advanced mathematics and/or science backgrounds with strong programming chops than there are typical software positions. But they do exist.


I have a BS in neurobiology but have been working in software for 20 years. I'd always wanted to get into more biology-focused software after interning at NIDA (NIH) and seeing how bad the software support was. I spent most of my internship developing software to make it easier to digitize the dozens of giant drawers full of index cards where they recorded all their raw data.

The problem is that the organizations involved in this sort of work often still consider software development as a cost center and therefore do not offer competitive salaries.


This field does not _need_ software engineers.

This field needs marketing, product and project managers (for-profit or non-profit variety) that could figure out:

1. what product to build to have the biggest impact

2. how to build it.

Once 1. and 2. are clear, it will be equally clear that if you have a bunch of scientists you won't get a great product, as nobody will build the product; everyone will build a prototype.

So then it will follow that the project needs to hire (=attract) software engineers to be in charge of software, and attracting software engineers means giving them competitive compensation.


Would love to, but I think academia will want a master's at least, and years of industry experience will be discarded completely. I have 6 years of experience in data-intensive IoT applications, and yet that would not be considered useful by academia.


The bio & pharma & medical fields value academic credentials very, very highly. Too highly.

That's their whole life: "where did you do your PhD? Who did you do your postdoc under?"

Many world-class hackers would do pretty poorly on those questions.


I left academia for that reason; there was no advancement path that didn’t involve more advanced degrees, and that wasn’t something I was interested in at the time.


The code is bad because transient PhDs and postdocs are writing it. If there was money in it then the best software developers would already be working on it. Sadly there is none.


yep ....

One of the borderline fraudulent aspects of the field is the pretense that method publications are real software.

That is, you come up with a breakthrough statistical or algorithmic method, you get it to run exactly once based on whatever random walk of exploratory code got you to a result that looks better than competing/prior methods, and then you dump your workspace into a script and put it on Github and pretend in your Tier 1 publication that this is something anybody else could or should responsibly use. The minute the publication is approved there is zero benefit to the authors in maintaining the software, and in fact it's better if nobody can run it because that way they can't disprove your results. Then naturally nobody can get this to work afterwards, and 50% of software engineering time and effort is spent trying to run code that can't and never will work outside the context it was created in. But you have to try, because this is now the accepted best-practice method of doing X or Y based on its publication.

The bigger problem is that this whole cycle actually shapes the view of software engineering by academics to the point where they really do think that most software engineering is a waste of time. A small number of 10x engineers manage to prosper in the environment, but it's mainly because they have the sheer technical capability to deal with ALL of that while still doing something useful, and it actually makes the problem worse because the academics then see that as the baseline for software engineering capability.


Yes, just to second this -- every time I write decent code for my bioinformatics software I regret it, because my PI does not really care.

Sometimes I really don't understand. Much of the field's code does not even have testing, and it is baffling for me to think how the results are believed to be correct in the first place if there is no rigorous testing.


What is the opportunity here -- writing new algorithms, implementing them accurately, optimizing them for special execution architectures, or just building more usable tools?

I remember Manolis Kellis sprinkled some pretty interesting genomic questions into his Algorithm class's problem sets. There were a number of cool problems about optimally aligning strings, searching within text, etc.

This was like 15 years ago and I haven't kept up with the discipline at all. But is there still algorithmic low hanging fruit?

I do keep reading about an ongoing series of problems with Microsoft Excel distorting analysis in the scientific literature (https://www.nature.com/articles/d41586-021-02211-4) and wondering if the tooling is having trouble..?
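
For readers who haven't seen the string-alignment problems mentioned above, the classic formulation is a dynamic program over two sequences. Below is a minimal Python sketch of Needleman-Wunsch global alignment scoring. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; it returns only the score, not the alignment itself, and it is not a reconstruction of any particular problem set.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of the best global alignment of a and b.

    Keeps only two rows of the DP table, so memory is O(len(b)).
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

This runs in time proportional to the product of the two lengths, which is why aligning reads against whole genomes relies on indexing and heuristics rather than the plain table.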


> But is there still algorithmic low hanging fruit?

Algorithmic bioinformatics has become a separate research field, because there is so much low-hanging fruit. Biotech companies create new instruments producing new kinds of data, researchers find new uses for the data, and new algorithmic problems emerge all the time. There is also a steady migration of people from theoretical computer science to bioinformatics, because it's often easier to get research funding for something bioinformatics-related than for pure CS.


> But is there still algorithmic low hanging fruit?

I would say no unless looking at the frontiers of what is done in the wet lab which might require new analytical tools. But this stuff is probably much easier for and much better aligned with someone doing CS in academia.

My impression is that there is quite some space for ML-based approaches, including DL. But even there I would not call it low-hanging.


We're only starting to see the age of genomics accelerated by GPUs. I think it's still early if you have the technical background.


Edico developed an FPGA-based processing solution for common bioinformatics processing tasks (e.g. DNA/RNA mapping, variant calling) and the company was bought by Illumina.

The product (Dragen) has been around for a few years and now will be integrated in the new generation of sequencers. Extremely impressive technology and a better fit for the niche compared to GPU-based solutions I have seen. More downstream processing and analytics is sometimes closer to traditional ML and naturally there are lots of GPU-based algos.


I'm more excited about NVIDIA's acquisition of Parabricks and the version 4.0 of the software that makes it free to use, than I am about DRAGEN. At the very least it's good to have some competition in the space, Illumina's stuff is always SO expensive. We'll have to see what hardware will win in the end.


I have tried both and dragen was more polished and also faster (that depends on the GPU for parabricks of course). Also more features and they keep adding them.

Agreed that competition is good to have. There is also Sentieon and similar solutions which run on common hardware but are optimized.

Speaking of costs (both upfront and licensing), dragen imo is not expensive relative to sequencing costs (e.g. sequencers and flowcells). Surely it would be expensive to buy for occasional use.

By making Parabricks free to use, NVIDIA is trying to gain market share, I guess. In professional settings you still end up buying support and likely dedicated hardware (which is comparable in pricing). It's a good fit for the cloud and for research environments that already have access to GPUs and/or are decoupled from actual sequencing.


Are there any open source projects on genomics that I can start looking into as a hobby rather than jumping right into a full time position in this field?


Serratus (https://github.com/ababaian/serratus) is an OSS bioinformatics project created by a passionate group of volunteers. Short story is we're re-analyzing all of the world's DNA/RNA sequencing data to find new viruses that other people have missed. It works surprisingly well, but there's a ton left to do.


> There is a significant gap between how software is currently developed in this space versus how it should be developed. The vast majority of genomics-related software is not written with speed or reliability in mind.

True, but working in academia is very VERY different from working in a tech/product company.


> This state of affairs makes it difficult for anyone other than the original author to contribute to these code bases, further cementing the one-maintainer policy.

Who wants to fix other people's code mess? This is a no-no if you want to promote a job opening.


I do. It's my bread-and-butter. I call myself a code janitor. I live by books like "Working Effectively With Legacy Code" and "Kill it With Fire". But I have my limits. Academic code has.. coded, in the medical sense, and it can't be revived. Put a DNR on it.


I've seen this also in several software systems that started life in a CS grad department. (Not all the same university.)

The original authors' quirks get enshrined in the code base, and it's nigh impossible to fix until they leave the company that commercialized it.


sorta like the original calculus thesis.


I have a degree in biochemistry. Would love to combine my passion in software and biology, but academic research is often funded by governments which means the salary is (super) low.

It's the same reason why there's a lack of qualified computer science teachers in schools.


Quick plug here for Atomic AI ( https://atomic.ai/ , https://boards.greenhouse.io/atomai ), which could be added to the list. We value and respect (and pay) our engineers—I myself trained as a SWE and worked at FAANG.

Shoot me a message at raphael@atomic.ai if you want to learn more.


(chiming in here as a founding engineer at Atomic)

So I spent more than 8 years as a SWE at Google, and now work here with both experimental biologists and machine learning scientists. And yes, a lot of the concerns mentioned in this thread are also things I have had anxiety about.

Most obvious to me, being a software engineer at Google felt like being the center of the universe. Coming here, the focus is the scientific research. And yes, the scientists all managed to complete their PhDs so they don't necessarily need me to unblock them every second of their day. But contrary to my expectations, this has been remarkably freeing. I think one particularly important part of our company that makes this work is that, even on the science side, we're multidisciplinary (at a high level, emphasizing both experimental biology and ML.) And so engineering feeling like another arm of that multi-discipline nature is fairly... natural.

The reason I feel it's freeing, and the reason I enjoy working here, is also the greatest challenge. Because the scientists are focused on the science, because they respect me and trust me to figure it out, and because they aren't constantly blocked by me, my job is mostly about dreaming extremely expansively about what I can do to reduce toil and make the scientists more productive. Of course they have feedback and input, but how I use my time and what I build is ultimately my decision because I am the engineer. And I have been able to do some things I am very proud of, like rolling out Bazel and Kubernetes and finding ways to seamlessly bring them into the cloud (we're even multi-cloud now without them even noticing!) On the other hand, it's very challenging because when you work on a product, say Google Photos, as a SWE, you always have some direct tether to the product ("what should we build next? ahhhh, well I guess we could just embed Stable Diffusion and a million people would immediately play with it".) At Atomic, my tether is very ambiguous. If I do my job successfully, they'll be able to do research more quickly (? effectively?), and eventually we'll be able to produce a therapeutic that hopefully changes the world. Identifying what I can do today to speed up that far outcome in the future is very challenging, but it is a far more interesting challenge than gluing some pre-existing software into my UI or running A/B tests to turn a red button blue.

If, like me, you enjoy being given ownership over incredibly ambiguous problems, please do reach out!

This role focuses on directly partnering with the biologists: https://boards.greenhouse.io/atomai/jobs/4726839004

This role is expansive cloud infra: https://boards.greenhouse.io/atomai/jobs/4531035004

And this role is directly partnering with the ML scientists: https://boards.greenhouse.io/atomai/jobs/4191285004


Would love to, both out of interest and out of a belief that it might one day improve the world. But it's not happening. I have 20+ years of experience as a software engineer, but I don't have a degree, so anything that has even a whiff of academia rejects me outright. Not to mention that it would involve a big paycut over fintech.


I work at a SynBio company and heartily second this. If you're looking for interesting work where you can make an impact, it's an incredible field to be in.

I'm a nerd about everything - I love learning, and this field is incredible for it. The complexity and depth of biological systems dwarfs what we're doing in the software industry. I work with brilliant people doing absolutely fascinating work, and I get to learn more every day. At the same time, I get to build things that make a genuine contribution to the people I'm working with - I can see the value and impact of my skillset in a way that was a lot harder when I was working at a software company. The leverage that good software folks can provide to folks outside the industry is almost impossible to overstate - our ability to scale up what the practitioners in the field are doing can offer an almost category change in what they can attempt.

At the same time, there are still really, really knotty software problems to be had - computer science has benefited quite a lot from our ability to segment and structure our problems, but biology doesn't allow for that - everything that we're working with is operating at every scale, from molecular interactions up through genomics into protein design and folding and into metabolic modelling. Add to that the fact that the data structures you're dealing with can vary from a few characters up to a couple megabytes (within the same represented "object"), distant elements within the same object can interact meaningfully, the objects themselves tend to be embedded in larger structures with which they meaningfully interact, and you've got some fiendishly complex problems.

And at the end of all that, you've got a field which offers a legitimate possibility of helping us move past petrochemicals; an enormous expansion in the kinds, potency, and specificity of healthcare; and a new and novel set of tools for shaping our world. It's an incredibly exciting place to be, and I've found people are genuinely thrilled to have good software folks along.


Those who are interested in the Broad Institute can reach out directly to me at mdelamaz@broadinstitute.org


Do you do fully remote positions, from non-US timezones (eg. Western Europe)?


[Also at Broad] Must be a US resident except for exceptional cases (e.g. world-renowned scientist).


Thanks for posting this, and your learngenomics.dev resource looks great - I'm looking forward to reading this through. I recently started working as an engineering manager/lead in a genomics startup (https://www.genpax.co), and I've been picking this up as I go. I've also started working my way through the 'Micro binfie' podcast, which is great.

Our company values software quality and we're very product focussed. We're actively hiring in London: https://news.ycombinator.com/item?id=33423547


I did do this, there were a lot of great people on my team but it paid (a lot) less and is more stressful than just building another CRUD app.


I worked for a while at a consultancy supporting genomics through LIMS (lab info management software) customization, so not really genomics, but in the genomics biz (big genomics companies were our clients). For me, it was the least interesting software work I have done in my 20 year coding career. On the other hand, for people who just wanted a steady pay cheque and to go home at 5pm, it was a good gig. But man, software that moves samples and test tubes and their data around, it could be cars in a parking lot for all that the science makes it interesting.

We had bad attrition to both more interesting and higher paying work. (I left for both after a year at the consultancy)


Most of the discussion here seems to assume bioinformatics / genomics jobs are academic, but I work for a clinical testing lab where production-quality code is a must. We're probably a 10/12 on the Joel test.

If you're into bioinformatics or genomics, but aren't excited about an academic setting, take a peek: https://recruiting2.ultipro.com/ARU1000ARUP/JobBoard/62cc791...

We hire fully remote positions and starting salaries are about US$100k.


As someone who puts tremendous value in technical mentorship when considering a role this is about the worst possible advertisement for being a swe in genomics as it amounts to "all our code is awful- come fix it!"


It may be worth pointing out that several of the leaders in the Genomics field started off in commercial software development. I agree that it does not make monetary career sense to move into genomics -- academic labs cannot pay you more than the lab head makes, which is probably much less than many software developers are worth in other markets.

But I've known several financially successful developers who have gone back for a PhD in bioinformatics and genomics, and, after getting over their distaste for existing tools, have made important and well-recognized contributions. But they did not make more money.


I wonder how popular open source is in genomics, there does seem to be a lot of open source genomics/med/science related software.

https://wiki.debian.org/DebianGenomics https://blends.debian.org/med/tasks/ https://blends.debian.org/science/tasks/


Somewhat self-interested plug here: consider working in metabolomics as well. Metabolomics is where sequencing was in ~2008. The physics and chemistry are pretty well worked out (though many improvements are surely coming in the same way that 454 gave way to Illumina, PacBio, Nanopore, etc.). The software and computational workflows are truly awful, like hard to describe bad. The company that figures out metabolomics well is going to command a much larger market than genomics - genomics tells you what's possible, metabolomics tells you what's actually happening.


> Google Genomics. Careers link. > Microsoft Genomics. Careers link.

Google and Microsoft probably know how to make software?

Side note: why does this page have user-select: none on body? It's annoying; what does it accomplish?


Google Genomics is now aka Google Cloud Life Sciences. There's also their sister company Verily that operates in this space.


There is also a research group: https://health.google/health-research/genomics/


Couple of things I know.

Bioinformaticians come in two flavors: those who studied biology and then took up coding, and the even rarer computer scientists who learned biology. The latter are so rare that they are almost all professors or founders or work at DeepMind etc... Then, there are the biomedical engineers, etc...

The computer scientists will go off and solve protein folding when the bioinformaticians and chemists have worked on it for years. I am exaggerating a little here, I imagine there were plenty of bioinformaticians on the AlphaFold team, but the fundamental breakthrough was DNNs.


biologist / chemist will take the architecture studio approach, then develop math to shorten the write-up.

research software engineer will develop the mathematics to describe things, then use the numerical system to write software to determine things.


I’ve worked at a lot of places, and working for researchers was my worst job ever by far. I’ll never work for someone with a PhD again, as Sheldon Cooper’s attitude towards engineers is no joke.


I'm most definitely not an expert in this area, but I have recently taken interest in learning about "succinct data structures", which from what I understand have their place in bioinformatics.

It's been a challenging topic to learn about, because most of the information comes from Computer Science papers and articles where the information is presented in a very formal, mathematical way, which I am just not used to.

Normally when thinking about data structures and algorithms, we're mostly concerned with optimizing for speed. Space complexity is not usually as big of a consideration. Succinct data structures are all about creating ways to achieve good runtime performance while representing the data in a "compressed" format. I think this comes in handy when doing things like DNA sequencing since data sets are so large.

I'm excited to check out some of the links in the post, and in case anyone else is interested in learning more about succinct data structures, here are a few resources I'd recommend:

Prof. Ben Langmead's YouTube channel: https://www.youtube.com/user/BenLangmead/featured

Alex Bowe's blog has some good content: https://www.alexbowe.com/articles/

Prof. Erik Demaine's "succinct" lectures from his adv. data structures course at MIT on YouTube: https://www.youtube.com/watch?v=3Y2weLDiUWw

Edward Kmett's Haskell live coding session going into some details about succinct: https://www.youtube.com/watch?v=9MKEmNNJgFc

There are also a lot of research papers, which you should be able to find by searching for "succinct data structures" (Jacobson, Munro, Brodnik, Raman, Rao, Navarro, Sadakane just to name a few). I have at least a basic CS undergraduate degree, and many of these papers are over my head, but I have still been able to slowly understand more and more. Some I had to purchase.
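
To make the idea concrete, here is a minimal Python sketch of a rank query over a bit vector, which is one of the basic building blocks in this area. It is a simplification and not how any particular library does it: real succinct structures use multi-level directories to keep the extra space sublinear, and the class name RankBitVector and the 64-bit block size are just illustrative choices.

```python
class RankBitVector:
    """Bit vector with per-block popcounts so rank queries avoid a full scan."""

    BLOCK = 64  # bits per block (an illustrative choice, not a tuned value)

    def __init__(self, bits):
        self.n = len(bits)
        self.blocks = []       # each block packed into one Python int
        self.cumulative = [0]  # number of 1 bits before each block
        for start in range(0, self.n, self.BLOCK):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.BLOCK]):
                if bit:
                    word |= 1 << offset
            self.blocks.append(word)
            self.cumulative.append(self.cumulative[-1] + bin(word).count("1"))

    def rank1(self, i):
        """Count the 1 bits in positions [0, i)."""
        if not 0 <= i <= self.n:
            raise IndexError(i)
        block, offset = divmod(i, self.BLOCK)
        if offset == 0:
            return self.cumulative[block]
        mask = (1 << offset) - 1  # keep only the bits below position `offset`
        return self.cumulative[block] + bin(self.blocks[block] & mask).count("1")


# Example: mark every G in a short DNA string, then count Gs in a prefix.
bits = [base == "G" for base in "GATTACAGG"]
vector = RankBitVector(bits)
print(vector.rank1(8))  # Gs among the first 8 bases; prints 2
```

The point of the precomputed counts is that a query only touches one block plus one table entry, no matter how long the sequence is, while the bits themselves stay packed.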


This is tangentially related to what I'm currently doing.

I basically work in EdTech. The company is not an EdTech company, it's an education services company. I was hired on to develop software that we couldn't find in the market[0].

In the process of building this thing, we've been attending and speaking at conferences in our industry. And I'm seeing a lot of the same stories: academia is trying to do research, the research fundamentally requires software to make the research happen, the quality of the software can have a huge impact on results, but because software development is tangential to the research goals, there's little to no allocation to software developers. This leaves the researchers to cobble together a solution that maybe kinda fulfills their need, not corky, and certainly not perpetually (a lot of reliance on trial software and services).

We would love to offer our software to researchers in our field. We've gotten feedback from several that what we are building is exactly the sort of thing they need. But they have no money, and even if we were in a position to give it away for free, we can't even make those connections come to fruition.

So I don't know what to do. I really am thinking of starting to give it away for free, because at least we'd benefit from more research results in our field proving the efficacy of our approach. But that's a really slow burn.

[0] Specifics don't matter, but if you're curious, I make a VR environment for foreign language training emphasizing culture.


I recently switched from software engineering on ads and web performance at a FAANG to (meta)genomics at a nonprofit startup; happy to answer questions


From a genomics layperson with decent dev skills:

A. What are the broad and medium goals you work on?

B. What are your daily activities? How do they fit into (A)?

C. What does nonprofit genomics vs for profit look like from a revenue standpoint?

D. What specific technologies/stacks are you using?

E. The CRUD frontend+backend+database to serve users (and sell ads) is pretty ubiquitous in 'tech', with some branches. How does your field compare?


A: The goal of my current project is to identify novel pandemics, even if they're caused by something we've never seen before, most likely by looking at growth patterns. At a broader scale, I'm trying to learn enough about working in this field that I'll be able to contribute on whatever future projects seem most appropriately important to me.

B: unlike my previous work, I'm back to being an individual contributor. Very few meetings, mostly coding and analysis. Current thing is trying to understand what drives per-sample variability in wastewater sequencing data.

C. Our group is currently philanthropically funded, and is focused on determining whether/how this is possible/practical.

D. I'm mostly working in (bio)python, with a few bits written in C and gluing things together with bash.

E. I was most recently working on (a) JS infrastructure to fetch and render ads and (b) working with browsers on platform features that would improve privacy, security, and efficiency.


F. The parent article mentions solutions are often custom made by one person. Can problems in the field be reduced such that extensible open-source frameworks could be applied? The way we have frameworks for webdev?


I'm very new to this area, and am really not the right person to ask, but I'll try my best ;)

In general you have frameworks when lots of people are trying to solve a large number of problems that look similar at the start and then will diverge. That's pretty web specific. I think instead in bio you mostly get (and will keep getting) modular tools and pipeline standardization.


Are there any open source tools/projects worth contributing to that don't need specialized infrastructure, proprietary data, or a PhD in the field to understand?


Almost certainly yes. For example, lots of things don't have tests and really should. But I'm not too sure about priorities, or about which projects would welcome that work.


These should be separate in academia.

SWEs cannot write code that maps equations that may change daily completely due to modeling / assumptions change.

Too much focus on modularizing, premature optimization, useless unit testing etc. Who cares about all these if the underlying model is wrong?

If things are stable enough to go into production then the code should leave academia and be re-written properly by SWEs, not by clueless bio phds.


Umm..... software weather modeling systems map equations that may change daily completely due to modeling / assumptions change.


Funny you mention weather because I have worked in the field. We had to make a change in the discretization scheme, and pretty much all of the assumptions of the super optimized parallelized production version had to be thrown away, to the point that we had to abandon it completely.

Software engineering needs well defined boundaries to design between them and test edge cases. When we are doing fundamental research we don’t have this. Literally anything can change in the logic, the inputs, the outputs.


> "Funny you mention weather because I have worked in the field. We had to make a change in the discretization scheme, and pretty much all of the assumptions of the super optimized parallelized production version had to be thrown away, to the point that we had to abandon it completely."...

Ah, so initial pixilation for front fonts weren't unicode compatible? :-)


> ADVANCED DEGREE HOLDING SOFTWARE ENGINEERS: consider working on genomics

fixed the title.


heh, you think scientists automatically understand computing nature?

You’ll be the janitor cleaning up their 20k LoC, one file Python with zero abstraction.

If this is already a thing at a FAANG, it will be worse at a pure science shop.


You'll be lucky if it's python, most likely thousands of lines of R by someone who doesn't know how to write a function or declare dependencies.


Ummm... messy python/R sounds a lot better than messy numerical analysis in fortran with NO programming standards because it has been used/modified for 50+ years.

Mainly because you can find / write software to analyze & reorganize modern structured languages such as python & R into something recognized in the programming field as an appropriate approach/structure.


From past work experience & funding source priorities, computational genomics usually considered a supporting role as information analyst when needs to be a research software engineer. (unless building software for ct, mri, x-ray scanning process).

aka need to be able to develop the dna / dna number system equivalent of things (aka something other than binary / punch card block based number system) such as:

   treesitter 

   nyquist : https://en.wikipedia.org/wiki/Nyquist_(programming_language)

   slippery chicken : https://ccrma.stanford.edu/workshops/algorithmic-composition-with-slippery-chicken
but wind up doing the equivalent of automated statistical analysis, because focus is NOT to develop software package/system.

short broader subject take, what programming groups dont get about applicative programming vs. algol/block programming


If you enjoy programming, don't work for fields or companies that view you as a cost center, it's a drag.


I want to study Bioinformatics at McGill for a Masters or Grad school. I was doing some cheminformatics related work and basically was working alongside Bio students and grad students literally before the pandemic hit. Heck, even Boston University would be a good fit for me because I was helping students at hackathons build their solutions or mentoring them right before the pandemic hit for AstraZeneca's challenge, and the team I was helping out created a hardware prototype that would give you your daily meds (great for seniors that forget to take pills or what). But honestly I feel like there is a lot of gatekeeping in this community, I would have to spend approximately 7 years before I could get taken seriously in this domain. In that 7 years you can do a lot of other meaningful work than being stuck in grad school. I dunno.


> Of course, this would not be the fault of the individuals who maintain the software, who are often brilliant: it's just simply not fair to expect individuals to ensure this consistency using their own, ad-hoc processes

I think this is a little generous. Engineers of all stripes should take responsibility for their work. If they say, "Yes I can add methylation analysis in three weeks," then they should make sure that means it's made well, with tests and all. I've definitely encountered people who don't communicate the scale of the task, and for most of them it's because they don't do software engineering; they do informatics scripting.


> "Yes I can add methylation analysis in three weeks,"

virtual or reprogram the robotic arm?


Hah. Virtual : - )


Actor methods with a script, I presume.


I quit several years of a dream job in Apple engineering to work on genomics in mid 2016; the small company I joined tanked in 6 months. I think the problems impacting this space include vague customer needs. Caveat emptor, so to speak.


Genomics was a pretty good place for software engineers that have an interest in molecular biology; however, the pay is not generally comparable with that earned in a tech industry job. Interesting (to me) is that the software engineers that I've worked with in genomics-oriented labs treat biologists and biological data as the gold standard, while the biologists in those labs are fairly reverential towards the software engineers and computational results! Of course, both are overly optimistic ... Unfortunately, the best software engineers I worked with eventually jumped to software startups or tech industry jobs.


Average salary of $58k.

Genomics companies: consider paying more.


I work in genomics, this is very true. I know of some modernization efforts, ie by companies working with new file formats, like GenomSys [0] with mpeg-genomics [1].

It feels like it’s going very slowly though. The field just really depends on its Unix philosophy tools: there are a lot of gzipped text files that are piped through bash scripts and tools like awk and grep. It works, mostly, but there is a lot of weirdness.

[0] https://genomsys.com/

[1] https://mpeg-g.org/
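
For anyone who hasn't seen these pipelines, here is a rough Python equivalent of the kind of thing that usually gets done with zcat, awk and grep: streaming a gzipped FASTQ file and keeping reads above a GC-content threshold. It assumes the simple four-lines-per-record FASTQ layout; the function names and the 0.5 threshold are made up for illustration.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (name, sequence, quality) tuples from a gzipped FASTQ file.

    Assumes the common layout of exactly four lines per record.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if not record:
                return  # clean end of file
            if len(record) < 4 or not record[0].startswith("@"):
                raise ValueError(f"malformed FASTQ record: {record!r}")
            name, sequence, _separator, quality = record
            yield name[1:], sequence, quality


def gc_fraction(sequence):
    """Return the fraction of bases that are G or C (0.0 for an empty read)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(sequence)


def high_gc_reads(path, threshold=0.5):
    """Stream (name, sequence) for reads whose GC fraction is >= threshold."""
    for name, sequence, _quality in read_fastq(path):
        if gc_fraction(sequence) >= threshold:
            yield name, sequence
```

Calling high_gc_reads on a path like "reads.fastq.gz" (a hypothetical file) gives you a generator, so the file is never fully loaded into memory, which is the same property that makes the shell pipelines workable on large files.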


mpeg-g to me is probably bad for the field. sam/bam/cram is the way of the present and mpeg-g offers little over these formats and is patent encumbered. xref samtools developer blog http://datageekdom.blogspot.com/2018/09/mpeg-g-ugly.html?m=1


Hmm I agree with James. I bring the same points to my employer actually, they sort of listen but they like IP.


Sounds like the only way you could ensure a healthy environment in this field would be to follow the Matlab or the Wolfram model. In other words, create a software company whose customers are bioinformaticians. Maybe find a bioinformatics academic to help guide product development enough to get some contracts with research institutions, and take it from there.

It sounds like academia is simply too toxic, entitled, full of itself and hierarchical to provide an environment where good software practices can thrive.


Simply put, I can get a better paid gig just developing for the web, without understanding any complicated domain (what the developers do is the complexity).

Working in research in general doesn't seem to pay that well, or the well paid jobs are few and far between.

Maybe it's a sector ready to be disrupted by a startup with quality developers; but I have yet to see disruption based on improving code quality. It's a tangential aspect as well and doesn't have much impact on the actual business.


In addition to genomics, the other area that could greatly benefit from professional SWE experience is imaging. Many of the most effective techniques today combine microscopic images with genomics- for example variable transcriptomics approaches. Imaging is a more natural fit for people who like to work with dense, visualizable matrices, although genomics data is now trending more towards matrices (all genes x some observable metric).


So if one was financially independent and wished to write something open-source in that field, where would the highest impact be?


Invent a new file format (or a few) for storing genomics data. They're all the rage in the bioinformatics field. Make sure not to document its semantics so that its implementation is the only spec.


Ahh a "reference implementation" connoisseur


It seems to me like we'd probably be better off partnering with domain experts in Genomics who want to build software that can be used across the board. Sounds like an interesting opportunity for a business. I'm open to the idea if anyone wants to chat, let me know. I'm SWE but would want to partner with a Genomics Expert.


What would I need to get started on an open source genomics program?

Where do they live? What do they do?

Like, do you need a genome interpreter? Does one exist? Are there any open source products used by the field currently? I know the names of the programs and items I'd look at to get started in AI, for example. But for genomics, it's a total mystery.



From what is said in the article and comments, this may be a good place to be if you are bootstrapping your own company.


If you’re interested in genomics I’d recommend working for a commercial entity in clinical/translational genomics side rather than in academia. I have worked for a few of the big names in the space and although they had their problems the work was very rewarding as you’re closer to the patient impact.


I keep thinking of this space but don't know where to start. Any pointers for good resources? - books that are accessible to software engineers with no background in genomics, open source projects which are widely used, etc.... in short a good place to start in exploratory/hobby/learning mode.


It might seem like an exaggeration but this morning I was thinking of doing more work in scientific software. I lost my mother to cancer. This seems like a way to channel that energy and motivation. Thanks for posting, OP.

PS. Feel free to reach out. Email in profile. I’ll be happy to email around the subject.


People seem to be responding to the pitch in a different way to how it is intended. It's entirely a pitch that there is a need for this. So if you aren't highly motivated by doing something valuable and useful, this isn't for you.

For me, working in the field is worth doing because I have come to a place in my life where I value doing something useful more than I value other things. You really can't put a value on being able to get up every single day and know that you are actually doing something good for the world that day. And the pay, while less than your absolute highest potential, is still a really good salary by comparison to most of society.

Plus you do get a lot of freedom and autonomy, and exposure to absolutely fascinating research and biology, and if you want to dabble in academia, it's surprisingly easy if you have a supportive group.


I will add Mammoth Biosciences to the list. We are looking for Software Engineers, Data Engineers, Data Scientists and more.

https://mammoth.bio/careers/


Had a glance at the team page on the Mammoth website, curious to learn more about how the various roles fit together. Mind sharing a way to get in touch?


Sure, feel free to drop me an email at kylem at mammothbiosci.com


I work at the Broad Institute and it's a pretty cool place. The hours can be long, but they're investing heavily in software and the like. It can be a nice intermediate between research and applications.


You're not exactly selling it...


> There is a significant gap between how software is currently developed in this space versus how it should be developed

I appreciate the author's honesty. Been there, take it easy bud.


I just want to add my $0.02 to respond to the low pay and low respect as a Software Developer/Engineer in genomics. This is 100% true and also not true [bear with me]... you get it back on the back end.

First the comp: most people think about the income they get as in levels.fyi TC. IMHO, the no. 1 value add of working for an academic center is the freedom in both time and spirit you get in pursuing your interests and the side ventures & hustles, which eventually compound. The hours are very reasonable in academia and in most places, you can take classes internally on campus or get reimbursed for them, and get supportive managers who let you take time off from work to study. Or just great WLB to pursue something you really enjoy. And this compounds both spiritually and financially.

Just a data point of one: I took an online data science degree whilst working like 15hrs/week and spending 25hrs on classes. From the classes, I got the bug to apply the data science I learned in my degree and on the genomics analysis job to the financial markets/automated trading. Now over the past 4 years, I've achieved a CAGR of 35% and a Sharpe of 2.5, where my options trading portfolio capital gains consistently outsize my W-2 pay and keep me on par with an L5 FAANG engineer. To give you an idea, my other co-workers have gone into side hustles in real estate (not sure about now) or running day-care to great success. Yes, because you have that much free time.

Now autonomy/academic stimulation, I would not give it up for the world even if I was doing it for free. Previously I was working for a "hot tech" company where I was bored out of my mind cranking CRUD widgets and re-learning JS frameworks every year and attending BS lunch n' learn work sessions of new crappy libraries with hipster names. In genomics, you get to apply traditional stat techniques (bioconductor), deep learning techniques (tensorflow, AlphaFold, GANs) and learn the latest sequencing protocols (scRNASeq, ChIPSeq, CRISPR screenings) and learn the biology domain too (immunology, viral responses, cell regulatory networks, synthetic biology). It's like being in the front seat at a movie cinema or basketball court where the scientific evolution is happening. You're learning something new every day and you are at the center of it all as PIs, wet lab bench scientists all depend on you to perform the analysis and build the pipelines... and 8 years in, I'm still excited, with the only disappointment being that I will never learn it all.

Obv. a subjective data point of one, but I just want to add my data point just in case somebody out there on the fence. Yes sometimes you can truly have it all.


I started working at a genomics company about a couple of years ago and my experience is very different from the post. Although there might be a handful of bioinformatics tools that are quite old, the ones at the heart of operations are worked on by teams and reasonably maintained, and although I agree with the headline that there's a lot of work to be done in the area - my perspective is a little different. Although this doesn't apply to all genomics companies, I'm at a company that has a lab, and the software we write makes the lab about 8x more efficient and the next generation of sequencing technology will bring sequencing costs down by about 5x. Meanwhile the science and literature keep pushing further and newer generations of physicians are putting a stronger emphasis on genomic counseling. Thanks in part to the power of viral sequencing data the government is starting to trust laboratories that bring valuable and actionable insights. I think all of those combined with the fact that CRISPR technologies are getting further along puts genomics in a unique position. TLDR; yes genomics is exciting and on the brink of something big, but no it's not a dumpster fire that needs saving. Oh and as a bonus - I get to work with really smart scientists and they are very friendly :)


I write synthetic biology software for a living and maintain this open source, Go package for engineering DNA that has high test coverage and a nice little dev community around it.

https://github.com/TimothyStiles/poly

A large part of my project's community are devs that want to get into the field but can't tolerate the ridiculously low pay, laughably bad management, disrespect, and what amounts to 40+ years of technical debt that's endemic to biotech software.

I've had companies here in the Bay Area offer me 100K a year with a straight face. I've had companies tell me during interviews that they're looking for someone to help "set up GitHub". I've seen job listings for low paid web dev positions require applicants to have PhDs.

The reality is that except for a growing handful of places management straight up won't know the difference between IT and software engineers. It's what I call the naive buyers problem.

The demand for software engineers in biotech is generated by naive buyers that don't know what they need, why they need it, or how to get it.

Benchling and Recursion Pharmaceuticals have reputations in the industry of paying, "standard software salaries". So do the research divisions at places like deepmind/microsoft/google but in my experience there's even new multi-billion dollar institutes where senior management has never even heard the term devops.

Most places advertise for "data scientist", positions or some analog, instead of software engineers. This is mostly because upper management has never met an actual practicing software engineer in a professional setting. Many come from academia where the culture and work requirements heavily disincentivize standard software engineering practices.

It's also not uncommon for a biotech company to either have a very under qualified CTO whose main programming experience is what they learned doing ML research like stuff during their PhD or not even have one at all which has huge downstream consequences.

This week a software engineer trying to make the switch to biotech actually DM'd me to ask why they were seeing a ton of data science / ML job positions but no software engineering / devops positions.

They were worried that these companies were trying to save on costs by forcing their data scientists to create infrastructure but it's actually worse than that. Most of these companies aren't even aware that there's supposed to be infrastructure.

Despite all of this the future is looking better and I'm starting to find new companies and positions that are well... reasonable. I learned about this thread from a friend at a party last night that works at one of these companies. There's a small, strong new wave of companies and developers out there pushing biotech software forward. Hopefully some (including myself) make it big while pushing the idea that better tech equals better biotech.


I am a founder of a startup (Octant - a16z backed) that has a small & growing software engineering and data science team (see the Nov who's hiring post). Some thoughts on some of the discussion here:

1. Compensation – In academia, you will likely take a big salary hit (much of this is discussed). There are a few exceptions, such as newer institutes like Chan Zuckerberg, Arc Institute, etc that are paying much more competitive salaries though. In well-backed startups and larger biotech/pharma, cash is likely equal to (or often more than) software comps elsewhere – the bigger hit you take is usually in equity – no one has been able to match FAANG on total comp with RSUs in the mix. Startups can provide options, but they're not very fungible. For example, we benchmark salary on what comparable A16Z pre-public non-bio companies use as well as stats from the broader SV SWE salary datasets. There are startups in bio that pay even higher to lure talent.

2. Research vs Product – Over the last decade, there are a bunch of highly profitable tech companies and large funded new startups (e.g., Calico, Altos, Deepmind, etc) trying to take on bio as the next frontier. These places (like those named in the blog post) pay very competitively. Thus far, these places often turn into a big mess because it becomes hard to deliver products (like drugs) in a mostly academic-y atmosphere. I don't think anyone has really cracked this nut yet (or if it's even possible).

2. Culture of SW importance – In a lot of startups these days, this has changed quite a bit over the last 5 years. Lots of software & data science first startups. I think in the larger pharma/biotech though, the centrality of drug discovery takes a lot more oxygen than software, which are often thought of as innovation bets and different places have different levels of long term commitment.

3. I think one important difference is the type of company. There are many software companies in healthcare/bio that are software products supporting R&D, healthcare, drug development etc. Many of them have done quite well (e.g., Benchling, Komodo Health etc in A16Z portfolio alone) and are basically just software companies that just happen to be in bio. There are many others like most drug discovery companies (like us) where software and data science is enabling, but the product is often ultimately drugs. For a lot of SWEs, this becomes problematic because people often want the satisfaction of having externally deployed software products to push into the world. The heroes and heroines of this world are often drug hunters over tool developers, and this has cultural consequences as well. Some people are really good with this (getting a lot of satisfaction out of enabling new drugs to treat serious disease), but a lot of folks aren't.

4. The current biotech crash has been bigger and more sustained than the tech crash thus far. High interest rates impact this industry much more than others, because new drugs, which drive a large part of the industry, usually take a decade or more to develop before revenues are flowing. This is less of an issue in healthtech companies that can often deploy much more quickly (90% of healthcare costs are not drugs).

5. Finally, there are many happy SWEs and DS in bio at companies that value software, where they can build good careers building products that ultimately help human health in new ways. It's a pretty amazing time in biology, with a suite of new technologies to read, interpret, write, edit, deploy molecules/DNA/cells that are really unlocking many of the mysteries of human diseases. I feel lucky every day we get to continue building in this space.


Warning sign. You can't even select the text on the site.


Don't know why HN still has down votes. Down voters are among the most stupid people on the planet. The site changed, you stupid f*ckers.

https://web.archive.org/web/20221119162905/https://claymcleo...


Which open source projects need help, and why those?


I do work in genomics.


Clicking to highlight text is disabled. https://xkcd.com/1271/


My fault! I’ll fix it when I can. Appreciate the report.

Edit: fixed.


I use "Stop The Madness" extension in Safari. Similar options available for Chrome.

But this shit is so dumb.


Several of the job boards linked don't have any job listings, most don't provide a salary range, several require advanced degrees, and none specify whether remote work is possible.

If I can get a better salary and working conditions at some crappy no-name startup, why would I choose to work at an organization that respects my craft so little they haven't bothered to maintain their software for a decade?


I think your phrase here sums up how many people feel:

> why would I choose to work at an organization that respects my craft so little they haven't bothered to maintain their software for a decade

This is changing in my experience, albeit slowly. And really, this is what I'm calling on us, as a community, to do better on.

The reason you _would_ work at these organizations is because (1) the subject-matter is really interesting, (2) there are hard problems to be solved, and (3) you wake up every morning knowing that you are working on something that will have an impact on the lives of people around the world.

At least those are my reasons :)


I charge a high rate not because I need the money but because I need to stop people from wasting my time. I’ve worked at these places before where they exploit your altruism and dedication to craft to extract more work out of you for less. Concretely one of the places refused to hire a frontend dev to help so I got stuck wasting time churning frontends. Charging more encourages them not to do that.


Yeah. It’s one thing if the only way the job is worse is in pay, but my experience has been that if the pay is significantly worse, the job is worse off in every way.


I have experience working in academia. I started out working for a medical school in fact, on bioinformatics. It's been a while, so I have kind of forgotten the problems, but I'll do my best.

1. Academic code. Not one institution would pass the Joel Test[1]. You pretty much covered some key points in your first paragraph, so I see not much has changed. The best predictor of how something will perform in the future is how it has performed in the past. Just hiring good software engineers won't change the system in which they work.

2. Academic bureaucracy & administration. I've worked for large Fortune 500 companies with less byzantine org charts. I've been matrix-managed. The siloing in academia is crazy.

3. Advancement. Because it's academia, advanced degrees are everything. My first boss in academia had a PhD. His job? He ran the student computing lab. My second boss was an MD/PhD. Great guy, but treated everyone like a lab assistant. I went to graduate school for one year and realized it wasn't for me.

4. (added after reading other comments). Completely unrealistic understanding of what developing robust, complex software is like. You touched on this by mentioning how many projects have 1 maintainer. I remember seeing a doctor shopping around his project plan. I'd say it would be a challenge for a high-performing 5-person team. He thought it was a job for a single entry-level programmer.

1 https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-s...


If you don't have a PhD, don't touch academic jobs with a 10 foot pole. They all got one, and they value the credential to a ridiculous degree.

When I last worked with academia, they essentially thought of me as the same as the guy who maintains the lab equipment, not an actual collaborator on their research.


The academic elitism doesn’t end at PhD. A friend asked a Nobel prize winner their thoughts on a different person who was recently awarded a Nobel prize in the same field. Their response: “Ah, yes, that was one of the lesser Nobels.”


I think what I'd ask is:

Will I be allowed to fix the parts that result in

> an organization that respects my craft so little

Like, it's one thing if they haven't - that can be fine, it's just more work. It's an entirely different thing if they won't.


I think the honest answer is, it depends where you work. Where I work, we have that autonomy. But we also have leadership that understands and respects the software engineering craft.


How are we "as a community" going to be able to improve this?

It's ultimately down to the cultural norms of the field, as well as the realities of academic funding.

I was a research software engineer (RSE) for the best part of a decade. The best thing that happened to me was being made redundant when my funding ran out, and being forced to work in industry. What a difference (and wholly for the better).

The reasons you give are all nice positives, but they all ultimately are very emotionally manipulative. You're asking people to act against their own self-interest. But this isn't really for the benefit of humanity. It's for the benefit of the PIs who run the research groups, and keeping their little empires running. But the cost to the individual is great. You're sacrificing salary, a career track, advancing your own skills to the full, and in many cases the opportunity to have a life: being able to afford a home and support a family.

In retrospect I, and many others like me, do feel that we were taken advantage of to some degree.

I spent several years on a massive grant, then several years on lots of small short-term grants (12 months, 6 months, 3 months). You can't risk getting a mortgage when you have no guaranteed employment. And it's also very stressful not knowing, every three months, if you'll be employed in three months' time. And unlike in a company, RSEs don't really have a proper career track. There's no real progression. You're a hired help.

RSEs are not treated equally with academics. Let's be really honest about that. We're not. I even had a PhD in the subject area and I was still beneath all of the "real" academics. We're not "partners" in their work. We're the dogsbodies.

If these people want software developers with real chops to work in the field then they need to pay a competitive salary, have a proper career track, and really fix the job stability. And they also need to properly respect the expertise RSEs bring. Unless all of those are fixed, a career in industry will continue to be the only rational choice.

This won't happen though. Tenured academics refuse to consider paying the going rate because that would mean the "hired help" would be earning considerably more than they do. I had already topped out the salary band when I left, and I was earning more than most of the junior-mid-career academics. They are, of course, on fairly poor salaries. They too would earn vastly more in industry, but are mostly unwilling to consider that option as a rule. Their loss. If they truly respected the value they were getting, then they would pay for it. It will not happen though. Most of academia is about climbing the greasy pole and not about advancing the state of the art; there's just no way they'll permit others to sit higher on the salary pyramid than they do.

At least in industry skill and competence and the ability to deliver are highly-valued, and companies will gladly pay for people who are proven to deliver. In practice the work I do in industry (biomedical) has far more positive impact upon the world than anything I did in the academic niche I used to occupy, and is also vastly more enjoyable, with a lot more responsibility and technical depth.

Are you personally planning to stick it out for the whole of your career? Because if I could give you the advice I would have liked to have given myself, it's that you should properly think about where you want your career to go in the medium to long term, and decide when (not if, but when) you will exit to move to industry. Use it as an opportunity to gain some useful skills, and move on to where your skills will be properly valued.


I've seen a posted PhD position that was extremely weak academically, because they just wanted someone with a CS degree to implement their pre-existing ideas, but didn't want to pay a developer's salary.

The position kept being posted multiple times over a couple of years. Then I moved on and don't know what happened.


How's the startup ecosystem?


Ever since the pandemic, many software engineers have become exactly the type of "I've got mine, Jack" people they typically deride.


Or, and hear me out here for a second, people go where they’re most valued. Why would someone volunteer to go work somewhere where they’ll get less money, have less flexibility, be treated as second-class employees, have less work perks, etc. when so many alternatives exist? Doubly so in this economy.

Maybe genetic companies should catch up in workplace etiquette instead of recommending that SWEs lower their expectations.


> be treated as second-class employees

The pay isn't all that important to me, as long as I can live on it, but this. It was so obvious when I was working in academia that, because I wasn't myself an academic, I was just lab help, no better than the person who washes the test tubes and beakers.


I don't quite understand this post. In my experience lab technicians in academia are highly valued (however I mostly have experience with clean room staff), however they are support for the researchers, who drive the research agenda (well actually it's the funding primarily). What exactly do you expect to not be what you call "a second class citizen"?


Are you really, really serious?

Academia is a hugely elitist pyramid with well-demarcated layers, and the lab technicians are at the very lowest level of the pecking order, down with the cleaning staff.

They might be "needed", but by and large are they really "well respected" or "valued"? Not really, sad to say.


I agree. With that said, I do think we are getting there. The pay is becoming more comparable, and I think software engineers are becoming more and more valued in these companies/organizations.


At least one of the entities on that list is a nonprofit academic institution. Expecting pay equal to the standard software industry is misguided.

Whether or not that's a tradeoff you're willing to make is another question.


The problem is that it isn't just about pay, it's about everything: autonomy, flexibility, culture, work quality, respect... If an organization can reliably convince—show, not tell—that it will be a much better place to work overall, I'm certain they can hire highly skilled engineers even if they can't compete on pay.

My experience is that most non-tech organizations can't or won't.


If you go to academia, you certainly are not going for the money.

People can work for less if they are visibly valued, or where they are doing some heroic stuff that appeals to them personally.

People can for some time withstand being treated as second class, being overworked, etc, if they are paid a lot.

But if it's neither, why would anyone bother?


> But if it's neither, why would anyone bother?

Because they find it rewarding in some other way? I agree that that way is not the conventional wisdom, but it exists; it turns out that some people value doing things that provide a demonstrable benefit to humanity.

Also, the second-class citizen thing has diminished over the years. It still exists, but there are plenty of companies where that's no longer true. This is in stark contrast to when the field was getting off the ground; for instance, it wasn't uncommon for benefits like PTO to be tied to your degree level & not length of employment.


And three of them are FAANG. They could certainly afford it.


And they pay their standard rates.


Haha, thanks! This is much kinder than my response was going to be.


> If I can get a better salary and working conditions at some crappy no-name startup, why would I choose to work at an organization that respects my craft so little they haven't bothered to maintain their software for a decade?

Amen, couldn't have said it better myself. I'm sure it's very worthy and all working on a genomics project that aims to eradicate some killer disease, but you need to live and provide for your family while you're doing it.


In general salaries would be lower than what you can get in a standard software role. Some of the ones on that list allow remote work, others are more limited. The tradeoff is that you're working towards the betterment of humanity. Whether or not that's worth the tradeoff for you is a personal decision.


> you're working towards the betterment of humanity

If only this was actually true.

In reality, you're being taken advantage of and it's very manipulative and deceptive to make this claim.


Agree that not everyone will buy this. But it is why many people are working in positions that are otherwise worse on paper. It's up to them to decide if it's real or not.


"How high are you on trait agreeableness"


> The tradeoff is that you're working towards the betterment of humanity.

You're working to give some pharma company something to patent and make millions of $


This reads a bit like “what can this place do for me?” instead of “how can I make the place I work at better”.

High potential areas like genomics that are behind on software are amazing places for talented software people with a giving attitude to have a big positive impact.


I can have a big positive impact and make the startups I tend to work for better places AND get paid well and be respected. It seems the "potential" here is a result of a lack of care and maintenance for legacy codebases, and not, say, driving innovation.


Hard no.

Initial code would still be developed by SME, who:

- Don't understand most programming abstractions

- Don't see the advantage of a clean codebase

- Would rather go back to their code spaghetti mess, than help figure out why some corner cases behave differently in a fresh codebase

- Would still submit changes to their code spaghetti mess and expect you to apply them to the cleaned codebase

I did what the author suggested (not in genomics, but in a different research-heavy scientific field) for a while and would not recommend it to anyone.

And that's not even taking compensation and work conditions into account.


I have a friend who works as a "make this mess a product we can show in conferences" engineer in a research center and it is literally what you described.

The only reasons he gave me for still working there are:

1- the workload is fairly low

2- he has a lot of autonomy

3- he shows up every day around noon and leaves at 5PM


I understand your point and you are completely right, but from their perspective, your experience with multiple aspects of the codebase is still a valuable contribution and makes a difference, doesn't it?



