As someone who practices bioinformatics, it doesn’t seem appealing. Bioinformati...

tstactplsignore · 2024-02-11T20:54:33 1707684873

To disagree, I'm a computational biologist and it's my firm belief 99% of the scientifically important stuff happens before the stats and plotting. That's not to say I dismiss those things and haven't done my fair share of stats, but just that the difference between real results and incorrect results most often happens before that step.

I'm a microbiologist though, for stuff like human RNA-Seq I understand that it's often plug and play to get a gene counts table at this point.

bfrankline · 2024-02-11T22:26:24 1707690384

Sure, but I think representation learning, for example, doesn’t involve manipulating an array of strings.

kescobo · 2024-02-12T00:59:20 1707699560

>To disagree, I'm a computational biologist and it's my firm belief 99% of the scientifically important stuff happens before the stats and plotting.

I'm a microbiologist too, but the kind that uses mostly off-the-shelf tools to do the taxonomic/functional assignment on metagenomes, and then stats/data science on the features. I kinda don't know what you mean by "99% of the scientifically important stuff happens before the stats and the plotting".

I mean, give me a 500x2.6x10^6 sparse matrix of gene function abundances and tell me that you've done anything scientifically meaningful. Or on the other side, let me hand you a fastq file from sequencing a poorly extracted DNA sample, and you give me the best algorithm in the world, and there's nothing scientifically meaningful that's going to come out of that.

folli · 2024-02-11T21:43:56 1707687836

I guess that depends on your exact ecological niche within bioinformatics.

I got my start at a NGS facility, so handling FASTQ was closer to 80% of my time, so any speedups would have been greatly appreciated.

MillironX · 2024-02-12T01:55:33 1707702933

> I guess that depends on your exact ecological niche within bioinformatics.

Agreed. I know people in my department who just ran Galaxy pipelines and R scripts to make pretty plots. I was on the other side of the spectrum and needed fast parsers, so the SAM and VCF specifications were my bible.

__MatrixMan__ · 2024-02-11T20:14:02 1707682442

As someone who is considering a switch from generic software engineering towards bioinformatics, what would you say the pain points are?

If this is not the way to remove workflow friction, what is?

f6v · 2024-02-12T11:27:38 1707737258

I had an ok career in software engineering (Android/iOS -> backend -> engineering management) before getting an MS in Bioinformatics and starting a PhD in Medicine.

For me, the pain points are often the same as in business. Biologists with no data analysis experience want something done without understanding constraints. Requirements are often not understood and there isn’t a good plan.

Some people do indeed suffer from code being slow and this can be solved with better tools. I work with large datasets in single-cell genomics (over a million cells) and the model takes ~12 hrs to train on an entry-level GPU. So, most of my time is spent trying to understand the results.

getoffmycase · 2024-02-12T01:56:09 1707702969

Honestly the major pain point is that the grad student that wrote the package you need is no longer maintaining it because they’ve graduated. Also the code they wrote sucks, but whatever.

I’m wary of software engineers coming over to bioinformatics because they never have the domain expertise required to make meaningful contributions, and yet many think they know everything.

__MatrixMan__ · 2024-02-12T07:14:59 1707722099

Yeah, I'm wary of being that guy too. My current approach is the slow one: first get a biochemistry degree.

life-and-quiet · 2024-02-11T21:52:36 1707688356

Would like to second this question. I'm very interested in getting into this world, but it feels like there isn't a clear path (especially for someone self-taught like me). Bioinformatics feels pretty inaccessible without a computer science or biology degree, even with substantial R and Python experience.

fwip · 2024-02-11T22:14:30 1707689670

There's a few camps in bioinformatics, from what I've seen.

1) The fellows writing papers - usually these guys have PhDs. Usually a science-focused PhD.

2) Analysts - often have a background in mathematics, biology, or big-data. Success here can lead to an onramp to camp 1. Much of your time here is spent in interactive programming environments, like Jupyter notebooks.

3) Programmers - writing novel or faster bioinformatic tools, often in low-level languages like C++ or Rust. Sometimes you can get a paper out of these, especially if you have a CS background. There's increasingly room for higher-level tools here too, though, so it starts to overlap with 2.

4) Pipeline programmers - people gluing analysis workflows together out of the tools written in low-level languages, often with a liberal helping of Unix command-fu. Often sort of an ad-hoc role, containing people from diverse backgrounds, from biology to sysadmin. (This is my current role).

5) Biology/wetlab - people running experiments in the lab who want to analyze their own work, especially for QC purposes. Wild-west ad-hoc development practices.

__MatrixMan__ · 2024-02-12T00:17:43 1707697063

I couldn't speak to careers, but my curiosity was enough for me to ask a biochemist to join his bioinformatics class despite lacking a great many prerequisites.

I was quite helpful to him and the other students (who mostly struggled with packaging: conda, pip, apt, etc). In turn, they were quite patient with my lack of biochemistry background. It was nice to get a taste without having to take what would've been 2.5 years' worth of prerequisites.

f6v · 2024-02-12T11:32:06 1707737526

I think there’s a lot of gatekeeping and having some formal degree is a prerequisite. And be advised that pay isn’t great either.

But bioinformatics is an umbrella term. There’re so many different things people do. I started by identifying the field I’m interested in (ageing and immunology) and backtracked from there.