There's got to be a lot more to the story that I don't understand.
Why wouldn't it have been obvious 16 years ago that a thoughtfully designed data model was necessary, possibly using graphs, to account for variability and other attributes of the genome?
Surely it was foreseeable that tooling would be crucial and that a solid software foundation would be invaluable to enable efficient and flexible processing for years to come?
Who were the lead developers supporting the original public genome project and what were they thinking?
A lot of the same people who worked on the first reference genome are working on these algorithms, or have mentored (or are mentoring) the scientists in this article. I would say there's a lot more to the analysis of genomes than you expect; there are many different comparisons that make sense. Initially the most informative comparisons were to other species, where a genome graph made less sense given the amount of data and compute available to bioinformatics scientists at the time.
The rate of technology change for sequencing capabilities in the past 16 years makes Moore's law look like the rate of change in battery technology.
16 years ago I didn't think it would be possible to sequence individuals in the clinic before 2050 or so. Now we are building the technology to analyze the variation in a million genomes.
I guess your question is kind of like, "why didn't computer scientists build systems like Kubernetes or Mesos in the 80s?" The problems and challenges were just different 16 years ago, and there was more than enough to work on between then and now. We don't need genome graphs at this instant, but we will in the coming years. And it's likely that newer representations will come about, too, as more math and theory is invented.
It isn't just an issue of coming up with a good representation of the data. Actually doing something with the graph, like answering the query "is this string a path through the graph?", is hard to do at the necessary scale (you might make a billion such queries after sequencing a genome). The classic string indexing approaches (suffix arrays, the FM-index, etc.) don't easily generalize to graphs, and this is a very active research topic.
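To make that query concrete, here's a minimal sketch (not from the original comment) of the naive version: a toy sequence graph where each node carries a DNA label, and a recursive search that checks whether a query string can be spelled by walking some path. The node names, graph layout, and function names are illustrative assumptions; real tools rely on indexed structures (e.g. graph extensions of the FM-index) rather than this brute-force search, which blows up on graphs with many variant sites.

```python
from typing import Dict, List

# Toy sequence graph: each node carries a DNA label; edges point to successors.
# n2 and n3 represent two alternative alleles at a variant site.
NODES: Dict[str, str] = {
    "n1": "ACGT",
    "n2": "GG",
    "n3": "G",
    "n4": "TTAC",
}
EDGES: Dict[str, List[str]] = {
    "n1": ["n2", "n3"],
    "n2": ["n4"],
    "n3": ["n4"],
    "n4": [],
}

def spells_path(query: str) -> bool:
    """Return True if `query` can be spelled by walking some path in the graph,
    starting at any offset inside any node."""
    def walk(node: str, offset: int, remaining: str) -> bool:
        label = NODES[node][offset:]
        n = min(len(label), len(remaining))
        if label[:n] != remaining[:n]:
            return False
        remaining = remaining[n:]
        if not remaining:
            return True
        # Query continues past the end of this node: try every successor.
        return any(walk(nxt, 0, remaining) for nxt in EDGES[node])

    return any(
        walk(node, off, query)
        for node, label in NODES.items()
        for off in range(len(label))
    )

if __name__ == "__main__":
    print(spells_path("CGTGGTT"))  # True:  n1 -> n2 -> n4
    print(spells_path("CGTGTT"))   # True:  n1 -> n3 -> n4 (other allele)
    print(spells_path("CGTAAT"))   # False: no path spells this string
```

The point of the sketch is the branching: every variant site multiplies the number of candidate paths, which is exactly why the linear-reference index tricks don't carry over for free.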
They were extremely limited in the resources they could bring to bear on the genome.
Sequencing was expensive and complicated, so even by sequencing several people the best they could do was make one composite image of the whole genome. Representing things as a graph would have provided no benefit at the time, and so people didn't do it.
Now we have thousands of public sequenced genomes. We need a new model for how we manage genomes. We are also learning how much is missed in the standard linear model of the genome, and need a way to incorporate new information into the reference. This all takes time.
Graphs are a good way of encoding our prior knowledge of genomes, but they are also difficult for researchers who have grown up on linear representations to understand.
Simplicity first. How do you start thinking about a complicated data model when collecting one sample costs nearly 100 million dollars? A good data model develops when you know what questions you want to ask of the data.
Looking back and expecting otherwise is probably a case of hindsight bias. Also worth emphasising is that the technology (and methodology) is moving very fast.
There are some sophisticated answers being given, but the simplest answer is that everything is obvious once it's been explained. It's a good principle to remember.
There has always been a fight in genomics between bench-types and computational biologists. Traditionally molecular biology yielded very binary yes/no answers requiring little mathematical analysis -- even when I was in grad school in the 1990s, I had a professor say in all seriousness "If you need statistics to understand the results of your experiment, you did the wrong experiment". Things are changing and many of the current generation of grad students are becoming hybrid bench/computational biologists, which is a good thing.