30 years since the Human Genome Project began – what's next? (wired.com)
124 points by oedmarap on Jan 4, 2021 | 42 comments



Shockingly, the human genome itself has not been fully sequenced, despite the Human Genome Project having been completed years ago [0]. There are difficult-to-map regions of the genome, some of which are interesting. Only recent advances in long-read sequencers have helped to solve some of these issues [1].
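Why repeats are "difficult to map" can be shown with a toy sketch. It assumes a made-up reference containing a tandem repeat and uses plain exact substring search, not a real aligner (real tools such as BWA do approximate matching against an index). A short read that falls entirely inside the repeat matches several places equally well, while a longer read that spans the repeat and its unique flanks has exactly one placement:

```python
def mapping_positions(reference: str, read: str) -> list[int]:
    """Every offset where `read` matches `reference` exactly (toy exact matching)."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


# Made-up reference: unique flanks around six tandem copies of "GATTACA".
left_flank = "CCGTAGGCTA"
right_flank = "TTGCAAGCTC"
reference = left_flank + "GATTACA" * 6 + right_flank

short_read = "GATTACA" * 2                                # lies entirely inside the repeat
long_read = left_flank + "GATTACA" * 6 + right_flank[:4]  # spans the whole repeat plus flanks

print(mapping_positions(reference, short_read))  # [10, 17, 24, 31, 38]: ambiguous
print(mapping_positions(reference, long_read))   # [0]: unique placement
```

The same logic, at the scale of kilobase-long repeats, is why long-read sequencers help close these gaps.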

For the future of sequencing to be truly amazing, the lab prep, chemistry, and equipment it requires need to advance. Oxford Nanopore has made some advances here [2], but there's still a ways to go before a sample can be prepared as easily as for an ultrasound or X-ray.

[0] https://www.statnews.com/2017/06/20/human-genome-not-fully-s...
[1] https://www.ecseq.com/support/ngs/are-there-regions-in-the-g...
[2] http://nanoporetech.com/products/voltrax


The Telomere-to-Telomere consortium has made fantastic progress on this in the last year or two: https://genomeinformatics.github.io/CHM13v1/

Thanks to them we now have a nearly complete genome, missing only the deconvoluted rDNA array segments (~12 Mb or so; we know the sequences since they’re basically identical, but no one has accurately placed the individual array variants yet).


I wonder whether genes are just like "functions" whose inputs come from the environment. That would make it easy to leave the function unchanged most of the time, since the output changes based on the input received from the environment. Only in very rare cases would the function itself need to be modified, or simply wrapped inside another function to give the inner function access to more environmental inputs.


Indeed, DNA controls the conditions of its own expression using sequences called promoters (and other moving parts). Follow your curiosity: https://en.wikipedia.org/wiki/Promoter_(genetics)
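The "gene as a function of its environment" idea can be sketched as a toy model. This is a deliberate oversimplification: the signal names are hypothetical placeholders, expression is modelled as simply on/off, and real regulation is graded and combinatorial (enhancers, many transcription factors, chromatin state). It only shows the same unchanged "function" producing different output for different environmental input:

```python
from dataclasses import dataclass


@dataclass
class Promoter:
    """Toy regulatory region: hypothetical activator and repressor signals."""
    activator: str  # signal that must be present for transcription
    repressor: str  # signal that must be absent for transcription

    def is_active(self, environment: set[str]) -> bool:
        return self.activator in environment and self.repressor not in environment


def expression_level(promoter: Promoter, environment: set[str], max_rate: float = 1.0) -> float:
    # The gene (the "function") never changes; only its input does.
    return max_rate if promoter.is_active(environment) else 0.0


gene_x = Promoter(activator="signal_A", repressor="signal_B")
print(expression_level(gene_x, {"signal_A"}))              # 1.0
print(expression_level(gene_x, {"signal_A", "signal_B"}))  # 0.0
print(expression_level(gene_x, set()))                     # 0.0
```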


Progress in human genomics is often hidden from view. Eric Green gives some great general pointers, but here are more concrete applications:

- NIPT vendors (Natera, Ariosa (Roche), Verinata (Illumina), Sequenom (LabCorp) and others) have successfully commercialized carrier and prenatal screening for severe genetic defects.

- CareDx has commercialized pre- and post-transplant screening for organ rejection.

- Invitae, Color, Foundation (Roche) and others are scaling genetic testing, specifically clinical exome and panel sequencing (and solving the reimbursement problem in the process, by amortizing the cost of sequencing a patient's DNA across multiple physician-requested tests). Because variants need to be understood by physicians, this also enabled public/private data sharing partnerships for variant characterization (ClinVar).

- Grail (Illumina), Natera and others are on the verge of commercializing routine cfDNA cancer screening.

And here are more fundamental improvements which may yield more successes like the above:

- Oxford Nanopore has provided much-needed competition to Illumina in lowering the barrier to portable sequencing and de novo whole genome sequencing.

- The ENCODE project has used ever more sophisticated assays to characterize functional regions of the genome.

- 10x Genomics has scaled and productized new ways to do single-cell transcriptomics, enabling breakthroughs in understanding functions of genes (by mapping differences in their expression across cell types and conditions) as well as development of CAR-T cell therapies (training and selecting the patient's own immune cells to fight cancer).

- UK Biobank has provided privacy-preserving access to a large cohort of annotated genomic data from the NHS, enabling more powerful association studies between genes and disease.
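The single-cell point above (understanding genes by mapping differences in their expression across cell types) can be sketched at its simplest: group cells by type, average each gene, and compare groups with a log2 fold change. The cell labels, gene names and counts are hypothetical, and real single-cell pipelines also normalize for sequencing depth, which this sketch skips:

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: per-cell counts for two made-up genes.
cells = [
    {"type": "T cell", "GENE_A": 9, "GENE_B": 1},
    {"type": "T cell", "GENE_A": 7, "GENE_B": 0},
    {"type": "B cell", "GENE_A": 1, "GENE_B": 8},
    {"type": "B cell", "GENE_A": 0, "GENE_B": 6},
]


def mean_expression_by_type(cells, genes):
    """Average each gene's count within each cell-type label."""
    groups = defaultdict(list)
    for cell in cells:
        groups[cell["type"]].append(cell)
    return {
        cell_type: {g: mean(c[g] for c in members) for g in genes}
        for cell_type, members in groups.items()
    }


def log2_fold_change(profile, gene, type_a, type_b, pseudocount=1.0):
    """Pseudocount avoids dividing by (or taking the log of) zero."""
    a = profile[type_a][gene] + pseudocount
    b = profile[type_b][gene] + pseudocount
    return math.log2(a / b)


profile = mean_expression_by_type(cells, ["GENE_A", "GENE_B"])
print(round(log2_fold_change(profile, "GENE_A", "T cell", "B cell"), 2))  # 2.58
```

A positive value means the gene is higher in the first cell type; here GENE_A marks the (made-up) T cells and GENE_B the B cells.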

One of the bigger balancing acts in this industry is the monopolistic tendency of Illumina, Roche, and a few other big players to buy everyone up. This has clear anti-competitive effects but also incentivizes a lot of startups looking for that exit.


There was an interesting comment made on the recent "Bio Eats World" episode [1] that genetic tests are not aligned with health care reimbursement cycles.

Jorge Conde [0] cited two reasons why whole-genome sequencing (as opposed to 23andMe-style genotyping, for example) hasn't taken off. First, he cited the high upfront cost of buying the machines (which usually means you need to be a big hospital). The second reason was more interesting: doing one expensive test (genome sequencing) that your health care provider could then run cheap queries against ("Which other patients have your symptoms and similar mutations in key genes?", for example) does not align with the current billing model.
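The "one expensive test, many cheap queries" model can be sketched as a data structure: variant calls are produced once, stored, and every later clinical question is just a lookup. Everything here is hypothetical (class names, gene names, positions, alleles); real systems would store VCF files plus annotations rather than in-memory objects:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int   # 1-based position on the chromosome
    ref: str
    alt: str
    gene: str  # gene annotation attached after variant calling


class GenomeStore:
    """One person's variant calls, produced once by the expensive test.
    Every later question is a cheap in-memory lookup."""

    def __init__(self, variants: list[Variant]):
        self._by_gene: dict[str, list[Variant]] = defaultdict(list)
        for v in variants:
            self._by_gene[v.gene].append(v)

    def variants_in(self, gene: str) -> list[Variant]:
        return list(self._by_gene.get(gene, []))


def patients_with_variants_in(cohort: dict[str, GenomeStore], gene: str) -> list[str]:
    """Which stored patients carry any variant in `gene`?"""
    return sorted(pid for pid, store in cohort.items() if store.variants_in(gene))


# Hypothetical cohort: gene names, positions and alleles are made up.
cohort = {
    "patient_1": GenomeStore([Variant("chr1", 1000, "A", "G", "GENE_X")]),
    "patient_2": GenomeStore([Variant("chr2", 5000, "C", "T", "GENE_Y")]),
    "patient_3": GenomeStore([Variant("chr1", 1042, "G", "A", "GENE_X")]),
}
print(patients_with_variants_in(cohort, "GENE_X"))  # ['patient_1', 'patient_3']
```

The billing mismatch is visible in the shape of the code: the cost sits in building each `GenomeStore`, while the queries that deliver value later cost almost nothing and don't map onto a per-test fee.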

I think that's the "what's next": finding a way to effectively bring genomic care to the general population and allow for better research into genetic conditions. If you're interested in how "big" this can be, here's a case where "This American Life" covered how just a few genetic differences made the difference between an Olympic athlete and a muscular dystrophy patient [2].

There's also the bio-terrorism and pandemic response angle, which is why the DoD is investing in third-generation sequencing systems where portability has finally become more of a priority.

[0] https://en.wikipedia.org/wiki/Knome
[1] https://a16z.com/bio-eats-world-podcast/ (about 13 minutes in)
[2] https://www.propublica.org/article/muscular-dystrophy-patien...


There are a handful of sequencing-based genetic tests that have achieved fairly wide penetration; NIPT and cancer panels are probably the best examples.

However, if the question is why whole-genome sequencing has not gained wider clinical usage, I think the answer is less about reimbursement, and more the fact that there are just very few clinically compelling reasons to sequence a whole human genome.

Then there is the fact that although we have ostensibly achieved the fabled "$1000 whole genome" that was touted as the tipping point for clinical acceptance, that was really more of a publicity stunt by Illumina. In reality a clinical-grade WGS still costs a multiple of $1000 (see, e.g., https://bmchealthservres.biomedcentral.com/articles/10.1186/...). The Moore's Law-like cost reductions of genome sequencing are a bit of a myth at the present time; in practice Illumina has a monopoly on the technology used for clinical WGS, and therefore they have a great deal of influence over the effective cost of sequencing.


> very few clinically compelling reasons to sequence a whole human genome.

Isn't that a bit of a self-reinforcing problem, where the reasons to do something can't be developed until the thing has been done enough to find more reasons?


It is. That's why much of the WGS action these days is not in sequencing individuals in the clinic, but rather in large-scale population studies like UK Biobank. These are aimed at identifying interesting genotype-phenotype relationships, for which you need massive sample sizes. This approach probably will bear fruit eventually, but the outcome won't necessarily be that everyone gets their whole genome sequenced; rather, the discoveries will be translated into the clinic via either (1) targeted tests that assay specific variants discovered via the population-level studies, or (2) drugs developed on the basis of gene-disease associations.
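Why "massive sample sizes" matter can be sketched with the simplest association test: a Pearson chi-square on a 2x2 table of allele counts in cases versus controls, compared against the conventional genome-wide significance threshold of 5e-8. The counts below are hypothetical; the second study has identical allele frequencies but 100x the samples:

```python
import math

GENOME_WIDE_SIGNIFICANCE = 5e-8  # conventional threshold for many-SNP scans


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: alt allele at 30% in cases vs 20% in controls.
chi2_small, p_small = allelic_chi_square(60, 140, 40, 160)
chi2_big, p_big = allelic_chi_square(6000, 14000, 4000, 16000)

print(p_small < GENOME_WIDE_SIGNIFICANCE)  # False: same effect, too few samples
print(p_big < GENOME_WIDE_SIGNIFICANCE)    # True
```

The strict threshold exists because hundreds of thousands of variants are tested at once; without it, many "hits" would be statistical noise.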

This chicken-and-egg problem is also the reason why Illumina has invested massive amounts of money in companies like Helix and Grail, which are basically highly speculative attempts to find a problem for which loads of Illumina sequencing is the solution.


If I want to sequence my own genome are there any services that do it already today with any reliability?


I think the best you can manage (with or without having a doctor order the test for you) is whole exome sequencing [1].

1: https://www.verywellhealth.com/whole-exome-sequencing-458268...



A lot of that cost is due to inefficiencies at the institutions doing the sequencing. My current employer spends much more than my previous employer did to sequence & analyze a single sample, and the only substantive difference is the process itself.


One thing that surprisingly hasn't happened in the 30 years since the Human Genome Project is a Nobel Prize for the work. There are examples from large physics projects where the leaders were awarded the prize. Surely Francis Collins and perhaps Craig Venter are deserving.


Francis Collins and Craig Venter certainly got a lot of the publicity, but arguably there are others who were more influential. E.g. Leroy Hood who developed the first automated DNA sequencer, and Mike Hunkapiller who led the commercial development of the technology at Applied Biosystems to the point where it was practical to sequence the whole genome. Hunkapiller was also the impetus behind the formation of Celera, though Venter was its leader.


It's really unclear what's next for the HGP. While the HGP and many other sequencing projects have been invaluable to academic research, and the data is truly useful for a number of diseases, the main result of the HGP, in my mind, is that it made clear to everybody how much harder the genotype->phenotype problem was than the geneticists who set up the HGP anticipated.

Medically speaking, there isn't enough evidence to support the cost of doing WGS for individuals in most circumstances, or even storing large amounts of WGSs to do large-scale population-level analysis.


There are a number of successors to the Human Genome Project, which, with varying degrees of success, have mapped the epigenetic landscape around protein-coding genes.

However, your point is pretty spot on: what's the medical value? Having really high-resolution epigenetic maps doesn't translate into better clinical results, and it's not even clear that these studies are looking at anything but statistically confirmed artifacts!


The HGP definitely had value, but almost all of the "massive" data collection projects in biology since then have been basically useless, except to the consortia who got paid to do them. Leaving medicine aside, no one I know even uses any of these datasets for research, if they're made available in a browsable format at all.

The only exceptions I've seen are the Cancer Cell Line Encyclopedia from the Broad and the C. elegans RNAi projects.


The HGP likely sped up the COVID vaccine by making DNA sequencers and printers cheap. Viral genomes are typically obtained by taking a swab from a human, sequencing it, and mapping the reads to the human genome so that the human reads can be discarded and only viral sequences are left.
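The host-subtraction step can be sketched as a toy k-mer screen: index the host reference's k-mers, then drop any read whose k-mers mostly appear in that index. This swaps in exact k-mer matching for illustration; real pipelines align reads to the full human reference with tools such as BWA or Bowtie2. The sequences, `k=8`, and the 50% cut-off are hypothetical choices:

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def remove_host_reads(reads, host_reference, k=8, max_shared_fraction=0.5):
    """Keep reads whose k-mers are mostly absent from the host reference."""
    host_index = kmers(host_reference, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k cannot be classified; drop it
        shared = len(read_kmers & host_index) / len(read_kmers)
        if shared <= max_shared_fraction:
            kept.append(read)
    return kept


# Made-up "human" reference and two reads from a swab.
host_reference = "ACGTACGTTAGCCATGGCTAGCATCGATCGGATCC"
host_read = host_reference[5:21]          # a read that came from the host
viral_read = "TTTTTGGGGGAAAAACCCCC"       # a read with no host k-mers

print(remove_host_reads([host_read, viral_read], host_reference))
# ['TTTTTGGGGGAAAAACCCCC']
```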


A thousand different technologies were required to make the COVID vaccine. Having the HGP helped, that's for sure. My main complaint is that the HGP leaders basically sold the public and government on a list of accomplishments that never materialized, and yet the underlying data has been invaluable in a wide range of different health problems that they didn't think it would be useful for.


DNA printers, based on a technique known as "solid-phase synthesis", are a different technology that was used by the HGP but not developed or advanced by it.


I remember using these 20 years ago to make RNA (they can make RNA as well as DNA, if you add the right reagents, which IIRC were obscenely expensive). They'd run with various pumps and relays clicking. I got 40 OD of a single RNA duplex, and proceeded to... wash it down the drain, since I was an idiot in the lab.


RNA is a fickle beast!


By "making... cheap", the mechanism I meant was that it drastically increased the market size for genomics and thus ushered in economies of scale for these technologies.


The problem to tackle, IMO, is medical privacy and intellectual property.

We've created a system where the only sane way for a biomedical company to behave is to patent, copyright, and hide as much medical information as possible. Further, we've made it REALLY hard to share such medical data in the first place.

Now, I get WHY we do this. There are a lot of good reasons (privacy and insurance issues come to the top of my mind). However, we should be working to eliminate those reasons as much as possible.

What we need to advance human understanding of medicine is a national database of medical data that's publicly available.

Ideally, we'd record every measurable aspect of a person in this database and update it throughout their life (When did they get vaccinated? What allergies do they have? Did they get any cancers?)

That sort of database would make it really easy to start mining for treatments, correlations, etc. It'd even have some side benefits, like removing the need for every doctor's visit to hand you a form with 50000 questions you've answered a million times before.

But I get why we don't do this. We don't do this for fear of police overly relying on genetic information to "prove" someone committed a crime (We found your DNA at the scene, you must be guilty). We don't do this out of fear of insurance companies exploding their rates when any sort of marker comes up (Oh, you've got Gene XYZ, which means you'll probably die sooner, so you get a higher rate or we won't insure you). We don't do this because of issues around discrimination (Oh, you have (had?) HIV? You must be gay and we hate gay people here).

But, man, do I wish we could somehow shape society so some day we could do it. It'd have the potential for so much good.


A glimmer:

https://ncats.nih.gov/n3c/about

As an optimist, I like to think this could be a wedge in a crack that Covid exposed in our (US) collective medical/insurance dysfunction.

It harmonizes dissimilar records from many institutions under one roof, with levels of access to the datasets for partners (you) who did not "provide" records to the pool.

We are testing the theory. My hope is that before Covid is over, the genie will be out of the bottle and can't be stuffed back in for the profit of a few at the expense of many.

But you folks reading this who are able to apply disparate strategies to reasoning over large complicated data sets...

            PLEASE DO !!!


Nice post, but what does that have to do with the Human Genome Project?


The GP post highlights a "blocker" for where to go after the Human Genome Project: integrating existing traditional data with "personalized medicine", i.e. your genome, at the scale of global statistical significance, vs. entrenched interests and a history of bad behaviour by a few bad apples.

Fantastic rewards, fantastic risks, and inevitable whether you work on it or not.


https://www.humancellatlas.org/ is pretty exciting...


Well, there's also the Human Microbiome Project: https://en.wikipedia.org/wiki/Human_Microbiome_Project

Although if you thought the genotype -> phenotype problem was hard, the microbiome -> phenotype problem is likely several orders of magnitude harder.


Bioinformatics involves a lot of scripting and loosely coupled pipelines, and at the time of the HGP there was no experience in how to do this. Here's the story of how Perl became so popular in bioinformatics and, arguably, "How Perl Saved the Human Genome Project"! https://www.foo.be/docs/tpj/issues/vol1_2/tpj0102-0001.html


This is still true, but things are getting better. I'm a few years removed now, but I had the privilege of administering most of the major sequencing machines and their data output, and it was a great learning experience in how much the scientific community stands to gain from modern tech stacks, imho.

One of my favorite examples: I was doing some FASTQ munging and had written about a page of Perl per some existing documentation. It kept failing, and eventually I just emailed the researcher who wrote the relevant paper... and he said something like "why not use awk, something like $oneliner?" With a few modifications, my huge glut of Perl became a one-line awk...
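The actual awk one-liner isn't given, but a typical bit of FASTQ munging (filtering reads by mean base quality and converting them to FASTA) can be sketched in a few lines. This assumes the common 4-line record layout with no line wrapping and Phred+33 quality encoding; the function names are illustrative, not from any library:

```python
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records (no wrapping)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean base quality, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def fastq_to_fasta(fastq: TextIO, min_mean_quality: float = 20.0) -> str:
    """Convert passing reads to FASTA, dropping low-quality ones."""
    out = []
    for name, seq, qual in read_fastq(fastq):
        if mean_phred(qual) >= min_mean_quality:
            out.append(f">{name}\n{seq}\n")
    return "".join(out)


example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n####\n")
print(fastq_to_fasta(example), end="")  # keeps read1 (Q40), drops read2 (Q2)
```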

Of course, that just means I'm bad at Perl, but for some reason it sticks in the mind. I still get invited to bioinformatics conferences... too bad I didn't take them up on it pre-pandemic.


We need to leverage our current knowledge to deliver actionable results to regular people. Data generation is pretty reasonable from the standpoint of time and cost. The open problems are data storage, who owns the data, who has rights to it, and how to deliver insights from it to regular providers (e.g. family practitioners) who aren't deeply knowledgeable about genomics and its interpretation.


> We need to leverage our current knowledge to deliver actionable results to regular people.

Sure, Gavin Belson, what are our OKRs for that?


I know very little about this. Is it possible to get my sequence from one of the commercial groups doing this, and then "look it up" in a genome reference to verify my brown eyes, for example?

I acknowledge the lack of knowledge leading to the question. But it would be neat if possible.


> Is it possible to get my sequence from one of the commercial groups doing this, and then "look it up" in a genome reference to verify my brown eyes, for example?

Yes and no.

Brown eyes aren't controlled by a single gene, so it's not straightforward to look this up from your genome profile.

What genome profiling companies will do is assay a finite number of specific locations on your DNA (SNPs), to get good comparison points with other people, and then predict what colour your eyes are based on which SNPs are significant for eye colour.

It's not guaranteed to be right, then, nor is it guaranteed that the significant SNPs are linked to the relevant genes.
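The "predict from significant SNPs" step can be sketched as a toy logistic score. Published eye-colour predictors such as IrisPlex use a small SNP panel with a multinomial logistic model fitted on reference cohorts; the SNP ids, weights, and intercept below are entirely hypothetical and only show the shape of the calculation. The output is a probability, not a certainty, which is exactly why the prediction can be wrong:

```python
import math

# Hypothetical weights: effect on the log-odds of brown eyes per copy of each
# SNP's effect allele. Real models fit these on a labelled reference cohort.
WEIGHTS = {"snp_1": 1.8, "snp_2": 0.6, "snp_3": -0.4}
INTERCEPT = -2.0


def p_brown(genotype: dict[str, int]) -> float:
    """genotype maps SNP id -> number of effect alleles (0, 1 or 2).
    As a simplification, SNPs missing from the genotype are counted as 0."""
    score = INTERCEPT + sum(w * genotype.get(snp, 0) for snp, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-score))


print(round(p_brown({"snp_1": 2, "snp_2": 1, "snp_3": 0}), 2))  # 0.9
print(round(p_brown({"snp_1": 0, "snp_2": 0, "snp_3": 1}), 2))  # 0.08
```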


> What genome profiling companies will do is assay a finite number of specific locations on your DNA (SNPs)

Companies such as 23andMe will read about 900,000 single-letter locations for about $100. But other companies will sequence your whole genome for about $1000, maybe less.
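To put those two figures side by side, here is a quick back-of-the-envelope calculation. It assumes a haploid human genome of roughly 3.1 billion base pairs and takes the $100 and $1000 prices from the comment above. Per-position cost is not a perfect comparison (arrays pick informative sites, while WGS reads every base many times over), but it shows the scale of the difference:

```python
GENOME_SIZE_BP = 3.1e9   # approximate haploid human genome length (assumption)
ARRAY_SITES = 900_000    # genotyping-array sites, from the comment above
ARRAY_PRICE = 100.0      # USD, from the comment above
WGS_PRICE = 1000.0       # USD, from the comment above

fraction_assayed = ARRAY_SITES / GENOME_SIZE_BP
cost_per_array_site = ARRAY_PRICE / ARRAY_SITES
cost_per_wgs_position = WGS_PRICE / GENOME_SIZE_BP

print(f"array covers {fraction_assayed:.4%} of positions")  # 0.0290%
print(f"per-position cost ratio: {cost_per_array_site / cost_per_wgs_position:.0f}x")
```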

https://en.wikipedia.org/wiki/Personal_genomics#Cost_of_sequ...


The most recent article I can find suggests this works better for people of European, and specifically Dutch, descent, which is where the training data comes from.

https://bmcgenomics.biomedcentral.com/articles/10.1186/s1286...


They could work on tackling the epigenetics of human cells, along with the differences in the genome between cells in different organs. It's probably as big as the original project.


No mention of computed protein folding?


That is because the article is about genomics, not molecular biology in general.




