
I've worked in the area of clinical genomics using whole genome sequencing. Your statement is unfortunately untrue, though it perhaps could be true if tools were better written.

While it is easy enough to analyze a single genome on your laptop, most current popular analytical tools simply fall over when you start looking at hundreds of genomes, even on a large server. Even basic steps like combining multiple genomes into one file with consistent naming of variants can take entirely ridiculous multi-terabyte amounts of RAM, because the tools to do so just weren't written with this scale in mind.

Most of these tools could (and should) be rewritten to do things without loading the whole data set into memory and to work natively on a cluster of commodity machines. There is some resistance to this, of course, because scientists prefer to use published and established methods and often feel new methods need to be published and peer reviewed, etc.
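
To make the "no need to load everything into memory" point concrete, here is a minimal sketch of a streaming k-way merge of per-sample variant files. The flat (chrom, pos, ref, alt, genotype) record layout is an assumption for illustration only - real tools like bcftools handle vastly more detail - but the memory profile is the point: it grows with the number of files, not the number of variants.

    import heapq

    def read_records(path, sample_idx):
        """Yield (chrom, pos, ref, alt, sample_idx, genotype) tuples from a sorted,
        tab-separated variant file. Only one line is held in memory at a time."""
        with open(path) as fh:
            for line in fh:
                chrom, pos, ref, alt, gt = line.rstrip("\n").split("\t")
                yield (chrom, int(pos), ref, alt, sample_idx, gt)

    def format_row(site, genotypes, n_samples, missing):
        """One output line: the site columns plus one genotype column per sample."""
        chrom, pos, ref, alt = site
        row = [missing] * n_samples
        for idx, gt in genotypes:
            row[idx] = gt
        return "\t".join([chrom, str(pos), ref, alt] + row) + "\n"

    def merge_samples(paths, out_path, missing="./."):
        """K-way merge of per-sample files into one multi-sample table.
        Memory use grows with the number of files, not the total variant count.
        Assumes every input file is sorted in the same (chrom, pos) order."""
        streams = [read_records(p, i) for i, p in enumerate(paths)]
        with open(out_path, "w") as out:
            current_site, genotypes = None, []
            for chrom, pos, ref, alt, idx, gt in heapq.merge(*streams):
                site = (chrom, pos, ref, alt)
                if site != current_site:
                    if current_site is not None:
                        out.write(format_row(current_site, genotypes, len(paths), missing))
                    current_site, genotypes = site, []
                genotypes.append((idx, gt))
            if current_site is not None:
                out.write(format_row(current_site, genotypes, len(paths), missing))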

Until new tools are written and widely adopted, a large shared-memory machine is a bandaid many hospitals and research groups seem eager to adopt.




Yes indeed. And new tools are being written - see the Adam project for an interesting example: https://github.com/bigdatagenomics/adam and the associated variant caller Avocado: https://github.com/bigdatagenomics/avocado. Others are also trying to get the old tools working on Hadoop, for instance Halvade: https://github.com/ddcap/halvade/wiki/Halvade-Manual, Hadoop-BAM https://github.com/HadoopGenomics/Hadoop-BAM, SeqPig: http://seqpig.sourceforge.net/, and the guys at BioBankCloud: https://github.com/biobankcloud. It's going to take quite a while for this stuff to get fleshed out, and for researchers to adopt it. But the sheer weight of data is going to force things in the Hadoop direction eventually. It is inevitable.
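
None of this is any of those projects' actual APIs, but as a flavour of what "cluster-native" genomics code looks like, here is a minimal PySpark sketch (Adam is built on Spark) that counts distinct variant sites per chromosome across a cohort. The HDFS path and tab-separated file layout are made-up assumptions.

    from pyspark import SparkContext

    # Hypothetical layout: a directory of tab-separated variant records
    # (chrom, pos, ref, alt, sample) spread across HDFS -- not any real cohort.
    sc = SparkContext(appName="variant-counts")
    lines = sc.textFile("hdfs:///cohort/variants/*.tsv")

    per_chrom = (
        lines
        .map(lambda line: line.split("\t"))
        .map(lambda f: ((f[0], f[1], f[2], f[3]), 1))  # key each record by its site
        .reduceByKey(lambda a, b: a + b)               # samples carrying each site
        .map(lambda kv: (kv[0][0], 1))                 # re-key by chromosome
        .reduceByKey(lambda a, b: a + b)               # distinct sites per chromosome
    )

    for chrom, n in per_chrom.collect():
        print(chrom, n)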


What hard evidence is there that genomics is relevant to cancer treatment, as proven by survival rates? Color me skeptical.


You are right to say this. I treat breast cancer, and I'm doing a PhD on breast cancer genomics, and there is no evidence that high throughput data of any kind, whether it is genomics, transcriptomics, epigenomics, proteomics, metabolomics etc-omics actually helps patients. At the moment, a small panel of biomarkers using technology that is at least 20 years old is all we use to make treatment decisions. Is it adequate? Certainly not, but there is a HUGE amount of carefully collected data in many thousands of patients backing it up.

Not sure who is downvoting you, but they seem to have swallowed the hype wholesale. At the risk of sounding gratuitously negative, I find the discussion of medicine on HN to be of very poor quality, markedly below the general standard.


I think there is a distinction to be made here in questioning patient outcomes and questioning the relevance of genomic sequencing in treatment decisions.

Don't you think it is fair to say that high throughput data (whole genome sequencing with variant calling) is still being evaluated for its effectiveness in aiding the treatment decision process, but that early results seem to lean towards it becoming part of the standard diagnostic approach?

Genomic sequencing and patient outcomes is a thornier question. My non-practitioner take is that it is too early to tell scientifically, but that there will probably be some benefit to early identification of specific cancer types and choosing treatment. But I think many people would have made a similar statement about mammography and early detection, and absolute mortality appears to not be reduced by adding mammography to the diagnostic procedures, right?

The research value of genomic sequencing seems high enough to make it worthwhile. At least, when I sit in on molecular tumor board reviews (the oncologists at a table looking at called variant results for a specific patient), I hear them commenting about possibly new and unknown variants being of research value.

I am really looking forward to your reply - Internet message boards in general have to be almost the worst way to discuss medicine, but having participation from researchers and practitioners like you is tremendously illuminating!


I define genomics as the unbiased interrogation of the genome using high throughput technology. Sequencing one mutant locus using Sanger sequencing does not fall under this definition - I don't think IBM's business model is using Watson to interpret that. So when other people point out that HER2 is a useful genomic marker they are missing the point - HER2 can be determined with immunohistochemistry, for example, which has been around for 50 years.

I'm not sure what your question is... genomics has research value, for sure, it's great. Is it worth trying to incorporate it into routine care? Yes, probably, if you have enough cash. Should a hospital pay for a black box machine learning algorithm to make recommendations from a highly polluted, often erroneous and hugely incomplete literature corpus? The alternative put forward by people actually doing the science is that we should try and develop large open source databases/repositories about the significance of genomic findings, and then collect the data about what happens to the patients.


Hmmm? Genomic breast cancer subtypes that each respond to different chemotherapies?

http://www.nature.com/nature/journal/v490/n7418/full/nature1...


That doesn't really say anything about proven treatment efficacy.


Some information is hidden away in supplemental table 6, which points out candidate drugs to affect different biological pathways for different mutations.

You could also skim

http://www.nature.com/nature/journal/v406/n6797/full/406747a...

for more information about genomic classification of breast cancer.

From a treatment perspective, I would say that just glancing at http://ww5.komen.org/BreastCancer/SubtypesofBreastCancer.htm...

would provide information on treatment decisions generally made by finding appropriate subtype classifications.

I think that it is pretty clear that genomic sequencing of patient normal and tumor tissue to find mutations is going to be standard-of-care sooner rather than later, but it is fair to point out that genomic sequencing is not currently standard-of-care. However, I know of studies currently underway that look at variant calls and the possibility of taking action on those calls in ways that involve specifically adding those results back into the patient medical record.

I am struggling a bit with how to phrase this, but I don't think you can argue against (1) different subtypes of breast cancer are separate diseases and can be classified by genomic sequencing and (2) treatments for these separate diseases are different and have different efficacies.


Cancer is a disease of genetics. You need to know what kind of cancer someone has in order to choose the treatment.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4032883/


Look for BRCA1 and BRCA2 genetic testing and therapy.


Herceptin. Trastuzumab inhibits the effects of overexpression of HER2. If the breast cancer doesn't overexpress HER2, trastuzumab will have no beneficial effect (and may cause harm).

The original studies of trastuzumab showed that it improved overall survival in late-stage (metastatic) breast cancer from 20.3 to 25.1 months.[1] In early-stage breast cancer, it reduces the risk of cancer returning after surgery by an absolute risk of 9.5%, and the risk of death by an absolute risk of 3%; however, it increases serious heart problems by an absolute risk of 2.1%, which may resolve if treatment is stopped.[2]


Oh look! Watson can answer that question: https://youtu.be/UFF9bI6e29U?t=2670

(ok, it's not exactly that question, but you can see how it works)


Multiple terabytes of RAM?

If it were any other field of computer science, people would be really critical of your methodology.

The size of the human genome is 21 MB.

If you are trying to find the co-ordinates of every cancer cell in a human body then sure, you need a lot of RAM.

But the output of the collective field of cancer research doesn't seem to be there yet. So why do you need so much RAM ?

Usually when your problem becomes NP-hard, you switch to simpler models. Have you checked the search space for all simpler models? Or are you sticking to complex models since they help you publish papers?

You also need to understand that hardware only gets you so far; running a cluster has its own costs, such as network latency. More often than not, better techniques are required, rather than saying that the tremendous improvement in computational power is not good enough.


> The size of the human genome is 21 MB.

No.

> In the real world, right off the genome sequencer: ~200 gigabytes

> As a variant file, with just the list of mutations: ~125 megabytes

> What this means is that we’d all better brace ourselves for a major flood of genomic data. The 1000 genomes project data, for example, is now available in the AWS cloud and consists of >200 terabytes for the 1700 participants. As the cost of whole genome sequencing continues to drop, bigger and bigger sequencing studies are being rolled out. Just think about the storage requirements of this 10K Autism Genome project, or the UK’s 100k Genome project….. or even.. gasp.. this Million Human Genomes project. The computational demands are staggering, and the big question is: Can data analysis keep up, and what will we learn from this flood of A’s, T’s, G’s and C’s….?

https://medium.com/precision-medicine/how-big-is-the-human-g...
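
A rough back-of-the-envelope, with illustrative assumptions only (about 40x coverage, uncompressed text, roughly 4.5 million variants per genome), shows why the raw output is hundreds of gigabytes while the variant list is only megabytes:

    # Illustrative numbers, not measurements.
    genome_bp = 3.1e9          # haploid human genome, base pairs
    coverage = 40              # typical whole-genome sequencing depth
    bytes_per_base = 2         # ~1 byte base call + ~1 byte quality score, uncompressed

    raw_reads = genome_bp * coverage * bytes_per_base
    print(f"raw reads: ~{raw_reads / 1e9:.0f} GB")   # roughly the '~200 gigabytes' quoted above

    variants = 4.5e6           # typical variant count per genome vs. the reference
    bytes_per_variant = 30     # chrom, pos, ref, alt, genotype as text
    print(f"variant file: ~{variants * bytes_per_variant / 1e6:.0f} MB")   # on the order of 100+ MB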


Also, the world of genomics has done fantastic work on compression, and if you can compress it further you will probably win a decent award with a ceremony and free booze.


Scientific computing requires a lot of memory, and a lot of computer time. I think it's fair to say that the underlying libraries (LAPACK, ScaLAPACK, Intel's MKL) are the most intensively optimised code in the world. Most of the non-trivial algorithms are polynomial in both time and memory.

I suspect this press release is hinting at a next-generation (cheap, fast) DNA sequencing method. These are derived from shotgun sequencing methods, where hundreds of gigabytes of random base-pair sequences are reassembled into a coherent genome. The next-generation methods realise cost savings with an even lossier approach that reads smaller fragments of the genome, at much greater computational cost to reassemble.
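
For a sense of why reassembly is computationally demanding, here is a toy greedy overlap assembler - purely illustrative, since real assemblers use de Bruijn graphs and clever indexing - and even this naive version does a quadratic number of comparisons per merge.

    def overlap(a, b, min_len=3):
        """Length of the longest suffix of a that matches a prefix of b."""
        best = 0
        for k in range(min_len, min(len(a), len(b)) + 1):
            if a[-k:] == b[:k]:
                best = k
        return best

    def greedy_assemble(reads):
        """Repeatedly merge the pair of reads with the largest overlap.
        O(n^2) comparisons per round -- a hint of why real assemblers
        need smarter data structures and a lot of memory."""
        reads = list(reads)
        while len(reads) > 1:
            best = (0, 0, 1)  # (overlap_len, i, j)
            for i in range(len(reads)):
                for j in range(len(reads)):
                    if i == j:
                        continue
                    o = overlap(reads[i], reads[j])
                    if o > best[0]:
                        best = (o, i, j)
            o, i, j = best
            if o == 0:
                break  # no overlaps left; remaining contigs stay separate
            merged = reads[i] + reads[j][o:]
            reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
        return reads

    # Four overlapping fragments reassemble into one contig.
    print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))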



