Hacker News
$200 retail whole genome sequencing now available (dantelabs.com)
197 points by apsec112 on Nov 23, 2018 | 166 comments



These guys look and act like a pyramid scheme, as pretty much nobody has received their DNA data. They don't have much info about the company. Even if people get data, the huge delay justifies my suspicion - the new victims pay for the difference, and this will run until there are no more stupid people who are willing to pay Dante Labs despite all the negative reviews.


There seem to be quite a few reviews showing people who received their data (with significant delay).

More worrying is that they sent out used kits to a number of customers!

https://www.cnbc.com/2018/06/15/dante-labs-dna-testing-compa...


Nothing about many delayed reviews is inconsistent with a pyramid scheme. The first 1/10 or 1/100 of customers might be a lot of people.

But would reviewers detect if the results were not really theirs and just from someone that matched on a few data points either from their forms or from a cheaper test?


I guess only if they were really neglectful, e.g. giving an Asian guy the DNA of a Caucasian man.


I posted a review on Amazon, which got removed. I said that I'm going to compare my Genos exome to what they give me.


I'm that guy from the article and I still have not received any data.


My kit arrived without a return label. This further delayed processing: I needed to contact them, they are slow to respond, and it added 2 extra weeks.


Pedantry: That's more of a Ponzi scheme, pyramid would be if the buyers got more people to sign up so they could get their DNA results or money or whatever.


Sorry, I forgot about the Ponzi scheme - in my country, all these are called pyramid schemes.


They do take longer than they claim, but I eventually got my info: summary information after about 3.5 months and full data on an SSD after about 5 months (measured from purchase date).


Yeah, first lucky ones will receive it. Also, this could be a non-standard Ponzi. For example, they start the company with zero money, they wait for more people to order so that they can buy new equipment, i.e. it's not an honest delay, but engineered delay to use interest-free funding.


Hello, Andrea from Dante Labs here. As stated, we have had issues. Please see below for real user stories: https://us.dantelabs.com/blogs/patient-stories/why-i-recomme...

You can find more reviews on our Facebook page


As others have said, the real issue is how they get all the way down to $200. Sale of personal info?

Maybe when we get a clear answer, I'll consider checking my genome. But not before.


The privacy policy [0] suggests they're not selling personal information, and they make it easy for you to protect your information too. They do keep it for internal analyses and (maybe?) marketing purposes, but suggest that you can just ask them not to, and they won't.

> You may withdraw your consent to participate in Dante Labs Research at any time by contacting Dante Labs at the email address: contact@dantelabs.com. Dante Labs will not include your Genetic Information or Self-Reported Information in studies that start more than 30 days after you withdraw (it may take up to 30 days to withdraw your information after you withdraw your consent). Any research involving your data that has already been performed or published prior to your withdrawal from Dante Labs Research will not be reversed, undone, or withdrawn.

I'm not quite sure what to think.

[0] https://us.dantelabs.com/pages/privacy-policy


Also from their privacy policy:

> Dante Labs will not sell, lease, or rent your individual-level information to any third party or to a third party for research purposes without your explicit consent.

So they will sell your data unless you specifically look for, find, and check the opt out. What happens to your data after that is out of Dante Labs' control, so any other assurances they make are irrelevant.


Are you just assuming there's a deceptive opt-out? They explicitly say they're GDPR compliant and will not share your info without explicit consent. It's a strong claim that they're actually lying and non-compliant, without any evidence.

My assumption, since this promo is for a few days only, is that it's like any sale; they're doing it to build their brand and attract new customers.


Without reading the policy, I'd bet money there is a clause saying one or more of:

1 - in the event of a sale, your data may be among the assets transferred

2 - this policy is updated from time to time and comes into effect at the time of posting, and maybe we'll notify you ahead of time


According to GDPR, in case 1 they can transfer the data to the acquiring entity only if they are GDPR compliant too, and in case 2 all data sharing must be opt-in, not opt-out, so they can’t increase the scope of sharing without notice and affirmative consent.


Doesn't that allow them to sell it for purposes _other than research purposes_?

Could they still sell it as marketing data?


Rule of PPs: The longer the document the less protection it offers.


Doesn't "explicit consent" imply opt-in?


Note that this doesn't prevent law enforcement from seizing the information, or them auctioning off to the highest bidder when they're being liquidated.


> or them auctioning off to the highest bidder when they're being liquidated.

The GDPR does, actually. The data can only be transferred to a GDPR-compliant company, and the user must agree to the transfer.


They use sequencing tech from Complete Genomics[0], which in turn is owned by China-based BGI. The tech is the brainchild of Dr Drmanac[1]. Note that the $200 price tag is for the kind of whole genome sequencing that has lots (i.e. 1 error per million base pairs) of alpha and beta errors. A "perfect" sequence is waaaay more expensive.

[0]: http://www.completegenomics.com/ [1]: http://www.rdrmanac.com/


Interestingly they say Q20 >90%, Q30 >80%, on 100bp reads.

That’s much higher than I’ve seen reported for prior Complete/BGI datasets. Recently lower error rates have been reported, but I’m surprised to see them this low.

If this is true, it’s comparable to the error rates on Illumina’s older instruments (the GA1s and 2s at least, if not HiSeqs).

Q20 is one error in 100. Q30, one error in 1000. I think Illumina average error rate (Novaseq) is >Q30 now.
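
For reference, the Q-scores quoted here follow the standard Phred definition, Q = -10 * log10(p), where p is the probability that a base call is wrong. A minimal sketch of the conversion (the function names are made up for illustration):

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Phred quality corresponding to a per-base error probability p."""
    return -10 * math.log10(p)

for q in (20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: about 1 error in {round(1 / p)} bases")
```

So "Q30 >80%" reads as: more than 80% of bases have an estimated error probability of at most 1 in 1000.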

Veritas Genetics were also offering a 200USD genome a while back. I think that was Illumina sequencing, but it was a very limited time offer.


> Veritas Genetics were also offering a 200USD genome a while back. I think that was Illumina sequencing, but it was a very limited time offer.

Yes, that was on Monday Nov 19th. Veritas offered 1,000 genomes (Illumina sequencing) at USD 200. They sold out in 6 hours.

* Disclaimer: I work for Veritas.


Why would it not be possible? Considering how quickly the price has been falling [0], I would assume the price to be fairly close to $200-$300 by this point. Some are even targeting $100 soon [1]. It's also worth noting that this is an extreme sale, so they may be running at a loss or no profit, mostly for brand building. I definitely hadn't heard of them before today.

[0] https://www.genome.gov/27541954/dna-sequencing-costs-data/

[1] https://techcrunch.com/2017/01/10/illumina-wants-to-sequence...


Yeah... it unfortunately doesn't work like that. Illumina is notorious for glossing over many of the finer points around how they get to the fictitious numbers they claim. What they don't point out are all of the reagents, prep kits, overhead, lab costs, labor costs, software costs, pipettes, etc.

The only way this company stays alive without significant venture funding (which it appears they don't have) is through partnering with pharmas. From reading their privacy policy, it looks like they are de-identifying genotypic and phenotypic data from users. This, at scale, creates a nice population based genetics study so that if I'm a pharma, I can look at a specific biomarker and all data associated with it. That helps me build drugs against certain genes, diseases, etc.

So yes, you are still the product and yes, your data is being resold.


If data is anonymized properly, aren't we just enabling the creation of better, more effective medicine?


Define "properly".

At one point, we thought that simply replacing a user's IP with a unique ID was enough to anonymise.

At one point, a blur or pixelating filter on a photo was sufficient to anonymise it.

At one point, Monero transactions were considered anonymous.


There is nothing that identifies a person more precisely than their genome. It is, after all, unique to each person.

That means that by definition, anonymizing genomes is impossible. If someone tells you they will 'anonymize' your genome data, run away, they don't know what they are doing.


You can't anonymize your DNA.


Illumina’s margins on consumables are very high:

http://41j.com/blog/2018/11/illumina-consumables-are-90-prof...

Their overall gross margin is >60%. I suspect that instrument margins are actually also quite high, but they factor in R&D spending on instrument costs.

However, Dante labs appear to be using BGI, not Illumina for sequencing.


They’re using BGI machines, not Illumina


They don't have their own labs, based on how they phrased it within their About Us section. Given that + the turnaround time I'm seeing on the web, I'm willing to bet they outsource this to universities who have sequencer downtime.

Also, in their About Us section they have every technology listed under the sun (Illumina, PacBio, Thermo, BGI) so I think it's just whatever they can get their hands on based on which labs have open runs. Seems quite questionable from a data integrity perspective if you're trying to do population based genetics. The quality between sequencers varies dramatically.


Given the error rates shown for this product it seems unlikely this is anything but BGI sequencing.

The BGI (MGI) have been pushing out new instruments, reportedly with lower error rates (and much cheaper than Illumina).

They’ve also been building out service centers outside China, and I suspect this is where the sequencing being done by this service is located.

Universities do sometimes offer sequencing as a service, but I suspect not at as low a price point as the BGI, even for those universities with attached genome centers.


Good catch on the error rates. Didn't see that.

We go up against universities all the time in our competitive deals. They're generally cheaper since everything is subsidized; however, the turnaround times are usually significant and the quality is not entirely repeatable. The cost here looks heavily subsidized just as a way to spark interest.


Real numbers for Illumina sequencing currently hover at around 600 USD per 30x coverage human WGS (and this is pure sequencing cost, excluding data storage and analysis). Prices are not dropping since Illumina has no real incentive at the moment. That said, Dante Labs is using a competitor (BGI) who claims to be doing cheaper sequencing (at reduced quality).

Even so, Dante is currently definitely operating at a loss. That’s not a problem though since their business model probably isn’t really consumer genetics. Likely, their actual business model (like that of every major player in the field) is gathering large panels of sequence data to either analyse themselves (and then sell the insight to pharmaceutical companies) or to sell (pseudonymised or aggregate) access to companies. So, in reality, they are preparing large-scale biomedical studies. And not only are they not paying study participants (their ostensible “customers”), they are actually getting the study participants to pay for their participation.

(Note that there’s nothing necessarily nefarious about this whole business. It just explains why they can offer their product to consumers “at a loss”. In reality it’s simply an investment.)


>As others have said, the real issue is how they get all the way down to $200.

Wasn't Dubai trying to collect DNA on every citizen and visitor to their country?

Not saying this company is in any way connected to Dubai, but there are any number of reasons why a company might want to amass a large sampling of DNA and, although not a lawyer, I can think of some plausible ways to use that data too.

If they burn through any VC money they have amassing a database, then file bankruptcy or sell off their assets to satisfy debts before dissolving... "hey other company, that's totally not us or our investors, we gotta pay our taxes so we can dissolve, wanna buy some genomes wink wink?"


It's a short term promotion to build interest; I doubt they're expecting to make a profit at $200.


This is a huge red flag for me. Maybe if you made your own bootleg flow cells and reagents? But even that seems like a stretch.


They’re not. They are using BGI’s MGISEQ2000 machines. They simply sell the sequencing at a price that doesn’t cover its cost. Both because this is a promotion and because they retain the rights to using your genomic data in research: it’s simply a cheap way for them to get samples.


DIY microfluidics is possible with shrinky-dinks circa 2014; but for-real genomics is a little harder TBH


Is there anything more to it than carving a piece of silica and ligating the adapters to the surface? I mean, I know the process is a bit more involved than that but the patents make it seem relatively straightforward.


It's straightforward in a conceptual sense, but as with most things there are a ton of gotchas and fiddly implementation details, especially if you're doing it in your kitchen/garage. A lot of the difficulty comes with the tedium of maintaining sterile conditions and lots and lots of pipetting.

If you're really interested in DIYBIO there are subreddits and many tuts out there, but fair warning: getting started is easy, but the devil is in the details a heck of a lot more than learning to code.


If I recall, some friends signed up with Dante and sent in their sample more than six months ago and never heard anything more.


There is also some feedback in the Reddit thread [1] and it is mixed, though someone claims they got a full refund.

[1] https://www.reddit.com/r/promethease/comments/9866da/dante_l...


I passed the 10-week mark and still nothing. If I don't get my data in 2 more weeks, I'm contacting Amazon and FTC.


This should really be the top comment, and it's amazing that after 6 hours it isn't. Who cares about privacy if the whole thing is a simple scam?


While I agree that this is slightly worrying, in mitigation they are using BGI for the sequencing, and BGI are known to be extraordinarily busy right now. Our lab sent off 150 genomes to be sequenced by them, and they delayed by quite a few months. Hopefully that should be sorted out soon. The data we got back was very good quality.


Their ad copy is also riddled with errors. It's not exactly confidence inspiring.

>Receive actionable insights based on solid genetic and clinical evidence for your and your doctor.


As long as the A, C, G and Ts are in the right order.


Interesting to think about how you can ensure your anonymity and still take advantage of a cheap genome!!

Some preliminary ideas:

- misidentify yourself (eg get a few friends and submit each other's samples)

- mix your DNA with a mammalian organism (something like a mouse). It will get flagged as a bad sample by them probably, due to poor alignment metrics and too many variants, but with the knowledge of which DNA you put in you can do a dual alignment to eg. the mouse and human genome and purify the reads that came from you (there are published methods to do this kind of thing). Not sure how you would get an equimolar mix of mouse and your own DNA, but for $200 you could submit multiple ratios I guess.

- enzymatically modify your DNA. There are some enzymes which cause characteristic mutations. They will turn up as real variants in their pipeline, but you can use the sequence context to filter them out. Kind of like chemical code obfuscation. This would mean extracting your own DNA and treating it, probably not that feasible.


1) yes 2) no 3) no

The first one is the best because if enough people did it, it would create enough doubt in the accuracy of the stored datasets that they wouldn’t be worth anything to anyone except the person who owns the DNA. The problem is, as this becomes more popular more people related to you are going to get it done, meaning they will be able to automatically fit you into an identity based on who else had it done, their known identities and just the similarities between their DNA and yours. Trying to maintain anonymity only works if the data set is small. It looks like there is a market for a completely discreet, secure DNA sequencing company that doesn’t keep a copy of your genome after the fact and sends you a copy that is encrypted. Also it might be feasible to have some kind of kit that divides up the chromosomes into groups to be sent to different labs.

It can’t be that difficult and expensive to do this. Why not just wait until there is a kit you can buy to do it directly from your PC if you are concerned with privacy?


I wouldn’t rule out 2. It works quite well, the dual alignment method is used in the research setting for human tumours grown in mice for example. Of course, a very motivated adversary would work this out, but you could use more unusual organisms if you have a reference genome for them.
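
A toy sketch of the dual alignment idea, assuming you already have a per-read alignment score against each reference from whatever aligner you ran. The input format and function name here are hypothetical, not any particular tool's output:

```python
def assign_reads(scores: dict[str, tuple[int, int]]) -> dict[str, list[str]]:
    """Split reads by which reference they align to better.

    scores maps read id -> (score vs human, score vs mouse); higher is better.
    Reads that score equally well on both are set aside as ambiguous.
    """
    bins: dict[str, list[str]] = {"human": [], "mouse": [], "ambiguous": []}
    for read_id, (human, mouse) in scores.items():
        if human > mouse:
            bins["human"].append(read_id)
        elif mouse > human:
            bins["mouse"].append(read_id)
        else:
            bins["ambiguous"].append(read_id)
    return bins

# Made-up scores for three reads
example = {"r1": (100, 62), "r2": (40, 98), "r3": (75, 75)}
print(assign_reads(example))
```

Highly conserved regions are where the ambiguous bin fills up, so you would expect to lose some coverage there.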

The problem with 1 is that there may be significant downsides to misidentifying your genome eg you swapped with someone who commits a crime.


[2] won't work on a whole genome because of the many differences present between mammals after millions of years of evolution. The "alignment" of a small sequence may be confused between human and mouse, but not longer sequences and definitely not whole genomes (alignment using a tool like BLAST or BLAT).

I wrote a summary of genetic information discussing this very idea. See the PDF under "What is Genetic Info?", read the Phylogenetics section:

https://www.geneinfosec.com/more-info/


No, this is not correct; look up modern whole genome pipelines. Standard read lengths are 75-150 base pairs. Whole genome alignment has never been done with BLAST or BLAT (this would take an extremely long time). BWA is most commonly used. Running the standard data pipeline on a mouse-human mixture will cause many mouse sequences to be aligned to the human genome.


Identifying and removing contaminants is a routine step in DNA-seq analysis. They might still flag and discard the sample internally but treating it would be trivial.


This project was made exactly for what you’re describing:

http://biononymous.me/


Interesting. To be honest, if someone is able to collect intact tissue from you without your knowledge, there is no way to hide. If you can control the donation then it is different.


Maybe not exactly something you can do right now, but my invention of molecular cryptography will allow genomes to be sequenced without revealing identity or health information. You are somewhat close with [3]

geneinfosec.com

Also note that DNA is readily identifiable, so misidentifying yourself does not work.


Note that for $1000 you can buy a NanoPore DNA sequencing device [1].

[1] https://nanoporetech.com/applications/dna-nanopore-sequencin...


Error rates are currently very high, and from a single run it doesn’t seem like you’d get a 1x coverage Genome, let alone 30x (which is what this service is claiming).


While I agree with you on that, obtaining the required kits that the sequencer needs is another headache altogether.

The read lengths are fascinating, with Nick Loman's group reporting reads up to a megabase in size. But the error rates matter as well. Unfortunately nanopore errors are not easily mitigable, whereas PacBio's are.

The one place where nanopore has them beat is the sheer portability and ready run time.

Strains involved in the 2015 Ebola outbreak were being sequenced in real time!


What kind of competence is required to get a report out of this that the user or their doctor can understand?


Can anyone comment on privacy for this company? I know that's an issue around 23&me


I believe in the future it will be basically free to sequence the genome from a strand of hair. Given that, is it reasonable to try protecting your genome data?


Imo yes; the ability to match dna to individuals without their consent should belong to law enforcement only. I am holding out hope for legislation that makes this work.


The ability to match DNA to individuals is sadly pretty easy and even the idea that you can "de-identify" DNA is the elephant in the room for things like HIPAA, since DNA is the most identifiable thing a person has.

Looking at things like this - https://phys.org/news/2013-03-easy-identity-cell.html

https://33bits.wordpress.com/about/

etc

Having DNA and just looking at the traits that person has will reduce the population down to a pretty manageable number. Any other bits - location, gender, etc. - will be enough to turn an educated guess into an almost certain one. This will not become more difficult to do in the future.
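
The "33 bits" idea linked above is simple arithmetic: picking one person out of N takes about log2(N) bits, and each known attribute uses some of them up. A rough sketch, where the population figure and the attribute fractions are illustrative assumptions rather than measured values:

```python
import math

WORLD_POPULATION = 7_600_000_000  # assumption: rough 2018 estimate

def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one person in a population."""
    return math.log2(population)

def remaining_candidates(population: int, fractions: list[float]) -> float:
    """Expected candidates left after filtering by independent attributes.

    Each fraction is the share of the population matching one attribute.
    Treating the attributes as independent is a simplifying assumption.
    """
    remaining = float(population)
    for f in fractions:
        remaining *= f
    return remaining

print(f"{bits_to_identify(WORLD_POPULATION):.1f} bits single out one person")

# Hypothetical attributes: sex (1/2), a region (1/1000), one trait (1/100)
left = remaining_candidates(WORLD_POPULATION, [0.5, 0.001, 0.01])
print(f"about {left:,.0f} candidates left, {math.log2(left):.1f} bits to go")
```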


I wouldn't be surprised if we trained ML models to map DNA to adult faces. It doesn't seem out of the realm of possibility.


A company founded by Craig Venter and Peter Diamandis (Human Longevity Inc) has been working on this and published a paper about it last year in PNAS [1], but it attracted some scepticism from other researchers in the field (the paper was rejected for publication by Science) [2].

[1] http://www.pnas.org/content/early/2017/08/29/1711125114

[2] https://www.technologyreview.com/s/608813/does-your-genome-p...


There is a thing called developmental noise: the phenotype is the outcome after post-expression events further modify the expression products [proteins, RNA, MeDNA etc.]. The face you interpolate from a genetic sequence will be an approximation.


The biggest issue is relatives. All you need is one person relatively closely related to you to openly post their DNA on a genealogy site and someone searching only has a handful of people to narrow down from.[0]

I used 23andme years ago, using a fake name on the website and a kit ordered by someone else (a friend ordered kits). I should see if I can use their GDPR process to purge it before any relatives give it a shot and start a scandal about the presence of John Doe in the family tree.

[0] https://globalnews.ca/news/4171752/golden-state-killer-dna-g...


Just because sensitive information can be compromised doesn't mean that it should be offered freely. Yes, someone can target you and collect a DNA sample, but does that mean you shouldn't care how securely your genetic data is used and stored?


Yes, because in the future there will be laws addressing issues surrounding DNA data.

Right now, it's the wild west.



> I know [privacy is] an issue around 23&me

In what regard? I’ve not heard of any privacy related issue with 23andme, and I follow the field quite closely (professional interest). On the contrary, 23andme has been known to uphold customer privacy in the face of government requests [1]. The FTC is investigating 23andme and other companies (which is a good thing), but there is no indication that they’ve found any violation.

[1] https://www.23andme.com/transparency-report/


I believe you can opt out of them storing your data once you retrieve it.


Which is useless unless we can get someone on the inside to vouch for the fact that it's actually not stored


They explicitly state they're GDPR compliant, so at the very least if they keep customer data after saying they've deleted it and are discovered they would probably be looking at huge fines. Doesn't make much sense from a business perspective, especially given that most customers probably won't bother to have their data deleted.


Most companies at least track frequencies. We used to think that deidentified was enough, but apparently it’s not that hard to figure out who a person is if you know that much about them. (Especially if you have relatives.)

There’s been a lot of work with homomorphic encryption, but I think the new avenue of research is using sketches to provide a sense of anonymity while preserving comparability.


The storage of the data is a not inconsequential portion of the total cost. A lot of work is done to cut costs on genomic data storage. No better way than not storing it in the first place.


It's a serious issue for any service that offers to sequence any/all of your genome. AFAIK, none of them get it right. I hope to be corrected here, but I doubt it.


Has anyone used this service? The only reliable service I know of is Veritas. I never heard of this before its mention on HN right now


I know people that sent in samples and haven’t heard anything back. Use at your own risk I guess.


I cannot speak for Dante, but I have used Veritas for whole genome sequencing and only have good things to say about them.


There may be nice things to say about Veritas, but they charge an extra $99 for VCF files, and the actual raw data (FASTQ/BAM) is not available at any price (to you; Veritas has it). This is after you've paid $1,000 for sequencing.


What does VCF lack vs FASTQ/BAM?


VCF only contains variant information; FASTQ and BAM contain the “raw” sequencing information (in reality that data is already heavily processed but it’s the starting point for all usual analysis). Depending on how you do the analysis, the variant data will be slightly different. There are some current baselines (so-called “best practice” workflows) for performing the analysis but many of the details are variable, and subject to much current research.

To be fair, for most things the VCF is completely sufficient (and in fact most people won’t care even about that). It just feels cooler to be in control of the raw data (and personally if I end up using a sequencing service, I would want to perform my own analysis; but this is obviously irrelevant for 99.99% of users).
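
To make the difference concrete, here is a minimal sketch that parses one FASTQ record and one VCF data line. Both records are made up for illustration, and the parsers only handle the simplest case (Phred+33 qualities, a single sample column):

```python
# Made-up example records, not real data
FASTQ_RECORD = """@read_1
GATTACA
+
IIIII5#
"""

# Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
VCF_LINE = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t0/1"

def parse_fastq_record(text: str) -> dict:
    """Parse one 4-line FASTQ record: the read itself plus per-base qualities."""
    header, seq, _plus, qual = text.strip().split("\n")
    return {
        "id": header[1:],                       # drop the leading '@'
        "seq": seq,
        "quals": [ord(c) - 33 for c in qual],   # Phred+33 encoding
    }

def parse_vcf_line(line: str) -> dict:
    """Parse one VCF data line: only a difference from the reference."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "genotype": sample.get("GT")}

print(parse_fastq_record(FASTQ_RECORD))
print(parse_vcf_line(VCF_LINE))
```

The FASTQ side keeps every base and its quality, so you can redo alignment and variant calling yourself; the VCF side only keeps the calls someone else's pipeline already made.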


Did you learn anything useful? (I'm not asking the specifics of what you learnt, but say... Is it helpful to know disease related info? Ancestry? Etc)


I didn't dive into the data. I provided it to researchers, along with my medical records, as part of Harvard's Personal Genome Project.


I would have thought that in this day and age a DNA sequencing company would have a big splash about privacy on their main page. I didn't see it.


If they have a copy of your data, no matter how many privacy promises, your data could still end up sold/leaked/etc or even stored accidentally and still leaked.

What we need is independent devices that don't call home. But that's not profitable.


Yeah, it's probably wise to have them delete your data after you download it.


Wise, how? Even if they promised that they deleted it, it might show in logs somewhere, out of your control.


What might show in logs, your full VCF file? Regardless, it wouldn't be out of your control. Their FAQ states they're GDPR compliant, which means they're obligated to completely remove your data on request, including from things like logs and backups. Sure, you could say they might make a mistake and keep it anyway, but they would be opening themselves up to large fines. And given that they explicitly mention their GDPR compliance, presumably they have given it some thought.


I have been recently considering getting my genome sequenced. It seems like, if one of my main priorities in my life is longevity, this would be one of the most cost-effective ways to find out what factors are likely to influence that, and change my behavior based on the results. $1000 seems steep, but if you think about it over the course of a whole life, it's not so bad...thoughts?


I think my biggest thought would be, assuming you had your genome sequenced, what could you do with it? I can think of a lot of theoretical things, but I think every inspection of that genome for something would cost money (possibly/probably more than the cost of sequencing). I may be wrong about that, though.


You can quite easily do stuff like mutation calling and differential gene expression analysis with subsequent gene set enrichment analysis for free on a standard computer. You'd need to know what you're doing of course, but in terms of costs, it's definitely manageable. That's if they provide you the raw data, though, which doesn't seem to be the case with a lot of providers.


Compared to services like 23andMe, how much more informational or actionable would a whole genome sequence be in comparison?


The genotyping technology used by 23andMe and similar efforts is a genotyping array chip. These are only capable of detecting common variants. This is great for calculating ethnicity and common traits, and even maybe some moderately rare conditions (such as the most common BRCA1 variants), but is completely incapable of detecting variants that cause rare conditions. Rare conditions can only be caused by rare variants. Whole genome sequencing can detect most of these rare variants.


exactly - how is this news-worthy now?


Hello, Andrea from Dante Labs here. I am confused why you think it is a pyramid scheme. We have had delays in the past and even other issues and we have worked with every single user to solve issues, providing customized reports free of charge


From a saliva sample? How are they going to tell the bacterial DNA apart?


It's pretty easy to tell human DNA from bacterial DNA. For one thing bacteria have small circular genomes versus huge linear chromosomes. More importantly individual sequencing for humans is heavily assisted by aligning reads to a known genome so any sequences which don't resemble known human DNA will probably be discarded.

I can't speak to whether they are actually sequencing samples, but it'd be a pretty blatant lie if they're not.


You basically take all the dna in the sample and chop it all into small pieces and all of these small pieces are read into computer memory, and then an algorithm goes through them all and merges the pieces that fit together back together into the original long form of the dna. Only pieces that have overlaps with each other will merge together, and there are many redundancies, so if any pieces accidentally fit together that aren't actually supposed to, then they are discarded, because this won't happen nearly as often as the real merge.

So maybe a bunch of human DNA gets mixed up with bacterial DNA - I dunno really - but it really doesn't matter because of the way they are put back together.
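
A toy version of that overlap-and-merge idea, as a greedy sketch on made-up reads. This is only an illustration of the principle; real human pipelines generally align reads to a reference genome rather than assembling from scratch:

```python
def overlap_len(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_merge(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return reads  # nothing left that overlaps enough
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]

# Three overlapping reads from one "genome" plus one unrelated read
reads = ["ACGTTGCA", "TGCAGGCT", "GGCTAACC", "TTTTTTTT"]
print(greedy_merge(reads))
```

The unrelated read never overlaps anything, so it stays on its own instead of contaminating the merged sequence.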


The two basic ways to deal with bacterial contamination in NGS sequencing are to minimize it during extraction (e.g. extracting DNA from isolated nuclei, which bacteria do not have) and to clean up the raw reads by comparing them to known microbial sequences (e.g. with http://deconseq.sourceforge.net/ )

Considering they offer the raw data on a 500GB hard drive in FASTQ format (https://us.dantelabs.com/pages/faq), I doubt it's just SNPs.


Thanks, updated parent post.

How easy is it to get your DNA sample contamination-free by isolating nuclei from dead cells in saliva?


"Blood samples had 5-6% unmapped reads compared to 7-18% in saliva. Evaluation of source of unmapped reads found that 0.54% mapped to viral species in both sample types. In contrast, 10.3% of unmapped reads from saliva samples mapped to bacterial species, compared with only 1% of unmapped reads from blood samples as expected1. This analysis represents, to our knowledge, the first comprehensive examination of WGS and WES data generated from blood compared to saliva. Our analysis shows a high level of concordance between the two sample sources, and as expected few somatic variants. This indicates that high quality sequencing data can be derived from saliva samples for germline genetic analyses." [1]

and

"SNP genotyping results for saliva derived DNA (n = 39) illustrated a 98.7% concordance when compared with blood DNA. In conclusion, when compared with blood DNA and tested on the DMET array, saliva-derived DNA provided adequate genotyping quality with a significant lower number of SNP calls. Saliva-derived DNA does perform very well if it contains greater than 31.3% human amplifiable DNA." [2]

[1] https://www.researchgate.net/publication/309849150_Saliva_is...

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3309006/


Frankly I don't think they bother. It's not a huge issue.

Sequencing will run to a certain coverage (or depth), aligning multiple fragments so that each nucleotide is sampled e.g. 30 times, which would weed out contaminants. Saliva is generally considered 'good enough' (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3497576/)

They might just use an anti-microbial agent in their collection kit to prevent growth of bacteria until it reaches their lab (plus you don't want bacterial nucleases to fragment your DNA).


It's their job to do these things


With this in the works: https://www.forbes.com/sites/joshuacohen/2018/10/08/possible...

Using any genomics is an 'admission' of preexisting condition as a genomic guilt. This $200 is proof you know what diseases you might contract, and this serves as information asymmetry that insurance companies have illegalized.

If I could have this done, say, in Russia or China, and the results destroyed after completed, I'd consider it.

I know that half of my genetic map has already been infiltrated with my mom using 23andme. I felt quite violated when my mom did that. I certainly had no choice. And yet, half of me is sitting in some database, where I have absolutely no say.

And my dad's dead, so no non-degraded DNA there. I guess if my sister ever has the money and want, then more of my map will be known.


Sorry to be so direct, but who cares about your DNA? People are so full of themselves thinking they are something unique and of great value. In fact, everything valuable in us is our brains - experiences and knowledge. Plus, don't worry so much about your mom and her 23andMe - that's not sequencing, it's just a test for a list of snips (SNPs).


Nebula Genomics https://nebula.org analyses your whole genome for $99 or free. They store your DNA in a blockchain and you can decide and track who you share your data with.


Nebula only offers 1x coverage. As a comparison, most other companies (such as Veritas) offer 30x coverage. I do not know what that means practically, but if anyone has expertise, I would love to hear whether it matters.

The reason I don’t just assume 1x is much worse is George Church is involved in both Nebula and Veritas, and I get the sense he wouldn’t be involved in something if it was worthless.


It matters a lot - any error in a read cannot be corrected for if it's the only read covering a locus. Coverage is also uneven; an average of 1x would mean a significant portion of the genome has no coverage. Even 30x would leave gaps (not sure how many, as I don't work with humans).
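A rough way to put numbers on "a significant portion has no coverage" is the idealized Poisson model, which assumes reads land uniformly at random. That assumption is mine, not something the vendors state, and real coverage is less even than this, so treat the output as an optimistic bound. The function names are made up for the sketch.

```python
import math


def fraction_uncovered(mean_coverage: float) -> float:
    """Poisson probability that a given base is hit by zero reads."""
    return math.exp(-mean_coverage)


def fraction_below(mean_coverage: float, min_depth: int) -> float:
    """Poisson probability that a base is covered by fewer than min_depth reads."""
    return sum(
        math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)
        for k in range(min_depth)
    )


for cov in (1, 30):
    print(f"{cov}x: {fraction_uncovered(cov):.3g} of bases with no reads, "
          f"{fraction_below(cov, 10):.3g} with fewer than 10 reads")
```

Under this model a 1x average leaves a bit over a third of bases with no read at all, while at 30x the uncovered fraction from random sampling alone is negligible. Gaps at 30x come from other causes, such as regions that are hard to sequence or align.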


> Even 30x would leave gaps

Correct - the coverage number is an average across the sequenced parts of the genome. In some areas the coverage will be much higher, in others much lower.

Importantly, what is commonly called 'whole genome sequencing' is not really that. There remains ~5-8% of the genome that is (almost) impossible to sequence with current technology, and as such has no coverage. Areas with lots of repeats, centromeres, etc.


Apart from other factors which require much deeper sequencing than 1x (usually 20x is a bare minimum), one fundamental thing to keep in mind is that you have 2 copies of most of your chromosomes. So 1 "read" of that position of your genome intrinsically cannot tell you about both of them. The 20x requirement is in part driven by the simple statistics of getting enough reads that you're very unlikely to miss both chromosomes (think tossing a coin 10 times - 1/1024 chance you get 10 heads, extrapolate across 10^9 bases, if you hardly ever want that to happen you'll end up going for something like 20x at every position).
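The coin-toss argument can be written out directly. This sketch assumes each read at a site comes from either chromosome copy with probability 1/2, independently, and counts both ways of missing a copy (all reads from one copy or all from the other), which is why it gives 2/1024 rather than 1/1024 at depth 10. The function names and the number of sites are placeholders for illustration.

```python
def prob_one_copy_unseen(depth: int) -> float:
    """Chance that `depth` reads at a site all come from the same chromosome copy."""
    if depth < 1:
        return 1.0  # no reads: neither copy is seen
    return 2 * 0.5**depth


def expected_sites_missed(depth: int, n_sites: int) -> float:
    """Expected number of sites (out of n_sites) where one copy is never sampled."""
    return prob_one_copy_unseen(depth) * n_sites


for depth in (1, 10, 20, 30):
    print(depth, prob_one_copy_unseen(depth), expected_sites_missed(depth, 10**9))
```

At depth 1 you always miss one copy. At depth 20 the per-site chance is about two in a million, which still adds up to a couple of thousand sites when extrapolated across 10^9 positions.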


Nebula uses 30X coverage for the free-with-sharing service. However, the $99 sequencing is about 0.4X.


Where do you see this? I signed up for Nebula, but the 30X coverage is just greyed out.


And that means everything. 0.4X coverage for whole genome is essentially worthless.


Could you please explain why whole genome sequencing at 0.4x coverage is worthless? I would have thought it could provide a lot of value, for example, to understand inherited health risks and to predict drug response.


Where the coverage actually lands is completely random, so you couldn't count on learning anything about any specific part of the genome.

But the parent is a bit wrong, it's not worthless. The reason people sequence at <1x is because you can still statistically derive a lot of useful info especially about larger variation - long segments of the genome that are deleted or duplicated etc can be inferred, and there is a technique called "imputation" which means if you measure one part of the genome well you can usually predict the nearby parts with surprising accuracy.


Basically it means that you can derive single nucleotide variations to do genome wide association studies. However we do this low coverage of sequencing for plants to aid in the breeding programs


30x for a human genome means you'd be generating roughly 100Gb of data. The human genome is roughly 3.3Gb.

1x is basically 3.3Gb
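The arithmetic is just coverage times genome size. In this small sketch the 3.3 Gb genome size comes from the comment above, while the 150-base read length is purely an assumed value used to turn bases into a read count.

```python
GENOME_GB = 3.3  # approximate human genome size in gigabases, as quoted above


def bases_sequenced_gb(coverage: float, genome_gb: float = GENOME_GB) -> float:
    """Total sequenced bases (in gigabases) needed for a given average coverage."""
    return coverage * genome_gb


def reads_needed(coverage: float, read_length: int = 150,
                 genome_gb: float = GENOME_GB) -> float:
    """Approximate number of reads, for an assumed read length in bases."""
    return bases_sequenced_gb(coverage, genome_gb) * 1e9 / read_length


for cov in (0.4, 1, 30):
    print(f"{cov}x -> {bases_sequenced_gb(cov):.1f} Gb, "
          f"~{reads_needed(cov) / 1e6:.0f} million reads of 150 bases")
```

Note that this counts bases, not bytes. File sizes on disk depend on quality scores, read names and compression.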


Anyone have an Explain Like I’m Five version of why one would get their genome sequenced? What benefits would I get as a consumer, in terms of things identified, etc.?

Maybe if I’m asking the question I’m not ready for the answer? Haha


If some of your family have inheritable diseases, you could find out if you have them or if your children could have them. Finding out that you will suffer from a nasty disease in the future could be quite a shock, though.

People also spend money on things like handwriting analysis, personality questionnaires and measuring people's skulls to try to work out what they are like; genomics is at least somewhat accurate and based on reality!

In practice, you'll get told things like "You have endurance muscles, not fast muscles" and "You probably have blue eyes" (mirrors are more accurate!) and "You have twice the chance of getting this specific type of cancer" (twice a very small number is still a very small number).


TLDR is that there is no compelling reason, if you don't find the concept interesting from a scientific curiosity standpoint. It is likely that someday the information will be useful, but right now it is mostly a luxury/curiosity thing. Having your full genome would allow you to (in theory) look up whether any upcoming discoveries about the impact of various genes apply to you. But if you're not wanting to dive deeply into genetics out of personal curiosity, I don't think there's any reason to do this (right now).


Thanks, this is supremely helpful!


Save $200. Keep your genome to yourself. You won't regret it.


In the future you're (rightfully) afraid of, people will either collect your DNA from you without consent or as a condition of service.

It's most unfortunate.


http://deweyhagborg.com/projects/stranger-visions

https://vimeo.com/257272785

  In Stranger Visions I collected hairs, chewed up gum, and 
  cigarette butts from the streets, public bathrooms and 
  waiting rooms of New York City. I extracted DNA from them 
  and analyzed it to computationally generate 3d printed 
  life size full color portraits representing what those 
  individuals might look like, based on genomic research. 
  Working with the traces strangers unwittingly left behind, 
  the project was meant to call attention to the developing 
  technology of forensic DNA phenotyping, the potential for 
  a culture of biological surveillance, and the impulse 
  towards genetic determinism.

  The forecast of Stranger Visions came true. Just 2 years 
  later Parabon NanoLabs launched a service they called DNA 
  "snapshot" to police around the US. For more examples see 
  Identitas and read about their collaboration with the 
  Toronto police. 
http://www.identitascorp.com/

http://www.theverge.com/2014/7/20/5916661/the-most-advanced-...


If you want a good sci-fi thriller to really stoke the ol' paranoia, Change Agent is actually a pretty decent read.

https://www.amazon.com/Change-Agent-Daniel-Suarez/dp/1101984...


Seconding the recommendation, it's a great read, well worth the time if you're into biotech and sci-fi in general.

I thank whoever it was that recommended this book to me some years ago, here on HN. Also, if anyone knows of other books like these, please share!


I had really hoped we were still 5-10 years away from this


I had really hoped we'd always be 5-10 years away from this.


That happens in one of the futures we're trying to avoid. In the rest it's just a bunch of opportunistic bad actors. Today there are a few heavy-handed government issues but the vast majority of the problems we face day to day are a result of smaller players who behave badly. So while it's always nice to secure against a joint CIA/NSA attack, if you can't then it's still OK to secure against identity theft.


I had this conversation with my uncle tonight. He's super interested in finding out where he's from, but he steadfastly refuses to do so because of his fear of insurance hikes and how it will be abused in the future. I'm right there with him, and it's horrible to think that it'll be done without consent (cough Credit Scores cough)


Insurance, if not the rest, could be solved if you have a system like the UK (public) or Germany (private) where it’s a fixed percentage of income and mandatory for all.

I recommend campaigning for it, assuming you’re in the USA.


The US actually passed federal legislation a decade ago that prohibits insurance discrimination on the basis of genetic information.

https://en.m.wikipedia.org/wiki/Genetic_Information_Nondiscr...

As I recall from purchasing health insurance in Germany (mandated, and the public option wasn’t available to me as a freelancer), there was price discrimination based on pre-existing conditions, which is currently illegal in the US. The monthly cost was capped at over 600€ Basispreis per individual (this was a couple of years ago), and if you hadn’t visited a hospital for a couple of years and were a healthy man in your 30s and chose to conceal any persistent issues you had been diagnosed with, you could find an offering for closer to 200€. The determination of how pre-existing conditions would affect the price was completely arbitrary and unreasonable, and a significant deterrent to seeking timely care. I’m not aware whether you could choose to not disclose genetic conditions and have your insurance be valid.


That solves basic health insurance, but doesn't help with mortgage or even car insurance, where I'd expect DNA to become an influencing factor too.


In folk lore, you'd use your blood to sign a contract with the Devil.

(I find it interesting how a lot of folk and religious stories about supernatural sound like memories of civilization with technology comparable to ours.)


> I find it interesting how a lot of folk and religious stories about supernatural sound like memories of civilization with technology comparable to ours.

I think about this a lot. It has led me to see that the level of technology doesn't always increase linearly. At times there appear to have been great regressions following times of incredible technology. An easy example, to my mind, is the pyramids in Egypt, or the countless and nauseatingly intricate temples of India. I feel like there have existed technologies that we haven't yet been able to detect or understand.


Some countries like Bahrain or others collect it for residents, & few others might be thinking about the same for tourists too.


http://biononymous.me/

  Biological surveillance is the means by which biological 
  science is used to track, monitor, analyze, and turn 
  bodies into data. It is the extraction of DNA and microbes 
  from our skin, nails, hair and body fluids. It is the 
  analysis of identifying body parts like faces, 
  fingerprints and irises. It is the tracking of life itself 
  by body heat, pulse, perspiration, and involuntary 
  movement. It is the vulnerability we each face every day 
  by the very situation of being human, by simply having a 
  body. Biononymous.me fosters molecular resistance through 
  the creation of a community to openly discuss, research,  
  and develop potential solutions through art, science, 
  technology, policy, and theory.


To play devil's advocate: On the other hand, maybe you get a chance to do this now, before it becomes regulated or completely illegal. Like how you can no longer see health analyses in 23andme (in Europe).


I thought that was mainly in France due to their laws around biological fathers not necessarily being the "true father".



Out of curiosity, does "whole genome sequencing" include karyotyping?


You can usually infer chromosome copy numbers from the median coverage rate.
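As a minimal sketch of that idea: compare each chromosome's median read depth to a baseline taken from chromosomes assumed to be present in two copies. The depth values, chromosome names and function name below are invented for illustration. Depth alone also says nothing about balanced rearrangements such as translocations or inversions, which a classical karyotype can show.

```python
from statistics import median


def estimate_copy_numbers(depth_by_chrom: dict[str, list[float]],
                          baseline_chroms: list[str],
                          ploidy: int = 2) -> dict[str, float]:
    """Estimate copies per chromosome from per-window read depths.

    baseline_chroms are assumed to be present in `ploidy` copies.
    """
    baseline = median(median(depth_by_chrom[c]) for c in baseline_chroms)
    return {
        chrom: round(ploidy * median(depths) / baseline, 2)
        for chrom, depths in depth_by_chrom.items()
    }


# Made-up depths: chr21 looks like three copies, chrX like one.
depths = {
    "chr1": [30, 31, 29],
    "chr2": [30, 30, 30],
    "chr21": [45, 44, 46],
    "chrX": [15, 16, 14],
}
print(estimate_copy_numbers(depths, baseline_chroms=["chr1", "chr2"]))
```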


They cite hemochromatosis in the second user story. My father is homozygous, and 23andme told me I wasn’t. Is this going to be that much more helpful (at least in the next 5-10 years)?


Tempting, no shipping fee from Singapore. Does anyone recommend it?


And the rate of errors is..?

The same genome sequenced two times - how many differences between two sequences? Please, don't tell me it is none.


"99,7% SNP precision and sensitivity." [0]

Sensitivity = true-positive-rate = 0.997.

Precision = 0.997 = #true-positives / (#true-positives + #false-positives) = true-positive-rate / (true-positive-rate + false-positive-rate) (this step treats positive and negative sites as equally common) = 0.997 => true-positive-rate + false-positive-rate = 1 => false-positive-rate = 0.003. [1]

That seems like a very high error rate, about 10 million errors in the three-gigabase genome, and 100 thousand errors in the 30-megabase exome (protein-coding regions.) That might be an acceptable rate for population-level analysis if the errors are sufficiently uncorrelated, but I wouldn't want to be making decisions on the basis of it for personalized medicine. For comparison, here's a rough estimate that an individual human genome has 2-3 million SNPs [2].

I thought you could do better than that with 30x coverage, so I might be misinterpreting them, somehow. Or maybe they're using an unconventional sequencing technology which is cheaper but less accurate.

[0] https://us.dantelabs.com/products/whole-genome-sequencing-wg...

[1] Equations given here: https://en.wikipedia.org/wiki/Sensitivity_and_specificity

[2] https://biology.stackexchange.com/a/51315/37343
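One way to sanity-check the estimate is to work from counts of variant calls rather than per-base rates, since precision and sensitivity are both defined over variant calls. This sketch assumes about 3 million true SNPs per genome (the upper end of the estimate in [2]) and reads the quoted 99.7% as applying to SNP calls. Both are assumptions on my part, and the function name is made up.

```python
def snp_error_counts(true_snps: int, sensitivity: float,
                     precision: float) -> tuple[float, float]:
    """Return (false_negatives, false_positives) implied by the two metrics.

    sensitivity = TP / (TP + FN), precision = TP / (TP + FP)
    """
    tp = sensitivity * true_snps
    fn = true_snps - tp
    fp = tp * (1 - precision) / precision
    return fn, fp


fn, fp = snp_error_counts(3_000_000, sensitivity=0.997, precision=0.997)
print(f"missed SNPs: ~{fn:,.0f}, spurious SNP calls: ~{fp:,.0f}")
```

Under this reading the figures imply errors in the thousands per genome rather than millions. Which reading matches what the vendor actually measured is not something the quoted sentence settles.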


Has anyone ever claimed it's none?

There's no simple answer to your question as it depends on many things - sequencing technology used, library prep and coverage to name a few.

Generally, it's not far from none when aligning short reads to a high-quality reference genome. Provided there's sufficient coverage and a majority of reads covering a particular nucleotide don't have an error at that position, then the correct answer will be given. Errors creep in due to things like systemic errors in library prep (such as a PCR error), and very low coverage over particular loci due to weird AT/GC content, meaning errors are harder to correct for. Repetitive regions can cause issues for short read alignment too, but coding regions generally aren't that repetitive.

$200 is very cheap for WGS - guessing it would be at the low end of the accuracy range, as they can't be sequencing to great depth (presumably).
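The "majority of reads" point can be illustrated with a toy consensus call at a single position. This is a deliberately simplified sketch: the thresholds and function name are made up, and it ignores base qualities and heterozygous sites, both of which real variant callers have to model.

```python
from collections import Counter


def call_base(pileup: str, min_depth: int = 3, min_fraction: float = 0.5) -> str:
    """Return the majority base among reads at one position, or 'N' if unsure.

    pileup is the string of bases that the aligned reads show at this position.
    """
    if len(pileup) < min_depth:
        return "N"  # too few reads to outvote an error
    base, count = Counter(pileup).most_common(1)[0]
    return base if count / len(pileup) > min_fraction else "N"


print(call_base("AAAAAAAAAC"))  # one erroneous read is outvoted
print(call_base("A"))           # a single read cannot be checked
```

With ten reads, a single bad read is simply outvoted. With one read there is nothing to compare against, which is why low coverage makes errors hard to correct.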


They sequence at 30x using BGI technology. Meaning: They provide the current offer at a loss.


The sample file they provide lists 12 errors, so likely at least that.

https://s3.amazonaws.com/dantelabswebsite/Dante+Labs+Genome+...

EDIT: Unless I'm reading the results wrong


The genes don’t even have to be the same when you have two samples from a person: https://www.sciencedaily.com/releases/2009/07/090715131449.h...


The sequencers they’re using are arguably of lower quality than the gold standard


In the genetics world, when you say the words "Gold standard", that usually translates to "Sanger sequencing", which is a high accuracy method of sequencing a small section of DNA, like a single gene. I don't think your statement is very helpful in that context.

Most of the world's whole genome sequencing is done using the Illumina platform. This service is using the BGI platform, which is arguably higher quality than Illumina. Our lab has data showing the error rate with BGI is about 1/6 the error rate of Illumina.

Yes, there are some even better sequencing technologies out there, such as PacBio, which provides longer reads capable of sequencing slightly more of the genome, and the error rates are constantly improving. However, these technologies are much more expensive.


Is this only for the US? Even if what people say is true I'm willing to sell my data if I get this large of a discount.


According to the web site, it is global. The price quoted for Europe is 169 euros.


Will the sequence be available to the user so they can analyze it themselves? e.g. FASTQ format


Which one was the Finnish company doing sequencing?


Do we get the FASTQ files?


From their faq:

VCF files (the most effective and easy-to-use format) are easily downloadable from your account. FASTQ and BAM files are sent via a 500 GB Hard Disk. To order the Hard Disk, click here



