Hacker News

Sorry for the cynicism, but this worries me.

> This strategic partnership will combine Google’s cloud and AI capabilities and Mayo’s world-leading clinical expertise to improve the health of people—and entire communities—through the transformative impact of understanding insights at scale. Ultimately, we will work together to solve humanity’s most serious and complex medical challenges.

I'm sorry, but this is pure Silicon Valley speak. Are Mayo's patients really going to know what Google is doing with their clinical data? When I hear "partnering with Google to create machine-learning models for serious and complex disease", I have a hard time believing Mayo patients know what they are signing away when they consent to this (if they are asked to consent at all, which isn't mentioned).



They have no idea. Even data from continuous glucose monitors (most commonly used in type 1 diabetes) is shared directly with insurance companies, where certain patients with diabetes are "flagged" as "problem patients": https://type1tennis.blogspot.com/2016/05/when-data-fluxes-co...

I have two rare diseases, one of which was discovered via NIH funding at the Mayo Clinic in the early 2000s. I also have type 1 diabetes, and I can attest to the veracity of the claims made in the blog post.

For somebody like me, the situation is unwinnable if I want to live. HIPAA is a joke because it is perfectly legal to combine other data with a HIPAA-anonymized source to identify the individual. Every day, leaving the US looks better.


You should move to Canada or Europe, they'll take care of your rare diseases.


Can you elaborate about what insurance companies can do in the US to someone known to have a rare disease? Can they deny coverage or price it higher than healthy people?


How would you combine HIPAA-de-identified data with another data source to identify the individual? I'm not suggesting it can't be done, just wondering how one might do it. Linking data that can identify a person to de-identified data would only be possible if the original data was not properly de-identified, right?


There is no such thing as "proper de-identification" in general; it's all a matter of what other data sets the re-identifying party has at its disposal.

Consider the following de-identified data sets:

- [date, time, clinic, procedure or test being done, insurer] - as collected by the clinic chain so that it can get money from insurers

- [month, clinic, test name, test result] - for all tests made in the last year, collected for statistical purposes

- [date, time, latitude, longitude, phone number] - because AFAIR telcos sell this data

- [name, surname, phone number, ...] - some insurance company's list of customers

If you can get your hands on these data sets, you can trivially re-identify patients and even assign test results to them with high probability (how high depends on how many tests of a given type each clinic performs per unit of time used to group the second data set).

Real-world data sets may be less clear-cut than this, but there are more of them, and you can apply statistical methods to find correlations. You don't need to be 100% sure that customer X has diabetes for the information to be useful to you; 70% or 60% is useful too.
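A minimal Python sketch of the linkage attack described above, under loud assumptions: every record is made up, the telco's latitude/longitude is assumed to have already been matched to a clinic, and the function names `link_visits_to_names` and `guess_results` are hypothetical. It joins billing rows to phone pings on (date, time, clinic), maps phones to names through the insurer's customer list, then attaches candidate test results, each right with probability 1/n, where n is the number of same-type tests that clinic ran that month.

```python
from collections import defaultdict

# Made-up toy records shaped like the data sets in the comment above.
# Assumption: the telco's latitude/longitude has already been matched to a clinic.
visits = [  # (date, time, clinic, procedure, insurer): clinic billing data
    ("2016-05-03", "09:00", "ClinicA", "HbA1c", "InsurerX"),
    ("2016-05-17", "14:00", "ClinicA", "HbA1c", "InsurerY"),
    ("2016-05-20", "11:15", "ClinicB", "lipid panel", "InsurerX"),
]
pings = [  # (date, time, clinic, phone): telco location data
    ("2016-05-03", "09:00", "ClinicA", "555-0101"),
    ("2016-05-20", "11:15", "ClinicB", "555-0102"),
]
customers = {"555-0101": "Alice Example", "555-0102": "Bob Example"}  # insurer's list
results = [  # (month, clinic, test, result): "statistical" test-result data
    ("2016-05", "ClinicA", "HbA1c", 9.1),
    ("2016-05", "ClinicA", "HbA1c", 5.4),
    ("2016-05", "ClinicB", "lipid panel", 160),
]


def link_visits_to_names(visits, pings, customers):
    """Join billing rows to phone pings on (date, time, clinic), then phones to names."""
    phones_at = defaultdict(list)
    for day, hhmm, clinic, phone in pings:
        phones_at[(day, hhmm, clinic)].append(phone)
    linked = []
    for day, hhmm, clinic, procedure, _insurer in visits:
        for phone in phones_at.get((day, hhmm, clinic), []):
            name = customers.get(phone)
            if name is not None:
                linked.append((name, day, clinic, procedure))
    return linked


def guess_results(linked, visits, results):
    """Attach candidate results; each is right with probability 1/n, where n is
    the number of same-type tests that clinic ran that month."""
    tests_per_month = defaultdict(int)
    for day, _hhmm, clinic, procedure, _insurer in visits:
        tests_per_month[(day[:7], clinic, procedure)] += 1
    guesses = []
    for name, day, clinic, procedure in linked:
        key = (day[:7], clinic, procedure)
        for month, r_clinic, test, value in results:
            if (month, r_clinic, test) == key:
                guesses.append((name, test, value, 1 / tests_per_month[key]))
    return guesses


for guess in guess_results(link_visits_to_names(visits, pings, customers), visits, results):
    print(guess)
```

Neither the billing data nor the monthly test-result data contains a name, yet the join pins two candidate HbA1c values on Alice at 50% each and Bob's lipid result on him with certainty. The unlinked HbA1c patient still counts toward n, which is why the grouping granularity of the second data set matters.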


Section 164.514(b)

"The following identifiers of the individual or of relatives, employers, or household members of the individual, are removed:

...

(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code

(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older

... "

This is the "Safe Harbor" method.

You could use the "Expert Determination" method. However, date + time + location attached to health information in your first data set definitely doesn't meet the criteria. I'll eat my hat if you find a supposed "non-PHI" data set with those.

In fact, the criterion for expert determination is literally that re-identification cannot be performed (without already having PHI-type information).
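As a rough illustration, the two Safe Harbor items quoted above, (B) and (C), can be sketched in Python. This is a sketch, not a compliance tool: it covers only ZIP codes and dates, the record fields and function names are hypothetical, and `restricted_zip3` is a placeholder parameter. The complete regulation adds a condition the quote leaves out: the three-digit ZIP prefix may be kept only if the area it covers has more than 20,000 people according to Census data, and must otherwise be changed to 000.

```python
from datetime import date


def age_on(born, on):
    """Whole years between two dates."""
    return on.year - born.year - ((on.month, on.day) < (born.month, born.day))


def safe_harbor_generalize(record, restricted_zip3):
    """Apply items (B) and (C) of the quoted rule to one (hypothetical) record.

    Any field not copied into `out` (name, phone, full dates, ...) is dropped.
    `restricted_zip3` is a placeholder for the 3-digit ZIP prefixes that must
    become "000"; the real list comes from Census population data.
    """
    zip3 = record["zip"][:3]
    age = age_on(record["birth_date"], record["admission_date"])
    out = {
        "zip3": "000" if zip3 in restricted_zip3 else zip3,
        "admit_year": record["admission_date"].year,  # year only, no month/day
        "diagnosis": record["diagnosis"],
    }
    if age > 89:
        # Ages over 89, and any year that would reveal them, collapse to 90+.
        out["age"] = "90+"
        out["birth_year"] = None
    else:
        out["age"] = age
        out["birth_year"] = record["birth_date"].year
    return out


patient = {  # made-up record
    "name": "Alice Example",
    "zip": "55901",
    "birth_date": date(1930, 6, 1),
    "admission_date": date(2021, 3, 15),
    "diagnosis": "type 1 diabetes",
}
# "999" is a hypothetical placeholder prefix, not taken from the official list.
print(safe_harbor_generalize(patient, restricted_zip3={"999"}))
```

The other Safe Harbor identifiers (names, phone numbers, record numbers, and so on) would need the same treatment; here they are handled only by not copying them into the output.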


Yeah, this was my impression too. I've worked with HIPAA data, and I usually had to remove far more than just a "name" for it to count as de-identified.



Genomic data is by definition identified.


> HIPAA is a joke because it is perfectly legal to combine other data with the HIPAA anonymized source to identify the individual.

HIPAA may be a joke, but not for this reason.

If information can be re-identified as PHI in any way (including matching phone numbers, birth date, IP addresses, patient account #s, etc.) it doesn't meet the de-identification standard.

Section 164.514(b)

You must remove the 18 types of identifiers, or receive a certification:

"A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:

Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information;"

Moreover, your information can only be used for research if you give written permission (Section 164.508). If you have given this permission, you may revoke it for the future.
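The expert-determination standard quoted above does not prescribe a metric for "very small" risk. One common measure from the de-identification literature, swapped in here as an assumption, is k-anonymity: the size k of the smallest group of records sharing the same quasi-identifiers, with 1/k as a worst-case chance of linking a known person to the right row. A minimal Python sketch with made-up records and hypothetical function names:

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def worst_case_risk(rows, quasi_identifiers):
    """Upper bound on an attacker's chance of linking a known person to the right row: 1/k."""
    return 1 / k_anonymity(rows, quasi_identifiers)


rows = [  # made-up records that already pass the ZIP/date generalization
    {"zip3": "559", "birth_year": 1980, "sex": "F", "diagnosis": "rare disease"},
    {"zip3": "559", "birth_year": 1980, "sex": "F", "diagnosis": "type 1 diabetes"},
    {"zip3": "559", "birth_year": 1975, "sex": "M", "diagnosis": "type 1 diabetes"},
    {"zip3": "559", "birth_year": 1975, "sex": "M", "diagnosis": "type 1 diabetes"},
]
demographics = ["zip3", "birth_year", "sex"]
print(worst_case_risk(rows, demographics))                  # 0.5
print(worst_case_risk(rows, demographics + ["diagnosis"]))  # 1.0
```

Demographics alone leave every row in a group of two, but adding a single rare diagnosis creates a group of one, so that record is uniquely identifiable even though it carries no name.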


I gotta say, I wonder what the balance of any discoveries will actually be. I am not a doctor, but I wonder whether machine learning will identify anything other than geographic clusters of patients. I suppose it is possible that this kind of info could help in diagnosis, but machine learning seems unlikely to result in cures given the lack of any actual trials. Of course, improved incidence data might truly help earmark funds for research.

Personally, I hope that my data could help people. I am not so concerned with my privacy that I would withhold my medical records if they could help. But in exchange for that contribution, I do want protection against misuse of that data by my insurers or other institutions. I recognize it isn't possible for me to enforce that in any way other than withholding my data entirely.

The reality is that we cannot, as a society, afford to have no trust in our institutions. We need instead to focus on providing oversight and guidance so that they can actively contribute to the public good. Private contributions to institutions are mandatory for the public to succeed. Maintaining that trust is the responsibility of those institutions, and they should seek whatever audits and oversight they can to keep that trust and fulfill their charters.


Dude, it is not a "contribution", it is a black-box partnership.

I will give up my data for "research" as long as I am 100% certain that my medical information will not be abused in a way that can harm me or others.

Until code is routinely audited and conforms to a strict code of ethics, you can count me out.

If you think it is "unfair", consider my position: I have a rare disease that has case reports and (small) cohorts at best in the literature. Publications on it are sparse.

Since I can be re-identified with literally zero effort, due to a lack of ethics of the STEM community, I have a target on my back.

I want to stay alive, and not be targeted by insurance companies and/or the medical-industrial complex.

Sorry, but if people want to benefit from me, then don't force unethical experimentation on me and others via big data and the wide sharing of that data.


I’m curious, are you worried that if everyone in your position doesn’t share data there will never be research on your condition and therefore no cures or better treatments?

I worry that one tradeoff of more privacy and less trust is that researchers won’t get the information they need to produce cures and treatments. It’s a Faustian bargain that people who are sick have to either risk having their information leak, or risk science ignoring their condition entirely.


> I’m curious, are you worried that if everyone in your position doesn’t share data there will never be research on your condition and therefore no cures or better treatments?

You have data sharers to blame for that. They're the ones that are destroying possible cooperation (everyone with that condition sharing data).

> It’s a Faustian bargain that people who are sick have to either risk having their information leak, or risk science ignoring their condition entirely.

Indeed, and I think the way to solve it is to go after the leakers and the sharers and the "entrepreneurs". If I give my data for medical research, I mean bona fide research, as in scientists and labs and tax-funded scientific papers, and not "research" into lowering operational costs by selling data, or "research" done by startups partnering with the clinic.


Same boat, same reasons.

I opted out of EHCR here (judging from a conversation with my GP's receptionist, I was apparently one of the very few who did) because I simply don't trust "It'll save us money" as a reason (I also don't trust my government to be competent).


Can't we have better separation of interests?

E.g. Google provides hardware and software.

Mayo Clinic runs the hardware and software, while Google sees none of the data?


I think this is Google's attempt to break into EHR, where Epic Systems is a leader with years more experience in maintaining confidential information.

But you're right, Google is a marketing, search, and advertising GIANT. Even if they stay HIPAA compliant, I would want to see more deliberate separation of interest.


ML won't be used to find cures, but to diagnose and classify disease. For example, CNN features could be applied to medical imaging to learn to detect various diseases, and that algorithm could replace the "quick read" that physicians order when the radiologist isn't available to do a full read.

Another possible use would be to identify septic or nearly septic patients and alert a clinician to intervene.
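The comment describes a learned model; as a simpler stand-in, the sketch below swaps in the published qSOFA bedside score from the Sepsis-3 definitions (one point each for a respiratory rate of 22/min or more, systolic blood pressure of 100 mmHg or less, and altered mentation, with 2 or more flagging the patient) to show only the alerting step. `Vitals` and `should_alert` are hypothetical names, and in an ML system a trained classifier would take the place of `qsofa`.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    respiratory_rate: float  # breaths per minute
    systolic_bp: float       # mmHg
    gcs: int                 # Glasgow Coma Scale, 3-15 (below 15 = altered mentation)


def qsofa(v: Vitals) -> int:
    """One point each for RR >= 22, systolic BP <= 100, and altered mentation."""
    return int(v.respiratory_rate >= 22) + int(v.systolic_bp <= 100) + int(v.gcs < 15)


def should_alert(v: Vitals, threshold: int = 2) -> bool:
    """Flag the patient for clinician review when the score reaches the threshold."""
    return qsofa(v) >= threshold


print(should_alert(Vitals(respiratory_rate=24, systolic_bp=95, gcs=15)))   # True
print(should_alert(Vitals(respiratory_rate=18, systolic_bp=120, gcs=15)))  # False
```

Keeping the score and the alert threshold separate means the scoring function can be swapped for a model's predicted probability without changing how alerts are raised.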


I read this more as: oh boy, two heavily corrupt industries are forming a partnership. I'm not throwing shade at Mayo Clinic specifically (the research there is vital), but at the healthcare industry in general. I can tell you I don't want a data-mining and advertising giant focused only on revenue in an industry that can directly affect my longevity and is already focused only on revenue. Healthcare is corrupt enough; I don't want it to get worse (and it will).

Neither tech nor healthcare (pharma, insurance, and many hospitals) wants anything more than money, and fending off sharks is not something I want to deal with when I'm at the most vulnerable point imaginable (most people will be fairly sick at some point in their lives, even if only near death).



