I'm no expert, but it occurs to me that a mountain of data coming in from disparate sources needs the concept of discoverability to be adequately used. That is, there has to be a way for a machine to find occurrences of data that it cares about without specifically knowing the exact attributes of the data.
A simple example off the top of my head: let's say there's a popular app that somehow collects your pulse from your cellphone, and that data is stored in your "Personal API" in some format. Some scientists write a routine that scans people's pulse patterns (with their permission, say) and warns them of possible risk factors. Later on, hospitals start feeding their pulse data into the same Personal API, but in a different format and using different units (beats per minute instead of beats per hour, say), because they've never heard of the cellphone app.
I take "discoverability" as a general concept addressing how the scientists' routine might discover that the hospital data exists in a person's "Personal API", discover that the thing being measured by the hospital is the same as the thing being measured by the app, and discover how to map both formats into a common workable format and units.
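To make that concrete, here is a minimal sketch of what the scientists' routine might look like, assuming each feed in the Personal API publishes a small machine-readable descriptor alongside its data. The descriptor keys (`concept`, `unit`, `value_field`), the `heart_rate` concept name, and the feed layout are all made up for illustration; nothing like this is specified anywhere in the discussion.

```python
# Hypothetical shared vocabulary: how many of each unit make up one
# "per minute" reading. A real system would need an agreed registry.
UNITS_PER_MINUTE = {"bpm": 1, "bph": 60}


def discover_heart_rates(feeds):
    """Scan every feed, keep the ones that describe heart rate,
    and return their readings normalized to beats per minute."""
    readings = []
    for feed in feeds:
        desc = feed["descriptor"]
        if desc.get("concept") != "heart_rate":
            continue  # some other kind of data; not our concern
        divisor = UNITS_PER_MINUTE.get(desc.get("unit"))
        if divisor is None:
            continue  # unit we don't know how to map; skip rather than guess
        field = desc["value_field"]
        for row in feed["rows"]:
            readings.append(row[field] / divisor)
    return readings


# Two feeds that have never heard of each other, plus an unrelated one.
personal_api = [
    {"descriptor": {"concept": "heart_rate", "unit": "bph", "value_field": "pulse"},
     "rows": [{"pulse": 4320}]},                      # the cellphone app
    {"descriptor": {"concept": "heart_rate", "unit": "bpm", "value_field": "hr"},
     "rows": [{"hr": 68}]},                           # the hospital
    {"descriptor": {"concept": "step_count", "unit": "steps", "value_field": "n"},
     "rows": [{"n": 9000}]},
]

normalized = discover_heart_rates(personal_api)
```

The routine never needs to know the hospital's field names in advance; it only needs both parties to agree on the descriptor vocabulary, which is of course the hard part.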
Couldn't one also keep track of the 'source' each data point comes from? (I do this, but don't reveal it in the API - yet.) That way, when it comes to discoverability via some algorithm, you could always separate the 'verified' bucket from the 'unknown' bucket.
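A rough sketch of that idea, assuming every data point carries a `source` tag and that there is some allow-list of sources considered verified. Both the tag name and the `VERIFIED_SOURCES` set are hypothetical, not how any existing API does it.

```python
# Hypothetical allow-list of sources we trust.
VERIFIED_SOURCES = {"hospital"}


def split_by_source(points):
    """Partition data points into (verified, unknown) by their 'source' tag.
    Points with no source tag at all land in the unknown bucket."""
    verified, unknown = [], []
    for point in points:
        if point.get("source") in VERIFIED_SOURCES:
            verified.append(point)
        else:
            unknown.append(point)
    return verified, unknown


points = [
    {"value": 68, "source": "hospital"},
    {"value": 72, "source": "cellphone_app"},
    {"value": 75},  # no source recorded
]
verified, unknown = split_by_source(points)
```

A discovery routine could then run on the verified bucket only, or weight the two buckets differently.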
It's more about money. This data is worth a lot of money to marketers, retailers, pharmaceutical companies, etc. They have no incentive to share it freely or, I suspect, at a price reasonable to most individual consumers. (Maybe I'm wrong here?)
There was that interesting article a while back about how Target was able to figure out a teenage girl was pregnant before her own father knew:
Target can buy data about your ethnicity, job history, the magazines you read, if you’ve ever declared bankruptcy or got divorced, the year you bought (or lost) your house, where you went to college, what kinds of topics you talk about online, whether you prefer certain brands of coffee, paper towels, cereal or applesauce, your political leanings, reading habits, charitable giving and the number of cars you own. (In a statement, Target declined to identify what demographic information it collects or purchases.)
I'd be very curious, and a little scared, to see all this kind of information collected on me.
I don't think so. Documentation is for people. Discoverability is for machines. Notice I suggested it's the scientists' routine which does the discovery, not the scientists themselves.