I was about to dismiss this but have thought about it a bit more, and there's so...

rasur · on May 31, 2013

Have you read Jaron Lanier's new book "Who Owns the Future"? You have similar ideas ("Imagine if the wealth of data collected about you all went to one place, and, most importantly, was controlled by you. Imagine if Company X had to pay you to know about your Amazon spending habits instead of paying Amazon.")

It's a thought-provoking book, and slightly worrisome to boot (about the state of things at the moment, with regard to the economics of "sharing data").

naveenium · on May 31, 2013

yes, i do think of it that way: as we leave all this data exhaust behind us, i keep wanting a system to have it all in one place – or at least, an "interface" that shows it to me in one place, even if that's not how the backend really is.

thanks for the book recommendation - i will check it out this weekend!

amirmc · on May 31, 2013

Academia has been thinking about this for a while. I'm involved with several projects that aim to put users in control of their 'digital footprint', and what you've described are two aspects of the work. First is how to collate and access all the data, and second is how to allow third parties to query/process that data in a way that respects a user's privacy. [1]

I'm keen to commercialize the work when it's ready but it's a research project precisely because it's a tough problem. Of course, one approach is to just make that personal info public but I don't think that will really benefit users in the long term (companies would certainly benefit, though)

[1] http://perscon.net/overview/dataware.html

msvan · on May 31, 2013

Jaron Lanier looks and behaves like a total eccentric, and his ideas can seem pretty out there, but I found his latest book to be a really interesting read. This idea is not the same thing though, because the data ends up being more or less public and demonetized. Companies are already collecting lots of data about you. This, in theory, just gives them even more of it, free of charge.

rasur · on June 1, 2013

Your point is of course correct. I was trying to point out the similarity in viewpoint of some of what abraininavat had replied to the OP (not the OP himself).

Ignoring how JL looks and behaves (which is not really germane to the point), the commoditisation of everyone's data by a select few "Siren Servers" (to use Lanier's term) without any payment to the originators themselves is - in the long term - not a viable strategy.

Regardless, I stand by my point that his book is thought-provoking and worrisome, though it does not give any concrete answers.. that of course is up to us as a tech "community" to fix (before we all merrily cut off our noses to spite our faces).

edit: corrected the username i was referring to, and spelling.

icebraining · on May 31, 2013

This is very much the goal of Project VRM:

1. https://cyber.law.harvard.edu/projectvrm/Main_Page

2. http://blogs.law.harvard.edu/vrm/

j_s · on May 31, 2013

Neat project! Took me about 5 clicks to find out that VRM is defined as 'Vendor Relationship Management'.

icebraining · on June 1, 2013

Hmm, it's the first phrase of the first link.

  About VRM

  VRM stands for Vendor Relationship Management. (...)

skybrian · on June 1, 2013

I think this is a very libertarian notion that misses out on the social aspects of reputation. What you claim about yourself is often less valuable to a third party than what someone else claims about you. Perhaps you could store signed endorsements made by other people. But then you need support for revocation, and what about negative claims? Free speech includes the ability to say negative things about others.

Social software is very complicated because of issues like this. Something like a personal dropbox is straightforward - single customer, no community. You only have to worry about people hacking you, not what they do to each other. Anyone who gets into social software inevitably sets themselves up for making rules and judging disputes, and that often requires hard decisions.

Or compare email with social messaging; the rights and responsibilities are different and it's due to the software. Personal servers only support setting ground rules in a certain way (for example, the inability to revoke or prevent copies). Centralized services can support communities with different rules, for better or worse.

iusable · on May 31, 2013

Agree 100%! The point that's important to remember about discoverability is that it is usually mission-centric.

e.g. - you want to discover the 'right' kind of people to follow, so as to have better or more entertaining or more useful or _____________ information. What is 'right' for you may not be 'right' for me, since our goals most likely aren't the same.

I love Naveen's idea (and more importantly his implementation) & will be playing with it a bit more this weekend. But, the value may not be apparent right away.

Aggregating and mining one type of data across a relationship may be more valuable. It may be that I have to track a similar set of data for myself and my significant other to make sense of the aggregates.

wslh · on May 31, 2013

In my spare time I am doing something like it. It's called Egont (No Ego OR Ego New Technology).

If you liked this article, you might like:

i) Egont, a web orchestration language: http://blog.databigbang.com/ideas-egont-a-web-orchestration-...

ii) Egont part II: http://blog.databigbang.com/egont-part-ii/

iii) Parsing s-expressions in C# using OMeta: http://blog.databigbang.com/parsing-s-expressions-in-c-using...

Because your stuff is defined as an s-expression and then optimized to a DAG.

quantumpotato_ · on May 31, 2013

[1] http://en.wikipedia.org/wiki/Directed_acyclic_graph ? Can you explain more for us layfolk?

wslh · on June 1, 2013

Yes. An s-expression is a tree. Imagine that you want to know the average score of a list of films, like (average listOfFilms), and someone else wants the same average for the same list. You will end up with two trees that are in fact the same and need to be calculated only once. A DAG is a way to represent dependencies, so if two cells are linked to the same cell (average listOfFilms), that cell is computed once and shared.

wslh · on June 1, 2013

One last clarification: not every set of linked expressions can be converted to a DAG. There cannot be circular references.

graue · on June 1, 2013

> Imagine if the wealth of data collected about you all went to one place, and, most importantly, was controlled by you. Imagine if Company X had to pay you to know about your Amazon spending habits instead of paying Amazon.

Even more than Project VRM linked above, this is precisely the eventual goal of Tent: https://tent.io/

That said, I'm skeptical of these projects', or anyone's, ability to achieve this utopia. Why would Amazon or Google adopt such a system? Your shopping/searching history is valuable data that gives them a business advantage.

Also, while a personal data store that you control may be a geek's wet dream, for an average nontechnical person, it would likely be difficult to use, insecure and offer no clear benefits. And if it ends up just a toy for geeks, the lack of scale would make it even harder to do anything interesting with it or get any companies of significant size to play ball.

I do hope ideas like this surface in some form, but I'm skeptical of the personal data store ever really becoming a thing.

abraininavat · on June 1, 2013

> Why would Amazon or Google adopt such a system

I could imagine laws requiring that personal data be controlled by the subjects.

> it would likely be difficult to use, insecure...

Who says? How can you assume this?

> and offer no clear benefits

Some would consider the ability to keep your information private at will a benefit. Others would consider the ability to sell said information a benefit.

> the lack of scale would make it even harder to do anything interesting

Lack of scale? An individual's data would be controlled by that individual, but the system would scale across all of humanity. I don't think you're thinking big enough -- I'm not talking about each person running his personal EC2.

graue · on June 1, 2013

Look, I'm not saying don't go for this idea. I'd love to see you try it. My doubts shouldn't get in your way. Code it up.

As for why it would be difficult to use and insecure? I'm guessing most of the people you hang out with are probably geeks. For perspective, take a look at this video:

https://www.youtube.com/watch?v=o4MwTvtyrUQ

Imagine each one of those people in the video has a personal data store that contains their complete medical history, right alongside all their Google searches, Amazon purchases and tweets.

How secure of a password do you think they picked?

How hard would it be to trick them with a phishing attack?

How much do you think they even know how to do with it in the first place?

amirmc · on June 1, 2013

The questions you pose also apply to centralized services in exactly the same way. There is no reason that personal clouds would be any less secure than centralized services are today. [1]

Just because some people are confused about the difference between browsers and search engines doesn't mean they don't know how to use either. You'd probably have got the same response if you'd asked about the difference between the Web and the Internet. Or any other complex system they don't have to deal with day-to-day (internal combustion engine, electricity transmission, etc.)

[1] edit: think of things like mass-assignment issue with rails and github or the recent facebook exploit that was posted. Centralized services are just as much of a honey-pot as a personal cloud might be.

graue · on June 2, 2013

No one who steals your Facebook password can get into anything as serious as your entire medical history. The proposal is dangerous precisely because it's centralized — in the sense of centralizing all different types of data on a particular individual. It also, as a side effect of giving the user more control, would give the user more to lose if their account is compromised. Again, with no clear gain for most people.

amirmc · on June 2, 2013

I feel we're just going to disagree over this but it's probably because neither of us has clearly stated the threat models we're dealing with. Also, it's somewhat hyperbolic to claim that such proposals are 'dangerous'. I could claim the same about the current situation where more and more personal data is handed over to companies, almost by default.

jckt · on May 31, 2013

+1, but do you mind expanding on "discoverability"?

klenwell · on May 31, 2013

A trivial example from my own recent experience: getting my highlights and notes out of the Amazon Kindle app on my phone.

Amazon could make these available through a simple RSS feed or export feature, but they don't. The only way I can get my hands on them is to scrape their website, which depends on some fragile scripting and is subject to their terms of service.

I really like this idea, but without some kind of positive government regulation, I don't see it happening. At least not in the sort of meaningful ways the OP envisions.

abraininavat · on May 31, 2013

I'm no expert, but it occurs to me that a mountain of data coming in from disparate sources needs the concept of discoverability to be adequately used. That is, there has to be a way for a machine to find occurrences of data that it cares about without specifically knowing the exact attributes of the data.

A simple example off the top of my head. Let's say there's a popular app that collects your pulse from your cellphone somehow, and that data is stored in your "Personal API" in some format. Some scientists write a routine that scans people's pulse patterns (with their permission, say) and warns them of possible risk factors. Later on hospitals start feeding their pulse data into the same personal API, but in a different format and using different units (bpm instead of bph, say), because they've never heard of the cellphone app.

I take "discoverability" as a general concept addressing how the scientists' routine might discover that the hospital data exists in a person's "Personal API", discover that the thing being measured by the hospital is the same as the thing being measured by the app, and discover how to map both formats into a common workable format and units.
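
A rough Python sketch of that scenario, under loud assumptions: the record fields (`source`, `quantity`, `unit`, `value`), the `heart_rate` label and the function name `pulse_in_bpm` are all invented here, and "bph" is just the made-up unit from the example above. The point is only that if each record describes what it measures and in which unit, a routine can find and normalize pulse data without knowing about the app or the hospital in advance.

```python
# How many of each unit make up one beat per minute (hypothetical table).
UNITS_PER_BPM = {"bpm": 1, "bph": 60}

def pulse_in_bpm(records):
    """Return every heart-rate reading in beats per minute, whatever its source."""
    readings = []
    for rec in records:
        if rec.get("quantity") != "heart_rate":
            continue  # not the kind of data this routine cares about
        divisor = UNITS_PER_BPM.get(rec.get("unit"))
        if divisor is None:
            continue  # unknown unit: skip it rather than guess
        readings.append(rec["value"] / divisor)
    return readings


personal_api = [
    {"source": "phone_app", "quantity": "heart_rate", "unit": "bph", "value": 4200},
    {"source": "hospital", "quantity": "heart_rate", "unit": "bpm", "value": 72},
    {"source": "phone_app", "quantity": "steps", "unit": "count", "value": 9000},
    {"source": "clinic", "quantity": "heart_rate", "unit": "beats/day", "value": 100800},
]

print(pulse_in_bpm(personal_api))
```

The hard part the comment points at is not this loop but agreeing on the shared vocabulary (here, the `quantity` and `unit` labels) so that independent sources describe the same measurement in a way a machine can match.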

naveenium · on May 31, 2013

couldn't one also keep track of the 'source' from where a data point comes (i do this, but don't reveal it in the API - yet)? this way, when it comes to discoverability via some algorithm, you could always separate the 'verified' bucket from the 'unknown' bucket?
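
As a small illustration of the 'verified' versus 'unknown' split, here is a Python sketch. It says nothing about how the actual API stores sources: the `source` field, the `bucket_by_source` name and the set of trusted sources are assumptions for the example.

```python
def bucket_by_source(points, verified_sources):
    """Split data points into 'verified' and 'unknown' by where they came from."""
    buckets = {"verified": [], "unknown": []}
    for point in points:
        # A point with no recorded source is treated as unknown.
        trusted = point.get("source") in verified_sources
        buckets["verified" if trusted else "unknown"].append(point)
    return buckets


points = [
    {"source": "hospital", "pulse": 72},
    {"source": "some_new_app", "pulse": 70},
    {"pulse": 65},
]
buckets = bucket_by_source(points, verified_sources={"hospital"})
print(len(buckets["verified"]), len(buckets["unknown"]))
```

An algorithm doing discovery could then start from the verified bucket and treat the unknown bucket with more caution.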

fosk · on May 31, 2013

If I understand correctly, isn't this the purpose of good documentation?

klenwell · on May 31, 2013

It's more about money. This data is worth a lot of money to marketers, retailers, pharmaceutical companies, etc. They have no incentive to share it freely or, I suspect, at a price reasonable to most individual consumers. (Maybe I'm wrong here?)

There was that interesting article a while back about how Target was able to figure out a teenage girl was pregnant before her own father did:

http://www.nytimes.com/2012/02/19/magazine/shopping-habits.h...

It glosses this point:

Target can buy data about your ethnicity, job history, the magazines you read, if you’ve ever declared bankruptcy or got divorced, the year you bought (or lost) your house, where you went to college, what kinds of topics you talk about online, whether you prefer certain brands of coffee, paper towels, cereal or applesauce, your political leanings, reading habits, charitable giving and the number of cars you own. (In a statement, Target declined to identify what demographic information it collects or purchases.)

I'd be very curious, and a little scared, to see all this kind of information collected on me.

abraininavat · on May 31, 2013

I don't think so. Documentation is for people. Discoverability is for machines. Notice I suggested it's the scientists' routine which does the discovery, not the scientists themselves.