
An alternative would be Differential Privacy. There are two ways to do that: one way (the "central" model) collects a big database, but only allows aggregate queries which give approximate answers; the other way (the "local" model) is to add random noise to each response as it is collected.

My favourite example of the latter is asking a sensitive yes/no question like 'have you taken drug X in the past year?'. For each respondent we toss a coin twice, and record the following as their answer:

    HH -> Record their real response
    HT -> Record their real response
    TH -> Record 'yes'
    TT -> Record 'no'
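
The resulting data set can tell us the prevalence of yes/no answers within a population, based on how much it deviates from the 50/50 'fake' answers: on average half the records are real, and the other half are split evenly between fake 'yes' and fake 'no'. Yet we can't tell whether any particular response is real or fake.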
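
To make the estimation step concrete, here's a rough Python sketch of the scheme above. The function names (`randomized_response`, `estimate_prevalence`) are made up for illustration, and it assumes fair, independent coin tosses. If the true 'yes' rate is p, the expected fraction of recorded 'yes' answers is 0.5 * p + 0.25, so we can invert that to get p = 2 * observed - 0.5.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Return the answer to record for one respondent (True means 'yes')."""
    first_is_heads = rng.random() < 0.5
    second_is_heads = rng.random() < 0.5
    if first_is_heads:
        # HH or HT: record their real response
        return truth
    # TH: record 'yes'; TT: record 'no'
    return second_is_heads


def estimate_prevalence(records: list[bool]) -> float:
    """Estimate the true 'yes' rate from the noisy records.

    E[observed] = 0.5 * p + 0.25, hence p = 2 * observed - 0.5.
    With few records the estimate can fall outside [0, 1].
    """
    observed_yes = sum(records) / len(records)
    return 2 * observed_yes - 0.5


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the demo is repeatable
    # Hypothetical population: 30% would truthfully answer 'yes'
    truths = [i < 30_000 for i in range(100_000)]
    records = [randomized_response(t, rng) for t in truths]
    print(f"estimated prevalence: {estimate_prevalence(records):.3f}")
```

The estimate is noisier than a direct survey would be, since only about half the records carry real information; that's the price paid for each individual's deniability.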

I can imagine this sort of thing being used for e.g. usage metrics ("did this user press the Foo button within the past day?").



