An alternative would be Differential Privacy. There are two ways to do that: one collects a big database but only allows aggregate queries, which give approximate answers; the other approximates the data during collection.

My favourite example of the latter is randomized response. Say we're asking a sensitive yes/no question like 'have you taken drug X in the past year?'. For each respondent we toss a coin twice, and record the following as their answer:

    HH -> Record their real response
    HT -> Record their real response
    TH -> Record 'yes'
    TT -> Record 'no'

The resulting data set still tells us the prevalence of 'yes' within the population: half the records are real and the other half are 'fake' answers split 50/50, so the recorded 'yes' rate is 25% plus half the true rate, and we can solve for the true rate. Yet we can't tell whether any particular response is real or fake.
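
To make the arithmetic concrete, here's a rough Python sketch of the scheme above. The function names, the simulated population and the 10% true rate are all made up for illustration; the estimator just inverts recorded_yes = 0.25 + 0.5 * true_rate.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Record one answer using the two-coin-toss scheme (hypothetical helper)."""
    first_heads = rng.random() < 0.5
    second_heads = rng.random() < 0.5
    if first_heads:
        # HH or HT: record the real response
        return truth
    # TH -> record 'yes', TT -> record 'no'
    return second_heads


def estimate_prevalence(recorded: list[bool]) -> float:
    """Solve recorded_yes = 0.25 + 0.5 * p for the true 'yes' rate p."""
    recorded_yes = sum(recorded) / len(recorded)
    return 2 * recorded_yes - 0.5


rng = random.Random(0)
true_rate = 0.10  # assumed prevalence, purely illustrative
population = [rng.random() < true_rate for _ in range(100_000)]
recorded = [randomized_response(answer, rng) for answer in population]
estimate = estimate_prevalence(recorded)
print(f"recorded 'yes' rate: {sum(recorded) / len(recorded):.3f}")
print(f"estimated true rate: {estimate:.3f}")
```

The estimate is noisy, so on small samples it can land slightly below 0 or above 1. Also note that a recorded 'yes' is three times as likely to come from a true 'yes' as from a true 'no' (75% vs 25%), which is what bounds how much any single record can reveal.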

I can imagine this sort of thing being used for e.g. usage metrics ("did this user press the Foo button within the past day?").