Thanks, using DP to bound deviations of arbitrary functions of a model is a neat idea.
I wonder if it makes sense to go from generalization error to an estimate of something similar in spirit to (but not as strong as) differential privacy, as the top-level comment suggested?
For example, say I want to empirically argue that a particular function of my GMM, e.g. the log-likelihood of x_i, is "private." To do so I form C i.i.d. train/test splits. For each split I estimate the density of the log-likelihood over the train and test samples, and estimate an upper bound on the ratio of the two densities. That gives me C samples of epsilon, and I can apply some simple tail bound (Chebyshev?) to the probability of epsilon landing in a given interval. A rough sketch of what I mean is below.
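This is only a sketch of the procedure I have in mind, assuming a scikit-learn GaussianMixture as the model and a Gaussian KDE over per-sample log-likelihoods; the split count C, the number of components, and the Chebyshev multiplier k are placeholders I made up, not anything taken from [1].

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

def empirical_epsilon(X, C=20, n_components=3, grid_size=200, seed=0):
    """Return C estimates of eps = max_t |log p_train(t) - log p_test(t)|,
    where p_train / p_test are KDE densities of the per-sample log-likelihood
    under a GMM fit on the training half of each split."""
    rng = np.random.RandomState(seed)
    eps_samples = []
    for c in range(C):
        X_tr, X_te = train_test_split(X, test_size=0.5,
                                      random_state=rng.randint(1 << 30))
        gmm = GaussianMixture(n_components=n_components, random_state=c).fit(X_tr)
        ll_tr = gmm.score_samples(X_tr)   # log-likelihood of each training point
        ll_te = gmm.score_samples(X_te)   # log-likelihood of each held-out point
        # Density estimates of the two log-likelihood distributions.
        kde_tr, kde_te = gaussian_kde(ll_tr), gaussian_kde(ll_te)
        grid = np.linspace(min(ll_tr.min(), ll_te.min()),
                           max(ll_tr.max(), ll_te.max()), grid_size)
        log_ratio = np.log(kde_tr(grid) + 1e-12) - np.log(kde_te(grid) + 1e-12)
        eps_samples.append(np.abs(log_ratio).max())  # empirical bound on |log ratio|
    return np.array(eps_samples)

def chebyshev_interval(eps_samples, k=3.0):
    """Chebyshev: P(|eps - mean| >= k*std) <= 1/k**2, so with probability at
    least 1 - 1/k**2 a fresh epsilon falls in [mean - k*std, mean + k*std]."""
    m, s = eps_samples.mean(), eps_samples.std(ddof=1)
    return (m - k * s, m + k * s), 1.0 - 1.0 / k**2

if __name__ == "__main__":
    X = np.random.RandomState(0).randn(2000, 2)   # toy stand-in for real data
    eps = empirical_epsilon(X)
    (lo, hi), coverage = chebyshev_interval(eps)
    print(f"epsilon in [{lo:.3f}, {hi:.3f}] with probability >= {coverage:.2f}")
```

The max over the log-ratio on a shared grid is just one crude way to turn the two densities into a single epsilon per split; something smarter (e.g. restricting the grid to regions with enough mass) would probably be needed in practice.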
The idea is that we already have some "privacy" [1] from the data-sampling distribution. So we don't necessarily need to add noise to our algorithm. And it would be interesting to measure this privacy (at least for a particular function of the model) empirically.
[1] http://ieeexplore.ieee.org/abstract/document/6686180/