
yeah pretty strong l1--most features were 0. we binarized rank on I_{rank<=20}. it turns out there are tons of articles beyond the first page that stay low forever. check out the interactive viz vad made: http://hn.metamx.com (warning 2.6MB compressed js ahead)
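A concrete version of that binarization, as a minimal sketch (the ranks here are made up; only the rank <= 20 cutoff comes from the comment above):

    import numpy as np

    # Hypothetical ranks for a handful of stories; I_{rank<=20} is the
    # indicator described above.
    rank = np.array([3, 17, 54, 210, 812])
    on_front_page = (rank <= 20).astype(int)   # -> array([1, 1, 0, 0, 0])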



Another question: how are the standard errors calculated? I assume they're not from the bootstrapping, since the p-values clearly aren't consistent with the standard errors (+/- 1.96*se crosses coef=0 in several cases that still have small p-values). The other way I'd think to get p-values is the percentage of bootstrap replicates with (coef==0), but with only 20 replicates you're stuck with a granularity of 0.05 (the smallest values you can get are p=0 and p=0.05).

I'm genuinely curious how to do coef significance testing for L1-regularized models. I once saw someone ask this at a Tibshirani talk and he said "oh we have no idea, we've resorted to the bootstrap before".


to be honest we just recorded the coeff values for each replicate and did the bootstrap variance calculation.
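A minimal sketch of that replicate-and-refit loop, assuming scikit-learn's L1-penalized logistic regression (the thread doesn't say which implementation or penalty strength was actually used; X and y are placeholder numpy arrays):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def bootstrap_coefs(X, y, n_replicates=20, C=1.0, seed=0):
        """Refit the L1-penalized model on bootstrap resamples and
        record the coefficient vector from each replicate."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        coefs = []
        for _ in range(n_replicates):
            idx = rng.integers(0, n, size=n)   # resample rows with replacement
            model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
            model.fit(X[idx], y[idx])
            coefs.append(model.coef_.ravel())
        return np.asarray(coefs)               # shape: (n_replicates, n_features)

    # Bootstrap standard error per coefficient = standard deviation of the
    # replicate values:
    #   coefs = bootstrap_coefs(X, y)
    #   se = coefs.std(axis=0, ddof=1)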

% of replicates with (coef==0) is potentially much more clever, especially since that's the test we want to perform anyway. i'll run that over the data and see what changes.
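The counting version is a one-liner over the same replicate matrix (coefs being the hypothetical (n_replicates, n_features) array from the sketch above); note that with 20 replicates the resulting fractions can only move in steps of 1/20 = 0.05, which is the granularity issue raised upthread:

    import numpy as np

    def zero_fraction(coefs, tol=1e-12):
        """Fraction of bootstrap replicates in which each coefficient was
        shrunk to (numerically) zero by the L1 penalty."""
        return (np.abs(coefs) < tol).mean(axis=0)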


I think the question is these don't look like NormalCDF(coef/se) p-values given the coef and se you report. They tend to be too small.

From a frequentist perspective, counting zeroes doesn't make much sense, because under the null of coef=0 there is still a chance you don't estimate coef=0, even after regularization.
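The sanity check being described is the usual two-sided normal (Wald-style) p-value from a reported coefficient and standard error; a minimal sketch with made-up numbers:

    from scipy.stats import norm

    def wald_p_value(coef, se):
        """Two-sided p-value under the normal approximation coef/se ~ N(0, 1)."""
        return 2 * norm.sf(abs(coef / se))

    # A coefficient whose +/- 1.96*se interval crosses zero should not come
    # out significant under this calculation:
    print(wald_p_value(0.8, 0.5))   # ~0.11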


    I think the question is these don't look like NormalCDF(coef/se) p-values given the coef and se you report.  They tend to be too small.
right that's my question


interesting yeah some of them definitely don't look right. the output is from scipy's stats.ttest_1samp
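For what it's worth, scipy.stats.ttest_1samp on the replicate values tests whether the mean of the bootstrap coefficients differs from zero, using the standard error of that mean (replicate standard deviation divided by sqrt(n_replicates)) with n_replicates - 1 degrees of freedom. That standard error is sqrt(20) times smaller than the replicate standard deviation itself, so if the reported se is the replicate standard deviation, these p-values would come out much smaller than NormalCDF(coef/se). A sketch of what that call computes, with a made-up replicate matrix:

    import numpy as np
    from scipy.stats import ttest_1samp

    # Made-up stand-in for the bootstrap output: 20 replicates x 3 features.
    rng = np.random.default_rng(0)
    coefs = rng.normal(loc=[0.0, 0.5, -0.3], scale=0.4, size=(20, 3))

    # One-sample t-test per column against popmean=0:
    #   t = mean(coefs) / (std(coefs, ddof=1) / sqrt(n_replicates))
    t_stat, p_val = ttest_1samp(coefs, popmean=0.0, axis=0)
    print(p_val)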



