
Depends. Throwing outliers out without thinking is obviously wrong. In many instances outliers can be just invalid measurements and you should ignore them.



> In many instances outliers can be just invalid measurements and you should ignore them.

signal[i] = value[i] + noise[i].

If you know that value[i] == NaN, then by all means throw out signal[i]. If value[i] != NaN, then you're better off modeling noise[i] and using that model to give you information about value[i], as yummyfajitas suggests.

This is trivial to see if noise[i] == 0, but for some reason becomes progressively harder for people as noise[i] increases.
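
To make that concrete, here is a rough Python sketch (an illustration, not anything posted in the thread). If you have even a crude estimate of how noisy each reading is, you can weight readings by inverse variance instead of deleting the suspicious one. The function name, the readings, and the noise_sd values are all made up, and the sketch assumes independent Gaussian noise with every standard deviation greater than zero.

```python
def weighted_estimate(signal, noise_sd):
    """Inverse-variance weighted mean and its standard error.

    Assumes independent Gaussian noise and noise_sd[i] > 0 for all i.
    """
    weights = [1.0 / sd ** 2 for sd in noise_sd]
    total = sum(weights)
    estimate = sum(w * s for w, s in zip(weights, signal)) / total
    return estimate, (1.0 / total) ** 0.5


# Hypothetical readings: three precise ones and one very noisy one.
signal = [10.0, 10.2, 9.8, 25.0]
noise_sd = [0.1, 0.1, 0.1, 10.0]

plain_mean = sum(signal) / len(signal)  # about 13.75, dragged up by the 25.0
estimate, std_err = weighted_estimate(signal, noise_sd)
print(plain_mean, estimate, std_err)
```

The noisy reading is never thrown out. It just gets a weight that reflects how little it tells you, so the estimate stays close to 10.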


A much better approach is to incorporate measurement error into your statistical procedure.

Of course that's usually a lot easier to do with Bayesian techniques...
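
For what it's worth, here is one way the Bayesian version could look, as a minimal sketch under assumptions of my own choosing: each reading is "good" (small noise) or "bad" (large noise) with some prior probability, the prior on the true value is flat over a grid, and the posterior is computed by brute force. The names good_sd, bad_sd, p_bad and the data are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_mean(signal, grid, good_sd=0.1, bad_sd=10.0, p_bad=0.1):
    """Posterior mean of the true value under a two-component noise model.

    Each reading is value + noise, where the noise is N(0, good_sd^2) with
    probability 1 - p_bad and N(0, bad_sd^2) with probability p_bad.
    The prior on the value is flat over the grid points.
    """
    log_post = []
    for v in grid:
        lp = 0.0
        for s in signal:
            lik = ((1.0 - p_bad) * normal_pdf(s, v, good_sd)
                   + p_bad * normal_pdf(s, v, bad_sd))
            lp += math.log(lik)
        log_post.append(lp)
    top = max(log_post)  # subtract the max before exponentiating
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return sum(v * w for v, w in zip(grid, weights)) / total


signal = [10.0, 10.2, 9.8, 25.0]          # hypothetical readings
grid = [i * 0.01 for i in range(3001)]    # candidate values 0.00 .. 30.00
print(posterior_mean(signal, grid))
```

Nothing gets discarded here either. The model decides on its own that the 25.0 reading is far better explained by the wide noise component, so it barely moves the posterior.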



