
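It is pretty easy to avoid NaNs when working with softmax, and you certainly don't need any epsilons. Just subtract the largest value from every input before exponentiating. Softmax is invariant to that shift, every exponent is then at most 0 so exp() cannot overflow, and the denominator is at least 1 because the largest input contributes exp(0) = 1. That means no overflow, no division by zero, and no catastrophic cancellation.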
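
For what it's worth, here is a minimal sketch of that trick in plain Python. The function name `softmax` and the example inputs are my own choices for illustration, not taken from any particular library, and the sketch assumes all inputs are finite.

```python
import math

def softmax(xs):
    """Max-subtraction softmax over a list of finite floats (illustrative helper)."""
    m = max(xs)
    # After the shift every exponent is <= 0, so exp() lands in [0, 1]
    # and cannot overflow; very negative values just underflow to 0.0.
    exps = [math.exp(x - m) for x in xs]
    # The largest input contributes exp(0) = 1, so total >= 1 and the
    # division below is always safe without any epsilon.
    total = sum(exps)
    return [e / total for e in exps]

# Calling math.exp(1000.0) directly raises OverflowError in Python,
# but the shifted version handles these inputs without trouble.
print(softmax([1000.0, 1001.0, 1002.0]))
```

Since the shift only multiplies numerator and denominator by the same constant exp(-m), the result is mathematically the same as the unshifted softmax.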

Clearly softmax can't be too bad if it is used extensively in all the most powerful models.

