Hehe, I was wondering if someone would catch that. Rest assured, I know the difference between online and stochastic gradient descent. I admit I used "stochastic" on Hacker News because I thought it would generate more engagement.
What are some adversarial cases for gradient descent? And what sort of provenance information (e.g. along the lines of DVC.org or W3C PROV) should be tracked for a production ML workflow?
We built model & data provenance into our open source ML library, though it's admittedly not the W3C PROV standard. There were a few gaps in it until we built an automated reproducibility system on top of it, but now it's pretty solid for all the algorithms we implement. Unfortunately some of the things we wrap (notably TensorFlow) aren't reproducible enough due to some unfixed bugs. There's an overview of the provenance system in this reprise of my JavaOne talk (https://www.youtube.com/watch?v=GXOMjq2OS_c). The library is on GitHub: https://github.com/oracle/tribuo.
It's specifically not stochastic. From the article:
Online gradient descent
Finally, we have enough experience to implement online gradient descent. To keep things simple, we will use a very vanilla version:
- Constant learning rate, as opposed to a schedule.
- Single epoch, we only do one pass on the data.
- Not stochastic: the rows are not shuffled. ⇠ ⇠ ⇠ ⇠
- Squared loss, which is the standard loss for regression.
- No gradient clipping.
- No weight regularisation.
- No intercept term.
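For anyone who wants to see what that list amounts to, here's a rough Python sketch of the "vanilla" variant described above. It is not the article's code; the function name `online_gradient_descent`, its parameters, and the default learning rate are my own assumptions, and it uses plain lists to stay dependency-free.

```python
from typing import Iterable, List, Sequence, Tuple


def online_gradient_descent(
    rows: Iterable[Tuple[Sequence[float], float]],
    n_features: int,
    learning_rate: float = 0.01,  # constant, no schedule (value is an assumption)
) -> List[float]:
    """One in-order pass of gradient descent on squared loss."""
    weights = [0.0] * n_features  # no intercept term
    # Single epoch: each row is seen exactly once, in the order given.
    # Nothing is shuffled, which is why this is "online" and not "stochastic".
    for x, y in rows:
        prediction = sum(w * xi for w, xi in zip(weights, x))
        error = prediction - y
        # Gradient of (prediction - y)^2 w.r.t. w_i is 2 * error * x_i.
        # No gradient clipping and no regularisation term are applied.
        for i, xi in enumerate(x):
            weights[i] -= learning_rate * 2.0 * error * xi
    return weights
```

Because the rows are consumed in order, feeding the same data in a different order generally gives different weights, which is exactly the property the "not stochastic" bullet is pointing at.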