I don't think daily OHLCV data is a good data source. First of all, there's too little of it: because the data distribution shifts over time, old history quickly stops being representative. It's also driven significantly by outliers and events outside of the data (news, etc). There's so much noise in daily prices that most of the signal is drowned out (longer time horizons = more uncertainty). I don't believe you can find any edge looking at daily data. This kind of data is the equivalent of what MNIST is in ML: nice for some playing around, but nobody who is serious would use it for production or benchmarking, at least not by itself.
There is a good reason trading firms pay a lot of money (sometimes millions) for fine-grained historical data from exchanges. It's not only about speed. For interesting experiments you IMO need L2 or L3 order book data, ideally at second or sub-second scales. That's not HFT (which is nanoseconds and microseconds), but somewhere in the "middle" - it's a different world than what you are talking about.
By simulators he means market simulators for L2/L3 data with a matching engine, latencies, queue positions, jitter, complex order types, etc. You can't simulate other market participants (at least not fully, though there are even techniques to estimate their reactions based on live trading feedback), but there are still many things left that you can simulate in a realistic way during training and backtesting. Trading companies typically have their own high-performance simulators built in house. Some of these are incredibly complex. Good simulators can give you a huge edge and are absolutely necessary.
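To make the "queue positions and latencies" part concrete, here is a toy sketch in Python of how a backtest might decide whether a passive buy order gets filled. Everything in it is an assumption for illustration: the event format `(ts, kind, price, size)`, the names `RestingOrder` and `simulate_passive_buy`, integer tick prices, and the pessimistic rule that nobody ahead of us ever cancels. Real in-house simulators are far more detailed than this.

```python
from dataclasses import dataclass


@dataclass
class RestingOrder:
    price: int        # limit price in ticks (hypothetical convention)
    qty: int          # total quantity we want to buy
    queue_ahead: int  # volume resting ahead of us at our price level
    filled: int = 0


def simulate_passive_buy(events, price, qty, submit_ts, latency):
    """Replay market events and estimate fills for one passive buy order.

    events: time-ordered (ts, kind, price, size) tuples, where kind is
      "depth": total resting bid size at `price` (a book update), or
      "trade": a sell-initiated trade of `size` at `price`.
    The order only joins the queue at submit_ts + latency.
    """
    arrival_ts = submit_ts + latency
    depth_at_price = 0
    order = None
    for ts, kind, ev_price, size in events:
        # Our order reaches the exchange: it queues behind the current depth.
        if order is None and ts >= arrival_ts:
            order = RestingOrder(price, qty, queue_ahead=depth_at_price)
        if order is None:
            # Before arrival we only track how deep the level is.
            if ev_price == price:
                if kind == "depth":
                    depth_at_price = size
                elif kind == "trade":
                    depth_at_price = max(0, depth_at_price - size)
        elif kind == "trade":
            if ev_price < price:
                # The market traded through our level, so we must be done.
                order.filled = order.qty
            elif ev_price == price:
                # Trades eat the queue ahead of us first, then fill us.
                consumed = min(size, order.queue_ahead)
                order.queue_ahead -= consumed
                fill = min(size - consumed, order.qty - order.filled)
                order.filled += fill
        # Depth updates after arrival are ignored: we pessimistically
        # assume no one ahead of us cancels.
    if order is None:
        order = RestingOrder(price, qty, queue_ahead=depth_at_price)
    return order


events = [
    (0, "depth", 100, 20),
    (12, "trade", 100, 15),
    (15, "trade", 100, 10),
]
result = simulate_passive_buy(events, price=100, qty=10, submit_ts=7, latency=3)
print(result.filled, result.queue_ahead)
```

With daily OHLCV bars you would have to assume a fill whenever the low touches your price; with order book data you can at least ask whether enough volume traded at that level after your order plausibly arrived.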
What you said about daily data is precisely what makes the stock market so interesting and challenging: nonstationarity.
"outliers and events outside of the data, news": this is precisely the stuff your models need to learn, and the fact that you consider it noise tells me most folks have no clue how to predict this "noise".