I can't speak for all use cases, but I've done a great deal of work using deep learning approaches for anomaly detection in network device telemetry. In particular, with high-resolution univariate time series of latency measurements, we saw success using convolutional autoencoders and GANs. These methods score anomalies by reconstruction loss rather than by forecast error, but they are still effective.
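To make the reconstruction-loss idea concrete, here is a minimal sketch. It is not our actual pipeline: I've swapped the convolutional autoencoder for a linear one (PCA via SVD) so it runs with NumPy alone, and the synthetic latency series, window width, component count, and threshold quantile are all made-up assumptions. The scoring logic is the same in spirit: fit on normal traffic, reconstruct each window, and flag windows that reconstruct poorly.

```python
import numpy as np

def sliding_windows(series, width):
    """Stack overlapping windows of `width` samples into rows."""
    return np.lib.stride_tricks.sliding_window_view(series, width)

def fit_linear_autoencoder(windows, n_components):
    """Fit PCA via SVD. The top components act as a tied encoder/decoder.
    This is a linear stand-in for a convolutional autoencoder."""
    mean = windows.mean(axis=0)
    _, _, vt = np.linalg.svd(windows - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(windows, mean, components):
    """Mean squared error between each window and its reconstruction."""
    centered = windows - mean
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)

# Synthetic latency in ms (assumption): periodic load pattern plus noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
latency = 20 + 2 * np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.2, t.size)

train = latency[:1000]          # assumed to be anomaly-free
test = latency[1000:].copy()
test[500] += 15                 # injected latency spike

width = 32                      # hypothetical window length
mean, components = fit_linear_autoencoder(sliding_windows(train, width), n_components=4)

# Threshold taken from the error distribution on normal data (hypothetical quantile).
train_err = reconstruction_error(sliding_windows(train, width), mean, components)
threshold = np.quantile(train_err, 0.999)

test_err = reconstruction_error(sliding_windows(test, width), mean, components)
anomalous = np.flatnonzero(test_err > threshold)
print("worst window starts at test index", int(np.argmax(test_err)))
print("windows over threshold:", anomalous.size)
```

Every window that overlaps the spike should reconstruct badly, because the model only learned the periodic pattern. Nothing here forecasts the next value, which is the distinction I was drawing above.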

There is some prior art for this that we leaned on [1][2].

Re: transformers, I did some early experimentation with Temporal Fusion Transformers [3], which worked pretty well for forecasting compared to other deep learning methods, but I rarely saw them outperform standard baselines (like ARIMA) on our datasets.

[1] https://www.mdpi.com/2076-3417/12/23/12472

[2] https://arxiv.org/abs/2009.07769

[3] https://arxiv.org/abs/1912.09363