In fact, there are two general approaches to unifying ML with time series:
o Unifying ML API and patterns so that time series can be analyzed in the same way as normal tabular data. Example: sktime
o Preprocessing libraries applying data transformations to the input time series by converting them to a (tabular) form which can be used by normal ML algorithms. These are typically general or specific libraries for feature engineering, feature learning, feature selection and generic transformations. Examples: https://github.com/blue-yonder/tsfresh or https://github.com/prostodata/prosto
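To make the second approach concrete, here is a minimal sketch of "tabularizing" a collection of time series into one feature row per series, using plain pandas aggregations (the column names and feature choices are made up for illustration; tsfresh automates this with hundreds of engineered features):

    import numpy as np
    import pandas as pd

    # Toy long-format data: one row per (series id, time step, value).
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "id":    np.repeat([0, 1, 2], 50),
        "time":  np.tile(np.arange(50), 3),
        "value": rng.normal(size=150),
    })

    # "Tabularize" each series into a single row of summary features,
    # which any ordinary classifier or regressor can then consume.
    features = df.groupby("id")["value"].agg(["mean", "std", "min", "max", "skew"])
    print(features)

    # tsfresh does the same job at scale, roughly:
    # from tsfresh import extract_features
    # features = extract_features(df, column_id="id", column_sort="time")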
My experience has been that if you are doing feature engineering and using summary vectors for time series classification (rather than algorithms that deal with the time series directly), and it's working, then the problem isn't really complicated enough to need high-compute machine learning methods, and you'll be fine with one of the more popular off-the-shelf methods.
We're actually interfacing tsfresh. Unifying ML with time series is perhaps better understood in terms of the different learning tasks (e.g. time series classification/regression/clustering, forecasting, time series annotation) and their relations (applying algorithms for one task to help solve another).
There is a tutorial-like presentation from the latest PyData Fest that explains the motivation behind this library and how to use it. I've watched about two thirds of it so far, and the presenter's explanations are worth listening to if you are dealing with time series. Have a look: https://www.youtube.com/watch?v=Wf2naBHRo8Q
This looks very useful. I have a question for the developers, if any of them are here: is there any plan to add anomaly detection for time series data?
Anomaly detection belongs to unsupervised learning, while in time series analysis we normally think about the future, with future values viewed as labels. One way to approach anomaly detection is to train an ordinary forecasting model; an anomaly is then a large deviation from the predicted values. Another approach is to train an autoencoder on segments of the time series; an anomaly is then defined by the degree to which the decoded segment deviates from the real segment.
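A minimal sketch of the first idea (nothing library-specific here; the rolling mean stands in for whatever forecasting model you would actually use):

    import numpy as np

    def flag_anomalies(y, window=24, k=3.0):
        """Flag points that deviate strongly from a simple one-step forecast."""
        y = np.asarray(y, dtype=float)
        flags = np.zeros(len(y), dtype=bool)
        for t in range(window, len(y)):
            history = y[t - window:t]
            predicted = history.mean()                     # stand-in for a real forecaster
            if abs(y[t] - predicted) > k * history.std():  # "large deviation from predicted values"
                flags[t] = True
        return flags

    # Example: a smooth series with one injected spike.
    t = np.arange(300)
    series = np.sin(t / 10.0)
    series[200] += 5.0
    print(np.where(flag_anomalies(series))[0])  # should include index 200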
> Is anomaly detection not a timeseries classification, which they do implement
It can be a time series classification if and only if you have labels that say a given sample is an anomaly. But then what if it runs for a while and a new kind of anomaly starts happening that it’s never seen before and isn’t in the training set: will it detect it based on your labels?
Ideally anomaly detection should be a more unsupervised learning scenario where it can automatically determine what’s normal and what’s not.
I knew about work in anomaly detection for state of health and state of charge in batteries, where you can somewhat model the physical effects (in a data driven manner). However, this description of the problem made me think that meta-learning might be suitable for the problem you’re describing. I’ve only seen it applied in computer vision though (and more recently in speech).
Sktime contains a handful of standard tools and models used for time series analysis and machine learning.
Prophet contains one specific model developed by Facebook, which you will not find in Sktime.
You won't ever find something like Prophet in Sktime because it's a "higher level" model than anything in Sktime. The cleverness of Prophet is based largely on its automatic feature engineering, with a linear regression model underneath. Whereas a library like Sktime focuses on implementing specific models like ARIMA, and letting you do your own feature engineering.
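To make the "higher level" point concrete, using Prophet directly looks roughly like this (a sketch; the package is named prophet in recent releases and fbprophet in older ones, and it expects columns literally named ds and y):

    import pandas as pd
    from prophet import Prophet  # older releases: from fbprophet import Prophet

    # Prophet wants a dataframe with a datestamp column 'ds' and a value column 'y'.
    df = pd.DataFrame({
        "ds": pd.date_range("2015-01-01", periods=365, freq="D"),
        "y":  range(365),
    })

    m = Prophet()                 # trend/seasonality features are engineered automatically
    m.fit(df)
    future = m.make_future_dataframe(periods=30)
    forecast = m.predict(future)  # includes 'yhat', 'yhat_lower', 'yhat_upper'
    print(forecast[["ds", "yhat"]].tail())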
sktime is a toolbox with the goal of supporting multiple models and composition techniques; Prophet is a particular model. We're working on interfacing it so that you can call it using our API.
It's not just that the Statsmodels offering is limited, it's that the Scikit-learn style API is effectively an industry standard. The Statsmodels API, by contrast, is vaguely R-inspired, and while it has some benefits compared to Scikit-learn, it doesn't fit easily into today's "standard" Python machine learning workflows without a bunch of wrapper code.
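The stylistic difference in a nutshell, using plain linear regression on synthetic data:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    import statsmodels.api as sm

    X = np.random.rand(100, 3)
    y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * np.random.randn(100)

    # Scikit-learn style: construct an estimator, then fit/predict.
    model = LinearRegression()
    model.fit(X, y)
    preds_sklearn = model.predict(X)

    # Statsmodels style (R-inspired): data goes into the constructor,
    # fit() returns a results object that you then query.
    results = sm.OLS(y, sm.add_constant(X)).fit()
    preds_sm = results.predict(sm.add_constant(X))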
When I see a new forecasting library, my first question is whether it can apply ARIMA in an sklearn-like manner: training a model on a (large) training time series X, storing the model while discarding the training data, and then using this model for predictions by feeding it a completely different (and shorter) time series. Importantly, the time series used for prediction is not a continuation of the time series used for training. Moreover, we do not store the original training series anywhere.
So the question is whether sktime can apply ARIMA in this way.
We're interfacing statsmodels and pmdarima for the implementation of the ARIMA model. I believe that you can persist models in statsmodels without saving the whole training data.
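In statsmodels itself that looks roughly like this (a sketch; remove_data=True strips the data arrays from the results object before pickling, though whether downstream methods still work without the data is worth testing for your model class):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Fit an ARIMA(1,1,1)-style model on a training series.
    y_train = np.cumsum(np.random.randn(500))
    results = SARIMAX(y_train, order=(1, 1, 1)).fit(disp=False)

    # Persist the fitted results without the training arrays.
    results.save("arima.pkl", remove_data=True)

    # Later, possibly in a different process:
    reloaded = sm.load("arima.pkl")
    print(reloaded.params)  # the fitted coefficients survive the round trip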
Very unlikely that sktime could be used in this way, because ARIMA is rarely applied this way. This would be akin to estimating the mean from some sample and predicting values of a different population (not a different sample from the same population) with this mean. You could do that, but in general it will not yield very good predictions. ARIMA is just fitting the mean, variance and serial correlation of a specific time series. Using these sample moments to predict the trajectory of a completely different time series rarely makes much sense.
But there is no reason from an API perspective why it couldn't work like this. An ARIMA model is a handful of parameters that act on input and output vectors. Whether it makes sense to use it that way is a separate question.
As it happens, this is precisely how sktime works. The whole point is that its API is analogous to that of Scikit-learn, and this is clearly demonstrated in its example code.
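A minimal sketch of what that looks like (assuming a recent sktime release with pmdarima installed for the ARIMA wrapper; exact module paths may differ between versions):

    from sktime.datasets import load_airline
    from sktime.forecasting.arima import ARIMA  # thin wrapper around pmdarima

    y = load_airline()                   # a pandas Series with a period index
    forecaster = ARIMA(order=(1, 1, 0))  # order chosen purely for illustration
    forecaster.fit(y)                    # same fit/predict idiom as scikit-learn
    y_pred = forecaster.predict(fh=[1, 2, 3])  # forecast 1, 2 and 3 steps ahead
    print(y_pred)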
Sktime is just an implementation of various time series models with a Scikit-learn-compatible API. It is still up to the user to know what to do with this stuff.
For example, I have data from 1900 to 2000. I train an ARMA model on this data and store the corresponding (AR and MA) coefficients as model parameters. Now I get data from 2010 to 2020. My goal is to use these coefficients to predict the value in 2021 (without using the historic data I used for training). I think it does make sense, and it is precisely how typical ML algorithms work. So it is more a matter of how an algorithm is implemented and which usage patterns it follows.
Ok, if you want to apply the fitted model to later data points of the same series, in principle you could. Superficially browsing the source of sktime, it does not seem to support/expose it. AFAICT, sktime's ARIMA estimator wraps pmdarima, which wraps SARIMAX from statsmodels, which uses the statsmodels state space model for the actual calculations. Best I can tell, none of the higher level wrappers support/expose the functionality that you wish. If you know how to work with the raw state space form in statsmodels, you could do more or less what you described (predict with a fitted model without retaining the full history, though you also need to store the estimated state in addition to the ARMA coefficients).
If you don't know how to do this, I'd advise you not to bother, unless you have a really specialized need.
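For what it's worth, recent statsmodels releases make the "apply fitted parameters to new data" step fairly direct via apply() on the state space results object (a sketch; check that your statsmodels version has it):

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Fit once on the "training" series.
    y_train = np.cumsum(np.random.randn(1000))
    results = SARIMAX(y_train, order=(2, 0, 1)).fit(disp=False)

    # Re-filter a completely different, shorter series with the already-fitted
    # parameters; nothing is re-estimated.
    y_new = np.cumsum(np.random.randn(50))
    new_results = results.apply(y_new)

    print(new_results.forecast(steps=1))  # one-step-ahead forecast for the new series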
("just storing and applying fitted coefficients on new data" is straightforward if you have a pure AR(p) model: you can just plug in the coefficients in the recursive AR equation using the last observations. But as soon as you have an MA term, you have a problem, because a finite lag MA(q) model is equivalent to an infinite lag AR(p) model. You need some specialized algorithms like the innovations algorithm or Kalman filters to handle that. Statsmodels uses a Kalman filter on the state space form of the ARMA model.)
Which of these two approaches will win is not clear.