
Very unlikely that sktime could be used in this way, because ARIMA is rarely applied this way. This would be akin to estimating the mean from some sample and predicting values of a different population (not a different sample from the same population) with this mean. You could do that, but in general it will not yield very good predictions. ARIMA is just fitting the mean, variance and serial correlation of a specific time series. Using these sample moments to predict the trajectory of a completely different time series rarely makes much sense.



But there is no reason from an API perspective why it couldn't work like this. An ARIMA model is a handful of parameters that act on input and output vectors. Whether it makes sense to use it that way is a separate question.

As it happens, this is precisely how sktime works. The whole point is that its API is analogous to that of Scikit-learn. This is clearly demonstrated in the example code:

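    # import paths may differ by sktime version (newer releases expose the splitter in sktime.split)
    import numpy as np
    from sktime.datasets import load_airline
    from sktime.forecasting.model_selection import temporal_train_test_split
    from sktime.forecasting.theta import ThetaForecaster

    y = load_airline()
    y_train, y_test = temporal_train_test_split(y)
    fh = np.arange(1, len(y_test) + 1)  # forecasting horizon
    forecaster = ThetaForecaster(sp=12)  # monthly seasonal periodicity
    forecaster.fit(y_train)
    y_pred = forecaster.predict(fh)

Sktime is just an implementation of various time series models with a scikit-learn-compatible API. It is still up to the user to know what to do with this stuff.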


For example, I have data from 1900 to 2000. I train an ARMA model on this data and store the resulting coefficients as model parameters. Now I get data from 2010 to 2020. My goal is to use these (AR and MA) coefficients to predict the value in 2021 (without using the historical data I used for training). I think this does make sense, and it is precisely how typical ML algorithms work. So it is more a matter of how an algorithm is implemented and which usage patterns it follows.
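
Roughly, the workflow I have in mind looks like the sketch below. It assumes a pure AR(1) model (no MA term, which keeps it simple) and made-up, noise-free numbers. `fit_ar1` and `predict_next` are hypothetical helpers written for illustration, not functions from sktime or statsmodels.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x = series[:-1]  # lagged values
    y = series[1:]   # current values
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def predict_next(params, recent):
    """One-step forecast from stored parameters and the latest observation."""
    c, phi = params
    return c + phi * recent[-1]


# "Training" data (made up): a noise-free AR(1) path with c = 2, phi = 0.5.
train = [10.0]
for _ in range(20):
    train.append(2.0 + 0.5 * train[-1])

params = fit_ar1(train)  # only these two numbers are kept

# Later observations (made up); the training data is no longer needed.
new_window = [3.0, 3.5, 3.75]
forecast_2021 = predict_next(params, new_window)
```

For an AR(1) model only the last observation of the new window enters the forecast, which is why nothing from the training period has to be retained.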


Ok, if you want to apply the fitted model to later data points of the same series, in principle you could. From a superficial look at the sktime source, it does not seem to support or expose this. AFAICT, sktime's ARIMA estimator wraps pmdarima, which wraps SARIMAX from statsmodels, which uses the statsmodels state space model for the actual calculations. As best I can tell, none of the higher-level wrappers support or expose the functionality you want. If you know how to work with the raw state space form in statsmodels, you could do more or less what you described (predict with a fitted model without retaining the full history, though you would also need to store the estimated state in addition to the ARMA coefficients).

If you don't know how to do this, I'd advise you not to bother, unless you have a really specialized need.

("Just storing and applying fitted coefficients on new data" is straightforward if you have a pure AR(p) model: you can plug the coefficients into the recursive AR equation using the last p observations. But as soon as you have an MA term, you have a problem, because a finite-lag (invertible) MA(q) model is equivalent to an infinite-lag AR model, AR(∞), so the forecast depends on the entire history. You need specialized algorithms like the innovations algorithm or a Kalman filter to handle that. Statsmodels uses a Kalman filter on the state space form of the ARMA model.)
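
To make the MA problem concrete, here is a minimal sketch for a zero-mean MA(1) model with a known coefficient. It rebuilds the residuals with a plain recursion that assumes the residual before the first supplied observation is zero. That is a conditional approximation, not the Kalman filter or the innovations algorithm, and `ma1_forecast`, the coefficient, and the series are all made up for illustration.

```python
def ma1_forecast(theta, observations):
    """One-step forecast for a zero-mean MA(1): y[t] = e[t] + theta * e[t-1].

    Residuals are rebuilt recursively from the supplied observations,
    assuming the residual just before the first one is zero.
    """
    e_prev = 0.0
    for y in observations:
        e_prev = y - theta * e_prev  # e[t] = y[t] - theta * e[t-1]
    return theta * e_prev            # forecast of y[n+1] is theta * e[n]


theta = 0.5
series = [1.0, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]  # made-up data

# Forecast of the next value using only the last k observations.
forecasts = {k: ma1_forecast(theta, series[-k:]) for k in (1, 2, 4, 8)}
# {1: -0.25, 2: -0.375, 4: -0.46875, 8: -0.5}
```

The forecast changes with every extra observation you feed in: the weight on an observation k steps back is proportional to theta^k, so it fades but never drops to exactly zero. With a pure AR(p) model, anything older than the last p observations has no effect at all.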



