Chronos: Learning the Language of Time Series (arxiv.org)
207 points by Anon84 10 months ago | 59 comments



I do not have a horse in the race, but it is interesting to see open source comparisons to traditional timeseries strategies: https://github.com/Nixtla/nixtla/tree/main/experiments/amazo...

In general, the M-Competitions (https://forecasters.org/resources/time-series-data/), the olympics of timeseries forecasting, have proven frustrating for ML methods... linear models do shockingly well, and the ML models that have won generally seem to be variants of older tree-based methods (e.g. LightGBM is a favorite).

Will be interesting to see whether the Transformer architecture ends up making real progress here.


They are comparing a non-ensembled transformer model with an ensemble of simple linear models. It's not surprising that an ensemble of linear time series models does well, since ensembles optimize for the bias-variance trade-off.

Transformer/ML models by themselves have a tendency to overfit past patterns. They pick up more signal in the patterns, but they also pick up spurious patterns. They're low bias but high variance.

It would be more interesting to compare an ensemble of transformer models with an ensemble of linear models to see which is more accurate.

(that said, it's pretty impressive that an ensemble of simple linear models can beat a large scale transformer model -- this tells me the domain being forecast has a high degree of variance, which transformer models by themselves don't do well on.)
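To make the variance argument concrete, here is a toy simulation (a sketch under idealized assumptions, not anything from the linked benchmark): averaging K models whose errors are independent cuts the variance of the prediction by roughly a factor of K while leaving the bias unchanged. Real models' errors are correlated, so the gain in practice is smaller.

```python
import random
import statistics

def simulate(n_members: int, noise_sd: float, trials: int, seed: int = 0):
    """Return (variance of one member, variance of the ensemble average).

    Each member predicts the true value 0.0 plus its own independent noise,
    a toy stand-in for unbiased but high-variance models.
    """
    rng = random.Random(seed)
    single, ensemble = [], []
    for _ in range(trials):
        preds = [rng.gauss(0.0, noise_sd) for _ in range(n_members)]
        single.append(preds[0])                  # one model on its own
        ensemble.append(sum(preds) / n_members)  # committee average
    return statistics.pvariance(single), statistics.pvariance(ensemble)

var_single, var_ens = simulate(n_members=10, noise_sd=1.0, trials=20_000)
print(f"single: {var_single:.3f}  ensemble of 10: {var_ens:.3f}")
```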


fyi I think you have bias and variance the wrong way around. Over-fitting indicates high variance


Thank you for catching that. Corrected.


> ensemble of transformer models

Isn't that just dropout?


No. Why do you think so?


Geoffrey Hinton describes dropout that way. It's like you're training different nets each time dropout changes.


Dropout is different from ensembles. It is a regularization method.

It might look like an ensemble because you’re selecting different subsets but ensembles combine different independent models rather than just subset models.
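For reference, a minimal sketch of standard (inverted) dropout in plain Python. During training each forward pass samples a fresh random mask, so the network is effectively trained as a different thinned subnetwork each time, which is the sense in which Hinton compares it to an ensemble; at inference the single full network is used. Whether that counts as an ensemble is what this subthread is debating; the code only shows the mechanics.

```python
import random

def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p during training.

    Survivors are scaled by 1 / (1 - p) so the expected activation is
    unchanged; at inference nothing is dropped.
    """
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)
x = [1.0, 2.0, 3.0, 4.0]
# Each call samples a new mask, i.e. a different thinned subnetwork.
print(dropout(x, 0.5, rng))
print(dropout(x, 0.5, rng))
print(dropout(x, 0.5, rng, training=False))  # full network: [1.0, 2.0, 3.0, 4.0]
```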


That said random forests are an internal ensemble, so I guess that could work.

In my mind an ensemble is like a committee. For it to be effective, each member should be independent (able to pick up different signals) and have a greater than random chance of being correct.
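The committee intuition is essentially Condorcet's jury theorem. A small sketch, assuming members vote independently (which real models rarely do): if each member is right with probability p > 0.5, majority-vote accuracy rises with committee size; if p < 0.5, it falls.

```python
from math import comb

def majority_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent voters is correct,
    when each voter is independently correct with probability p."""
    if n % 2 == 0:
        raise ValueError("use an odd committee size to avoid ties")
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

for n in (1, 5, 11, 51):
    print(n, round(majority_accuracy(0.6, n), 3))
```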


I am aware it is not literally an ensemble model, but Geoffrey Hinton says it achieves the same thing conceptually and practically.


Are these models high risk because of their lack of interpretability? Specialized models like temporal fusion transformers attempt to solve this, but in practice I'm seeing folks torn apart when defending transformers before model risk committees within organizations that are mature enough to have them.


Interpretability is just one pillar to satisfy in AI governance. You have to build submodels to assist with interpreting black-box main prediction models.


Is there a way to directly train transformer models to output embeddings that could help tree-based models downstream? For tabular data, tree-based models seem to be the best, but I feel like foundational models could help them in some way.


As a practitioner, I've found the most impactful library for time series to be brms, which basically gives you syntactic sugar for creating statistical models in Stan. It checks all the boxes, including probabilistic forecasts and multiple families for the likelihood (Wiener, gamma, Gaussian, Student t, binomial, zero-inflated and hurdle models). It also has autoregressive and ordinal predictors, and you actually learn something from your data.

I find a lot of these ML and DL libraries harder to troubleshoot beyond blind hyperparameter tuning, whereas with stats I can tweak the model, modify the likelihood, etc. There are also a lot of high-value problems that have few data points, and these libraries tend to want at least daily data.


Could you expand on what you mean by "practitioner?"

Also a follow-up question. With TimeGPT and Chronos advertised as "foundational time series models", do you think they have any value?


I guess I just mean I’m a data scientist—someone who uses models like these in practice as opposed to someone who develops them.

I’m not sure what to even make of a term like “foundational time series”. Does that just mean it’s widely used and known? You have to earn a role like that; you can’t just declare yourself one.


Maybe I'm missing something obvious, but what is the idea behind quantizing and tokenizing time series? We tokenize text because text isn't numbers. In the case of time series, we're... turning numbers into less precise numbers? The benefit of scaling and centering is trivial, and I guess all time series ML does it, but I don't see why we need a token after that.
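For concreteness, a sketch of what scale-then-quantize tokenization can look like, roughly along the lines of the Chronos paper's mean scaling followed by uniform binning. The bin count and the [-15, 15] range below are illustrative choices, not a claim about the paper's exact settings. The token ids are what let the model predict a categorical distribution with the same cross-entropy machinery used for text.

```python
def mean_scale(context):
    """Divide by the mean absolute value of the context window."""
    scale = sum(abs(v) for v in context) / len(context)
    if scale == 0.0:
        scale = 1.0  # all-zero context: leave values unchanged
    return [v / scale for v in context], scale

def tokenize(values, low=-15.0, high=15.0, n_bins=100):
    """Map each scaled value to the index of a uniform bin in [low, high];
    out-of-range values are clipped into the first or last bin."""
    width = (high - low) / n_bins
    return [min(max(int((v - low) // width), 0), n_bins - 1) for v in values]

def detokenize(tokens, scale, low=-15.0, high=15.0, n_bins=100):
    """Map token ids back to bin centers and undo the scaling."""
    width = (high - low) / n_bins
    return [(low + (t + 0.5) * width) * scale for t in tokens]

series = [10.0, 12.0, 11.0, 13.0, 40.0]
scaled, scale = mean_scale(series)
tokens = tokenize(scaled)
print(tokens)
print(detokenize(tokens, scale))  # "less precise numbers": bin centers
```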


I'm building upon insights from this paper (https://arxiv.org/pdf/2403.03950.pdf) and believe that classification can sometimes outperform regression, even when dealing with continuous output values. This is particularly true in scenarios where the output is noisy and may take various values (multimodal). By treating the problem as classification over discrete bins, we can obtain an approximate distribution over these bins, rather than settling for a single, averaged value as regression would yield. This approach not only facilitates sampling but may also lead to more favorable loss landscapes. The linked paper provides more details on this idea.
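A toy illustration of the "single, averaged value" problem, using a made-up bimodal distribution over five bins: a regressor trained with squared error targets the conditional mean, which here lands on a value the distribution says never occurs, while a distribution over bins keeps both modes and can be sampled from.

```python
import random

# Hypothetical predicted distribution over 5 bins: most of the mass sits
# on two well-separated outcomes (e.g. "big drop" vs "big jump").
centers = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [0.45, 0.05, 0.0, 0.05, 0.45]

# Squared-error regression converges to the conditional mean...
mean = sum(c * p for c, p in zip(centers, probs))
print(f"mean = {mean:.3f}")  # falls in the bin with zero probability

# ...whereas sampling from the categorical distribution keeps both modes.
rng = random.Random(0)
samples = rng.choices(centers, weights=probs, k=1000)
print({c: samples.count(c) for c in centers})
```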


Isn't it a given that classification would "outperform" regression, assuming n_classes < n_possible_continuous_labels? Turning a regression problem into a classification problem bins the data, offers more examples per label, simplifying the problem, with a tradeoff in what granularity you can predict.

(It depends on what you mean by "outperform" since metrics for classification and regression aren't always comparable, but I think I'm following the meaning of your comment overall)


Tokenisation turns a continuous signal into a normalized discrete vocabulary: stock "went up a lot", "went up a little", "stayed flat". This smooths out noise and simplifies matching up similar but not identical signals.
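A minimal sketch of that kind of coarse vocabulary, built from relative changes between consecutive prices. The thresholds (0.5% and 2%) and token names are arbitrary assumptions for illustration; Chronos itself bins scaled values rather than changes.

```python
def verbal_tokens(prices, flat=0.005, big=0.02):
    """Turn a price series into coarse 'story' tokens based on relative change.

    |change| < flat -> "flat"; < big -> "up/down a little"; else "up/down a lot".
    """
    tokens = []
    for prev, cur in zip(prices, prices[1:]):
        change = (cur - prev) / prev
        if abs(change) < flat:
            tokens.append("flat")
        elif abs(change) < big:
            tokens.append("up a little" if change > 0 else "down a little")
        else:
            tokens.append("up a lot" if change > 0 else "down a lot")
    return tokens

print(verbal_tokens([100.0, 103.0, 103.1, 102.0, 99.0]))
# ['up a lot', 'flat', 'down a little', 'down a lot']
```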

> We tokenize text because text isn't numbers.

Text is actually numbers. People tried inputting UTF8 directly into transformers, but it doesn't work that well. Karpathy explains why:

https://www.youtube.com/watch?v=zduSFxRajkE
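Byte-pair encoding (BPE), which GPT-style tokenizers use, starts from raw UTF-8 bytes and repeatedly merges the most frequent adjacent pair into a new token id, so sequences get shorter than raw bytes. A minimal sketch of a single merge step on a toy string (not tied to any particular tokenizer library):

```python
from collections import Counter

def most_frequent_pair(ids):
    """Most common adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace non-overlapping occurrences of `pair`, left to right, with new_id."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # raw bytes, values 0..255
pair = most_frequent_pair(ids)              # (97, 97), i.e. "aa"
merged = merge(ids, pair, 256)              # first id beyond the byte range
print(len(ids), "->", len(merged))          # 11 -> 9
```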


> Text is actually numbers

Text can be represented by numbers but they aren't the same datatype. They don't support the same operations (addition, subtraction, multiplication, etc).


Interesting. Can you explain how this is superior and/or different from traditional DSP filters or other non-tokenization tricks in the signal processing field?


Traditional DSP filters still output a continuous signal. And it's a well-explored domain; it's hard to imagine any low-hanging fruit there.

My intuition is the following: transformers work really well for text, so we could try turning a time series into a "story" (limited vocabulary) and see what happens.


Like this or something different?

https://github.com/gzerveas/mvts_transformer


I think it could also have a connection with symbolic AI: the discrete tokens could be the symbols that many believe are useful or necessary for reasoning. It is also useful for compression, reducing memory requirements through quantization to small integer representations.

https://en.wikipedia.org/wiki/Neuro-symbolic_AI


My primitive understanding is that we approximate a Markovian approach and indirectly model the transition probabilities just by working through tokens.
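The explicit version of that Markov framing is just counting token-to-token transitions and normalizing. A sketch follows; note that a transformer conditions on the whole context window rather than only the previous token, so this is the first-order special case, not what Chronos does internally.

```python
from collections import Counter, defaultdict

def transition_probs(tokens):
    """Estimate first-order transition probabilities P(next | current)
    by counting adjacent token pairs."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }

print(transition_probs([0, 1, 0, 1, 1]))  # {0: {1: 1.0}, 1: {0: 0.5, 1: 0.5}}
```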


My guess is that it enforces a kind of sparsity constraint.


Chronos is probably overkill for what I am looking to do with time series data. I just did an Ask HN on time series[0] but unfortunately didn't get the replies I was hoping for. Maybe this thread can get the bump I need:

I inherited a large time series JSON dataset in 2024. I've been successful in using the Observable Framework[1] by writing a Rust (rust-script) data loader[2] to parse and plot simple line charts[3] to visually see the data. There are hundreds of graphs over years of data so I would like to identify what graphs I should be paying attention to. My initial thought is to calculate metrics on each graph such as:

  - Variability: how "spread out" are the data points from one another?
  - Trend: direction of data path, up or down?
  - Slope: are the data points increasing or decreasing?
  - Level: where are the data points on the vertical axis?

What libraries, AI, databases, etc. would you recommend that would allow me to calculate these values? I am no data scientist and don't need forecasting; overall, I just want a dashboard that shows the most "important" graphs.

[1] https://observablehq.com/framework/

[2] https://observablehq.com/framework/loaders

[3] https://observablehq.com/@observablehq/plot-simple-line-char...

edit: the x-axis is Time while the y-axis can be values such as duration, frequency, intervals

[0] https://news.ycombinator.com/item?id=39763246


I have always worked in R for time series analysis. This cookbook has everything you would need to plan a time series analysis [0], and this book provides a strong base and understanding while focusing on forecasting [1]. Have fun!

[0] https://rc2e.com/timeseriesanalysis

[1] https://otexts.com/fpp2/



Agree, great resource.


When you ask what data you should be paying attention to, that depends on your objective. Do you want to predict something? Identify anomalies? In the end, what matters is understanding the meaning and relations of these data, rather than throwing them into some ML framework and hoping to get something out.


Prediction and anomaly detection are not objectives, but of the 4 listed, I would say the primary objective is identifying a trend in the data to know whether it is moving in a specific direction: increasing or decreasing in value.

I already added linear regression marks that draw regression lines with confidence bands [1] to my Observable plots, but they do not give me a “value”, so I need to look at the graphs manually and read the red line.

[1] https://observablehq.com/plot/marks/linear-regression


Doesn't look like you need anything fancy here.

Load your time series into a dataframe, and:

> - Variability: how "spread out" are the data points from one another?

So basically df.std(), with rolling variants for short term / long term.

> - Trend: direction of data path, up or down? - Slope: are the data points increasing or decreasing?

Just do a simple rolling linear regression of your data point against time.
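A minimal pandas sketch of that suggestion, assuming one evenly sampled series; the window length is the knob for short-term vs long-term views, and the slope is measured per sample rather than per unit of wall-clock time.

```python
import numpy as np
import pandas as pd

def rolling_metrics(values: pd.Series, window: int) -> pd.DataFrame:
    """Rolling level, variability and slope over a fixed-size window."""
    x = np.arange(window, dtype=float)  # sample positions inside the window

    def slope(y: np.ndarray) -> float:
        # Degree-1 least-squares fit; polyfit returns [slope, intercept].
        return np.polyfit(x, y, 1)[0]

    roll = values.rolling(window)  # only full windows produce values
    return pd.DataFrame({
        "level": roll.mean(),
        "variability": roll.std(),
        "slope": roll.apply(slope, raw=True),
    })

s = pd.Series([1.0, 3.0, 5.0, 7.0, 6.0, 5.0, 4.0])
print(rolling_metrics(s, window=3))
```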


Doesn't cite TimesFM for some reason. Maybe the latter was published after this paper went camera-ready? https://blog.research.google/2024/02/a-decoder-only-foundati...


Because these approaches are likely derived from papers published 3-5 years ago. At this point neither TimesFM nor Chronos is particularly novel. I've had similar models in production for complex time series for 18 months now.


Coming from finance, I always wonder how, and if, these large pre-trained models are usable on financial time series. I see the appeal of pre-trained models in areas where there is clearly a stationary pattern, even if it's very hidden (e.g. industrial or biological metrics). But given the inherently low signal-to-noise ratio and how extremely non-stationary or chaotic financial data processes tend to be, I struggle to see the use of pre-trained foundation models.


Stock prices change continuously based on the current price and future events that have not happened. I don't think they are at all predictable.



I played around with the TimeGPT beta, predicting the S&P 500 index performance for the next day (not a multivariate time series, as I couldn't figure out how to get that set up), and trying to use the confidence intervals it generated to buy options was useless at best.

I can see Chronos working a bit better, as it tries to convert trends and pieces of time series into tokens, like GPT does for phrases.

E.g. a stock goes down terribly, then has a dead cat bounce. This is common.

A stock goes up, hits resistance due to existing sell orders, and comes down.

A stock is on a stable upward trend and continues that trend.

If I can verbalize these usual actions, it's likely Chronos can also pick up on them.

Once again, quality of data trumps all for LLMs, so performance might vary. The paper points out a few situations where the model is unable to learn a trend, e.g. when the prompting time series isn't long enough.


Imitation learning of discretionary traders who rely on a mixture of rules and intuition.


We’re using HTMs for time series in our quant algorithms and they’re performing pretty well; it’s a shame they’re mostly ignored by ML scientists.


Oh interesting, Jeff Hawkins's HTM?


are you hiring


Amazon's older time series forecasting system, DeepAR, has supported external regressors since 2018 [1]. In this new Chronos paper, I didn't find any mention of external regressors.

[1] https://aws.amazon.com/blogs/machine-learning/amazon-sagemak...


They do mention covariates in Section 6.1, specifically that this method doesn’t support them, along with ideas for how it could in the future, such as stacking:

> In this work, we have focused on univariate time series forecasting since it constitutes the most common of real-world time series use-cases. Nevertheless, practical forecasting tasks often involve additional information that must be taken into account. One example involves covariates, that can be either time-independent (e.g., color of the product) or time-varying (e.g., on which days the product is on sale). Another closely related problem is multivariate forecasting, where historic values of one time series (e.g., interest rates) can influence the forecast for another time series (e.g., housing prices). The number of covariates or multivariate dimensions can vary greatly across tasks, which makes it challenging to train a single model that can handle all possible combinations. A possible solution may involve training task-specific adaptors that inject the covariates into the pretrained forecasting model (Rahman et al., 2020). As another option, we can build stacking ensembles (Ting & Witten, 1997) of Chronos and other light-weight models that excel at handling covariates such as LightGBM (Ke et al., 2017).


Ah. Thank you. The same concept goes under different names, so one needs to search for all of "exogenous variables", "external regressors", "external factors" and "covariates".


It may not be known yet, and this project seems to be targeted at Gaussian distributions, but wouldn't the simplicity bias reduce sensitivity? I mean, attention in transformers works so well in part because out-of-distribution (OOD) data is typically close enough.

Probably just my own bias because it seems everything I deal with is at least MArP and anomalies are important to my use case.

I can see where this is useful for others, even Amazon suggests ARIMA or ETS if you don't have hundreds of related streams.

Is this more targeted at people who want more smoothing?

Or am I just missing something?
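For comparison with the ARIMA/ETS baselines mentioned above, the simplest ETS model (simple exponential smoothing: additive error, no trend, no seasonality) fits in a few lines. A sketch only: alpha is fixed by hand and the initial level is just the first observation, whereas forecasting libraries estimate both from the data.

```python
def simple_exponential_smoothing(values, alpha):
    """Level update: level = alpha * y + (1 - alpha) * level.

    Returns the final level, which is the flat forecast for every future step.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # naive initialization
    for y in values[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level

history = [12.0, 13.0, 12.5, 14.0, 13.5]
print(simple_exponential_smoothing(history, alpha=0.3))
```

Smaller alpha means more smoothing, which is one concrete reading of "people who want more smoothing".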



It's great to see research in this field; I know there is opportunity here, and I hope to someday benefit from progress. But I skimmed the paper, and it doesn't appear to solve a problem that I have. From a practical standpoint, what I want from a time series tool includes:

1) a small set of simple levers that I can review and tune
2) short training time for input sets of size O(10k) to O(100k) (this covers seconds/day, minutes/week, hours/year)
3) training + forecasting that runs fine on CPUs, not GPUs, with low memory overhead
4) decent out-of-the-box performance that basically passes the sniff test
5) a simple way to include regressors

I've had enough experience to learn to be wary of fully automated tuning, benchmark performance metrics, elaborate models, etc.


What types of model are the algo-traders using these days?


Do you really think the profitable algo traders are going to tell you that :-)


Why not? Sharing information moves the field forward.


Profitable algorithmic traders are not in the business of moving the field forward. They're in the business of making profits.


What field? They aren't curing cancer; it serves 0 purpose to advance the "field".


You make money if you have useful data others don't have, or better algorithms that others aren't using.

When these become publicly known and used, your system doesn't work any more because the prices now include whatever signal you had for yourself before.


It's a bit more subtle than that, because there are feedback loops in the system. When a signal or factor spreads, it does so at multiple time horizons.

e.g. If I have a good signal at predicting horizon 1 day, then it is in my interest to have many people trading it at horizon > 1 day, as they will push the price in my direction.


ARIMA still wins


I doubt the differences in performance between all the "neural" models are statistically significant. It strikes me as odd that a model like TFT can be the worst of the "neural" models in one benchmark and at the same time the best in another. Also, what is the point of Benchmark I? "It comprises 15 datasets that were also part of the training data of Chronos models." That is not forecasting. That is just remembering/overfitting these time series.
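One way to check that doubt, sketched under the assumption that per-dataset scores are available for both models: a paired two-sided sign test across datasets, which ignores effect sizes but makes no distributional assumptions. The error values in the usage example are invented placeholders, not numbers from the paper.

```python
from math import comb

def sign_test(errors_a, errors_b):
    """Two-sided exact sign test on paired per-dataset errors (lower is better).

    Ties are dropped; the p-value asks how surprising the win/loss split
    would be if each model were equally likely to win on any dataset."""
    wins = sum(a < b for a, b in zip(errors_a, errors_b))
    losses = sum(a > b for a, b in zip(errors_a, errors_b))
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Invented per-dataset errors for two models, purely for illustration.
errors_a = [0.82, 0.91, 0.77, 1.05, 0.88, 0.93, 0.79, 0.85]
errors_b = [0.85, 0.89, 0.80, 1.02, 0.90, 0.95, 0.78, 0.87]
print(sign_test(errors_a, errors_b))
```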



