It’s less about proof and more about demonstrating a new capability that TSLMs enable. To be fair, the paper did test standard LLMs, which consistently underperformed.
@iLoveOncall, can you point to examples where out of the box models achieved good results on multiple time-series? Also, what kind of time-series data did you analyze with Claude 3.5? What exactly did you predict, and how did you assess reasoning capabilities?
System prompts / review criteria cannot be "leaked" because they are open-source (full transparency). Focusing heavily on monetization at this stage seems shortsighted...this tool is a small (but longterm important) step of a larger plan.
As mentioned above, there is an open-source version for those who want full control. The free cloud version is mainly for convenience and faster iteration. We don’t store manuscript files longer than necessary to generate feedback (https://www.rigorous.company/privacy), and we have no intention of using manuscripts for anything beyond testing the AI reviewer.