Just about anything useful in the secret sauce data can be distilled from the model by inspecting its logits; for example, they published distills using Llama 3.1 70B and Qwen 32B as base models, among others.
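For the curious, here's roughly what logit-based distillation looks like in practice: a minimal sketch of the standard soft-label KL loss (not DeepSeek's specific recipe; the function and its arguments are illustrative).

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard knowledge-distillation loss: KL divergence between the
    teacher's and student's softened distributions over the vocabulary."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # batchmean KL, scaled by t^2 to keep gradient magnitudes comparable
    # across temperatures
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)
```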
There is no "secret" sauce. Only sauce.
Additionally, R1-Zero shows that you don't even really need much secret sauce data, since it was trained with zero SFT data. Take an existing base model, run GRPO-based RL on it, and tada: you have a SOTA reasoning model. SFT data improves it, but the secret sauce isn't in the data.
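The core trick in GRPO is computing advantages relative to a group of sampled completions instead of using a learned value function. A minimal sketch of that step (variable names are mine, not from the paper):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages as in GRPO.

    rewards: (num_prompts, group_size) scalar rewards for a group of
    completions sampled per prompt, e.g. 1.0 if the final answer is
    verifiably correct else 0.0.

    Each completion's advantage is its reward normalized against the
    rest of its group -- no critic network needed.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)
```

Each completion's tokens are then trained with a PPO-style clipped policy-gradient objective weighted by that advantage, plus a KL penalty toward the reference model.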
I wouldn't hold my breath on getting access to it.