Just about anything useful in the secret sauce data can be distilled from the model by inspecting its logits; for example, they published distills using Llama 3.1 70B and Qwen 32B as base models, among others.
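For the curious, here's roughly what logit-based distillation looks like in practice: a minimal sketch of the standard soft-label KL loss (not DeepSeek's specific recipe; the function and its arguments are illustrative).

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard knowledge-distillation loss: KL divergence between the
    teacher's and student's softened distributions over the vocabulary."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # batchmean KL, scaled by t^2 to keep gradient magnitudes comparable
    # across temperatures
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)
```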
There is no "secret" sauce. Only sauce.
Additionally, R1-Zero shows that you don't even really need much secret sauce data, since it was trained with zero SFT data. Take an existing base model, run GRPO-based RL on it, and tada: you have a SOTA reasoning model. SFT data improves it, but the secret sauce isn't in the data.
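The core trick in GRPO is computing advantages relative to a group of sampled completions instead of using a learned value function. A minimal sketch of that step (variable names are mine, not from the paper):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages as in GRPO.

    rewards: (num_prompts, group_size) scalar rewards for a group of
    completions sampled per prompt, e.g. 1.0 if the final answer is
    verifiably correct else 0.0.

    Each completion's advantage is its reward normalized against the
    rest of its group -- no critic network needed.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)
```

Each completion's tokens are then trained with a PPO-style clipped policy-gradient objective weighted by that advantage, plus a KL penalty toward the reference model.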
I wouldn't hold my breath on getting access to it.