R1-Zero is trained differently from most reasoning models, such as the "normal" R1 model, in terms of which steps are done in training. Most notably, it skips the supervised fine-tuning stage and applies reinforcement learning directly to the base model. TinyZero applies the same approach (though only on a subset of use cases) to a much smaller model, showing that it works at that scale as well.

The details of how it's trained differently start to get into "machine learning expert" territory, but a casual read through the DeepSeek link gives a decent high-level overview if you want to dive deeper.