From an open-source point of view, it would be better if scraping proprietary LLMs were allowed. Small LMs need this infusion of data to develop.
But the big news is that it works: just a bit of data can have a large impact on open-source LLMs. OpenAI can't have a moat in its proprietary RLHF dataset. Public models leak; they can be distilled.