Let me preface this by saying I generally agree with what you're saying. My focus below is on the delta.
> It is true that this model requires a large amount of unlabeled data, in addition to a small amount of labeled data, but gathering unlabeled data is often easy.
Gathering unlabeled data is easier than gathering labeled data, but it can still be challenging: assembling a representative distribution of examples often requires careful thought. Being able to skip that step yields significant savings, even if not as much as the savings from going from labeled to largely unlabeled data.
> I'd say this approach actually has a serious advantage compared to GPT3, which is locked inside the walls of OpenAI, can only be used with their permission, and is too big for most people to use (let alone train) anyway.
I fully agree and said as much too.
> The cost and effort to use this approach on large real world problems is probably less than using GPT3.
I'd say it depends. Most of the effort with GPT3 will involve edge cases. Having a system in front to handle these might eat into the labor savings, but you could still end up net positive. It's difficult to say without real-world data; you might be correct.
> That is, it may be a more efficient way to train new models from scratch for tasks for which few labeled examples are available.
You're right in general, I think. But it's still worth pointing out GPT3's advantage: it combines a lot of general capabilities, which, together with its generative ability and flexible input format, means the level of expertise required to get something useful will be much lower than with this semi-supervised learning approach. And it has displayed some capabilities, one of many examples being discussing, querying, and pattern-matching on computer code, that seem hard to replicate with this method.
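For concreteness, here's a minimal sketch of the kind of setup we're both describing: a small labeled set plus a large unlabeled pool. I'm using scikit-learn's self-training wrapper as a stand-in, not the specific method from the article, and the toy dataset and classifier are purely illustrative.

```python
# Semi-supervised sketch: 50 labeled examples, the rest unlabeled
# (marked with -1, scikit-learn's convention), pseudo-labeled
# iteratively via self-training. Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=2000, random_state=0)

# Pretend only 50 examples are labeled; mask the rest with -1.
rng = np.random.default_rng(0)
labeled = rng.choice(len(y), size=50, replace=False)
y_partial = np.full_like(y, -1)
y_partial[labeled] = y[labeled]

# Fit on the labeled points, pseudo-label confident unlabeled
# points, and refit until no more confident predictions remain.
clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, y_partial)
print(clf.score(X, y))
```

The point being: the modeling code is simple either way; the cost lives in assembling `X` so that the unlabeled pool actually covers the distribution you care about.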