The original version of the test allows for interaction, and I think that’s probably a good thing. Language models currently have a hard time staying consistent/on topic and that’s a potentially valuable tell.
Instead, I think you don’t want any third parties “editing” or “curating” the exchange (beyond whatever blinding is needed to make it work).
Yeah I think the interaction is a key part. People are confusing AGI with useful AI. DALL-E is very clearly useful even if 90% of the images it produced were unusable. The time save is still large and cost reduced for simple tasks. Same with language models. They may have a hard time staying consistent over long sequences but adding human selective pressure to responses saves everyone time and money. But this is very clearly different from a machine that can think and form new ideas on its own. We're still a long way from that.
Instead, I think you don’t want any third parties “editing” or “curating” the exchange (beyond whatever blinding is needed to make it work).