Although no human is going to read robots.txt or bots.txt, it'll end up as a small section in the website's EULA, which nobody reads either:
> Before you click the 'reply' button, please be aware that we are allowing AI to crawl our comment section for training. Thank you for your consideration.
There's a little problem, though:
1) Websites have no incentive to inform their users about this, nor to allow AI to crawl their content unless they get something in return (e.g. payment). From this PoV, it's time for OpenAI to start paying.
2) The competition (China, Russia) doesn't care about bots.txt or robots.txt and will just crawl whatever the hell it can.
I can see a bots.txt entry appearing in the near future that distinguishes the site's data usage for bots vs. humans.