Such logs would not be used for training the base model, but for fine-tuning it to follow instructions. Instruction tuning requires far less data than pre-training the foundation model: Stanford Alpaca got surprisingly strong results from fine-tuning Meta's LLaMA on just 52K ChatGPT-style instruction/response pairs (https://crfm.stanford.edu/2023/03/13/alpaca.html).
Well, the initial Twitter rant was pretty bombastic:
"The cat is finally out of the bag – Google relied heavily on @ShareGPT
's data when training Bard.
This was also why we took down ShareGPT's Explore page – which has over 112K shared conversations – last week.
Insanity."
Fine-tuning is not exactly the same as "relying heavily". I bet they got far more fine-tuning data by simply asking their ~100K employees to pre-beta test for a couple of months.