That does make me wonder how much of the training corpus came in the form of academic papers, given the size of the arXiv collection and the other available open preprint archives... it's probably not that much compared to the rest of the internet. I guess it's something only OpenAI can answer at the moment, and it will be interesting to see how other/future projects break down.
That's what I figured too, and it's VERY good at making connections for us in our field, connections we only recently figured out on our own. So we'll be using it a lot more as "jazz": a way to bounce off a bunch of weird, crazy, outlandish ideas without running any experiments or even wasting time discussing them. Most of these will be outlandish ideas we'd never even voice, and maybe one or two won't seem so outlandish to ChatGPT; those are the ones we might bring up in lab meeting and actually try. There are SO many different things to try, I think ChatGPT will really help us narrow down our scope.
I'm really excited for this new way of working!
With that said, I half expected it to know more about our individual papers. Somehow, it does not.
If you play with the text-to-image generators and get a feel for them (models that "compress" 5 billion images into a 16 GB model from which coherent pictures can be probabilistically generated), you can apply that same "feel" to the probabilistic language generations, and trust them about as much.
You're staring at a lovely image, decide to ignore the eight fingers on the left hand, and only five minutes later realize your hero has three legs.