Releasing v1 of GPT-JT, a fork of GPT-J-6B fine-tuned on 3.53B tokens (together.xyz)
158 points by b_mc2 on Nov 30, 2022 | 17 comments



If anyone's wondering, this model is not the GPT-3 killer. It's useful mostly for classification, not general text generation. And it's not an apples-to-apples comparison, since the other models were not fine-tuned on the same dataset.

Interesting that they didn't compare the model to Flan-T5 or TK-Instruct, both of which were fine-tuned on similar data and should show comparable results at a similar parameter count. See the leaderboard here: https://huggingface.co/spaces/ought/raft-leaderboard

Nonetheless, props for open-sourcing the model and attempting to develop new techniques for decentralized training of large-scale transformers; that's no easy feat.


Text summarization examples [1] are fun:

> Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as 'Jumbo'.

> Output: Not as Advertised

[1] https://huggingface.co/spaces/togethercomputer/GPT-JT


I don't think I quite understood this the first time I read it, but it looks like the content you quoted is part of the prompt.

"Great for toddlers" is the summarization actually provided by the model for "My toddler loves this game to a point where he asks for it. ...<several sentences omitted>... Please keep up the great work."

The prompt contains the instructions for how to perform the summarization, along with a couple of worked examples; "Not as Advertised" is one of those examples.
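
For the curious, here's a minimal sketch of that few-shot pattern: the prompt carries an instruction plus worked (input, output) pairs, and the model simply completes the last one. The instruction wording and the build_prompt helper below are my own illustration, not the Space's actual prompt.

    # Sketch of the few-shot pattern described above. The instruction text
    # and this helper are illustrative assumptions, not the Space's prompt.

    FEW_SHOT_EXAMPLES = [
        ("Product arrived labeled as Jumbo Salted Peanuts...the peanuts "
         "were actually small sized unsalted.",
         "Not as Advertised"),
        # ...the real prompt includes a couple more (input, summary) pairs
    ]

    def build_prompt(review: str) -> str:
        parts = ["Summarize the product review in a few words.\n"]
        for text, summary in FEW_SHOT_EXAMPLES:
            parts.append(f"Input: {text}\nOutput: {summary}\n")
        parts.append(f"Input: {review}\nOutput:")
        return "\n".join(parts)

    # The model's completion of the trailing "Output:" is the summary,
    # e.g. "Great for toddlers" for the toddler-game review quoted above.
    print(build_prompt("My toddler loves this game to a point where he asks for it."))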


Yes, sorry, I should have been more specific. I just find the prompt hilarious. Sounds like one of the "explain a movie badly in one sentence" memes.


What does this mean? Can I download the trained model and run it on my machines? Assuming I won't need a supercomputer to run it.


Perhaps first try it out at https://huggingface.co/spaces/togethercomputer/GPT-JT to see what kind of things you can do with it.


I'm flabbergasted. I translated the tweets to Hebrew and reran the example, and it returned the correct results. I then changed the input to a negative one, and it again returned the correct results. So it works in more than just English, even though I'm sure the Hebrew training data was much smaller. Perhaps it's translating behind the scenes.


Thanks! Perhaps I'm not good at prompt engineering, but I could barely get anything useful out of it.


It's mainly for text classification, which explains why it doesn't really give outputs comparable to GPT-3's.


Yes, you can download the trained model and run it on your machine. The article links to a Hugging Face model page where you can play with it in the web browser as a toy example, then download it locally and use it from code.
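
For anyone who wants to skip the browser step, here's a rough sketch of local use with the transformers library. The repo id and settings are assumptions based on the release name; check the model card for the real instructions.

    # Rough sketch, not official instructions. The repo id below is an
    # assumption based on the release name; verify it on the model card.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "togethercomputer/GPT-JT-6B-v1"  # assumed repo name

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # halves memory vs. fp32; needs a GPU
        device_map="auto",          # requires the `accelerate` package
    )

    inputs = tokenizer("Input: great product, fast shipping\nOutput:",
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=8)
    print(tokenizer.decode(out[0], skip_special_tokens=True))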


Another noob here: does this Hugging Face model expose an API? I have a light classification use case I might want to try it out on, but running it on my machine (or a beefy cloud machine) seems like overkill.


All Spaces do[0], but please don't abuse it: it's just for demo purposes. If you hammer it, it will go down for everyone, and they might not bring it back up.

It can be run locally on a GPU with ~16 GB of VRAM; at a lower precision you might be able to run it on GPUs with half that.

[0]: https://huggingface.co/spaces/togethercomputer/GPT-JT/blob/m...
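
For reference, Gradio-backed Spaces generally accept JSON POSTs; here's a hedged sketch. The exact route and payload shape are assumptions on my part, so check the app file linked above before relying on it.

    # Hedged sketch of calling the Space over HTTP. Gradio apps usually
    # expose a JSON endpoint like /api/predict, but the exact URL and the
    # shape of the "data" list depend on this Space's configuration.
    import requests

    SPACE_URL = "https://togethercomputer-gpt-jt.hf.space/api/predict"  # assumed

    resp = requests.post(SPACE_URL, json={"data": ["Input: ...\nOutput:"]})
    resp.raise_for_status()
    print(resp.json()["data"])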


There are also a number of commercial services that offer GPT-J APIs (and surely in a couple days GPT-JT APIs) on a pay-per-token or pay-per-compute-second basis. For light use cases those can be extremely affordable.


Can't one do inference on a CPU?


thank you!


What's the best chatty model to run locally on an RTX 3090? This seems cool, but it's a bit hard to get it to talk.


Has anyone tried running it on an M1 MBP? How is the performance?



