Releasing v1 of GPT-JT, a fork of GPT-J-6B fine-tuned on 3.53B tokens (together.xyz)
158 points by b_mc2 on Nov 30, 2022 | 17 comments



If anyone's wondering, this model is not the GPT-3 killer. It's useful mostly for classification, not general text generation. And it's not an apples-to-apples comparison, since the other models were not fine-tuned on the same dataset.

Interesting that they didn't compare the model to Flan-T5 or TK-Instruct, both of which were fine-tuned on similar data and should show comparable results at a similar parameter count. See the leaderboard here: https://huggingface.co/spaces/ought/raft-leaderboard

Nonetheless, props for open-sourcing the model and attempting to develop new techniques for decentralized training of large-scale transformers; that's no easy feat.


Text summarization examples [1] are fun:

> Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as 'Jumbo'.

> Output: Not as Advertised

[1] https://huggingface.co/spaces/togethercomputer/GPT-JT


I don't think I quite understood this the first time I read it, but it looks like the content you quoted is part of the prompt.

"Great for toddlers" is the summarization actually provided by the model for "My toddler loves this game to a point where he asks for it. ...<several sentences omitted>... Please keep up the great work."

The prompt contains the instructions for how to perform the summarization, along with a couple of worked examples; "Not as Advertised" is one of those examples.
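
For the curious, here's a minimal sketch of that few-shot pattern: the prompt carries an instruction plus worked (input, output) pairs, and the model simply completes the last one. The instruction wording and the build_prompt helper below are my own illustration, not the Space's actual prompt.

    # Sketch of the few-shot pattern described above. The instruction text
    # and this helper are illustrative assumptions, not the Space's prompt.

    FEW_SHOT_EXAMPLES = [
        ("Product arrived labeled as Jumbo Salted Peanuts...the peanuts "
         "were actually small sized unsalted.",
         "Not as Advertised"),
        # ...the real prompt includes a couple more (input, summary) pairs
    ]

    def build_prompt(review: str) -> str:
        parts = ["Summarize the product review in a few words.\n"]
        for text, summary in FEW_SHOT_EXAMPLES:
            parts.append(f"Input: {text}\nOutput: {summary}\n")
        parts.append(f"Input: {review}\nOutput:")
        return "\n".join(parts)

    # The model's completion of the trailing "Output:" is the summary,
    # e.g. "Great for toddlers" for the toddler-game review quoted above.
    print(build_prompt("My toddler loves this game to a point where he asks for it."))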


Yes, sorry, I should have been more specific. I just find the prompt hilarious. Sounds like one of the "explain a movie badly in one sentence" memes.


What does this mean? Can I download the trained model and run it on my machines? Assuming I won't need a supercomputer to run it.


Perhaps first try it out at https://huggingface.co/spaces/togethercomputer/GPT-JT to see what kind of things you can do with it.


I'm flabbergasted. I translated the tweets to Hebrew and reran the example, and it returned the correct results. I then changed the input to a negative one, and it again returned the correct results. So it works in more than just English, even though I'm sure the Hebrew training data was much smaller. Perhaps it's translating behind the scenes.


Thanks! Perhaps I'm not good at prompt engineering, but I could barely get anything useful out of it.


It's mainly for text classification, which explains why it doesn't really give outputs comparable to GPT-3's.


Yes, you can download the trained model and run it on your machine. The article links to a Hugging Face model page where you can play with it in the web browser as a toy example, then download it locally and use it from code.
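
For anyone who wants to skip the browser step, here's a rough sketch of local use with the transformers library. The repo id and settings are assumptions based on the release name; check the model card for the real instructions.

    # Rough sketch, not official instructions. The repo id below is an
    # assumption based on the release name; verify it on the model card.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "togethercomputer/GPT-JT-6B-v1"  # assumed repo name

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # halves memory vs. fp32; needs a GPU
        device_map="auto",          # requires the `accelerate` package
    )

    inputs = tokenizer("Input: great product, fast shipping\nOutput:",
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=8)
    print(tokenizer.decode(out[0], skip_special_tokens=True))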


Another noob here: does this Hugging Face model expose an API? I have a light classification use case I might want to try it out on, but running it on my machine (or a beefy cloud machine) seems like overkill.


All Spaces do[0], but please don't abuse it: it's just for demo purposes. If you hammer it, it will go down for everyone, and they might not bring it back up.

It can be run locally on a GPU with ~16 GB of VRAM; at a lower precision you might be able to run it on GPUs with half that.

[0]: https://huggingface.co/spaces/togethercomputer/GPT-JT/blob/m...
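
For reference, Gradio-backed Spaces generally accept JSON POSTs; here's a hedged sketch. The exact route and payload shape are assumptions on my part, so check the app file linked above before relying on it.

    # Hedged sketch of calling the Space over HTTP. Gradio apps usually
    # expose a JSON endpoint like /api/predict, but the exact URL and the
    # shape of the "data" list depend on this Space's configuration.
    import requests

    SPACE_URL = "https://togethercomputer-gpt-jt.hf.space/api/predict"  # assumed

    resp = requests.post(SPACE_URL, json={"data": ["Input: ...\nOutput:"]})
    resp.raise_for_status()
    print(resp.json()["data"])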


There are also a number of commercial services that offer GPT-J APIs (and surely in a couple days GPT-JT APIs) on a pay-per-token or pay-per-compute-second basis. For light use cases those can be extremely affordable.


Can't one do inference on a CPU?


thank you!


What's the best chatty model to run locally on an RTX 3090? This seems cool, but it's a bit hard to get it to talk.


Has anyone tried running it on an M1 MBP? How is the performance?



