If anyone's wondering, this model is not the GPT-3 killer. It's mostly useful for classification rather than general text generation. It's also not an apples-to-apples comparison, since the other models were not fine-tuned on the same dataset.
Interesting that they didn't compare the model to Flan-T5 or TK-Instruct, both of which were fine-tuned on similar data and should display comparable results with the same number of parameters. See the leaderboard here: https://huggingface.co/spaces/ought/raft-leaderboard
Nonetheless, props for open-sourcing the model and attempting to develop new techniques for decentralized training of large-scale transformers; this is no easy feat.
> Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as 'Jumbo'.
I don't think I quite understood this the first time I read it, but it looks like the content you quoted is part of the prompt.
"Great for toddlers" is the summarization actually provided by the model for "My toddler loves this game to a point where he asks for it. ...<several sentences omitted>... Please keep up the great work."
The prompt contains the instructions on how to execute the summarization with a couple of examples. "Not as Advertised" is one of the examples.
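Roughly, a few-shot prompt like that can be built as below. This is my own reconstruction of the structure being described, not the exact prompt from the article; the instruction wording is made up:

    # Sketch of a few-shot classification/summarization prompt.
    # The instruction text and formatting are assumptions, not the article's exact prompt.
    examples = [
        ("Product arrived labeled as Jumbo Salted Peanuts...the peanuts were "
         "actually small sized unsalted.",
         "Not as Advertised"),
    ]
    query = ("My toddler loves this game to a point where he asks for it. ... "
             "Please keep up the great work.")

    prompt = "Give each product review a short summary title.\n\n"
    for review, title in examples:
        prompt += f"Input: {review}\nOutput: {title}\n\n"
    prompt += f"Input: {query}\nOutput:"

    print(prompt)

The model then completes the final "Output:" line, which is where "Great for toddlers" comes from.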
I'm flabbergasted. I translated the tweets to Hebrew and reran the example - it returned the correct results. I then changed the input to a negative, and it again returned the correct results. So it's not only in English, and I'm sure that the Hebrew dataset was much smaller. Perhaps it is translating behind the scenes.
Yes, you can download the trained model and run it on your machine. The article has a link to a Hugging Face model page where you can play with it in the web browser as a toy example, then download it locally and use it from code.
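The "use it from code" path looks roughly like this with the transformers library. The model id "togethercomputer/GPT-JT-6B-v1" is my assumption based on the HF page; double-check it there:

    # Minimal sketch: download the model and run it locally with transformers.
    # Model id is an assumption; the first call downloads ~12 GB of weights.
    from transformers import pipeline

    generator = pipeline("text-generation", model="togethercomputer/GPT-JT-6B-v1")
    out = generator("Input: Great product, arrived on time.\nOutput:", max_new_tokens=8)
    print(out[0]["generated_text"])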
Another noob here - does this Hugging Face model expose an API? I have a light classification use case I might want to try it out on, but I think running it on my machine / a beefy cloud machine would be overkill.
All spaces do[0], but please don’t abuse it: it is just for demo purposes. If you hammer it, it will be down for everyone, and they might not bring it back up.
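For light use, an alternative to hammering the Space is calling the hosted Inference API directly. A minimal sketch with requests; the model id is my assumption and the token is a placeholder you'd replace with your own:

    # Sketch: call the Hugging Face hosted Inference API instead of running locally.
    # Model id and availability are assumptions; requires a Hugging Face access token.
    import requests

    API_URL = "https://api-inference.huggingface.co/models/togethercomputer/GPT-JT-6B-v1"
    headers = {"Authorization": "Bearer hf_xxx"}  # replace with your token

    resp = requests.post(API_URL, headers=headers,
                         json={"inputs": "Input: Great product, arrived on time.\nOutput:"})
    print(resp.json())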
It can be run locally on a GPU with ~16 GB of VRAM; you might be able to load it at lower precision to run it on GPUs with half that memory.
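A minimal sketch of the reduced-precision loading, assuming the same model id as above (fp16 shown, int8 commented out; the int8 path needs the bitsandbytes package, and device_map="auto" needs accelerate):

    # Sketch: load the model at reduced precision to cut VRAM use.
    # Model id is an assumption; adjust to whatever the article links to.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "togethercomputer/GPT-JT-6B-v1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # fp16: roughly halves memory versus fp32
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto")

    # or int8 (smaller still, with some quality/speed tradeoff):
    # model = AutoModelForCausalLM.from_pretrained(
    #     model_id, load_in_8bit=True, device_map="auto")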
There are also a number of commercial services that offer GPT-J APIs (and surely in a couple days GPT-JT APIs) on a pay-per-token or pay-per-compute-second basis. For light use cases those can be extremely affordable.