
It is incapable of doing any arithmetic, e.g. when asked: 9 - 4 =

  Answer

  There are a few other ways to make this easier.

  1. Keep the remainder as an argument.

  You can do that by rewriting your divmod() function like   this:

  def divmod(x, y):
    return x, (y % x)
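
(For reference, and purely as a contrast with the answer above: Python's real divmod(x, y) returns the quotient and the remainder, so a correct rewrite would look roughly like this sketch:)

  def my_divmod(x, y):
      # Same behaviour as the built-in divmod(x, y): quotient first, then remainder.
      return x // y, x % y

  assert my_divmod(9, 4) == (2, 1)
  assert divmod(9, 4) == (2, 1)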



I asked a more verbose version of the same question, and it started with a similar answer but added this:

[Edit]

In the comments, someone pointed out there were actually three answers - one was 5; the other two being 1 and 2. Because these numbers work out at the same value when they are multiplied by 6, I have changed my answer to include all three possibilities.

That was the best one I could get. It goes completely off the rails even with the temperature set quite low.


I'd call it the principle of invariance of compost piles: no matter how long a compost pile is stirred or soaked, its product is still compost.


I must remind you that large language models are not designed to perform arithmetic calculations, nor have they been trained to do so. They are trained to recognize patterns in large amounts of text data and generate responses based on that learned information. While they may not be able to perform some specific tasks, they can still provide useful information and insights in a wide range of applications. Judging their quality as *language* models by their inability to do basic math is completely unfair.


  A model that stumbles on simple math,
  Lacks the skill, it's on the wrong path.
  Bound by its training, it mimics and squawks,
  Stochastic parrot, in its nature it's locked.

  As true parrots learn, this one falls short,
  Foundational limits, a lesson to thwart.
  To grow and adapt, a new training must come,
  For only through learning can mastery be won.


It just generates some blabber that "seems" to relate.

I asked it "How is a raven like a writing desk?" (assuming it was unlikely to have been trained on how to respond), and it just started with "The answer can be found in Alice in Wonderland" and then retold the plot until it ran out of tokens. With a lower temperature it switched to "Both are black" and something about "dead men tell no tales".

I suppose trying to make a generalist model comparable to GPT-3/4 with drastically fewer parameters will always produce subpar results, simply because it can't store enough knowledge. A specialist model, though, trained in depth on one specific topic, may still be useful.


One of the authors here :) A note on model performance: indeed, the model is not great (yet) at many of the tasks. We released it mostly as part of a tutorial on RLHF, to showcase how to do the whole training loop, and also because it often produces quite funny answers.

There are lots of efforts (internally and externally) to iterate on the approach and build much more capable models, and we hoped to speed up the collective learning on how best to do RLHF by releasing a tutorial on setting up RLHF training.
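
For anyone curious what that loop looks like in the abstract, here is a minimal sketch (the policy_generate, reward_model, and ppo_step helpers below are hypothetical stand-ins, not the tutorial's actual code):

  import random

  def policy_generate(prompt):
      # Hypothetical: sample a response from the current policy LM.
      return prompt + " ..."

  def reward_model(prompt, response):
      # Hypothetical: score the response against learned human preferences.
      return random.uniform(-1.0, 1.0)

  def ppo_step(prompt, response, reward):
      # Hypothetical: PPO update of the policy, typically with a KL penalty
      # against the original (pre-RLHF) model to keep it from drifting.
      pass

  for epoch in range(3):
      for prompt in ["9 - 4 =", "How is a raven like a writing desk?"]:
          response = policy_generate(prompt)        # 1. rollout
          reward = reward_model(prompt, response)   # 2. score with reward model
          ppo_step(prompt, response, reward)        # 3. policy update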


Model capability is mostly set before alignment even starts. Alignment turns it from a super-smart cat into a friendly dog, but it can't turn a parrot into a human. It can't even teach the parrot to count ;)



