StackLLaMA: A hands-on guide to train LLaMA with RLHF (huggingface.co)
165 points by kashifr on April 7, 2023 | 38 comments



Glad to see more progress on open(ish) source versions. There's so much more these things could do unfettered by corporate motivations.


I'm convinced this is going to be history repeating itself:

- Microsoft/Sun/etc. trying to own the web in the late '90s and early 2000s. LAMP came along and ate their lunch (for all intents and purposes).

- Microsoft and Windows Phone. Android (open source again) won out; Apple, with its BSD/Mach underpinnings, could arguably be counted too.

- Microsoft Edge. Give up, use Chromium.

Once again we have Microsoft (this time via OpenAI) doing what they do and trying to own an emerging space. Based on the lightning progress in the open(ish) "AI" space, I'm pretty certain OpenAI and others will take a back seat to the open ecosystem within a few years.


According to interviews, OpenAI only released ChatGPT ahead of GPT-4 out of paranoia that they would be supplanted by open versions and end up irrelevant. Their fear is not unfounded: it just happened to them with DALL-E 2 and Stable Diffusion.


OpenAI is well funded, but again I'm reminded of open source. Back in 2008 the Linux Foundation (yes, consider the source) estimated[0] that Fedora 9 represented approximately $10.8B (2008 dollars) in cost if developed commercially/conventionally. I actually believe that (Debian, as another example, has over 50k packages). It has to be some multiple of that 15 years later.

OpenAI having a few billion or more to throw around seems like a lot, but the combined rest of the world, including supporting commercial entities (Stability AI and others in the Red Hat role, plus IBM, Intel, FB, Google, etc.) and open source contributors, has the equivalent of many times that.

On a long enough timeline the closed/proprietary approach cannot win.

[0] - https://www.linuxfoundation.org/press/press-release/linux-fo...


We are aligned. I just wonder when we'll get to a stage where we can start producing value faster than the landlord types can charge rent on it.

I think this tech might actually represent an opportunity to break out of the systemic quagmire we've been in societally for so long - rotting institutions dictating access to information and entrenched powers accumulating for the sake of accumulation.

I want everyone to have a personal assistant who can help them learn whatever it is they want to learn, for free, at any time of day. We're so damn close.


> On a long enough timeline the closed/proprietary approach cannot win.

Yes it can. The magic word is "regulation".


Maybe. The problem is that you need billions to train new models.[1] At least with how things are now.

[1] https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-...


With 6-7 more doublings in compute power per watt, consumers will have the power of 1,000 A100 GPUs in their iPhone. "Eventually" will come before I die, at least. That would happen outside of GP's time estimate, but a moderately funded university consortium could probably afford it just a few doublings from now.
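
(Back-of-envelope, just to spell out the compounding that claim assumes, nothing more:)

  # each doubling multiplies perf/watt by 2
  for n in (6, 7):
    print(f"{n} doublings -> {2 ** n}x perf per watt")  # 64x, 128x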


I think we'll reach architectural changes that make this moot before we reach the hardware for it. The way we train these models is constantly in flux, and we just need someone to crack continuous learning so we can pass models around and train them en masse, using the collective unused compute that is literally sitting on my desk and everyone else's right now.


When considering the future of tech, it's valuable to remember there are at least two well-trodden paths:

1. semi-linear extrapolation of existing tech and progression (maturing tech)

2. new paradigms that approach the problem from a new angle or with new insight, invalidating or leapfrogging 1.

Since we're in the midst of a Cambrian explosion for both 1 and 2, IMO the limitations as we've been seeing them won't hold up even over the medium term.


I've been on leave from work and hammering the GPT APIs since GPT-3.5/ChatGPT was made available.

The local LLM stuff was a tad out of control from the drop: too many people hand-waving about how they could get the 7B running on a phone with quantization, when the output was unintelligible, and not "no-RLHF" unintelligible. Just FUBAR'd.

I tried the latest round of RLHF'd models yesterday, and I'm officially, publicly a skeptic now. These are an awful idea; training on ShareGPT gets horrible results: I'm seeing them emit the exact same answers ChatGPT does, but only a small fraction of them.

I understand that it is itself impressive to a certain crowd, and I cede it's an accomplishment. However, it's an accomplishment that enables no further accomplishment: using a stolen model to do minimal "RLHF" that is really just overfitting on a subset of answers from another AI. That's not RLHF at all. Real RLHF isn't something you do in a weekend for $100, and pretty much everyone outside OpenAI and Anthropic is learning that.


In my experience, the smaller models are almost completely worthless as-is. 65B is the only decent one (I'd say just behind gpt-3.5-turbo; obviously it's not instruction tuned, but I mean the coherency of the core language model), and understandably people aren't really paying attention or devoting many resources to the largest one. 30B shows promise for specific tasks with fine tuning, but 7B and 13B are just toys.


How would you judge Open Assistant's approach?


I don't know much about it specifically, but I heartily endorse it.

LAION was instrumental in early-ish AI art. I will always cherish & remember when you had like 14 people in an IRC room just playing around, fall/winter 2020. Now 3 of them have companies around it, and the resources that were there to enable e.g. SD are similarly interested in LLMs.

This is excellent: open source is the way forward; there just needs to be more coordination, expertise, and patience involved. The SNR is way too low in general public spaces like HN right now.

(I'm being exclusively negative, so in recompense: we're at year 1 of 100, and the people spending a ton of time replicating a crappy proof-of-concept "RLHF" run and rushing to post are learning too. The eternal golden summer starts now, and anything anyone is doing is helpful.)


One flaw it has is extending the question rather than answering it, despite the prompt requesting an answer.

For instance, I compared llama-7b-se-rl-peft to OA_SFT_Llama_30B[0] on the following question: “What is the color of the sky on Mars?”

llama-7b-se-rl-peft: “There are a few other related questions in this site, but none addresses exactly what I’m looking for. If you have been to mars (or if someone else has) can you please tell me what the colour of the sky is and maybe how it looks from different angles or at night time etc.”

OA_SFT_Llama_30B: “The color of the sky on Mars depends on various factors, such as time of day and year, atmospheric conditions, and viewing angle. However, in general, the Martian sky is often described as having a reddish or pink hue due to the presence of iron oxide in its soil and atmosphere.”

It could be the smaller size, or it could be the reward model not incentivizing a proper understanding of the “Answer:” tokens. Still, it is nice to see those open efforts.
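
(If you want to reproduce the comparison locally, here's a rough sketch with transformers + peft. The adapter repo id is inferred from the model name above, the base-weight path is a placeholder, and the "Question:/Answer:" template is the one the blog post uses, so check all three:)

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

  # base LLaMA weights you already have locally (placeholder path)
  base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b", torch_dtype=torch.float16)
  tok = AutoTokenizer.from_pretrained("path/to/llama-7b")

  # layer the StackLLaMA RL adapter on top (repo id assumed from the thread)
  model = PeftModel.from_pretrained(base, "trl-lib/llama-7b-se-rl-peft")

  prompt = "Question: What is the color of the sky on Mars?\n\nAnswer: "
  out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=128)
  print(tok.decode(out[0], skip_special_tokens=True))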

[0]: https://open-assistant.io/chat


Just tried the same out of curiosity with OA_SFT_Llama_30B: "The Mars sky would be pink if you were to look up and observe it in person. This is due to the presence of finely dispersed iron oxide particles in its atmosphere, which scatter blue light from the sun and cause the sky to appear reddish-pink in color. This phenomenon is known as "Rayleigh scattering." If seen through a telescope on Earth however, the Martian sky may appear dark or even black depending on atmospheric conditions and time of day/year. It can sometimes be observed with a bluish tint during sunrise or sunset when more sunlight reaches the planet's surface and illuminates the dust particles in the air. During certain times of year (such as spring and summer), a thin layer of water ice clouds can also form at higher altitudes, causing the sky to become slightly whiter than usual."


I'd love a tool where I can upload a private dataset and RLHF a model (even better if the tool provides the pre-trained model) without having to worry about GPUs, memory, commercial access, or even writing any Python code. I'd happily pay several hundred, maybe several thousand, dollars for access to this.


Even better would be a crowdfunding platform where you can collect some money from a bunch of people and then use it to fine-tune the bigger models (30B and 65B in the case of LLaMA).


OpenAI allows you to fine-tune any of their LLM models. I'm not sure if your requirements included "without paying OpenAI something," but if you are OK with that, fine-tuning with them is pretty straightforward. IANAL or anything, but if I recall they even promise they won't use your data for their training.
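
(For reference, a minimal sketch of that flow with the early-2023 Python SDK; the filename is hypothetical, and as noted below this path only covers the GPT-3 base models:)

  import openai  # openai 0.27-era SDK

  openai.api_key = "sk-..."  # your key

  # upload the prepared JSONL, then kick off a fine-tune against davinci
  f = openai.File.create(file=open("support_data.jsonl", "rb"), purpose="fine-tune")
  job = openai.FineTune.create(training_file=f.id, model="davinci")
  print(job.id)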


They let you do this with their GPT-3 derived models, but not the GPT-3.5 or GPT-4 ones.

That would be unsafe lol


I think that tool would have broad appeal, but I wonder if the most likely case is that it would be buried inside other, higher-level systems, such as customer support automation SaaS.


It is incapable of doing any arithmetic. E.g., on the question "9 - 4 =" it answered:

  Answer

  There are a few other ways to make this easier.

  1. Keep the remainder as an argument.

  You can do that by rewriting your divmod() function like   this:

  def divmod(x, y):
    return x, (y % x)


I asked a more verbose version of the same question, and it started with a similar answer but added this:

[Edit]

In the comments, someone pointed out there were actually three answers - one was 5; the other two being 1 and 2. Because these numbers work out at the same value when they are multiplied by 6, I have changed my answer to include all three possibilities.

That was the best one I could get. It goes completely off the rails even with the temperature quite low.


I'd call it a principle of invariance of compost piles. Regardless of how long the compost pile is stirred or soaked, the product of the compost pile is compost.


I must remind you that large language models are not designed to perform arithmetic calculations, nor have they been trained to do so. They are trained to recognize patterns in large amounts of text data and generate responses based on that learned information. While they may not be able to perform some specific tasks, they can still provide useful information and insights in a wide range of applications. Judging the quality of *language* models by their inability to do basic math is completely unfair.


  A model that stumbles on simple math,
  Lacks the skill, it's on the wrong path.
  Bound by its training, it mimics and squawks,
  Stochastic parrot, in its nature it's locked.

  As true parrots learn, this one falls short,
  Foundational limits, a lesson to thwart.
  To grow and adapt, a new training must come,
  For only through learning can mastery be won.


It just generates some blabber that "seems" to relate.

I asked it "How is a raven like a writing desk?" (assuming it's unlikely it was trained on how to respond), and it just started with "The answer can be found in Alice in Wonderland" and retold me the plot until it ran out of tokens. With a lower temperature it switched to "Both are black" and something about "dead men tell no tales".

I suppose trying to make a universalist model comparable to GPT-3/4 with drastically fewer parameters will always produce subpar results, just because it can't store enough knowledge. A specialist model, though, taught in depth on one specific topic, may still be useful.


One of the authors here :) A note on model performance: indeed, the model is not great (yet) at many of the tasks. We released it mostly as part of a tutorial on RLHF to showcase how to do the whole training loop, and also because it often creates quite funny answers.

There are lots of efforts (internal and external) to iterate on the approach and build much more capable models, and we hoped to speed up the collective learning on how best to do RLHF by releasing a tutorial on setting up the training.
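
(The heart of that loop, condensed from the tutorial into a sketch rather than the blog's exact code: the model path is a placeholder, the dataset is assumed to be already tokenized with an "input_ids" column, and the reward-model score is stubbed with a constant:)

  import torch
  from transformers import AutoTokenizer
  from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead

  config = PPOConfig(model_name="path/to/llama-7b-se", learning_rate=1.4e-5, batch_size=8)
  model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
  tokenizer = AutoTokenizer.from_pretrained(config.model_name)
  ppo_trainer = PPOTrainer(config, model, tokenizer=tokenizer, dataset=dataset)  # dataset assumed prepared

  for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]
    # sample answers from the current policy
    response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False, max_new_tokens=128)
    # score each answer with the reward model (stubbed with a constant here)
    rewards = [torch.tensor(1.0) for _ in response_tensors]
    # one PPO optimization step against those rewards
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)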


Model capability is mostly set, before the alignment even starts. Alignment turns it from a super-smart cat into a friendly dog. But it can't turn a parrot into a human. It can't even teach the parrot to count ;)


All the steps involved in training a LLaMA model with RLHF to answer questions on Stack Exchange data.


You could of course use your own question-and-answer data to refine the model using the same process. I wonder if anyone has tried that yet, for instance to fine-tune LLaMA to answer support queries for their company.
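
(Presumably the data prep would mirror the Stack Exchange step, just with your own pairs. The "Question:/Answer:" template below is the one from the blog post; the ticket itself is made up:)

  # hypothetical: cast internal support tickets into the same template
  def to_prompt(question: str, answer: str) -> str:
    return f"Question: {question}\n\nAnswer: {answer}"

  sample = to_prompt(
    "How do I reset my SSO password?",  # made-up support query
    "Open the account portal and choose 'Forgot password'.",
  )
  print(sample)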


Hopefully research like this will even out access to the new tech. Maybe once we figure out a pretty good architecture, we'll have something like chatBot.train(…) where we just feed in some data for the fine-tuning.


Curious why all of these posts start with LLaMA vs. one of the many open source LLMs now. We have the Cerebras releases, Salesforce CodeGen-NL, and others.


So, they are taking the LLaMA model released by Meta, doing a little fine-tuning, and then re-releasing the resulting model under a different license?

That seems very sketchy. The Meta license grants a "non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Meta’s copyright interests to reproduce, distribute, and create derivative works of the Software solely for your non-commercial research purposes."

A better way would be to redistribute xdelta3 files so people with access to the LLaMA model weights can use them to arrive at the fine-tuned model weights. Or is there perhaps a better tool than xdelta3 specifically for LLMs?
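
(As a sketch of the idea, raw per-tensor deltas are easy to produce and apply with torch; the filenames are hypothetical. The catch is that a dense delta is as large as the weights themselves, which is exactly why shipping a LoRA is the nicer artifact:)

  import torch

  base = torch.load("llama-7b.pt")         # original Meta state dict
  tuned = torch.load("llama-7b-se-rl.pt")  # fine-tuned state dict

  # distribute only the per-tensor differences
  delta = {k: tuned[k] - base[k] for k in base}
  torch.save(delta, "delta.pt")

  # anyone with the original weights can reconstruct the fine-tune
  restored = {k: base[k] + delta[k] for k in base}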


They only released the LoRA.


Oh, you're absolutely right. I must have looked at the wrong folder or something. Never mind then!


HF wants to undercut OpenAI any way possible.

My cynical take is that HF gives as much of a damn about open source as OpenAI does. It's just whatever gets you ahead of your peers.

Right now OpenAI has a massive advantage with GPT-4 and their RLHF stack. HF and maybe even Meta want to claw their way back via crowdsourcing.


This has ~0 to do with Hugging Face; Hugging Face is GitHub for ML models.



