What is a lora or llama? Google gives me nothing.

Majromax · on March 29, 2023

LLaMA is the large language model published by Facebook (https://ai.facebook.com/blog/large-language-model-llama-meta...). In theory the model is private, but the model weights were shared with researchers and quickly leaked to the wider Internet. This is one of the first large language models available to ordinary people, much like Stable Diffusion is an image generation model available to ordinary people in contrast to DALL-E or MidJourney.

With the model's weights in public hands, people can do interesting generative stuff. However, it's still hard to train the model to do new things: training large language models is famously expensive because of both their raw size and their structure. Enter...

LoRA is a "low rank adaptation" technique for training large language models, fairly recently published by Microsoft (https://github.com/microsoft/LoRA). In brief, the technique assumes that fine-tuning a model really just involves tweaks to the model parameters that are "small" in some sense (low-rank, hence the name), and it confines the fine-tuning to just those small adjustment weights. Rather than asking an ordinary person to re-train 7 billion or 13 billion or 65 billion parameters, LoRA lets users fine-tune a model with about three orders of magnitude fewer adjustment parameters.
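To make the "small adjustment" idea concrete, here is a rough sketch in plain Python, not Microsoft's implementation. It follows the form used in the LoRA paper: a frozen weight matrix W (d x k) gets an update B @ A, where B is d x r, A is r x k, and only A and B are trained. The layer size (4096 x 4096), the rank (8) and the helper names are illustrative assumptions, not values from any particular model.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_param_counts(d, k, r):
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    return d * k, r * (d + k)

# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(4096, 4096, 8)
reduction = full / lora
print(f"full: {full:,}  lora: {lora:,}  reduction: {reduction:.0f}x")

# Tiny demo of the update itself: W stays frozen, only A and B would be trained.
d, k, r = 6, 5, 2
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
A = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # small random init
B = [[0.0] * r for _ in range(d)]  # zero init, so the update starts as a no-op

delta = matmul(B, A)  # d x k, but rank at most r
W_adapted = [[w + dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
```

For this one matrix the saving is 256x. The much larger overall reductions people quote come from also leaving most of the model's matrices without any adapter at all, so treat the per-layer number as a lower bound on the idea rather than a benchmark.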

Combine these two – publicly-available language model weights and a way to fine-tune them – and you get work like the story here, where the language model is turned into something a lot like ChatGPT that can run on a consumer-grade laptop.

chronicler · on March 29, 2023

Thanks for sharing. How do you know this? Can you recommend any papers to read to start learning about LLMs? I have very limited ML/AI knowledge.

xmonkee · on March 29, 2023

Thanks, very helpful. Are llama and chatGPT essentially the same “program”, just with different weights? And is one better than the other (for the same number of parameters) just because it has better weights?

lbotos · on March 29, 2023

My understanding is they are both "LLM" (Large language models). That's the generic term you are looking for.

I don't think you can compare one LLM's weights to another's directly, because the weights are a product of the LLM. In theory (I don't actually know) llama and chatGPT may be using different source datasets, so you can't compare them like for like.

manojlds · on March 31, 2023

LLaMA and GPT are like Pepsi and Coke.

FooBarWidget · on March 29, 2023

How are the llama weights usable by the public? Even if leaked, doesn't using it count as piracy and thus a violation of either copyright or database laws?

jsnell · on March 29, 2023

It's not at all clear whether weights are copyrightable.

Radim · on March 29, 2023

There's some irony in BigCos using everyone's actual IP freely to train their models, no qualms whatsoever.

And then people being scared to even download said models because of "OMG IP!"

The asymmetry of power (and dare I say, domestication) is astounding.

FooBarWidget · on March 29, 2023

I'm pretty sure they are. If not copyrightable, then at least the database law should apply. One can easily make the case in front of a judge that the situation is similar to databases: the value of weights lies in the amount of work needed to gather the training data, thus weights should be considered a sort of crystallization of a database.

jsnell · on March 29, 2023

But the entire business model of the companies making the models seems to be including copyrighted data in the training set under the guise of fair use. If the weights are considered to be a derived work of the training data as a whole, it seems the weights would also have to be a derived work of the individual items in the training data. So I doubt any of them will be making that argument.

(Except maybe companies that have access to vast amounts of training data with an explicit license, e.g. because the content is created by their users rather than just scraped from the web?)

FooBarWidget · on March 29, 2023

That doesn't matter to database laws. Databases are protected under the premise that collecting the data takes work. How that data is licensed is orthogonal to database law.

jsnell · on March 29, 2023

If I understand correctly your claim was that "the value lies in gathering [a database] of the training data"; that the curation of the training data is what gives the trainer an intellectual property claim on the otherwise mechanical process of creating a model, right? Not that the model itself was a database.

For them to make the argument in court that database rights over the database of training data mean they have rights over the model too, they'd need to argue that the model is a derivative work of the training data. And then it'd mean their model is also a derived work from all the billions of works they scraped to get that data set. It would destroy the business model of the OpenAIs of the world; there is no chance they try to argue this in court.

nl · on March 30, 2023

> For them to make the argument in court that database rights over the database of training data mean they have rights over the model too, they'd need to argue that the model is a derivative work of the training data. And then it'd mean their model is also a derived work from all the billions of works they scraped to get that data set. It would destroy the business model of the OpenAIs of the world; there is no chance they try to argue this in court.

This doesn't follow at all.

They can argue they used that work under fair-use and/or that their work was transformative. This is a fairly clear extension of arguments used by search engines that indexing and displaying summaries is not copyright violation and these arguments have been accepted by courts in most circumstances.

jsnell · on March 30, 2023

If the uncreative and automated work of training the model is transformative enough to impact the rights of the original content creators, it would also be transformative enough to impact the rights of the database curator.

The fair use case is much harder to make here than for search engines, since the model will be directly competing with the content creators. And again, how could e.g. OpenAI claim that their use of the original content to train the model, and then the subsequent use of the model and the model outputs, is fair use, while simultaneously claiming that the model could not be used without infringing their DB rights? You can argue fair use for both or neither; trying to argue it for just one of the two is just incoherent.

And everyone building models needs free access to the training data way more than they need copyright as a means to protect the model.

nl · on March 30, 2023

I don't necessarily disagree, but it's very unclear what a court would find.

I suggest https://arxiv.org/abs/2303.15715 for a complete overview.

jsnell · on March 31, 2023

Agreed! It being unclear was in fact my first message in this discussion :) Thanks for the link, I'll definitely need to read it.

muyuu · on March 29, 2023

yea I do wonder about this, but even Meta are acting as if their releasing it means in effect that the cat is out of the bag

at this point, their not even complaining about it must mean that they accept the data is public now

dpiers · on March 29, 2023

LoRA: https://arxiv.org/pdf/2106.09685.pdf

LLaMA: https://ai.facebook.com/blog/large-language-model-llama-meta...

Both have been the subjects of numerous HN posts in the last month.

corford · on March 29, 2023

And the LLaMA paper from Meta.ai team is here: https://arxiv.org/pdf/2302.13971v1

h11h · on March 29, 2023

LLaMA is Facebook's LLM (large language model, comparable with GPT). It's publicly available (anyone can download the weights and run it themselves), so it's popular here.

LoRA, or Low-Rank Adaptation of Large Language Models, lets people fine-tune an LLM (making it perform better for a particular application) using vastly fewer resources. Paper: https://arxiv.org/pdf/2106.09685.pdf

bestcoder69 · on March 29, 2023

llama: gpt-3 alternative that you can download and run on a toaster

lora: efficient way of fine-tuning a model like llama, where instead of recreating an entire model, you're keeping the base model and generating a fine-tunings file to apply on top of it.

toaster: any machine with like 4GB of RAM available to fit the model
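For anyone wondering where a figure like 4GB could come from, here is a back-of-the-envelope sketch. It assumes the 7 billion parameter model mentioned upthread and assumes the weights are stored at reduced precision (the 4-bit case is my assumption; the comment above does not say how the model is stored). It counts the weights only and ignores activations, context cache and runtime overhead, so real usage will be somewhat higher. The function name is made up for this example.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 7e9  # the 7 billion parameter model size mentioned in the thread

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(n_params, bits):.1f} GB")
```

At 16 bits per weight the same model needs about 14 GB, which is why the "toaster" claim only works if the weights are squeezed down to a few bits each.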

physPop · on March 29, 2023

silly back-ronyms made by academics