
Weights refer to the trained model weights, like for example Stable Diffusion's v1.1, v1.2 ... v1.4, v2.x etc. Same with LLaMA coming in sizes from 13B up to 65B parameters (different weights).

LLM stands for large language model. In contrast with diffusion models or GAN models, these are the models that take text and autocomplete it, like the GPT family, the open-source BLOOM, and now LLaMA from Facebook.

LoRA is the latest and most efficient fine-tuning method for teaching concepts or styles, which you put on top of the general models, so you can have custom models on top. It's like embeddings or fine-tuning for LLMs. In that category you had Textual Inversion, Dreambooth, and now LoRA.
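The core trick behind LoRA (low-rank adaptation) fits in a few lines: instead of retraining a big weight matrix W, you learn two small matrices A and B and add their low-rank product on top of the frozen W. A hedged sketch, with made-up shapes and a rank chosen purely for illustration:

```python
import numpy as np

d, r = 1024, 8             # layer width and the (much smaller) LoRA rank
W = np.random.randn(d, d)  # frozen pretrained weight matrix

# LoRA trains only these two small matrices (illustrative initialization)
A = np.random.randn(r, d) * 0.01
B = np.zeros((d, r))       # zero-initialized so the model starts unchanged

W_adapted = W + B @ A      # effective weights at inference time

# The fine-tune only has to store 2*d*r numbers instead of d*d
print(W.size, A.size + B.size)  # 1048576 vs 16384
```

That size difference is why LoRA files are tiny compared to full model checkpoints: you ship just A and B and apply them on top of the shared base weights.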

What else do you need? Googling or asking ChatGPT can help a lot too.




> weights refer to the trained model weights

This is what I'm having a hard time understanding.

So there's the weights, and also a model somewhere? That the weights are based on? Or that you combine with the model to tune it?


Let's take a step back. You have a model like linear regression, for example y = b*x, where y is your output and x is your input. Based on some data, you learn that b=1. You then share the weights of the model as a file like {b=1} and also share the model y = b*x (usually shared via code) so they can run it in production.
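To make that split concrete, here is a minimal sketch where the "model" is just code and the "weights" are a tiny dict you could save to a file (names are invented for illustration):

```python
# The model: just code describing the computation y = b * x
def model(x, weights):
    return weights["b"] * x

# The weights: learned from data, shipped separately as a file
weights = {"b": 1}

print(model(3, weights))  # 3
```

Real models work the same way, just with millions or billions of numbers in the weights file instead of one.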


This is the best explanation imo.

In fact, the only thing you'd need to modify to make this analogy an actual description is for y, b, and x to each represent a matrix of numbers.


My really simplified explanation is:

Your inputs are lists of numbers. Your outputs are lists of numbers. There exists some possible list of numbers such that, if you multiply your inputs by that list you'll get (approximately) the outputs.

In this conception, that possible list of numbers is the weights. "Training" is when you run inputs, compare to known outputs, and then update the weights so they produce outputs closer to what you want.
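That training loop can be sketched on the y = b*x toy model: start with a wrong weight, compare predictions to known outputs, and nudge the weight a little closer each step (learning rate and step count are arbitrary choices here):

```python
# Toy training: learn b in y = b * x from example pairs
data = [(1, 2), (2, 4), (3, 6)]  # inputs/outputs generated with b = 2
b = 0.0                          # start with a wrong weight
lr = 0.05                        # learning rate

for _ in range(200):
    for x, y in data:
        pred = b * x
        grad = 2 * (pred - y) * x  # derivative of squared error wrt b
        b -= lr * grad             # update the weight toward the target

print(round(b, 3))  # converges toward 2.0
```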

For Large Language Models it may be hard to see how they fit this paradigm. Basically you convert a word sequence to a list of numbers ('aardvark' is 1, 'apple' is 2, etc.) and the desired output is the next word in the sequence (also represented as a number). Surprisingly, if you get good at predicting the next word in a sequence, you also get the ChatGPT et al. behavior.
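A toy version of that words-to-numbers step (the vocabulary and ids here are invented for illustration):

```python
# Invented toy vocabulary: each word gets an integer id
vocab = {"aardvark": 1, "apple": 2, "the": 3, "cat": 4, "sat": 5}

sentence = ["the", "cat", "sat"]
ids = [vocab[w] for w in sentence]
print(ids)  # [3, 4, 5]

# An LLM is trained so that, given the ids for "the cat",
# its output scores rank the id for "sat" highest as the next word.
```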


The model is a class with params; the weights are an instance of that class, serialized with the param values learned after training.


This is what happens when running inference on a neural network:

Input (list of numbers) -> (Bunch of math operations) with (other numbers) -> Output (also a list of numbers)

This applies whether you are talking about image classification, image generation, text generation etc.

The model defines what the "(Bunch of math operations)" part is. As in, do these multiplications, then add, then a tanh operation etc.

The weights define what the "(other numbers)" are. Training is the process of figuring out these weights using various methods - some of which involve example inputs/outputs (supervised learning), others don't require examples (unsupervised or self-supervised learning).
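A minimal sketch of that separation for a single neuron (the shapes and numbers are invented; a real network stacks many such layers):

```python
import math

# The model: the fixed sequence of math operations
def forward(inputs, weights, bias):
    # multiply, add, then tanh: the "(Bunch of math operations)" part
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(s)

# The weights: the "(other numbers)", normally found by training
weights = [0.5, -0.25]
bias = 0.1

out = forward([1.0, 2.0], weights, bias)
print(out)  # a single number between -1 and 1
```

Swap in different weights and the same code computes a different function; that's why shipping a model means shipping both the code and the weights.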


Model is code, weights are the input data to that code



