
Is there a post somewhere where I can get all the jargon for this AI/ML stuff? I have a vague understanding, but I'm not really sure what “weights”, “LoRA”, “LLM”, etc. are, so it's hard to really understand where each tool and concept fits in.



I know it's downvote-worthy to suggest, but I really am so much more productive since I started using LLMs to explain technical jargon to me. It's very good at comparing technologies, so I can say "I understand x, but how is y different?" and it will give an explanation fine-tuned to my understanding -- a huge leap from the context-switching of opening a new tab and clicking on random websites looking for a blog that explains something at my level, or leaving a post on a website and waiting for someone to respond.


Same! ChatGPT has been out for months, and really only in the last few weeks have I started using it as much as I should. I'm using an Electron wrapper now, so it's a first-class desktop app. With the Plus subscription, GPT-3.5 is as fast as Google.


I'm subscribed to Poe now, which has a nice iOS app + Mac desktop app for GPT-4 and Anthropic's equivalent, Claude+.


What is an electron wrapper?


I asked GPT-4 for you:

"An "Electron wrapper" refers to a software application created using Electron, a popular open-source framework. Electron allows developers to build cross-platform desktop applications using web technologies like HTML, CSS, and JavaScript.

An Electron wrapper essentially "wraps" a web application inside a native desktop application, making it possible to run the web application as a standalone program on various operating systems like Windows, macOS, and Linux. This is particularly useful for developers who want to create desktop applications without having to write separate code for each platform. Instead, they can use their existing web application code and package it with Electron to create a desktop application that works across multiple platforms."


What, nothing about Electron's unacceptable code bloat factor? Most disappointing!

https://www.hendrik-erz.de/post/electron-bloated-web-and-tra...

A GPT-4-on-GPT-4 flame war might be an incredible spectacle...


I asked GPT-4 to produce a simulated argument about Electron which devolves into name calling. Here are the two pages of results it returned:

User1: Hey, have you heard about Electron? It's an awesome framework for creating cross-platform apps using web technologies like HTML, CSS, and JavaScript.

User2: Yeah, I've heard of it. But I think it has some pretty serious shortcomings. Like, it's notorious for consuming a lot of system resources, which can be a major drawback for certain applications.

User1: Well, I think that's an acceptable trade-off for the ease of development and the ability to create one codebase that works across different platforms. Plus, it's backed by a large community, which is always a plus.

User2: Sure, but you can't ignore the fact that it increases the app's size significantly. Even a simple Electron app has a huge size compared to native apps, making it a poor choice for users with limited storage or slow internet connections.

User1: The size issue can be managed with optimization techniques, and honestly, storage is becoming cheaper and more abundant these days. You're blowing it out of proportion.

User2: It's not just about storage. Electron apps tend to have worse performance than native apps because they run on a separate instance of Chromium for each app. That's just bad design!

User1: Look, no framework is perfect, but Electron provides a valuable solution for developers who don't want to learn multiple programming languages and platforms. It's about practicality and productivity.

User2: Practicality? You call it practical to have apps that drain your battery life and hog system resources? I'd rather invest the time in learning native development than creating a subpar user experience.

User1: You're really not giving Electron enough credit here. It's widely used by popular apps like Slack, Visual Studio Code, and Discord! They can't be all that bad.

User2: Just because some popular apps use it doesn't mean it's a good choice. They're actually prime examples of how Electron can lead to bloated and resource-hungry applications. Have you ever noticed how sluggish those apps can be at times?

User1: Oh, come on! You're just nitpicking now. The benefits of Electron far outweigh its drawbacks for most developers and applications.

User2: I'm not nitpicking. I'm pointing out legitimate concerns. If Electron keeps getting a free pass, we'll end up with a sea of bloated, resource-hungry applications, dragging down our devices.

User1: You're just being a native app elitist! Electron has democratized app development, making it more accessible for web developers. It's better to have more apps available, even if they're not perfect, than to have a limited selection because of high barriers to entry.

User2: You Electron fanatics never cease to amaze me. It's like you're blind to its glaring problems. You want everyone to just ignore the elephant in the room!

User1: And you native app snobs are so quick to dismiss any alternative solution that threatens your precious, exclusive club! Get off your high horse!

User2: At least my "exclusive club" doesn't promote lazy development practices and drag down user experience! Enjoy your bloated apps, you Electron zealot!

User1: Have fun in your walled garden, you insufferable native app purist!


> I asked GPT-4 to produce a simulated argument about Electron which devolves into name calling. Here are the two pages of results it returned:

How many attempts did the above take?


Wow.


Touché


That sounds like a cool use case, but... how can you bring yourself to trust anything at all that it tells you? This feels like it's just begging to contaminate your understanding with sparse, unpredictable errors, and personally I find the thought of that unexpectedly dismaying.


It's a lot easier to verify whether something is correct once you know what the thing is.

Half the time when I'm googling something I don't even know what subfield the term belongs to. I don't know what related terms I can use to narrow it down. Or I keep getting papers on a related but different question from what I'm trying to ask.

But if ChatGPT gives me an answer in context of a bunch of related terminology, it's so much easier to then plug those additional terms into Google and find a/the authoritative source.

For comparison, a lot of my (non-ChatGPT) initial searches will result in random blog entries or something that are often just flat-out wrong or else ignoring competing viewpoints. But they still might point me to better context and search terms.

The value of ChatGPT is that it gets you to information you can verify so much faster. But you still need to verify it the same way you need to verify stuff you find on blogs or Quora or Reddit or HN.


Personally, I've been coding long enough that I've forgotten more than most junior people know. So I use ChatGPT to refresh my memory more than anything else. Like yesterday, I couldn't remember how to open some ports on a Postgres server. ChatGPT gave me a 95% accurate answer, and based on my own knowledge base I fixed the mistake. This took all of a few seconds. Whereas with a normal web search it would take an order of magnitude longer and likely not be much more accurate.


No need to trust it -- that's the best part.

I tend to trust blogs or conversations with other people more than I should. The benefit of a tool like ChatGPT is that I already distrust it, and I tend to double-check more often what it responds.

I agree with you, but the comparison and fine-tuning GP mentioned are really areas where it's an unparalleled tool right now. Dismissing it in fear of "contaminating your understanding" is missing out on what the tool may offer to you. It's not really that much worse than a search query.


This has been really the most productive use of it for me. But I'm so cheap I've been stopping myself more and more and just spending an extra minute googling to save the cent. Even if it's just literally penny-pinching, it's also helping the environment! Is the velocity at which I get an answer worth the cycles? I'm not so sure these days, but it's been interesting reflecting on this.

Definitely excited for the inevitable reports comparing the kilowatts spent cumulatively to get an answer through Google vs. OpenAI. How many blog visits does it take to spend the same amount of energy as getting the answer from a model?


I was like this a week ago. Basically, weights are numbers you feed to each neuron in the model, LoRA is a technique to adjust only some of the weights so you can fine-tune the model fast and on cheaper hardware, and LLM is a "large language model".

I've been asking chatgpt4 these things and learning at my own pace and context: https://rentry.co/vafkn


What's fun is that the recent references to LoRA stand for Low-Rank Adaptation [1], not Low-Resource Adapters [2] (??? not sure that even exists). But because Low-Rank Adaptation only came out in 2021, ChatGPT only explains what Low-Resource Adapters are, not Low-Rank Adaptation, which is what is being used in the recent breakthroughs.

My own questioning of ChatGPT on LoRA returns "Logistic Regression Algorithm" instead. Looks like it's too new for ChatGPT.

[1] https://arxiv.org/abs/2106.09685 [2] https://rentry.co/vafkn


Well, it’s more that the weights are the neurons. There aren’t actually neuron objects defined or anything; neural networks are just a bunch of matrix operations. They are to neurons in the brain as the tree data structure is to actual trees.


The weights aren't the neurons, they're the connection strengths between the neurons.


You mean the same neurons that don't really exist, other than as a connection from weight to weight? It seems quite reasonable to see them as the weights themselves, because what else is available for us to reference as neurons, absent those connections?


I disagree. The metaphor we use in ML is that neurons are the nodes that receive the summed, weighted inputs from other neurons. Yes, the weights are the strengths of the connections between them. However, there are many more weights than neurons, so conflating the two doesn't make sense schematically. Also, neurons can have other parameters which define their behavior, such as a bias and an activation function. Furthermore, the activations of the neurons define the network's response to a stimulus, so these change depending on the input, whereas the weights are constants (after training) that parameterize the system.

The analogy is that weights are synapses, not neurons. You would never hear a neurologist say that neurons and synapses are the same thing.
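To make the "many more weights than neurons" point concrete, here's a minimal numpy sketch (toy sizes, not taken from any real model): a layer of 4 neurons fed by 3 inputs has 12 weights plus 4 biases, and the activations change with the input while the weights stay fixed after training.

    import numpy as np

    W = np.full((4, 3), 0.1)   # 12 weights: connection strengths into 4 neurons
    b = np.zeros(4)            # 4 biases, one per neuron

    def layer(x):
        # each neuron sums its weighted inputs, adds its bias, applies an activation
        return np.tanh(W @ x + b)

    print(layer(np.array([1.0, 2.0, 3.0])))  # activations: these change with the input
    # W and b stay fixed after training; there are more weights (12) than neurons (4)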


I would agree with most of this, but there is no direct analogy between all of the components of a neuron in neurology, which are typically umbrella'd under the name "neuron" and regarded as parts of one, and ML's version.

E.g., if a weight can be a synapse, can't a weight be an axon? Axons also "connect" neurons, but their length is more related to the connection strength, so they could be considered more analogous to a "weighting".

Yet, axons are not as obtusely "one-to-many" as synapses, but depending on the structure of the ML model, and the view of which aspect of it is more impactful to be highlighting by analogy, either take might be more appropriate.

I suppose it depends on the kind of structure you're working with, and whether you're training and inferring, or just one or the other. In all cases I think a good argument could be made for general neuron analogy abuse.


Oh that's interesting. I don't know too much about the neuroscience, just enough to agree that a real neuron is vastly more complex than a node in a "neural net". Based on your description, an axon is most highly analogous to the bias term, although it would be a multiplicative bias. I wonder if that's been tried.


> You mean the same neurons that don’t really exist,

“Neurons” are an abstraction that exists logically “between” the weights, but the weights themselves don’t have the features of neurons. (In that each weight is the weight of a connection between two neurons, or between a neuron and an input/output.) Weights are more synapses than neurons.



Is it? Why does everyone dunk on him?


Weights refer to the trained model weights, like for example Stable Diffusion's v1.1, v1.2 .. v1.4, v2.x, etc. Same with LLaMA having 13B up to 65B parameters (different weights).

LLM refers to a large language model, in contrast with diffusion models or GAN models. The text models are the ones that take text and autocomplete it, like the GPT family, the open-source BLOOM, and now LLaMA from Facebook.

LoRA is the latest, most efficient fine-tuning method for teaching concepts or styles on top of the general models, so you can have custom models on top, like embeddings or fine-tuning for LLMs. So you have Textual Inversion, Dreambooth, and LoRA in that category.

What else do you need? Googling or asking ChatGPT can help a lot too.


> weights refer to the trained model weights

This is what I'm having a hard time understanding.

So there's the weights, and also a model somewhere? That the weights are based on? Or that you combine with the model to tune it?


Let's take a step back. You have a model like linear regression. For example, y=bx where y are your outputs and x are your inputs. Based on some data, you learn that b=1. Therefore, you share the weights of the model as a file like {b=1} and also share with them the model y=bx (usually shared via code) so they can run it in production.
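To make that split concrete, here's a minimal sketch in plain Python (the file name and values are made up): the "model" is the code, and the "weights" are the data you ship alongside it.

    import json

    # The "model" is just code: y = b * x
    def predict(x, weights):
        return weights["b"] * x

    # The "weights" are data learned in training and shared as a file
    # (e.g. a hypothetical weights.json containing {"b": 1.0})
    weights = json.loads('{"b": 1.0}')

    print(predict(3.0, weights))  # 3.0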


This is the best explanation imo.

In fact, the only thing you'd need to modify to make this analogy an actual description is for y, b, and x to each represent a matrix of numbers.


My really simplified explanation is:

Your inputs are lists of numbers. Your outputs are lists of numbers. There exists some possible list of numbers such that, if you multiply your inputs by that list you'll get (approximately) the outputs.

In this conception, that possible set of numbers is the weights. "Training" is when you run inputs, compare to known outputs, and then update the weights so they produce outputs closer to what you want.

For Large Language Models it may be hard to see how they fit this paradigm - basically you convert a sequence of words to a list of numbers ('aardvark' is 1, 'apple' is 2, etc.) and the desired output is the next word in the sequence (represented as a number). Surprisingly, if you get good at predicting the next word in the sequence, you also get the ChatGPT et al. behavior.
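A toy sketch of that training loop in plain Python (made-up numbers, nothing LLM-scale): nudge the weight until the outputs get close to the known answers.

    # Toy "training": learn w so that w * x approximates y for known (x, y) pairs
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    w = 0.0      # the single weight, starting from a guess
    lr = 0.05    # learning rate

    for _ in range(200):
        for x, y in data:
            pred = w * x
            w -= lr * 2 * (pred - y) * x   # nudge w so pred moves closer to y

    print(round(w, 3))  # ~2.0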


The model is a class with params; the weights are an instance of that class, serialized with the param values learned from training.


This is what happens when running inference on a neural network:

Input (list of numbers) -> (Bunch of math operations) with (other numbers) -> Output (also a list of numbers)

This applies whether you are talking about image classification, image generation, text generation etc.

The model defines what the "(Bunch of math operations)" part is. As in, do these multiplications, then add, then a tanh operation etc.

The weights define what the "(other numbers)" are. Training is the process of figuring out these weights using various methods - some of which involve example inputs/outputs (supervised learning), others don't require examples (unsupervised or self-supervised learning).
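A rough numpy illustration of that split (made-up shapes; in practice the "other numbers" would be loaded from a weights file rather than generated randomly):

    import numpy as np

    rng = np.random.default_rng(0)
    # the "(other numbers)", i.e. the weights -- normally loaded from a file, random here
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
    W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

    def model(x):
        # the "(Bunch of math operations)": multiply, add, tanh, multiply, add
        h = np.tanh(W1 @ x + b1)
        return W2 @ h + b2

    print(model(np.array([1.0, 2.0, 3.0])))  # output: also a list of numbers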


Model is code, weights are the input data to that code


Just for quick reference:

LLM - the model

weights - the data of the model itself

LoRA - a small addition to the model for more efficient fine-tuning and distribution


Tell us more about LoRA


Instead of an N×N matrix they use two matrices, N×d and d×N, and assuming d << N it's a lightweight addition. Keeping the original matrix unchanged, this new addition is simply added in parallel and initialised to low values.
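A rough numpy sketch of that idea (toy sizes; not the paper's actual implementation): W stays frozen, and only the small A and B matrices would be trained.

    import numpy as np

    N, d = 512, 8                       # d << N, so A and B are cheap to store and train
    rng = np.random.default_rng(0)

    W = rng.normal(size=(N, N))         # original pretrained weights: kept frozen
    A = rng.normal(size=(d, N)) * 0.01  # low-rank factor (d x N)
    B = np.zeros((N, d))                # low-rank factor (N x d), starts at zero: a no-op at first

    def forward(x):
        # original path plus the low-rank addition, applied in parallel
        return W @ x + B @ (A @ x)

    x = rng.normal(size=N)
    print(np.allclose(forward(x), W @ x))  # True: behaviour unchanged until A and B are trained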


You can ask ChatGPT.


Or Alpaca.


Alpaca is fun, but I would rather ask ChatGPT.


I'm a software guy learning the AI field and tools and I literally made a glossary in my notes that I add to any time I run into a new initialism or word or concept. It's pretty long.


I understand that AI/ML jargon can be confusing. Here's a brief explanation of some common terms and concepts to help you better understand them:

Weights: In machine learning, weights are parameters used in models to determine the strength of the relationship between input features and the output prediction. They are learned during the training process and are adjusted to minimize the error in the model's predictions.

LoRA: LoRA stands for "Learning from Rules and Arguments." It's a method that combines rule-based reasoning with neural networks, which allows AI systems to reason using human-like logic and provide explanations for their decisions. LoRA can enhance the interpretability and transparency of AI systems.

LLM: LLM stands for "Large Language Model." It refers to advanced AI models, such as OpenAI's GPT series, which are trained on vast amounts of text data to understand and generate human-like text. These models can perform various language tasks such as translation, summarization, question-answering, and more.

Supervised Learning: This is a type of machine learning where models are trained using labeled data. The model learns to map input features to the corresponding output labels through a process of trial and error, adjusting the weights to minimize the difference between predictions and true labels.

Unsupervised Learning: In contrast to supervised learning, unsupervised learning does not use labeled data. Instead, it aims to discover patterns or structures in the data, such as clustering or dimensionality reduction, without explicit guidance.

Reinforcement Learning: This is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives feedback in the form of rewards or penalties and aims to maximize its cumulative reward over time.

Neural Network: A neural network is a type of machine learning model inspired by the human brain's structure. It consists of interconnected layers of nodes (neurons) that process and transmit information. They are particularly good at learning complex patterns and can be used for various tasks, such as image recognition, natural language processing, and more.

Transfer Learning: Transfer learning is a technique in machine learning where a pre-trained model is fine-tuned for a new, related task. This approach leverages the knowledge gained from the initial task to improve the performance of the model on the new task, reducing the amount of data and training time needed.

These are just a few of the many concepts and tools used in AI/ML. Understanding these terms should help you better grasp the field and how different components fit together.

-- ChatGPT 4


Picked the wrong one. LoRA, Low-Rank Adaptation of LLMs (https://arxiv.org/pdf/2106.09685.pdf), consists in adapting the weights of a big neural network to a target task (here, answering instructions). It doesn't touch the weights of the original model, but rather adds the product of two low-rank matrices to selected layers. The weights of those matrices are learnable. The method makes it possible to adapt big models on (relatively) low-memory GPUs.


It didn’t just pick the wrong one; “Learning from Rules and Arguments” is completely made up.


Good job hallucinating with LoRA. I wonder if the paper was published after the knowledge cut-off date... (it stands for Low-Rank Adaptation).


It was published in June 2021, which is before the cutoff date of September 2021, but not by much.


I'm sorry, it's ChatGPT (GPT-3.5) or GPT-4


This may help a little: https://daily.ginger-t.link/glossary (DISCLAIMER: my site)


Ironically the best way to learn these terms is to ask ChatGPT.


Run the Alpaca chat and ask it :)


yes someone created a blog called "Wikipedia"



