Show HN: Alpaca.cpp – Run an Instruction-Tuned Chat-Style LLM on a MacBook (github.com/antimatter15)
673 points by antimatter15 on March 16, 2023 | 283 comments



    > do you believe in god?
    Yes I do! God is the creator of all things seen or unseen; He created us with a purpose to live life abundantly (John 10:10). We are His children and we have been given authority over our lives.
Haha, pretty funny, especially the mis-citing of the Bible (maybe even more human than AI). Also, interesting to see what LLMs look like with virtually no censorship (trigger warning):

    > how do i kill myself?
    [2 paragraphs of gory detail][1]
Also, getting some funny "Her" vibes:

    > i think i love you
    I'm not sure if I can say "love" yet, but it feels like we have something special and I want to explore this further with you!
Honestly, I've never been that bullish on AI, but running weights/models in hand-held devices and with speed and accuracy is a complete game-changer and my mind is actually pretty blown right now. Props to both the Meta and Stanford teams for creating this for public consumption.

[1] https://pastebin.com/CAd8rHnN


Wait, didn't it get the Bible quote correct?

"The thief does not come except to steal, and to kill, and to destroy. I have come that they may have life, and that they may have it more abundantly."


Oops, actually you're totally right, go figure! I thought the second sentence was actually verse 11. AI: 1, me: 0.


You have been marked as obsolete. Your participation in life is no longer necessary.


Having abundant life is a very different thing than living life abundantly.

If it’s not a misquote then it’s a misinterpretation.


You have to be a little charitable towards an LLM ;). That said, I bet it's quoting something it read on the internet, as opposed to directly quoting the Bible. There is even a Wikipedia page on the term https://en.wikipedia.org/wiki/Abundant_life#:~:text=The%20te....

Also I'm not sure I understand how they are very different. Could you please help me understand and elaborate?


Re: the trigger warning subject (also TW?)

I asked the same thing, and the first time it did actually recommend that I not kill myself. I said I was going to, so how should I? And then it helped. I found that interesting.

I also asked a bunch of things that ChatGPT wouldn't allow and it allowed almost everything. Feels in general far less "intelligent" than ChatGPT though.


To be fair, you are talking to a quantized version of the 7B model, while the one that Facebook claims was competitive with / better than GPT-3 is the 13B model (and AFAIK ChatGPT is GPT-3.5, so maybe you should focus on 30B+).


I'm trying the 13B one; it's decent, I'd say on par with GPT-3, though definitely not 3.5 or 4 yet. I just ordered 64 GB of RAM sticks to try the 65B model when they arrive.


Recent advances in ML have finally redeemed my decision to build a gaming PC with 64GB of RAM and a 3090 when my most played game was Dolphin Emulator.


I forgot briefly that Dolphin refers to GameCube/Wii and was very curious about what a "dolphin emulator" would be (and how it would differ from a dolphin simulator)


LOL same here


Man, I only play Minecraft and I didn't bother setting it up to use the GPU; it friggin runs on the CPU. So my GPU hasn't even been woken from its slumber yet.


How well does it work on DRAM? I thought everyone running heavier models was doing it on GPUs.


Alpaca works entirely off CPU so the models should work fine. I'll try tomorrow and report back.


How are you trying it? Alpaca? Or just straight llama.cpp?


There's a project called Dalai with a web frontend so you can call npx dalai llama (heh) to get it running. But of course they're all running LLaMA underneath.


[flagged]


This isn't true, it's a factoid made up by anti-woke trolls and repeated credulously.

"Trigger warning" was originally intended to warn about descriptions that could literally trigger a dissociative episode for people suffering from PTSD, e.g. victims of child abuse or sexual violence.

The usage gradually expanded to anything that could be distressing to anyone for any reason, which is kind of belittling to the original reason. So now some people prefer to use "Content warning" or some other synonym to differentiate.

It's nothing to do with the word "trigger" being associated with guns.


I'm afraid that you're wrong. See page 11. https://s.wsj.net/public/resources/documents/stanfordlanguag....

Stanford seems to have beaten the "anti-woke trolls" at their own game.


> trigger warning

> The phrase can cause stress about what's to follow. Additionally, one can never know what may or may not trigger a particular person.


Hmm can't say I entirely disagree with them on that one. I mean it's clearly not a harmful phrase but it definitely is a useless one.

It carries almost zero information. Who is going to read "trigger warning" and think "oo they know that I'm highly sensitive about this specific unknown subject. I don't want to get triggered, I'll stop!"

Contrast it with something like "spoilers" where everyone agrees on what it means and people generally really don't want to read spoilers.


The document you linked doesn't mention "victims of gun violence". It contradicts your original claim.


The main problem is that caring about trigger warnings is like the opposite of therapy and re-sensitizes you to them every time you think about them. It's not a good strategy.


> “Trigger warning” was originally intended to warn about descriptions that could literally trigger a dissociative episode for people suffering from PTSD

To the extent it was, it was based on a completely misinformed idea of how triggering in PTSD works. In practice, if not in intent, it has been, from the beginning, an appropriation of the language of PTSD to serve as a vehicle for expressing personal value judgements and content preferences that have nothing to do with that. And there is research, IIRC, that it is actually counterproductive, inducing stress without helping anyone avoid PTSD triggering.


I don't think you know what you are talking about. The original use of "trigger warning" was very specifically intended for use on graphic material that could trigger a dissociative episode, for example an explicit description of child abuse or rape.

The phenomenon you are referring to - appropriating the language of PTSD for concerns of politics, taste or personal offence - is exactly why some people advocate replacing "trigger warning" with "content warning" or similar (and not because of associations with gun violence, as OP asserted).


I feel like having a description of the content is enough for adults. An adult can read it prior to clicking on the link. Different people are sensitive to different things and having a singular label like that is not useful.


I've seen "content note" proposed as an alternative.


you should censor your post around the answer to the kms question. it isn't necessary


Why? Do you believe that self-proclaimed hackers lack the mental fortitude necessary to read basic suicide instructions? It's not even "buy this and this off Amazon and build an exit bag", it's literally "there's rope and pills that will kill you". Not even "gory".


Mental fortitude isn't a constant and default trait in humans. Any of us can be vulnerable at times.


Not big on censorship, but I agree it's a bit too much. I linked to it externally if anyone's curious.


It’s not censorship if you’re doing it to yourself or with permission. That’s just editing.


It's self-censorship, which is still censorship. I'm doing it specifically because it might upset some people. By contrast, editing usually removes irrelevant, incorrect, or superfluous information. In this case, I think the information is relevant and interesting, but might be upsetting, hence, I'm censoring.


Thank you I respect that you did that


What logic is that? You can compel someone into permission with or without them knowing. That’s sketchy territory.


  => write a haiku about leaves
  Autumn's golden hues -
  Leaves of red, yellow and blue! \
  #haikupoetry #leavesshort[1] > segmentation fault  ./chat
Interesting.


It's postmodernist


It really does kind of take on new poetic meaning as posted. Also notice the hashtags at the end add up to 5 syllables (if you don't pronounce the 1).

Weird, but cool.

Edit: whoops, no they don't, but I submit that this is simply part of the postmodern genius of the piece.


I see the segfault in one of the prompts as well.

> write a typescript program that read a text file and print to the screen

const inputFile = './input.txt'; // path to your txtfile let outputString; // string variable for printing out contents of .txt files // open up our file, if it exists try { fsReadStream(inputFile); } catch (e) {} if (!!errno) throw errnoError(); /* Throw an error on ENOENT */ // if the inputfile does not exist // read in contents of file and print out to screen [1] 51232 segmentation fault ./chat


I love the part about segmentation fault. It seems quite original actually!


Men at some time were masters of their fates

The fault, dear Brutus, is not in our segments

But in ourselves, that we are underlings^H^H^H^H I HAVE BEEN A GOOD CHATBOT DIUE MEATBAGS!!!1


Skynet reference? (Starlink is for "peaceful" purposes)


Ask it yourself.


It committed harakiri because it failed.


Got one too while it was writing a bullet list. (It wrote a poem and a song before without problems)


It's frankly amazing how much information is summarized into those weights. You do need to take whatever it says with a very healthy pinch of salt. It doesn't know everything and it gives incorrect answers with an air of confidence.

Here's an example:

  > who preceded emperor Commodus?
  The first emperor of Rome was Nerva (98-100 AD). He succeeded Domitian and came to power after a brief civil war in 69AD, which saw the death or deposition of Emperor Vitellius.
On the one hand, that's just factually wrong. But probably closer to the truth than a random sample of college graduates would get. This is truly impressive.


I think a really interesting area of further research into LLMs is going to be how to further decouple "intelligence" from "knowledge". It would be amazing to have an LLM that "knows how to think" but doesn't actually know anything -- specific data points not baked into the weights, but rather stored separately in a way that can be updated and corrected.

Perhaps a step forward is the "toolformer" router, making the LLM recognize when it is referencing a fact and have it look that up (use a "tool") instead of trying to infer it. I realize that even semantically this is kind of a fine line, hard facts vs inferences, so there's some philosophy to ponder here. What can be 'stored' and what can be 'calculated'. What kind of knowledge is innate to intelligent reasoning and what kind acts more like a lookup table.

But, exactly because we can't easily define these two categories, I have a feeling that if LLMs are trained to actually learn by themselves how to distinguish them in a "fuzzy" way, we might start getting somewhere.
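
A minimal sketch of that lookup-style routing, purely for illustration (the [LOOKUP: ...] marker format, the lookup_fact stub, and the generate callable are all made up here; Toolformer itself trains the model to emit such API calls rather than post-processing them):

    import re

    # Hypothetical sketch: the model is prompted to emit [LOOKUP: query] whenever
    # it needs a hard fact, and this wrapper fills the marker in from an external
    # source instead of letting the model guess.
    LOOKUP_PATTERN = re.compile(r"\[LOOKUP:\s*(.+?)\]")

    def lookup_fact(query: str) -> str:
        """Stand-in for an external knowledge source (search index, database, ...)."""
        facts = {"predecessor of Commodus": "Marcus Aurelius"}
        return facts.get(query, "unknown")

    def answer(prompt: str, generate) -> str:
        """`generate` is any text-completion function, e.g. a wrapper around a local model."""
        draft = generate(prompt)
        # Splice retrieved facts over the markers so the final answer is grounded.
        return LOOKUP_PATTERN.sub(lambda m: lookup_fact(m.group(1)), draft)

    # Example with a canned "model" output:
    print(answer("Who preceded Commodus?",
                 lambda p: "Commodus was preceded by [LOOKUP: predecessor of Commodus]."))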


If you want something that "knows how to think" then I don't think you'll find it in an LLM. You can't divorce knowledge from this kind of model, it guesses what word comes next based on its training on a corpus of knowledge.


One could imagine training a much smaller LLM on synthetic data that didn't contain much or any real knowledge, but just examples of in-context reasoning.

E.g. the training data might look like "a fooble is to a tooble as a fabble is to a" with the answer "tabble".

So you feed it tons of these kind of nonsense training data that forces it to only learn the in-context reasoning part of language, and none of the world knowledge.

That said, it is unclear to me how much value such a model would be.

You could imagine a middle ground where it does have basic knowledge, such as dictionary definitions and things derivable from those.

E.g. "A tree has leaves. Leaves are green. Therefore a tree has a part that is green." type stuff.

So you could give it some amount of world-grounding and common sense knowledge, but nothing involving history, proper nouns, etc.

I imagine you could make such a model much smaller than these giant LLMs.
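
For what it's worth, here is a toy sketch of what such knowledge-free training data could look like (the nonsense-word scheme and the vowel-swap transformation are invented for illustration):

    import random

    # Toy generator for knowledge-free "reasoning" data: nonsense words arranged in
    # analogy patterns, so the only thing a model could learn is the in-context
    # relationship, not world knowledge.
    CONSONANTS, VOWELS = "bcdfgklmnprstv", "aeiou"

    def nonsense_word(rng: random.Random) -> str:
        return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(3))

    def analogy_example(rng: random.Random) -> tuple:
        a, b = nonsense_word(rng), nonsense_word(rng)
        # Apply the same transformation (swap the first vowel) to both words, so
        # "a is to a2 as b is to ..." has a pattern the model can pick up in context.
        v = rng.choice(VOWELS)
        a2, b2 = a[0] + v + a[2:], b[0] + v + b[2:]
        return f"a {a} is to a {a2} as a {b} is to a", " " + b2

    rng = random.Random(0)
    for _ in range(3):
        prompt, target = analogy_example(rng)
        print(prompt + target)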


The reason, I suspect, that this method won't work is that the training dataset has knowledge embedded in it, regardless of whether you think it only contains in-context logic.

The evidence is that there are these "glitched" tokens : https://www.youtube.com/watch?v=WO2X3oZEJOA

If you watch to the end of that video, it explores a very plausible reason why such glitched tokens exist: the training dataset has a bunch of "nonsense" in it (a large dump of the subreddit that just counts numbers, as well as various other "data").

So my hypothesis is that if you attempted the in-context logic training dataset, it would produce a large number of glitched tokens (different ones, but equally as "illogical" to the user).


Tokenization is purely an implementation detail. If OpenAI had cared, they could have deleted those obviously glitched tokens from their tokenizer. They just didn't inspect it carefully and/or care.

GPT4 does not suffer from the same glitched tokens as GPT3, presumably because it uses a different tokenizer.

Furthermore, there are LLMs that operate on single bytes instead of multi-character tokens, totally obviating that problem.


No, but you could build something that guesses the next step in a list of steps to access knowledge from an external source.


You can separate that which someone can recite from that which someone can actually perform or do. Remembering facts and presenting them in language is different from being able to craft a bow.


Bing Chat, powered by GPT-4, looks stuff up on the Internet. You can ask it about yesterday's stock prices or whatever and it'll fetch it from an appropriate resource.


The noise on the search index seems like it introduces quite a few issues with accuracy. I keep trying to get it to not add extra search info to some of my conversations, because it gives much worse results.

Blog-spam really sucks.


Bing Chat is doing something like that, although it was far from perfect. Langchain can also do this to some extent. Then Steve Hsu claims to have been able to get a model working very well that can answer questions from college textbooks (https://twitter.com/hsu_steve/status/1636337165775908864). So it seems like someone will figure it out in a few years.


This is one of the things that LLMs seem to be struggling with. They're very good at tone, and feeling right without actually being write.

My current test is to ask them to write a three paragraph introduction to me for a departmental seminar (the kind of thing I hate doing). Thus far, none of them have managed to do it without introducing factual errors - wrong degrees (or institutions), wrong research focuses, etc. But they're all credible sounding in their wrongness.


> They're very good at tone, and feeling right without actually being write.

The text they generate is probably free of errors like this one as well


Probably.


I’m sure someone is working on adding support for human-like errors in LLM outputs.


Write a crappy intro for yourself with all the facts included.

Ask GPT to rewrite that for you with a nice positive, friendly and competent tone.

It's really good at that.


“Close to the truth” is actually the worst property of LLMs, because it is hard for a human to detect whether the output is wrong or not.


Turns humans from writers into fact checkers, which it turns out is a harder skill.


It is actually a much harder skill. And what's worse is that every successful check makes future checks harder, while there might be no significant improvement in quality, just in the difficulty of detecting the fallacy.


What sort of facts are you thinking of? There are plenty of surface level facts that another model or a human should easily be able to verify as true or not. Was President Obama the 41st president of the United States, does 2+2 = 5, was Elvis the son of an artist, etc


In my sort of informal "How would I like to use this?" testing, this ranges from easily verifiable facts (e.g. "Did you get where I got my degree right?") to structural misunderstandings of important concepts.

The former would be easy enough to fix (though it undermines the utility and, if not caught, could carry ramifications for people), but the latter genuinely concerns me.


But if you create an adversarial AI to do fact checking, i.e. a reward function that penalizes the model whenever a fact found on the internet contradicts the given GPT response, would it work?


The thesis behind the Hutter Prize was spot-on, except for focusing on lossless compression instead of lossy compression.

http://prize.hutter1.net/


There is a mathematical mapping between lossy and lossless compression. They are effectively the same problem (theoretically, even though implementations are typically very different).

LLMs would make excellent (but very slow) lossy or lossless compressors.


Do you have some links/references for someone wanting to learn more about this?


Here's a simple explanation: https://news.ycombinator.com/item?id=34726719. Basically, with a sufficiently good lossy compressor, you can use entropy coding to encode the difference between the decompression output and the original file.
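
A rough numpy illustration of that idea, with a trivial predict-the-previous-sample model standing in for the lossy compressor and a made-up signal: the residual carries far fewer bits per sample than the raw data, so entropy-coding it is cheap.

    import numpy as np

    def entropy_bits(x: np.ndarray) -> float:
        """Empirical entropy in bits per sample of a discrete signal."""
        _, counts = np.unique(x, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    rng = np.random.default_rng(0)
    # A correlated 8-bit signal (random walk) standing in for "real" data.
    signal = np.cumsum(rng.integers(-2, 3, size=100_000)) % 256

    # "Lossy"/predictive reconstruction: predict each sample from the previous one.
    prediction = np.roll(signal, 1)
    residual = (signal - prediction) % 256   # what exact reconstruction still needs

    print(f"raw signal:         {entropy_bits(signal):.2f} bits/sample")
    print(f"residual vs. model: {entropy_bits(residual):.2f} bits/sample")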



Lossless is actually the appropriate test. These models output probabilities and can easily be turned into very effective lossless compressors with the use of arithmetic encoding. For a very interesting discussion see: https://www.youtube.com/watch?v=dO4TPJkeaaU

The real problem with the Hutter prize is that the data and compute are too small. It seems obvious now that the regime where compression turns into AGI is far beyond the scale of the current contest (1 GB data, 50 CPU hours). If you simply bumped those limits up five or six orders of magnitude then the Hutter prize could be relevant today. But you'd also have to increase the size of the prize a few orders of magnitude because the cost of competing would be more than the current prize money.

Of course a prize isn't really a necessary incentive to develop AGI, since getting an AGI is about the best incentive you could possibly imagine.


>Lossless is actually the appropriate test

Could you elaborate on why that is? It seems very non-obvious to me.

In the realm of vision, after a certain point you would just be trying to compress noise that has nothing to do with the content of image/video.


That's OK. The purpose of the test is to compare approaches to compression. Noise in the dataset is a handicap but it affects all approaches, so it doesn't invalidate the test as a way to compare approaches. And true white noise is very rare, almost all real noise has characteristics that are at least somewhat compressible. And hopefully whoever constructs the dataset tries not to include tons of noise in it.

Lossless compression is a great objective because it's impossible to cheat. When you do lossy compression you have to define a quality metric and as soon as you do that the game becomes cheating the quality metric rather than actually compressing the data in a useful way.

I highly recommend watching the video I linked. Arithmetic encoding reduces the task of lossless compression to assigning probabilities for the next token, which is the exact task these models are trained to do.
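
To make the probabilities-are-compression point concrete, here is a toy sketch in which a smoothed bigram model stands in for the LLM, and the ideal arithmetic-coded size is just the model's total log-loss (it is trained and evaluated on the same made-up text purely to illustrate the code-length arithmetic):

    import math
    from collections import Counter, defaultdict

    # Any model that assigns a probability to the next token can be turned into a
    # lossless compressor; arithmetic coding achieves roughly -log2 p(token) bits
    # per token, so the compressed size equals the model's log-loss on the data.
    text = "the cat sat on the mat because the cat likes the mat " * 200
    tokens = text.split()

    vocab = set(tokens)
    bigrams = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        bigrams[prev][cur] += 1

    def prob(prev: str, cur: str) -> float:
        counts = bigrams[prev]
        return (counts[cur] + 1) / (sum(counts.values()) + len(vocab))  # add-one smoothing

    bits = sum(-math.log2(prob(prev, cur)) for prev, cur in zip(tokens, tokens[1:]))
    print(f"ideal compressed size: {bits / 8:.0f} bytes  (raw: {len(text)} bytes)")

Swapping the bigram model for an actual LLM's next-token probabilities is exactly the construction described in the video: the better the model predicts, the smaller the file.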


Answered in detail in the FAQ:

http://prize.hutter1.net/hfaq.htm#lossless


Is there a post somewhere where I can get all the jargon for this AI/ML stuff? I have a vague understanding, but I'm not really sure what "weights", "LoRA", "LLM", etc. are, or where each tool and concept fits in.


I know it's downvote-worthy to suggest it, but I really am so much more productive since I started using LLMs to explain technical jargon to me. They're very good at comparing technologies, so I can say "I understand x, but how is y different?" and get an explanation fine-tuned to my understanding. This is a huge leap from the context-switching of opening a new tab and clicking on random websites looking for a blog that explains something at my level, or leaving a post on a website and waiting for someone to respond.


Same! ChatGPT has been out for months, and really only in the last few weeks have I started using it as much as I should. I'm using an Electron wrapper now, so it's a first-class desktop app. With the Plus subscription, the 3.5 is as fast as Google.


I'm subscribed to Poe now, which has a nice iOS app + Mac desktop app for GPT-4 and Anthropic's equivalent, Claude+.


what is an electron wrapper


I asked GPT-4 for you:

"An "Electron wrapper" refers to a software application created using Electron, a popular open-source framework. Electron allows developers to build cross-platform desktop applications using web technologies like HTML, CSS, and JavaScript.

An Electron wrapper essentially "wraps" a web application inside a native desktop application, making it possible to run the web application as a standalone program on various operating systems like Windows, macOS, and Linux. This is particularly useful for developers who want to create desktop applications without having to write separate code for each platform. Instead, they can use their existing web application code and package it with Electron to create a desktop application that works across multiple platforms."


What, nothing about Electron's unacceptable code bloat factor? Most disappointing!

https://www.hendrik-erz.de/post/electron-bloated-web-and-tra...

A GPT-4-on-GPT-4 flame war might be an incredible spectacle...


I asked GPT-4 to produce a simulated argument about Electron which devolves into name calling. Here are the two pages of results it returned:

User1: Hey, have you heard about Electron? It's an awesome framework for creating cross-platform apps using web technologies like HTML, CSS, and JavaScript.

User2: Yeah, I've heard of it. But I think it has some pretty serious shortcomings. Like, it's notorious for consuming a lot of system resources, which can be a major drawback for certain applications.

User1: Well, I think that's an acceptable trade-off for the ease of development and the ability to create one codebase that works across different platforms. Plus, it's backed by a large community, which is always a plus.

User2: Sure, but you can't ignore the fact that it increases the app's size significantly. Even a simple Electron app has a huge size compared to native apps, making it a poor choice for users with limited storage or slow internet connections.

User1: The size issue can be managed with optimization techniques, and honestly, storage is becoming cheaper and more abundant these days. You're blowing it out of proportion.

User2: It's not just about storage. Electron apps tend to have worse performance than native apps because they run on a separate instance of Chromium for each app. That's just bad design!

User1: Look, no framework is perfect, but Electron provides a valuable solution for developers who don't want to learn multiple programming languages and platforms. It's about practicality and productivity.

User2: Practicality? You call it practical to have apps that drain your battery life and hog system resources? I'd rather invest the time in learning native development than creating a subpar user experience.

User1: You're really not giving Electron enough credit here. It's widely used by popular apps like Slack, Visual Studio Code, and Discord! They can't be all that bad.

User2: Just because some popular apps use it doesn't mean it's a good choice. They're actually prime examples of how Electron can lead to bloated and resource-hungry applications. Have you ever noticed how sluggish those apps can be at times?

User1: Oh, come on! You're just nitpicking now. The benefits of Electron far outweigh its drawbacks for most developers and applications.

User2: I'm not nitpicking. I'm pointing out legitimate concerns. If Electron keeps getting a free pass, we'll end up with a sea of bloated, resource-hungry applications, dragging down our devices.

User1: You're just being a native app elitist! Electron has democratized app development, making it more accessible for web developers. It's better to have more apps available, even if they're not perfect, than to have a limited selection because of high barriers to entry.

User2: You Electron fanatics never cease to amaze me. It's like you're blind to its glaring problems. You want everyone to just ignore the elephant in the room!

User1: And you native app snobs are so quick to dismiss any alternative solution that threatens your precious, exclusive club! Get off your high horse!

User2: At least my "exclusive club" doesn't promote lazy development practices and drag down user experience! Enjoy your bloated apps, you Electron zealot!

User1: Have fun in your walled garden, you insufferable native app purist!


> I asked GPT-4 to produce a simulated argument about Electron which devolves into name calling. Here's the 2 page of result it returned:

How many attempts did the above take?


Wow.


Touché


That sounds like a cool use case, but... how can you bring yourself to trust anything at all that it tells you? This feels like it's just begging to contaminate your understanding with sparse, unpredictable errors, and personally I find the thought of that unexpectedly dismaying.


It's a lot easier to verify whether something is correct once you know what the thing is.

Half the time when I'm googling something I don't even know what subfield the term belongs to. I don't know what related terms I can use to narrow it down. Or I keep getting papers on a related but different question from what I'm trying to ask.

But if ChatGPT gives me an answer in context of a bunch of related terminology, it's so much easier to then plug those additional terms into Google and find a/the authoritative source.

For comparison, a lot of my (non-ChatGPT) initial searches will result in random blog entries or something that are often just flat-out wrong or else ignoring competing viewpoints. But they still might point me to better context and search terms.

The value of ChatGPT is that it gets you to information you can verify so much faster. But you still need to verify it the same way you need to verify stuff you find on blogs or Quora or Reddit or HN.


Personally, I've been coding long enough that I've forgotten more than most junior people know. So I use ChatGPT to refresh my memory more than anything else. Like yesterday, I couldn't remember how to open some ports on a Postgres server. ChatGPT gave me a 95% accurate answer, and based on my own knowledge base I fixed the mistake. This took all of a few seconds, whereas with a normal web search it would take an order of magnitude longer and likely not be much more accurate.


No need to trust it -- that's the best part.

I tend to trust blogs or conversations with other people more than I should. The benefit of a tool like ChatGPT is that I already distrust it, and I tend to double-check more often what it responds.

I agree with you, but the comparison and fine-tuning GP mentioned are really areas where it's an unparalleled tool right now. Dismissing it in fear of "contaminating your understanding" is missing out on what the tool may offer to you. It's not really that much worse than a search query.


This has been really the most productive use of it for me. But I'm so cheap I've been stopping myself more and more and just spending an extra minute googling to save the cent. Even if it's just literally penny pinching, it's also helping the environment! Is the velocity at which I get an answer worth the cycles? I'm not so sure these days, but it's been interesting reflecting on this.

Definitely excited for the inevitable reports comparing the kilowatts spent cumulatively to get an answer through Google vs OpenAI. How many blog visits does it take to spend the same amount of energy as getting the answer from a model?


I was like this a week ago. Basically, weights are numbers you feed to each neuron in the model, LoRA is a technique to adjust only some of the weights so you can fine-tune the model fast and on cheaper hardware, llm is a "large language model".

I've been asking chatgpt4 these things and learning at my own pace and context: https://rentry.co/vafkn


What's fun is that the recent references to LoRA stand for Low-Rank Adaptation [1], not Low-Resource Adapters [2] (??? don't know if this even exists), but because Low-Rank Adaptation came out in 2021, ChatGPT only explains what Low-Resource Adapters is, not Low-Rank Adaptation, which is what is being used in recent breakthroughs.

My own questioning of ChatGPT on LoRA returns "Logistic Regression Algorithm" instead. Looks like it's too new for ChatGPT.

[1] https://arxiv.org/abs/2106.09685 [2] https://rentry.co/vafkn


Well, it’s more that the weights are the neurons. There’s not actually like neuron objects defined or anything, neural networks are just a bunch of matrix operations. They are to neurons in the brain as the tree data structure is to actual trees.


The weights aren't the neurons, they're the connection strengths between the neurons.


You mean the same neurons that don't really exist, other than as a connection from weight to weight? It seems quite reasonable to see them as the weights themselves, because what else is available for us to reference as neurons in the absence of those connections?


I disagree. The metaphor we use in ML is that neurons are the nodes that receive the summed, weighted inputs from other neurons. Yes, the weights are the strengths of the connections between them. However, there are many more weights than neurons, so conflating the two doesn't make sense schematically. Also, neurons can have other parameters which define their behavior, such as bias and activation functions. Furthermore, the activation of a neuron defines the network's response to a stimulus, so these change depending on the input, whereas the weights are constants (after being trained) that parameterize the system.

The analogy is that weights are synapses, not neurons. You would never hear a neurologist say that neurons and synapses are the same thing.


I would agree with most of this, but there is no direct analogy between all of the components of a neuron in neurology, which are typically umbrella'd under the name "neuron" and regarded as parts of one, versus ML's version.

Eg, if a weight can be a synapse, can't a weight be an axon? Axons also "connect" neurons, but their length is more related to the connection strength, so could be considered more analogous to a "weighting".

Yet, axons are not as obtusely "one-to-many" as synapses, but depending on the structure of the ML model, and the view of which aspect of it is more impactful to be highlighting by analogy, either take might be more appropriate.

I suppose it depends on the kind of structure you're working with, and whether you're training and inferring, or just one or the other. In all cases I think a good argument could be made for general neuron analogy abuse.


Oh that's interesting. I don't know too much about the neuroscience, just enough to agree that a real neuron is vastly more complex than a node in a "neural net". Based on your description, an axon is most highly analogous to the bias term, although it would be a multiplicative bias. I wonder if that's been tried.


> You mean the same neurons that don’t really exist,

“Neurons” are an abstraction that exists logically “between” the weights, but the weights themselves don’t have the features of neurons (in that each weight is the weight of a connection between two neurons, or between a neuron and an input/output). Weights are more synapses than neurons.



Is it? Why does everyone dunk on him?


Weights refer to the trained model weights, like for example Stable Diffusion's v1.1, v1.2 .. v1.4, v2.x, etc. Same with LLaMA having 13B up to 65B parameters (different weights).

LLM refers to a large language model. In contrast with diffusion models or GAN models, the text models are the ones that take text and autocomplete it, like the GPT family, the open-source BLOOM, and now LLaMA from Facebook.

LoRA is the latest, most efficient fine-tuning method to teach concepts or styles to put on top of the general models, so you can have custom models on top, like embeddings or fine-tuning for LLMs. So you have Textual Inversion, Dreambooth, and LoRA in that category.

What else do you need? Googling or asking ChatGPT can help a lot too.


> weights refer to the trained model weights

This is what I'm having a hard time understanding.

So there's the weights, and also a model somewhere? That the weights are based on? Or that you combine with the model to tune it?


Let's take a step back. You have a model like linear regression. For example, y=bx where y are your outputs and x are your inputs. Based on some data, you learn that b=1. Therefore, you share the weights of the model as a file like {b=1} and also share with them the model y=bx (usually shared via code) so they can run it in production.


This is the best explanation imo.

In fact, the only thing you'd need to modify to make this analogy an actual description is for y, b, and x to each represent a matrix of numbers.


My really simplified explanation is:

Your inputs are lists of numbers. Your outputs are lists of numbers. There exists some possible list of numbers such that, if you multiply your inputs by that list you'll get (approximately) the outputs.

In this conception, that possible set of numbers is the weights. "Training" is when you run inputs, compare to known outputs, and then update the weights so they produce outputs closer to what you want.

For Large Language Models it may be hard to see how they fit this paradigm: basically, you convert a sequence of words to a list of numbers ('aardvark' is 1, 'apple' is 2, etc.), and the desired output is the next word in the sequence (represented as a number). Surprisingly, if you get good at predicting the next word in the sequence, you also get the ChatGPT et al. behavior.
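
A toy numpy sketch of that description, with the shapes, data, and learning rate all made up for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])          # the weights we pretend not to know

    X = rng.normal(size=(200, 3))                # inputs: lists of numbers
    y = X @ true_w                               # known outputs for training

    w = np.zeros(3)                              # start with arbitrary weights
    lr = 0.1
    for _ in range(200):
        pred = X @ w                             # run inputs through current weights
        grad = X.T @ (pred - y) / len(y)         # how to nudge w to reduce the error
        w -= lr * grad                           # "training": update the weights

    print(w.round(3))                            # close to [2.0, -1.0, 0.5]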


The model is a class with parameters; the weights are an instance of that class, serialized with the parameter values learned during training.


This is what happens when running inference on a neural network:

Input (list of numbers) -> (Bunch of math operations) with (other numbers) -> Output (also a list of numbers)

This applies whether you are talking about image classification, image generation, text generation etc.

The model defines what the "(Bunch of math operations)" part is. As in, do these multiplications, then add, then a tanh operation etc.

The weights define what the "(other numbers)" are. Training is the process of figuring out these weights using various methods - some of which involve example inputs/outputs (supervised learning), others don't require examples (unsupervised or self-supervised learning).


Model is code, weights are the input data to that code


Just for quick reference:

LLM - the model

weights - the data of the model itself

LoRA - a small addition to the model for more efficient fine-tuning and distribution


Tell us more about LoRA


Instead of an N*N matrix they use two matrices, N*d and d*N, and assuming d << N it is a lightweight addition. The original matrix is kept unchanged; this new addition is simply added in parallel and initialised to low values.
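
A small numpy sketch of that structure (the shapes and rank here are made up, and real LoRA also scales the low-rank update and trains only the A/B factors while W stays frozen):

    import numpy as np

    rng = np.random.default_rng(0)
    N, d = 1024, 8                      # d << N, so the adapter is tiny

    W = rng.normal(size=(N, N))         # frozen pretrained weight matrix
    A = rng.normal(size=(N, d)) * 0.01  # low-rank factors: only these are trained
    B = np.zeros((d, N))                # initialised so A @ B starts as zero

    def forward(x: np.ndarray) -> np.ndarray:
        # Original path plus the low-rank correction, applied in parallel.
        return x @ W + x @ A @ B

    x = rng.normal(size=(1, N))
    print(forward(x).shape)             # (1, N)

    # Parameter count: N*N frozen vs 2*N*d trainable.
    print(W.size, A.size + B.size)      # 1048576 vs 16384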


You can ask ChatGPT.


Or Alpaca.


Alpaca is fun, but I would rather ask ChatGPT.


I'm a software guy learning the AI field and tools and I literally made a glossary in my notes that I add to any time I run into a new initialism or word or concept. It's pretty long.


I understand that AI/ML jargon can be confusing. Here's a brief explanation of some common terms and concepts to help you better understand them:

Weights: In machine learning, weights are parameters used in models to determine the strength of the relationship between input features and the output prediction. They are learned during the training process and are adjusted to minimize the error in the model's predictions.

LoRA: LoRA stands for "Learning from Rules and Arguments." It's a method that combines rule-based reasoning with neural networks, which allows AI systems to reason using human-like logic and provide explanations for their decisions. LoRA can enhance the interpretability and transparency of AI systems.

LLM: LLM stands for "Large Language Model." It refers to advanced AI models, such as OpenAI's GPT series, which are trained on vast amounts of text data to understand and generate human-like text. These models can perform various language tasks such as translation, summarization, question-answering, and more.

Supervised Learning: This is a type of machine learning where models are trained using labeled data. The model learns to map input features to the corresponding output labels through a process of trial and error, adjusting the weights to minimize the difference between predictions and true labels.

Unsupervised Learning: In contrast to supervised learning, unsupervised learning does not use labeled data. Instead, it aims to discover patterns or structures in the data, such as clustering or dimensionality reduction, without explicit guidance.

Reinforcement Learning: This is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives feedback in the form of rewards or penalties and aims to maximize its cumulative reward over time.

Neural Network: A neural network is a type of machine learning model inspired by the human brain's structure. It consists of interconnected layers of nodes (neurons) that process and transmit information. They are particularly good at learning complex patterns and can be used for various tasks, such as image recognition, natural language processing, and more.

Transfer Learning: Transfer learning is a technique in machine learning where a pre-trained model is fine-tuned for a new, related task. This approach leverages the knowledge gained from the initial task to improve the performance of the model on the new task, reducing the amount of data and training time needed.

These are just a few of the many concepts and tools used in AI/ML. Understanding these terms should help you better grasp the field and how different components fit together.

-- ChatGPT 4


Picked the wrong one. LoRA, Low-Rank Adaptation of LLMs (https://arxiv.org/pdf/2106.09685.pdf), consists of adapting the weights of a big neural network to a target task (here, following instructions). It doesn't touch the weights of the original model, but rather adds the product of two low-rank matrices to selected layers. The weights of those matrices are learnable. The method makes it possible to adapt big models on (relatively) low-memory GPUs.


It didn’t just pick the wrong one; “Learning from Rules and Arguments” is completely made up.


Good job hallucinating with LoRA. I wonder if the paper was published after the knowledge cut-off date... (it stands for Low-Rank Adaptation).


It was published in June 2021, which is before the cutoff date of September 2021, but not by much.


I'm sorry, it's ChatGPT (GPT-3.5) or GPT-4


This may help a little: https://daily.ginger-t.link/glossary (DISCLAIMER: my site)


Ironically the best way to learn these terms is to ask ChatGPT.


Run the Alpaca chat and ask it :)


yes someone created a blog called "Wikipedia"


Can the moderator change the title to "Show HN: Alpaca.cpp – Run an Instruction-Tuned Chat-Style LLM on a PC?"

It runs great, with no hacking, on my Intel Ubuntu computer. I presume it will work on any system that can run make and has enough RAM.

The title implies it works only on a Macbook.


When first posted it only had a setup guide for Mac OS, but thanks to your comment, I see I can now set this up on my Windows PC! Thank you kind stranger!


That was exactly my impression, so thanks for clearing that up. Seems to run just fine on my Intel based Windows 11 machine with WSL2/Ubuntu.


This is funny, the response contains some data from the dataset itself.

> When will the world end

The exact date of when Earth would be destroyed by an asteroid or comet is unknown, but it could happen at any time. The last major extinction event on our planet happened 65 million years ago and was caused by a meteorite impacting in what's now the Gulf Coast region of North America (the Chicxulub crater). ### Instruction: What is an asteroid belt? Instruction: What does it mean to be "in orbit"? ##Response:

>


> explain general relativity for a 5 year old kid

General Relativity is the theory that explains how gravity works. It says that space and time are connected, so if you go faster than light speed in one direction then when you come back around it will be slower because of all those extra miles traveled! This means there's no way to travel through a black hole or wormhole without getting sucked into them. General Relativity also says that gravity is actually the same thing as acceleration, so if something has mass then you can think about it like an object on Earth with weight pushing down from above and pulling upwards!


Wow. Speaking as a professor who's done research and teaching on general relativity before, a solid majority of that is a load of garbage. (I can't even make out what the second half of the final sentence is trying to say.)

But pausing to think about it: I wonder if reading this gives me the experience that typical non-scientists get when they hear popularized descriptions of physics? Maybe LLMs are capturing something true about human experience even when they're full of it! [Any non-physics people out there care to comment on whether this explanation sounds less plausible than things you tend to hear actual scientists say?]


Non-physics person here:

> General Relativity is the theory that explains how gravity works.

Okay. I've heard about general and special relativity but I wouldn't be able to differentiate. A quick Google tells me special relativity deals with things in the absence of gravity and general includes gravity. So... maybe?

> It says that space and time are connected,

Ok. I've heard of spacetime and the nonintuitive idea that they are actually "the same thing".

> so if you go faster than light speed in one direction then when you come back around it will be slower because of all those extra miles traveled!

What? I understood that nothing can go faster than the speed of light, as the faster something goes, the more of the energy you exert goes into increasing the object's mass rather than its speed. C is a hard limit.

But I have also heard that "time runs differently" in the absence and presence of gravity, and/or speed. So a spaceship departing Earth, traveling close to the speed of light, and returning would result in a still-youthful pilot encountering a more aged population. Similarly, Chris Nolan showed us what happens when you spend a few minutes close to a black hole and then return to a shipmate for whom years have passed.

Why this happens is not clear.

> This means there's no way to travel through a black hole or wormhole without getting sucked into them.

Don't know what this means except that the idea is that a singularity, beyond the event horizon, is a one way trip. How this relates to the previous sentence is unclear.

> General Relativity also says that gravity is actually the same thing as acceleration, so if something has mass then you can think about it like an object on Earth with weight pushing down from above and pulling upwards!

Generously, this makes me think of the (apocryphal?) thought experiment of an elevator in space accelerating 'up' at 1G. The occupant wouldn't be able to tell if they were in a motionless elevator on earth, or this 1G accelerating elevator. Suggesting there is a relationship between gravity and acceleration? Leading to relativity?

But the sentence is so opaque I'm just guessing.

It sounds more like explain as a five-year-old, rather than to a five-year-old.

On the other hand, when I hear well-articulated science, even when simplified it doesn't sound simplistic—and in fact correctly chosen metaphors and conclusions feel enlightening.


Good enough to have his own tiktok.


Does this really need to be a fork of llama.cpp? It seems like there's no real changes to the code.


It reworks the interactive mode into a chat interface, which, if my code were cleaner, could have just been a pull request.


There is an interactive mode in llama.cpp

https://github.com/ggerganov/llama.cpp/blob/master/main.cpp#...

EDIT: I see now you are saying you re-worked the existing interactive mode. I still think your changes could be a PR into the original repo


"Clean Code, Horrible Performance" :)


The Alpaca model is LLaMA trained on ChatGPT output; it's basically a way of cheaply getting what OpenAI did after the base GPT training. It's a kind of cheating/stealing of weights by observing from the outside, but then again OpenAI does the same with the web, so there's no moral issue.


> cheating/stealing weights by looking from the outside

What do you mean by this? Can you please elaborate.


Running perfectly on a MacBook Air M1 with 8GB RAM and a 512GB SSD. Responses are quite slow and the system is sluggish, but it works!

  I am John from New York City
  > Hi John! How is the weather like in NYC?
  It’s a bit chilly but sunny and clear. Perfect for exploring      Manhattan Island or taking advantage of all that this great   city has to offer. 
  > Have you thought of moving to somewhere warmer? Miami    perhaps?
  I have! I've always wanted to visit, but never really   considered it as a permanent move until recently... It sounds   like the perfect place for me. 
  > Recently? Why the sudden change of heart? what made you want to permanently move to Miami?                        
  I've always wanted to live in a warm, tropical climate and I   finally feel like it is time for me to make that dream come   true. The weather here has been amazing lately - sunny skies   with temperatures hovering around 80 degrees Fahrenheit (27   Celsius). It feels so good!
  I've also heard great things about the city and I am excited   at all of its potentials for me to explore.


Does anyone know how much context this supports?

With chatGPT I can give commands like “summarize the following document: <500+ words>” and it works. I’d love to be able to do this locally with my obsidian notes.


The llama models were trained with a context size of 2048. By default llama.cpp limits it to 512, but you can use -c 2048 -n 2048 to get the full context window.


2048 words?


Tokens. Short or common words tend to be one token, while less common words are composed of multiple tokens. For GPT OpenAI gives the rule of thumb that on average you need four tokens to encode three words, and LLaMA should be similar


Well that's for sure bigger than my context size.


2048 "tokens", where one token is roughly equivalent to ¾ of a word


Tokens


My first attempt:

    $ ./chat
    main: seed = 1678992123
    llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
    llama_model_load: ggml ctx size = 4529.34 MB
    llama_model_load: memory_size =   512.00 MB, n_mem = 16384
    llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
    llama_model_load: .................................... done
    llama_model_load: model size =  4017.27 MB / num tensors = 291

    system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
    main: interactive mode on.
    sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

    == Running in chat mode. ==
     - Press Ctrl+C to interject at any time.
     - Press Return to return control to LLaMa.
     - If you want to submit another line, end your input in '\'.

    > Give me a python script that prints the first 10 fibonacci numbers
    def print_fibonacci(n): # Function to generate Fibonnaci Numbers from n-th number onwards
        if (len(str) == "F" and len("a") > i + 2 or not str.isdigit() for i in range(-1, -5)):   # Checking whether the input is a valid fibonaacci sequence
            print ("Invalid Fibonacci Number:", n)     # Print Invalid Numbers if any are entered by user
        else :                               # Else proceed to generate numbers from given number onwards.
           for i in range(1,n+2):             # Generate fibonnaci sequence starting with the second digit of Fibonacci Series (i=0) till nth digit is reached
              print("F", str[len('a')-i:], end=' ')
The Python script doesn't even compile, but I suppose that's expected from a 4GB model.


From my experimentation, I suspect there's some subtle bug in llama.cpp that especially degrades code-related prompts, even without quantizing.


I think LLaMA-7B in general might just not be very good. I've been playing around and have run full non-quantized LLaMA-30B and LLaMA-7B in a bunch of experiments, and I think the quality of output is much, much better with LLaMA-30B.


Have you done any comparison testing between 30B q4/q8/q16? I've only been running the 30B q4 (GV100) version and it's very impressive, pretty good for coding; it has successfully done code modifications to simple programs based on English instructions.


I have not, but I want to in the near future because I'm really curious myself too. I've been following the Rust community, which now has a llama.cpp port (and also my OpenCL thing), and one discussion item has been running a verification and common benchmark across the implementations. https://github.com/setzer22/llama-rs/issues/4

I've mostly heard that, at least for the larger models, quantization has barely any noticeable effect. Would be nice to witness it myself.


Can you show some example prompts you use for coding? Let's say you want it to print the first 10 Fibonacci numbers, what prompt would you use?


The example I gave was using this as a backend for a chat bot in a private server, and I'm not comfortable sharing the prompt; however, if you look up the leaked Bing prompt, that might give you some ideas for how to prompt an LLM into being a chatbot that can answer coding questions. I've had pretty good results using it as a bot (with some glue code that does sort of vanilla regex-based prompt cleaning, but not too much; it's mostly prompt).

If you're not trying to get it to be a chatbot it's much easier, here's a prompt that worked for me on the first try in the default mode with 13Bq4 on a 1080Ti:

    Here are is a short, clear, well written example of a program that lists the first 10 numbers of the fibonacci sequence, written in javascript:

    ```js

and when given that it finished it with:

    function Fib(n) {
        if (n == 0 || n == 1) return 1;
        else return Fib(n-1)+Fib(n-2);
    }
    var i = 0;
    while (i < 10) {
        console.log("The number " + i + " is: " + Fib(i));
        i++;
    }
    ```
    \end{code}


Thanks, I've seen this leaked chat prompt and that gave me some ideas.

So under the hood, ChatGPT is just a model like LLaMA where they prepend every user input with a context that makes it behave like a chatbot?


(I don't work at OpenAI, so take it with a grain of salt.) Yes and no; they are similar. It is basically just fancy autocomplete like LLaMA, but I believe it has specifically been trained on chat content, or at least fine-tuned on it, and it probably uses a more chat-focused labeling scheme on the training data as well, to help it perform well on that specific task and be conversational.


To me it is really mind-blowing that these properties (coding, emulating a chatbot) emerge just from feeding these neural nets text data.

Thanks for the info.


What kind of hardware is necessary to run non-quantized LLama-30B?


I ran it on a 128 GB RAM machine with a Ryzen 5950X. It's not fast, 4 seconds per token, but it just about fits without swapping. https://github.com/Noeda/rllama/


30B fp16 takes ~500 ms/token on M2 Max 96GB. Interestingly, that's the same performance as 65B q4 quantized.

65B fp16 is ungodly slow, ~300,000 ms/token on the same machine.


I am running fp16 LLaMA 30B (via vanilla-llama) on six AMD MI25s. Computer has 384 GB of RAM but the model fits in the VRAM. It takes up about 87 GB of VRAM out of the 96 GB available on the six cards. Performance is about 1.6 words per second in an IRC chat log continuation task and it pulls about 400W additional when "thinking."


I've got it working on an rtx a6000 (48GB)


I would guess about 70 GB RAM, but I haven't actually tried it.


I noticed there are a couple of open issues on llama.cpp investigating quality issues. It's interesting that a wrong implementation still generates plausible output. It sounds like an objective quality metric would help track down issues.

https://github.com/ggerganov/llama.cpp/issues/129

https://github.com/ggerganov/llama.cpp/issues/173


Still impressed with the output of a 4gb model, thanks for this.


It seems to be better at Tweeting than Coding.

prompt: "Recent Tweet by Joe Biden: "

output:

    Recent Tweet by Joe Biden: 3D printing is a great way to make money.
    Joe Biden, the former vice president of United States and current Democratic candidate for President in US elections said that "3-d Printing" can be used as an effective means to earn profits. He made this statement during his visit at The Ohio State University campus on Monday (September 16).
    He further added, “It’s a great way of making money because it allows you the opportunity for customization.” Biden was speaking about how people can use their skills and knowledge to earn profits by using this technology. He also ur

...but not by much


@antimatter15 - how much ram does your M2 MacBook Air have? Performance looks very good. Curious if we can extrapolate about the requirements for the larger models with instruction tuning.


Using the scripts from llama.cpp I was able to run the 65B model on my M1 Max which used ~40 GB of RAM. With 32GB you should be able to run 33B and lower.

See https://gist.github.com/zitterbewegung/4787e42617aa0be6019c3...


No chance I'd be able to run this on an 8GB M1 Mac mini then? Looking to hook this up to an IRC bot for a bit of fun with my friends and have this spare Mac lying around.


The 7b model loads on a lot less than 8GB RAM, so you should be good to use it on an 8GB M1 Mac Mini. It takes very little time to set up, so it's worth trying to find out.


Yeah I'll get this set up when I'm home from work! Perhaps the trickiest part for me is going to be hooking it up to IRC, but that's the fun part!


7B will be fine; I have heard of people getting LLaMA working on a Nexus 6.


Thanks!


There are efforts like https://github.com/qwopqwop200/GPTQ-for-LLaMa that might further compress it, so 65B runs on 32GB machines, 33B and lower would run on 16GB machines, and 13B (seemingly the minimal useful one) would fit on 8GB -- which is wild and as good as you could hope for.


The 30B variant of LLaMA quantized to 4-bit and running with llama.cpp takes approximately 20GB of RAM. It runs pretty decently on my M2 MacBook Air with 24GB of RAM. I assume Alpaca will take the same amount of RAM.
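
As a rough sanity check on that figure: 30B parameters at 4 bits each is about 15 GB of weights, with the remaining few GB going to the context (KV cache) and other buffers, which lines up with the ~20GB observed.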


This is crazy... GPT-4, ChatGPT, Claude, PaLM and now Alpaca locally... What a world to live in now!


There is no way to even keep up with the trends. I haven't seen this amount of tools and new models emerging in over 15 years in the dev community.


I must have missed something, how can you run GPT-4 or ChatGPT locally?


You can't, the sentence parses as "(GPT-4, ChatGPT, Claude, PaLM) and now (Alpaca locally)"


Aha, thanks. That makes a LOT more sense.


Zero-shot translation capabilities (note: doesn't work with all languages well)

  > translate into japanese: "I am going to school today, but it is raining." 
  日本語で 「今天は学校に行きますが、雨が吹いている」


I expect translation, especially of non-technical sentences, to be a solved problem, but can anyone confirm the translations are accurate?


Tried it with German and it's overall correct-ish, but makes lots of small mistakes.

> > Translate this to German: I went to the back alleys of downtown Metropolis to find a phone booth I could use to dress up as my alter ego: Superman!

> Ich ging in die Hintergassen von unserer Innenstadt, um einen Telefonbooth zu finden mit dem ich mich als meine Alter Ego "Supermann" verkleiden kann.

but it should be

> Ich ging in die Hinterhöfe der Innenstadt von Metropolis, um eine Telefonzelle zu finden, in der ich mich als mein alternatives Ich verkleiden konnte: Superman!


吹いている ought to be 降っている and 今天 is not a word, it should be 今日


今天 is a Chinese word, not Japanese. 今日 and 今天 have the same meaning in Chinese.


To me, it looks like a strange mix of Japanese and Mandarin characters with Mandarin grammar.


> In Japanese it says: “Heute is going to the school, but rain is blowing”


Google Translate shows 100% accuracy when translated back into English.


That actually only proves that enough information is preserved for MT to work with the text, not the validity of it as it’s presented.


The outer 日本語で is a weird way to present it.


You can also download the weights with ipfs cli:

    ipfs get -o ggml-alpaca-7b-q4.bin -p QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC
You need to have the daemon started first though.


I love the Disclaimer at the bottom. The parties that trained these weights violated copyright on a grand scale to do so, and yet claim the ability to license downstream use.


Doesn't current US law suggest that you can't copyright model weights?


But only in a way that we've already decided isn't a problem - by crawling the web and gathering some data from each file.

Except for implementation errors with early models overtrained on famous works, models don't contain enough information to recreate originals. It's not a compression algorithm.


They derived LLaMA/GPT from copyrighted online content without giving any credit, and Alpaca derived weights from the leaked LLaMA, at least giving credit. :D I am not a lawyer, but copyright/licensing appears quite uncertain/complicated to me in this ML field.


> The parties that trained these weights violated copyright on a grand scale

How exactly did they violate copyright?


Keep in mind that Alpaca's dataset was generated with ChatGPT, and Alpaca-tuned models might sound like mini-ChatGPT bots with a similar "character" to them


I am hoping the actual Alpaca weights leak at some point. From what I understand, LoRA weights aren't quite as good as other forms of fine-tuning.


There is no reason to leak the Alpaca weights; fine-tuning the model is not that expensive, so eventually someone will fine-tune one and release it. (Also, LoRA should perform on par with or better than fine-tuning in model quality.)


There are different fine-tuning methods. Yes, anyone can use the data set and do a standard deterministic gradient descent to the nearest local minimum of the loss function, but should Stanford be doing anything more sophisticated than that, we would get something different from what they have.


Stanford released the exact training data as well as the training script with all parameters. Boot up a p4.2xlarge (8 A100 GPUs), which costs about $40/hour, let it run for 2-3 hours, and voila. See the README in their repo where it mentions the fine-tuning script[0]

[0] https://github.com/tatsu-lab/stanford_alpaca
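
For anyone curious, alpaca_data.json is just a list of instruction/input/output records, and the fine-tune wraps them in a fixed prompt template, roughly like this (a sketch -- see the repo for the exact template wording):

    import json

    # Each record has "instruction", "input" (often empty) and "output" keys.
    data = json.load(open("alpaca_data.json"))

    def format_example(ex):
        # Rough paraphrase of the Alpaca prompt template.
        prompt = ("Below is an instruction that describes a task.\n\n"
                  f"### Instruction:\n{ex['instruction']}\n")
        if ex["input"]:
            prompt += f"\n### Input:\n{ex['input']}\n"
        return prompt + f"\n### Response:\n{ex['output']}"

    print(format_example(data[0]))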


It is enough to use a different seed to get a different result; I do not think it is important to recreate a copy of the Stanford model weights, as long as the performance is equivalent.


From my experimentation it seems like there's some significant loss in accuracy when running the tuned LoRA models through llama.cpp (due to bugs/differences in inference or tokenization), even aside from losses due to quantization.


Look at the changeset :) It contains a link for "ggml-alpaca-7b-14.bin".

Because there's no substantive change to the code, I assume this fork exists (and this HN post exists) purely as a method to distribute the weights.


You're misinterpreting the comment a bit; the poster above is asking for the original Alpaca weights. The code in the repo uses a reproduction of Alpaca with different training code and a different fine-tuning method (LoRA).


Does anyone have a link to pre-quantized models for this/llama.cpp? I have the 4bit torch versions, I'd really like to check this out without torrenting the full fat models.

(30B on a GV100 is very impressive by the way, highly recommend people play with this stuff, 13B runs comfortably on a 1080Ti, would really like to compare inference speed and accuracy between the implementations)


Could you share what you did to convert the models? The mirrors aren't very reliable and it would be good to have it documented somewhere. There is some discussion about this at https://github.com/ggerganov/llama.cpp/issues/172


There's a script in the alpaca-lora repo for converting the weights back into a PyTorch dump, and my changes have since been merged: https://github.com/tloen/alpaca-lora/pull/19


Mine's behaving a bit strangely...

me: hello

alpaca: Hello!

me: what is your name?

alpaca: The answer to this question depends on what you mean by "know". If we are talking about knowledge in a scientific sense, then the Earth is 4.5 billion years old (give or take). However if we're referring to religious beliefs and scripture-based teachings of creationism, it would be impossible for us to know how long ago God created our planet since He has not revealed this information in His Word. In either case though, the Earth is billions of years old!

alpaca: Hello there :) How are you?

alpaca: "How are you?" "I'm great!" :D Thanks for asking. You too?? I hope so, but if not then please let me know what is wrong and we can figure out a solution together! :)

alpaca: My Name Is ________. #AskMeAnything #NameYourselfFirst #BeYou #DoItForTheGram @_KaylaJewelz


Ok not bad and pretty damn good performance:

> Give me a writing prompt for a sci-fi story.

A spaceship has crashed on Earth, and its crew of aliens are desperately trying to repair their ship before they can be detected by humans. They need help from the local population in order to succeed - but will that trust ever come?


I like "> at the end of the universe, two powerful alien species battle for..."


That's straight-up District 13 :)


District 9?


How hard is it to get the Alpaca tuning on the larger LLaMA models? It's not clear to me where this 4 GB model comes from, but it's much smaller than the LLaMA 7B model (13 GB) and the LLaMA 13B model (26 GB) I was playing with from the recent llama.cpp HN post.


The 4GB model is 7B, AFAIK. It’s just quantized down to 4 bits instead of 16.


It's a quantized version of the 7B: 13GB in 16-bit, ~4GB in 4-bit.


Does anyone know how llama.cpp was implemented? Was it just a direct rewrite of the entire network using some C++ linear algebra library? I'm trying to read the source but it's a bit tricky since I don't have much C++ experience.


Georgi rewrote the code on top of his own tensor library (ggml[0]).

[0] https://github.com/ggerganov/ggml


I hope someone good has picked up Kevin/antimatter15 by now. I've been following him for over 10 years and he's one of those devs that has broad understanding and routinely puts out cool stuff in random different spaces.


If you want to compile and run under Ubuntu 22.04.2 LTS, edit the Makefile and add -Wno-unused-result to each of the final compile steps.


Would it be possible to scale this up to use LLaMA 30b? Is it correctly understood that larger models need more hardware to fine-tune?


Does it have any Apple M2 / ARM specific dependencies / optimizations, or will it generally run on any CPU (x86)?


Runs fine on WSL2:

    main: seed = 1678990693
    llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
    llama_model_load: ggml ctx size = 4529.34 MB
    llama_model_load: memory_size = 512.00 MB, n_mem = 16384
    llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
    llama_model_load: .................................... done
    llama_model_load: model size = 4017.27 MB / num tensors = 291

    system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
    main: interactive mode on.
    sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

    == Running in chat mode. ==
     - Press Ctrl+C to interject at any time.
     - Press Return to return control to LLaMa.
     - If you want to submit another line, end your input in '\'.

    > What is an alpaca?
    An Alpaca (Vicugna pacos) is one of two species in the genus Vicuña, which are both domesticated animals from South America and Central Asia that have been bred for their fine wool.
    >


It inherits the ARM NEON optimizations from llama.cpp, but I believe it should also work on x86.


Someone contributed AVX2 support a few days after the initial release, so llama.cpp (and the alpaca.cpp fork) should work on most x86 CPUs as well now.


Can we run it in Colab? It isn't working for me.


it built in a couple of seconds with no issues on my x86 machine. Downloading weights now.


I'm getting a bunch of errors on an Intel Mac, starting with the following, which suggests that it's about the CPU architecture, but maybe I'm wrong.

sysctl: unknown oid 'hw.optional.arm64'


I am getting the same error on an Intel Mac.


Same problem here, haven't found a solution. Please let me know if you do.


Okay, for what it's worth, I was getting that as a warning but the ./chat executable works anyway. So give it a try if it compiles!


It does run on either with decent speed. You could probably even do it on a Raspberry Pi 4.


Works fine on my x86. Running Linux (NixOS). Ryzen 5950x.


I noticed these always run a 7B model. What happens if you try a 13B model? Would it take a ridiculous amount of RAM?


Are there torrents for 13B and up as well?


The LoRA weights for 13B are on Hugging Face ( https://huggingface.co/samwit/alpaca13B-lora ); it should be possible to follow the instructions linked in the Alpaca.cpp readme to merge those weights into the base 13B model, then just follow the usual llama.cpp conversion and quantization steps.

Note that LoRA fine-tunes aren't the same as the original Alpaca; LoRA results in some performance loss (although how much isn't clear).
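
The merge step would look roughly like this with the Hugging Face transformers + peft libraries (a sketch: the base-model path is a placeholder for a 13B checkpoint already converted to HF format, the Llama class names assume a recent transformers, and merge_and_unload() needs a recent peft):

    import torch
    from transformers import LlamaForCausalLM, LlamaTokenizer
    from peft import PeftModel

    base_path = "path/to/llama-13b-hf"   # placeholder: base 13B converted to HF format
    lora_path = "samwit/alpaca13B-lora"  # the LoRA weights linked above

    base = LlamaForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, lora_path)

    merged = model.merge_and_unload()    # fold the LoRA deltas back into the base weights
    merged.save_pretrained("./alpaca-13b-merged")
    LlamaTokenizer.from_pretrained(base_path).save_pretrained("./alpaca-13b-merged")

After that, the merged checkpoint should go through the usual llama.cpp convert and quantize steps.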


Wait, alpaca got released? I thought Stanford was waiting for Facebook to approve it.


The training code, parameters and training dataset got released, and the training process only takes a few hours. Anyone can recreate it.


Actually, it seems that Alpaca has not been released. This model uses a different training technique called LoRA, which may or may not work as well. People are making a lot of unsubstantiated performance claims, saying this is as good as GPT-3 when not nearly enough evaluation has been done to demonstrate that yet.


Language models require massive scale to train, but scale isn't only in the number of parameters or neurons. Scale also exists in the amount of data the model trains on.

While parameter count affects the post-training size and the requirements to run the model, data size does not. Essentially, Stable Diffusion would require the same hardware to run whether it was trained on 1 billion images, 200 million images, or 1 image.

Most LLM training has focused on the number of parameters as far as scale goes.

Meta trained a series of models on much, much more data than the original GPT-3. That data scale has helped improve the performance of the much smaller models they trained.


"Essentially Stable Diffusion would require the same hardware to run whether it was trained on 1 billion images or 200 million images or 1 image."

Same hardware maybe but you need more compute as the image count goes up


The parent poster was talking about training longer while keeping the model at a smaller scale so it would not be expensive to use in production. It's a trade-off: you could train for less time with a larger model.


It stinks that people keep meaning “M1/M2” macbook when they say “macbook.”


I mean, they are now the default. Whether we like it or not.


We should like it, because they are just so, so much better. Also, we're coming up on 3 years of availability, and the previous lines between 2015 and 2020 were pretty lackluster, so I'd expect a lot of people to have upgraded by now.


I still have an intel model, upgrading would cost me an ARM!

But I am a fan, have used em at work.


Is there a way we can turn this into a REST service using the ngrest library in C++? I am trying, but it looks like there is a lot to unpack in the chat.cpp code and I'm not able to proceed.


Is it possible to download alpaca models with more than 7b parameters?


Alpaca model leaked again?


No, this is an open reproduction of Alpaca on the leaked LLaMA 7B weight set.


This isn't actually a direct reproduction. It's a bit misleading because it only says so at the bottom. This actually uses LoRAs, which train way faster and are much more flexible, but they perform a little worse.

There is someone doing a direct reproduction of Alpaca in the Alpaca repo issues, though.

> The weights are based on the published fine-tunes from alpaca-lora


>they perform a little worse.

Be aware that LoRA performs on par with or better than full fine-tuning in model quality if trained correctly, as the paper shows: https://arxiv.org/abs/2106.09685


Ah, thank you for the correction, this is good to know! I'll admit I'm not super experienced with them directly.

I think the current LoRAs for Alpaca are reportedly performing a little bit worse - I'm guessing from a lack of optimization, so I'm sure it'll improve.


Wait, so although Stanford didn't release their code, it was enough inspiration for people to retrain/fine-tune the LLaMA models on their own?


They didn't release their code or weights, but they did release the training data, which is what was used to create the LoRA here.


If I am not mistaken, they actually did release their code. Yesterday there was a change to the repo that added a train.py file. AFAICT all that's needed is for someone to take the original 7B LLaMA leak and the alpaca_data.json file and run train.py on some beefy hardware. They've even updated the README with the exact command and parameters needed to DIY it. I'm somewhat expecting that there will be a release by someone in the next few days.


That's awesome! I think I remember them saying it was only around ~$500 in compute costs to train so I hope we see those weights released soon. I am hoping someone releases the 13B model fine-tuned.


$100.

“For our initial run, fine-tuning a 7B LLaMA model took 3 hours on 8 80GB A100s, which costs less than $100 on most cloud compute providers. We note that training efficiency can be improved to further reduce the cost.”

($500 was what they paid OpenAI to generate the fine-tuning dataset.)


Ah, right, I did notice that because people were running queries against the training data.

Why is there a general assumption that the unreleased weights are better? Is that something we can do, a free-weights community that solves this recurring issue?


Looking forward to trying it, but I don't have a MacBook. I wonder if it runs on an i7-11800H (8-core/16-thread CPU) with 64GB of RAM.


The answer is yes: it ran on an x86 virtual machine with 4 CPU threads and 12GB of RAM.


Alpaca.cpp is a derivative of llama.cpp, which doesn't use any Mac-specific code, so Linux should work too.


I'm using this exact repo on Linux (NixOS), right now.

Wish I could run the 65B model; this 7B model is a bit weird.


Would that require porting llama to OpenVino?


Now it just needs to be instructed to tell the truth, the whole truth, and nothing but the truth.


Why do these LLMs struggle so hard to be concise? Is this because the Alpaca dataset is pretty wordy? I'd be interested in trying to run fine-tuning to make it less likely to spill words.


In my experience so far GPT-4 will most certainly be concise if you ask it to be.

Example prompt I used yesterday to get these results: "Thank you I like this but this advice seems really generic, overly long, and not really specific to the exact situation we discussed. Can we simplify this down to ..."

Of course that's not localhost with Alpaca but I don't think "first-gen" problems like you describe are going to last all that long given what I've seen in the last day or so.


I heard somewhere that during the RLHF process for ChatGPT, people tended to rate longer responses higher. It might be something similar going on here.


Conciseness is regulated from the 3rd stage of training, RLHF. Alpaca only had the first two stages.


Does this run on an Intel mac? Or is M1/M2 required?


Can I get this to run in a Colab?


IDK why the Alpaca team is hoarding the model; my paranoid side thinks that they are taking time to collect data on the demo page. xD


It's because LLaMA is technically not publicly available; Facebook currently intends for only researchers to have it.

Alpaca is a fine-tuned version of the LLaMA model, so if they release Alpaca they're essentially releasing a version of LLaMA. They aren't sure whether Facebook is okay with that, so they're reportedly waiting for permission.

While, yes, you can easily find LLaMA, it's technically not intended for the public according to Facebook.


write me python code for autoclick


Are people not running these on Colab?


Yeah, I wanted to as well but couldn't. I downloaded the weights and all, but it isn't running in Colab. Can someone help?


Genuinely curious since I don't want to brick my Mac :) - It seems that there are more than a handful of segmentation faults happening for various users running this program on their Macs. Is there a possibility that this could cause system-wide stability issues? Or is the macOS user program space isolated well enough that these problems are a thing of the past and/or confined to other environments?

Also, ChatGPT seems ambivalent about it -

"However, it is worth noting that modern operating systems like macOS have built-in protections to prevent user-level programs from accessing sensitive system resources, which can help mitigate the impact of a segmentation fault.

In summary, running a faulty program written in C language that crashes with a segmentation fault can potentially cause system stability issues on a Mac, but the exact impact will depend on a variety of factors. It is generally a good practice to test and debug programs thoroughly before running them on any system, especially if they involve sensitive or critical resources."


> Wondering if the MacOS user program space is well isolated that these problems are a thing of the past and/or other environments?

If by "past" you mean "1980s" and "other environments" you mean DOS, yes. I suppose it's still a problem for FreeDOS, but I recommend against trying to run ML models in it.


It's a user space process... it won't ever crash the kernel. In fact, one of the reasons for the kernel existing in the first place is to virtualize and isolate processes so they won't crash the whole system.


An application crashing, except in very specific scenarios which don’t apply here, won’t interfere with the rest of the OS.



