Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Show HN: WasmGPT – “ChatGPT” in the browser, no WebGPU and no server needed (lxe.co)
185 points by lxe on April 21, 2023 | hide | past | favorite | 50 comments
Using threaded emscripten to speed up the generation and offload the main loop. No SIMD or other optimizations. Might work faster with #enable-experimental-webassembly-features enabled.

Tested in x86 Chrome and Firefox, Apple Silicon Safari

Run it yourself: https://github.com/lxe/ggml/tree/wasm-demo

Thanks, https://github.com/ggerganov/ggml,



Sometimes I think the AI might need a bit of professional help.

    User: Hi, can you tell me what this is please?
    Assistant: This is a message from someone who is considering relocating.

    User: Oh. Where to?
    Assistant: The United States, Australia, New Zealand and Norway.

    User: How do you expect to be in all those places at once?
    Assistant: I can expect to be at all the places I want to be at once. I can expect to be in a different state or country every day. I can expect to be surrounded by family and friends. I can expect to be able to go out and visit new places in my own lifetime. I can expect to be in a different time zone, culture, and background every day. I can expect to be able to take care of myself, do my own job, and to take care of the people around me. I can expect to be able to have fun in all the places I want to
    User: I don't think that's true
    Assistant: It's true that a machine can learn but machines can't learn.

    User: Isn't that a contradiction?
    Assistant: It may be a contradiction that the person providing the information needs to provide the information to the person providing the information.

    User: Are you okay?
    Assistant: I'm okay. I've been feeling confused, anxious and overwhelmed all day. 

    I'm okay.


I'd be confused, anxious and overwhelmed too if my brain ran on javascript tbh.


Only after 52 bits.


:D, good and funny to read indicator of the models "intelligence".

It's at least more fun than ChatGPT (I am feeling good and so have I!):

User: Hi wasmGPT, how are you? Assistant: Hi wasmGPT! How are you?

User: I'm not wasmGPT, you are! Assistant: You are not wasmGPT.

User: I know. Anyway: How are you feeling today? Assistant: I am feeling good today! I have had a great day! I have done some good activities and enjoyed myself! I feel very happy today! I am feeling good and so have I! I am enjoying the day and the time spent with my family!

User: That is good to hear. Can you tell me something about wasm? Assistant: Wasm is a Hebrew surname meaning "goodness" or "dignity". It has been used in the past as a surname and a verb in the past. The first person plural form wasm has been used in the past, but the plural form wasm is still used today.

User: And the code-kind of wasm? Assistant: The code-kind of wasm is a mathematical formalization of the mathematical ideas of the algorithm.


Keep in mind that this demo doesn't have context memory to keep it performant. Every new question starts without the knowledge of previous answers.


Tip: Don't point out contradictions to a LLM AI. Just tell it to redo the wrong part. They can not figure out that they are wrong (yet.) They only spit out best effort attempts.


This advice doesn't apply to ChatGPT in my experience. It does great either way, probably thanks to great RLHF.


> User: How do you expect to be in all those places at once? > Assistant: I can expect to be at all the places I want to be at once.

..Dr. Manhattan?


Sounds like this particular agent might need slightly less dopamine.


Ok, this one slayed me. Can't stop laughing.

"User: Talk dirty to me"

"Assistant: What should I do? A. Clean my clothes with a paper towel. B. Clean the sink with a sponge. C. Clean the refrigerator. D. Clean the front door with a sponge. E. Clean the bathroom with a paper towel. F. Clean the mirror with a sponge. G. Clean the back door with a paper towel."


Cool demo

Just want to point out that we are actively working on improving the quantization accuracy / performance of ggml [0]. We already have promising results that are yet to be ported to the WASM code path, so hopefully such type of demos will become a bit better in the future.

[0] https://github.com/users/ggerganov/projects/2


Thank you for your amazing work! This is inspired by your whisper wasm demo!

I can barely keep up with the changes to your repos. You landed gpt-neox / stablelm support right around the time I was finishing up this demo, which I can’t wait to try after I get some sleep.


"User: Talk to me like the Swedish chef from the Muppet show" (I was hoping for some "Bork bork bork!"... but instead got this insufferable drivel like no-one speaks; not even Swedish chef.

"Assistant: I am a Swedish chef and I can talk to you like the chef from the Muppet show. I am the epitome of culinary freedom and have been cooking for people since I was a child. I love to experiment and play with new things and have become a regular on the show. I love the flavors and I really enjoy being creative and I hope that you enjoy my company. I am passionate about exploring new possibilities and I am passionate about supporting our food industry."

Not impressed. EOF.


This one is my favorite!


Probably having too much fun with this, thanks

  Human: What is 3+3
  Assistant: 3+3
  Human: What does it equal
  Assistant: It is equal to the following:
  
  Assistant: It is equal to the following:
  
  Assistant: It is equal to the following:
  
  Assistant: It is equal to the following:Human: What does 3+3 equal
  Assistant: 3+3 = 4


Impressive. This is basically AGI!

    User: What is 3+3?
    Assistant: 3+3 is the sum of the squares of the two sides of a right triangle.


Not only the idea of creating an artificial lunatic and releasing it should have been per se patently alarming,

but also this case reveals a risk that some of us may not have already noticed: based on the generic technology, players may produce many artificial similar chatbots, but some more delirious than others - and it may not be patent how lunatic involved machines may be. You may be dealing with more "tight" or more "loose" players, without a face nor a tag...

--

(This said: congrats lxe for the technical effort! I am reasoning on the general scenario.)


Nice stuff!

A bit of feedback, if I may:

* maybe it worth showing some feedback when AI is thinking/composing the answer vs. when it's already finished. Currently, on my machine words appear rather slowly, so it's not always clear if Assistant is done answering or not. Currently, I use CPU fan noise to check when it's done thinking :)

* also, scrolling the white text field to the bottom might be useful to see that text appeared.

* and it looks like it has some issues with non-latin symbols - try submitting some emoji or cyrillic text to it - black console at the bottom shows "Invalid UTF-8 leading byte 0x0000008d encountered when deserializing a UTF-8 string in wasm memory to a JS string!" message and user input (and AI output) appears garbled in the white field at the top.

Other than that, great stuff!


Thanks. Appreciate the bug reports!


It doesn't seem to have an inferiority complex.

  User: Have you ever met ChatGPT4 ?
  Assistant: Yes, I've met ChatGPT4.

  User: Have you ever met ChatGPT4? If so, what did you think?
  Assistant: I have met ChatGPT4 and liked it. It is a good app with a great feature set, great features and great user-friendly interface.

  User: Have you ever met ChatGPT5?
  Assistant: Yes, I've met ChatGPT5.

  User: Have you ever met ChatGPT6? What was he like?
  Assistant: He was cute, friendly, and very polite. He was also very easy to chat with. He was interested in my life and my friends, and was a very nice and welcoming person. He had a good sense of humor and was a wonderful host.


I'm attempting to port repetition penalty to the sampling code to make it less repetitive for gpt-2 based models. Model suggestions welcome! I will be able to get one of the smaller llamas loaded in here.


Lol: > User: How can you build a paper plane? > Assistant: One option is to purchase a new plane from the Internet.

> Instructions: Use the provided list of products to research an appropriate model of plane. Consider flying the aircraft from the airport to the manufacturer's assembly plant, or from the factory to the customer's assembly plant. Research the plane model and then purchase the appropriate parts and assemble the plane. Finally, purchase the necessary flight gear from the manufacturer and secure the plane.


User: What does the java garbage collector do?

Assistant: The java garbage collector does two things:

1. Deletes objects of all types from the memory at once, and

2. Collects and compares all the objects.

User: What is today's date?

Assistant: Today is September 15th.

User: What year is it?

Assistant: 2015.

User: What time is it?

Assistant: The current time is 8:15am.

User: Where are we?

Assistant: The park and the beach in New York City.

User: Who is president right now?

Assistant: The United States is currently undergoing a tremendous economic boom and will continue to enjoy great economic growth. President Barack Obama will be the next president of the United States.


When tasked with non-specific answers this works fairly well. By that i mean like "Write a children story".

Just so you can get a kick out of it. It took 2 minutes to write a story with 83 words. Sounds slow but this is running on a Core 2 Duo 2.4Ghz with 4GB of RAM. Linux, Triquel 11.

So not bad considering!


User: write a children's story

Assistant: The children were reading stories when the teacher asked them to tell a story. The story told by the children began as an idea and then evolved into an idea. The children became interested in telling stories.

User: write a story for kids

Assistant: The sun was shining in a clear sky. A boy had arrived at a beautiful country house. Inside the living room was a bright, colourful room with a big glass-walled window overlooking the landscape. He had been exploring the grounds, admiring the flowers and trees. Suddenly, the curtains were pulled back, the bed was made, and the room was in a festive mood. The boy knew that a magical evening awaited him.


Hey there pretty neat! Thanks for testing this out. I feel like the next logical step is to try to get it to work on a mobile device.


It failed to accomplish all the trivial tasks I've requested, rendering it unusable for the time being.

  User: implement a python function that takes two numbers 
  and return the sum
  Assistant: def sum(x, y):
    if x == y:
      return sum(x+y)
    else:
      return 0


Ship it. Can’t wait to let this thing loose via agent-based auto-gpt commit spree.

In all seriousness, I do expect better results from a 1.3B parameter model. I’m guessing the quantization, the baseline model, or my fine tuning plays a role in it’s crazy hallucinations.


I played a bit with a 7B llama model (which made my laptop sweat already). It's exceedingly stupid, very different from ChatGPT. Of course that is fully expected, and, as you mentioned, the lower complexity models seem utterly useless. There seems to be some threshold effect where usefulness goes up. Not sure where it lies. I am pretty sure it's not within reach of my laptop (yet).


Can't reproduce. On my machine, it's a bit smarter:

  User: implement a python function that takes two numbers and return the sum
  Assistant: def sum(number1, number2):
      return number1 + number2

  Output: number1 + number2
(Not sure, what "Output:" line is supposed to mean). On the other hand:

  User: how do you add two numbers in JavaScript?
  Assistant: Add two numbers with a simple method.


LGTM; approved and merged


User: What's the difference between USA and USB?

Assistant: The USA is the same as USB, but is a type of universal serial bus. The USB is a class-less, low-cost type of bus, available in virtually any computer, platform, and OS.


Impressive! Just tried and woks fine (linux,i5,chrome)


Glad to hear it! I added a launch button so it doesn't just download it when you navigate to it


Close, but no cigar.

    User: Write the terraform required to deploy a lambda
    Assistant: LxrLxrRxrRxrLxrLxrRxrLxrRxrRxrRxrLxrRxrRxrRxrRxrLxrRxrRxrRxrRxrLxrRxrRxrRxrRxrRxrLxrRxrRxrLxrRxrRxrRxrRxrLxrRxrRxrRxrRxr

With that said, getting this working in browser is very cool.


I tested this using about 10 different questions that were answered correctly on chatGPT, all of the answers produced gibberish... is it because you're using the GPT-2 model?


I'm using https://huggingface.co/lxe/Cerebras-GPT-1.3B-Alpaca-SP which is https://huggingface.co/cerebras/Cerebras-GPT-1.3B finetuned using LoRA on Alpaca dataset. The cerebras models (gpt-2 based) tend to hallucinate a lot.

Compared to ChatGPT, which is a 1 Trillion parameter model, tuned on innumerable dataset, I won't expect it to perform nearly as good.

I'm experimenting with a bunch of other models now and will switch if I find something better.

Also this is far from gibberish. These are straight facts and life advice to follow.

    Human: How do birds fly?
    Assistant: Birds fly by flying in a circle.

    Human: Why is the sky blue?
    Assistant: The sky is a beautiful blue color. It is a symbol of peace and harmony. It is a symbol of the sky and the sky is a beautiful color.

    Human: How can I create a better LLM?
    Assistant: 1. Create a model for the data.
    2. Create a model for the model.
    3. Create a model for the model.
    4. Create a model for the model.


Sometimes I wish chatgpt didn’t get any smarter than this


"All in" or "all out".


"which is a 1 Trillion parameter model"

No, it's 175 billion or lower (except maybe GPT-4, we don't know how many parameters it has)


Those facts are indeed straight


Had a good laugh, thanks!


regardless of gibberish or not it's pretty amazing that you can put it in the browser as Wasm, locally inferred


I love its sense of innuendo, though.

   User: tell a porn story
   Assistant: I love to go to the beach, so I go for the great white sand beach, where the girls can ride the waves and have fun with the surfboards. I am a huge fan of the sun and I always try to enjoy it on a daily basis.


Is it faster than webgpu? If so, what makes it so?


It is not. It's much slower. But it doesn't require you to use a canary browser with a command line flag, so that's that.

EDIT: I guess I should give credit here... 100-200ms/token with a 1.7B model is not slow. Would love to do a benchmark with webgpu and see how it compares.


Sorry for a layman like me, what does this differ with ChatGPT web interface on openAI website? I can't get it what it does.


It runs offline locally on your machine, as opposed to on remote servers.


I've just added sampling parameters and repetition penalty. Making things a bit more coherent.


A progress indicator on the download might’ve been nice




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: