so exciting to see these ideas out in the world! i'm now imagining a Scratch-like playground for kids to explore end-user programming / AI in an accessible way, like some of the example apps you've shown
there's been a rising tide of academic HCI work in a similar space, wonder if there will be cross-pollination of ideas along these lines (many more papers i'm sure but some off the top of my head):
https://arxiv.org/abs/2305.11473 https://arxiv.org/abs/2309.09128
I should really share the design doc from when we started putting this together. The core idea here was a) making a little hobby machine on the canvas and b) using LLMs to “power” the widgets. Somehow that ended up looping back to “ai workflows,” which is definitely not where we started. I’m only catching up on the space now!
also, while i have your attention here, since you wrote that related post on (not) vector db's ... what would you recommend for a newbie to get started with RAG? let's say i have a large collection of text files on my computer that i want to use for RAG. the options out there seem bewildering. is there something simple akin to Ollama for RAG?
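to make the question concrete, the core loop i'm imagining is something like this (just a sketch; the folder name and embedding model below are my guesses, not a recommendation):

```python
# minimal RAG sketch: embed local text files, retrieve the top-k chunks by
# cosine similarity, and paste them into the prompt. assumes
# `pip install sentence-transformers numpy`; paths/model are placeholders.
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

docs = [p.read_text() for p in Path("my_text_files").glob("*.txt")]
chunks = [c for d in docs for c in d.split("\n\n") if c.strip()]  # naive chunking

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 3) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

question = "what did I write about X?"
context = "\n---\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # feed this to any local model, e.g. Ollama's CLI or HTTP API
```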
Your work is an inspiration as always!! My n00b question is: what do you think is currently the most practical path to running a reasonably-sized (doesn't have to be the biggest) LLM on a commodity linux server for hooking up to a hobby web app ... i.e., one without a fancy GPU. (Renting instances with GPUs on, say, Linode, is significantly more expensive than standard servers that host web apps.) Is this totally out of reach, or are approaches like yours (or others you know of) a feasible path forward?
I've been playing with running some models on the free-tier Oracle VM machines with 24 GB RAM and an Ampere CPU, and it works pretty well with llama.cpp. It's actually surprisingly quick: speed doesn't scale too well with the number of threads on CPU, so even the 4 ARM64 cores on that VM, with NEON, run at a similar speed to my 24-core Ryzen 3850X, maybe about half reading speed for the smaller models. It can easily handle Llama 2 13B, and if I recall correctly I managed to run a 30B model in the past too.
It's a shame the current Llama 2 jumps from 13B to 70B. In the past I tried running larger stuff by making a 32GB swap volume, but it's just impractically slow.
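For anyone who wants to reproduce this without touching the C++ CLI, here's a minimal sketch using the llama-cpp-python bindings (the model path is a placeholder for whatever quantized GGUF file you've downloaded):

```python
# minimal CPU-only run via llama-cpp-python (`pip install llama-cpp-python`).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,   # context window
    n_threads=4,  # CPU threads; going past physical cores rarely helps
)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```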
Also, it's really tricky to even build llama.cpp with a BLAS library to make prompt ingestion less slow. The Oracle Linux OpenBLAS build isn't detected out of the box, and it doesn't perform well compared to x86 for some reason.
LLVM/GCC have some kind of issue identifying the Ampere ARM architecture (`-march=native` doesn't really work), so maybe this could be improved with the right compiler flags?
Not sure if that's still the case. I remember having trouble building it a couple of months ago; I had to tweak the Makefile because IIRC it assumed ARM64 meant Mac. But I recently re-cloned the repo, started from scratch, and it was as simple as `make LLAMA_OPENBLAS=1`. I don't think I have any special setup other than having installed the OpenBLAS dev package from apt.
IDK. A bunch of basic development packages like git were missing from my Ubuntu image when I tried last week, and I just gave up because it seemed like a big rabbit hole to go down.
I can see the ARM64 versions on the Ubuntu web package list, so... IDK what was going on?
On Oracle Linux, until I changed some env variables and lines in the Makefile, the OpenBLAS build would "work," but it was actually silently failing and not using OpenBLAS.
The OpenBLAS package was missing on ARM, along with some other dependencies I needed for compilation.
At the end of the day, even with many tweaks and custom compilation flags, the instance was averaging below 1 token/sec as a Kobold Horde host, which is below the threshold to even be allowed as an LLM host.
It might be more expensive to get a GPU instance, but at a guess I'd say it's more cost-effective, considering that the CPU computation will be less efficient and take much longer. I bet someone's worked this out with real numbers; I just haven't seen it.
This only matters if you're scaling to meet demand and demand is higher than your spare resources, which often isn't the case for hobby projects.
The 10€/mo VPS I've had for over 6 years now still has a few cores and GBs of RAM spare, so running a small model on the CPU for a personal project that only me and a few friends occasionally use wouldn't cost me a cent more.
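For a rough sense of the tradeoff, a back-of-envelope in Python, where every number is an assumption for illustration rather than a measurement:

```python
# back-of-envelope cost per token; all four inputs are made-up assumptions.
cpu_cost_per_month = 10.0   # $/month, always-on VPS you'd pay for anyway
cpu_tokens_per_sec = 1.0
gpu_cost_per_hour = 1.0     # $/hour, billed only while the instance is up
gpu_tokens_per_sec = 50.0

cpu_tokens_per_month = cpu_tokens_per_sec * 3600 * 24 * 30  # if saturated 24/7
gpu_tokens_per_hour = gpu_tokens_per_sec * 3600

print(f"CPU, saturated: ${cpu_cost_per_month / cpu_tokens_per_month * 1e6:.2f} per 1M tokens")
print(f"GPU, per-hour:  ${gpu_cost_per_hour / gpu_tokens_per_hour * 1e6:.2f} per 1M tokens")
# the GPU only wins per token if you keep it busy while it's billed; spare
# cycles on a box you already pay for are effectively free.
```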
Cool work! You and your team may be interested in these two recent CHI papers from Microsoft Research, both on very relevant topics to what you've been doing:
1) “What It Wants Me To Say”: Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models (https://arxiv.org/abs/2304.06597) -- they try to tackle a similar problem to the one you described above
2) On the Design of AI-powered Code Assistants for Notebooks (https://arxiv.org/abs/2301.11178) - uses Mito as part of their case study
> Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models
I love the idea of giving users feedback on how to get better at prompting the LLM. I think the key to using this approach within Mito is giving users guidance at the right time -- sometimes shorter prompts get the job done, and they're always easier to write :)
A really sweet integration of this approach could be: when the LLM generated code errors or when we notice that the user undoes their previous prompt, we offer the user help in converting non-working prompts into ones that follow best practices of breaking complex tasks down into small steps.
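Concretely, a first version of that could be as simple as a meta-prompt; here's a hypothetical sketch (none of these names exist in Mito today):

```python
# hypothetical sketch of the "offer help after a failed generation" flow;
# these names are made up and don't come from Mito's codebase.
META_PROMPT = (
    "A user wrote this prompt for a code-generating assistant, but the "
    "generated code failed. Rewrite their prompt as a short numbered list "
    "of smaller, concrete steps.\n\n"
    "Original prompt: {prompt}\n"
    "Error message: {error}"
)

def suggest_decomposed_prompt(user_prompt: str, error: str, complete) -> str:
    """Ask the LLM itself to break a failing prompt into smaller steps.

    `complete` is any callable mapping a prompt string to a completion
    string (e.g., a thin wrapper around your LLM API of choice).
    """
    return complete(META_PROMPT.format(prompt=user_prompt, error=error))
```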
> On the Design of AI-powered Code Assistants for Notebooks - uses Mito as part of their case study
Andrew McNutt, one of the authors, presented this paper here: https://www.youtube.com/watch?v=g0prh8mE3bI
Their classification of notebook code-gen tools has actually been super helpful in my own thinking. Thanks for the help, Andrew, if you're an HNer!
ping me by email (see profile) if you want to brainstorm privately; my two cents is biased toward academia if you have an above-average setup (don't know until I see your CV!); startups come and go with each hype cycle.
my two cents: find something where the grants align with your interests, or find a brand-new faculty member who has startup money so isn't as bound by grants in the near term. and if things don't work out, simply leave and go to industry -- with a marketable skill set like what you get from being a CS major, there's no way for you to lose. go wherever has work that you like more.
totally agreed! things can change even from one year to another with the same advisor. for instance, i'm not the same advisor to my students that i was last year, or the year before that, or the year before that. (i've only been at this for 4 years, and each year is incredibly different from the prior one.) circumstances change, resources change, and constraints change.
(no guarantees that i'll be able to answer via text, though; maybe i'll make a video later. been trying to minimize my computering time off-hours due to increasing wrist pains ... PSA: take care of your wrists, everyone!)
Thanks so much for writing this memoir! It is absolutely brilliant. Considering you had a somewhat unconventional PhD with your independent projects, what was the main role of your advisor? How do you make the most of such a situation to extract knowledge out of professors who may not have an incentive to be directly involved in your project to the usual degree?
if you can't get on someone's critical path, then you have to make it very easy for them to help you with very little time commitment, e.g.: http://pgbovine.net/how-to-ask-for-help.htm
What made you go back into academia? Based on this snapshot of your life, it felt like academia at its best gave you an outlet to explore the things you were really interested in, but at its worst had a ton of obvious drawbacks. Even in this book, it seems a lot of the best experiences stemmed from or started outside of the program (MSR), and the Ph.D., while providing support and enabling these things to happen in the first place, was no longer an active contributor to them.
I know you had mentioned you could write a whole book on this, so I'm sure there's a lot to the story.
you've inadvertently answered the question for me, to a first approximation :) i think that academia is a great launching point for a wide array of scholarly activities that the free market (i.e., industry) doesn't directly pay for: research, public policy, outreach, teaching, mentoring, industry collaborations, etc. i can work with whatever companies i want (even ones that are actively competing with each other at the moment!) and be seen as a "neutral" party; i can share knowledge via teaching and research, again with a "neutral" voice, without being seen as a spokesperson for a particular company or other special interest group. you're right, though -- there's a whole lot to the story. maybe someday i'll write something up!