so exciting to see these ideas out in the world! i'm now imagining a Scratch-like playground for kids to explore end-user programming / AI in an accessible way, like some of the example apps you've shown
there's been a rising tide of academic HCI work in a similar space, wonder if there will be cross-pollination of ideas along these lines (many more papers i'm sure but some off the top of my head):
https://arxiv.org/abs/2305.11473 https://arxiv.org/abs/2309.09128
I should really share the design doc from when we started putting this together. The core idea here was a) making a little hobby machine on the canvas and b) using LLMs to “power” the widgets. Somehow that ended up looping back to “ai workflows,” which is definitely not where we started. I’m only catching up on the space now!
also, while i have your attention here, since you wrote that related post on (not) vector db's ... what would you recommend for a newbie to get started with RAG? let's say i have a large collection of text files on my computer that i want to use for RAG. the options out there seem bewildering. is there something simple akin to Ollama for RAG?
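to make the question concrete, the core loop i'm imagining is something like this (just a sketch; the folder name and embedding model below are my guesses, not a recommendation):

```python
# minimal RAG sketch: embed local text files, retrieve the top-k chunks by
# cosine similarity, and paste them into the prompt. assumes
# `pip install sentence-transformers numpy`; paths/model are placeholders.
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

docs = [p.read_text() for p in Path("my_text_files").glob("*.txt")]
chunks = [c for d in docs for c in d.split("\n\n") if c.strip()]  # naive chunking

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 3) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

question = "what did I write about X?"
context = "\n---\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # feed this to any local model, e.g. Ollama's CLI or HTTP API
```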
Your work is an inspiration as always!! My n00b question is: what do you think is currently the most practical path to running a reasonably-sized (doesn't have to be the biggest) LLM on a commodity linux server for hooking up to a hobby web app ... i.e., one without a fancy GPU. (Renting instances with GPUs on, say, Linode, is significantly more expensive than standard servers that host web apps.) Is this totally out of reach, or are approaches like yours (or others you know of) a feasible path forward?
I've been playing with running some models on the free-tier Oracle VM machines with 24 GB RAM and an Ampere CPU, and it works pretty well with llama.cpp. It's actually surprisingly quick: speed doesn't scale too well with the number of threads on CPU, so even the 4 ARM64 cores on that VM, with NEON, run at a similar speed to my 24-core Ryzen 3850X, maybe about half reading speed for the smaller models. It can easily handle Llama 2 13B, and if I recall correctly I managed to run a 30B model in the past too.
It's a shame the current Llama 2 jumps from 13B to 70B. In the past I tried running larger stuff by making a 32GB swap volume, but it's just impractically slow.
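For anyone who wants to reproduce this without touching the C++ CLI, here's a minimal sketch using the llama-cpp-python bindings (the model path is a placeholder for whatever quantized GGUF file you've downloaded):

```python
# minimal CPU-only run via llama-cpp-python (`pip install llama-cpp-python`).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,   # context window
    n_threads=4,  # CPU threads; going past physical cores rarely helps
)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```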
Also, it's really tricky to even build llama.cpp with a BLAS library to make prompt ingestion less slow. The Oracle Linux OpenBLAS build isn't detected out of the box, and it doesn't perform well compared to x86 for some reason.
LLVM/GCC have some kind of issue identifying the Ampere ARM architecture (`-march=native` doesn't really work), so maybe this could be improved with the right compiler flags?
Not sure if that's still the case. I remember having trouble building it a couple of months ago; I had to tweak the Makefile because IIRC it assumed ARM64 meant Mac. But I recently re-cloned the repo, started from scratch, and it was as simple as `make LLAMA_OPENBLAS=1`. I don't think I have any special setup other than having installed the OpenBLAS dev package from apt.
IDK. A bunch of basic development packages like git were missing from my Ubuntu image when I tried last week, and I just gave up because it seemed like a big rabbit hole to go down.
I can see the ARM64 versions on the Ubuntu web package list, so... IDK what was going on?
On Oracle Linux, until I changed some env variables and lines in the Makefile, the OpenBLAS build would "work," but it was actually silently failing and not using OpenBLAS.
The OpenBLAS package was missing on ARM, along with some other dependencies I needed for compilation.
At the end of the day, even with many tweaks and custom compilation flags, the instance was averaging below 1 token/sec as a Kobold Horde host, which is below the threshold to even be allowed as an LLM host.
It might be more expensive to get a GPU instance, but at a guess I'd say it's more cost-effective, considering that the CPU computation will be less efficient and take much longer. I bet someone's worked this out with real numbers; I just haven't seen it.
This only matters if you're scaling to meet demand and demand is higher than your spare resources, which often isn't the case for hobby projects.
The 10€/mo VPS I've had for over 6 years now still has a few cores and GBs of RAM spare, so running a small model on the CPU for a personal project that only me and a few friends occasionally use wouldn't cost me a cent more.
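For a rough sense of the tradeoff, a back-of-envelope in Python, where every number is an assumption for illustration rather than a measurement:

```python
# back-of-envelope cost per token; all four inputs are made-up assumptions.
cpu_cost_per_month = 10.0   # $/month, always-on VPS you'd pay for anyway
cpu_tokens_per_sec = 1.0
gpu_cost_per_hour = 1.0     # $/hour, billed only while the instance is up
gpu_tokens_per_sec = 50.0

cpu_tokens_per_month = cpu_tokens_per_sec * 3600 * 24 * 30  # if saturated 24/7
gpu_tokens_per_hour = gpu_tokens_per_sec * 3600

print(f"CPU, saturated: ${cpu_cost_per_month / cpu_tokens_per_month * 1e6:.2f} per 1M tokens")
print(f"GPU, per-hour:  ${gpu_cost_per_hour / gpu_tokens_per_hour * 1e6:.2f} per 1M tokens")
# the GPU only wins per token if you keep it busy while it's billed; spare
# cycles on a box you already pay for are effectively free.
```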
Cool work! You and your team may be interested in these two recent CHI papers from Microsoft Research, both on very relevant topics to what you've been doing:
1) “What It Wants Me To Say”: Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models (https://arxiv.org/abs/2304.06597) -- they try to tackle a similar problem to the one you described above
2) On the Design of AI-powered Code Assistants for Notebooks (https://arxiv.org/abs/2301.11178) - uses Mito as part of their case study
> Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models
I love the idea of giving users feedback on how to get better at prompting the LLM. I think the key to using this approach within Mito is giving users guidance at the right time -- sometimes shorter prompts get the job done, and they're always easier to write :)
A really sweet integration of this approach could be: when the LLM generated code errors or when we notice that the user undoes their previous prompt, we offer the user help in converting non-working prompts into ones that follow best practices of breaking complex tasks down into small steps.
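Concretely, a first version of that could be as simple as a meta-prompt; here's a hypothetical sketch (none of these names exist in Mito today):

```python
# hypothetical sketch of the "offer help after a failed generation" flow;
# these names are made up and don't come from Mito's codebase.
META_PROMPT = (
    "A user wrote this prompt for a code-generating assistant, but the "
    "generated code failed. Rewrite their prompt as a short numbered list "
    "of smaller, concrete steps.\n\n"
    "Original prompt: {prompt}\n"
    "Error message: {error}"
)

def suggest_decomposed_prompt(user_prompt: str, error: str, complete) -> str:
    """Ask the LLM itself to break a failing prompt into smaller steps.

    `complete` is any callable mapping a prompt string to a completion
    string (e.g., a thin wrapper around your LLM API of choice).
    """
    return complete(META_PROMPT.format(prompt=user_prompt, error=error))
```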
> On the Design of AI-powered Code Assistants for Notebooks - uses Mito as part of their case study
Andrew McNutt, one of the authors, presented this paper here: https://www.youtube.com/watch?v=g0prh8mE3bI
Their classification of notebook code-gen tools has actually been super helpful in my own thinking. Thanks for the help, Andrew, if you're an HNer!
ping me by email (see profile) if you want to brainstorm privately; my two cents is biased toward academia if you have an above-average setup (don't know until I see your CV!); startups come and go with each hype cycle.
my two cents: find something where the grants align with your interests, or find a brand-new faculty member who has startup money so isn't as bound by grants in the near term. and if things don't work out, simply leave and go to industry -- with a marketable skill set like what you get from being a CS major, there's no way for you to lose. go wherever has work that you like more.
totally agreed! things can change even from one year to another with the same advisor. for instance, i'm not the same advisor to my students that i was last year, or the year before that, or the year before that. (i've only been at this for 4 years, and each year is incredibly different from the prior one.) circumstances change, resources change, and constraints change.
(no guarantees that i'll be able to answer via text, though; maybe i'll make a video later. been trying to minimize my computering time off-hours due to increasing wrist pains ... PSA: take care of your wrists, everyone!)
Thanks so much for writing this memoir! It is absolutely brilliant. Considering you had a somewhat unconventional PhD with your independent projects, what was the main role of your advisor? How do you make the most of such a situation to extract knowledge out of professors who may not have an incentive to be directly involved in your project to the usual degree?
if you can't get on someone's critical path, then you have to make it very easy for them to help you with very little time commitment, e.g.: http://pgbovine.net/how-to-ask-for-help.htm
What made you go back into academia? Based on this snapshot of your life, it felt like academia at its best gave you an outlet to explore the things you were really interested in, but at its worst had a ton of obvious drawbacks. Even in this book, it seems a lot of the best experiences stemmed from or started outside of the program (MSR), and the Ph.D., while providing support and enabling these things to happen in the first place, was no longer an active contributor to them.
I know you had mentioned you could write a whole book on this, so I'm sure there's a lot to the story.
you've inadvertently answered the question for me, to a first approximation :) i think that academia is a great launching point for a wide array of scholarly activities that the free market (i.e., industry) doesn't directly pay for: research, public policy, outreach, teaching, mentoring, industry collaborations, etc. i can work with whatever companies i want (even ones that are actively competing with each other at the moment!) and be seen as a "neutral" party; i can share knowledge via teaching and research, again with a "neutral" voice, without being seen as a spokesperson for a particular company or other special interest group. you're right, though -- there's a whole lot to the story. maybe someday i'll write something up!