Show HN: AI companions stack – create and host your own AI companions (github.com/a16z-infra)
112 points by ykhli on July 12, 2023 | 41 comments



It's not "host your own AI" at all! You need something like five SaaS API keys to operate this app. But sure, you can host the UI on your own.
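For a sense of scale, the .env you end up assembling looks something like this (the variable names here are illustrative placeholders, not copied from the repo; check its README for the real ones):

    # Illustrative .env -- names are placeholders, not the repo's exact keys
    OPENAI_API_KEY=...           # chat completions
    REPLICATE_API_TOKEN=...      # hosted open-source models
    PINECONE_API_KEY=...         # vector store for companion "memory"
    CLERK_SECRET_KEY=...         # auth
    UPSTASH_REDIS_REST_TOKEN=... # chat history / rate limiting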


I think we’ll be in platform-reliance mode for quite a while. I think of all of these SaaS companies as different icons on an abstract AWS console that doesn’t exist yet.

We wouldn’t bat an eye at using S3, EC2, and RDS as a "host your own" setup. The only difference here is that startups are moving faster than incumbents.

FWIW that’s one reason why Steamship (disclaimer: I’m the founder) aggregates all AI services under a single API key and interface. It’s to deal with the insane glue-code hassle of running this stuff on your own.


That’s fine, but the title is wrong.


And I would argue that using S3, RDS, and EC2 is hosting your own, since you're building on foundational components. Same if you run LLaMA on a rented A100.


Not the same at this point. Setting up properly secured AWS infrastructure is arguably more demanding than setting up your own box at home.


Well, it is a round-up of their venture investments in the space ;-)

If you're looking to self-host chat memory rather than go all in on Supabase, there's Zep: https://github.com/getzep/zep

Full disclosure: I'm a co-author.


Supabase is also self-hostable though.


Also worth noting that they aren’t investors in Supabase (but we’re very appreciative that they included us in this stack).


I stand corrected. Sometimes my cynicism gets the best of me ;-)


It doesn't seem to be a major barrier for most (it absolutely is for me). Is there enough of a market for tools that integrate with common OSS LLM APIs like oobabooga/Kobold/Novel/vLLM, etc.? One idea I've been toying with is a BYO-model tool with support for these APIs, e.g. for IDE integration, brainstorming, etc. It seems like a good approach to me, but how many people would actually bother standing up these LLM engines with APIs to use it?
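As a rough sketch of the integration layer, assuming the backend exposes an OpenAI-compatible HTTP endpoint (vLLM and oobabooga's API extension can both do this; the port and model name below are made up for illustration):

    import requests

    BASE_URL = "http://localhost:8000/v1"  # point at whatever engine you stood up

    def complete(prompt: str) -> str:
        resp = requests.post(
            f"{BASE_URL}/completions",
            json={"model": "local-model", "prompt": prompt, "max_tokens": 256},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["text"]

    print(complete("Refactor this function to be tail-recursive:"))

The nice part is the tool only needs a configurable base URL, so swapping engines is a config change rather than a code change.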


It makes me wonder how long it will be until open-source, locally-run AI chatbots reach the GPT-4 level. Five years? Ten?


The real limiter is having enough GPU RAM to train and run a usefully large model. Everything else is just fiddly details.


Not anymore. Llama.cpp and kobold.cpp run well with low VRAM and on non-Nvidia GPUs.

Hugging Face is full of finetunes now, and I believe a 33B model can be finetuned on a single 3090.

Llama.cpp is developing some kind of training support, but I have no idea what the requirements will be.
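If it helps anyone, partial GPU offload is basically a one-liner with the llama-cpp-python bindings (the model path and layer count below are illustrative; tune n_gpu_layers to your VRAM):

    # pip install llama-cpp-python (built with cuBLAS/ROCm for GPU offload)
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-13b.ggmlv3.q4_0.bin",  # any quantized GGML file
        n_gpu_layers=20,  # offload whatever fits in VRAM; 0 = pure CPU
        n_ctx=2048,       # context window
    )
    out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
    print(out["choices"][0]["text"])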


To me, GPT's real "secret sauce" is how it primes the model for interactive prompting instead of just text completion.


There are "chat" and "instruct" variants of most models now.


It depends on the task; I’d argue some of the open-source alternatives are at the GPT-4 level already, particularly for code generation.

That said, I suspect summarization, translation, etc. will take more time. I’d guess under a year.


>I’d argue some of the open-source alternatives are at the GPT-4 level already, particularly for code generation.

Like which one?


Large-context Chronos 33B (and some mixes) for roleplaying-type chat.

And there are some very new 65B finetunes I haven't tried yet.


The open-source bot I played with last week (Wizard Vicuna Uncensored) was at 85% of GPT-3.5 on a VERY hard use case (fiction stories).

Maybe a year before we're at the level of Stable Diffusion?


When it happens, I think it’ll happen on our phones first.


Given the matrix operations the hardware fundamentally has to do, that seems unlikely, unless it's something very scaled down.


It's probably more a question of hardware progress than software. There's a minimum scale for these behaviors to emerge.


I hosted my own AI using SillyTavern plus a text-generation backend (e.g. oobabooga / KoboldAI); you just need enough horsepower to crank the LLM.


Koboldcpp (with ngrok if you need it) is another excellent self-hosting solution.

A 13B model will work on 16 GB of RAM, and a 33B on 32 GB, with pretty much any dGPU for a little acceleration and RAM offloading.

Doubly so if you host it as an AI Horde node (so you have priority access to many models through the web browser).
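Those numbers track the back-of-the-envelope math for 4-bit quantization, roughly half a byte per parameter plus overhead (a crude estimate; the KV cache also grows with context length):

    # Crude RAM estimate for a 4-bit quantized model.
    def est_ram_gb(params_billion: float, bytes_per_param: float = 0.5) -> float:
        return params_billion * bytes_per_param + 2.0  # ~2 GB overhead guess

    for size in (13, 33):
        print(f"{size}B ~ {est_ram_gb(size):.1f} GB")  # 13B ~ 8.5, 33B ~ 18.5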


I don't get why I wouldn't just keep an open ChatGPT tab for specific topics. They'll have big enough context windows very soon. Why build another UI, deployment pipelines, and all that jazz?

P.S. Nobody will *SMS* the companion.


For me, it would be cost and alignment. If I own the software, I can choose whatever alignment suits me, or none at all. And ChatGPT is $20/month (assuming you want GPT-4, and I do).

But there's still a good argument for a hybrid solution: buy GPT-4 access through the API and get a native UI to query it. Paying as you go is much cheaper, and someone else is still handling the heavy lifting. But if you want an uncensored model, you're out of luck.
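The pay-as-you-go version really is just a few lines against the REST API (model name as of mid-2023; you pay per token instead of the flat $20):

    import os
    import requests

    def ask_gpt4(prompt: str) -> str:
        resp = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": "gpt-4",
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    print(ask_gpt4("Explain pay-as-you-go pricing in one sentence."))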


ChatGPT is amazing but ultimately unwieldy for directed, long-running relationships more nuanced than generic themed chit-chat.

We’re in the nascent stages, but I think there will probably always be a community of folks who want to add more nuance to the communication, whether it’s reveries that enact a mood or goal, tie-ins to other services, etc.

E.g., imagine wanting your ChatGPT D&D master to also keep some kind of score. It may ultimately be easiest to put a wrapper around a themed GPT window that imposes a predictable way to do that, rather than requiring everyone to figure out how to prompt it correctly.
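A minimal sketch of that kind of wrapper (the STATE convention here is my own invention, not anything GPT enforces): tell the model to append machine-readable state after each reply, then strip and parse it before display.

    import json
    import re

    # System prompt you'd send along with each request.
    SYSTEM = ('You are a D&D dungeon master. End every reply with one line like '
              'STATE: {"score": <int>, "hp": <int>} and nothing after it.')

    def split_state(reply: str):
        """Separate the prose from the trailing STATE: JSON line."""
        m = re.search(r'STATE:\s*(\{.*\})\s*$', reply, re.DOTALL)
        if not m:
            return reply, None  # model didn't comply; fall back to raw text
        return reply[:m.start()].rstrip(), json.loads(m.group(1))

    text, state = split_state('You slay the goblin!\nSTATE: {"score": 120, "hp": 9}')
    print(text)   # You slay the goblin!
    print(state)  # {'score': 120, 'hp': 9}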


But here they created a wrapper web page for a web page.


It's mostly horny teenagers, tweens, and 4chan degenerates. They don't need a good product; they just need an uncensored one.


It's roughly comparable to how you can use a spreadsheet to do a lot of different things, but it improves the UX (with some trade-offs) to have more custom-designed UIs instead of using a spreadsheet directly.


As someone working in a large non-software org, I can say that statement does not hold. Spreadsheets are the underlying infrastructure for a mildly terrifying amount of modern civilization.


Privacy is a big one. ChatGPT (the website) gives OpenAI the right to use your conversations for model training (unless you turn conversation history off, but that feature is rather important to the UX).

Anything going through the API, on the other hand, comes with a commitment not to use it for training and to purge the history after a month.


Tangentially related: how much would you be willing to pay a company that could deliver a local model implementation running on high-tier consumer-grade hardware at reduced ability? I feel like even if there were some severe restrictions (model not open source, DRM, etc.), I'd still be willing to fork out for it.


I've seen at least one startup around this idea posted on HN: https://codecomplete.ai/

Probably also tiny corp for self hosting: https://geohot.github.io/blog/jekyll/update/2023/05/24/the-t...

I think there's tremendous concern among business execs about letting data exfiltrate to any AI SaaS offering. If you could offer an experience minus the cloud that's easy to use and has compliance/logging features, I think you'd find success in industries that are reticent to share customer data (e.g., banking, government) but would benefit greatly from the NLP workflows that LLMs enable.


MLC Chat lets you run RedPajama 3B on an iPhone. It's free. You'd need to specify what type of model you're thinking about, I guess?


Is this the promise of Amazon SageMaker?


I love how honest the README is:

> Shortcomings

> Oh, there are so many.


Good job Yoko!


Is this a16z associated with the VC a16z?


Yeah. https://twitter.com/stuffyokodraws/status/167879943947197235...

Smart. A VC firm that heavily invests in software platforms will make much better decisions if they have first-hand experience using the products.


You mean Kaggle, don't you?



