Show HN: Ragdoll Studio (fka Arthas.AI) is the FOSS alternative to character.ai (ragdoll-studio.vercel.app)
95 points by bschmidt1 6 months ago | 24 comments



Does this do RAG over the character's chat history too? That's something SillyTavern can also do with extensions, but I figured since your project already uses Llamaindex, this feature can be something that's already baked in from the get-go.


Yep it can do CoT for ongoing conversations or to get to the bottom of something through back-and-forth. And you nailed it regarding llamaindex, they provide framework options: https://docs.llamaindex.ai/en/latest/examples/chat_engine/ch... (perfect for HN with the Paul Graham example!)

They even dabble in custom personalities with prompt mixins (example: you can chat with a PDF that responds like Shakespeare), and if this part were more robust I would delegate to it instead of what I created with ragdoll's prompt prefixes. It turns out the hard part is not converting third-person to first-person. For ragdoll, the heavy lifting is in configuring and managing different personas, its multi-modality (of models), and the Node & React libraries that let developers use them in realistic applications. The value llamaindex brings is its incredible indexing capabilities combined with a conversational query engine (why I chose llamaindex over langchain for this). Ragdoll picks up where llamaindex leaves off regarding personas.
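To make the "prompt prefix" idea concrete, here's a minimal sketch of how a persona prefix can wrap retrieved knowledge before it reaches the model. The function and field names are illustrative, not ragdoll's actual API:

```python
# Toy sketch of a persona "prompt prefix": wrap retrieved context with
# first-person persona instructions before sending it to an LLM.
# Names here are hypothetical, not ragdoll's real interface.

def build_persona_prompt(persona: str, style: str, context: str, question: str) -> str:
    prefix = (
        f"You are {persona}. Always answer in the first person, "
        f"in a {style} voice. Only use the knowledge below.\n"
    )
    return f"{prefix}\nKnowledge:\n{context}\n\nUser: {question}\n{persona}:"

prompt = build_persona_prompt(
    persona="a Shakespearean narrator",
    style="poetic",
    context="The tavern opened in 1602.",
    question="When did the tavern open?",
)
```

The point is that the persona lives entirely in the prompt layer, so the same underlying model can serve any number of characters.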

I love that SillyTavern says on their GitHub README: "On its own Tavern is useless, as it's just a user interface. You have to have access to an AI system backend that can act as the roleplay character." I want to avoid being a thin wrapper, and instead have that roleplay character aspect be central to what ragdoll does, so that it can be the de facto creative studio for any character-focused creative deliverable: A story, a film, music, games - so that a user can literally create films and music (and more) in this app like some kind of super Photoshop. I think to accomplish that, it cannot simply be a thin wrapper around an open model. It has to bring as much to the table as an ultra fine-tuned model would yet in seconds instead of years, and with the app- and community-level functionality needed (including being a free-to-use creator tool) to get people to actually build things with it.


For anyone curious about llamaindex's "prompt mixins", they're actually dead simple: https://github.com/run-llama/llama_index/blob/8a8324008764a7... - though they may no longer be supported.

I basically reinvented this wheel in ragdoll but made it more dynamic: https://github.com/bennyschmidt/ragdoll/blob/master/src/util...
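The pattern itself is just partial prompt templates merged over a base set, with later mixins winning on key conflicts. A rough sketch (illustrative only - see the linked sources for the real implementations):

```python
# The "prompt mixin" pattern in miniature: named partial templates
# that override or extend a base prompt dictionary.

BASE_PROMPTS = {
    "qa": "Answer the question using the context.\n{context}\nQ: {question}",
    "refine": "Refine the existing answer with new context.",
}

def apply_mixins(base: dict, *mixins: dict) -> dict:
    merged = dict(base)
    for mixin in mixins:
        merged.update(mixin)  # later mixins win on key conflicts
    return merged

shakespeare = {"qa": "Answer as Shakespeare would.\n{context}\nQ: {question}"}
prompts = apply_mixins(BASE_PROMPTS, shakespeare)
```

Making it "more dynamic" is mostly a matter of building these mixin dicts at runtime from user-defined persona config rather than hardcoding them.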


I don't think a tavern is a place one goes to make movies


Not yet haha but even as a place to hang out and casually chat, it would be cool if the character occasionally rendered a cutscene to go along with narratives, or you could optionally enable music and sfx like an audiobook. Maybe the most interesting ones you could export (and distribute for others to experience).

Though I bet the transition from AI text chat to rich multimedia will be like silent films to talkies - where some characters just aren't as interesting with a voiceover or depicted in a video. For some types of characters (written storytellers, etc.) the best interactions might always be text-based.

I felt this with the Final Fantasy 7 Remake: though it's clearly improved over the 1997 version, something felt lost in the transition from the old pre-rendered scenes (drawings) and reading the dialog in your head to high-quality voiceovers in beautiful 3D scenes. Yet for a Metal Gear Solid or a Madden, the richer the experience the better.

Ideally: You start out just wanting to go to the tavern and chat with a group of characters, but the interaction becomes so unexpectedly rich and entertaining that you want to capture it, so you can watch it again or share it.


I see. That's an enchanting vision indeed! And with other human players in the same space it sounds unendingly wonderful.


Similar to https://faraday.dev/ which also runs locally. I wish I could install it on desktop like Faraday to try it.


I don't quite understand how it's an open-source alternative to Character.AI. It seems like just a front-end for running vanilla open-source models, rather than something fundamentally different. Why do you need this while the OSS model chatbots are widely available on various platforms?


It's a FOSS alternative because instead of making an account on their online platform to interact with their characters, you download and run this source code - no account needed - to interact with yours and with those shared by the community, even offline if you want.

There's more to it than just chatting with AI models - these "ragdolls" are based on RAG so they are able to have distinct personalities and scoped knowledge. They can't leak info they don't know about, can't be censored from talking about things they do know about, and can be exported and shared with others on the community site. The model itself is more like the underlying engine, where the ragdoll is a specific persona that you can deploy to a variety of tasks (starting with just chat).
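The "can't leak what they don't know" property falls straight out of how retrieval works: the candidate pool is only ever the documents you scoped to the character. A toy illustration, with simple word-overlap scoring standing in for real vector search:

```python
import re

# Toy retrieval over a character's scoped documents. Nothing outside
# the docs list can ever be surfaced, which is why a RAG persona
# can't leak knowledge it was never given.

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def retrieve(docs: list[str], query: str, k: int = 1) -> list[str]:
    q = tokens(query)
    scored = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return [d for d in scored[:k] if q & tokens(d)]  # drop zero-overlap docs

lara_docs = [
    "Lara Croft is an archaeologist who explores ancient tombs.",
    "Her twin pistols are her signature weapons.",
]

hits = retrieve(lara_docs, "what weapons does she use?")
```

A real deployment would use embeddings and a vector store instead of word overlap, but the scoping guarantee is the same: off-topic queries retrieve nothing, so the persona has nothing to say about them.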

I want to go beyond what character.ai is even dreaming of doing by making this into a creative suite for all things AI: Chat, storytelling, art & visual design, cinematography, CGI, music & sound effects, and so-on. It would be great to have an open-source character-focused tool for making stories, cinematics, films, music, etc. Here's an example that uses a ragdoll as an NPC in a game that has specific knowledge that the player can uncover in order to advance in the game: https://github.com/bennyschmidt/ragdoll-studio/tree/master/e...

Thanks for the question, I will add a section on how these are fundamentally different from just chatting with ChatGPT.


Well… what do you think character AI is?


Out of curiosity: did the name change because of issues with Blizzard?


Why is there no video on the front page?


Good idea, working on making the UI better right now so I'll try to get a video up by tonight! Thanks


No problem, it's just that when someone goes to a website, they'd often love to see how it really works before using it.


So, this interacts with someone else's hosted model, or something open source you might host yourself. But these models are still more or less censored.

What does it take to build something like this that is completely sexually unshackled, that can do explicit things? I guess simply a lot of training material and money?


Ragdoll is based on RAG - the knowledge is scoped to documents you define, and is uncensored, even if the model you're using has censorship.

For example, if you provided some raunchy URLs as Lara Croft's source of knowledge, you could have realistic conversations about the content there, however sexually unshackled it may be. To test it, I searched for some words known to be banned on character.ai and tried them there. It refused to answer, tried to convince me not to talk about it, and even clarified there's a policy against these words, breaking the 4th wall. I asked the same thing to a character in Ragdoll and can confirm they will speak on it like any other knowledge you provide.

Original RAG paper: https://arxiv.org/pdf/2005.11401.pdf

Thanks for asking about this, I see this come up a lot on character.ai discussions elsewhere, and I'm going to do a NSFW case study about it soon!


Simple general purpose uncensored models without any specific dataset are already pretty good with that kind of stuff.


Interesting; would you have any specific recommendations for an intelligent (high parameter count? something that can hold a conversation...), uncensored model that is able to do this?


If anyone is looking for a cloud option (e.g. don't have the hardware, just want to pay someone else to handle the compute while still writing about generally uncensored things like a violent RPG adventure or something explicit in the adult sense), then I guess NovelAI could fit the bill: https://novelai.net/

I guess most people focus on their image generation which is pretty cool for anime characters and things like placeholder art (say, in gamedev), but their text generation options are also pretty nice and can be used either through their web UI, or something like SillyTavern: https://github.com/SillyTavern/SillyTavern

If you do want to run something locally and have the hardware, I think people discuss models in a variety of places, like https://old.reddit.com/r/LocalLLaMA/comments/16425mk/best_ns... (quick search, was curious, NSFW discussion) and it seems like there's plenty of models on HuggingFace, though I haven't particularly looked into them.

I will absolutely say that even when I was just trying to run a general-purpose chatbot LLM, it was kind of impressive how quickly things are moving with LLMs: all these GGML, GGUF and other formats, being able to use even consumer GPUs (though I couldn't get ROCm working with my RX 580), and experimental attempts like using Vulkan.


I think most custom fine-tunes and merges on HuggingFace will do this unless they specifically mention it being censored. Even the lower param models have been surprisingly good, with relatively fast progress being made in the 7b and 11b models.

My "daily driver" is Fimbulvetr v2 11b, surprisingly slapped together by an EMT. Kunoichi 7b seems to be a pretty popular model too. These can be run locally with as little as 8 GB free RAM (preferably VRAM) with an easy install solution like LMStudio or Faraday.

You can generally find a lot of recommendations in places like SillyTavernAI or LocalLLaMa on reddit:

https://old.reddit.com/r/SillyTavernAI/comments/1brig2n/whic...


https://huggingface.co/fhai50032/BeagleLake-7B-Toxic-GGUF

This one works with GGUF-compatible llama.cpp wrappers like the fully-local, fully-private Layla app or the other examples tossed around. With some prompt tweaking it's capable of a broad variety of... tasks.

This model is very compliant and doesn't tend to go to "I'm sorry, as a large language model..." type replies.


Download ollama and use one of the dolphin models. You can literally do it in two commands:

     curl -fsSL https://ollama.com/install.sh | sh
     ollama run dolphin-phi
That puts you into a chat with an uncensored LLM. "Phi" is the smallest model though.
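If you'd rather call the model from code than chat in the terminal, ollama also serves a local REST API once it's running (port 11434 by default, with a /api/generate endpoint taking model/prompt/stream fields). A standard-library sketch, assuming ollama is serving locally:

```python
import json
import urllib.request

# Build a request against ollama's local /api/generate endpoint.
# Requires `ollama run dolphin-phi` (or another model) to be serving.

def build_request(model: str, prompt: str) -> urllib.request.Request:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_request("dolphin-phi", "Say hello.")
# resp = json.load(urllib.request.urlopen(req))  # only works with ollama running
# print(resp["response"])
```

With `"stream": False` the whole completion comes back in one JSON object; leave streaming on if you want tokens as they're generated.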


Just wanted to add that ollama doesn't support text-to-image yet, ragdoll implements stable diffusion's txt2img separately. I tried that Phi one, it was very bad, not even close to ready for primetime. My favorites from ollama so far are mistral for most uses and alfred for highly instructional use cases. I can't even try some of them because of the requirements, but some of the apparently high-end ones I have tried just aren't good - unusably so. Mistral is pretty awesome, glad they are fully open.


It takes peeling yourself out of your gaming chair, getting a gym membership and trying to make something of your real life.




