I'm thinking about improving model response quality.

The training of existing LLMs that I'm familiar with has two parts: fine-tuning the model on additional, domain-specific data (like internal company documentation), and RLHF (e.g. comparing model responses against actual customer-service responses) to further improve how well it uses both that new data and the resources it already had. That's how https://github.com/CarperAI sets up the process, for example.
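
For instance, with CarperAI's trlx the RLHF half looks roughly like this (just a sketch from my reading of the README, so the exact signature may differ; the reward function, prompts, and model name are toy placeholders, not a real setup):

    import trlx

    # Toy reward: score each generated response by word overlap with a
    # reference customer-service reply. A real setup would use a learned
    # reward model or human preference data instead.
    def reward_fn(samples, **kwargs):
        reference = "Thanks for reaching out! Your order has been refunded."
        return [len(set(s.split()) & set(reference.split())) for s in samples]

    # Optimize a base model so its outputs get higher reward over time.
    trainer = trlx.train(
        "gpt2",                      # base model; swap in a fine-tuned checkpoint
        reward_fn=reward_fn,
        prompts=["Customer: Where is my refund?"],
    )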

What you're describing seems closer to the latter, but I'm not entirely sure whether you're following the same structure at all.




Hey, Sidd from Vellum here!

Right now we offer traditional fine-tuning with prompt/completion pairs, but not training a reward model. This works great for a lot of use cases, including classification, extracting structured data, and responding with a very specific tone and style.
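
To give a sense of what prompt/completion pairs look like as training data (a hypothetical illustration, not Vellum's actual API; the field names follow the common JSONL convention and the examples are made up):

    import json

    # Hypothetical prompt/completion pairs for a classification-style task.
    pairs = [
        {"prompt": "Classify the ticket: 'My invoice is wrong.' ->",
         "completion": " billing"},
        {"prompt": "Classify the ticket: 'The app crashes on login.' ->",
         "completion": " bug"},
    ]

    # One JSON object per line is the usual format fine-tuning endpoints expect.
    with open("train.jsonl", "w") as f:
        for pair in pairs:
            f.write(json.dumps(pair) + "\n")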

For making use of domain-specific data, we recommend using semantic search to pull the right context into the prompt at runtime, instead of trying to fine-tune a model on the entire corpus of knowledge.
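
In other words, embed the documents once, then retrieve the most relevant chunks per query and splice only those into the prompt. A minimal sketch with sentence-transformers (the model name and documents are placeholders, not what we use under the hood):

    from sentence_transformers import SentenceTransformer, util

    # Pretend corpus of internal documentation, chunked ahead of time.
    docs = [
        "Refunds are processed within 5 business days.",
        "Enterprise plans include SSO and audit logs.",
        "API rate limits are 100 requests per minute.",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_embeddings = model.encode(docs, convert_to_tensor=True)

    query = "How long do refunds take?"
    query_embedding = model.encode(query, convert_to_tensor=True)

    # Pick the most similar chunk and insert it into the prompt at runtime.
    scores = util.cos_sim(query_embedding, doc_embeddings)[0]
    best_chunk = docs[int(scores.argmax())]
    prompt = f"Context: {best_chunk}\n\nQuestion: {query}\nAnswer:"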



