I deleted my LinkedIn. Email or text me. 415 629 9329 or bastianlehmann@gmail.com - I do not like to invest in anything related to food though.

Yeah exactly, the existing benchmark datasets are underutilized (e.g. KILT, Natural Questions, etc.).

But it is only natural that different QA use cases require different strategies. I've built 3 production RAG systems / virtual assistants now, plus 4 that didn't make it past PoC, and which advanced techniques work really depends on the document type, text content and genre, use case, source knowledge base structure, available metadata to exploit, etc.

Current go-to is semantic similarity chunking (with overlap) + title or question generation > retriever with fusion over bi-encoder vector similarity + classic BM25 > condensed-question reformulation QA agent (a sketch of the fusion step is below). If you don't get decent results with that setup, there is no hope.
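To make that concrete, here's a minimal sketch of the hybrid retrieval step using reciprocal rank fusion. sentence-transformers and rank_bm25 are assumed libraries, and the model name, corpus, and constants are placeholders, not necessarily what any given production system uses:

    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer, util

    chunks = [
        "First chunk of the knowledge base ...",
        "Second chunk of the knowledge base ...",
    ]  # output of the semantic-similarity chunker, overlap included

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder bi-encoder
    chunk_embs = model.encode(chunks, convert_to_tensor=True)
    bm25 = BM25Okapi([c.lower().split() for c in chunks])

    def hybrid_search(query, k=5, rrf_k=60):
        # Dense ranking: cosine similarity between query and chunk embeddings.
        q_emb = model.encode(query, convert_to_tensor=True)
        dense_rank = util.cos_sim(q_emb, chunk_embs)[0].argsort(descending=True).tolist()
        # Sparse ranking: classic BM25 over whitespace-tokenized chunks.
        sparse_scores = bm25.get_scores(query.lower().split())
        sparse_rank = sorted(range(len(chunks)), key=lambda i: -sparse_scores[i])
        # Reciprocal rank fusion: each ranked list contributes 1 / (rrf_k + rank),
        # so chunks near the top of either list float to the top of the fused list.
        fused = {}
        for rank_list in (dense_rank, sparse_rank):
            for rank, idx in enumerate(rank_list):
                fused[idx] = fused.get(idx, 0.0) + 1.0 / (rrf_k + rank)
        top = sorted(fused, key=fused.get, reverse=True)[:k]
        return [chunks[i] for i in top]

The condensed-question reformulation typically happens before this step: the agent rewrites the user's turn plus chat history into a standalone query, and that rewritten string is what gets passed to the retriever.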

For every project we start building a use-case eval set immediately, in parallel with the actual RAG agent, even when the client doesn't consider it a priority. We've convinced them all it's highly important, because it is.

Having an evaluation set is doubly important in GenAI projects: a generative system will do unexpected things, and you need an objective measure. Your client will run into weird behaviour when testing and get hung up on a 1-in-100 undesirable generation.
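The eval set doesn't need to be elaborate to be useful. A minimal sketch, assuming a JSONL file of question / expected-answer-span pairs; the file name, field names, and hit-rate metric are all placeholders, not a specific framework's API:

    import json

    def evaluate(search_fn, eval_path="eval_set.jsonl", k=5):
        cases = [json.loads(line) for line in open(eval_path)]
        hits = 0
        for case in cases:
            retrieved = search_fn(case["question"], k=k)
            # Count a hit if any retrieved chunk contains the gold answer span.
            if any(case["answer_span"] in chunk for chunk in retrieved):
                hits += 1
        print(f"hit@{k}: {hits / len(cases):.2%} over {len(cases)} cases")

    evaluate(hybrid_search)  # plug in the retriever sketched above

Even a few dozen hand-written cases catch retrieval regressions and give the client something objective to point at, instead of arguing about one bad generation.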


What are the gold standard algorithms now?
