draismaa's comments

LLM evaluations are tricky. You can measure accuracy, latency, cost, hallucinations, bias... but what really matters for your app? Instead of relying on generic benchmarks, build your own evals focused on your use case, and then bring those evals into real-time monitoring of your LLM app. We open-sourced LangWatch to help with this. How are you handling LLM evals in production?
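To make that concrete, here's a rough sketch of a use-case-specific eval: a hypothetical support-bot dataset scored against the one property that matters for that app (does the answer cite a knowledge-base article?), rather than a generic benchmark. The names and data are made up, and this is plain Python, not LangWatch's API:

    # Hypothetical use-case-specific eval: "a good answer links back to our KB".
    def contains_required_citation(answer: str) -> bool:
        return "kb.example.com/articles/" in answer

    dataset = [
        {"question": "How do I reset my password?",
         "answer": "See https://kb.example.com/articles/password-reset"},
        {"question": "What is your refund policy?",
         "answer": "We offer refunds within 30 days."},
    ]

    passed = sum(contains_required_citation(row["answer"]) for row in dataset)
    print(f"pass rate: {passed / len(dataset):.0%}")  # 50% on this toy data

Once an eval like this exists, the same check can run on sampled production traffic, so the metric is monitored continuously instead of only at release time.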


Excited to introduce LangWatch, a tool for developers working with LLMs. It lets you experiment with DSPy optimizers in a simple way and monitor the performance of LLM features in your projects. Key features:

- DSPy optimizer integration: try different optimization algorithms to improve your LLM pipelines.
- Performance monitoring: track and visualize metrics for your LLM features over time.
- Open source: contributions are welcome to help improve and expand the tool's capabilities.

Feedback, suggestions, and contributions are highly appreciated.
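As an illustration of the DSPy side, here is a minimal sketch of compiling a small QA module with a DSPy optimizer (BootstrapFewShot). The model name, metric, and training examples are placeholders, and this is plain DSPy rather than LangWatch-specific code:

    import dspy
    from dspy.teleprompt import BootstrapFewShot

    dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported LM

    qa = dspy.ChainOfThought("question -> answer")

    def exact_match(example, prediction, trace=None):
        # Toy metric: the prediction must contain the gold answer.
        return example.answer.lower() in prediction.answer.lower()

    trainset = [
        dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
        dspy.Example(question="2 + 2?", answer="4").with_inputs("question"),
    ]

    optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=2)
    compiled_qa = optimizer.compile(qa, trainset=trainset)
    print(compiled_qa(question="Capital of Italy?").answer)

The point is to compare your metric before and after compilation, so you can see whether a given optimizer actually helps your feature.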


Awesome to see more open-source tools in this space. In full transparency, we're building the OSS tool https://github.com/langwatch/langwatch, which traces and monitors your LLM features and supports OpenTelemetry as well. Monitoring is key for any team building LLM features, and there is still a lot that can be done in this field. What I believe in is the power of optimizing once you understand your performance through these solutions; for example, we're using DSPy optimizers for that. Curious to hear your thoughts on this too! Congrats on the launch and all the best!
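Since OpenTelemetry came up: here is a generic OTel sketch of wrapping an LLM call in a span and attaching the attributes you'd want to monitor. It uses the standard opentelemetry-sdk with a console exporter (so it just prints spans); it isn't LangWatch's own instrumentation, and the attribute names and stubbed model call are placeholders:

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

    # Set up a tracer that prints finished spans to stdout.
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    tracer = trace.get_tracer("my-llm-app")

    def answer(question: str) -> str:
        with tracer.start_as_current_span("llm.completion") as span:
            span.set_attribute("llm.model", "gpt-4o-mini")   # placeholder model name
            completion = "stubbed model output"              # replace with a real LLM call
            span.set_attribute("llm.output.length", len(completion))
            return completion

    print(answer("How do I export traces?"))

Swap the console exporter for an OTLP exporter pointed at whichever tracing backend you use, and the same spans become your monitoring data.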



