
Can you use https://github.com/abetlen/llama-cpp-python, or do you need something that ollama provides?

Speaking of embeddings, have you seen https://jina.ai/news/jina-embeddings-v3-a-frontier-multiling... ?




Switching to a low-level integration will probably not improve speed; the waiting is primarily on the llama model's text generation.

It should be easy to switch embeddings.
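Switching backends is easy if the embedding call sits behind a small interface. A minimal sketch, assuming a hypothetical `Embedder` protocol and a dummy backend (a real one would call llama-cpp-python or ollama's HTTP API instead):

```python
# Sketch: isolate the embedding backend behind a tiny interface so it can be
# swapped (e.g. llama-cpp-python vs. an ollama endpoint) without touching the
# rest of the pipeline. All names here are hypothetical stand-ins.
from typing import Protocol


class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...


class DummyEmbedder:
    """Stand-in backend; a real one would wrap llama-cpp-python or an HTTP API."""

    def embed(self, text: str) -> list[float]:
        # Toy hash-derived vector, just to make the sketch runnable.
        return [float((hash(text) >> shift) & 0xFF) for shift in range(0, 32, 8)]


def index_answers(embedder: Embedder, answers: list[str]) -> dict[str, list[float]]:
    """Embed each previous answer once, keyed by its text."""
    return {a: embedder.embed(a) for a in answers}


vectors = index_answers(DummyEmbedder(), ["first answer", "second answer"])
print(len(vectors))  # one vector per answer
```

Swapping in another model then only means providing another class with the same `embed` method.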

I'm already experimenting with adding different tags to previous answers using embeddings, then using that to improve the reasoning.
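One way to read that tagging idea is as a nearest-tag lookup by cosine similarity: each tag gets an embedding, and an answer receives the tag whose vector is closest. A sketch with hypothetical toy vectors standing in for real model output:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Toy tag embeddings; a real system would embed the tag descriptions
# with the same model used for the answers.
tag_vectors = {
    "math": [1.0, 0.0, 0.1],
    "code": [0.0, 1.0, 0.1],
}


def tag_answer(answer_vec: list[float]) -> str:
    """Attach the tag whose embedding is most similar to the answer's."""
    return max(tag_vectors, key=lambda t: cosine(answer_vec, tag_vectors[t]))


print(tag_answer([0.9, 0.1, 0.0]))  # → math
```

The resulting tags can then be fed back as context when selecting which previous answers to show the model.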




