Support for _some_ embedding models works in Ollama (and in llama.cpp; BERT-style models specifically):

  ollama pull all-minilm

  curl http://localhost:11434/api/embeddings -d '{
    "model": "all-minilm",
    "prompt": "Here is an article about llamas..."
  }'
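The endpoint responds with a JSON object containing an "embedding" array of floats. As a rough sketch (my own, not from the Ollama docs; assumes Ollama is running locally on the default port), here's how you might call it from Python and compare two texts with cosine similarity:

  import json
  import math
  import urllib.request

  def embed(text, model="all-minilm"):
      # POST to Ollama's /api/embeddings endpoint, same shape as the curl example above.
      req = urllib.request.Request(
          "http://localhost:11434/api/embeddings",
          data=json.dumps({"model": model, "prompt": text}).encode(),
          headers={"Content-Type": "application/json"},
      )
      with urllib.request.urlopen(req) as resp:
          return json.load(resp)["embedding"]

  def cosine(a, b):
      # Cosine similarity: closer to 1.0 means more semantically similar.
      dot = sum(x * y for x, y in zip(a, b))
      norm_a = math.sqrt(sum(x * x for x in a))
      norm_b = math.sqrt(sum(y * y for y in b))
      return dot / (norm_a * norm_b)

  print(cosine(embed("Here is an article about llamas..."),
               embed("A post describing camelids")))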
Embedding models run quite well even on CPU since they are much smaller than generative models. There are also implementations with a library form factor, like transformers.js (https://xenova.github.io/transformers.js/) and sentence-transformers (https://pypi.org/project/sentence-transformers/).
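If you'd rather skip running a server, the sentence-transformers version looks roughly like this (the model name all-MiniLM-L6-v2 is my assumption for the library's counterpart to Ollama's all-minilm):

  from sentence_transformers import SentenceTransformer

  # Downloads the model on first use; runs fine on CPU.
  model = SentenceTransformer("all-MiniLM-L6-v2")
  vecs = model.encode(["Here is an article about llamas..."])
  print(vecs.shape)  # (1, 384): one 384-dimensional vector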
