Support for _some_ embedding models works in Ollama (and in llama.cpp; BERT-style models specifically):

  ollama pull all-minilm

  curl http://localhost:11434/api/embeddings -d '{
    "model": "all-minilm",
    "prompt": "Here is an article about llamas..."
  }'
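The endpoint responds with a JSON object containing an "embedding" array of floats. As a rough sketch (my own, not from the Ollama docs; assumes Ollama is running locally on the default port), here's how you might call it from Python and compare two texts with cosine similarity:

  import json
  import math
  import urllib.request

  def embed(text, model="all-minilm"):
      # POST to Ollama's /api/embeddings endpoint, same shape as the curl example above.
      req = urllib.request.Request(
          "http://localhost:11434/api/embeddings",
          data=json.dumps({"model": model, "prompt": text}).encode(),
          headers={"Content-Type": "application/json"},
      )
      with urllib.request.urlopen(req) as resp:
          return json.load(resp)["embedding"]

  def cosine(a, b):
      # Cosine similarity: closer to 1.0 means more semantically similar.
      dot = sum(x * y for x, y in zip(a, b))
      norm_a = math.sqrt(sum(x * x for x in a))
      norm_b = math.sqrt(sum(y * y for y in b))
      return dot / (norm_a * norm_b)

  print(cosine(embed("Here is an article about llamas..."),
               embed("A post describing camelids")))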
Embedding models run quite well even on CPU since they are much smaller than generative models. There are also implementations with a library form factor, like transformers.js (https://xenova.github.io/transformers.js/) and sentence-transformers (https://pypi.org/project/sentence-transformers/).
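If you'd rather skip running a server, the sentence-transformers version looks roughly like this (the model name all-MiniLM-L6-v2 is my assumption for the library's counterpart to Ollama's all-minilm):

  from sentence_transformers import SentenceTransformer

  # Downloads the model on first use; runs fine on CPU.
  model = SentenceTransformer("all-MiniLM-L6-v2")
  vecs = model.encode(["Here is an article about llamas..."])
  print(vecs.shape)  # (1, 384): one 384-dimensional vector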
