
I've been playing around with sentence embeddings to search documents, but I wonder how useful they are as a natural language interface for a database. The way one might phrase a question might be very different, content-wise, from how the document describes the answer. Maybe it's possible to do some kind of transform where the question is transformed into a possible answer and then turned into an embedding, but I haven't found much info on that yet.

Another idea I've had is to "overfit" a generative model like GPT on a dataset, but pay more attention to how URLs and the like are tokenised.
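
For reference, here's a quick way to inspect how a URL gets split into tokens, assuming OpenAI's tiktoken package (the encoding name and example URL are just illustrative choices; boundaries depend on the vocabulary):

    # Sketch: inspect how a URL fragments under a BPE tokenizer.
    # Assumes the tiktoken package; token boundaries vary by vocabulary.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    url = "https://example.com/docs/getting-started?ref=homepage"
    tokens = enc.encode(url)
    print(len(tokens), [enc.decode([t]) for t in tokens])
    # URLs often break into many short, low-frequency pieces, which is
    # worth knowing before fine-tuning a generative model on URL-heavy data.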




> Maybe it's possible to do some kind of transform where the question is transformed into a possible answer and then turned into an embedding, but I haven't found much info on that yet.

Here you go: https://twitter.com/theseamouse/status/1614453236349693953


"The way one might phrase a question might be very different content wise from how the document describes the answer."

There are late-interaction models, which replace the single dot product with a few transformer layers and can learn more complex query-document semantics.

Of course this would adversely affect latency and embedding size, so you might want to compress and cache the answers, hence (shameless plug):

https://aclanthology.org/2022.acl-long.457/
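
For intuition, one concrete late-interaction scorer is the MaxSim operator from ColBERT: keep one embedding per token and sum, over query tokens, the best match against any document token. A toy sketch with random stand-in vectors rather than a trained model (the linked paper's exact interaction may differ):

    # Toy ColBERT-style late-interaction (MaxSim) scoring with random vectors.
    import numpy as np

    def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
        # Sum over query tokens of their max cosine similarity to any doc token.
        q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
        d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
        sim = q @ d.T                        # (num_query_tokens, num_doc_tokens)
        return float(sim.max(axis=1).sum())  # best doc match per query token, summed

    rng = np.random.default_rng(0)
    query = rng.normal(size=(5, 128))    # 5 query tokens, 128-dim each
    doc = rng.normal(size=(40, 128))     # 40 document tokens
    print(maxsim_score(query, doc))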


Embeddings can be trained specifically so that questions and the passages containing their answers have similar representations in latent space. This has been used to create QA retrieval systems. Here's one commonly used example:

https://huggingface.co/sentence-transformers/multi-qa-MiniLM...
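
A minimal usage sketch with the sentence-transformers library; the full model ID is assumed to be sentence-transformers/multi-qa-MiniLM-L6-cos-v1 (the link above is truncated), and the query and passages are made up:

    # QA retrieval sketch with a question/answer-tuned embedding model.
    # Assumes sentence-transformers is installed and that the model ID below
    # is the checkpoint the (truncated) link points at.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("sentence-transformers/multi-qa-MiniLM-L6-cos-v1")

    query = "How do I reset my password?"
    passages = [
        "Account credentials can be changed from the settings page.",
        "Our office is open Monday through Friday.",
    ]

    query_emb = model.encode(query, convert_to_tensor=True)
    passage_embs = model.encode(passages, convert_to_tensor=True)

    scores = util.cos_sim(query_emb, passage_embs)[0]
    best = int(scores.argmax())
    print(passages[best], float(scores[best]))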


In your first paragraph, you are describing Hypothetical Document Embeddings (HyDE) [0]. I've tested it out, and in certain cases it works amazingly well for getting more complete answers.

[0] https://python.langchain.com/en/latest/modules/chains/index_...
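
For the gist without LangChain's wrapper: generate a hypothetical answer with an LLM, then embed and search with that instead of the raw question. A rough sketch where generate_answer and embed are placeholders for whatever LLM and embedding calls you already use (not any specific library's API):

    # HyDE sketch: embed a hypothetical answer rather than the question itself.
    # generate_answer() and embed() are placeholders, not a real library's API.
    def hyde_query_embedding(question: str, generate_answer, embed):
        # 1. Ask the LLM for a plausible (possibly wrong) answer passage.
        hypothetical = generate_answer(
            f"Write a short passage that answers the question:\n{question}"
        )
        # 2. Embed the hypothetical passage; it tends to land closer to real
        #    answer passages in embedding space than the bare question does.
        return embed(hypothetical)

    # Retrieval then proceeds as usual: compare this embedding against your
    # document embeddings and take the top hits.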


> might phrase a question might be very different, content-wise, from how the document describes the answer

That's what hypothetical embeddings solve: https://summarity.com/hyde

There are also encoding schemes for question-answer retrieval (e.g. ColBERT).


> The way one might phrase a question might be very different, content-wise, from how the document describes the answer.

If the embeddings are worth their salt, then they should not be influenced by paraphrasing with different words. Try the OpenAI embeddings or sbert.net embedding models.
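
If you want to sanity-check that on your own data, here's a sketch with the current OpenAI embeddings client (the model name text-embedding-3-small and the example sentences are just illustrative; requires OPENAI_API_KEY to be set):

    # Check that a question and a differently-worded answer embed close together.
    # Assumes the openai>=1.0 client; the model name is an illustrative choice.
    from openai import OpenAI
    import numpy as np

    client = OpenAI()

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    q, p = embed([
        "How can I make my laptop battery last longer?",
        "Reducing screen brightness and closing background apps extends battery life.",
    ])
    # Cosine similarity: should be high despite little word overlap.
    print(float(q @ p / (np.linalg.norm(q) * np.linalg.norm(p))))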


Is there an example of what you're talking about?

Also, would you just return a list of likely candidates, loop over the result set to see if any info is relevant to the question, and then have the final pass try to answer the question?
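
Sketching that flow concretely (embed, vector_search and ask_llm below are placeholders for whatever embedding model, vector store and LLM you use; this is the generic retrieve-then-read pattern, not a specific library):

    # Schematic retrieve-then-read loop.
    def answer_question(question: str, embed, vector_search, ask_llm, k: int = 5):
        # 1. Retrieve a short list of likely-relevant candidate passages.
        candidates = vector_search(embed(question), top_k=k)
        # 2. Final pass: give only those passages to the model and ask for an answer.
        context = "\n\n".join(candidates)
        prompt = (
            "Answer the question using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        return ask_llm(prompt)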


How many embeddings can fit into a single input?


Embeddings are really good at that; you don't need to use similar words at all.



