
Can embeddings be used to capture stylistic features of text rather than semantic ones, like writing style?



Probably, but you might need something more sophisticated than cosine distance. For example, you might take a dataset of business letters, diary entries, and fiction stories, train a classifier on top of the embeddings of each of the three types of text, then run (embeddings --> your classifier) on new text. But at that point you might just want to ask an LLM directly with a prompt like: "Classify the style of the following text as business, personal, or fiction: $YOUR TEXT$"
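
A minimal sketch of that embeddings-then-classifier pipeline, assuming sentence-transformers and scikit-learn; the model name, tiny training set, and three-class labels are just placeholders, not a recommendation:

    # Rough sketch of "train a classifier on top of the embeddings".
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

    train_texts = [
        "Please find attached the Q3 invoice for your records.",  # business
        "Dear diary, today was quieter than I expected.",          # personal
        "The dragon circled the tower twice before landing.",      # fiction
    ]
    train_labels = ["business", "personal", "fiction"]

    X_train = model.encode(train_texts)          # one vector per text
    clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

    new_text = ["We regret to inform you that the shipment is delayed."]
    print(clf.predict(model.encode(new_text)))   # e.g. ['business']

In practice you would want far more than one example per class, but the shape of the pipeline stays the same.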


You may get much more accurate results from relatively small models, plus a logit for each class, if you ask one yes/no question per class instead.
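
A sketch of that one-question-per-class idea, reading the model's logit for "Yes" at the final position; it assumes a small causal LM from Hugging Face transformers, with "gpt2" as a stand-in model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    text = "Please find attached the Q3 invoice for your records."
    classes = ["a business letter", "a personal diary entry", "a fiction story"]

    yes_id = tok.encode(" Yes", add_special_tokens=False)[0]
    for c in classes:
        prompt = f"Text: {text}\nIs this text {c}? Answer Yes or No. Answer:"
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]   # next-token logits
        print(c, logits[yes_id].item())              # higher = more "Yes"

Comparing the "Yes" logit across the three prompts gives you a per-class score rather than a single free-form answer.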


Likely not; embeddings are quite crude. The embedding of a text is roughly just an average of the "meanings" of its words.

As it stands, embedding models also lack many of the tricks that made transformers so effective.
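
A toy illustration of that "average of word meanings" view: mean-pooling per-token vectors into one text vector. The token vectors below are random placeholders; real embedding models pool contextual transformer states, but the averaging step looks like this.

    import numpy as np

    # hypothetical per-token vectors (4 tokens, 5 dimensions)
    token_vectors = np.random.rand(4, 5)

    text_embedding = token_vectors.mean(axis=0)   # one vector for the whole text
    print(text_embedding.shape)                   # (5,)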



