
Can embeddings be used to capture stylistic features of text rather than semantic ones, like writing style?



Probably, but you might need something more sophisticated than cosine distance. For example, you might take a dataset of business letters, diary entries, and fiction stories, train a classifier on top of the embeddings of each of the three types of text, then run (embeddings --> your classifier) on new text. But at that point you might just want to ask an LLM directly with a prompt like: "Classify the style of the following text as business, personal, or fiction: $YOUR TEXT$"
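
A minimal sketch of that embeddings-then-classifier pipeline, assuming sentence-transformers and scikit-learn; the model name, tiny training set, and three-class labels are just placeholders, not a recommendation:

    # Rough sketch of "train a classifier on top of the embeddings".
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

    train_texts = [
        "Please find attached the Q3 invoice for your records.",  # business
        "Dear diary, today was quieter than I expected.",          # personal
        "The dragon circled the tower twice before landing.",      # fiction
    ]
    train_labels = ["business", "personal", "fiction"]

    X_train = model.encode(train_texts)          # one vector per text
    clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

    new_text = ["We regret to inform you that the shipment is delayed."]
    print(clf.predict(model.encode(new_text)))   # e.g. ['business']

In practice you would want far more than one example per class, but the shape of the pipeline stays the same.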


You may get much more accurate results from relatively small models, plus a logit for each class, if you ask one yes/no question per class instead.
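
A sketch of that one-question-per-class idea, reading the model's logit for "Yes" at the final position; it assumes a small causal LM from Hugging Face transformers, with "gpt2" as a stand-in model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    text = "Please find attached the Q3 invoice for your records."
    classes = ["a business letter", "a personal diary entry", "a fiction story"]

    yes_id = tok.encode(" Yes", add_special_tokens=False)[0]
    for c in classes:
        prompt = f"Text: {text}\nIs this text {c}? Answer Yes or No. Answer:"
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]   # next-token logits
        print(c, logits[yes_id].item())              # higher = more "Yes"

Comparing the "Yes" logit across the three prompts gives you a per-class score rather than a single free-form answer.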


Likely not; embeddings are quite crude. The embedding of a text is roughly just an average of the "meanings" of its words.

As it stands, embedding models also lack many of the tricks that made transformers so effective.
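
A toy illustration of that "average of word meanings" view: mean-pooling per-token vectors into one text vector. The token vectors below are random placeholders; real embedding models pool contextual transformer states, but the averaging step looks like this.

    import numpy as np

    # hypothetical per-token vectors (4 tokens, 5 dimensions)
    token_vectors = np.random.rand(4, 5)

    text_embedding = token_vectors.mean(axis=0)   # one vector for the whole text
    print(text_embedding.shape)                   # (5,)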



