
not a token, and not the transformers, but yes, commercial chat models are fine-tuned on text transcripts containing dialogues. (i believe llama-2 was as well)



Are you sure? I have never seen an LLM that did not have a special token for start of text. I'm certain that llama had one, and I don't remember anywhere in the llama-2 paper where they said they removed it.


tl;dr: you're right

it's messy though, bear with me for the full explanation:

- your initial post says "<bot>" token, which looked to me like a mix of "chatbot" and ChatML, the format used by OpenAI

- there is a BOS (beginning-of-sequence) token, with an S rather than a T, which acts as you described

- I averaged my attention over your post and the initial reply, which answers as if you were using "<bot>" in the misunderstood way

- when I go back and read your post, I realize the chatbot interpretation doesn't quite make sense, since you're referring to much more technical aspects than general "how do I AI", i.e. you understand <X> as a way to denote special tokens, not necessarily an XML tag
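
For anyone following along, here is a rough sketch of the distinction, assuming a toy whitespace tokenizer. The `encode` and `looks_like_special_token` helpers are made up for illustration and are not any library's API. The only real detail is that Llama's vocabulary spells its BOS token as `<s>`:

```python
# Toy illustration only: a hypothetical whitespace "tokenizer", not Llama's real one.
BOS = "<s>"  # Llama's vocabulary uses "<s>" as its beginning-of-sequence token


def encode(text: str, add_bos: bool = True) -> list[str]:
    """Split on whitespace and optionally prepend the BOS special token."""
    tokens = text.split()
    return [BOS] + tokens if add_bos else tokens


def looks_like_special_token(token: str) -> bool:
    """Angle brackets here denote a special token by convention, not an XML tag."""
    return token.startswith("<") and token.endswith(">")


tokens = encode("hello world")
print(tokens)
print([t for t in tokens if looks_like_special_token(t)])
```

The point is just that the BOS marker is a single vocabulary entry prepended to the sequence, which is a separate thing from a chat template's role markers.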



