Hacker News

Mixtral 8x7B does indeed have this characteristic: it tends to disregard requirements for structured output and often adds unnecessary text in a very casual tone. In my experience, though, models like Qwen 72B are more controllable in this respect, at least on par with GPT-3.5.
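One common workaround for this kind of chatty output (a minimal sketch, not something the comment describes; the function name and example reply are hypothetical) is to strip the prose surrounding the structured payload before parsing it:

```python
import json
import re


def extract_json(reply: str) -> dict:
    """Pull the first JSON object out of a chatty model reply.

    Models that ignore strict-output instructions often wrap the JSON
    in conversational filler; this keeps only the span between the
    outermost braces and parses it.
    """
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))


# A hypothetical "casual" reply of the kind described above.
reply = 'Sure! Here is the data you asked for:\n{"name": "qwen", "params": 72}\nHope that helps!'
print(extract_json(reply))  # → {'name': 'qwen', 'params': 72}
```

This only handles the simple case of one JSON object per reply; a production setup would typically also validate the parsed result against a schema and retry on failure.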


