It depends on whether they are using a “vanilla” instruction-tuned model or applying additional task-specific fine-tuning. Fine-tuning with data that doesn’t contain misspellings can make the model “forget” how to handle them.
In general, fine-tuned models often fail to generalize well on inputs that aren’t very close to examples in the fine-tuning data set.
We are doing a fair bit of task-specific fine-tuning for an asymmetric embeddings model (connecting user-entered descriptions of symptoms with the service solutions that resolved their issues).
I would like to run more experiments with this and see whether injecting typos into the user-entered descriptions in the fine-tuning data helps the model retain its ability to handle misspelled input (something like the sketch below).
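A minimal sketch of what that augmentation could look like, assuming character-level noise is enough to approximate real user typos. The function name `add_typos`, the typo operations (transpose, drop, duplicate, substitute), and the 5% per-character rate are all illustrative choices, not anything tuned or taken from our actual pipeline:

```python
import random

def add_typos(text, rate=0.05, seed=None):
    """Randomly corrupt a string with simple character-level typos."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["swap", "drop", "dup", "sub"])
            if op == "swap" and i + 1 < len(chars):
                # transpose two adjacent characters
                out.extend([chars[i + 1], c])
                i += 2
                continue
            if op == "drop":
                # skip the character entirely
                i += 1
                continue
            if op == "dup":
                # doubled letter
                out.extend([c, c])
                i += 1
                continue
            # substitute with a random lowercase letter
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
            i += 1
            continue
        out.append(c)
        i += 1
    return "".join(out)

# Example: noise up a symptom description before adding it to the training set.
print(add_typos("the laptop screen flickers when charging", seed=42))
```

In practice you would probably keep both the clean and the corrupted version of each description in the fine-tuning set, so the model sees the same pairing with and without typos.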