The second category essentially concerns spelling variations. Years and years ago I built a tool for searching old documents, and to deal with all the spelling variants these contain I implemented something called the Gloria Guts algorithm, which ranks words based on how much their spelling differs. As I recall, it worked much better than it sounded for our data set.
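As a rough illustration of ranking words by how much their spelling differs, here is a minimal sketch that uses plain Levenshtein edit distance. That measure is a stand-in chosen for illustration, not the algorithm named above, and the function names `edit_distance` and `rank_by_spelling` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def rank_by_spelling(query: str, candidates: list[str]) -> list[str]:
    """Sort candidates from closest to furthest spelling, ignoring case.
    Ties are broken alphabetically so the order is deterministic."""
    q = query.lower()
    return sorted(candidates, key=lambda w: (edit_distance(q, w.lower()), w))


if __name__ == "__main__":
    print(rank_by_spelling("colour", ["cooler", "collar", "color", "colour"]))
```

A search tool could run the query word against its index vocabulary this way and keep only the top few matches. For old documents, a weighted variant that makes historically common substitutions cheaper would probably fit better, but that goes beyond this sketch.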