Joking aside, is it common to use regular expressions for this? It seems like the method only works for languages that separate words with spaces. I think a more sophisticated lexer may be necessary, but are there any non-regex "fast approximations" that work across most languages? This isn't a problem I've tried solving before.
That just shifts the problem onto how you define \b. Since Japanese doesn't put spaces between words, it would match entire phrases or sentences as "words", treating only punctuation as word boundaries.
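To illustrate, here is a minimal Python sketch. It assumes the method in question is something like re.findall(r"\b\w+\b", text); the thread doesn't show the actual pattern, and regex_words is a hypothetical helper. Python's re module treats kanji and kana as word characters, so in the Japanese example only the full-width "。" ends a "word":

```python
import re

# Hypothetical stand-in for the regex-based word splitting discussed above.
WORD_RE = re.compile(r"\b\w+\b")

def regex_words(text: str) -> list[str]:
    """Return runs of word characters delimited by \\b boundaries."""
    return WORD_RE.findall(text)

english = "The quick brown fox jumps."
japanese = "今日は良い天気ですね。散歩に行きましょう。"

print(regex_words(english))   # ['The', 'quick', 'brown', 'fox', 'jumps']
# Kanji and kana all count as \w, so each whole sentence comes back as one "word".
print(regex_words(japanese))  # ['今日は良い天気ですね', '散歩に行きましょう']
```

The English sentence splits into five words as expected. The Japanese text, which contains several words per sentence, yields only two "words", one per sentence.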