
  cljs.user> (re-seq #"\s\w+\s" "やり直して")
  nil
Joking aside, is it common to use regular expressions for this? The method seems to work only for languages that put spaces between words. I think a more sophisticated lexer may be necessary, but are there any non-regex "fast approximations" that work across most languages? This is a problem I have not tried solving before.



Sorry! That's because there are no spaces around the word. A more correct regexp would be \b\w+\b, which uses zero-width "word boundary" assertions instead of spaces.
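For anyone who wants to see the difference, here is a minimal sketch in Python rather than ClojureScript (my choice, not the parent's). The sample string is made up for illustration. The space-delimited pattern only finds words with whitespace on both sides, while the \b version picks up words at the edges too.

```python
import re

text = "try it again"  # hypothetical sample input

# Space-delimited pattern: a word needs whitespace on both sides,
# so the first and last words are never matched.
spaced = re.findall(r"\s\w+\s", text)

# Word-boundary pattern: \b is zero-width, so no surrounding
# whitespace is consumed or required.
bounded = re.findall(r"\b\w+\b", text)

print(spaced)   # [' it ']
print(bounded)  # ['try', 'it', 'again']
```

The space-delimited pattern also consumes the spaces it matches, so in longer inputs it can skip every other word.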


That just pushes the problem onto how you define \b. Since Japanese uses no spaces, it would match entire phrases or sentences as "words", treating only punctuation as word boundaries.
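A quick sketch of that behaviour, using Python as a stand-in because its \w is Unicode-aware for str patterns (an assumption about the engine, not about the original ClojureScript REPL). The helper name boundary_words and the second sample sentence are made up for illustration.

```python
import re

def boundary_words(text):
    # Python's \w covers Unicode letters for str patterns,
    # so kana and kanji count as word characters here.
    return re.findall(r"\b\w+\b", text)

phrase = boundary_words("やり直して")
sentence = boundary_words("やり直して、もう一度。")

print(phrase)    # ['やり直して']
print(sentence)  # ['やり直して', 'もう一度']
```

Each run of text between punctuation marks comes back as a single "word". Engines whose \w is ASCII-only (JavaScript's, for instance) would match nothing here instead, so neither flavour gives real word segmentation.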



