Ruby's character encoding and Unicode support is pretty strong. I'm intrigued ho...

pyre · on Oct 19, 2011

tchrist's OSCON Unicode talks:

http://98.245.80.27/tcpc/OSCON2011/index.html

Specifically the third talk:

http://98.245.80.27/tcpc/OSCON2011/gbu.html
http://98.245.80.27/tcpc/OSCON2011/gbu.pdf

Excerpts:

  Its String functions like upcase or capitalize won’t even look at
  anything but ASCII. 

  It’s completely missing a whole lot of critical Unicode
  functionality:

    casemapping & -folding
    grapheme support
    normalization
    collation
    text segmentation, &c &c &c. 


  Every Ruby string carries around its encoding, instead of sanely
  unifying into Unicode internally like nearly everything else does.
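To make that last excerpt concrete, here is a minimal sketch of what "carries around its encoding" looks like in practice. It assumes Ruby 1.9 or later, and the helper name `concat_or_error` is made up for illustration; it is not from the talk.

```ruby
# Hypothetical helper: try to concatenate two strings and report what happened.
def concat_or_error(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e.class
end

utf8   = "caf\u00E9"                 # the \u escape makes this literal UTF-8
latin1 = utf8.encode("ISO-8859-1")   # same text, different per-string encoding

puts utf8.encoding     # UTF-8
puts latin1.encoding   # ISO-8859-1
puts utf8.bytesize     # 5 (the accented e takes two bytes in UTF-8)
puts latin1.bytesize   # 4 (and one byte in ISO-8859-1)

# Two non-ASCII strings with different encodings are not unified
# automatically; concatenating them raises.
p concat_or_error(utf8, latin1)                   # Encoding::CompatibilityError

# After an explicit transcode both sides are UTF-8 and the concatenation works.
p concat_or_error(utf8, latin1.encode("UTF-8"))
```

The point is only that the encoding is a property of each string object, so mixing sources means transcoding by hand at the boundaries.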

Also:

  > baked right in to the language

is not synonymous with "intelligently implemented".

Note that I wasn't implying that "half-ass," "partial," and "just plain wrong" all necessarily apply to Python's and/or Ruby's implementations. Some may apply in some areas and not in others. Really this extends beyond Python and Ruby, but I'm trying to stay in context here.

petercooper · on Oct 19, 2011

This is interesting stuff - thanks for sharing, I'll be checking it out. The upcase/downcase stuff definitely checks out so far :-)
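For anyone who wants to repeat the check, a minimal sketch follows. It assumes Ruby 2.4 or later, where `upcase` accepts an `:ascii` option that restricts case mapping to A-Z/a-z. That is the behaviour the excerpt describes, and on the Ruby versions current in 2011 plain `upcase` behaved that way without any option. The method name `compare_upcase` is made up for illustration.

```ruby
# Hypothetical helper: upcase a word with ASCII-only mapping and with the
# default mapping, so the two results can be compared side by side.
# Assumes Ruby >= 2.4 (the :ascii option does not exist before that).
def compare_upcase(word)
  { ascii_only: word.upcase(:ascii), full: word.upcase }
end

result = compare_upcase("r\u00E9sum\u00E9")   # "résumé"

puts result[:ascii_only]   # RéSUMé  (non-ASCII letters left untouched)
puts result[:full]         # RÉSUMÉ  (full Unicode case mapping)
```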