woodrow/icunicode

 
 

Unicode sorting is complicated (unicode.org/reports/tr10), and Ruby’s built-in string comparison does not handle it correctly. The ICU (International Components for Unicode) libraries, however, include a widely used implementation of the Unicode Collation Algorithm. Ruby also has no built-in support for transliteration (userguide.icu-project.org/transforms/general). This gem is a simple C wrapper around ucol_getSortKey from the ICU Collation API and utrans_transUChars from the ICU Transliteration API.

This fork differs from the upstream in two major ways:

  • icunicode has been made encoding-aware for Ruby 1.9+

  • the methods are provided as module methods on ICUnicode rather than being monkeypatched onto String

["cafe", "cafes", "caf\303\251"].sort
=> ["cafe", "cafes", "caf\303\251"]

require 'icunicode'

["cafe", "cafes", "caf\303\251"].sort_by {|s| ICUnicode.unicode_sort_key(s)}
=> ["cafe", "caf\303\251", "cafes"]
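The plain sort result above comes from Ruby comparing strings byte by byte. The following is a minimal sketch in plain Ruby (it does not load icunicode) showing why the accented word lands after "cafes"; the variable names are illustrative only.

```ruby
# Plain-Ruby illustration; does not require the icunicode gem.
# "caf\u00e9" has the same bytes as "caf\303\251" (UTF-8 for "e" with acute accent).
words = ["cafe", "cafes", "caf\u00e9"]

# String#<=> compares the underlying bytes, not collation weights.
accented = words[2].bytes   # [99, 97, 102, 195, 169]
plain    = words[1].bytes   # [99, 97, 102, 101, 115]

# The first differing byte pair decides the order: 195 (0xC3) > 101 ("e"),
# so the accented word sorts after "cafes".
first_diff = accented.zip(plain).find { |a, b| a != b }

byte_sorted = words.sort
puts first_diff.inspect
puts byte_sorted.inspect
```

A collation sort key replaces these raw bytes with weights in which the accent is a secondary difference, which is why sorting by ICUnicode.unicode_sort_key places the accented word directly after "cafe".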

ICUnicode.transliterate(ICUnicode.transliterate("blueberry", "Katakana"), "Latin")
=> "burueberrui"

ICUnicode.transliterate(ICUnicode.transliterate("blueberry", "Greek"), "Latin")
=> "blyeberry"

You must install ICU first. You can download the source from site.icu-project.org/download, or on Mac, you can either install with MacPorts:

sudo port install icu

Or, if you’re using Homebrew:

brew install icu4c
export cppflags=`brew --prefix icu4c`/include
export ldflags=`brew --prefix icu4c`/lib

Then install the gem:

gem install icunicode

To do: add support for locales other than en-US, and increase the buffer size or make it grow dynamically.
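For the buffer item, ICU's ucol_getSortKey returns the size needed to store the full key, so one common approach is to try a fixed buffer first and retry with the reported size if it was too small. The sketch below shows that two-pass pattern in Ruby only as an illustration: fake_get_sort_key and sort_key_with_growth are hypothetical names, the "key" is not a real collation key, and none of this is taken from the gem's C source.

```ruby
# Hypothetical stand-in for a C call such as ucol_getSortKey: it writes at most
# `capacity` bytes into `buffer` and returns the number of bytes the full key
# needs. The "key" here is just the input bytes plus a terminator.
def fake_get_sort_key(source, buffer, capacity)
  key = source.bytes + [0]
  buffer.replace(key.first(capacity))
  key.length
end

# Try a small fixed capacity first; if the reported size exceeds it,
# call again with a capacity of exactly the required size.
def sort_key_with_growth(source, initial_capacity = 8)
  buffer = []
  needed = fake_get_sort_key(source, buffer, initial_capacity)
  fake_get_sort_key(source, buffer, needed) if needed > initial_capacity
  buffer.pack("C*")
end

short_input = "cafe"
long_input  = "a much longer string than eight bytes"

short_key = sort_key_with_growth(short_input)
long_key  = sort_key_with_growth(long_input)
puts short_key.bytesize
puts long_key.bytesize
```

The second call only happens for inputs whose key does not fit, so short strings keep the cost of a single call.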

Copyright © 2009 Justin Balthrop, Geni.com; Published under The MIT License, see LICENSE
