<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=utf-8" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.19412"></HEAD>
<BODY
style="PADDING-LEFT: 10px; PADDING-RIGHT: 10px; WORD-WRAP: break-word; PADDING-TOP: 15px; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space"
id=MailContainerBody leftMargin=0 topMargin=0 CanvasTabStop="true"
name="Compose message area">
<DIV><FONT face=Arial>
<DIV apple-content-edited="true">>> Dirk, are you covering Unicode Collate
with your Unicode implementation? Apart from the Unicode<BR>>> tables (your
earlier implementation of these is very useful) and UTF
handling/operations<BR>>> (I have this, very complete), Unicode Collate
remains the biggest obstacle to full incorporation<BR>>> of
Unicode/UTF-8 into cm3.</DIV>
<DIV apple-content-edited="true"> </DIV>
<DIV apple-content-edited="true">Full Unicode collation implies normalization
and many more tables (plus tools to extract these tables from the Unicode
Character Database). The only libraries I know of that do it are the (rather
monstrous) IBM library (ICU) and, to a certain extent, glib. Even Go (golang)
doesn't offer it.</DIV>
<DIV apple-content-edited="true"> </DIV>
<DIV apple-content-edited="true">Plain code-point comparison combined with
simple case folding (which is part of my library) is a first step in that
direction. Simple case folding maps only 1:1 uppercase/lowercase pairs. The
emblematic exception is the German eszett (ß), whose uppercase form is the
two-letter sequence SS; but even that case is now covered, because recent
Unicode releases include a capital eszett (ẞ, U+1E9E) that folds 1:1 to ß. As a
result, most European languages are now covered. The languages that still need
special processing are the Turkic family (Turkish and Azeri), where dotted İ/i
and dotless I/ı form separate case pairs.</DIV>
<DIV apple-content-edited="true"> </DIV>
<DIV apple-content-edited="true">So, as long as one compares only one's own
text files (which do not mix precomposed accented characters with decomposed
sequences), one gets an acceptable collation.</DIV></FONT></DIV>
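<DIV><FONT face=Arial>
<DIV apple-content-edited="true"> </DIV>
<DIV apple-content-edited="true">A minimal Python sketch of that caveat,
assuming Python's standard unicodedata module (nothing here is part of cm3):
the same visible word can be stored precomposed or decomposed, and only
normalization (NFC here) makes the two spellings compare equal. It also shows
that normalized code-point order is still not linguistic collation, which is
what the additional tables of full collation (e.g. the Unicode Collation
Algorithm) provide.</DIV>
<PRE>
```python
import unicodedata

precomposed = "caf\u00e9"   # é as one code point, U+00E9
decomposed = "cafe\u0301"   # e followed by COMBINING ACUTE ACCENT, U+0301


def nfc_key(s):
    """Normalize to NFC (canonical composition) before comparing or sorting."""
    return unicodedata.normalize("NFC", s)


# Same rendered text, different code points: a plain comparison fails.
print(precomposed == decomposed)                    # False
print(len(precomposed), len(decomposed))            # 4 5
# After normalization both spellings compare equal.
print(nfc_key(precomposed) == nfc_key(decomposed))  # True

# Normalization alone is not collation: in code-point order é (U+00E9)
# still sorts after z, whereas UCA-based collation would put "éclair"
# next to "eclat".
eclair = "e\u0301clair"
words = ["zebra", eclair, "eclat"]
print([nfc_key(w) for w in sorted(words, key=nfc_key)])
# ['eclat', 'zebra', 'éclair']
```
</PRE>
</FONT></DIV>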
</BODY></HTML>