[M3devel] EXT Re: UTF-8 TEXT

Coleburn, Randy rcolebur at SCIRES.COM
Fri Jun 29 01:35:29 CEST 2012


...
> I feel very strongly that we should *not* take away the full generality of Text, 
> especially efficient random access, to handle variable-length character 
> encodings in strings.  For these, let's make more friends of Wr and Rd, which 
> already assume sequential access.  For example, a filter pipe that sequentially 
> reads a Text/Array/stream, applies a UTF-8 interpretation to its bytes, and 
> delivers a stream of Unicode characters, in variables of type WIDECHAR.
> 
> Text should preserve the abstraction that it's a string of characters, 
> generalized as it already is in cm3, to have type WIDECHAR, so they can be any 
> Unicode character.  The internal representation should, usually, not be of concern.
...

I concur with Rodney. We need to hold true to the design tenets of the language and keep the full generality of Text, including efficient random access. Variable-length character encodings should instead be handled by new variants of the Rd/Wr/etc. abstractions that treat them as sequential-access streams.
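
To make the filter idea concrete, here is a rough sketch of a sequential decoder. It is written in Python rather than Modula-3 purely for illustration; the name utf8_chars and the chunk_size parameter are hypothetical and are not part of any existing Rd/Wr interface. It reads bytes in order from a stream, applies a UTF-8 interpretation, and hands back one Unicode character at a time, which is roughly what a decoding reader would do.

```python
import codecs
import io


def utf8_chars(byte_stream, chunk_size=4096):
    """Yield Unicode characters decoded sequentially from a binary stream.

    Hypothetical helper: it only ever reads forward, so it needs no
    random access to the underlying bytes.
    """
    # The incremental decoder buffers a multi-byte sequence that is
    # split across two reads until the remaining bytes arrive.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = byte_stream.read(chunk_size)
        if not chunk:
            break
        for ch in decoder.decode(chunk):
            yield ch
    # Signal end of input; this raises UnicodeDecodeError if the
    # stream stopped in the middle of a multi-byte sequence.
    for ch in decoder.decode(b"", final=True):
        yield ch


if __name__ == "__main__":
    source = io.BytesIO("héllo €".encode("utf-8"))
    # chunk_size=1 forces every multi-byte character to be split across reads.
    chars = list(utf8_chars(source, chunk_size=1))
    print(len(chars), chars[1], chars[-1])
```

Once the characters have been collected this way, they sit in a fixed-width sequence that can be indexed directly, so the string abstraction keeps its random access and only the stream layer has to know about the encoding.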

--Randy Coleburn


