[M3devel] This disgusting TEXT business
Dragiša Durić
dragisha at m3w.org
Tue Dec 23 19:43:00 CET 2008
My mother tongue is written in two alphabets: Latin, covered by
ISO-8859-2, and Cyrillic, covered by ISO-8859-5. The two correspond
glyph for glyph, except that three Cyrillic letters are written as
digraphs in the Latin variant. So I think I have more than average
experience with non-Latin-1 alphabets, in various areas.
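To make the charset point concrete, here is a small Python sketch. Python is only a convenient choice for illustration, and the codec names are Python's standard ones. It checks that each spelling of the name takes one byte per glyph in its own ISO charset, and that the Cyrillic spelling does not fit in Latin-2, nor the Latin one in Latin-1. The helper `fits` is made up for this example.

```python
# Sketch: each spelling fits its own 8-bit ISO charset at one byte per glyph,
# but no single one of these charsets holds both alphabets.

latin = "Dragiša Durić"
cyrillic = "Драгиша Дурић"

def fits(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

latin_bytes = latin.encode("iso8859_2")        # ISO-8859-2 (Latin-2)
cyrillic_bytes = cyrillic.encode("iso8859_5")  # ISO-8859-5 (Latin/Cyrillic)

print(len(latin), len(latin_bytes))        # 13 13
print(len(cyrillic), len(cyrillic_bytes))  # 13 13
print(fits(cyrillic, "iso8859_2"))  # False: Latin-2 has no Cyrillic letters
print(fits(latin, "latin_1"))       # False: š and ć are missing from Latin-1
```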
If I had to express what we call a wide-text literal in my code, I
would have to work from Unicode tables, picking out the characters one
by one. Tedious!
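For comparison, a rough sketch of that tedious route: a hypothetical helper, `escape_wide_literal`, that spells a name as a W"..." literal using numeric escapes. It assumes the \x form takes exactly 4 hex digits, as described in the message quoted further down, and it keeps printable ASCII as typed.

```python
# Sketch only: escape_wide_literal is a hypothetical helper, not part of CM3.
# Assumes \x in a W"..." literal is followed by exactly 4 hex digits.

def escape_wide_literal(text: str) -> str:
    """Render text as a W"..." literal, escaping every non-ASCII character."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"U+{code:X} does not fit in 4 hex digits")
        if 0x20 <= code < 0x7F and ch not in '"\\':
            parts.append(ch)                # printable ASCII stays as typed
        else:
            parts.append(f"\\x{code:04x}")  # exactly 4 hex digits
    return 'W"' + "".join(parts) + '"'

print(escape_wide_literal("Dragiša Durić"))  # W"Dragi\x0161a Duri\x0107"
print(escape_wide_literal("Драгиша Дурић"))  # every letter becomes an escape
```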
What I actually do is switch my keyboard to either the Latin or the
Cyrillic mapping and (imagine that!) just type, which gets UTF-8
characters into my source. My example literals would be:
CONST
MyNameInCyrillic = "Драгиша Дурић";
MyNameInLatin = "Dragiša Durić";
Whether you can see these glyphs depends on your MUA and, to some
degree, on the MTAs in transit.
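A quick Python check of what "just typing" stores in a UTF-8 source file: both literals are 13 glyphs, but their UTF-8 encodings differ in length. If the compiler reads an ordinary "..." literal as 8-bit CHARs (an assumption here, not verified against CM3), the resulting TEXT would hold the byte counts, not the glyph counts. The last lines show the kind of garbling a mail reader that guesses the wrong charset produces.

```python
# Sketch: bytes versus glyphs for the two literals once typed into a UTF-8 file.

names = ("Драгиша Дурић", "Dragiša Durić")

# (glyph count, UTF-8 byte count) per name: Cyrillic letters take 2 bytes each
# in UTF-8, ASCII letters 1, and š/ć 2 each.
sizes = {name: (len(name), len(name.encode("utf-8"))) for name in names}

for name, (glyphs, nbytes) in sizes.items():
    print(f"{name}: {glyphs} glyphs, {nbytes} UTF-8 bytes")

# The same UTF-8 bytes decoded as Latin-1: the kind of mojibake a mail
# reader that guesses the wrong charset would show.
mojibake = "Dragiša".encode("utf-8").decode("latin_1")
print(mojibake)  # DragiÅ¡a
```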
For all the WIDE* talk, this is what I actually use, and I am an
example of someone from the non-Latin-1 world. How many of you are
non-Latin-1 people using 16-bit "W literals"?
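On the 16-bit question, a minimal sketch that checks whether text fits in 16-bit code units. Every character in both names lies at or below U+FFFF, so 16 bits happen to be enough for them; a character such as U+1F600 does not fit and, in UTF-16, becomes a surrogate pair. The helper names `fits_16_bits` and `utf16_units` are made up for this illustration.

```python
# Sketch: which characters a 16-bit code unit can hold on its own.

def fits_16_bits(text: str) -> bool:
    """True if every code point of text fits in an unsigned 16-bit value."""
    return all(ord(ch) <= 0xFFFF for ch in text)

def utf16_units(code_point: int) -> list[int]:
    """Split a code point into UTF-16 code units (a surrogate pair above U+FFFF)."""
    if code_point <= 0xFFFF:
        return [code_point]
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]

print(fits_16_bits("Драгиша Дурић"), fits_16_bits("Dragiša Durić"))  # True True
print([hex(u) for u in utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```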
dd
On Tue, 2008-12-23 at 11:28 -0600, Rodney M. Bates wrote:
> I hear three problems with CM3 TEXT:
>
> 1) WIDECHAR and the TEXT implementation won't handle Unicode values that
> exceed 2^16-1.
>
...
> I think we already have a reasonably designed abstraction for TEXT,
> a bit of it built in to the language and the rest in Text.i3. Only
> problem 1) affects the abstraction. Addressing only 1) for now:
>
> It ought to cause minimal grief to just change WIDECHAR so it has a
> big enough value range for all the Unicode values, probably 32 bits
> in today's world. Surely nobody has written any code that assumes
> BITSIZE(WIDECHAR)=16. ;-) Even if so, this shouldn't be a terribly
> hard change to adapt old code to, since the static rules of the
> language would point directly to most places that would need to be
> fixed.
>
> That leaves the wide TEXT literals. It is easy to forget, and I have
> to keep reminding myself, but (assuming again that nobody has gotten
> their fingers improperly into the implementation pie) there is
> currently no such thing as a "WIDETEXT" type. Both kinds of literals
> are of type TEXT. They are just different lexical rules for specifying
> literal values of type TEXT. A bit like '16' and '16_10' are different
> ways of writing the same value, with the same type INTEGER. This
> differs from the CHAR and WIDECHAR literals, which really are of
> different types.
>
> The one change needed to the W"..." literals would be to allow
> escape sequences inside for giving characters numerically. Right
> now, the \x0123 form of escape requires exactly 4 hex digits. If
> we added a new alternative escape letter, in addition to the 'x', that
> required, say, exactly 8 hex digits, then these literals could express
> characters in the needed extra space, without affecting existing code.
> I suppose, for consistency and completeness, we should also add a new
> octal escape sequence that was long enough for the full new range.
>
> And, we would also need to allow the same new escape sequences in
> WIDECHAR literals.
>
...
>
> - Rodney Bates
>
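A sketch of how the proposed escape rules might be lexed. The proposal leaves the 8-digit escape letter unspecified; \U is used here purely as a hypothetical choice (it mirrors the \U convention of C and Python), and `decode_wide_body` is a made-up helper, not CM3 code. In this sketch, a literal that uses only \x decodes exactly as it would under the 4-digit rule alone, which is the "without affecting existing code" property.

```python
# Sketch of the proposed escape rules for the body of a W"..." literal.
# Assumptions (hypothetical, for illustration only):
#   \xHHHH      exactly 4 hex digits (the existing form described above)
#   \UHHHHHHHH  exactly 8 hex digits (a made-up letter for the proposed form)
# Other escapes (\n, \t, octal, ...) are deliberately left out.

HEX = "0123456789abcdefABCDEF"

def decode_wide_body(body: str) -> list[int]:
    """Return the code points denoted by the text between W" and the closing quote."""
    out: list[int] = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ord(ch))  # ordinary character, taken as typed
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        letter = body[i + 1]
        width = {"x": 4, "U": 8}.get(letter)
        if width is None:
            raise ValueError(f"unsupported escape \\{letter}")
        digits = body[i + 2:i + 2 + width]
        if len(digits) != width or any(d not in HEX for d in digits):
            raise ValueError(f"\\{letter} needs exactly {width} hex digits")
        value = int(digits, 16)
        if value > 0x10FFFF:
            raise ValueError(f"{digits} is beyond the Unicode range")
        out.append(value)
        i += 2 + width
    return out

print(decode_wide_body(r"Duri\x0107"))        # existing 4-digit form
print(decode_wide_body(r"\U0001F600 smile"))  # hypothetical 8-digit form
```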
--
Dragiša Durić <dragisha at m3w.org>