[M3devel] This disgusting TEXT business

Rodney M. Bates rodney.bates at wichita.edu
Tue Dec 23 18:28:36 CET 2008


I hear three problems with CM3 TEXT:

1) WIDECHAR and the TEXT implementation won't handle Unicode values that
    exceed 2^16-1.

2) The CM3 TEXT implementation has serious inefficiencies in at least some
    realistic cases.

3) We want some kind of compatibility with other software.

I think we already have a reasonably designed abstraction for TEXT,
a bit of it built into the language and the rest in Text.i3.  Only
problem 1) affects the abstraction.  Addressing only 1) for now:

It ought to cause minimal grief to just change WIDECHAR so it has a
big enough value range for all the Unicode values, probably 32 bits
in today's world.  Surely nobody has written any code that assumes
BITSIZE(WIDECHAR)=16. ;-)  Even if so, this shouldn't be a terribly
hard change to adapt old code to, since the static rules of the
language would point directly to most places that would need to be
fixed.
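
To make the range question concrete, here is a small sketch in Python
(not Modula-3, and not part of any CM3 code).  It assumes the Unicode
maximum code point of 16_10FFFF; the helper names are made up for
illustration.

```python
# Illustrative only: how many bits a WIDECHAR would need to hold every
# Unicode code point.  All names here are hypothetical helpers.

MAX_UNICODE = 0x10FFFF           # highest Unicode code point
OLD_WIDECHAR_MAX = 2**16 - 1     # largest value a 16-bit WIDECHAR can hold


def bits_needed(value):
    """Number of bits needed to represent a non-negative integer."""
    return value.bit_length()


def fits(code_point, bitsize):
    """True if code_point is representable in an unsigned field of bitsize bits."""
    return 0 <= code_point < (1 << bitsize)
```

The full range needs 21 bits, so a 16-bit WIDECHAR is too small and 32
bits, the next convenient size, leaves room to spare.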

That leaves the wide TEXT literals.  It is easy to forget, and I have
to keep reminding myself, but (assuming again that nobody has gotten
their fingers improperly into the implementation pie) there is currently
no such thing as a "WIDETEXT" type.  Both kinds of literals are of type
TEXT.  They are just different lexical rules for specifying literal
values of type TEXT.  A bit like '16' and '16_10' are different ways
of writing the same value, with the same type INTEGER.  This differs
from the CHAR and WIDECHAR literals, which really are of different
types.

The one change needed to the W"..." literals would be to extend the
escape sequences used inside them for giving characters numerically.
Right now, the \x0123 form of escape requires exactly 4 hex digits.  If
we added a new alternative escape letter, in addition to the 'x', that
required, say, exactly 8 hex digits, then these literals could express
characters in the needed extra space, without affecting existing code.
I suppose, for consistency and completeness, we should also add a new
octal escape sequence that was long enough for the full new range.

And, we would also need to allow the same new escape sequences in WIDECHAR
literals.
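
As a rough sketch of the escape idea, in Python rather than Modula-3:
the function below decodes the body of a wide literal into a list of
character values.  The letter 'U' for the 8-digit form is purely an
assumption made for this example (no letter has been chosen), and only
the two hex forms are handled; the octal forms and the other escapes
are left out.

```python
# Illustrative decoder for the body of a W"..." literal.
# ASSUMPTION: 'U' is a hypothetical letter for the new 8-hex-digit escape.
# The existing \x escape keeps its exact-4-digit rule.

HEX_DIGITS = set("0123456789abcdefABCDEF")
ESCAPE_WIDTHS = {"x": 4, "U": 8}   # escape letter -> exact digit count


def decode_wide_literal(body):
    """Return the list of character values denoted by a literal body."""
    values = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Ordinary character: it stands for its own value.
            values.append(ord(ch))
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of literal")
        letter = body[i + 1]
        if letter not in ESCAPE_WIDTHS:
            raise ValueError("escape not handled in this sketch: \\" + letter)
        width = ESCAPE_WIDTHS[letter]
        digits = body[i + 2 : i + 2 + width]
        # Exactly `width` hex digits are required, never fewer.
        if len(digits) != width or any(d not in HEX_DIGITS for d in digits):
            raise ValueError("escape \\%s needs exactly %d hex digits" % (letter, width))
        values.append(int(digits, 16))
        i += 2 + width
    return values
```

Because \x still consumes exactly 4 digits, an old literal such as
W"\x01234" decodes as before (the value 16_0123 followed by the
character '4'), which is the sense in which existing code would be
unaffected.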

This would solve the insufficient-range problem and have either
minimal or zero impact on existing code.  Compiler and library changes
would not be trivial, but quite reasonable.

I believe problems 2) and 3) can be addressed solely by working on the TEXT
implementation.

- Rodney Bates



