[M3devel] Widechars

Rodney M. Bates rodney.bates at wichita.edu
Wed Jan 7 18:32:24 CET 2009


hendrik at topoi.pooq.com wrote:
> On Tue, Jan 06, 2009 at 09:25:08PM -0600, Rodney M. Bates wrote:
>> Literals of type WIDECHAR are written, e.g. W'\x304A', note the 'W'.
>> It makes them have type WIDECHAR and also enables the 16-bit escape
>> sequences you are using.  If it is a CHAR or plain TEXT literal
>> (no 'W' or 'w'), octal escapes must have exactly 3 octal digits and
>> hex escapes must have exactly 2 hex digits.  In WIDECHAR and wide
>> TEXT literals, octal escapes must have exactly 6 and hex escapes
>> exactly 4.
> 
> I've always thought that octal and hexadecimal escapes should be 
> self-delimiting.  In the process of moving from 8- to 16-bit characters, 
> the length of the escapes has changed, and we're going to hit it again 
> when we implement the larger-than-16-bit Unicode characters.
> 
> In C, for strings, you can force early termination of an escape by 
> ending the string with a quote, then a space, then starting a new string 
> with another quote (since consecutive strings are implicitly 
> concatenated, so you can write a really long string that doesn't fit on 
> a line).  At least, that's what I seem to remember.  I haven't actually 
> hacked this stuff for a long time.

This would be easy to implement, wouldn't break any existing code, and
concatenating the fragments without an explicit '&' operator between
would not suffer from ambiguity over whether the programmer expected
runtime concatenation.  I have liked this facility in C for long strings,
just because I find wrapped long lines both very difficult to read and
ugly.
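
For reference, the C behavior described above can be sketched as
follows.  This is only an illustration of standard C, not a proposal
for Modula-3 syntax; the names split_hex, split_octal, and
split_hex_is_abc are made up for the example.

```c
#include <string.h>

/* Adjacent string literals are joined at compile time.  Escapes are
   converted before the pieces are joined, so the hex escape \x41 ends
   at the first closing quote instead of absorbing the hex digits in
   "BC". */
const char split_hex[] = "\x41" "BC";

/* Octal escapes in C take at most three digits, so splitting matters
   only when fewer digits are written: this is the byte 7 followed by
   the character '7', not the single escape \77. */
const char split_octal[] = "\7" "7";

/* Returns 1 if the split literal spells "ABC" (assumes an ASCII
   execution character set, where 0x41 is 'A'). */
int split_hex_is_abc(void) {
    return strcmp(split_hex, "ABC") == 0;
}
```

Nothing is concatenated at runtime here; both arrays are ordinary
constants by the time the program starts.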

We could also easily make escapes prematurely terminated by a closing
quote legal, again without breaking any existing code, which might be
occasionally useful.

However, if somebody wants to put a lot of escape-specified characters
in a row in a TEXT literal, and not have to give every one all the
most significant digits, terminating with a closing quote, then
starting another literal would make things even more pedantic than
they are.

I have been thinking of a new set of escape letters (like the 'x'
that immediately follows the backslash) that can be given independently
of the 'W', with each one implying the base and number of digits.  In
theory, this could break existing code, if somebody had already
redundantly escaped one of these new escape letters.  This seems rather
unlikely to have happened.
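
A minimal sketch of the idea, written in C only for illustration.
The letters 'u' and 'U' and their digit counts are assumptions chosen
for the example (no specific letters are proposed here), and
decode_escape is a hypothetical helper, not part of any compiler.

```c
#include <stddef.h>

/* Each escape letter fixes both the base and the exact digit count. */
struct escape_kind {
    char letter;   /* character following the backslash */
    int base;      /* 8 or 16 */
    int digits;    /* exact number of digits required */
};

static const struct escape_kind kinds[] = {
    { 'x', 16, 2 },  /* the existing 2-digit hex escape */
    { 'u', 16, 4 },  /* assumed letter: 16-bit value */
    { 'U', 16, 6 },  /* assumed letter: value wider than 16 bits */
};

/* Value of one digit in the given base, or -1 if c is not a digit
   of that base.  Assumes ASCII-style contiguous 'a'..'f'. */
static int digit_value(char c, int base) {
    int v;
    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    else return -1;
    return v < base ? v : -1;
}

/* Decode the escape that starts at s[0] == '\\'.  On success, store
   the value in *out and return the number of characters consumed.
   Return 0 if the letter is unknown or there are too few digits. */
size_t decode_escape(const char *s, unsigned long *out) {
    size_t k;
    if (s[0] != '\\') return 0;
    for (k = 0; k < sizeof kinds / sizeof kinds[0]; k++) {
        if (kinds[k].letter == s[1]) {
            unsigned long value = 0;
            int i;
            for (i = 0; i < kinds[k].digits; i++) {
                int d = digit_value(s[2 + i], kinds[k].base);
                if (d < 0) return 0;  /* also stops at the terminating NUL */
                value = value * (unsigned long)kinds[k].base
                        + (unsigned long)d;
            }
            *out = value;
            return 2 + (size_t)kinds[k].digits;
        }
    }
    return 0;
}
```

Because the digit count comes from the letter, an escape such as
\x41 can be followed directly by more hex digits without ambiguity,
and a wider letter can be added later without changing the meaning of
the existing ones.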

> 
>> Note that character literals without/with the 'W' are of different
>> types, CHAR and WIDECHAR, respectively.  For Text literals, both
>> forms have the same type TEXT, but the lexical formation rules
>> are different without/with the 'W'.
> 
> Are we going to have to implement 'WW' to avoid retrocompatibility 
> problems?
> 
> -- hendrik

Rodney Bates




