[M3devel] Widechars

hendrik at topoi.pooq.com hendrik at topoi.pooq.com
Thu Jan 8 12:55:30 CET 2009


On Wed, Jan 07, 2009 at 11:32:24AM -0600, Rodney M. Bates wrote:
> hendrik at topoi.pooq.com wrote:
> >On Tue, Jan 06, 2009 at 09:25:08PM -0600, Rodney M. Bates wrote:
> >>Literals of type WIDECHAR are written, e.g. W'\x304A', note the 'W'.
> >>It makes them have type WIDECHAR and also enables the 16-bit escape
> >>sequences you are using.  If it is a CHAR or plain TEXT literal
> >>(no 'W' or 'w'), octal escapes must have exactly 3 octal digits and
> >>hex escapes must have exactly 2 hex digits.  In WIDECHAR and wide
> >>TEXT literals, octal escapes must have exactly 6 and hex escapes
> >>exactly 4.
> >
> >I've always thought that octal and hexadecimal escapes should be 
> >self-delimiting.  In the process of moving from 8- to 16-bit characters, 
> >the length of the escapes has changed, and we're going to hit it again 
> >when we implement the larger-than-16-bit Unicode characters.
> >
> >In C, for strings, you can force early termination of an escape by 
> >ending the string with a quote, then a space, then starting a new string 
> >with another quote (since consecutive strings are implicitly 
> >concatenated, so you can write a really long string that doesn't fit on 
> >a line).  At least, that's what I seem to remember.  I haven't actually 
> >hacked this stuff for a long time.
> 
> This would be easy to implement, wouldn't break any existing code, and
> concatenating the fragments without an explicit '&' operator between
> would not suffer from ambiguity over whether the programmer expected
> runtime concatenation.  I have liked this facility in C for long strings,
> just because I find wrapped long lines both very difficult to read and
> ugly.

In C you don't have a concatenation operator.

I guess compile-time concatenation really does have different semantics 
from run-time, in the sense that compile-time gives you just one copy of 
the string, and run-time gives you a new one every time it is evaluated.
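
To make the C behaviour concrete, here is a minimal sketch, assuming a C11
compiler and an ASCII execution character set. The names joined and
concat_at_runtime are made up for illustration. The closing quote ends the
hex escape, and the compiler merges the two adjacent literals into a single
array; the run-time helper, for contrast, builds a fresh copy on every call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One literal, built at compile time: the closing quote ends the hex escape
   after two digits, so this is the character 0x41 followed by 'B', not the
   single (out-of-range) escape \x41B. */
static const char joined[] = "\x41" "B";

/* sizeof counts the terminating NUL, so the merged array holds exactly
   three bytes: 0x41, 'B', '\0'.  This is checked at compile time. */
static_assert(sizeof joined == 3, "adjacent literals were not merged");

/* Run-time concatenation for contrast (hypothetical helper): every call
   writes a new copy of a followed by b into the caller's buffer. */
static void concat_at_runtime(char *out, size_t cap,
                              const char *a, const char *b) {
    snprintf(out, cap, "%s%s", a, b);
}
```

Calling concat_at_runtime(buf, sizeof buf, "\x41", "B") yields the same two
characters as joined, but only after doing the work each time it is called.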

> 
> We could also easily make escapes prematurely terminated by a closing
> quote legal, again without breaking any existing code, which might be
> occasionally useful.

Of course, escapes could also be terminated by the backslash of the next 
escape.  I don't like this as much as an explicit terminator.

> 
> However, if somebody wants to put a lot of escape-specified characters
> in a row in a TEXT literal, and not have to give every one all the
> most significant digits, terminating with a closing quote, then
> starting another literal would make things even more pedantic than
> they are.

Well, you could choose a character that does nothing but terminate an 
escape.  Then you wouldn't have to start a new string.  But it would 
break existing code that happens to have that character after an escape.
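
A rough sketch of what such a rule could look like, written as a small
decoder in C. Everything here is an assumption for illustration: the choice
of ';' as the terminator, the names hex_value and decode, and the rule
itself, which is neither current Modula-3 nor C. A hex escape takes as many
hex digits as it finds, stops at the first non-hex character (including the
backslash of the next escape), and swallows one ';' if it follows directly.

```c
#include <stddef.h>

/* Returns the value of a hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes src under the hypothetical self-delimiting rule and stores up to
   cap code values in out.  Returns the number of values written. */
static size_t decode(const char *src, unsigned long *out, size_t cap) {
    size_t n = 0;
    while (*src != '\0' && n < cap) {
        if (src[0] == '\\' && src[1] == 'x' && hex_value(src[2]) >= 0) {
            unsigned long v = 0;
            src += 2;                       /* skip the backslash and 'x' */
            while (hex_value(*src) >= 0) {  /* take every hex digit found */
                v = v * 16 + (unsigned long)hex_value(*src);
                src++;
            }
            if (*src == ';') src++;         /* terminator: consumed, no value */
            out[n++] = v;
        } else {
            out[n++] = (unsigned char)*src++;  /* ordinary character */
        }
    }
    return n;
}
```

Given the characters \x304A;B\x41\x42;; as input, this yields five values:
0x304A, 'B', 0x41, 0x42 and ';'. The first ';' keeps the 'B' from being read
as a fifth hex digit, the backslash ends \x41 on its own, and the last ';'
survives only because the one before it was eaten, which is exactly the
breakage for existing text that has that character after an escape.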

> 
> I have been thinking of a new set of escape letters (like the 'x',
> that immediately follow the backslash)  that can be given independently
> of the 'W', and each one implies the base and number of digits.  In
> theory, this could break existing code, if somebody had already
> redundantly escaped one of these new escape letters.  This seems rather
> unlikely to have happened.
> 
> >
> >>Note that character literals without/with the 'W' are of different
> >>types, CHAR and WIDECHAR, respectively.  For Text literals, both
> >>forms have the same type TEXT, but the lexical formation rules
> >>are different without/with the 'W'.
> >
> >Are we going to have to implement 'WW' to avoid retrocompatibility 
> >problems?
> >
> >-- hendrik
> 
> Rodney Bates
> 
> 


