[M3devel] Widechars
hendrik at topoi.pooq.com
Thu Jan 8 12:55:30 CET 2009
On Wed, Jan 07, 2009 at 11:32:24AM -0600, Rodney M. Bates wrote:
> hendrik at topoi.pooq.com wrote:
> >On Tue, Jan 06, 2009 at 09:25:08PM -0600, Rodney M. Bates wrote:
> >>Literals of type WIDECHAR are written, e.g. W'\x304A', note the 'W'.
> >>It makes them have type WIDECHAR and also enables the 16-bit escape
> >>sequences you are using. If it is a CHAR or plain TEXT literal
> >>(no 'W' or 'w'), octal escapes must have exactly 3 octal digits and
> >>hex escapes must have exactly 2 hex digits. In WIDECHAR and wide
> >>TEXT literals, octal escapes must have exactly 6 and hex escapes
> >>exactly 4.
> >
> >I've always thought that octal and hexadecimal escapes should be
> >self-delimiting. In the process of moving from 8- to 16-bit characters,
> >the length of the escapes has changed, and we're going to hit it again
> >when we implement the larger-than-16-bit Unicode characters.
> >
> >In C, for strings, you can force early termination of an escape by
> >ending the string with a quote, then a space, then starting a new string
> >with another quote (since consecutive strings are implicitly
> >concatenated, so you can write a really long string that doesn't fit on
> >a line). At least, that's what I seem to remember. I haven't actually
> >hacked this stuff for a long time.
>
> This would be easy to implement, wouldn't break any existing code, and
> concatenating the fragments without an explicit '&' operator between
> would not suffer from ambiguity over whether the programmer expected
> runtime concatenation. I have liked this facility in C for long strings,
> just because I find wrapped long lines both very difficult to read and
> ugly.
In C you don't have a concatenation operator.
I guess compile-time concatenation really does have different semantics
from run-time concatenation, in the sense that compile-time concatenation
gives you just one copy of the string, while run-time concatenation gives
you a new one every time.
>
> We could also easily make escapes prematurely terminated by a closing
> quote legal, again without breaking any existing code, which might be
> occasionally useful.
Of course, escapes could also be terminated by the backslash of the next
escape. I don't like this as much as an explicit terminator.
>
> However, if somebody wants to put a lot of escape-specified characters
> in a row in a TEXT literal, and not have to give every one all the
> most significant digits, terminating with a closing quote, then
> starting another literal would make things even more pedantic than
> they are.
Well, you could choose a character that does nothing but terminate an
escape. Then you wouldn't have to start a new string. But it would
break existing code that happens to have that character after a string.
>
> I have been thinking of a new set of escape letters (like the 'x'
> that immediately follows the backslash) that can be given independently
> of the 'W', each one implying the base and number of digits. In
> theory, this could break existing code, if somebody had already
> redundantly escaped one of these new escape letters. This seems rather
> unlikely to have happened.
>
> >
> >>Note that character literals without/with the 'W' are of different
> >>types, CHAR and WIDECHAR, respectively. For Text literals, both
> >>forms have the same type TEXT, but the lexical formation rules
> >>are different without/with the 'W'.
> >
> >Are we going to have to implement 'WW' to avoid retrocompatibility
> >problems?
> >
> >-- hendrik
>
> Rodney Bates
>
>