[M3devel] Oops, forgot to ask

Rodney M. Bates rodney_bates at lcwb.coop
Thu Dec 17 00:35:29 CET 2009


Peter Eiserloh wrote:
> Hi Gang,
>
> 1 - How does one write a WIDECHAR literal?  Are WIDECHAR
> currently only 16-bits?  If so they are limited to only 
> the basic multilingual plane (BMP). 

Put a 'W' or 'w' immediately before the opening single
quote of a character literal, and it becomes a WIDECHAR
literal.  It has type WIDECHAR, allows characters with
16-bit codes, and also allows 16-bit octal and hex escapes.

You can do the same with TEXT literals, but there is no such
thing as "WIDETEXT".  There is only one type, TEXT, and it
can contain any characters in the 16-bit range.  The internal
representation has several variants, chosen dynamically,
and part or all of a value may be stored using only
8 bits per character if the character values permit.  Normally,
this is all hidden.
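
To illustrate the idea of choosing a representation dynamically, here
is a minimal sketch in Python.  It is not the actual TEXT
implementation; the names pack_text and char_at are hypothetical, and
the two-variant scheme is an assumption made only for illustration.

```python
from array import array

def pack_text(s):
    """Pick a storage form for s based on its character values.

    Hypothetical sketch: one byte per character when every code fits
    in 8 bits, otherwise one 16-bit unit per character.
    """
    codes = [ord(c) for c in s]
    if any(code > 0xFFFF for code in codes):
        raise ValueError("character outside the 16-bit range")
    if all(code <= 0xFF for code in codes):
        return ("narrow", bytes(codes))
    return ("wide", array("H", codes))

def char_at(packed, i):
    """Subscript a packed value; the caller never sees which form is used."""
    _kind, data = packed
    return chr(data[i])
```

Callers only ever use char_at, so the choice between the narrow and
wide forms stays hidden from them.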

> Unicode characters
> are 20-bits, which is why encodings such as USC-32 exist.
> If a WIDECHAR is only 16-bits, are they encoded as UTF-16,
> which is a similar scheme to UTF-8?  If they are currently
> only 16-bits, are there plans to expand it to fully support 
> unicode character encodings?
>   
There have been some heated wars on this list over how to represent
characters in the 16- and 20-bit ranges, with no consensus
that I saw.  The debate is muddied by the fact that you can easily
put different representations into the same data type without
language changes.

I tend to favor fixed-size representations for in-memory program variables,
to preserve constant-time subscripted access.  But the variable-sized
representations appear to allow certain common cases to
be handled by unmodified source code originally written for
only 8-bit characters.
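
The subscripting tradeoff can be sketched in Python as follows.  This
is only an illustration, assuming UTF-16 as the variable-sized
encoding; the function names are hypothetical.  With fixed-size
elements, the n-th character is a direct index.  With UTF-16, a
character outside the BMP occupies two 16-bit units (a surrogate
pair), so finding the n-th character requires a scan from the start.

```python
import struct

def utf16_units(s):
    """Return the 16-bit code units of s in UTF-16."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def nth_char_utf16(units, n):
    """Variable-sized case: linear scan to find the n-th character."""
    i = 0
    count = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units):
            # High surrogate: combine with the following low surrogate.
            lo = units[i + 1]
            cp = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)
            width = 2
        else:
            cp = u
            width = 1
        if count == n:
            return chr(cp)
        count += 1
        i += width
    raise IndexError(n)

def nth_char_fixed(code_points, n):
    """Fixed-size case: one element per character, constant-time index."""
    return chr(code_points[n])
```

For a string such as "a", U+1D11E, "b", utf16_units yields four units
for three characters, so unit index and character index no longer
agree after the first non-BMP character.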

>
> +--------------------------------------------------------+
> | Peter P. Eiserloh                                      |
> +--------------------------------------------------------+