[M3devel] AND (., 16_ff). Not serious - or so I hope!

Wed Jun 27 22:27:29 CEST 2012

On 06/26/2012 11:34 AM, Jay wrote:
>
>>> > 128 limit
>>>
>>> I haven't read the code enough yet to verify that but you are probably right
>>
>> I was not right :), that call is incremental.
>
> I looked for that aspect too but missed it. :(
>
>
>>> > ignoring everything over 16_FF
>>>
>>> Probably that is the responsibility/claim of the caller of GetChars.
>>> If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
>>> Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
>>> There is HasWideChars for you to check.
>>
>>
>>>
>>> There is no encoding implied remember.
>>> This isn't UTF8 data.
>>
>> It is not, but probably the only way to solve this without an exception is to make UTF8 the "official" 8-bit encoding :)
>>
>
> I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.
>
>
> Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple, predictable, and make sense. Contrast with weird hybrid systems like Perl & Python, for which I just can't get through the documentation and understand and predict how they work.
>

TEXT is well abstracted and can be widened, with the exception that truncating characters to 8 bits to return
them in a CHAR is wrong.  It should be a checked runtime error, and this should be documented.
Note that while we have two types CHAR and WIDECHAR for scalars (and can also have arrays thereof),
there is still only one type TEXT.  Conceptually, it should be viewed as holding strings of WIDECHAR,
with some convenience functions for putting CHARs into and getting them out of a TEXT, when the programmer
knows the value is in the range of CHAR.  The fact that our implementation stores some values in fields of
type CHAR is a hidden implementation detail.  There is nothing in the abstraction that requires it
to be done this way, or enables clients to know that.
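
To make the proposed rule concrete, here is a minimal sketch in C rather than Modula-3, since the thread already compares against C's narrow/wide split. It is a model only, not the CM3 implementation: text_model, NARROW_LAST, text_has_wide_chars, text_get_wide_char and text_get_char are all hypothetical names, uint16_t standing in for WIDECHAR is an assumption, and a boolean failure result stands in for what would be a checked runtime error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not CM3 code: a text is conceptually a sequence of
   wide characters. uint16_t stands in for WIDECHAR (an assumption). */
typedef struct {
    const uint16_t *units;
    size_t length;
} text_model;

/* Largest value that fits in the narrow type (16_FF in Modula-3 notation). */
enum { NARROW_LAST = 0xFF };

/* Analogue of a HasWideChars query: true if any element lies outside the
   narrow range. */
static bool text_has_wide_chars(const text_model *t)
{
    for (size_t i = 0; i < t->length; i++) {
        if (t->units[i] > NARROW_LAST) {
            return true;
        }
    }
    return false;
}

/* Wide accessor: valid for every in-range index and never loses data. */
static uint16_t text_get_wide_char(const text_model *t, size_t i)
{
    assert(i < t->length);
    return t->units[i];
}

/* Narrow accessor: reports failure instead of silently truncating.
   Returning false models the checked runtime error; *out is left untouched
   in that case. */
static bool text_get_char(const text_model *t, size_t i, unsigned char *out)
{
    if (i >= t->length || t->units[i] > NARROW_LAST) {
        return false;
    }
    *out = (unsigned char)t->units[i];
    return true;
}
```

With a text holding the values 'a', 16_E9 and 16_20AC, the narrow accessor in this sketch succeeds for the first two elements and fails for the third, while the wide accessor returns all three unchanged. Nothing in the interface reveals how the elements are stored, which is the point about the representation being hidden.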

We do have two kinds of text literals, conventional and wide.  They differ only in how the value
is specified, and in the ability to specify characters outside the range of CHAR.

>
> Java is in-between but also simple & predictable -- there being no narrow option other than array of byte, which is reasonable.
>
>
> - Jay