[M3devel] build problems with libunicode

Fri May 30 18:39:39 CEST 2014

On 05/30/2014 03:00 AM, Elmar Stellnberger wrote:
>
> Am 29.05.2014 um 17:26 schrieb Rodney M. Bates:
>
>>
>> I am responsible for libunicode.
>>
>> 1. libunicode will not build, and is not designed to, unless the compiler is
>>   configured to give WIDECHAR the full Unicode range, which, by default,
>>   it is not.
>>
>> I put libunicode in a separate package for that reason, and left the compiler
>> configured by default for the existing 16-bit range of WIDECHAR, so there
>> would be no perturbation to anybody's code unless you take some action.
>>
>> We can change the default if there is consensus to do so.  Most code should
>> not be affected, but some lower-level things will be.
>
> Well, the program I want to port would actually profit from both types:
> * a 16-bit Unicode type for interfacing with gtk & qt
> * a 32-bit Unicode type for the program-level user input function yielding math & Greek letters
>
> Will it be possible to simply declare BITS 16 FOR WIDECHAR when Unicode support is enabled?

This would not compile, because the size given in BITS FOR must be large enough to hold
the full value set of the type.  But BITS 16 FOR [FIRST(WIDECHAR)..VAL(16_FFFF,WIDECHAR)]
would give you what you need to pass in and out of a library that uses a 16-bit character.
You could either keep your characters in this type or assign back and forth between it and
WIDECHAR, which would do runtime value checks where necessary.
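
As a rough illustration of what such a runtime value check amounts to, here is a
minimal sketch in Python rather than Modula-3.  It only models the idea of narrowing
a full-range code point into a 16-bit subrange; the names MAX_WCHAR16 and to_wchar16
are hypothetical and not part of any Modula-3 library.

```python
# Hypothetical model of assigning a full-range character value to a
# 16-bit subrange type: in-range values pass, out-of-range values fail.
MAX_WCHAR16 = 0xFFFF  # upper bound of the 16-bit subrange


def to_wchar16(code_point):
    """Return code_point unchanged if it fits in 16 bits, else raise."""
    if not 0 <= code_point <= MAX_WCHAR16:
        raise ValueError(f"U+{code_point:04X} does not fit in 16 bits")
    return code_point


greek_alpha = to_wchar16(0x03B1)  # fits, so the value passes through
```

A value such as 0x1D6FC (a mathematical italic letter) would fail this check, which
is the case where a 16-bit type alone is not enough.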

But do gtk and qt really treat all characters as 16-bit fixed-size code points, or do they
use UTF-16, with each 16-bit value being a code unit rather than a code point?  In the
latter case, using UniWr and UniRd to do the encoding/decoding would be a much cleaner
option.
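
To make the code unit versus code point distinction concrete, here is a small sketch
in Python (not Modula-3, and not using UniWr or UniRd).  It relies only on Python's
standard "utf-16-le" codec; the helper name utf16_code_units is made up for this
example.

```python
def utf16_code_units(text):
    """Return the list of 16-bit UTF-16 code units that encode text."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


bmp_char = "\u03b1"         # GREEK SMALL LETTER ALPHA, within 16 bits
astral_char = "\U0001d6fc"  # MATHEMATICAL ITALIC SMALL ALPHA, beyond 16 bits

bmp_units = utf16_code_units(bmp_char)        # one code unit
astral_units = utf16_code_units(astral_char)  # two code units (a surrogate pair)
```

The Greek letter is a single code unit, 16_03B1, but the mathematical letter is one
code point spread over two code units, 16_D835 and 16_DEFC.  A library that speaks
UTF-16 expects the pair, so a fixed 16-bit character type cannot represent it as one
element.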

> What complications to the Text library will that cause if we have a 16-bit and a 32-bit character
> type at the same time?

The Text library already has two sizes: CHAR (8 bits) and WIDECHAR (either 16 or 32 bits).
A third would require more code.  Pickles and network objects would also need a good deal
of additional code, and they are considerably more complicated than Text, because they
have to handle values written by different compilers.

> What would you think about leaving WIDECHAR as 16-bit and instead introducing UCHAR as
> a 32-bit character type? I believe this would be the best solution, as it does not break any existing
> code.
>
> Best Regards,
> Elmar Stellnberger
>
>>
>> Rodney Bates
>> rodney.m.bates at acm.org
>>
>

-- 
Rodney Bates
rodney.m.bates at acm.org