[M3devel] UI libraries (also WIDECHAR/cm3 TEXT)

Fri Apr 26 21:37:18 CEST 2013

On 04/26/2013 01:08 AM, Dragiša Durić wrote:
> As for GUI, I have a pretty extensive binding for Gtk+, and I have also done some work with gobject-introspection, with plans to provide a complete binding of everything glib. My plan is to publish it (an older version, for Gtk1, already circulated here). Gtk+ works on almost everything.
>
> As for the TEXT system… The old TEXT system's base type was a semi-opaque REF ARRAY OF CHAR. The compiler hooked & to Text.Cat, which was "allocate enough space, copy both arguments, keep everything ASCIIZ". The new type makes & as efficient as possible, but does not prevent us from reimplementing RtHooks' Cat and MultiCat to behave just like 3.6/pm3 Text.Cat.
>

Two or three years ago, I wrote a revised system that lies somewhere between the compromises of
PM3 and CM3 Text operations.  It uses the same data structure and invariants as the CM3 system,
so any values it produces are consistent with existing CM3-compiled code.  It partially flattens concatenated
strings, as in PM3, up to a point.  It also does some heuristic approximate rebalancing of concatenated
trees.  Between these, it achieves some significant performance advantages, though it loses some in
certain cases.
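
As a rough illustration of the partial-flattening idea only (this is not the
actual implementation, and it leaves out the rebalancing heuristic entirely),
here is a small Python sketch.  The names Leaf, Cat, cat, and the cutoff
FLATTEN_LIMIT are all made up for the example.  Short results are copied into
one flat leaf, and a short right operand is folded into the rightmost leaf
instead of growing the tree:

```python
FLATTEN_LIMIT = 32  # hypothetical cutoff, chosen only for this sketch


class Leaf:
    """A flat run of characters."""
    def __init__(self, chars):
        self.chars = chars
        self.length = len(chars)


class Cat:
    """A concatenation node that shares, rather than copies, its operands."""
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.length = left.length + right.length


def flatten(t):
    """Return the characters of t as one Python string."""
    if isinstance(t, Leaf):
        return t.chars
    return flatten(t.left) + flatten(t.right)


def cat(a, b):
    """Concatenate a and b, copying only when the result stays short."""
    if a.length + b.length <= FLATTEN_LIMIT:
        # Short result: copy into a single flat leaf.
        return Leaf(flatten(a) + flatten(b))
    if isinstance(a, Cat) and a.right.length + b.length <= FLATTEN_LIMIT:
        # Fold a short b into a's right operand instead of adding a level.
        return Cat(a.left, cat(a.right, b))
    # Otherwise just allocate a node.
    return Cat(a, b)


def depth(t):
    """Number of Cat nodes on the longest path from t down to a leaf."""
    if isinstance(t, Leaf):
        return 0
    return 1 + max(depth(t.left), depth(t.right))


def build_linear(n):
    """Build an n-character text left-to-right, one character at a time."""
    t = Leaf("")
    for k in range(n):
        t = cat(t, Leaf(chr(ord("a") + k % 26)))
    return t
```

Under these assumptions, appending single characters grows the tree depth
roughly once per FLATTEN_LIMIT characters rather than once per character, at
the cost of copying at most FLATTEN_LIMIT characters per concatenation.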

Both the PM3 and the CM3 system have performance problems when a Text is built up linearly,
left-to-right (or right-to-left) from single characters or short fragments.  This is a very
common pattern.  PM3 makes concatenation O(N) but character access O(1).  CM3 is the opposite:
concatenation is O(1), but accessing individual characters and substrings is O(N).  CM3 also
uses somewhere around an order of magnitude more memory for linearly-concatenated strings.
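
The tradeoff can be made concrete with a minimal Python sketch.  This is not
the PM3 or CM3 code; FlatText, Leaf, CatNode, and tree_get_char are names
invented for the illustration.  The flat representation copies both operands
on every concatenation, while the tree representation only allocates a node
but then has to walk the tree on every character access:

```python
class FlatText:
    """Flat representation: one contiguous buffer per text value."""
    def __init__(self, chars):
        self.chars = chars

    def cat(self, other):
        # Copies both operands into a new buffer: O(N) per concatenation.
        return FlatText(self.chars + other.chars)

    def get_char(self, i):
        # Direct indexing: O(1).
        return self.chars[i]


class Leaf:
    """A flat run of characters at the bottom of a concatenation tree."""
    def __init__(self, chars):
        self.chars = chars
        self.length = len(chars)
        self.depth = 0


class CatNode:
    """Tree representation: concatenation allocates one node, O(1)."""
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.length = left.length + right.length
        self.depth = 1 + max(left.depth, right.depth)


def tree_get_char(node, i):
    """Return (character, number of CatNodes visited) for index i."""
    steps = 0
    while isinstance(node, CatNode):
        steps += 1
        if i < node.left.length:
            node = node.left
        else:
            i -= node.left.length
            node = node.right
    return node.chars[i], steps


def build_linear(n):
    """Build the same n-character text both ways, one character at a time."""
    flat = FlatText("")
    tree = Leaf("")
    for k in range(n):
        ch = chr(ord("a") + k % 26)
        flat = flat.cat(FlatText(ch))
        tree = CatNode(tree, Leaf(ch))
    return flat, tree
```

With left-to-right building, the tree degenerates into a list: reaching the
first character of an N-character text visits N nodes, and every appended
character costs a node plus a leaf, which is where the extra memory goes in
this sketch.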

I posted a bunch of empirical performance numbers on this list, showing some of the tradeoffs
of my intermediate implementation.  I think it would represent a good middle ground.  Barring
that, I too would go back to the PM3 scheme, as the storage use and access time of the original
CM3 scheme can be pretty awful.

It never got put into the repository because I had the means to test only for LINUXLIBC6 and
AMD64_LINUX.  It is pretty thoroughly tested on those platforms.  It's hard to imagine
target-dependent bugs, but who hasn't been wrong about that before?

Perhaps we could get its test program into the automated system on the many platforms,
then consider using it.

> I think cm3's reasoning was pretty natural once they decided (because of the task at hand, JVM) to create WIDECHAR as a standard type. Next was the W"" literal, and one thing followed another. When this proved efficient for their needs, it was cemented.
>
> I would like to know if anybody would object to a reimplementation of the TextCat module. I can do it, no problem, as long as people are not going to lynch me :).
>
> Dirk, are you covering Unicode Collate with your Unicode implementation? Apart from Unicode tables (and your earlier implementation of this is very useful) and UTF handling/operations (I have this, very complete), Unicode Collate remains the biggest obstacle to full incorporation of Unicode/UTF8 into cm3.
>
> dd
>
> On Apr 25, 2013, at 10:12 PM, Dirk Muysers wrote:
>
>> I would myself like M3 to step back to the good old
>> SRC-PM3 text system (but UTF-8 encoded)
>> in preference to the current dual CHAR-WCHAR text
>> machinery.
>