[M3devel] This disgusting TEXT business

Olaf Wagner wagner at elegosoft.com
Sat Dec 20 20:27:55 CET 2008


Hi everybody,

I haven't followed all the discussions recently, but think it may help
to comment on a few things here.

IIRC the main reason for the current TEXT implementation was to become
compatible with Java. Java strings and M3 strings should look the same
for the embedded JVM written in M3. AFAIK this JVM has never been made
open source nor is there any intention to do it.
Still it may be a good idea to adhere to the TEXT representation if
we ever intend to combine Java and M3 in one runtime.

I don't think the implementation of the combined 8/16 bit TEXTs is
very mature. I think that some improvements could be made. Has anybody
looked into the details of the implementation? As the old CM3 notes on
the web pages already point out, some things are slower while others
are now faster. I agree that the implementation is rather suboptimal
for some common use cases. To really compare implementations, though,
we'd need an agreed set of use cases and tests; just one test for cat
is not enough IMO. Agreeing on that set should be the first step
towards an improvement.

I'd also not object to a separation of the 8 and 16 bit implementations,
but the type WIDECHAR etc. should remain at least for
compatibility. We could perhaps even add some automatic conversion
between the TEXT and WIDETEXT types then (though this would not be in
the M3 spirit :-)). If we go this way, the standard TEXTs could be
replaced by the old PM3 implementation again.

I'd also like to point out that much of the speed of the old
implementation comes from using the internal TextF interface, which
exposes the internal TEXT representation as an array of CHAR. I'm
not sure if this is really a good idea. Replacing direct array access
by Text.GetChar was one of the main adaptations needed when converting
all the standard packages to the CM3 compiler for the first release.
I'd rather object to exposing this interface again.

We could even define the standard TEXT representation as UTF8 in CM3
instead of the standard ISO Latin1 code set, which seems a bit
eurocentric ;-) I'd support this idea once we have proper UTF8 support
in CM3 (which we currently don't, if I am not mistaken; see below).

Quoting Dragiša Durić <dragisha at m3w.org>:

> Whole business of mixed TEXT's - concat of TEXT and WIDETEXT is either
> slow, or produces results that make subsequent operations slow - is
> where problem is with this implementation.
>
> IMO, best solution would be to replace internal representation with
> UTF-8. For whom may be concerned with it - make some external widechar
> conversion routines available.
>
> That way - concat would be as fast as it can be made and other
> operations would be realistic - it is how almost everybody does their
> Unicode strings, after all. Almost everybody - excluding mobile industry
> and Microsoft :-), AFAIK.
>
> Compiler changes would make transparent conversion of literals.
> Everything else is already transparent with regard to internal
> representation.

I am not sure that it is as easy as that. I'd like to see a proper
update of the specification before we implement such a change.
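
To illustrate one place where it may be less transparent: with a
variable-width encoding, fetching the i-th character is no longer a
plain array index. Below is a minimal Python sketch (not Modula-3;
utf8_get_char is a hypothetical name, not an existing CM3 routine) of
what a Text.GetChar-style lookup would have to do on raw UTF-8 data,
assuming no extra index structure is kept alongside the bytes:

```python
def utf8_get_char(buf: bytes, index: int) -> str:
    """Return the index-th character of well-formed UTF-8 data.

    Hypothetical helper for illustration only. It scans from the
    start of the buffer, so the cost grows with the index.
    """
    if index < 0:
        raise IndexError("negative index")
    seen = -1      # number of character starts passed so far, minus one
    start = None   # byte offset where the wanted character begins
    for pos, byte in enumerate(buf):
        # Bytes of the form 10xxxxxx continue a character; any other
        # byte starts a new one.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    return buf[start:].decode("utf-8")


name = "Dragiša Durić"
data = name.encode("utf-8")
chars = [utf8_get_char(data, i) for i in range(len(name))]
```

A real implementation could of course cache offsets or treat pure
ASCII texts specially; the sketch only shows why the cost model of
indexed access would be worth spelling out in the specification.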

> I've sent some UTF-8 routines ages ago to Olaf. IIRC, he was reluctant
> to accept them as he did not believe C base routines were widespread
> enough. GNU world has no such reluctance. Everything is UTF8.

No, the problem was that what I got couldn't even be compiled on
FreeBSD and Solaris. We should not rely on some external GNU libraries,
but we need to implement the UTF8 functions in CM3. Has anybody done
that yet?
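
As a rough idea of how small the core of such library-independent
routines can be, here is a sketch of a UTF8 encoder for a single code
point. It is written in Python rather than Modula-3 purely for
illustration, and utf8_encode is a hypothetical name; a CM3 version
would still need decoding, validation and a decision on how it fits
into the Text interfaces.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper)."""
    # Surrogates and values beyond U+10FFFF are not encodable.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        # 7 bits: one byte, identical to ASCII.
        return bytes([cp])
    if cp < 0x800:
        # Up to 11 bits: two bytes (covers all of ISO Latin1 above ASCII).
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Up to 16 bits: three bytes.
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Up to 21 bits: four bytes.
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_encode_text(text: str) -> bytes:
    """Encode a whole string by encoding each code point in turn."""
    return b"".join(utf8_encode(ord(ch)) for ch in text)
```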

> If nobody can fill this job in next few weeks, I will probably have some
> time this winter. No promise, of course, but list will know :-).

I'd be in favour of any improvements of the TEXT implementation and
UTF8 support in CM3!

> dd
>
> On Sat, 2008-12-20 at 19:26 +1100, Tony Hosking wrote:
>> Hmm, are we just victims of poor implementation?  Does anyone have the
>> time to improve things?  It would be possible to rip out CM3 TEXT and
>> replace with PM3, but we'd lose WIDECHAR and WideText.T with that
>> too.  Not sure who that impacts.

I'd be careful here, too. I think we need to understand the problems
and the impacts of changes better before we rip out much code.

Olaf
-- 
Olaf Wagner -- elego Software Solutions GmbH
                Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96  mobile: +49 177 2345 869  fax: +49 30 23 45 86 95
    http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194



