[M3devel] This disgusting TEXT business

Mika Nystrom mika at async.caltech.edu
Sun Dec 21 01:02:11 CET 2008


Olaf Wagner writes:
...
>> On Sat, 2008-12-20 at 19:26 +1100, Tony Hosking wrote:
>>> Hmm, are we just victims of poor implementation?  Does anyone have the
>>> time to improve things?  It would be possible to rip out CM3 TEXT and
>>> replace with PM3, but we'd lose WIDECHAR and WideText.T with that
>>> too.  Not sure who that impacts.
>
>I'd be careful here, too. I think we need to understand the problems
>and the impacts of changes better before we rip out much code.
>

I do agree.  My comments about simply putting back the PM3 code
weren't meant to be taken seriously.  I myself have always disliked
this "internationalization" business.  I remember my father used
to have to prefix prices in Telexes to England with an o-with-umlaut
(the o-with-umlaut on his Swedish Siemens printed as a lira on a
British machine).  And there's always been something vaguely
Socialistic about it: I don't mean the fact that I had to use
A-umlaut and Scandinavian-A-with-circle for square brackets on the
C-64, but that on the very first computer I ever used, the Swedish
Luxor ABC-80 (Zilog Z80 based), the Swedish standard of the time
called for the dollar sign to be replaced by a small "sun symbol"
pronounced "sol" in Swedish.  BASIC strings were A-sol and B-sol,
etc.  I do not believe the "sol" is any more useful than the "dollar"
in Sweden (probably less, especially in the winter); I have a feeling
that Swedish standards bodies at the time were simply trying to
show their sympathies for the money-less regime of the Khmer Rouge.

But that being said, I realize there are people out there who might
want non-ASCII in their strings.  I think if the CM3 implementation
does that well it might be worth fixing... not sure that UTF encoding
is any simpler, overall...?

A peek at Text.Equal sheds some light on the problem:

PROCEDURE Equal (t, u: T): BOOLEAN =
  VAR info_t, info_u: Info;
  BEGIN
    t.get_info (info_t);
    u.get_info (info_u);
    IF (info_t.length # info_u.length) THEN RETURN FALSE; END;
    IF (info_t.length = 0)             THEN RETURN TRUE;  END;

    IF   (info_t.start = NIL)
      OR (info_u.start = NIL)
      OR (info_t.wide # info_u.wide) THEN
      RETURN EqualBuf (t, u, info_t.length);
    ELSIF NOT info_t.wide THEN
      RETURN String8.Equal (info_t.start, info_u.start, info_t.length);
    ELSE
      RETURN String16.Equal (info_t.start, info_u.start, info_t.length);
    END;
  END Equal;

Under CM3, Text.Cat/RTHooks.Concat (&) returns a special subtype
of TEXT, viz., TextCat.T, which has the following for get_info:

PROCEDURE MyGetInfo (t: T;  VAR info: TextClass.Info) =
  BEGIN
    info.start  := NIL;
    info.length := t.a_len + t.b_len;
    info.wide   := t.a_or_b_wide;
  END MyGetInfo;

The NIL for info.start is what pushes the code into EqualBuf, which
compares the strings character by character, fetching each character
through the get_wide_chars method, which in turn calls itself
recursively for each concatenated component.

info.start = NIL is, in short, a flag that means "non-standard TEXT,
use heavily object-oriented character-by-character routines".
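
To make the cost concrete, here is a toy sketch of that kind of
dispatch.  It is NOT the CM3 source: Node, Flat, Cat, get_char,
EqualSlow and the calls counter are all hypothetical names invented
for illustration, and it fetches one character per method call rather
than going through get_wide_chars.  It only shows how a comparison
that goes through an overridable method on a concatenation node ends
up walking the tree again for every character.

```modula3
MODULE Main;
(* Hypothetical miniature of a concatenation-tree TEXT; not CM3 code. *)
IMPORT Text, IO, Fmt;

TYPE
  Node = OBJECT METHODS
    length (): CARDINAL;
    get_char (i: CARDINAL): CHAR;
  END;

  (* A leaf holding its characters in one flat array. *)
  Flat = Node OBJECT
    chars: REF ARRAY OF CHAR;
  OVERRIDES
    length   := FlatLength;
    get_char := FlatGetChar;
  END;

  (* The concatenation of two nodes; no flat buffer of its own. *)
  Cat = Node OBJECT
    a, b: Node;
    a_len, b_len: CARDINAL;
  OVERRIDES
    length   := CatLength;
    get_char := CatGetChar;
  END;

VAR calls: CARDINAL := 0;  (* counts get_char method invocations *)

PROCEDURE FlatLength (self: Flat): CARDINAL =
  BEGIN RETURN NUMBER (self.chars^); END FlatLength;

PROCEDURE FlatGetChar (self: Flat; i: CARDINAL): CHAR =
  BEGIN INC (calls); RETURN self.chars[i]; END FlatGetChar;

PROCEDURE CatLength (self: Cat): CARDINAL =
  BEGIN RETURN self.a_len + self.b_len; END CatLength;

PROCEDURE CatGetChar (self: Cat; i: CARDINAL): CHAR =
  BEGIN
    INC (calls);
    (* Every lookup descends one level of the tree. *)
    IF i < self.a_len THEN RETURN self.a.get_char (i);
    ELSE RETURN self.b.get_char (i - self.a_len);
    END;
  END CatGetChar;

PROCEDURE NewFlat (t: TEXT): Flat =
  VAR f := NEW (Flat, chars := NEW (REF ARRAY OF CHAR, Text.Length (t)));
  BEGIN
    Text.SetChars (f.chars^, t);
    RETURN f;
  END NewFlat;

PROCEDURE NewCat (a, b: Node): Cat =
  BEGIN
    RETURN NEW (Cat, a := a, b := b,
                a_len := a.length (), b_len := b.length ());
  END NewCat;

(* Character-by-character comparison through the methods. *)
PROCEDURE EqualSlow (t, u: Node): BOOLEAN =
  VAR n := t.length ();
  BEGIN
    IF n # u.length () THEN RETURN FALSE; END;
    FOR i := 0 TO n - 1 DO
      IF t.get_char (i) # u.get_char (i) THEN RETURN FALSE; END;
    END;
    RETURN TRUE;
  END EqualSlow;

VAR t, u: Node;
BEGIN
  t := NewFlat ("");  u := NewFlat ("");
  FOR i := 1 TO 100 DO  (* 100 successive concatenations, like a & "ab" *)
    t := NewCat (t, NewFlat ("ab"));
    u := NewCat (u, NewFlat ("ab"));
  END;
  IO.Put ("equal: " & Fmt.Bool (EqualSlow (t, u)) & "\n");
  IO.Put ("get_char calls: " & Fmt.Int (calls) & "\n");
END Main.
```

Building the strings with more concatenations makes the tree deeper,
so the call count grows much faster than the string length does.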

It seems to me this implementation could be sped up a great deal by
doing some simple micro-optimizations.

Let me ask a very simple question... is it possible to sub-type
TEXT in CM3?  (Is such an interface exported?)  If not, it would
be quite easy to de-object-orient this code and get most of the PM3
performance back, I think (we'd have to flatten the strings a bit
too).  If the code can be subtyped by clients outside of the text
package it would be necessary to sort out what types of overrides
should be supported, and what should happen if someone, for instance,
subtypes a TextCat.T to have a different method for getting
characters...
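
As a rough illustration of what "flattening" could look like, the
sketch below copies a concatenation tree into one contiguous buffer in
a single pass and then compares the buffers directly.  Again, this is
only a sketch under assumptions: Node, Flat, Cat, fill, Flatten and
EqualFlat are hypothetical names, not anything in the CM3 or PM3
sources, and a real version would presumably cache the flattened
buffer rather than rebuild it on every comparison.

```modula3
MODULE Main;
(* Hypothetical sketch of flattening; none of these names are CM3's. *)
IMPORT Text, IO, Fmt;

TYPE
  Buf = REF ARRAY OF CHAR;

  Node = OBJECT
    len: CARDINAL;
  METHODS
    (* Copy all of this node's characters into dst, starting at dst[at]. *)
    fill (VAR dst: ARRAY OF CHAR; at: CARDINAL);
  END;

  Flat = Node OBJECT chars: Buf;  OVERRIDES fill := FlatFill; END;
  Cat  = Node OBJECT a, b: Node;  OVERRIDES fill := CatFill;  END;

PROCEDURE FlatFill (self: Flat; VAR dst: ARRAY OF CHAR; at: CARDINAL) =
  BEGIN
    SUBARRAY (dst, at, self.len) := self.chars^;
  END FlatFill;

PROCEDURE CatFill (self: Cat; VAR dst: ARRAY OF CHAR; at: CARDINAL) =
  BEGIN
    (* One method call per node, not one per character. *)
    self.a.fill (dst, at);
    self.b.fill (dst, at + self.a.len);
  END CatFill;

PROCEDURE NewFlat (t: TEXT): Flat =
  VAR f := NEW (Flat, len := Text.Length (t),
                chars := NEW (Buf, Text.Length (t)));
  BEGIN
    Text.SetChars (f.chars^, t);
    RETURN f;
  END NewFlat;

PROCEDURE NewCat (a, b: Node): Cat =
  BEGIN
    RETURN NEW (Cat, len := a.len + b.len, a := a, b := b);
  END NewCat;

PROCEDURE Flatten (t: Node): Buf =
  VAR buf := NEW (Buf, t.len);
  BEGIN
    t.fill (buf^, 0);
    RETURN buf;
  END Flatten;

PROCEDURE EqualFlat (t, u: Node): BOOLEAN =
  VAR x, y: Buf;
  BEGIN
    IF t.len # u.len THEN RETURN FALSE; END;
    x := Flatten (t);  y := Flatten (u);
    (* Plain array comparison with no method dispatch in the loop. *)
    FOR i := 0 TO LAST (x^) DO
      IF x[i] # y[i] THEN RETURN FALSE; END;
    END;
    RETURN TRUE;
  END EqualFlat;

VAR t, u: Node;
BEGIN
  t := NewCat (NewCat (NewFlat ("foo"), NewFlat ("bar")), NewFlat ("baz"));
  u := NewCat (NewFlat ("foob"), NewFlat ("arbaz"));
  IO.Put ("equal: " & Fmt.Bool (EqualFlat (t, u)) & "\n");
END Main.
```

The trade-off is extra allocation and copying, which is why this only
helps if the flattened form is kept around, and why client overrides
of the character-fetching methods would complicate it.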

I will try to get some time to look at this myself over the holidays.

Note to self and others: m3gdb is not completely useful here.  If
you look at the structure of a TextCat.T, you see that fields a and
b are TEXTs, but m3gdb doesn't really let you look inside them,
since, well, they are TEXTs and printed as such.  (You don't
get to see the whole gory mess of pointers.)  Is there a way to
turn off this (normally very helpful) m3gdb feature?

     Mika




