[M3devel] Performance issues with CM3

Sun Apr 26 22:08:00 CEST 2009

Tony Hosking writes:
>
>On 26 Apr 2009, at 15:22, Mika Nystrom wrote:
>
>>
>> Hello again,
...
>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>  VAR t: RT0.TypeDefn;
>>  BEGIN
>>    IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
>>    t := Get (a);
>>    IF (t = NIL) THEN RETURN FALSE; END;
>>    IF (t.typecode = b) THEN RETURN TRUE END;
>>    WHILE (t.kind = ORD (TK.Obj)) DO
>>      IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
>>      t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
>>      IF (t = NIL) THEN RETURN FALSE; END;
>>      IF (t.typecode = b) THEN RETURN TRUE; END;
>>    END;
>>    IF (t.traced # 0)
>>      THEN RETURN (b = RT0.RefanyTypecode);
>>      ELSE RETURN (b = RT0.AddressTypecode);
>>    END;
>>  END IsSubtype;
>
>This is all to support dynamic loading of libraries.
>
>> Furthermore, CM3 has a hook for "ScanTypecase" that's missing
>> in PM3 (the older compiler actually generates code for this):
>>
>>  PROCEDURE ScanTypecase (ref: REFANY;
>>                          x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>    VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
>>    BEGIN
>>      IF (ref = NIL) THEN RETURN 0; END;
>>      tc := TYPECODE (ref);
>>      p := x;  i := 0;
>>      LOOP
>>        IF (p.uid = 0) THEN RETURN i; END;
>>        IF (p.defn = NIL) THEN
>>          p.defn := FindType (p.uid);
>>          IF (p.defn = NIL) THEN
>>            Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>                  LOOPHOLE (p.uid, ADDRESS), NIL);
>>          END;
>>        END;
>>        xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>        IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
>>        INC (p, ADRSIZE (p^));  INC (i);
>>      END;
>>    END ScanTypecase;
>>
>> Where to begin?  A loop with all kinds of runtime checks of properties
>> that are supposedly known at compile time? IsSubtype (itself a loop)
>> called from inside the loop?
>
>Not if dynamically loaded!
>

Tony, all right, but the code is still an abomination!  The program
only gets loaded once (even if dynamically).  The TYPECASE might
be executed millions, billions, trillions of times... if you
execute this code a billion times, those IF statements are only
"effective" once and just return the same value 999,999,999 times.
Surely this information can be pre-computed at load time and stored
somewhere so the IFs don't have to be re-interpreted?

I don't know what the best solution is here.  Is it really impossible
to construct and re-construct the data structures that PM3 uses,
or something equivalent?

For instance, Cardelli's method of keeping the supertypes of a given
type in an array would work fine with dynamic loading.  It's almost
as fast as the PM3 method but takes a bit more memory.  One could
limit these arrays to a certain size and then skip to the next
array, to make a hybrid between what we have now in CM3 and a full
array for every type.

Do you think my hash table approach is safe?  It's not nearly as
elegant as the Cardelli method, but it works really well for a quick
and dirty.  It's ugly, but it's very fast.  Or can the arrays used
by the TYPECASEs move during program execution?

In any case, with the standard CM3 implementation, my program spends
more time in TYPECASE than the whole thing does under PM3.

Do we have any code in CM3 that loads new types dynamically?  I can
see loading code dynamically, but types...?  I'm very curious, is
there an interface for it that "mortals" can use?  If we have the
mechanism, might as well put it to work...

Another thing that bothers me a bit is that PushEFrame (sp?) seems to 
take a lot of cycles.  Is the use of this routine due to the absence
of a "stack walker"?  I thought FreeBSD had a working stack walker
in the Modula-3 runtime...?

The thing about PushEFrame is that it makes the programmer want to
avoid procedure calls, which is a real pity.  Sussman and Steele's
railings against the inefficient procedure calls in PL/I circa 1975
are legendary.  (It's true, procedural abstraction is really
wonderful.)

     Mika

P.S. My hash table code re-attached in case someone wants to look at it
again (instrumenting code commented out):

TYPE
  TypeCaseResult = RECORD
    x : ADDRESS;
    tc : Typecode;
    arm : INTEGER;
  END;

CONST
  TCCachePow = 6;
  TCCacheSize = Word.Shift(1,TCCachePow);
  TCMask = TCCacheSize-1;

VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
  TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
  ..
  };

(*
VAR tcScans := 0; tcHits := 0;
*)

PROCEDURE ScanTypecase (ref: REFANY;
                        x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
  VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode; 
  BEGIN
    tc := TYPECODE (ref);
    IF (ref = NIL) THEN RETURN 0; END;

    WITH hash = Word.And(Word.Times(tc,
                                    Word.RightShift(LOOPHOLE(x,Word.T),2)),
                         TCMask),
         entry = TCCache[hash]  DO
      (*INC(tcScans);*)
      IF entry.x = x AND entry.tc = tc THEN
        (*INC(tcHits);*)
        RETURN entry.arm
      END;

      p := x;  i := 0;
      LOOP
        IF (p.uid = 0) THEN entry.x := x; entry.tc := tc; entry.arm := i; RETURN i; END;
        IF (p.defn = NIL) THEN
          p.defn := FindType (p.uid);
          IF (p.defn = NIL) THEN
            Fail (RTE.MissingType, RTModule.FromDataAddress(x),
                  LOOPHOLE (p.uid, ADDRESS), NIL);
          END;
        END;
        xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
        IF (tc = xc) OR IsSubtype (tc, xc) THEN entry.x := x; entry.tc := tc; entry.arm := i; RETURN i; END;
        INC (p, ADRSIZE (p^));  INC (i);
      END;
    END;
  END ScanTypecase;