[M3devel] Performance issues with CM3

Mon Apr 27 02:26:06 CEST 2009

>>> Tony, all right, but the code is still an abomination!  The program
> only gets loaded once (even if dynamically).  The TYPECASE might
> be executed millions, billions, trillions of times... if you
> execute this code a billion times, those IF statements are only
> "effective" once and just return the same value 999,999,999 times.
> Surely this information can be pre-computed at load time and stored
> somewhere so the IFs don't have to be re-interpreted?

I haven't looked closely at this code but it's been there since the  
Critical Mass days.  The dynamic loading I refer to is for explicit  
opening and linking of .so files -- not the usual dynamic linking that  
takes place with .so files resolved at link time.  Anyway, perhaps the  
code can be cleaned up.

> I don't know what the best solution is here.  Is it really impossible
> to construct and re-construct the data structures that PM3 uses,
> or something equivalent?

I don't know.  Possibly.  Again, I don't know enough about this corner  
of the world to answer all of the following questions:

> For instance, Cardelli's method of keeping the supertypes of a given
> type in an array would work fine with dynamic loading.  It's almost
> as fast as the PM3 method but takes a bit more memory.  One could
> limit these arrays to a certain size and then skip to the next
> array, to make a hybrid between what we have now in CM3 and a full
> array for every type.
>
> Do you think my hash table approach is safe?  It's not nearly as
> elegant as the Cardelli method, but it works really well for a quick
> and dirty.  It's ugly, but it's very fast.  Or can the arrays used
> by the TYPECASEs move during program execution?
>
> In any case, with the standard CM3 implementation, my program spends
> more time in TYPECASE than the whole thing does under PM3.
>
> Do we have any code in CM3 that loads new types dynamically?  I can
> see loading code dynamically, but types...?  I'm very curious, is
> there an interface for it that "mortals" can use?  If we have the
> mechanism, might as well put it to work...

AFAIK you need to invoke RTLinker.AddUnit after using the native  
dlopen/dlsym facilities to get the symbol corresponding to the unit  
being loaded.  I've never tried using it but I did have some plans to  
do so back in my persistence days (when retrieving an instance of an  
unknown subtype we'd need to also dynamically load its defining module).

> Another thing that bothers me a bit is that PushEFrame (sp?) seems to
> take a lot of cycles.  Is the use of this routine due to the absence
> of a "stack walker"?  I thought FreeBSD had a working stack walker
> in the Modula-3 runtime...?

The stack walker is not usable for exception handling, only for  
printing a stack dump (IIRC).  Same was true for PM3.  We should do  
something about using the gcc-backend native stack walking support.   
Most of the overhead for PushEFrame (as compared to user-threaded PM3)  
is obtaining the thread-local stack.  For user-threaded PM3 it was a  
compiled in global variable (which obviously breaks for native threads).

> The thing about PushEFrame is that it makes the programmer want to
> avoid procedure calls, which is a real pity.  Sussman and Steele's
> railings against the inefficient procedure calls in PL/I circa 1975
> are legendary.  (It's true, procedural abstraction is really
> wonderful.)

I agree.  I've maintained the SOLgnu stack walking code for years so  
that I don't have to pay the overhead of the explicit exception stack.

>
>
>     Mika
>
> P.S. My hash table code re-attached in case someone wants to look at  
> it
> again (instrumenting code commented out):
>
> TYPE
>  TypeCaseResult = RECORD
>    x : ADDRESS;
>    tc : Typecode;
>    arm : INTEGER;
>  END;
>
> CONST
>  TCCachePow = 6;
>  TCCacheSize = Word.Shift(1,TCCachePow);
>  TCMask = TCCacheSize-1;
>
> VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
>  TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
>  ..
>  };
>
> (*
> VAR tcScans := 0; tcHits := 0;
> *)
>
> PROCEDURE ScanTypecase (ref: REFANY;
>                        x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>  VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
>  BEGIN
>    tc := TYPECODE (ref);
>    IF (ref = NIL) THEN RETURN 0; END;
>
>    WITH hash = Word.And(Word.Times(tc,
>                                     
> Word.RightShift(LOOPHOLE(x,Word.T),2)),
>                         TCMask),
>         entry = TCCache[hash]  DO
>      (*INC(tcScans);*)
>      IF entry.x = x AND entry.tc = tc THEN
>        (*INC(tcHits);*)
>        RETURN entry.arm
>      END;
>
>      p := x;  i := 0;
>      LOOP
>        IF (p.uid = 0) THEN entry.x := x; entry.tc := tc;  
> entry.arm := i; RETURN i; END;
>        IF (p.defn = NIL) THEN
>          p.defn := FindType (p.uid);
>          IF (p.defn = NIL) THEN
>            Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>                  LOOPHOLE (p.uid, ADDRESS), NIL);
>          END;
>        END;
>        xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>        IF (tc = xc) OR IsSubtype (tc, xc) THEN entry.x := x;  
> entry.tc := tc; entry.arm := i; RETURN i; END;
>        INC (p, ADRSIZE (p^));  INC (i);
>      END;
>    END;
>  END ScanTypecase;
>