[M3devel] Performance issues with CM3

Sun Apr 26 10:54:57 CEST 2009

On 26 Apr 2009, at 15:22, Mika Nystrom wrote:

>
> Hello again,
>
> Now I've managed to get all the code up and running under CM3.  I
> found and committed fixes to a bug in Wx and some code in one of
> the m3tk libraries that looked like it never was finished in the
> first place.
>
> As I mentioned earlier, I wasn't able to get user threads working
> in CM3 on FreeBSD 4.11.  But with some help from Jay, I was able to
> get things working with libc_r.  Performance, unfortunately,
> leaves something to be desired.
>
> For the first time I've been able to compare timings on identical
> hardware between the PM3 I was using and the CM3 that's out now.
>
> Note that optimization doesn't seem to work..?  (Not even -O, much
> less -O3... the compiler segfaults.)

Are you passing -gstabs?  It should not segfault on -O3 - this is a  
regression if it does.

> Here's what I get, using no optimization either in PM3 or CM3.  The
> test is my Scheme interpreter generating SQL and Modula-3 code
> (a bit like the Hibernate system you can get for Java):
>
>
>    CPU seconds      CM3       PM3
> First version      5.269     1.366
> Fewer NEWs         2.039     0.440  (code cleanup on my part)
> TYPECASE hack      1.770            (see below)
>
> Some "poor man's profiling" (i.e., ctrl-C'ing in m3gdb) suggests that
> most of the time is spent either in threading code (this could just
> be a lousy implementation in libc_r), the garbage collector, or in
> "ScanTypecase".
>
> The only one of these routines I am qualified to do anything about is
> ScanTypecase.  I don't know why the Critical Mass people... <how  
> colorful
> language can one use on m3devel?>.. all over this code.  I assume it  
> has
> something to do with Java.
>
> The PM3 code (from SRC?) has this wonderful, concise, efficient bit:
>
> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>  VAR t := Get (b);
>  BEGIN
>    IF (a >= RT0u.nTypes) THEN BadType (a) END;
>    IF (a = 0)            THEN RETURN TRUE END;
>    RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
>  END IsSubtype;
>
> replaced with the following absolute abomination in CM3:
>
> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>  VAR t: RT0.TypeDefn;
>  BEGIN
>    IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
>    t := Get (a);
>    IF (t = NIL) THEN RETURN FALSE; END;
>    IF (t.typecode = b) THEN RETURN TRUE END;
>    WHILE (t.kind = ORD (TK.Obj)) DO
>      IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
>      t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
>      IF (t = NIL) THEN RETURN FALSE; END;
>      IF (t.typecode = b) THEN RETURN TRUE; END;
>    END;
>    IF (t.traced # 0)
>      THEN RETURN (b = RT0.RefanyTypecode);
>      ELSE RETURN (b = RT0.AddressTypecode);
>    END;
>  END IsSubtype;

This is all to support dynamic loading of libraries.

> Furthermore, CM3 has a hook for "ScanTypecase" that's missing
> in PM3 (the older compiler actually generates code for this):
>
>  PROCEDURE ScanTypecase (ref: REFANY;
>                          x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>    VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
>    BEGIN
>      IF (ref = NIL) THEN RETURN 0; END;
>      tc := TYPECODE (ref);
>      p := x;  i := 0;
>      LOOP
>        IF (p.uid = 0) THEN RETURN i; END;
>        IF (p.defn = NIL) THEN
>          p.defn := FindType (p.uid);
>          IF (p.defn = NIL) THEN
>            Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>                  LOOPHOLE (p.uid, ADDRESS), NIL);
>          END;
>        END;
>        xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>        IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
>        INC (p, ADRSIZE (p^));  INC (i);
>      END;
>    END ScanTypecase;
>
> Where to begin?  A loop with all kinds of runtime checks of properties
> that are supposedly known at compile time? IsSubtype (itself a loop)
> called from inside the loop?

Not if dynamically loaded!

> I was able to cut out almost all of the typecase activity from my
> program by using the following code in RTType.m3, which depends on
> the ADDRESS x never changing (well more specifically never being
> the same for two TYPECASE statements):
>
> TYPE
>  TypeCaseResult = RECORD
>    x : ADDRESS;
>    tc : Typecode;
>    arm : INTEGER;
>  END;
>
> CONST
>  TCCachePow = 6;
>  TCCacheSize = Word.Shift(1,TCCachePow);
>  TCMask = TCCacheSize-1;
>
> VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
>  TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
>  ..
>  };
>
> (*
> VAR tcScans := 0; tcHits := 0; (* instrumenting counters *)
> *)
>
> PROCEDURE ScanTypecase (ref: REFANY;
>                        x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>  VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
>  BEGIN
>    tc := TYPECODE (ref);
>    IF (ref = NIL) THEN RETURN 0; END;
>
>    WITH hash = Word.And(Word.Times(tc,
>                                     
> Word.RightShift(LOOPHOLE(x,Word.T),2)),
>                         TCMask),
>         entry = TCCache[hash]  DO
>      (*INC(tcScans);*)
>      IF entry.x = x AND entry.tc = tc THEN
>        (*INC(tcHits);*)
>        RETURN entry.arm
>      END;
>
>      p := x;  i := 0;
>      LOOP
>        IF (p.uid = 0) THEN entry.x := x; entry.tc := tc;  
> entry.arm := i; RETURN i; END;
>        IF (p.defn = NIL) THEN
>          p.defn := FindType (p.uid);
>          IF (p.defn = NIL) THEN
>            Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>                  LOOPHOLE (p.uid, ADDRESS), NIL);
>          END;
>        END;
>        xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>        IF (tc = xc) OR IsSubtype (tc, xc) THEN entry.x := x;  
> entry.tc := tc; entry.arm := i; RETURN i; END;
>        INC (p, ADRSIZE (p^));  INC (i);
>      END;
>    END;
>  END ScanTypecase;
>
> I'm guessing the speedup for TYPECASE itself is a factor of at least
> ten.  But it's still a pretty nasty hack.  And there is still a lot
> of IsSubtype activity from narrowing.
>
> I suppose that the way the typecodes are generated in CM3 is
> sufficiently different (meant to be extended at runtime?) from how
> it was done in PM3 that one can't really go back to the old code.
> Cardelli's idea of keeping an array of parents up to ROOT plus a
> "depth" for each type might have merit, though.
>
> To see if a is a subtype of b, something like:
>
> b = a.ancestors[a.depth-b.depth-1] (* with appropriate range checks *)
>
> Would this be easy to put in?  I'm not sure how one can be sure
> that typecodes are done being generated?  There's something called
> RTTypeSRC.FinishObjectTypes ..
>
> And PM3 still generates code that's four times faster.
>
>    Mika
>