[M3devel] Performance issues with CM3

Mika Nystrom mika at async.caltech.edu
Sun Apr 26 07:22:00 CEST 2009


Hello again,

Now I've managed to get all the code up and running under CM3.  I
found and committed fixes to a bug in Wx and some code in one of
the m3tk libraries that looked like it never was finished in the
first place.

As I mentioned earlier, I wasn't able to get user threads working
in CM3 on FreeBSD 4.11.  But with some help from Jay, I was able to
get things working with libc_r.  Performance, unfortunately,
leaves something to be desired.

For the first time I've been able to compare timings on identical
hardware between the PM3 I was using and the CM3 that's out now.

Note that optimization doesn't seem to work..?  (Not even -O, much
less -O3... the compiler segfaults.)

Here's what I get, using no optimization either in PM3 or CM3.  The
test is my Scheme interpreter generating SQL and Modula-3 code
(a bit like the Hibernate system you can get for Java):
 

    CPU seconds      CM3       PM3
First version      5.269     1.366
Fewer NEWs         2.039     0.440  (code cleanup on my part)
TYPECASE hack      1.770            (see below)

Some "poor man's profiling" (i.e., ctrl-C'ing in m3gdb) suggests that
most of the time is spent either in threading code (this could just
be a lousy implementation in libc_r), the garbage collector, or in
"ScanTypecase".

The only one of these routines I am qualified to do anything about is
ScanTypecase.  I don't know why the Critical Mass people... <how colorful
language can one use on m3devel?>.. all over this code.  I assume it has
something to do with Java.

The PM3 code (from SRC?) has this wonderful, concise, efficient bit:

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t := Get (b);
  BEGIN
    IF (a >= RT0u.nTypes) THEN BadType (a) END;
    IF (a = 0)            THEN RETURN TRUE END;
    RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
  END IsSubtype;

replaced with the following absolute abomination in CM3:

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t: RT0.TypeDefn;
  BEGIN
    IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
    t := Get (a);
    IF (t = NIL) THEN RETURN FALSE; END;
    IF (t.typecode = b) THEN RETURN TRUE END;
    WHILE (t.kind = ORD (TK.Obj)) DO
      IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
      t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent; 
      IF (t = NIL) THEN RETURN FALSE; END;
      IF (t.typecode = b) THEN RETURN TRUE; END;
    END;
    IF (t.traced # 0)
      THEN RETURN (b = RT0.RefanyTypecode);
      ELSE RETURN (b = RT0.AddressTypecode);
    END;
  END IsSubtype;

Furthermore, CM3 has a hook for "ScanTypecase" that's missing
in PM3 (the older compiler actually generates code for this):

  PROCEDURE ScanTypecase (ref: REFANY;
                          x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
    VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
    BEGIN
      IF (ref = NIL) THEN RETURN 0; END;
      tc := TYPECODE (ref);
      p := x;  i := 0;
      LOOP
        IF (p.uid = 0) THEN RETURN i; END;
        IF (p.defn = NIL) THEN
          p.defn := FindType (p.uid);
          IF (p.defn = NIL) THEN
            Fail (RTE.MissingType, RTModule.FromDataAddress(x),
                  LOOPHOLE (p.uid, ADDRESS), NIL);
          END;
        END;
        xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
        IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
        INC (p, ADRSIZE (p^));  INC (i);
      END;
    END ScanTypecase;

Where to begin?  A loop with all kinds of runtime checks of properties
that are supposedly known at compile time? IsSubtype (itself a loop)
called from inside the loop?

I was able to cut out almost all of the typecase activity from my
program by using the following code in RTType.m3, which depends on
the ADDRESS x never changing (well more specifically never being
the same for two TYPECASE statements):

TYPE 
  TypeCaseResult = RECORD
    x : ADDRESS;
    tc : Typecode;
    arm : INTEGER;
  END;

CONST 
  TCCachePow = 6;
  TCCacheSize = Word.Shift(1,TCCachePow);
  TCMask = TCCacheSize-1;

VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult { 
  TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } , 
  ..
  };

(*
VAR tcScans := 0; tcHits := 0; (* instrumenting counters *)
*)

PROCEDURE ScanTypecase (ref: REFANY;
                        x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
  VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
  BEGIN
    tc := TYPECODE (ref);
    IF (ref = NIL) THEN RETURN 0; END;

    WITH hash = Word.And(Word.Times(tc,
                                    Word.RightShift(LOOPHOLE(x,Word.T),2)),
                         TCMask),
         entry = TCCache[hash]  DO
      (*INC(tcScans);*)
      IF entry.x = x AND entry.tc = tc THEN
        (*INC(tcHits);*)
        RETURN entry.arm
      END;
      
      p := x;  i := 0;
      LOOP
        IF (p.uid = 0) THEN entry.x := x; entry.tc := tc; entry.arm := i; RETURN i; END;
        IF (p.defn = NIL) THEN
          p.defn := FindType (p.uid);
          IF (p.defn = NIL) THEN
            Fail (RTE.MissingType, RTModule.FromDataAddress(x),
                  LOOPHOLE (p.uid, ADDRESS), NIL);
          END;
        END;
        xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
        IF (tc = xc) OR IsSubtype (tc, xc) THEN entry.x := x; entry.tc := tc; entry.arm := i; RETURN i; END;
        INC (p, ADRSIZE (p^));  INC (i);
      END;
    END;
  END ScanTypecase;

I'm guessing the speedup for TYPECASE itself is a factor of at least
ten.  But it's still a pretty nasty hack.  And there is still a lot
of IsSubtype activity from narrowing.

I suppose that the way the typecodes are generated in CM3 is
sufficiently different (meant to be extended at runtime?) from how
it was done in PM3 that one can't really go back to the old code.
Cardelli's idea of keeping an array of parents up to ROOT plus a
"depth" for each type might have merit, though.

To see if a is a subtype of b, something like:

b = a.ancestors[a.depth-b.depth-1] (* with appropriate range checks *)

Would this be easy to put in?  I'm not sure how one can be sure
that typecodes are done being generated?  There's something called
RTTypeSRC.FinishObjectTypes ..

And PM3 still generates code that's four times faster.

    Mika





More information about the M3devel mailing list