[M3devel] Performance issues with CM3
Tony Hosking
hosking at cs.purdue.edu
Sun Apr 26 10:54:57 CEST 2009
On 26 Apr 2009, at 15:22, Mika Nystrom wrote:
>
> Hello again,
>
> Now I've managed to get all the code up and running under CM3. I
> found and committed fixes to a bug in Wx and some code in one of
> the m3tk libraries that looked like it never was finished in the
> first place.
>
> As I mentioned earlier, I wasn't able to get user threads working
> in CM3 on FreeBSD 4.11. But with some help from Jay, I was able to
> get things working with libc_r. Performance, unfortunately,
> leaves something to be desired.
>
> For the first time I've been able to compare timings on identical
> hardware between the PM3 I was using and the CM3 that's out now.
>
> Note that optimization doesn't seem to work..? (Not even -O, much
> less -O3... the compiler segfaults.)
Are you passing -gstabs? It should not segfault on -O3 - this is a
regression if it does.
> Here's what I get, using no optimization either in PM3 or CM3. The
> test is my Scheme interpreter generating SQL and Modula-3 code
> (a bit like the Hibernate system you can get for Java):
>
>
> CPU seconds CM3 PM3
> First version 5.269 1.366
> Fewer NEWs 2.039 0.440 (code cleanup on my part)
> TYPECASE hack 1.770 (see below)
>
> Some "poor man's profiling" (i.e., ctrl-C'ing in m3gdb) suggests that
> most of the time is spent either in threading code (this could just
> be a lousy implementation in libc_r), the garbage collector, or in
> "ScanTypecase".
>
> The only one of these routines I am qualified to do anything about is
> ScanTypecase. I don't know why the Critical Mass people... <how
> colorful
> language can one use on m3devel?>.. all over this code. I assume it
> has
> something to do with Java.
>
> The PM3 code (from SRC?) has this wonderful, concise, efficient bit:
>
> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
> VAR t := Get (b);
> BEGIN
> IF (a >= RT0u.nTypes) THEN BadType (a) END;
> IF (a = 0) THEN RETURN TRUE END;
> RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
> END IsSubtype;
>
> replaced with the following absolute abomination in CM3:
>
> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
> VAR t: RT0.TypeDefn;
> BEGIN
> IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
> t := Get (a);
> IF (t = NIL) THEN RETURN FALSE; END;
> IF (t.typecode = b) THEN RETURN TRUE END;
> WHILE (t.kind = ORD (TK.Obj)) DO
> IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
> t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
> IF (t = NIL) THEN RETURN FALSE; END;
> IF (t.typecode = b) THEN RETURN TRUE; END;
> END;
> IF (t.traced # 0)
> THEN RETURN (b = RT0.RefanyTypecode);
> ELSE RETURN (b = RT0.AddressTypecode);
> END;
> END IsSubtype;
This is all to support dynamic loading of libraries.
> Furthermore, CM3 has a hook for "ScanTypecase" that's missing
> in PM3 (the older compiler actually generates code for this):
>
> PROCEDURE ScanTypecase (ref: REFANY;
> x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
> VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
> BEGIN
> IF (ref = NIL) THEN RETURN 0; END;
> tc := TYPECODE (ref);
> p := x; i := 0;
> LOOP
> IF (p.uid = 0) THEN RETURN i; END;
> IF (p.defn = NIL) THEN
> p.defn := FindType (p.uid);
> IF (p.defn = NIL) THEN
> Fail (RTE.MissingType, RTModule.FromDataAddress(x),
> LOOPHOLE (p.uid, ADDRESS), NIL);
> END;
> END;
> xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
> IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
> INC (p, ADRSIZE (p^)); INC (i);
> END;
> END ScanTypecase;
>
> Where to begin? A loop with all kinds of runtime checks of properties
> that are supposedly known at compile time? IsSubtype (itself a loop)
> called from inside the loop?
Not if dynamically loaded!
> I was able to cut out almost all of the typecase activity from my
> program by using the following code in RTType.m3, which depends on
> the ADDRESS x never changing (well more specifically never being
> the same for two TYPECASE statements):
>
> TYPE
> TypeCaseResult = RECORD
> x : ADDRESS;
> tc : Typecode;
> arm : INTEGER;
> END;
>
> CONST
> TCCachePow = 6;
> TCCacheSize = Word.Shift(1,TCCachePow);
> TCMask = TCCacheSize-1;
>
> VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
> TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
> ..
> };
>
> (*
> VAR tcScans := 0; tcHits := 0; (* instrumenting counters *)
> *)
>
> PROCEDURE ScanTypecase (ref: REFANY;
> x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
> VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
> BEGIN
> tc := TYPECODE (ref);
> IF (ref = NIL) THEN RETURN 0; END;
>
> WITH hash = Word.And(Word.Times(tc,
>
> Word.RightShift(LOOPHOLE(x,Word.T),2)),
> TCMask),
> entry = TCCache[hash] DO
> (*INC(tcScans);*)
> IF entry.x = x AND entry.tc = tc THEN
> (*INC(tcHits);*)
> RETURN entry.arm
> END;
>
> p := x; i := 0;
> LOOP
> IF (p.uid = 0) THEN entry.x := x; entry.tc := tc;
> entry.arm := i; RETURN i; END;
> IF (p.defn = NIL) THEN
> p.defn := FindType (p.uid);
> IF (p.defn = NIL) THEN
> Fail (RTE.MissingType, RTModule.FromDataAddress(x),
> LOOPHOLE (p.uid, ADDRESS), NIL);
> END;
> END;
> xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
> IF (tc = xc) OR IsSubtype (tc, xc) THEN entry.x := x;
> entry.tc := tc; entry.arm := i; RETURN i; END;
> INC (p, ADRSIZE (p^)); INC (i);
> END;
> END;
> END ScanTypecase;
>
> I'm guessing the speedup for TYPECASE itself is a factor of at least
> ten. But it's still a pretty nasty hack. And there is still a lot
> of IsSubtype activity from narrowing.
>
> I suppose that the way the typecodes are generated in CM3 is
> sufficiently different (meant to be extended at runtime?) from how
> it was done in PM3 that one can't really go back to the old code.
> Cardelli's idea of keeping an array of parents up to ROOT plus a
> "depth" for each type might have merit, though.
>
> To see if a is a subtype of b, something like:
>
> b = a.ancestors[a.depth-b.depth-1] (* with appropriate range checks *)
>
> Would this be easy to put in? I'm not sure how one can be sure
> that typecodes are done being generated? There's something called
> RTTypeSRC.FinishObjectTypes ..
>
> And PM3 still generates code that's four times faster.
>
> Mika
>
More information about the M3devel
mailing list