[M3devel] optimization [ was Re: Performance issues with CM3 ]

Mika Nystrom mika at async.caltech.edu
Sun Apr 26 20:12:35 CEST 2009


Hi Tony,

I looked at this more closely, and I was wrong.  The compiler doesn't
actually segfault on -O.  I was using -gstabs+ but switched to -gstabs
after your email (doesn't seem to matter).

I get a ton of warnings at either optimization level, and there are
definitely bugs in the optimizer.  The resulting code is generally
not correct.  (By comparison, I had to turn off PM3's optimizer for
only one of the hundred or so packages I build.)  Packages often fail
to build, even at -O.

At -O3, I get one segfault:

new source -> compiling TextCommandQueueTbl.i3
cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack exception handling
new source -> compiling CommandLoop.m3
cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack exception handling
../src/CommandLoop.m3: In function 'CommandLoop__Run':
../src/CommandLoop.m3:279: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
  m3_backend => 4
m3cc (aka cm3cg) failed compiling: CommandLoop.mc
new source -> compiling CommandLoopDefaultCommand.m3
cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack exception handling  
new source -> compiling TextCommandTbl.m3

where:

    272 (*****************************************************************************
    273  *                                                                           *
    274  *                            Command Loop Main                              *
    275  *                                                                           *
    276  *****************************************************************************)
    277
    278
    279 PROCEDURE Run(self: T; source: Pathname.T := NIL; term: Term.T := NIL) =
    280   CONST
    281     Comment = SET OF CHAR{'%','#'};
    282   VAR
    283     completer := NEW(StdCompleter, loop:=self);
    284     line: TEXT;
    285   BEGIN
    286     IF term = NIL THEN
    287       self.term := Term.Default();
    288     ELSE
    289       self.term := term;
    290     END;
    291     LOOP
    292       TRY
    293         IF source # NIL THEN
    294           DoLoad(self.load, TextList.List2("",source), self.term);
    295           source := NIL;

...

Even at -O, things don't work right.  Here's a typical output:

new source -> compiling PassiveArb1.m3
"../src/PassiveArb1.m3", line 68: warning: not used (e)
"../src/PassiveArb1.m3", line 45: warning: not used (newCon)
2 warnings encountered
../src/PassiveArb1.m3: In function 'PassiveArb1__FApply':
../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_buyO' might be clobbered by 'longjmp' or 'vfork'
../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_selO' might be clobbered by 'longjmp' or 'vfork'
new source -> compiling PassiveArb2.i3
new source -> compiling ExecRecorder2.i3
new source -> compiling ArbPingPong.i3
new source -> compiling PassiveArb2.m3
../src/PassiveArb2.m3: In function 'PassiveArb2__Apply':
../src/PassiveArb2.m3:388: warning: variable 'M3_EWPD1K_delta' might be clobbered by 'longjmp' or 'vfork'
../src/PassiveArb2.m3:388: warning: variable 'M3_Cwb5VA_toExec' might be clobbered by 'longjmp' or 'vfork'
new source -> compiling Globals.i3
new source -> compiling ActiveArb1.i3
new source -> compiling ActiveArb1.m3
new source -> compiling ExecRecorder.i3
new source -> compiling ExecRecorder.m3
new source -> compiling ExecRec.i3
new source -> compiling ExecRecorder2.m3
new source -> compiling ExecRec.m3
new source -> compiling ArbPingPong.m3
new source -> compiling Main.m3
"../src/Main.m3", line 72: warning: potentially unhandled exception: OSError.E
"../src/Main.m3", line 30: warning: potentially unhandled exceptions: Rd.EndOfFile, Rd.Failure, Thread.Alerted
"../src/Main.m3", line 31: warning: potentially unhandled exceptions: Thread.Alerted, Wr.Failure
"../src/Main.m3", line 32: warning: potentially unhandled exceptions: Thread.Alerted, Wr.Failure
"../src/Main.m3", line 33: warning: potentially unhandled exceptions: Thread.Alerted, Wr.Failure  
"../src/Main.m3", line 118: warning: potentially unhandled exception: OSError.E
"../src/Main.m3", line 204: warning: potentially unhandled exception: OSError.E
7 warnings encountered
 -> linking testtrade
/usr/lib/libc.so: WARNING!  setkey(3) not present in the system!
/usr/lib/libc.so: warning: this program uses gets(), which is unsafe.
/usr/lib/libc.so: warning: mktemp() possibly used unsafely; consider using mkstemp()
/usr/lib/libc.so: WARNING!  des_setkey(3) not present in the system!
/usr/lib/libc.so: WARNING!  encrypt(3) not present in the system!
/usr/lib/libc.so: warning: tmpnam() possibly used unsafely; consider using mkstemp()
/usr/lib/libc.so: warning: this program uses f_prealloc(), which is not recommended.
/usr/lib/libc.so: WARNING!  des_cipher(3) not present in the system!
/usr/lib/libc.so: warning: tempnam() possibly used unsafely; consider using mkstemp()
Main.mo: In function `Main_M3':
../src/Main.m3:164: undefined reference to `Main__5__1__1__CanStart.198'
/home/mika/t-cm3/calarm/twslib/FreeBSD4/libtwslib.so: undefined reference to `TWSReader__RCApply__RD.332'
  m3_link => 1
linker failed linking: testtrade
Fatal Error: package build failed


      Mika



Tony Hosking writes:
>
>On 26 Apr 2009, at 15:22, Mika Nystrom wrote:
>
>>
>> Hello again,
>>
>> Now I've managed to get all the code up and running under CM3.  I
>> found and committed fixes to a bug in Wx and some code in one of
>> the m3tk libraries that looked like it never was finished in the
>> first place.
>>
>> As I mentioned earlier, I wasn't able to get user threads working
>> in CM3 on FreeBSD 4.11.  But with some help from Jay, I was able to
>> get things working with libc_r.  Performance, unfortunately,
>> leaves something to be desired.
>>
>> For the first time I've been able to compare timings on identical
>> hardware between the PM3 I was using and the CM3 that's out now.
>>
>> Note that optimization doesn't seem to work..?  (Not even -O, much
>> less -O3... the compiler segfaults.)
>
>Are you passing -gstabs?  It should not segfault on -O3 - this is a  
>regression if it does.
>
>> Here's what I get, using no optimization either in PM3 or CM3.  The
>> test is my Scheme interpreter generating SQL and Modula-3 code
>> (a bit like the Hibernate system you can get for Java):
>>
>>
>>    CPU seconds      CM3       PM3
>> First version      5.269     1.366
>> Fewer NEWs         2.039     0.440  (code cleanup on my part)
>> TYPECASE hack      1.770            (see below)
>>
>> Some "poor man's profiling" (i.e., ctrl-C'ing in m3gdb) suggests that
>> most of the time is spent either in threading code (this could just
>> be a lousy implementation in libc_r), the garbage collector, or in
>> "ScanTypecase".
>>
>> The only one of these routines I am qualified to do anything about is
>> ScanTypecase.  I don't know why the Critical Mass people... <how colorful
>> language can one use on m3devel?>.. all over this code.  I assume it has
>> something to do with Java.
>>
>> The PM3 code (from SRC?) has this wonderful, concise, efficient bit:
>>
>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>  VAR t := Get (b);
>>  BEGIN
>>    IF (a >= RT0u.nTypes) THEN BadType (a) END;
>>    IF (a = 0)            THEN RETURN TRUE END;
>>    RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
>>  END IsSubtype;
>>
>> replaced with the following absolute abomination in CM3:
>>
>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>  VAR t: RT0.TypeDefn;
>>  BEGIN
>>    IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
>>    t := Get (a);
>>    IF (t = NIL) THEN RETURN FALSE; END;
>>    IF (t.typecode = b) THEN RETURN TRUE END;
>>    WHILE (t.kind = ORD (TK.Obj)) DO
>>      IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
>>      t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
>>      IF (t = NIL) THEN RETURN FALSE; END;
>>      IF (t.typecode = b) THEN RETURN TRUE; END;
>>    END;
>>    IF (t.traced # 0)
>>      THEN RETURN (b = RT0.RefanyTypecode);
>>      ELSE RETURN (b = RT0.AddressTypecode);
>>    END;
>>  END IsSubtype;
>
>This is all to support dynamic loading of libraries.
>
>> Furthermore, CM3 has a hook for "ScanTypecase" that's missing
>> in PM3 (the older compiler actually generates code for this):
>>
>>  PROCEDURE ScanTypecase (ref: REFANY;
>>                          x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>    VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
>>    BEGIN
>>      IF (ref = NIL) THEN RETURN 0; END;
>>      tc := TYPECODE (ref);
>>      p := x;  i := 0;
>>      LOOP
>>        IF (p.uid = 0) THEN RETURN i; END;
>>        IF (p.defn = NIL) THEN
>>          p.defn := FindType (p.uid);
>>          IF (p.defn = NIL) THEN
>>            Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>                  LOOPHOLE (p.uid, ADDRESS), NIL);
>>          END;
>>        END;
>>        xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>        IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
>>        INC (p, ADRSIZE (p^));  INC (i);
>>      END;
>>    END ScanTypecase;
>>
>> Where to begin?  A loop with all kinds of runtime checks of properties
>> that are supposedly known at compile time? IsSubtype (itself a loop)
>> called from inside the loop?
>
>Not if dynamically loaded!
>
>> I was able to cut out almost all of the typecase activity from my
>> program by using the following code in RTType.m3, which depends on
>> the ADDRESS x never changing (well more specifically never being
>> the same for two TYPECASE statements):
>>
>> TYPE
>>  TypeCaseResult = RECORD
>>    x : ADDRESS;
>>    tc : Typecode;
>>    arm : INTEGER;
>>  END;
>>
>> CONST
>>  TCCachePow = 6;
>>  TCCacheSize = Word.Shift(1,TCCachePow);
>>  TCMask = TCCacheSize-1;
>>
>> VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
>>  TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
>>  ..
>>  };
>>
>> (*
>> VAR tcScans := 0; tcHits := 0; (* instrumenting counters *)
>> *)
>>
>> PROCEDURE ScanTypecase (ref: REFANY;
>>                        x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>  VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
>>  BEGIN
>>    tc := TYPECODE (ref);
>>    IF (ref = NIL) THEN RETURN 0; END;
>>
>>    WITH hash = Word.And(Word.Times(tc,
>>                                    Word.RightShift(LOOPHOLE(x,Word.T),2)),
>>                         TCMask),
>>         entry = TCCache[hash]  DO
>>      (*INC(tcScans);*)
>>      IF entry.x = x AND entry.tc = tc THEN
>>        (*INC(tcHits);*)
>>        RETURN entry.arm
>>      END;
>>
>>      p := x;  i := 0;
>>      LOOP
>>        IF (p.uid = 0) THEN
>>          entry.x := x; entry.tc := tc; entry.arm := i; RETURN i;
>>        END;
>>        IF (p.defn = NIL) THEN
>>          p.defn := FindType (p.uid);
>>          IF (p.defn = NIL) THEN
>>            Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>                  LOOPHOLE (p.uid, ADDRESS), NIL);
>>          END;
>>        END;
>>        xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>        IF (tc = xc) OR IsSubtype (tc, xc) THEN
>>          entry.x := x; entry.tc := tc; entry.arm := i; RETURN i;
>>        END;
>>        INC (p, ADRSIZE (p^));  INC (i);
>>      END;
>>    END;
>>  END ScanTypecase;
>>
>> I'm guessing the speedup for TYPECASE itself is a factor of at least
>> ten.  But it's still a pretty nasty hack.  And there is still a lot
>> of IsSubtype activity from narrowing.
>>
>> I suppose that the way the typecodes are generated in CM3 is
>> sufficiently different (meant to be extended at runtime?) from how
>> it was done in PM3 that one can't really go back to the old code.
>> Cardelli's idea of keeping an array of parents up to ROOT plus a
>> "depth" for each type might have merit, though.
>>
>> To see if a is a subtype of b, something like:
>>
>> b = a.ancestors[a.depth-b.depth-1] (* with appropriate range checks *)
>>
>> Would this be easy to put in?  I'm not sure how one can be sure
>> that typecodes are done being generated?  There's something called
>> RTTypeSRC.FinishObjectTypes ..
>>
>> And PM3 still generates code that's four times faster.
>>
>>    Mika
>>


