[M3devel] optimization [ was Re: Performance issues with CM3 ]

Tony Hosking hosking at cs.purdue.edu
Mon Apr 27 02:04:26 CEST 2009


I am very disturbed about this since it suggests a regression.  I had  
spent a huge amount of time a year or so back making sure the backend  
would play properly with gcc -O3, but it seems we are now back in a  
bad place.  I'm not sure what changes have occurred to the backend  
since then, but they would be the prime candidates.  Unfortunately, I  
don't have a lot of time right now to try to debug these -O3 problems,  
but I do want to fix them since they will eventually impinge on my own  
work.  It would be really good to get our regressions framework back  
up and running and to put -O3 in there as the default build option --  
it seems there have been ongoing Tinderbox problems for a while now,  
since my SOLgnu regression runs appear to have stopped completely.   
I'll need to check the logs.

On 27 Apr 2009, at 04:12, Mika Nystrom wrote:

> Hi Tony,
>
> I looked at this more closely, and I was wrong.  The compiler doesn't
> actually segfault on -O.  I was using -gstabs+ but switched to -gstabs
> after your email (doesn't seem to matter).
>
> I get a ton of warnings at either optimization level, and there are
> definitely bugs in the optimizer.  The resulting code is generally
> not correct.  (By comparison, I had to turn off PM3's optimizer for
> only one of the hundred or so packages I build.)  Things often fail
> to compile, even at -O.
>
> At -O3, I get one segfault:
>
> new source -> compiling TextCommandQueueTbl.i3
> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack  
> exception handling
> new source -> compiling CommandLoop.m3
> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack  
> exception handling
> ../src/CommandLoop.m3: In function 'CommandLoop__Run':
> ../src/CommandLoop.m3:279: internal compiler error: Segmentation fault
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
>  m3_backend => 4
> m3cc (aka cm3cg) failed compiling: CommandLoop.mc
> new source -> compiling CommandLoopDefaultCommand.m3
> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack  
> exception handling
> new source -> compiling TextCommandTbl.m3
>
> where:
>
>    272  
> (*****************************************************************************
>    273   
> *                                                                           *
>    274  *                            Command Loop  
> Main                              *
>    275   
> *                                                                           *
>    276   
> *****************************************************************************)
>    277
>    278
>    279 PROCEDURE Run(self: T; source: Pathname.T := NIL; term:  
> Term.T := NIL) =
>    280   CONST
>    281     Comment = SET OF CHAR{'%','#'};
>    282   VAR
>    283     completer := NEW(StdCompleter, loop:=self);
>    284     line: TEXT;
>    285   BEGIN
>    286     IF term = NIL THEN
>    287       self.term := Term.Default();
>    288     ELSE
>    289       self.term := term;
>    290     END;
>    291     LOOP
>    292       TRY
>    293         IF source # NIL THEN
>    294           DoLoad(self.load, TextList.List2("",source),  
> self.term);
>    295           source := NIL;
>
> ...
>
> Even at -O, things don't work right.  Here's a typical output:
>
> new source -> compiling PassiveArb1.m3
> "../src/PassiveArb1.m3", line 68: warning: not used (e)
> "../src/PassiveArb1.m3", line 45: warning: not used (newCon)
> 2 warnings encountered
> ../src/PassiveArb1.m3: In function 'PassiveArb1__FApply':
> ../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_buyO' might  
> be clobbered by 'longjmp' or 'vfork'
> ../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_selO' might  
> be clobbered by 'longjmp' or 'vfork'
> new source -> compiling PassiveArb2.i3
> new source -> compiling ExecRecorder2.i3
> new source -> compiling ArbPingPong.i3
> new source -> compiling PassiveArb2.m3
> ../src/PassiveArb2.m3: In function 'PassiveArb2__Apply':
> ../src/PassiveArb2.m3:388: warning: variable 'M3_EWPD1K_delta' might  
> be clobbered by 'longjmp' or 'vfork'
> ../src/PassiveArb2.m3:388: warning: variable 'M3_Cwb5VA_toExec'  
> might be clobbered by 'longjmp' or 'vfork'
> new source -> compiling Globals.i3
> new source -> compiling ActiveArb1.i3
> new source -> compiling ActiveArb1.m3
> new source -> compiling ExecRecorder.i3
> new source -> compiling ExecRecorder.m3
> new source -> compiling ExecRec.i3
> new source -> compiling ExecRecorder2.m3
> new source -> compiling ExecRec.m3
> new source -> compiling ArbPingPong.m3
> new source -> compiling Main.m3
> "../src/Main.m3", line 72: warning: potentially unhandled exception:  
> OSError.E
> "../src/Main.m3", line 30: warning: potentially unhandled  
> exceptions: Rd.EndOfFile, Rd.Failure, Thread.Alerted
> "../src/Main.m3", line 31: warning: potentially unhandled  
> exceptions: Thread.Alerted, Wr.Failure
> "../src/Main.m3", line 32: warning: potentially unhandled  
> exceptions: Thread.Alerted, Wr.Failure
> "../src/Main.m3", line 33: warning: potentially unhandled  
> exceptions: Thread.Alerted, Wr.Failure
> "../src/Main.m3", line 118: warning: potentially unhandled  
> exception: OSError.E
> "../src/Main.m3", line 204: warning: potentially unhandled  
> exception: OSError.E
> 7 warnings encountered
> -> linking testtrade
> /usr/lib/libc.so: WARNING!  setkey(3) not present in the system!
> /usr/lib/libc.so: warning: this program uses gets(), which is unsafe.
> /usr/lib/libc.so: warning: mktemp() possibly used unsafely; consider  
> using mkstemp()
> /usr/lib/libc.so: WARNING!  des_setkey(3) not present in the system!
> /usr/lib/libc.so: WARNING!  encrypt(3) not present in the system!
> /usr/lib/libc.so: warning: tmpnam() possibly used unsafely; consider  
> using mkstemp()
> /usr/lib/libc.so: warning: this program uses f_prealloc(), which is  
> not recommended.
> /usr/lib/libc.so: WARNING!  des_cipher(3) not present in the system!
> /usr/lib/libc.so: warning: tempnam() possibly used unsafely;  
> consider using mkstemp()
> Main.mo: In function `Main_M3':
> ../src/Main.m3:164: undefined reference to `Main__5__1__1__CanStart. 
> 198'
> /home/mika/t-cm3/calarm/twslib/FreeBSD4/libtwslib.so: undefined  
> reference to `TWSReader__RCApply__RD.332'
>  m3_link => 1
> linker failed linking: testtrade
> Fatal Error: package build failed
>
>
>      Mika
>
>
>
> Tony Hosking writes:
>>
>> On 26 Apr 2009, at 15:22, Mika Nystrom wrote:
>>
>>>
>>> Hello again,
>>>
>>> Now I've managed to get all the code up and running under CM3.  I
>>> found and committed fixes to a bug in Wx and some code in one of
>>> the m3tk libraries that looked like it never was finished in the
>>> first place.
>>>
>>> As I mentioned earlier, I wasn't able to get user threads working
>>> in CM3 on FreeBSD 4.11.  But with some help from Jay, I was able to
>>> get things working with libc_r.  Performance, unfortunately,
>>> leaves something to be desired.
>>>
>>> For the first time I've been able to compare timings on identical
>>> hardware between the PM3 I was using and the CM3 that's out now.
>>>
>>> Note that optimization doesn't seem to work..?  (Not even -O, much
>>> less -O3... the compiler segfaults.)
>>
>> Are you passing -gstabs?  It should not segfault on -O3 - this is a
>> regression if it does.
>>
>>> Here's what I get, using no optimization either in PM3 or CM3.  The
>>> test is my Scheme interpreter generating SQL and Modula-3 code
>>> (a bit like the Hibernate system you can get for Java):
>>>
>>>
>>>   CPU seconds      CM3       PM3
>>> First version      5.269     1.366
>>> Fewer NEWs         2.039     0.440  (code cleanup on my part)
>>> TYPECASE hack      1.770            (see below)
>>>
>>> Some "poor man's profiling" (i.e., ctrl-C'ing in m3gdb) suggests  
>>> that
>>> most of the time is spent either in threading code (this could just
>>> be a lousy implementation in libc_r), the garbage collector, or in
>>> "ScanTypecase".
>>>
>>> The only one of these routines I am qualified to do anything about  
>>> is
>>> ScanTypecase.  I don't know why the Critical Mass people... <how
>>> colorful
>>> language can one use on m3devel?>.. all over this code.  I assume it
>>> has
>>> something to do with Java.
>>>
>>> The PM3 code (from SRC?) has this wonderful, concise, efficient bit:
>>>
>>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>> VAR t := Get (b);
>>> BEGIN
>>>   IF (a >= RT0u.nTypes) THEN BadType (a) END;
>>>   IF (a = 0)            THEN RETURN TRUE END;
>>>   RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
>>> END IsSubtype;
>>>
>>> replaced with the following absolute abomination in CM3:
>>>
>>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>> VAR t: RT0.TypeDefn;
>>> BEGIN
>>>   IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
>>>   t := Get (a);
>>>   IF (t = NIL) THEN RETURN FALSE; END;
>>>   IF (t.typecode = b) THEN RETURN TRUE END;
>>>   WHILE (t.kind = ORD (TK.Obj)) DO
>>>     IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
>>>     t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
>>>     IF (t = NIL) THEN RETURN FALSE; END;
>>>     IF (t.typecode = b) THEN RETURN TRUE; END;
>>>   END;
>>>   IF (t.traced # 0)
>>>     THEN RETURN (b = RT0.RefanyTypecode);
>>>     ELSE RETURN (b = RT0.AddressTypecode);
>>>   END;
>>> END IsSubtype;
>>
>> This is all to support dynamic loading of libraries.
>>
>>> Furthermore, CM3 has a hook for "ScanTypecase" that's missing
>>> in PM3 (the older compiler actually generates code for this):
>>>
>>> PROCEDURE ScanTypecase (ref: REFANY;
>>>                         x: ADDRESS(*ARRAY [0..] OF Cell*)):  
>>> INTEGER =
>>>   VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
>>>   BEGIN
>>>     IF (ref = NIL) THEN RETURN 0; END;
>>>     tc := TYPECODE (ref);
>>>     p := x;  i := 0;
>>>     LOOP
>>>       IF (p.uid = 0) THEN RETURN i; END;
>>>       IF (p.defn = NIL) THEN
>>>         p.defn := FindType (p.uid);
>>>         IF (p.defn = NIL) THEN
>>>           Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>>                 LOOPHOLE (p.uid, ADDRESS), NIL);
>>>         END;
>>>       END;
>>>       xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>>       IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
>>>       INC (p, ADRSIZE (p^));  INC (i);
>>>     END;
>>>   END ScanTypecase;
>>>
>>> Where to begin?  A loop with all kinds of runtime checks of  
>>> properties
>>> that are supposedly known at compile time? IsSubtype (itself a loop)
>>> called from inside the loop?
>>
>> Not if dynamically loaded!
>>
>>> I was able to cut out almost all of the typecase activity from my
>>> program by using the following code in RTType.m3, which depends on
>>> the ADDRESS x never changing (well more specifically never being
>>> the same for two TYPECASE statements):
>>>
>>> TYPE
>>> TypeCaseResult = RECORD
>>>   x : ADDRESS;
>>>   tc : Typecode;
>>>   arm : INTEGER;
>>> END;
>>>
>>> CONST
>>> TCCachePow = 6;
>>> TCCacheSize = Word.Shift(1,TCCachePow);
>>> TCMask = TCCacheSize-1;
>>>
>>> VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
>>> TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
>>> ..
>>> };
>>>
>>> (*
>>> VAR tcScans := 0; tcHits := 0; (* instrumenting counters *)
>>> *)
>>>
>>> PROCEDURE ScanTypecase (ref: REFANY;
>>>                       x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>> VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
>>> BEGIN
>>>   tc := TYPECODE (ref);
>>>   IF (ref = NIL) THEN RETURN 0; END;
>>>
>>>   WITH hash = Word.And(Word.Times(tc,
>>>
>>> Word.RightShift(LOOPHOLE(x,Word.T),2)),
>>>                        TCMask),
>>>        entry = TCCache[hash]  DO
>>>     (*INC(tcScans);*)
>>>     IF entry.x = x AND entry.tc = tc THEN
>>>       (*INC(tcHits);*)
>>>       RETURN entry.arm
>>>     END;
>>>
>>>     p := x;  i := 0;
>>>     LOOP
>>>       IF (p.uid = 0) THEN entry.x := x; entry.tc := tc;
>>> entry.arm := i; RETURN i; END;
>>>       IF (p.defn = NIL) THEN
>>>         p.defn := FindType (p.uid);
>>>         IF (p.defn = NIL) THEN
>>>           Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>>                 LOOPHOLE (p.uid, ADDRESS), NIL);
>>>         END;
>>>       END;
>>>       xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>>       IF (tc = xc) OR IsSubtype (tc, xc) THEN entry.x := x;
>>> entry.tc := tc; entry.arm := i; RETURN i; END;
>>>       INC (p, ADRSIZE (p^));  INC (i);
>>>     END;
>>>   END;
>>> END ScanTypecase;
>>>
>>> I'm guessing the speedup for TYPECASE itself is a factor of at least
>>> ten.  But it's still a pretty nasty hack.  And there is still a lot
>>> of IsSubtype activity from narrowing.
>>>
>>> I suppose that the way the typecodes are generated in CM3 is
>>> sufficiently different (meant to be extended at runtime?) from how
>>> it was done in PM3 that one can't really go back to the old code.
>>> Cardelli's idea of keeping an array of parents up to ROOT plus a
>>> "depth" for each type might have merit, though.
>>>
>>> To see if a is a subtype of b, something like:
>>>
>>> b = a.ancestors[a.depth-b.depth-1] (* with appropriate range  
>>> checks *)
>>>
>>> Would this be easy to put in?  I'm not sure how one can be sure
>>> that typecodes are done being generated?  There's something called
>>> RTTypeSRC.FinishObjectTypes ..
>>>
>>> And PM3 still generates code that's four times faster.
>>>
>>>   Mika
>>>





More information about the M3devel mailing list