[M3devel] optimization [ was Re: Performance issues with CM3 ]

Tony Hosking hosking at cs.purdue.edu
Mon Apr 27 02:37:09 CEST 2009


On 27 Apr 2009, at 10:31, Mika Nystrom wrote:

>
> Tony, it might not be as bad as all that.  Well, on one level it
> is.  Things are certainly not perfect.
>
> But I am able to operate with m3core built with -O (as well as with
> -O3).  Lots of scary-looking compiler warnings, but things do work.
> There are just a few programs that won't build.  I didn't try "all"
> in the CM3 dist---only my own programs and m3core (since m3core has
> the biggest performance impact).

This is weird, since I assume you are targetting x86 which is the same  
(except I386_DARWIN in my case) as I have used -O3 for without  
problems previously.  I shall have to rebuild everything in cm3 with - 
O3 to see if I can track down the problems.  I can't remember the last  
time I checked that it all worked, but the CVS logs will probably  
reveal more (cf my log messages for parse.c in the gcc-based backend).

> With lots of tweaks and adjustments, I now see my code running about
> 100% slower under CM3 than the same code does under PM3 (on the
> same machine).  This is including my typecase hacks, as described
> earlier today.  I'm guessing most of it is the FreeBSD pthreads
> implementation in libc_r + the calls to PushEFrame.

Yikes!  How much of this is module initialization (startup) time?

>
>
>     Mika
>
> Tony Hosking writes:
>> I am very disturbed about this since it suggests a regression.  I had
>> spent a huge amount of time a year or so back making sure the backend
>> would play properly with gcc -O3, but it seems we are now back in a
>> bad place.  I'm not sure what changes have occurred to the backend
>> since then, but they would be the prime candidates.  Unfortunately, I
>> don't have a lot of time right now to try to debug these -O3  
>> problems,
>> but I do want to fix them since they will eventually impinge on my  
>> own
>> work.  It would be really good to get our regressions framework back
>> up and running and to put -O3 in there as the default build option --
>> it seems there have been ongoing Tinderbox problems for a while now,
>> since my SOLgnu regression runs appear to have stopped completely.
>> I'll need to check the logs.
>>
>> On 27 Apr 2009, at 04:12, Mika Nystrom wrote:
>>
>>> Hi Tony,
>>>
>>> I looked at this more closely, and I was wrong.  The compiler  
>>> doesn't
>>> actually segfault on -O.  I was using -gstabs+ but switched to - 
>>> gstabs
>>> after your email (doesn't seem to matter).
>>>
>>> I get a ton of warnings at either optimization level, and there are
>>> definitely bugs in the optimizer.  The resulting code is generally
>>> not correct.  (By comparison, I had to turn off PM3's optimizer for
>>> only one of the hundred or so packages I build.)  Things often fail
>>> to compile, even at -O.
>>>
>>> At -O3, I get one segfault:
>>>
>>> new source -> compiling TextCommandQueueTbl.i3
>>> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack
>>> exception handling
>>> new source -> compiling CommandLoop.m3
>>> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack
>>> exception handling
>>> ../src/CommandLoop.m3: In function 'CommandLoop__Run':
>>> ../src/CommandLoop.m3:279: internal compiler error: Segmentation  
>>> fault
>>> Please submit a full bug report,
>>> with preprocessed source if appropriate.
>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>> m3_backend => 4
>>> m3cc (aka cm3cg) failed compiling: CommandLoop.mc
>>> new source -> compiling CommandLoopDefaultCommand.m3
>>> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack
>>> exception handling
>>> new source -> compiling TextCommandTbl.m3
>>>
>>> where:
>>>
>>>   272
>>> (*****************************************************************************
>>>   273
>>> *                                                                           *
>>>   274  *                            Command Loop
>>> Main                              *
>>>   275
>>> *                                                                           *
>>>   276
>>> *****************************************************************************)
>>>   277
>>>   278
>>>   279 PROCEDURE Run(self: T; source: Pathname.T := NIL; term:
>>> Term.T := NIL) =
>>>   280   CONST
>>>   281     Comment = SET OF CHAR{'%','#'};
>>>   282   VAR
>>>   283     completer := NEW(StdCompleter, loop:=self);
>>>   284     line: TEXT;
>>>   285   BEGIN
>>>   286     IF term = NIL THEN
>>>   287       self.term := Term.Default();
>>>   288     ELSE
>>>   289       self.term := term;
>>>   290     END;
>>>   291     LOOP
>>>   292       TRY
>>>   293         IF source # NIL THEN
>>>   294           DoLoad(self.load, TextList.List2("",source),
>>> self.term);
>>>   295           source := NIL;
>>>
>>> ...
>>>
>>> Even at -O, things don't work right.  Here's a typical output:
>>>
>>> new source -> compiling PassiveArb1.m3
>>> "../src/PassiveArb1.m3", line 68: warning: not used (e)
>>> "../src/PassiveArb1.m3", line 45: warning: not used (newCon)
>>> 2 warnings encountered
>>> ../src/PassiveArb1.m3: In function 'PassiveArb1__FApply':
>>> ../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_buyO' might
>>> be clobbered by 'longjmp' or 'vfork'
>>> ../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_selO' might
>>> be clobbered by 'longjmp' or 'vfork'
>>> new source -> compiling PassiveArb2.i3
>>> new source -> compiling ExecRecorder2.i3
>>> new source -> compiling ArbPingPong.i3
>>> new source -> compiling PassiveArb2.m3
>>> ../src/PassiveArb2.m3: In function 'PassiveArb2__Apply':
>>> ../src/PassiveArb2.m3:388: warning: variable 'M3_EWPD1K_delta' might
>>> be clobbered by 'longjmp' or 'vfork'
>>> ../src/PassiveArb2.m3:388: warning: variable 'M3_Cwb5VA_toExec'
>>> might be clobbered by 'longjmp' or 'vfork'
>>> new source -> compiling Globals.i3
>>> new source -> compiling ActiveArb1.i3
>>> new source -> compiling ActiveArb1.m3
>>> new source -> compiling ExecRecorder.i3
>>> new source -> compiling ExecRecorder.m3
>>> new source -> compiling ExecRec.i3
>>> new source -> compiling ExecRecorder2.m3
>>> new source -> compiling ExecRec.m3
>>> new source -> compiling ArbPingPong.m3
>>> new source -> compiling Main.m3
>>> "../src/Main.m3", line 72: warning: potentially unhandled exception:
>>> OSError.E
>>> "../src/Main.m3", line 30: warning: potentially unhandled
>>> exceptions: Rd.EndOfFile, Rd.Failure, Thread.Alerted
>>> "../src/Main.m3", line 31: warning: potentially unhandled
>>> exceptions: Thread.Alerted, Wr.Failure
>>> "../src/Main.m3", line 32: warning: potentially unhandled
>>> exceptions: Thread.Alerted, Wr.Failure
>>> "../src/Main.m3", line 33: warning: potentially unhandled
>>> exceptions: Thread.Alerted, Wr.Failure
>>> "../src/Main.m3", line 118: warning: potentially unhandled
>>> exception: OSError.E
>>> "../src/Main.m3", line 204: warning: potentially unhandled
>>> exception: OSError.E
>>> 7 warnings encountered
>>> -> linking testtrade
>>> /usr/lib/libc.so: WARNING!  setkey(3) not present in the system!
>>> /usr/lib/libc.so: warning: this program uses gets(), which is  
>>> unsafe.
>>> /usr/lib/libc.so: warning: mktemp() possibly used unsafely; consider
>>> using mkstemp()
>>> /usr/lib/libc.so: WARNING!  des_setkey(3) not present in the system!
>>> /usr/lib/libc.so: WARNING!  encrypt(3) not present in the system!
>>> /usr/lib/libc.so: warning: tmpnam() possibly used unsafely; consider
>>> using mkstemp()
>>> /usr/lib/libc.so: warning: this program uses f_prealloc(), which is
>>> not recommended.
>>> /usr/lib/libc.so: WARNING!  des_cipher(3) not present in the system!
>>> /usr/lib/libc.so: warning: tempnam() possibly used unsafely;
>>> consider using mkstemp()
>>> Main.mo: In function `Main_M3':
>>> ../src/Main.m3:164: undefined reference to `Main__5__1__1__CanStart.
>>> 198'
>>> /home/mika/t-cm3/calarm/twslib/FreeBSD4/libtwslib.so: undefined
>>> reference to `TWSReader__RCApply__RD.332'
>>> m3_link => 1
>>> linker failed linking: testtrade
>>> Fatal Error: package build failed
>>>
>>>
>>>     Mika
>>>
>>>
>>>
>>> Tony Hosking writes:
>>>>
>>>> On 26 Apr 2009, at 15:22, Mika Nystrom wrote:
>>>>
>>>>>
>>>>> Hello again,
>>>>>
>>>>> Now I've managed to get all the code up and running under CM3.  I
>>>>> found and committed fixes to a bug in Wx and some code in one of
>>>>> the m3tk libraries that looked like it never was finished in the
>>>>> first place.
>>>>>
>>>>> As I mentioned earlier, I wasn't able to get user threads working
>>>>> in CM3 on FreeBSD 4.11.  But with some help from Jay, I was able  
>>>>> to
>>>>> get things working with libc_r.  Performance, unfortunately,
>>>>> leaves something to be desired.
>>>>>
>>>>> For the first time I've been able to compare timings on identical
>>>>> hardware between the PM3 I was using and the CM3 that's out now.
>>>>>
>>>>> Note that optimization doesn't seem to work..?  (Not even -O, much
>>>>> less -O3... the compiler segfaults.)
>>>>
>>>> Are you passing -gstabs?  It should not segfault on -O3 - this is a
>>>> regression if it does.
>>>>
>>>>> Here's what I get, using no optimization either in PM3 or CM3.   
>>>>> The
>>>>> test is my Scheme interpreter generating SQL and Modula-3 code
>>>>> (a bit like the Hibernate system you can get for Java):
>>>>>
>>>>>
>>>>>  CPU seconds      CM3       PM3
>>>>> First version      5.269     1.366
>>>>> Fewer NEWs         2.039     0.440  (code cleanup on my part)
>>>>> TYPECASE hack      1.770            (see below)
>>>>>
>>>>> Some "poor man's profiling" (i.e., ctrl-C'ing in m3gdb) suggests
>>>>> that
>>>>> most of the time is spent either in threading code (this could  
>>>>> just
>>>>> be a lousy implementation in libc_r), the garbage collector, or in
>>>>> "ScanTypecase".
>>>>>
>>>>> The only one of these routines I am qualified to do anything about
>>>>> is
>>>>> ScanTypecase.  I don't know why the Critical Mass people... <how
>>>>> colorful
>>>>> language can one use on m3devel?>.. all over this code.  I  
>>>>> assume it
>>>>> has
>>>>> something to do with Java.
>>>>>
>>>>> The PM3 code (from SRC?) has this wonderful, concise, efficient  
>>>>> bit:
>>>>>
>>>>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>>>> VAR t := Get (b);
>>>>> BEGIN
>>>>>  IF (a >= RT0u.nTypes) THEN BadType (a) END;
>>>>>  IF (a = 0)            THEN RETURN TRUE END;
>>>>>  RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
>>>>> END IsSubtype;
>>>>>
>>>>> replaced with the following absolute abomination in CM3:
>>>>>
>>>>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>>>> VAR t: RT0.TypeDefn;
>>>>> BEGIN
>>>>>  IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
>>>>>  t := Get (a);
>>>>>  IF (t = NIL) THEN RETURN FALSE; END;
>>>>>  IF (t.typecode = b) THEN RETURN TRUE END;
>>>>>  WHILE (t.kind = ORD (TK.Obj)) DO
>>>>>    IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
>>>>>    t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
>>>>>    IF (t = NIL) THEN RETURN FALSE; END;
>>>>>    IF (t.typecode = b) THEN RETURN TRUE; END;
>>>>>  END;
>>>>>  IF (t.traced # 0)
>>>>>    THEN RETURN (b = RT0.RefanyTypecode);
>>>>>    ELSE RETURN (b = RT0.AddressTypecode);
>>>>>  END;
>>>>> END IsSubtype;
>>>>
>>>> This is all to support dynamic loading of libraries.
>>>>
>>>>> Furthermore, CM3 has a hook for "ScanTypecase" that's missing
>>>>> in PM3 (the older compiler actually generates code for this):
>>>>>
>>>>> PROCEDURE ScanTypecase (ref: REFANY;
>>>>>                        x: ADDRESS(*ARRAY [0..] OF Cell*)):
>>>>> INTEGER =
>>>>>  VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
>>>>>  BEGIN
>>>>>    IF (ref = NIL) THEN RETURN 0; END;
>>>>>    tc := TYPECODE (ref);
>>>>>    p := x;  i := 0;
>>>>>    LOOP
>>>>>      IF (p.uid = 0) THEN RETURN i; END;
>>>>>      IF (p.defn = NIL) THEN
>>>>>        p.defn := FindType (p.uid);
>>>>>        IF (p.defn = NIL) THEN
>>>>>          Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>>>>                LOOPHOLE (p.uid, ADDRESS), NIL);
>>>>>        END;
>>>>>      END;
>>>>>      xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>>>>      IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
>>>>>      INC (p, ADRSIZE (p^));  INC (i);
>>>>>    END;
>>>>>  END ScanTypecase;
>>>>>
>>>>> Where to begin?  A loop with all kinds of runtime checks of
>>>>> properties
>>>>> that are supposedly known at compile time? IsSubtype (itself a  
>>>>> loop)
>>>>> called from inside the loop?
>>>>
>>>> Not if dynamically loaded!
>>>>
>>>>> I was able to cut out almost all of the typecase activity from my
>>>>> program by using the following code in RTType.m3, which depends on
>>>>> the ADDRESS x never changing (well more specifically never being
>>>>> the same for two TYPECASE statements):
>>>>>
>>>>> TYPE
>>>>> TypeCaseResult = RECORD
>>>>>  x : ADDRESS;
>>>>>  tc : Typecode;
>>>>>  arm : INTEGER;
>>>>> END;
>>>>>
>>>>> CONST
>>>>> TCCachePow = 6;
>>>>> TCCacheSize = Word.Shift(1,TCCachePow);
>>>>> TCMask = TCCacheSize-1;
>>>>>
>>>>> VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
>>>>> TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
>>>>> ..
>>>>> };
>>>>>
>>>>> (*
>>>>> VAR tcScans := 0; tcHits := 0; (* instrumenting counters *)
>>>>> *)
>>>>>
>>>>> PROCEDURE ScanTypecase (ref: REFANY;
>>>>>                      x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>>>> VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
>>>>> BEGIN
>>>>>  tc := TYPECODE (ref);
>>>>>  IF (ref = NIL) THEN RETURN 0; END;
>>>>>
>>>>>  WITH hash = Word.And(Word.Times(tc,
>>>>>
>>>>> Word.RightShift(LOOPHOLE(x,Word.T),2)),
>>>>>                       TCMask),
>>>>>       entry = TCCache[hash]  DO
>>>>>    (*INC(tcScans);*)
>>>>>    IF entry.x = x AND entry.tc = tc THEN
>>>>>      (*INC(tcHits);*)
>>>>>      RETURN entry.arm
>>>>>    END;
>>>>>
>>>>>    p := x;  i := 0;
>>>>>    LOOP
>>>>>      IF (p.uid = 0) THEN entry.x := x; entry.tc := tc;
>>>>> entry.arm := i; RETURN i; END;
>>>>>      IF (p.defn = NIL) THEN
>>>>>        p.defn := FindType (p.uid);
>>>>>        IF (p.defn = NIL) THEN
>>>>>          Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>>>>                LOOPHOLE (p.uid, ADDRESS), NIL);
>>>>>        END;
>>>>>      END;
>>>>>      xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>>>>      IF (tc = xc) OR IsSubtype (tc, xc) THEN entry.x := x;
>>>>> entry.tc := tc; entry.arm := i; RETURN i; END;
>>>>>      INC (p, ADRSIZE (p^));  INC (i);
>>>>>    END;
>>>>>  END;
>>>>> END ScanTypecase;
>>>>>
>>>>> I'm guessing the speedup for TYPECASE itself is a factor of at  
>>>>> least
>>>>> ten.  But it's still a pretty nasty hack.  And there is still a  
>>>>> lot
>>>>> of IsSubtype activity from narrowing.
>>>>>
>>>>> I suppose that the way the typecodes are generated in CM3 is
>>>>> sufficiently different (meant to be extended at runtime?) from how
>>>>> it was done in PM3 that one can't really go back to the old code.
>>>>> Cardelli's idea of keeping an array of parents up to ROOT plus a
>>>>> "depth" for each type might have merit, though.
>>>>>
>>>>> To see if a is a subtype of b, something like:
>>>>>
>>>>> b = a.ancestors[a.depth-b.depth-1] (* with appropriate range
>>>>> checks *)
>>>>>
>>>>> Would this be easy to put in?  I'm not sure how one can be sure
>>>>> that typecodes are done being generated?  There's something called
>>>>> RTTypeSRC.FinishObjectTypes ..
>>>>>
>>>>> And PM3 still generates code that's four times faster.
>>>>>
>>>>>  Mika
>>>>>
>>





More information about the M3devel mailing list