[M3devel] optimization [ was Re: Performance issues with CM3 ]
Tony Hosking
hosking at cs.purdue.edu
Mon Apr 27 02:04:26 CEST 2009
I am very disturbed about this since it suggests a regression. I had
spent a huge amount of time a year or so back making sure the backend
would play properly with gcc -O3, but it seems we are now back in a
bad place. I'm not sure what changes have occurred to the backend
since then, but they would be the prime candidates. Unfortunately, I
don't have a lot of time right now to try to debug these -O3 problems,
but I do want to fix them since they will eventually impinge on my own
work. It would be really good to get our regressions framework back
up and running and to put -O3 in there as the default build option --
it seems there have been ongoing Tinderbox problems for a while now,
since my SOLgnu regression runs appear to have stopped completely.
I'll need to check the logs.
On 27 Apr 2009, at 04:12, Mika Nystrom wrote:
> Hi Tony,
>
> I looked at this more closely, and I was wrong. The compiler doesn't
> actually segfault on -O. I was using -gstabs+ but switched to -gstabs
> after your email (doesn't seem to matter).
>
> I get a ton of warnings at either optimization level, and there are
> definitely bugs in the optimizer. The resulting code is generally
> not correct. (By comparison, I had to turn off PM3's optimizer for
> only one of the hundred or so packages I build.) Things often fail
> to compile, even at -O.
>
> At -O3, I get one segfault:
>
> new source -> compiling TextCommandQueueTbl.i3
> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack
> exception handling
> new source -> compiling CommandLoop.m3
> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack
> exception handling
> ../src/CommandLoop.m3: In function 'CommandLoop__Run':
> ../src/CommandLoop.m3:279: internal compiler error: Segmentation fault
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> m3_backend => 4
> m3cc (aka cm3cg) failed compiling: CommandLoop.mc
> new source -> compiling CommandLoopDefaultCommand.m3
> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack
> exception handling
> new source -> compiling TextCommandTbl.m3
>
> where:
>
> 272
> (*****************************************************************************
> 273
> * *
> 274 * Command Loop
> Main *
> 275
> * *
> 276
> *****************************************************************************)
> 277
> 278
> 279 PROCEDURE Run(self: T; source: Pathname.T := NIL; term:
> Term.T := NIL) =
> 280 CONST
> 281 Comment = SET OF CHAR{'%','#'};
> 282 VAR
> 283 completer := NEW(StdCompleter, loop:=self);
> 284 line: TEXT;
> 285 BEGIN
> 286 IF term = NIL THEN
> 287 self.term := Term.Default();
> 288 ELSE
> 289 self.term := term;
> 290 END;
> 291 LOOP
> 292 TRY
> 293 IF source # NIL THEN
> 294 DoLoad(self.load, TextList.List2("",source),
> self.term);
> 295 source := NIL;
>
> ...
>
> Even at -O, things don't work right. Here's a typical output:
>
> new source -> compiling PassiveArb1.m3
> "../src/PassiveArb1.m3", line 68: warning: not used (e)
> "../src/PassiveArb1.m3", line 45: warning: not used (newCon)
> 2 warnings encountered
> ../src/PassiveArb1.m3: In function 'PassiveArb1__FApply':
> ../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_buyO' might
> be clobbered by 'longjmp' or 'vfork'
> ../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_selO' might
> be clobbered by 'longjmp' or 'vfork'
> new source -> compiling PassiveArb2.i3
> new source -> compiling ExecRecorder2.i3
> new source -> compiling ArbPingPong.i3
> new source -> compiling PassiveArb2.m3
> ../src/PassiveArb2.m3: In function 'PassiveArb2__Apply':
> ../src/PassiveArb2.m3:388: warning: variable 'M3_EWPD1K_delta' might
> be clobbered by 'longjmp' or 'vfork'
> ../src/PassiveArb2.m3:388: warning: variable 'M3_Cwb5VA_toExec'
> might be clobbered by 'longjmp' or 'vfork'
> new source -> compiling Globals.i3
> new source -> compiling ActiveArb1.i3
> new source -> compiling ActiveArb1.m3
> new source -> compiling ExecRecorder.i3
> new source -> compiling ExecRecorder.m3
> new source -> compiling ExecRec.i3
> new source -> compiling ExecRecorder2.m3
> new source -> compiling ExecRec.m3
> new source -> compiling ArbPingPong.m3
> new source -> compiling Main.m3
> "../src/Main.m3", line 72: warning: potentially unhandled exception:
> OSError.E
> "../src/Main.m3", line 30: warning: potentially unhandled
> exceptions: Rd.EndOfFile, Rd.Failure, Thread.Alerted
> "../src/Main.m3", line 31: warning: potentially unhandled
> exceptions: Thread.Alerted, Wr.Failure
> "../src/Main.m3", line 32: warning: potentially unhandled
> exceptions: Thread.Alerted, Wr.Failure
> "../src/Main.m3", line 33: warning: potentially unhandled
> exceptions: Thread.Alerted, Wr.Failure
> "../src/Main.m3", line 118: warning: potentially unhandled
> exception: OSError.E
> "../src/Main.m3", line 204: warning: potentially unhandled
> exception: OSError.E
> 7 warnings encountered
> -> linking testtrade
> /usr/lib/libc.so: WARNING! setkey(3) not present in the system!
> /usr/lib/libc.so: warning: this program uses gets(), which is unsafe.
> /usr/lib/libc.so: warning: mktemp() possibly used unsafely; consider
> using mkstemp()
> /usr/lib/libc.so: WARNING! des_setkey(3) not present in the system!
> /usr/lib/libc.so: WARNING! encrypt(3) not present in the system!
> /usr/lib/libc.so: warning: tmpnam() possibly used unsafely; consider
> using mkstemp()
> /usr/lib/libc.so: warning: this program uses f_prealloc(), which is
> not recommended.
> /usr/lib/libc.so: WARNING! des_cipher(3) not present in the system!
> /usr/lib/libc.so: warning: tempnam() possibly used unsafely;
> consider using mkstemp()
> Main.mo: In function `Main_M3':
> ../src/Main.m3:164: undefined reference to `Main__5__1__1__CanStart.
> 198'
> /home/mika/t-cm3/calarm/twslib/FreeBSD4/libtwslib.so: undefined
> reference to `TWSReader__RCApply__RD.332'
> m3_link => 1
> linker failed linking: testtrade
> Fatal Error: package build failed
>
>
> Mika
>
>
>
> Tony Hosking writes:
>>
>> On 26 Apr 2009, at 15:22, Mika Nystrom wrote:
>>
>>>
>>> Hello again,
>>>
>>> Now I've managed to get all the code up and running under CM3. I
>>> found and committed fixes to a bug in Wx and some code in one of
>>> the m3tk libraries that looked like it never was finished in the
>>> first place.
>>>
>>> As I mentioned earlier, I wasn't able to get user threads working
>>> in CM3 on FreeBSD 4.11. But with some help from Jay, I was able to
>>> get things working with libc_r. Performance, unfortunately,
>>> leaves something to be desired.
>>>
>>> For the first time I've been able to compare timings on identical
>>> hardware between the PM3 I was using and the CM3 that's out now.
>>>
>>> Note that optimization doesn't seem to work..? (Not even -O, much
>>> less -O3... the compiler segfaults.)
>>
>> Are you passing -gstabs? It should not segfault on -O3 - this is a
>> regression if it does.
>>
>>> Here's what I get, using no optimization either in PM3 or CM3. The
>>> test is my Scheme interpreter generating SQL and Modula-3 code
>>> (a bit like the Hibernate system you can get for Java):
>>>
>>>
>>> CPU seconds CM3 PM3
>>> First version 5.269 1.366
>>> Fewer NEWs 2.039 0.440 (code cleanup on my part)
>>> TYPECASE hack 1.770 (see below)
>>>
>>> Some "poor man's profiling" (i.e., ctrl-C'ing in m3gdb) suggests
>>> that
>>> most of the time is spent either in threading code (this could just
>>> be a lousy implementation in libc_r), the garbage collector, or in
>>> "ScanTypecase".
>>>
>>> The only one of these routines I am qualified to do anything about
>>> is
>>> ScanTypecase. I don't know why the Critical Mass people... <how
>>> colorful
>>> language can one use on m3devel?>.. all over this code. I assume it
>>> has
>>> something to do with Java.
>>>
>>> The PM3 code (from SRC?) has this wonderful, concise, efficient bit:
>>>
>>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>> VAR t := Get (b);
>>> BEGIN
>>> IF (a >= RT0u.nTypes) THEN BadType (a) END;
>>> IF (a = 0) THEN RETURN TRUE END;
>>> RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
>>> END IsSubtype;
>>>
>>> replaced with the following absolute abomination in CM3:
>>>
>>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>> VAR t: RT0.TypeDefn;
>>> BEGIN
>>> IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
>>> t := Get (a);
>>> IF (t = NIL) THEN RETURN FALSE; END;
>>> IF (t.typecode = b) THEN RETURN TRUE END;
>>> WHILE (t.kind = ORD (TK.Obj)) DO
>>> IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
>>> t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
>>> IF (t = NIL) THEN RETURN FALSE; END;
>>> IF (t.typecode = b) THEN RETURN TRUE; END;
>>> END;
>>> IF (t.traced # 0)
>>> THEN RETURN (b = RT0.RefanyTypecode);
>>> ELSE RETURN (b = RT0.AddressTypecode);
>>> END;
>>> END IsSubtype;
>>
>> This is all to support dynamic loading of libraries.
>>
>>> Furthermore, CM3 has a hook for "ScanTypecase" that's missing
>>> in PM3 (the older compiler actually generates code for this):
>>>
>>> PROCEDURE ScanTypecase (ref: REFANY;
>>> x: ADDRESS(*ARRAY [0..] OF Cell*)):
>>> INTEGER =
>>> VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
>>> BEGIN
>>> IF (ref = NIL) THEN RETURN 0; END;
>>> tc := TYPECODE (ref);
>>> p := x; i := 0;
>>> LOOP
>>> IF (p.uid = 0) THEN RETURN i; END;
>>> IF (p.defn = NIL) THEN
>>> p.defn := FindType (p.uid);
>>> IF (p.defn = NIL) THEN
>>> Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>> LOOPHOLE (p.uid, ADDRESS), NIL);
>>> END;
>>> END;
>>> xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>> IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
>>> INC (p, ADRSIZE (p^)); INC (i);
>>> END;
>>> END ScanTypecase;
>>>
>>> Where to begin? A loop with all kinds of runtime checks of
>>> properties
>>> that are supposedly known at compile time? IsSubtype (itself a loop)
>>> called from inside the loop?
>>
>> Not if dynamically loaded!
>>
>>> I was able to cut out almost all of the typecase activity from my
>>> program by using the following code in RTType.m3, which depends on
>>> the ADDRESS x never changing (well more specifically never being
>>> the same for two TYPECASE statements):
>>>
>>> TYPE
>>> TypeCaseResult = RECORD
>>> x : ADDRESS;
>>> tc : Typecode;
>>> arm : INTEGER;
>>> END;
>>>
>>> CONST
>>> TCCachePow = 6;
>>> TCCacheSize = Word.Shift(1,TCCachePow);
>>> TCMask = TCCacheSize-1;
>>>
>>> VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
>>> TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
>>> ..
>>> };
>>>
>>> (*
>>> VAR tcScans := 0; tcHits := 0; (* instrumenting counters *)
>>> *)
>>>
>>> PROCEDURE ScanTypecase (ref: REFANY;
>>> x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>> VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
>>> BEGIN
>>> tc := TYPECODE (ref);
>>> IF (ref = NIL) THEN RETURN 0; END;
>>>
>>> WITH hash = Word.And(Word.Times(tc,
>>>
>>> Word.RightShift(LOOPHOLE(x,Word.T),2)),
>>> TCMask),
>>> entry = TCCache[hash] DO
>>> (*INC(tcScans);*)
>>> IF entry.x = x AND entry.tc = tc THEN
>>> (*INC(tcHits);*)
>>> RETURN entry.arm
>>> END;
>>>
>>> p := x; i := 0;
>>> LOOP
>>> IF (p.uid = 0) THEN entry.x := x; entry.tc := tc;
>>> entry.arm := i; RETURN i; END;
>>> IF (p.defn = NIL) THEN
>>> p.defn := FindType (p.uid);
>>> IF (p.defn = NIL) THEN
>>> Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>> LOOPHOLE (p.uid, ADDRESS), NIL);
>>> END;
>>> END;
>>> xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>> IF (tc = xc) OR IsSubtype (tc, xc) THEN entry.x := x;
>>> entry.tc := tc; entry.arm := i; RETURN i; END;
>>> INC (p, ADRSIZE (p^)); INC (i);
>>> END;
>>> END;
>>> END ScanTypecase;
>>>
>>> I'm guessing the speedup for TYPECASE itself is a factor of at least
>>> ten. But it's still a pretty nasty hack. And there is still a lot
>>> of IsSubtype activity from narrowing.
>>>
>>> I suppose that the way the typecodes are generated in CM3 is
>>> sufficiently different (meant to be extended at runtime?) from how
>>> it was done in PM3 that one can't really go back to the old code.
>>> Cardelli's idea of keeping an array of parents up to ROOT plus a
>>> "depth" for each type might have merit, though.
>>>
>>> To see if a is a subtype of b, something like:
>>>
>>> b = a.ancestors[a.depth-b.depth-1] (* with appropriate range
>>> checks *)
>>>
>>> Would this be easy to put in? I'm not sure how one can be sure
>>> that typecodes are done being generated? There's something called
>>> RTTypeSRC.FinishObjectTypes ..
>>>
>>> And PM3 still generates code that's four times faster.
>>>
>>> Mika
>>>
More information about the M3devel
mailing list