[M3devel] optimization [ was Re: Performance issues with CM3 ]
Tony Hosking
hosking at cs.purdue.edu
Mon Apr 27 02:37:09 CEST 2009
On 27 Apr 2009, at 10:31, Mika Nystrom wrote:
>
> Tony, it might not be as bad as all that. Well, on one level it
> is. Things are certainly not perfect.
>
> But I am able to operate with m3core built with -O (as well as with
> -O3). Lots of scary-looking compiler warnings, but things do work.
> There are just a few programs that won't build. I didn't try "all"
> in the CM3 dist---only my own programs and m3core (since m3core has
> the biggest performance impact).
This is weird, since I assume you are targeting x86, which is the same
(except I386_DARWIN in my case) as I have used -O3 for without
problems previously. I shall have to rebuild everything in cm3 with
-O3 to see if I can track down the problems. I can't remember the last
time I checked that it all worked, but the CVS logs will probably
reveal more (cf. my log messages for parse.c in the gcc-based backend).
> With lots of tweaks and adjustments, I now see my code running about
> 100% slower under CM3 than the same code does under PM3 (on the
> same machine). This is including my typecase hacks, as described
> earlier today. I'm guessing most of it is the FreeBSD pthreads
> implementation in libc_r + the calls to PushEFrame.
Yikes! How much of this is module initialization (startup) time?
>
>
> Mika
>
> Tony Hosking writes:
>> I am very disturbed about this since it suggests a regression. I had
>> spent a huge amount of time a year or so back making sure the backend
>> would play properly with gcc -O3, but it seems we are now back in a
>> bad place. I'm not sure what changes have occurred to the backend
>> since then, but they would be the prime candidates. Unfortunately, I
>> don't have a lot of time right now to try to debug these -O3 problems,
>> but I do want to fix them since they will eventually impinge on my own
>> work. It would be really good to get our regressions framework back
>> up and running and to put -O3 in there as the default build option --
>> it seems there have been ongoing Tinderbox problems for a while now,
>> since my SOLgnu regression runs appear to have stopped completely.
>> I'll need to check the logs.
>>
>> On 27 Apr 2009, at 04:12, Mika Nystrom wrote:
>>
>>> Hi Tony,
>>>
>>> I looked at this more closely, and I was wrong. The compiler doesn't
>>> actually segfault on -O. I was using -gstabs+ but switched to
>>> -gstabs after your email (doesn't seem to matter).
>>>
>>> I get a ton of warnings at either optimization level, and there are
>>> definitely bugs in the optimizer. The resulting code is generally
>>> not correct. (By comparison, I had to turn off PM3's optimizer for
>>> only one of the hundred or so packages I build.) Things often fail
>>> to compile, even at -O.
>>>
>>> At -O3, I get one segfault:
>>>
>>> new source -> compiling TextCommandQueueTbl.i3
>>> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack exception handling
>>> new source -> compiling CommandLoop.m3
>>> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack exception handling
>>> ../src/CommandLoop.m3: In function 'CommandLoop__Run':
>>> ../src/CommandLoop.m3:279: internal compiler error: Segmentation fault
>>> Please submit a full bug report,
>>> with preprocessed source if appropriate.
>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>> m3_backend => 4
>>> m3cc (aka cm3cg) failed compiling: CommandLoop.mc
>>> new source -> compiling CommandLoopDefaultCommand.m3
>>> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack exception handling
>>> new source -> compiling TextCommandTbl.m3
>>>
>>> where:
>>>
>>> 272 (*****************************************************************************
>>> 273  *                                                                           *
>>> 274  *  Command Loop Main                                                        *
>>> 275  *                                                                           *
>>> 276  *****************************************************************************)
>>> 277
>>> 278
>>> 279 PROCEDURE Run(self: T; source: Pathname.T := NIL; term: Term.T := NIL) =
>>> 280   CONST
>>> 281     Comment = SET OF CHAR{'%','#'};
>>> 282   VAR
>>> 283     completer := NEW(StdCompleter, loop:=self);
>>> 284     line: TEXT;
>>> 285   BEGIN
>>> 286     IF term = NIL THEN
>>> 287       self.term := Term.Default();
>>> 288     ELSE
>>> 289       self.term := term;
>>> 290     END;
>>> 291     LOOP
>>> 292       TRY
>>> 293         IF source # NIL THEN
>>> 294           DoLoad(self.load, TextList.List2("",source), self.term);
>>> 295           source := NIL;
>>>
>>> ...
>>>
>>> Even at -O, things don't work right. Here's a typical output:
>>>
>>> new source -> compiling PassiveArb1.m3
>>> "../src/PassiveArb1.m3", line 68: warning: not used (e)
>>> "../src/PassiveArb1.m3", line 45: warning: not used (newCon)
>>> 2 warnings encountered
>>> ../src/PassiveArb1.m3: In function 'PassiveArb1__FApply':
>>> ../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_buyO' might be clobbered by 'longjmp' or 'vfork'
>>> ../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_selO' might be clobbered by 'longjmp' or 'vfork'
>>> new source -> compiling PassiveArb2.i3
>>> new source -> compiling ExecRecorder2.i3
>>> new source -> compiling ArbPingPong.i3
>>> new source -> compiling PassiveArb2.m3
>>> ../src/PassiveArb2.m3: In function 'PassiveArb2__Apply':
>>> ../src/PassiveArb2.m3:388: warning: variable 'M3_EWPD1K_delta' might be clobbered by 'longjmp' or 'vfork'
>>> ../src/PassiveArb2.m3:388: warning: variable 'M3_Cwb5VA_toExec' might be clobbered by 'longjmp' or 'vfork'
>>> new source -> compiling Globals.i3
>>> new source -> compiling ActiveArb1.i3
>>> new source -> compiling ActiveArb1.m3
>>> new source -> compiling ExecRecorder.i3
>>> new source -> compiling ExecRecorder.m3
>>> new source -> compiling ExecRec.i3
>>> new source -> compiling ExecRecorder2.m3
>>> new source -> compiling ExecRec.m3
>>> new source -> compiling ArbPingPong.m3
>>> new source -> compiling Main.m3
>>> "../src/Main.m3", line 72: warning: potentially unhandled exception: OSError.E
>>> "../src/Main.m3", line 30: warning: potentially unhandled exceptions: Rd.EndOfFile, Rd.Failure, Thread.Alerted
>>> "../src/Main.m3", line 31: warning: potentially unhandled exceptions: Thread.Alerted, Wr.Failure
>>> "../src/Main.m3", line 32: warning: potentially unhandled exceptions: Thread.Alerted, Wr.Failure
>>> "../src/Main.m3", line 33: warning: potentially unhandled exceptions: Thread.Alerted, Wr.Failure
>>> "../src/Main.m3", line 118: warning: potentially unhandled exception: OSError.E
>>> "../src/Main.m3", line 204: warning: potentially unhandled exception: OSError.E
>>> 7 warnings encountered
>>> -> linking testtrade
>>> /usr/lib/libc.so: WARNING! setkey(3) not present in the system!
>>> /usr/lib/libc.so: warning: this program uses gets(), which is unsafe.
>>> /usr/lib/libc.so: warning: mktemp() possibly used unsafely; consider using mkstemp()
>>> /usr/lib/libc.so: WARNING! des_setkey(3) not present in the system!
>>> /usr/lib/libc.so: WARNING! encrypt(3) not present in the system!
>>> /usr/lib/libc.so: warning: tmpnam() possibly used unsafely; consider using mkstemp()
>>> /usr/lib/libc.so: warning: this program uses f_prealloc(), which is not recommended.
>>> /usr/lib/libc.so: WARNING! des_cipher(3) not present in the system!
>>> /usr/lib/libc.so: warning: tempnam() possibly used unsafely; consider using mkstemp()
>>> Main.mo: In function `Main_M3':
>>> ../src/Main.m3:164: undefined reference to `Main__5__1__1__CanStart.198'
>>> /home/mika/t-cm3/calarm/twslib/FreeBSD4/libtwslib.so: undefined reference to `TWSReader__RCApply__RD.332'
>>> m3_link => 1
>>> linker failed linking: testtrade
>>> Fatal Error: package build failed
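
The "might be clobbered by 'longjmp'" warnings in the log above are what GCC prints when a local variable may be held in a register across a setjmp call. Here is a minimal C sketch of that hazard, assuming the ex_stack exception scheme named in the cm3cg warnings is built on setjmp/longjmp. The names handler, raise_exception and count_before_raise are hypothetical and are not CM3 runtime code.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-in for one TRY frame; not the CM3 runtime. */
static jmp_buf handler;

/* Plays the role of RAISE: unwinds to the setjmp call below. */
static void raise_exception(void) {
    longjmp(handler, 1);
}

/* Returns the value of `count` as seen by the handler code. */
int count_before_raise(void) {
    /* volatile forces each store to memory, so the value survives
       the longjmp. A non-volatile local modified between setjmp and
       longjmp has an indeterminate value afterwards. */
    volatile int count = 0;

    if (setjmp(handler) == 0) {   /* body of the "TRY" */
        count = 1;
        count = 2;
        raise_exception();
        count = 3;                /* never reached */
    }
    /* "EXCEPT" part: reached only through longjmp in this sketch */
    return count;
}
```

With the volatile qualifier, count_before_raise() returns 2. Without it, an optimizing build is free to keep count in a register, and the handler may see a stale value. That is one way a program could behave differently at -O than at -O0, though nothing here shows it is what breaks the builds in this thread.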
>>>
>>>
>>> Mika
>>>
>>>
>>>
>>> Tony Hosking writes:
>>>>
>>>> On 26 Apr 2009, at 15:22, Mika Nystrom wrote:
>>>>
>>>>>
>>>>> Hello again,
>>>>>
>>>>> Now I've managed to get all the code up and running under CM3. I
>>>>> found and committed fixes to a bug in Wx and some code in one of
>>>>> the m3tk libraries that looked like it never was finished in the
>>>>> first place.
>>>>>
>>>>> As I mentioned earlier, I wasn't able to get user threads working
>>>>> in CM3 on FreeBSD 4.11. But with some help from Jay, I was able to
>>>>> get things working with libc_r. Performance, unfortunately,
>>>>> leaves something to be desired.
>>>>>
>>>>> For the first time I've been able to compare timings on identical
>>>>> hardware between the PM3 I was using and the CM3 that's out now.
>>>>>
>>>>> Note that optimization doesn't seem to work..? (Not even -O, much
>>>>> less -O3... the compiler segfaults.)
>>>>
>>>> Are you passing -gstabs? It should not segfault on -O3 - this is a
>>>> regression if it does.
>>>>
>>>>> Here's what I get, using no optimization either in PM3 or CM3. The
>>>>> test is my Scheme interpreter generating SQL and Modula-3 code
>>>>> (a bit like the Hibernate system you can get for Java):
>>>>>
>>>>>
>>>>> CPU seconds      CM3      PM3
>>>>> First version    5.269    1.366
>>>>> Fewer NEWs       2.039    0.440   (code cleanup on my part)
>>>>> TYPECASE hack    1.770            (see below)
>>>>>
>>>>> Some "poor man's profiling" (i.e., ctrl-C'ing in m3gdb) suggests that
>>>>> most of the time is spent either in threading code (this could just
>>>>> be a lousy implementation in libc_r), the garbage collector, or in
>>>>> "ScanTypecase".
>>>>>
>>>>> The only one of these routines I am qualified to do anything about is
>>>>> ScanTypecase. I don't know why the Critical Mass people... <how colorful
>>>>> language can one use on m3devel?>.. all over this code. I assume it has
>>>>> something to do with Java.
>>>>>
>>>>> The PM3 code (from SRC?) has this wonderful, concise, efficient
>>>>> bit:
>>>>>
>>>>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>>>>   VAR t := Get (b);
>>>>>   BEGIN
>>>>>     IF (a >= RT0u.nTypes) THEN BadType (a) END;
>>>>>     IF (a = 0) THEN RETURN TRUE END;
>>>>>     RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
>>>>>   END IsSubtype;
>>>>>
>>>>> replaced with the following absolute abomination in CM3:
>>>>>
>>>>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>>>>   VAR t: RT0.TypeDefn;
>>>>>   BEGIN
>>>>>     IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
>>>>>     t := Get (a);
>>>>>     IF (t = NIL) THEN RETURN FALSE; END;
>>>>>     IF (t.typecode = b) THEN RETURN TRUE END;
>>>>>     WHILE (t.kind = ORD (TK.Obj)) DO
>>>>>       IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
>>>>>       t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
>>>>>       IF (t = NIL) THEN RETURN FALSE; END;
>>>>>       IF (t.typecode = b) THEN RETURN TRUE; END;
>>>>>     END;
>>>>>     IF (t.traced # 0)
>>>>>       THEN RETURN (b = RT0.RefanyTypecode);
>>>>>       ELSE RETURN (b = RT0.AddressTypecode);
>>>>>     END;
>>>>>   END IsSubtype;
>>>>
>>>> This is all to support dynamic loading of libraries.
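
To make the trade-off concrete, here is a small C sketch of the two styles of test quoted above. It assumes the PM3 range check relies on typecodes being numbered in preorder over the finished type tree, so that every subtree occupies a contiguous range; that numbering has to be redone when a new type is spliced in, whereas the parent walk does not. The struct and function names are hypothetical and do not reflect the actual RT0 layout.

```c
#include <assert.h>

/* Hypothetical type cell; field names echo the PM3 code only loosely. */
struct type_cell {
    int parent;          /* index of the supertype, -1 for the root */
    int typecode;        /* preorder number, filled in below */
    int last_subtype_tc; /* largest typecode inside this subtree */
};

/* Number the subtree rooted at `node` in preorder, starting at `next`.
   Returns the first unused number. */
static int number_subtree(struct type_cell *t, int n, int node, int next) {
    t[node].typecode = next++;
    for (int c = 0; c < n; c++)
        if (t[c].parent == node)
            next = number_subtree(t, n, c, next);
    t[node].last_subtype_tc = next - 1;
    return next;
}

/* Must run after ALL types are known; any later addition invalidates it. */
void assign_typecodes(struct type_cell *t, int n, int root) {
    number_subtree(t, n, root, 0);
}

/* Range style: constant time, two comparisons. */
int is_subtype_range(const struct type_cell *t, int a, int b) {
    return t[b].typecode <= t[a].typecode
        && t[a].typecode <= t[b].last_subtype_tc;
}

/* Walk style: time grows with the depth of a, but needs no global numbering. */
int is_subtype_walk(const struct type_cell *t, int a, int b) {
    for (int cur = a; cur != -1; cur = t[cur].parent)
        if (cur == b) return 1;
    return 0;
}
```

Both functions agree on any tree that was numbered after its last type was added. The range version is the faster one, but only the walk stays correct if a type arrives later and assign_typecodes is not rerun.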
>>>>
>>>>> Furthermore, CM3 has a hook for "ScanTypecase" that's missing
>>>>> in PM3 (the older compiler actually generates code for this):
>>>>>
>>>>> PROCEDURE ScanTypecase (ref: REFANY;
>>>>>                         x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>>>>   VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
>>>>>   BEGIN
>>>>>     IF (ref = NIL) THEN RETURN 0; END;
>>>>>     tc := TYPECODE (ref);
>>>>>     p := x; i := 0;
>>>>>     LOOP
>>>>>       IF (p.uid = 0) THEN RETURN i; END;
>>>>>       IF (p.defn = NIL) THEN
>>>>>         p.defn := FindType (p.uid);
>>>>>         IF (p.defn = NIL) THEN
>>>>>           Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>>>>                 LOOPHOLE (p.uid, ADDRESS), NIL);
>>>>>         END;
>>>>>       END;
>>>>>       xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>>>>       IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
>>>>>       INC (p, ADRSIZE (p^)); INC (i);
>>>>>     END;
>>>>>   END ScanTypecase;
>>>>>
>>>>> Where to begin? A loop with all kinds of runtime checks of properties
>>>>> that are supposedly known at compile time? IsSubtype (itself a loop)
>>>>> called from inside the loop?
>>>>
>>>> Not if dynamically loaded!
>>>>
>>>>> I was able to cut out almost all of the typecase activity from my
>>>>> program by using the following code in RTType.m3, which depends on
>>>>> the ADDRESS x never changing (well more specifically never being
>>>>> the same for two TYPECASE statements):
>>>>>
>>>>> TYPE
>>>>>   TypeCaseResult = RECORD
>>>>>     x   : ADDRESS;
>>>>>     tc  : Typecode;
>>>>>     arm : INTEGER;
>>>>>   END;
>>>>>
>>>>> CONST
>>>>>   TCCachePow  = 6;
>>>>>   TCCacheSize = Word.Shift(1,TCCachePow);
>>>>>   TCMask      = TCCacheSize-1;
>>>>>
>>>>> VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
>>>>>   TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
>>>>>   ..
>>>>> };
>>>>>
>>>>> (*
>>>>> VAR tcScans := 0; tcHits := 0; (* instrumenting counters *)
>>>>> *)
>>>>>
>>>>> PROCEDURE ScanTypecase (ref: REFANY;
>>>>>                         x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>>>>   VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
>>>>>   BEGIN
>>>>>     tc := TYPECODE (ref);
>>>>>     IF (ref = NIL) THEN RETURN 0; END;
>>>>>
>>>>>     WITH hash = Word.And(Word.Times(tc,
>>>>>                                     Word.RightShift(LOOPHOLE(x,Word.T),2)),
>>>>>                          TCMask),
>>>>>          entry = TCCache[hash] DO
>>>>>       (*INC(tcScans);*)
>>>>>       IF entry.x = x AND entry.tc = tc THEN
>>>>>         (*INC(tcHits);*)
>>>>>         RETURN entry.arm
>>>>>       END;
>>>>>
>>>>>       p := x; i := 0;
>>>>>       LOOP
>>>>>         IF (p.uid = 0) THEN
>>>>>           entry.x := x; entry.tc := tc; entry.arm := i; RETURN i;
>>>>>         END;
>>>>>         IF (p.defn = NIL) THEN
>>>>>           p.defn := FindType (p.uid);
>>>>>           IF (p.defn = NIL) THEN
>>>>>             Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>>>>                   LOOPHOLE (p.uid, ADDRESS), NIL);
>>>>>           END;
>>>>>         END;
>>>>>         xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>>>>         IF (tc = xc) OR IsSubtype (tc, xc) THEN
>>>>>           entry.x := x; entry.tc := tc; entry.arm := i; RETURN i;
>>>>>         END;
>>>>>         INC (p, ADRSIZE (p^)); INC (i);
>>>>>       END;
>>>>>     END;
>>>>>   END ScanTypecase;
>>>>>
>>>>> I'm guessing the speedup for TYPECASE itself is a factor of at least
>>>>> ten. But it's still a pretty nasty hack. And there is still a lot
>>>>> of IsSubtype activity from narrowing.
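
The caching idea in the hack can be shown in isolation. What follows is a C sketch of a direct-mapped cache keyed on the pair (table address, typecode), using the same hash shape as the Modula-3 code: the typecode times the shifted address, masked to 64 slots. The slow path here is a made-up linear scan over a zero-terminated array of typecodes with exact matching only; the real slow path also calls IsSubtype. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CACHE_POW = 6, CACHE_SIZE = 1 << CACHE_POW, CACHE_MASK = CACHE_SIZE - 1 };

struct cache_entry {
    const void *table;   /* address identifying one TYPECASE statement */
    unsigned    tc;      /* typecode that was looked up */
    int         arm;     /* remembered result */
};

/* Zero-initialized: table == NULL never matches a real table address. */
struct cache_entry cache[CACHE_SIZE];
unsigned long slow_scans;   /* instrumentation counter */

/* Hypothetical slow path: index of the first matching arm, or the index
   of the 0 terminator if nothing matches. */
int slow_scan(const unsigned *table, unsigned tc) {
    int i = 0;
    slow_scans++;
    while (table[i] != 0 && table[i] != tc) i++;
    return i;
}

int scan_typecase(const unsigned *table, unsigned tc) {
    /* Same shape as the hack: tc * (address >> 2), masked to the cache size. */
    size_t hash = (size_t)(tc * ((uintptr_t)table >> 2)) & CACHE_MASK;
    struct cache_entry *e = &cache[hash];

    if (e->table == table && e->tc == tc)
        return e->arm;                      /* hit: no scan at all */

    int arm = slow_scan(table, tc);         /* miss: scan, then remember */
    e->table = table;
    e->tc = tc;
    e->arm = arm;
    return arm;
}
```

Like the hack, this never invalidates an entry, so it is only safe while the answer for a given (table, typecode) pair cannot change. It also writes the three fields without any locking, so a second thread could see a half-updated entry; that would need attention in a pthreads build.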
>>>>>
>>>>> I suppose that the way the typecodes are generated in CM3 is
>>>>> sufficiently different (meant to be extended at runtime?) from how
>>>>> it was done in PM3 that one can't really go back to the old code.
>>>>> Cardelli's idea of keeping an array of parents up to ROOT plus a
>>>>> "depth" for each type might have merit, though.
>>>>>
>>>>> To see if a is a subtype of b, something like:
>>>>>
>>>>> b = a.ancestors[a.depth-b.depth-1]   (* with appropriate range checks *)
>>>>>
>>>>> Would this be easy to put in? I'm not sure how one can be sure
>>>>> that typecodes are done being generated? There's something called
>>>>> RTTypeSRC.FinishObjectTypes ..
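
A C sketch of the ancestor-array test, using the indexing from the formula above: ancestors[0] is the parent and ancestors[depth-1] is the root. The fixed MAX_DEPTH and all names are assumptions made for illustration. Each type builds its array from its parent alone, so in this sketch adding a type later does not disturb any existing one; whether that fits the CM3 runtime is not something the sketch settles.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_DEPTH = 8 };   /* assumed bound on inheritance depth */

struct type_desc {
    int depth;                                     /* 0 for the root */
    const struct type_desc *ancestors[MAX_DEPTH];  /* [0] = parent, [depth-1] = root */
};

/* Build t's ancestor array from its parent's; parent == NULL makes a root. */
void init_type(struct type_desc *t, const struct type_desc *parent) {
    if (parent == NULL) {
        t->depth = 0;
        return;
    }
    assert(parent->depth + 1 <= MAX_DEPTH);
    t->depth = parent->depth + 1;
    t->ancestors[0] = parent;
    for (int i = 0; i < parent->depth; i++)
        t->ancestors[i + 1] = parent->ancestors[i];
}

/* Constant-time test: one subtraction, the range checks, one load. */
int is_subtype(const struct type_desc *a, const struct type_desc *b) {
    int dist = a->depth - b->depth;
    if (dist == 0) return a == b;    /* same depth: only a type itself */
    if (dist < 0)  return 0;         /* b is deeper, cannot be an ancestor */
    return a->ancestors[dist - 1] == b;
}
```

The cost is one array of pointers per type, sized by its depth, in exchange for a test that no longer loops over the parent chain.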
>>>>>
>>>>> And PM3 still generates code that's four times faster.
>>>>>
>>>>> Mika
>>>>>
>>