[M3devel] optimization [ was Re: Performance issues with CM3 ]
Mika Nystrom
mika at async.caltech.edu
Mon Apr 27 02:31:42 CEST 2009
Tony, it might not be as bad as all that. Well, on one level it
is. Things are certainly not perfect.
But I am able to operate with m3core built with -O (as well as with
-O3). Lots of scary-looking compiler warnings, but things do work.
There are just a few programs that won't build. I didn't try "all"
in the CM3 dist---only my own programs and m3core (since m3core has
the biggest performance impact).
With lots of tweaks and adjustments, I now see my code running about
100% slower under CM3 than the same code does under PM3 (on the
same machine). This is including my typecase hacks, as described
earlier today. I'm guessing most of it is the FreeBSD pthreads
implementation in libc_r + the calls to PushEFrame.
Mika
Tony Hosking writes:
>I am very disturbed about this since it suggests a regression. I had
>spent a huge amount of time a year or so back making sure the backend
>would play properly with gcc -O3, but it seems we are now back in a
>bad place. I'm not sure what changes have occurred to the backend
>since then, but they would be the prime candidates. Unfortunately, I
>don't have a lot of time right now to try to debug these -O3 problems,
>but I do want to fix them since they will eventually impinge on my own
>work. It would be really good to get our regressions framework back
>up and running and to put -O3 in there as the default build option --
>it seems there have been ongoing Tinderbox problems for a while now,
>since my SOLgnu regression runs appear to have stopped completely.
>I'll need to check the logs.
>
>On 27 Apr 2009, at 04:12, Mika Nystrom wrote:
>
>> Hi Tony,
>>
>> I looked at this more closely, and I was wrong. The compiler doesn't
>> actually segfault on -O. I was using -gstabs+ but switched to -gstabs
>> after your email (doesn't seem to matter).
>>
>> I get a ton of warnings at either optimization level, and there are
>> definitely bugs in the optimizer. The resulting code is generally
>> not correct. (By comparison, I had to turn off PM3's optimizer for
>> only one of the hundred or so packages I build.) Things often fail
>> to compile, even at -O.
>>
>> At -O3, I get one segfault:
>>
>> new source -> compiling TextCommandQueueTbl.i3
>> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack
>> exception handling
>> new source -> compiling CommandLoop.m3
>> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack
>> exception handling
>> ../src/CommandLoop.m3: In function 'CommandLoop__Run':
>> ../src/CommandLoop.m3:279: internal compiler error: Segmentation fault
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> See <http://gcc.gnu.org/bugs.html> for instructions.
>> m3_backend => 4
>> m3cc (aka cm3cg) failed compiling: CommandLoop.mc
>> new source -> compiling CommandLoopDefaultCommand.m3
>> cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack
>> exception handling
>> new source -> compiling TextCommandTbl.m3
>>
>> where:
>>
>> 272
>> (*****************************************************************************
>> 273
>> * *
>> 274 * Command Loop
>> Main *
>> 275
>> * *
>> 276
>> *****************************************************************************)
>> 277
>> 278
>> 279 PROCEDURE Run(self: T; source: Pathname.T := NIL; term:
>> Term.T := NIL) =
>> 280 CONST
>> 281 Comment = SET OF CHAR{'%','#'};
>> 282 VAR
>> 283 completer := NEW(StdCompleter, loop:=self);
>> 284 line: TEXT;
>> 285 BEGIN
>> 286 IF term = NIL THEN
>> 287 self.term := Term.Default();
>> 288 ELSE
>> 289 self.term := term;
>> 290 END;
>> 291 LOOP
>> 292 TRY
>> 293 IF source # NIL THEN
>> 294 DoLoad(self.load, TextList.List2("",source),
>> self.term);
>> 295 source := NIL;
>>
>> ...
>>
>> Even at -O, things don't work right. Here's a typical output:
>>
>> new source -> compiling PassiveArb1.m3
>> "../src/PassiveArb1.m3", line 68: warning: not used (e)
>> "../src/PassiveArb1.m3", line 45: warning: not used (newCon)
>> 2 warnings encountered
>> ../src/PassiveArb1.m3: In function 'PassiveArb1__FApply':
>> ../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_buyO' might
>> be clobbered by 'longjmp' or 'vfork'
>> ../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_selO' might
>> be clobbered by 'longjmp' or 'vfork'
>> new source -> compiling PassiveArb2.i3
>> new source -> compiling ExecRecorder2.i3
>> new source -> compiling ArbPingPong.i3
>> new source -> compiling PassiveArb2.m3
>> ../src/PassiveArb2.m3: In function 'PassiveArb2__Apply':
>> ../src/PassiveArb2.m3:388: warning: variable 'M3_EWPD1K_delta' might
>> be clobbered by 'longjmp' or 'vfork'
>> ../src/PassiveArb2.m3:388: warning: variable 'M3_Cwb5VA_toExec'
>> might be clobbered by 'longjmp' or 'vfork'
>> new source -> compiling Globals.i3
>> new source -> compiling ActiveArb1.i3
>> new source -> compiling ActiveArb1.m3
>> new source -> compiling ExecRecorder.i3
>> new source -> compiling ExecRecorder.m3
>> new source -> compiling ExecRec.i3
>> new source -> compiling ExecRecorder2.m3
>> new source -> compiling ExecRec.m3
>> new source -> compiling ArbPingPong.m3
>> new source -> compiling Main.m3
>> "../src/Main.m3", line 72: warning: potentially unhandled exception:
>> OSError.E
>> "../src/Main.m3", line 30: warning: potentially unhandled
>> exceptions: Rd.EndOfFile, Rd.Failure, Thread.Alerted
>> "../src/Main.m3", line 31: warning: potentially unhandled
>> exceptions: Thread.Alerted, Wr.Failure
>> "../src/Main.m3", line 32: warning: potentially unhandled
>> exceptions: Thread.Alerted, Wr.Failure
>> "../src/Main.m3", line 33: warning: potentially unhandled
>> exceptions: Thread.Alerted, Wr.Failure
>> "../src/Main.m3", line 118: warning: potentially unhandled
>> exception: OSError.E
>> "../src/Main.m3", line 204: warning: potentially unhandled
>> exception: OSError.E
>> 7 warnings encountered
>> -> linking testtrade
>> /usr/lib/libc.so: WARNING! setkey(3) not present in the system!
>> /usr/lib/libc.so: warning: this program uses gets(), which is unsafe.
>> /usr/lib/libc.so: warning: mktemp() possibly used unsafely; consider
>> using mkstemp()
>> /usr/lib/libc.so: WARNING! des_setkey(3) not present in the system!
>> /usr/lib/libc.so: WARNING! encrypt(3) not present in the system!
>> /usr/lib/libc.so: warning: tmpnam() possibly used unsafely; consider
>> using mkstemp()
>> /usr/lib/libc.so: warning: this program uses f_prealloc(), which is
>> not recommended.
>> /usr/lib/libc.so: WARNING! des_cipher(3) not present in the system!
>> /usr/lib/libc.so: warning: tempnam() possibly used unsafely;
>> consider using mkstemp()
>> Main.mo: In function `Main_M3':
>> ../src/Main.m3:164: undefined reference to `Main__5__1__1__CanStart.
>> 198'
>> /home/mika/t-cm3/calarm/twslib/FreeBSD4/libtwslib.so: undefined
>> reference to `TWSReader__RCApply__RD.332'
>> m3_link => 1
>> linker failed linking: testtrade
>> Fatal Error: package build failed
>>
>>
>> Mika
>>
>>
>>
>> Tony Hosking writes:
>>>
>>> On 26 Apr 2009, at 15:22, Mika Nystrom wrote:
>>>
>>>>
>>>> Hello again,
>>>>
>>>> Now I've managed to get all the code up and running under CM3. I
>>>> found and committed fixes to a bug in Wx and some code in one of
>>>> the m3tk libraries that looked like it never was finished in the
>>>> first place.
>>>>
>>>> As I mentioned earlier, I wasn't able to get user threads working
>>>> in CM3 on FreeBSD 4.11. But with some help from Jay, I was able to
>>>> get things working with libc_r. Performance, unfortunately,
>>>> leaves something to be desired.
>>>>
>>>> For the first time I've been able to compare timings on identical
>>>> hardware between the PM3 I was using and the CM3 that's out now.
>>>>
>>>> Note that optimization doesn't seem to work..? (Not even -O, much
>>>> less -O3... the compiler segfaults.)
>>>
>>> Are you passing -gstabs? It should not segfault on -O3 - this is a
>>> regression if it does.
>>>
>>>> Here's what I get, using no optimization either in PM3 or CM3. The
>>>> test is my Scheme interpreter generating SQL and Modula-3 code
>>>> (a bit like the Hibernate system you can get for Java):
>>>>
>>>>
>>>> CPU seconds CM3 PM3
>>>> First version 5.269 1.366
>>>> Fewer NEWs 2.039 0.440 (code cleanup on my part)
>>>> TYPECASE hack 1.770 (see below)
>>>>
>>>> Some "poor man's profiling" (i.e., ctrl-C'ing in m3gdb) suggests
>>>> that
>>>> most of the time is spent either in threading code (this could just
>>>> be a lousy implementation in libc_r), the garbage collector, or in
>>>> "ScanTypecase".
>>>>
>>>> The only one of these routines I am qualified to do anything about
>>>> is
>>>> ScanTypecase. I don't know why the Critical Mass people... <how
>>>> colorful
>>>> language can one use on m3devel?>.. all over this code. I assume it
>>>> has
>>>> something to do with Java.
>>>>
>>>> The PM3 code (from SRC?) has this wonderful, concise, efficient bit:
>>>>
>>>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>>> VAR t := Get (b);
>>>> BEGIN
>>>> IF (a >= RT0u.nTypes) THEN BadType (a) END;
>>>> IF (a = 0) THEN RETURN TRUE END;
>>>> RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
>>>> END IsSubtype;
>>>>
>>>> replaced with the following absolute abomination in CM3:
>>>>
>>>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>>> VAR t: RT0.TypeDefn;
>>>> BEGIN
>>>> IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
>>>> t := Get (a);
>>>> IF (t = NIL) THEN RETURN FALSE; END;
>>>> IF (t.typecode = b) THEN RETURN TRUE END;
>>>> WHILE (t.kind = ORD (TK.Obj)) DO
>>>> IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
>>>> t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
>>>> IF (t = NIL) THEN RETURN FALSE; END;
>>>> IF (t.typecode = b) THEN RETURN TRUE; END;
>>>> END;
>>>> IF (t.traced # 0)
>>>> THEN RETURN (b = RT0.RefanyTypecode);
>>>> ELSE RETURN (b = RT0.AddressTypecode);
>>>> END;
>>>> END IsSubtype;
>>>
>>> This is all to support dynamic loading of libraries.
>>>
>>>> Furthermore, CM3 has a hook for "ScanTypecase" that's missing
>>>> in PM3 (the older compiler actually generates code for this):
>>>>
>>>> PROCEDURE ScanTypecase (ref: REFANY;
>>>> x: ADDRESS(*ARRAY [0..] OF Cell*)):
>>>> INTEGER =
>>>> VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
>>>> BEGIN
>>>> IF (ref = NIL) THEN RETURN 0; END;
>>>> tc := TYPECODE (ref);
>>>> p := x; i := 0;
>>>> LOOP
>>>> IF (p.uid = 0) THEN RETURN i; END;
>>>> IF (p.defn = NIL) THEN
>>>> p.defn := FindType (p.uid);
>>>> IF (p.defn = NIL) THEN
>>>> Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>>> LOOPHOLE (p.uid, ADDRESS), NIL);
>>>> END;
>>>> END;
>>>> xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>>> IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
>>>> INC (p, ADRSIZE (p^)); INC (i);
>>>> END;
>>>> END ScanTypecase;
>>>>
>>>> Where to begin? A loop with all kinds of runtime checks of
>>>> properties
>>>> that are supposedly known at compile time? IsSubtype (itself a loop)
>>>> called from inside the loop?
>>>
>>> Not if dynamically loaded!
>>>
>>>> I was able to cut out almost all of the typecase activity from my
>>>> program by using the following code in RTType.m3, which depends on
>>>> the ADDRESS x never changing (well more specifically never being
>>>> the same for two TYPECASE statements):
>>>>
>>>> TYPE
>>>> TypeCaseResult = RECORD
>>>> x : ADDRESS;
>>>> tc : Typecode;
>>>> arm : INTEGER;
>>>> END;
>>>>
>>>> CONST
>>>> TCCachePow = 6;
>>>> TCCacheSize = Word.Shift(1,TCCachePow);
>>>> TCMask = TCCacheSize-1;
>>>>
>>>> VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
>>>> TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
>>>> ..
>>>> };
>>>>
>>>> (*
>>>> VAR tcScans := 0; tcHits := 0; (* instrumenting counters *)
>>>> *)
>>>>
>>>> PROCEDURE ScanTypecase (ref: REFANY;
>>>> x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>>> VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
>>>> BEGIN
>>>> tc := TYPECODE (ref);
>>>> IF (ref = NIL) THEN RETURN 0; END;
>>>>
>>>> WITH hash = Word.And(Word.Times(tc,
>>>>
>>>> Word.RightShift(LOOPHOLE(x,Word.T),2)),
>>>> TCMask),
>>>> entry = TCCache[hash] DO
>>>> (*INC(tcScans);*)
>>>> IF entry.x = x AND entry.tc = tc THEN
>>>> (*INC(tcHits);*)
>>>> RETURN entry.arm
>>>> END;
>>>>
>>>> p := x; i := 0;
>>>> LOOP
>>>> IF (p.uid = 0) THEN entry.x := x; entry.tc := tc;
>>>> entry.arm := i; RETURN i; END;
>>>> IF (p.defn = NIL) THEN
>>>> p.defn := FindType (p.uid);
>>>> IF (p.defn = NIL) THEN
>>>> Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>>> LOOPHOLE (p.uid, ADDRESS), NIL);
>>>> END;
>>>> END;
>>>> xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>>> IF (tc = xc) OR IsSubtype (tc, xc) THEN entry.x := x;
>>>> entry.tc := tc; entry.arm := i; RETURN i; END;
>>>> INC (p, ADRSIZE (p^)); INC (i);
>>>> END;
>>>> END;
>>>> END ScanTypecase;
>>>>
>>>> I'm guessing the speedup for TYPECASE itself is a factor of at least
>>>> ten. But it's still a pretty nasty hack. And there is still a lot
>>>> of IsSubtype activity from narrowing.
>>>>
>>>> I suppose that the way the typecodes are generated in CM3 is
>>>> sufficiently different (meant to be extended at runtime?) from how
>>>> it was done in PM3 that one can't really go back to the old code.
>>>> Cardelli's idea of keeping an array of parents up to ROOT plus a
>>>> "depth" for each type might have merit, though.
>>>>
>>>> To see if a is a subtype of b, something like:
>>>>
>>>> b = a.ancestors[a.depth-b.depth-1] (* with appropriate range
>>>> checks *)
>>>>
>>>> Would this be easy to put in? I'm not sure how one can be sure
>>>> that typecodes are done being generated? There's something called
>>>> RTTypeSRC.FinishObjectTypes ..
>>>>
>>>> And PM3 still generates code that's four times faster.
>>>>
>>>> Mika
>>>>
>
More information about the M3devel
mailing list