[M3devel] optimization [ was Re: Performance issues with CM3 ]
Mika Nystrom
mika at async.caltech.edu
Sun Apr 26 20:12:35 CEST 2009
Hi Tony,
I looked at this more closely, and I was wrong. The compiler doesn't
actually segfault on -O. I was using -gstabs+ but switched to -gstabs
after your email (doesn't seem to matter).
I get a ton of warnings at either optimization level, and there are
definitely bugs in the optimizer. The resulting code is generally
not correct. (By comparison, I had to turn off PM3's optimizer for
only one of the hundred or so packages I build.) Things often fail
to compile, even at -O.
At -O3, I get one segfault:
new source -> compiling TextCommandQueueTbl.i3
cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack exception handling
new source -> compiling CommandLoop.m3
cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack exception handling
../src/CommandLoop.m3: In function 'CommandLoop__Run':
../src/CommandLoop.m3:279: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
m3_backend => 4
m3cc (aka cm3cg) failed compiling: CommandLoop.mc
new source -> compiling CommandLoopDefaultCommand.m3
cm3cg: warning: -freorder-blocks disabled for Modula-3 ex_stack exception handling
new source -> compiling TextCommandTbl.m3
where:
272 (*****************************************************************************
273 * *
274 * Command Loop Main *
275 * *
276 *****************************************************************************)
277
278
279 PROCEDURE Run(self: T; source: Pathname.T := NIL; term: Term.T := NIL) =
280   CONST
281     Comment = SET OF CHAR{'%','#'};
282   VAR
283     completer := NEW(StdCompleter, loop:=self);
284     line: TEXT;
285   BEGIN
286     IF term = NIL THEN
287       self.term := Term.Default();
288     ELSE
289       self.term := term;
290     END;
291     LOOP
292       TRY
293         IF source # NIL THEN
294           DoLoad(self.load, TextList.List2("",source), self.term);
295           source := NIL;
...
Even at -O, things don't work right. Here's a typical output:
new source -> compiling PassiveArb1.m3
"../src/PassiveArb1.m3", line 68: warning: not used (e)
"../src/PassiveArb1.m3", line 45: warning: not used (newCon)
2 warnings encountered
../src/PassiveArb1.m3: In function 'PassiveArb1__FApply':
../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_buyO' might be clobbered by 'longjmp' or 'vfork'
../src/PassiveArb1.m3:81: warning: variable 'M3_Cwb5VA_selO' might be clobbered by 'longjmp' or 'vfork'
new source -> compiling PassiveArb2.i3
new source -> compiling ExecRecorder2.i3
new source -> compiling ArbPingPong.i3
new source -> compiling PassiveArb2.m3
../src/PassiveArb2.m3: In function 'PassiveArb2__Apply':
../src/PassiveArb2.m3:388: warning: variable 'M3_EWPD1K_delta' might be clobbered by 'longjmp' or 'vfork'
../src/PassiveArb2.m3:388: warning: variable 'M3_Cwb5VA_toExec' might be clobbered by 'longjmp' or 'vfork'
new source -> compiling Globals.i3
new source -> compiling ActiveArb1.i3
new source -> compiling ActiveArb1.m3
new source -> compiling ExecRecorder.i3
new source -> compiling ExecRecorder.m3
new source -> compiling ExecRec.i3
new source -> compiling ExecRecorder2.m3
new source -> compiling ExecRec.m3
new source -> compiling ArbPingPong.m3
new source -> compiling Main.m3
"../src/Main.m3", line 72: warning: potentially unhandled exception: OSError.E
"../src/Main.m3", line 30: warning: potentially unhandled exceptions: Rd.EndOfFile, Rd.Failure, Thread.Alerted
"../src/Main.m3", line 31: warning: potentially unhandled exceptions: Thread.Alerted, Wr.Failure
"../src/Main.m3", line 32: warning: potentially unhandled exceptions: Thread.Alerted, Wr.Failure
"../src/Main.m3", line 33: warning: potentially unhandled exceptions: Thread.Alerted, Wr.Failure
"../src/Main.m3", line 118: warning: potentially unhandled exception: OSError.E
"../src/Main.m3", line 204: warning: potentially unhandled exception: OSError.E
7 warnings encountered
-> linking testtrade
/usr/lib/libc.so: WARNING! setkey(3) not present in the system!
/usr/lib/libc.so: warning: this program uses gets(), which is unsafe.
/usr/lib/libc.so: warning: mktemp() possibly used unsafely; consider using mkstemp()
/usr/lib/libc.so: WARNING! des_setkey(3) not present in the system!
/usr/lib/libc.so: WARNING! encrypt(3) not present in the system!
/usr/lib/libc.so: warning: tmpnam() possibly used unsafely; consider using mkstemp()
/usr/lib/libc.so: warning: this program uses f_prealloc(), which is not recommended.
/usr/lib/libc.so: WARNING! des_cipher(3) not present in the system!
/usr/lib/libc.so: warning: tempnam() possibly used unsafely; consider using mkstemp()
Main.mo: In function `Main_M3':
../src/Main.m3:164: undefined reference to `Main__5__1__1__CanStart.198'
/home/mika/t-cm3/calarm/twslib/FreeBSD4/libtwslib.so: undefined reference to `TWSReader__RCApply__RD.332'
m3_link => 1
linker failed linking: testtrade
Fatal Error: package build failed
Mika
Tony Hosking writes:
>
>On 26 Apr 2009, at 15:22, Mika Nystrom wrote:
>
>>
>> Hello again,
>>
>> Now I've managed to get all the code up and running under CM3. I
>> found and committed fixes to a bug in Wx and some code in one of
>> the m3tk libraries that looked like it never was finished in the
>> first place.
>>
>> As I mentioned earlier, I wasn't able to get user threads working
>> in CM3 on FreeBSD 4.11. But with some help from Jay, I was able to
>> get things working with libc_r. Performance, unfortunately,
>> leaves something to be desired.
>>
>> For the first time I've been able to compare timings on identical
>> hardware between the PM3 I was using and the CM3 that's out now.
>>
>> Note that optimization doesn't seem to work..? (Not even -O, much
>> less -O3... the compiler segfaults.)
>
>Are you passing -gstabs? It should not segfault on -O3 - this is a
>regression if it does.
>
>> Here's what I get, using no optimization either in PM3 or CM3. The
>> test is my Scheme interpreter generating SQL and Modula-3 code
>> (a bit like the Hibernate system you can get for Java):
>>
>>
>> CPU seconds      CM3      PM3
>> First version    5.269    1.366
>> Fewer NEWs       2.039    0.440    (code cleanup on my part)
>> TYPECASE hack    1.770             (see below)
>>
>> Some "poor man's profiling" (i.e., ctrl-C'ing in m3gdb) suggests that
>> most of the time is spent either in threading code (this could just
>> be a lousy implementation in libc_r), the garbage collector, or in
>> "ScanTypecase".
>>
>> The only one of these routines I am qualified to do anything about is
>> ScanTypecase. I don't know why the Critical Mass people... <how colorful
>> language can one use on m3devel?>.. all over this code. I assume it has
>> something to do with Java.
>>
>> The PM3 code (from SRC?) has this wonderful, concise, efficient bit:
>>
>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>   VAR t := Get (b);
>>   BEGIN
>>     IF (a >= RT0u.nTypes) THEN BadType (a) END;
>>     IF (a = 0) THEN RETURN TRUE END;
>>     RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
>>   END IsSubtype;
>>
>> replaced with the following absolute abomination in CM3:
>>
>> PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
>>   VAR t: RT0.TypeDefn;
>>   BEGIN
>>     IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
>>     t := Get (a);
>>     IF (t = NIL) THEN RETURN FALSE; END;
>>     IF (t.typecode = b) THEN RETURN TRUE END;
>>     WHILE (t.kind = ORD (TK.Obj)) DO
>>       IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
>>       t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
>>       IF (t = NIL) THEN RETURN FALSE; END;
>>       IF (t.typecode = b) THEN RETURN TRUE; END;
>>     END;
>>     IF (t.traced # 0)
>>       THEN RETURN (b = RT0.RefanyTypecode);
>>       ELSE RETURN (b = RT0.AddressTypecode);
>>     END;
>>   END IsSubtype;
>
>This is all to support dynamic loading of libraries.
>
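As an illustration of the PM3-style test quoted above, here is a minimal sketch in Python (not Modula-3 runtime code). It assumes typecodes are handed out by a preorder walk once the whole hierarchy is known, so that each type's descendants occupy one contiguous range of typecodes. The names TypeCell, assign_typecodes and is_subtype are hypothetical and only loosely mirror the typecode and lastSubTypeTC fields in the quoted procedure.

```python
class TypeCell:
    """Hypothetical stand-in for a runtime type cell."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.typecode = None         # assigned by assign_typecodes
        self.last_subtype_tc = None  # largest typecode among descendants
        if parent is not None:
            parent.children.append(self)


def assign_typecodes(cell, first=1):
    """Number the hierarchy in preorder; return the next unused typecode."""
    cell.typecode = first
    next_tc = first + 1
    for child in cell.children:
        next_tc = assign_typecodes(child, next_tc)
    cell.last_subtype_tc = next_tc - 1
    return next_tc


def is_subtype(a, b):
    """True if a is b or a descendant of b: a single interval test."""
    return b.typecode <= a.typecode <= b.last_subtype_tc


root = TypeCell("ROOT")
shape = TypeCell("Shape", root)
circle = TypeCell("Circle", shape)
square = TypeCell("Square", shape)
text = TypeCell("Text", root)
assign_typecodes(root)
```

The interval test is constant time, but the numbering only holds while the set of types is fixed. Adding a type later would force renumbering, which would seem to fit the remark that the CM3 version walks parent links in order to support dynamically loaded libraries.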
>> Furthermore, CM3 has a hook for "ScanTypecase" that's missing
>> in PM3 (the older compiler actually generates code for this):
>>
>> PROCEDURE ScanTypecase (ref: REFANY;
>>                         x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>   VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
>>   BEGIN
>>     IF (ref = NIL) THEN RETURN 0; END;
>>     tc := TYPECODE (ref);
>>     p := x; i := 0;
>>     LOOP
>>       IF (p.uid = 0) THEN RETURN i; END;
>>       IF (p.defn = NIL) THEN
>>         p.defn := FindType (p.uid);
>>         IF (p.defn = NIL) THEN
>>           Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>                 LOOPHOLE (p.uid, ADDRESS), NIL);
>>         END;
>>       END;
>>       xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>       IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
>>       INC (p, ADRSIZE (p^)); INC (i);
>>     END;
>>   END ScanTypecase;
>>
>> Where to begin? A loop with all kinds of runtime checks of properties
>> that are supposedly known at compile time? IsSubtype (itself a loop)
>> called from inside the loop?
>
>Not if dynamically loaded!
>
>> I was able to cut out almost all of the typecase activity from my
>> program by using the following code in RTType.m3, which depends on
>> the ADDRESS x never changing (well more specifically never being
>> the same for two TYPECASE statements):
>>
>> TYPE
>>   TypeCaseResult = RECORD
>>     x : ADDRESS;
>>     tc : Typecode;
>>     arm : INTEGER;
>>   END;
>>
>> CONST
>>   TCCachePow = 6;
>>   TCCacheSize = Word.Shift(1,TCCachePow);
>>   TCMask = TCCacheSize-1;
>>
>> VAR TCCache := ARRAY [0..TCCacheSize-1] OF TypeCaseResult {
>>   TypeCaseResult { LOOPHOLE(0,ADDRESS), 0, -1 } ,
>>   ..
>> };
>>
>> (*
>> VAR tcScans := 0; tcHits := 0; (* instrumenting counters *)
>> *)
>>
>> PROCEDURE ScanTypecase (ref: REFANY;
>>                         x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
>>   VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
>>   BEGIN
>>     tc := TYPECODE (ref);
>>     IF (ref = NIL) THEN RETURN 0; END;
>>
>>     WITH hash = Word.And(Word.Times(tc,
>>                            Word.RightShift(LOOPHOLE(x,Word.T),2)),
>>                          TCMask),
>>          entry = TCCache[hash] DO
>>       (*INC(tcScans);*)
>>       IF entry.x = x AND entry.tc = tc THEN
>>         (*INC(tcHits);*)
>>         RETURN entry.arm
>>       END;
>>
>>       p := x; i := 0;
>>       LOOP
>>         IF (p.uid = 0) THEN entry.x := x; entry.tc := tc;
>>           entry.arm := i; RETURN i; END;
>>         IF (p.defn = NIL) THEN
>>           p.defn := FindType (p.uid);
>>           IF (p.defn = NIL) THEN
>>             Fail (RTE.MissingType, RTModule.FromDataAddress(x),
>>                   LOOPHOLE (p.uid, ADDRESS), NIL);
>>           END;
>>         END;
>>         xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
>>         IF (tc = xc) OR IsSubtype (tc, xc) THEN entry.x := x;
>>           entry.tc := tc; entry.arm := i; RETURN i; END;
>>         INC (p, ADRSIZE (p^)); INC (i);
>>       END;
>>     END;
>>   END ScanTypecase;
>>
>> I'm guessing the speedup for TYPECASE itself is a factor of at least
>> ten. But it's still a pretty nasty hack. And there is still a lot
>> of IsSubtype activity from narrowing.
>>
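The caching idea in the hack above can be sketched outside the runtime as a small direct-mapped cache keyed on (table address, typecode). This is a Python illustration under stated assumptions, not the RTType.m3 code: table addresses are modelled as plain integers, PARENT is a made-up type hierarchy, and slow_scan, cached_scan and stats are hypothetical names. As in the original, it relies on each TYPECASE statement having its own table address and on the answer for a given pair never changing.

```python
CACHE_POW = 6
CACHE_SIZE = 1 << CACHE_POW
MASK = CACHE_SIZE - 1

# Each slot holds (table_addr, typecode, arm); -1 marks an unused slot.
cache = [(None, None, -1)] * CACHE_SIZE
stats = {"scans": 0, "hits": 0}

# Hypothetical hierarchy: typecode -> parent typecode (None at the root).
PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 1}


def is_subtype(a, b):
    """Walk parent links, in the spirit of the CM3 IsSubtype loop."""
    while a is not None:
        if a == b:
            return True
        a = PARENT[a]
    return False


def slow_scan(typecode, table):
    """Return the index of the first matching arm, or len(table) if none."""
    for arm, target in enumerate(table):
        if typecode == target or is_subtype(typecode, target):
            return arm
    return len(table)


def cached_scan(typecode, table_addr, table):
    """Consult the direct-mapped cache before falling back to slow_scan."""
    stats["scans"] += 1
    slot = (typecode * (table_addr >> 2)) & MASK  # same shape as the hash above
    key_addr, key_tc, arm = cache[slot]
    if key_addr == table_addr and key_tc == typecode:
        stats["hits"] += 1
        return arm
    arm = slow_scan(typecode, table)
    cache[slot] = (table_addr, typecode, arm)  # overwrite whatever was there
    return arm


TABLE_ADDR = 0x1004      # made-up address of one TYPECASE table
table = [3, 2]           # arms test for typecode 3, then typecode 2
first = cached_scan(4, TABLE_ADDR, table)
second = cached_scan(4, TABLE_ADDR, table)
```

A colliding lookup simply overwrites the slot, so a wrong answer can only arise if two different TYPECASE tables ever share an address, which is the dependency noted above.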
>> I suppose that the way the typecodes are generated in CM3 is
>> sufficiently different (meant to be extended at runtime?) from how
>> it was done in PM3 that one can't really go back to the old code.
>> Cardelli's idea of keeping an array of parents up to ROOT plus a
>> "depth" for each type might have merit, though.
>>
>> To see if a is a subtype of b, something like:
>>
>> b = a.ancestors[a.depth-b.depth-1] (* with appropriate range checks *)
>>
>> Would this be easy to put in? I'm not sure how one can be sure
>> that typecodes are done being generated? There's something called
>> RTTypeSRC.FinishObjectTypes ..
>>
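The ancestor-array test suggested above might look like the following Python sketch. It assumes each type stores its depth and a list of ancestors ordered nearest first (parent at index 0, root last), which is the layout the index a.depth-b.depth-1 implies. TypeNode and is_subtype are hypothetical names, and nothing here addresses the open question of when the runtime can consider typecodes final.

```python
class TypeNode:
    """Hypothetical type descriptor carrying a depth and an ancestor array."""

    def __init__(self, name, parent=None):
        self.name = name
        if parent is None:
            self.depth = 0
            self.ancestors = []
        else:
            self.depth = parent.depth + 1
            # Nearest ancestor first: index 0 is the parent, last is the root.
            self.ancestors = [parent] + parent.ancestors


def is_subtype(a, b):
    """True if a is b or b appears among a's ancestors, without any loop."""
    if a is b:
        return True
    k = a.depth - b.depth - 1
    if k < 0:  # b is at least as deep as a, so it cannot be an ancestor
        return False
    return a.ancestors[k] is b


root = TypeNode("ROOT")
shape = TypeNode("Shape", root)
circle = TypeNode("Circle", shape)
text = TypeNode("Text", root)
```

Because len(a.ancestors) equals a.depth, the index k is always in range once k >= 0 has been checked. The cost is one array per type, and a newly loaded type only needs its own array built, so existing types would not have to be renumbered.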
>> And PM3 still generates code that's four times faster.
>>
>> Mika
>>