From jay.krell at cornell.edu Wed Oct 1 01:24:14 2008
From: jay.krell at cornell.edu (Jay)
Date: Tue, 30 Sep 2008 23:24:14 +0000
Subject: [M3devel] ARM Darwin
In-Reply-To: <7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org>
References: <5ED8E753-6B9E-4FED-8689-1D3D317A5A36@cs.purdue.edu> <7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org>
Message-ID:

Get me a machine and I'll work on it. :)
I'll get one before long, but I'm bogged down with existing x86, AMD64, PPC, PPC64 (AIX), and MIPS (Irix) hardware that is not yet being used for all it's meant for.

I suspect Apple hasn't pushed their changes up, so be sure to poke around their gcc source.

> Apple are building their own ARM GCC and use that to configure the
> back end. Then the runtime issues which I imagine might be with the GC

gcc -v ?

> and threading. I'm not sure there will be any native threading and I'm
> sure VM will look very different.

I assume it'll look like most any Posix or *_DARWIN system, or the 32bit variant thereof. I assume it has pthreads.

 - Jay

> From: darko at darko.org
> To: hosking at cs.purdue.edu
> Date: Tue, 30 Sep 2008 14:59:39 +0200
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] ARM Darwin
>
> Thanks, it should be a bit easier than the normal process since the compiler doesn't have to be fully bootstrapped, I just have to get a cross working. I know the first thing is to get the machine configuration correct, which I'll start when I get my hands on one of the machines in a couple of days. The other thing is to work out how Apple are building their own ARM GCC and use that to configure the back end. Then the runtime issues which I imagine might be with the GC and threading. I'm not sure there will be any native threading and I'm sure VM will look very different.
>
> On 30/09/2008, at 2:44 PM, Tony Hosking wrote:
>
>> I can share tips...
>>
>> On Sep 30, 2008, at 1:41 PM, Darko wrote:
>>
>>> Is anyone interested in working on an ARM port for Darwin?
>>> Or maybe just providing some tips as I give it a try?
>>>
>>> Cheers,
>>> Darko.
>>
>

From jay.krell at cornell.edu Wed Oct 1 08:41:03 2008
From: jay.krell at cornell.edu (Jay)
Date: Wed, 1 Oct 2008 06:41:03 +0000
Subject: [M3devel] AMD-64 binaries?
In-Reply-To: <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
References: <48BDF24B.900@wichita.edu> <20080903075804.zhep2ichmow00scg@mail.elegosoft.com> <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
Message-ID:

No -- you would know best about AMD64_DARWIN.
I'm sure ALPHA_OSF used to work, but it's been so long, I don't think it counts.

I'm being lazy.

file AMD64_DARWIN/cm3cg
 => fat binary? I doubt it.
 => with ppc, i386, amd64? (doubt it)
 => or just ppc, i386? (doubt it)
 => or just i386? This is what I "suspect".
 => or just AMD64. This would be somewhat interesting.

I'm pretty sure cm3cg is always 32bit "these days".

I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in the same way.
This was a nice thing to find, that the problem is portable to multiple/all 64 bit hosts.

I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same problem.

Anyway, I should really get to debugging this soon.

It's a bit odd because gcc itself doesn't have this bug and I reviewed a lot of the code and it was ok. I'm just going to have to step through it in parallel on 32bit and 64bit hosts and find where they diverge. A LOT was identical, like the files output by cm3 into cm3cg were identical.

I was close a few months ago but sloughed off.

 - Jay

> From: hosking at cs.purdue.edu
> To: jay.krell at cornell.edu
> Date: Tue, 30 Sep 2008 10:16:41 +0100
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] AMD-64 binaries?
>
> 64-bit hosted tools? Do you mean only for Linux?
> I don't quite understand what you are saying.
>
> On Sep 30, 2008, at 9:36 AM, Jay wrote:
>
> > I'm getting back to this now.
> > I didn't realize it till this weekend, but that archive is "relatively incompatible".
> > In particular it has 32bit hosted tools, and won't run on Debian 4.0r4 / AMD64.
> > Something about glibc 2.4, when all I see on my system is 2.3.
> > I'll see what I can do.
> > Probably just rebuild cm3cg.
> > I think it was built on Fedora, but could have been Ubuntu or OpenSuse.
> > Probably just that Debian stable lags the others.
> >
> > The main problem to debug is why 64bit hosted tools "never" work.
> > (Right?)
> >
> > Stay tuned for a bunch more ports "soon", I've got a bunch more hardware,
> > that runs Linux and others (Solaris, AIX, Irix).. :)
> >
> > I'll be able to debug the high dpi gui problems on a friend's laptop soon too.
> > Send me a repro. I expect it is trivial -- like anything with a scrollbar.
> > I can try formsedit, etc.
> >
> > - Jay
> >
> >> Date: Wed, 3 Sep 2008 07:58:04 +0200
> >> From: wagner at elegosoft.com
> >> To: m3devel at elegosoft.com
> >> Subject: Re: [M3devel] AMD-64 binaries?
> >>
> >> Quoting "Rodney M. Bates" :
> >>
> >>> Are there binaries for AMD-64 around that can be used
> >>> to bootstrap a 64-bit Linux compiler?
> >>
> >> Have a look at
> >>
> >> http://www.opencm3.net/uploaded-archives/index.html
> >>
> >> There are some AMD64 archives; I don't know about their status
> >> offhand, though.
> >> I think Jay Krell produced them.
> >> AFAIK there is no regular build on this platform yet.
> >>
> >> Olaf
> >> --
> >> Olaf Wagner -- elego Software Solutions GmbH
> >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
> >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
> >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
> >>

From jay.krell at cornell.edu Wed Oct 1 09:02:29 2008
From: jay.krell at cornell.edu (Jay)
Date: Wed, 1 Oct 2008 07:02:29 +0000
Subject: [M3devel] m3cc build fails on older MacOS X
In-Reply-To: <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com> <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
Message-ID:

well, I agree and disagree.

"Almost everyone" only cares about C++, C#, Windows, and a little bit of Linux and Java.
"Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc.

Supporting 10.2 and 10.3 "ought not" be so difficult, but ok.

I wiped out the install and won't likely come back to it until a bunch of other things are done.
e.g.:
  debug 64 bit hosted cm3cg
  move PPC_LINUX to pthreads
  high dpi
  bring up or backup a bunch of targets I have hardware for, and some others I don't have yet.

Adding back support for NT4/Win9x is probably not hard, though, as with gcc on Mac, the current Microsoft tools no longer target them.

It all gets easier with virtualization..
(Which is easiest on x86/amd64.)

 - Jay

> From: darko at darko.org
> To: hosking at cs.purdue.edu
> Date: Tue, 30 Sep 2008 11:50:43 +0200
> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
> Subject: Re: [M3devel] m3cc build fails on older MacOS X
>
> I think supporting the latest version is enough work. I don't see the point of supporting older releases.
> Also, this seems to be relevant to development on that version of the system. Anyone who wants to build can upgrade.
>
> On 30/09/2008, at 11:15 AM, Tony Hosking wrote:
>
>> Does anyone really care about 10.3 now? As I recall, it had some pretty broken assumptions.
>>
>> On Sep 30, 2008, at 9:25 AM, Jay wrote:
>>
>>> I have a machine running 10.3 now.
>>>
>>> gcc-4.3.2 (the current release) won't (toplevel) configure on MacOSX 10.3 apparently because its assembler doesn't support ".machine".
>>> Current "cctools" won't compile on 10.3 without patches or other updates, due to mucking with ppc64 stuff, though that is easy to fix.
>>>
>>> A simple wrapper around as for use on 10.3 that strips the .machine directive is probably reasonable, or a patch to gcc to just not emit it for Darwin, except maybe for non-ppc, or subject to a switch.
>>>
>>> Other than support for more architectures, I never found any of the updates beyond 10.2 very interesting.
>>> Though current Firefox and Safari also won't run on 10.3.
>>>
>>> IF I get this working, maybe I'll bring 10.2 back up also..
>>>
>>> - Jay
>>>
>>> ________________________________
>>>
>>> From: jayk123 at hotmail.com
>>> To: wagner at elegosoft.com; m3devel at elegosoft.com
>>> Subject: RE: [M3devel] m3cc build fails on older MacOS X
>>> Date: Tue, 6 May 2008 10:49:11 +0000
>>>
>>> I don't know what these Darwin versions are.
>>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5?
>>> I used to run 10.2 and could perhaps bring it back (though I'd hate to lose my PPC_LINUX install.. :( )
>>>
>>>> make[2]: Nothing to be done for `all'.
>>>> Makefile:191: *** Insufficient number of arguments (2) to function `patsubst'. Stop.
>>>
>>> Hopefully that's enough context though.
>>>
>>> The rest is a cascade.
>>> What happens if you remove all my m3makefile weirdness (which works everywhere else..) and just configure and make?
>>>
>>> Can I ssh into this?
>>>
>>> - Jay
>>>
>>> ________________________________
>>>
>>>> Date: Tue, 6 May 2008 07:57:54 +0200
>>>> From: wagner at elegosoft.com
>>>> To: m3devel at elegosoft.com
>>>> Subject: [M3devel] m3cc build fails on older MacOS X
>>>>
>>>> On % uname -a
>>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power Macintosh powerpc:
>>>>
>>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o
>>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o
>>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./fnmatch.o
>>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./getruntime.o
>>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o
>>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./obstack.o
>>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o
>>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o
>>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o
>>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o
>>>> ./xstrndup.o> required-list
>>>> make[2]: Nothing to be done for `all'.
>>>> Makefile:191: *** Insufficient number of arguments (2) to function `patsubst'. Stop.
>>>> make: *** [all-libcpp] Error 2
>>>> /bin/sh: line 1: cd: gcc: No such file or directory
>>>> make: *** No rule to make target `s-modes'. Stop.
>>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314: quake runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2
>>>>
>>>> --procedure-- -line- -file---
>>>> cp_if --
>>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>> 9 /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args
>>>>
>>>> Fatal Error: package build failed
>>>> ==> m3-sys/m3cc done
>>>>
>>>> Any ideas?
>>>>
>>>> Olaf
>>>> --
>>>> Olaf Wagner -- elego Software Solutions GmbH
>>>> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
>>>> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
>>>>
>>>
>>
>

From darko at darko.org Wed Oct 1 09:10:35 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 09:10:35 +0200
Subject: [M3devel] m3cc build fails on older MacOS X
In-Reply-To:
References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com> <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
Message-ID: <973F196C-4B4A-4526-878C-93942E48E72A@darko.org>

Why bother with it if no one uses it and no one is going to use it? Supporting M3 on Macs is good because people will use it into the future. People aren't moving back to 10.3. I wouldn't bother with it at all.

On 01/10/2008, at 9:02 AM, Jay wrote:

> well, I agree and disagree.
>
> "Almost everyone" only cares about C++, C#, Windows, and a little bit of Linux and Java.
> "Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc.
>
> Supporting 10.2 and 10.3 "ought not" be so difficult, but ok.
>
> I wiped out the install and won't likely come back to it until a bunch of other things are done.
> e.g.:
> debug 64 bit hosted cm3cg
> move PPC_LINUX to pthreads
> high dpi
> bring up or backup a bunch of targets I have hardware for, and some others I don't have yet.
>
> Adding back support for NT4/Win9x is probably not hard, though, as with gcc on Mac, the current Microsoft tools no longer target them.
>
> It all gets easier with virtualization..
> (Which is easiest on x86/amd64.)
>
> - Jay
>
>> From: darko at darko.org
>> To: hosking at cs.purdue.edu
>> Date: Tue, 30 Sep 2008 11:50:43 +0200
>> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
>> Subject: Re: [M3devel] m3cc build fails on older MacOS X
>>
>> I think supporting the latest version is enough work. I don't see the point of supporting older releases. Also, this seems to be relevant to development on that version of the system. Anyone who wants to build can upgrade.
>>
>> On 30/09/2008, at 11:15 AM, Tony Hosking wrote:
>>
>>> Does anyone really care about 10.3 now? As I recall, it had some pretty broken assumptions.
>>>
>>> On Sep 30, 2008, at 9:25 AM, Jay wrote:
>>>
>>>> I have a machine running 10.3 now.
>>>>
>>>> gcc-4.3.2 (the current release) won't (toplevel) configure on MacOSX 10.3 apparently because its assembler doesn't support ".machine".
>>>> Current "cctools" won't compile on 10.3 without patches or other updates, due to mucking with ppc64 stuff, though that is easy to fix.
>>>>
>>>> A simple wrapper around as for use on 10.3 that strips the .machine directive is probably reasonable, or a patch to gcc to just not emit it for Darwin, except maybe for non-ppc, or subject to a switch.
>>>>
>>>> Other than support for more architectures, I never found any of the updates beyond 10.2 very interesting.
>>>> Though current Firefox and Safari also won't run on 10.3.
>>>>
>>>> IF I get this working, maybe I'll bring 10.2 back up also..
>>>>
>>>> - Jay
>>>>
>>>> ________________________________
>>>>
>>>> From: jayk123 at hotmail.com
>>>> To: wagner at elegosoft.com; m3devel at elegosoft.com
>>>> Subject: RE: [M3devel] m3cc build fails on older MacOS X
>>>> Date: Tue, 6 May 2008 10:49:11 +0000
>>>>
>>>> I don't know what these Darwin versions are.
>>>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5?
>>>> I used to run 10.2 and could perhaps bring it back (though I'd hate to lose my PPC_LINUX install.. :( )
>>>>
>>>>> make[2]: Nothing to be done for `all'.
>>>>> Makefile:191: *** Insufficient number of arguments (2) to function `patsubst'. Stop.
>>>>
>>>> Hopefully that's enough context though.
>>>>
>>>> The rest is a cascade.
>>>> What happens if you remove all my m3makefile weirdness (which works everywhere else..) and just configure and make?
>>>>
>>>> Can I ssh into this?
>>>>
>>>> - Jay
>>>>
>>>> ________________________________
>>>>
>>>>> Date: Tue, 6 May 2008 07:57:54 +0200
>>>>> From: wagner at elegosoft.com
>>>>> To: m3devel at elegosoft.com
>>>>> Subject: [M3devel] m3cc build fails on older MacOS X
>>>>>
>>>>> On % uname -a
>>>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power Macintosh powerpc:
>>>>>
>>>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o
>>>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o
>>>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./fnmatch.o
>>>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./getruntime.o
>>>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o
>>>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./obstack.o
>>>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o
>>>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o
>>>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o
>>>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o
>>>>> ./xstrndup.o> required-list
>>>>> make[2]: Nothing to be done for `all'.
>>>>> Makefile:191: *** Insufficient number of arguments (2) to function `patsubst'. Stop.
>>>>> make: *** [all-libcpp] Error 2
>>>>> /bin/sh: line 1: cd: gcc: No such file or directory
>>>>> make: *** No rule to make target `s-modes'. Stop.
>>>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314: quake runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2
>>>>>
>>>>> --procedure-- -line- -file---
>>>>> cp_if --
>>>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>>> 9 /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args
>>>>>
>>>>> Fatal Error: package build failed
>>>>> ==> m3-sys/m3cc done
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Olaf
>>>>> --
>>>>> Olaf Wagner -- elego Software Solutions GmbH
>>>>> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
>>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
>>>>> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
>>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
>>>>>
>>>>
>>>
>>

From darko at darko.org Wed Oct 1 12:03:15 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 12:03:15 +0200
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To:
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
Message-ID:

I've extended one of the modules with a function that formats any allocated value for printing. If you're interested I can clean them up a little and post them.

On 28/09/2008, at 8:01 AM, Darko wrote:

> As far as I know, yes, they're not in the binary. I'd love to be proven wrong though, or fix it so they did. I have a module that reads the .M3WEB file and maps it to types and a module that will read and write any field within a type safely using a numeric index. Neither is perfect. You can integrate the two to get what you want but I seem to remember having some problems mapping type ids (UIDs?) to typecodes at runtime.
>
> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>
>> Right, I am aware of those interfaces.. just wondering what was out there.
>> Do I really need to look at .M3WEB? I thought that m3gdb could figure out things without anything outside of the binary...
>>
>> I'm looking for essentially what m3gdb offers, say prints at minimum the name of the type (this I recall is trivial with some of the RT* interfaces) but hopefully also with field names and values, but doesn't expand references recursively.. something like that?
>>
>> Mika
>>
>> Darko writes:
>>> You can use RTTipe to read the fields and values within a type. If you also want the type and field names you can interpret the .M3WEB file. I have a couple of modules that do something like that but they are not what you would call finished. What level of detail are you after?
>>>
>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>
>>>> Hello Modula-3 people,
>>>>
>>>> I am working on writing an interpreter that I'd like to embed in various Modula-3 programs. It so happens that this interpreter might from time to time be manipulating arbitrary M3 REFs, and just from the point of view of providing information to a human user, it might be nice to be able to pretty-print these. Does anyone have any code that accomplishes this, at least partly? I'm thinking that since m3gdb can do it, the information must all be in the binary---somehow. (Even enumeration names, right?) And since the pickler can pickle things... hmm.
>>>>
>>>> I would greatly appreciate any guidance that's out there...
>>>>
>>>> Best regards,
>>>> Mika Nystrom
>

From hosking at cs.purdue.edu Wed Oct 1 11:59:23 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 10:59:23 +0100
Subject: [M3devel] AMD-64 binaries?
In-Reply-To:
References: <48BDF24B.900@wichita.edu> <20080903075804.zhep2ichmow00scg@mail.elegosoft.com> <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
Message-ID: <26766FFA-C3B6-45D2-8156-80FD14922882@cs.purdue.edu>

I can definitely vouch for ALPHA_OSF having worked as recently as two years ago, but without the pthreads native threading system. That port should have been easy enough I suspect.

On Oct 1, 2008, at 7:41 AM, Jay wrote:

> No -- you would know best about AMD64_DARWIN.
> I'm sure ALPHA_OSF used to work, but it's been so long, I don't think it counts.
>
> I'm being lazy.
>
> file AMD64_DARWIN/cm3cg
> => fat binary? I doubt it.
> => with ppc, i386, amd64? (doubt it)
> => or just ppc, i386? (doubt it)
> => or just i386? This is what I "suspect".
> => or just AMD64. This would be somewhat interesting.

I believe that is how I configured it.

> I'm pretty sure cm3cg is always 32bit "these days".

Nope, cm3cg on AMD64_DARWIN is 64-bit.

> I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in the same way.
> This was a nice thing to find, that the problem is portable to multiple/all 64 bit hosts.
>
> I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same problem.

Don't think so.

> Anyway, I should really get to debugging this soon.
>
> It's a bit odd because gcc itself doesn't have this bug and I reviewed a lot of the code and it was ok. I'm just going to have to step through it in parallel on 32bit and 64bit hosts and find where they diverge. A LOT was identical, like the files output by cm3 into cm3cg were identical.

Yes, the intermediate code should be identical. Any such problems would be with cm3cg.

> I was close a few months ago but sloughed off.

Good luck.

> - Jay
>
> > From: hosking at cs.purdue.edu
> > To: jay.krell at cornell.edu
> > Date: Tue, 30 Sep 2008 10:16:41 +0100
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] AMD-64 binaries?
> >
> > 64-bit hosted tools? Do you mean only for Linux?
> > I don't quite understand what you are saying.
> >
> > On Sep 30, 2008, at 9:36 AM, Jay wrote:
> >
> > > I'm getting back to this now.
> > > I didn't realize it till this weekend, but that archive is "relatively incompatible".
> > > In particular it has 32bit hosted tools, and won't run on Debian 4.0r4 / AMD64.
> > > Something about glibc 2.4, when all I see on my system is 2.3.
> > > I'll see what I can do.
> > > Probably just rebuild cm3cg.
> > > I think it was built on Fedora, but could have been Ubuntu or OpenSuse.
> > > Probably just that Debian stable lags the others.
> > >
> > > The main problem to debug is why 64bit hosted tools "never" work.
> > > (Right?)
> > >
> > > Stay tuned for a bunch more ports "soon", I've got a bunch more hardware,
> > > that runs Linux and others (Solaris, AIX, Irix).. :)
> > >
> > > I'll be able to debug the high dpi gui problems on a friend's laptop soon too.
> > > Send me a repro. I expect it is trivial -- like anything with a scrollbar.
> > > I can try formsedit, etc.
> > >
> > > - Jay
> > >
> > >> Date: Wed, 3 Sep 2008 07:58:04 +0200
> > >> From: wagner at elegosoft.com
> > >> To: m3devel at elegosoft.com
> > >> Subject: Re: [M3devel] AMD-64 binaries?
> > >>
> > >> Quoting "Rodney M. Bates" :
> > >>
> > >>> Are there binaries for AMD-64 around that can be used
> > >>> to bootstrap a 64-bit Linux compiler?
> > >>
> > >> Have a look at
> > >>
> > >> http://www.opencm3.net/uploaded-archives/index.html
> > >>
> > >> There are some AMD64 archives; I don't know about their status
> > >> offhand, though. I think Jay Krell produced them.
> > >> AFAIK there is no regular build on this platform yet.
> > >>
> > >> Olaf
> > >> --
> > >> Olaf Wagner -- elego Software Solutions GmbH
> > >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> > >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
> > >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
> > >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
> > >>
> > >

From hosking at cs.purdue.edu Wed Oct 1 12:07:00 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 11:07:00 +0100
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To:
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
Message-ID: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>

m3gdb makes use of stabs debug information spat out by the backend. They are only in the binary if compiled -g. There are other ways to get what you are after, as Darko has observed.

On Oct 1, 2008, at 11:03 AM, Darko wrote:

> I've extended one of the modules with a function that formats any allocated value for printing. If you're interested I can clean them up a little and post them.
>
> On 28/09/2008, at 8:01 AM, Darko wrote:
>
>> As far as I know, yes, they're not in the binary. I'd love to be proven wrong though, or fix it so they did. I have a module that reads the .M3WEB file and maps it to types and a module that will read and write any field within a type safely using a numeric index. Neither is perfect. You can integrate the two to get what you want but I seem to remember having some problems mapping type ids (UIDs?) to typecodes at runtime.
>>
>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>
>>> Right, I am aware of those interfaces.. just wondering what was out there. Do I really need to look at .M3WEB? I thought that m3gdb could figure out things without anything outside of the binary...
>>>
>>> I'm looking for essentially what m3gdb offers, say prints at minimum the name of the type (this I recall is trivial with some of the RT* interfaces) but hopefully also with field names and values, but doesn't expand references recursively.. something like that?
>>>
>>> Mika
>>>
>>> Darko writes:
>>>> You can use RTTipe to read the fields and values within a type. If you also want the type and field names you can interpret the .M3WEB file. I have a couple of modules that do something like that but they are not what you would call finished. What level of detail are you after?
>>>>
>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> I am working on writing an interpreter that I'd like to embed in various Modula-3 programs. It so happens that this interpreter might from time to time be manipulating arbitrary M3 REFs, and just from the point of view of providing information to a human user, it might be nice to be able to pretty-print these. Does anyone have any code that accomplishes this, at least partly? I'm thinking that since m3gdb can do it, the information must all be in the binary---somehow. (Even enumeration names, right?) And since the pickler can pickle things... hmm.
>>>>>
>>>>> I would greatly appreciate any guidance that's out there...
>>>>>
>>>>> Best regards,
>>>>> Mika Nystrom
>>

From darko at darko.org Wed Oct 1 12:35:09 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 12:35:09 +0200
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu> <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>
Message-ID:

Here's some info on the stabs format:

http://www.cs.utah.edu/dept/old/texinfo/gdb/stabs_toc.html

On 01/10/2008, at 12:07 PM, Tony Hosking wrote:

> m3gdb makes use of stabs debug information spat out by the backend. They are only in the binary if compiled -g. There are other ways to get what you are after, as Darko has observed.
>
> On Oct 1, 2008, at 11:03 AM, Darko wrote:
>
>> I've extended one of the modules with a function that formats any allocated value for printing. If you're interested I can clean them up a little and post them.
>>
>> On 28/09/2008, at 8:01 AM, Darko wrote:
>>
>>> As far as I know, yes, they're not in the binary. I'd love to be proven wrong though, or fix it so they did. I have a module that reads the .M3WEB file and maps it to types and a module that will read and write any field within a type safely using a numeric index. Neither is perfect. You can integrate the two to get what you want but I seem to remember having some problems mapping type ids (UIDs?) to typecodes at runtime.
>>>
>>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>>
>>>> Right, I am aware of those interfaces.. just wondering what was out there. Do I really need to look at .M3WEB? I thought that m3gdb could figure out things without anything outside of the binary...
>>>>
>>>> I'm looking for essentially what m3gdb offers, say prints at minimum the name of the type (this I recall is trivial with some of the RT* interfaces) but hopefully also with field names and values, but doesn't expand references recursively.. something like that?
>>>>
>>>> Mika
>>>>
>>>> Darko writes:
>>>>> You can use RTTipe to read the fields and values within a type.
>>>>> If you also want the type and field names you can interpret the .M3WEB file. I have a couple of modules that do something like that but they are not what you would call finished. What level of detail are you after?
>>>>>
>>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>>
>>>>>> Hello Modula-3 people,
>>>>>>
>>>>>> I am working on writing an interpreter that I'd like to embed in various Modula-3 programs. It so happens that this interpreter might from time to time be manipulating arbitrary M3 REFs, and just from the point of view of providing information to a human user, it might be nice to be able to pretty-print these. Does anyone have any code that accomplishes this, at least partly? I'm thinking that since m3gdb can do it, the information must all be in the binary---somehow. (Even enumeration names, right?) And since the pickler can pickle things... hmm.
>>>>>>
>>>>>> I would greatly appreciate any guidance that's out there...
>>>>>>
>>>>>> Best regards,
>>>>>> Mika Nystrom
>>>
>

From mika at async.caltech.edu Wed Oct 1 20:09:58 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Wed, 01 Oct 2008 11:09:58 -0700
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: Your message of "Wed, 01 Oct 2008 12:03:15 +0200."
Message-ID: <200810011809.m91I9wxY087739@camembert.async.caltech.edu>

Oh, I'd love to give it a try!

I'm a little surprised no one has chimed in on the question of whether you really need .M3WEB... I could swear I can get good symbolic debugging with m3gdb on just a binary...

   Mika

Darko writes:
>I've extended one of the modules with a function that formats any allocated value for printing. If you're interested I can clean them up a little and post them.
>
>On 28/09/2008, at 8:01 AM, Darko wrote:
>
>> As far as I know, yes, they're not in the binary.
>> I'd love to be proven wrong though, or fix it so they did. I have a module that reads the .M3WEB file and maps it to types and a module that will read and write any field within a type safely using a numeric index. Neither is perfect. You can integrate the two to get what you want but I seem to remember having some problems mapping type ids (UIDs?) to typecodes at runtime.
>>
>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>
>>> Right, I am aware of those interfaces.. just wondering what was out there. Do I really need to look at .M3WEB? I thought that m3gdb could figure out things without anything outside of the binary...
>>>
>>> I'm looking for essentially what m3gdb offers, say prints at minimum the name of the type (this I recall is trivial with some of the RT* interfaces) but hopefully also with field names and values, but doesn't expand references recursively.. something like that?
>>>
>>> Mika
>>>
>>> Darko writes:
>>>> You can use RTTipe to read the fields and values within a type. If you also want the type and field names you can interpret the .M3WEB file. I have a couple of modules that do something like that but they are not what you would call finished. What level of detail are you after?
>>>>
>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> I am working on writing an interpreter that I'd like to embed in various Modula-3 programs. It so happens that this interpreter might from time to time be manipulating arbitrary M3 REFs, and just from the point of view of providing information to a human user, it might be nice to be able to pretty-print these. Does anyone have any code that accomplishes this, at least partly? I'm thinking that since m3gdb can do it, the information must all be in the binary---somehow. (Even enumeration names, right?)
And since the >>>>> pickler can pickle things... hmm. >>>>> >>>>> I would greatly appreciate any guidance that's out there... >>>>> >>>>> Best regards, >>>>> Mika Nystrom >> From mika at async.caltech.edu Wed Oct 1 20:10:38 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Wed, 01 Oct 2008 11:10:38 -0700 Subject: [M3devel] Pretty-printing REFANYs? In-Reply-To: Your message of "Wed, 01 Oct 2008 11:07:00 BST." <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu> Message-ID: <200810011810.m91IAcDW087832@camembert.async.caltech.edu> Ok, ignore my previous email :-) Tony Hosking writes: >m3gdb makes use of stabs debug information spat out by the backend. >They are only in the binary if compiled -g. There are other ways to >get what you are after, as Darko has observed. > >On Oct 1, 2008, at 11:03 AM, Darko wrote: > >> I've extended one of the modules with a function that formats any >> allocated value for printing. If you're interested I can clean them >> up a little and post them. >> >> >> On 28/09/2008, at 8:01 AM, Darko wrote: >> >>> As far as I know, yes, they're not in the binary. I'd love to be >>> proven wrong though, or fix it so they did. I have a module that >>> reads the .M3WEB file and maps it to types and a module that will >>> read and write any field within a type safely using a numeric >>> index. Neither is perfect. You can integrate the two to get what >>> you want but I seem to remember having some problems mapping type >>> ids (UIDs?) to typecodes at runtime. >>> >>> >>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote: >>> >>>> Right, I am aware of those interfaces.. just wondering what was >>>> out there. Do I really need to look at .M3WEB? I thought >>>> that m3gdb could figure out things without anything outside >>>> of the binary... 
>>>> >>>> I'm looking for essentially what m3gdb offers, say prints >>>> at minimum the name of the type (this I recall is trivial with >>>> some of the RT* interfaces) but hopefully also with field names >>>> and values, but doesn't expand references recursively.. something >>>> like that? >>>> >>>> Mika >>>> >>>> Darko writes: >>>>> You can use RTTipe to read the fields and values within a type. >>>>> If you >>>>> also want the type and field names you can interpret the .M3WEB >>>>> file. >>>>> I have a couple of modules that do something like that but they are >>>>> not what you would call finished. What level of detail are you >>>>> after? >>>>> >>>>> >>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote: >>>>> >>>>>> Hello Modula-3 people, >>>>>> >>>>>> I am working on a writing an interpreter that I'd like to embed in >>>>>> various Modula-3 programs. It so happens that this interpreter >>>>>> might from time to time be manipulating arbitrary M3 REFs, and >>>>>> just >>>>>> from the point of view of providing information to a human user, >>>>>> it might be nice to be able to pretty-print these. Does anyone >>>>>> have any code that accomplishes this, at least partly? I'm >>>>>> thinking >>>>>> that since m3gdb can do it, the information must all be in the >>>>>> binary---somehow. (Even enumeration names, right?) And since the >>>>>> pickler can pickle things... hmm. >>>>>> >>>>>> I would greatly appreciate any guidance that's out there... >>>>>> >>>>>> Best regards, >>>>>> Mika Nystrom >>> From jay.krell at cornell.edu Sun Oct 12 11:51:03 2008 From: jay.krell at cornell.edu (Jay) Date: Sun, 12 Oct 2008 09:51:03 +0000 Subject: [M3devel] a bunch of new/old platform names? Message-ID: I plan on soon bringing "back" some old ports -- building current archives -- and bring up some new ports. Specifically I have hardware: RS/6000 (PPC64/AIX), SGI (MIPS), SPARC64, plus the usual x86/AMD64. Two of the platforms did exist. In particular, "MIPS_IRIX" is "IRIX5". 
Reuse IRIX5, or introduce MIPS_IRIX? PPC_AIX is IBMR2 or such. Same question.

Also, must versions really be in platform names? I'm loath to add a third dimension to the matrix. I did just note that FreeBSD 7.0 64 bit is ABI-incompatible with FreeBSD 6.3 64 bit, lame. SGI claims good ABI across all the 6.5 releases, which is all there will be now. IBM claims good 32 bit ABI compat across AIX 4.x - 6.x and good 64 bit ABI compat across 5.x and 6.x, but incompatibility from 64 bit 4.x. (Microsoft has always been good here, but "behavioral" compat is the actual tricky issue.)

And, what do folks think about putting "32" in new 32 bit platform names?

I'm considering the following:

  MIPS32_{IRIX,LINUX,OPENBSD,NETBSD}
  MIPS64_IRIX (6.5)
  SPARC{32,64}_{LINUX,*BSD} (probably no SPARC32_*BSD actually, and SPARC32_LINUX is already in, but not building regularly)
  {SPARC64,I386,AMD64}_SOLARIS
  PPC{32,64}_AIX (PPC64_LINUX is blocked, Linux has problems booting on the hardware and I have no Mac G5 yet).
  AMD64_*BSD

Also, maybe some of the code should be restructured to separate processor from OS? That might be primarily only pointer size.

Any interest in "x86" instead of "I386"?

If I make good progress against those 18 (!), I can see about PPC64_DARWIN, HPPA_*, IA64_*, ALPHA_*, ARM_*, which I lack hardware for.

PPC_LINUX also should be converted to pthreads imho.

Mostly this is all just a matter of installing the OS and configuring gcc.

And, yeah, I have the two m3cgs stepping side by side to find the problem there, and will have use of a high dpi Windows laptop for that other problem.

And then of course, if the vast majority of platforms are named like that, there might be pressure to bring the rest in line. :)

  I386_{NT,LINUX,*BSD,CYGWIN,MINGWIN}

 - Jay

From mika at async.caltech.edu Fri Oct 17 00:32:39 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 15:32:39 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
Message-ID: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>

Hello Modula-3 people,

As I mentioned in an earlier email about printing structures (thanks
Darko), I'm in the midst of coding an interpreter embedded in
Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's
JScheme for Java (well it was at first strongly based, but more and
more loosely, if you know what I mean...)

I expected that the performance of the interpreter would be much
better in Modula-3 than in Java, and I have been testing on two
different systems. One is my ancient FreeBSD-4.11 with an old PM3,
and the other is CM3 on a recent Debian system. What I am finding
is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
close to ten times as fast on some tasks at this point), but on
Linux/CM3 it is much closer in speed to JScheme than I would like.

When I started, with code that was essentially equivalent to JScheme,
I found that it was a bit slower than JScheme on Linux/CM3 and
possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to
spend most of its time in (surprise, surprise!) memory allocation
and garbage collection. The speedup I have achieved between the
first implementation and now was due to the use of Modula-3 constructs
that are superior to Java's, such as the use of arrays of RECORDs
to make small stacks rather than linked lists. (I get readable
code with much fewer memory allocations and GC work.)

Now, since this is an interpreter, I as the implementer have limited
control over how much memory is allocated and freed, and where it is
needed. However, I can sometimes fall back on C-style memory management,
but I would like to do it in a safe way. For instance, I have special-cased
evaluation of Scheme primitives, as follows.

Under the "normal" implementation, a list of things to evaluate is
built up, passed to an evaluation function, and then the GC is left
to sweep up the mess. The problem is that there are various tricky
routes by which references can escape the evaluator, so you can't
just assume that what you put in is going to be dead right after
an eval and free it. Instead, I set a flag in the evaluator, which
is TRUE if it is OK to free the list after the eval and FALSE if
it's unclear (in which case the problem is left up to the GC).

For the vast majority of Scheme primitives, one can indeed free the
list right after the eval. Now of course I am not interested
in unsafe code, so what I do is this:

  TYPE Pair = OBJECT first, rest : REFANY; END;

  VAR
    mu := NEW(MUTEX);
    free : Pair := NIL;

  PROCEDURE GetPair() : Pair =
    BEGIN
      LOCK mu DO
        IF free # NIL THEN
          TRY
            RETURN free
          FINALLY
            free := free.rest
          END
        END
      END;
      RETURN NEW(Pair)
    END GetPair;

  PROCEDURE ReturnPair(cons : Pair) =
    BEGIN
      cons.first := NIL;
      LOCK mu DO
        cons.rest := free;
        free := cons
      END
    END ReturnPair;

my eval code looks like

  VAR okToFree : BOOLEAN; BEGIN

    args := GetPair(); ...
    result := EvalPrimitive(args, (*VAR OUT*) okToFree);

    IF okToFree THEN ReturnPair(args) END;
    RETURN result
  END

and this does work well. In fact it speeds up the Linux implementation
by almost 100% to recycle the lists like this *just* for the
evaluation of Scheme primitives.

But it's still ugly, isn't it? There's a mutex, and a global
variable. And yes, the time spent messing with the mutex is
noticeable, and I haven't even made the code multi-threaded yet
(and that is coming!)

So I'm thinking, what I really want is a structure that is attached
to my current Thread.T. I want to be able to access just a single
pointer (like the free list) but be sure it is unique to my current
thread. No locking would be necessary if I could do this.

Does anyone have an elegant solution that does something like this?
Thread-specific "static" variables? Just one REFANY would be enough
for a lot of uses... seems to me this should be a frequently
occurring problem?
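
For reference, one possible shape for a lock-free, thread-private free list that needs no runtime changes is to keep the list in the thread's own closure object, since each forked thread gets its own closure. The following is only a minimal sketch under stated assumptions: `Worker`, its `free` field, and the `Apply` body are hypothetical names introduced here for illustration, not anything from the interpreter described above, and the closure has to be passed explicitly to `GetPair`/`ReturnPair` (there is no `Thread.Self().data` in the standard Thread interface). It also does not cover the initial thread, which is not started through `Thread.Fork`.

```modula3
MODULE Main;

IMPORT IO, Thread;

TYPE
  Pair = OBJECT first, rest : REFANY := NIL END;

  (* Hypothetical closure type: one instance per forked thread, so its
     fields are touched by that thread only and need no mutex. *)
  Worker = Thread.Closure OBJECT
    free : Pair := NIL;            (* this thread's private free list *)
  OVERRIDES
    apply := Apply;
  END;

PROCEDURE GetPair(w : Worker) : Pair =
  VAR p := w.free;
  BEGIN
    IF p = NIL THEN RETURN NEW(Pair) END;
    w.free := p.rest;              (* REFANY to Pair: checked at runtime *)
    p.rest := NIL;
    RETURN p
  END GetPair;

PROCEDURE ReturnPair(w : Worker; p : Pair) =
  BEGIN
    p.first := NIL;                (* drop the payload so the GC can reclaim it *)
    p.rest := w.free;
    w.free := p
  END ReturnPair;

PROCEDURE Apply(w : Worker) : REFANY =
  VAR a, b : Pair;
  BEGIN
    a := GetPair(w);               (* free list empty: allocates *)
    ReturnPair(w, a);
    b := GetPair(w);               (* taken from the private free list *)
    IF a = b THEN IO.Put("pair was recycled\n") END;
    RETURN NIL
  END Apply;

BEGIN
  EVAL Thread.Join(Thread.Fork(NEW(Worker)))
END Main.
```

The design trade-off is that safety depends on the closure never being shared between threads; a pair returned from a different thread would have to go back to that thread's own list, or simply be left to the GC.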
Best regards, Mika From hosking at cs.purdue.edu Fri Oct 17 00:54:51 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Thu, 16 Oct 2008 23:54:51 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu> References: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu> Message-ID: Have you tried running @M3noincremental? On 16 Oct 2008, at 23:32, Mika Nystrom wrote: > Hello Modula-3 people, > > As I mentioned in an earlier email about printing structures (thanks > Darko), I'm in the midst of coding an interpreter embedded in > Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's > JScheme for Java (well it was at first strongly based, but more and > more loosely, if you know what I mean...) > > I expected that the performance of the interpreter would be much > better in Modula-3 than in Java, and I have been testing on two > different systems. One is my ancient FreeBSD-4.11 with an old PM3, > and the other is CM3 on a recent Debian system. What I am finding > is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting > close to ten times as fast on some tasks at this point), but on > Linux/CM3 it is much closer in speed to JScheme than I would like. > > When I started, with code that was essentially equivalent to JScheme, > I found that it was a bit slower than JScheme on Linux/CM3 and > possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to > spend most of its time in (surprise, surprise!) memory allocation > and garbage collection. The speedup I have achieved between the > first implementation and now was due to the use of Modula-3 constructs > that are superior to Java's, such as the use of arrays of RECORDs > to make small stacks rather than linked lists. (I get readable > code with much fewer memory allocations and GC work.) 
> > Now, since this is an interpreter, I as the implementer have limited > control over how much memory is allocated and freed, and where it is > needed. However, I can sometimes fall back on C-style memory > management, > but I would like to do it in a safe way. For instance, I have > special-cased > evaluation of Scheme primitives, as follows. > > Under the "normal" implementation, a list of things to evaluate is > built up, passed to an evaluation function, and then the GC is left > to sweep up the mess. The problem is that there are various tricky > routes by which references can escape the evaluator, so you can't > just assume that what you put in is going to be dead right after > an eval and free it. Instead, I set a flag in the evaluator, which > is TRUE if it is OK to free the list after the eval and FALSE if > it's unclear (in which case the problem is left up to the GC). > > For the vast majority of Scheme primitives, one can indeed free the > list right after the eval. Now of course I am not interested > in unsafe code, so what I do is this: > > TYPE Pair = OBJECT first, rest : REFANY; END; > > VAR > mu := NEW(MUTEX); > free : Pair := NIL; > > PROCEDURE GetPair() : Pair = > BEGIN > LOCK mu DO > IF free # NIL THEN > TRY > RETURN free > FINALLY > free := free.rest > END > END > END; > RETURN NEW(Pair) > END GetPair; > > PROCEDURE ReturnPair(cons : Pair) = > BEGIN > cons.first := NIL; > LOCK mu DO > cons.rest := free; > free := cons > END > END ReturnPair; > > my eval code looks like > > VAR okToFree : BOOLEAN; BEGIN > > args := GetPair(); ... > result := EvalPrimitive(args, (*VAR OUT*) okToFree); > > IF okToFree THEN ReturnPair(args) END; > RETURN result > END > > and this does work well. In fact it speeds up the Linux > implementation > by almost 100% to recycle the lists like this *just* for the > evaluation of Scheme primitives. > > But it's still ugly, isn't it? There's a mutex, and a global > variable. 
And yes, the time spent messing with the mutex is > noticeable, and I haven't even made the code multi-threaded yet > (and that is coming!) > > So I'm thinking, what I really want is a structure that is attached > to my current Thread.T. I want to be able to access just a single > pointer (like the free list) but be sure it is unique to my current > thread. No locking would be necessary if I could do this. > > Does anyone have an elegant solution that does something like this? > Thread-specific "static" variables? Just one REFANY would be enough > for a lot of uses... seems to me this should be a frequently > occurring problem? > > Best regards, > Mika > > > > > > From mika at async.caltech.edu Fri Oct 17 01:30:01 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 16 Oct 2008 16:30:01 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: Your message of "Thu, 16 Oct 2008 23:54:51 BST." Message-ID: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu> Hi Tony, I figured you would chime in! Yes, @M3noincremental seems to make things consistently a tad bit slower (but a very small difference), on both FreeBSD and Linux. @M3nogc makes a bigger difference, of course. Unfortunately I seem to have lost the code that did a lot of memory allocations. My tricks (as described in the email---and others!) have removed most of the troublesome memory allocations, but now I'm stuck with the mutex instead... Mika Tony Hosking writes: >Have you tried running @M3noincremental? > >On 16 Oct 2008, at 23:32, Mika Nystrom wrote: > >> Hello Modula-3 people, >> >> As I mentioned in an earlier email about printing structures (thanks >> Darko), I'm in the midst of coding an interpreter embedded in >> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's >> JScheme for Java (well it was at first strongly based, but more and >> more loosely, if you know what I mean...) 
>> >> I expected that the performance of the interpreter would be much >> better in Modula-3 than in Java, and I have been testing on two >> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >> and the other is CM3 on a recent Debian system. What I am finding >> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting >> close to ten times as fast on some tasks at this point), but on >> Linux/CM3 it is much closer in speed to JScheme than I would like. >> >> When I started, with code that was essentially equivalent to JScheme, >> I found that it was a bit slower than JScheme on Linux/CM3 and >> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >> spend most of its time in (surprise, surprise!) memory allocation >> and garbage collection. The speedup I have achieved between the >> first implementation and now was due to the use of Modula-3 constructs >> that are superior to Java's, such as the use of arrays of RECORDs >> to make small stacks rather than linked lists. (I get readable >> code with much fewer memory allocations and GC work.) >> >> Now, since this is an interpreter, I as the implementer have limited >> control over how much memory is allocated and freed, and where it is >> needed. However, I can sometimes fall back on C-style memory >> management, >> but I would like to do it in a safe way. For instance, I have >> special-cased >> evaluation of Scheme primitives, as follows. >> >> Under the "normal" implementation, a list of things to evaluate is >> built up, passed to an evaluation function, and then the GC is left >> to sweep up the mess. The problem is that there are various tricky >> routes by which references can escape the evaluator, so you can't >> just assume that what you put in is going to be dead right after >> an eval and free it. 
Instead, I set a flag in the evaluator, which >> is TRUE if it is OK to free the list after the eval and FALSE if >> it's unclear (in which case the problem is left up to the GC). >> >> For the vast majority of Scheme primitives, one can indeed free the >> list right after the eval. Now of course I am not interested >> in unsafe code, so what I do is this: >> >> TYPE Pair = OBJECT first, rest : REFANY; END; >> >> VAR >> mu := NEW(MUTEX); >> free : Pair := NIL; >> >> PROCEDURE GetPair() : Pair = >> BEGIN >> LOCK mu DO >> IF free # NIL THEN >> TRY >> RETURN free >> FINALLY >> free := free.rest >> END >> END >> END; >> RETURN NEW(Pair) >> END GetPair; >> >> PROCEDURE ReturnPair(cons : Pair) = >> BEGIN >> cons.first := NIL; >> LOCK mu DO >> cons.rest := free; >> free := cons >> END >> END ReturnPair; >> >> my eval code looks like >> >> VAR okToFree : BOOLEAN; BEGIN >> >> args := GetPair(); ... >> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >> >> IF okToFree THEN ReturnPair(args) END; >> RETURN result >> END >> >> and this does work well. In fact it speeds up the Linux >> implementation >> by almost 100% to recycle the lists like this *just* for the >> evaluation of Scheme primitives. >> >> But it's still ugly, isn't it? There's a mutex, and a global >> variable. And yes, the time spent messing with the mutex is >> noticeable, and I haven't even made the code multi-threaded yet >> (and that is coming!) >> >> So I'm thinking, what I really want is a structure that is attached >> to my current Thread.T. I want to be able to access just a single >> pointer (like the free list) but be sure it is unique to my current >> thread. No locking would be necessary if I could do this. >> >> Does anyone have an elegant solution that does something like this? >> Thread-specific "static" variables? Just one REFANY would be enough >> for a lot of uses... seems to me this should be a frequently >> occurring problem? 
>>
>> Best regards,
>> Mika
>>

From jay.krell at cornell.edu Fri Oct 17 06:40:28 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 04:40:28 +0000
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
References: Your message of <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
Message-ID:

Making this per-thread is a fairly classic good improvement.

You need to worry about what happens with many threads, and being sure to clean up when a thread dies, and allowing for a free to come in from any thread.

A good way to mitigate all those problems is to use a small fixed size cache instead of per-thread. Including an array of mutexes.

If "thread ids" have adequate distribution, just use their lower bits as an array index. If not, have a global counter that gets assigned into the thread on first use per-thread.

The cache could also be more than one element.

How do you manage okToFree?

Windows has __declspec(thread), which is an optimized form of TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded .dlls before Vista, and isn't __declspec(fiber) like maybe it should be.

 - Jay

----------------------------------------
> To: hosking at cs.purdue.edu
> Date: Thu, 16 Oct 2008 16:30:01 -0700
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
>
> Hi Tony,
>
> I figured you would chime in!
>
> Yes, @M3noincremental seems to make things consistently a tad bit
> slower (but a very small difference), on both FreeBSD and Linux.
> @M3nogc makes a bigger difference, of course.
>
> Unfortunately I seem to have lost the code that did a lot of memory
> allocations. My tricks (as described in the email---and others!)
> have removed most of the troublesome memory allocations, but now > I'm stuck with the mutex instead... > > Mika > > Tony Hosking writes: >>Have you tried running @M3noincremental? >> >>On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >> >>> Hello Modula-3 people, >>> >>> As I mentioned in an earlier email about printing structures (thanks >>> Darko), I'm in the midst of coding an interpreter embedded in >>> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's >>> JScheme for Java (well it was at first strongly based, but more and >>> more loosely, if you know what I mean...) >>> >>> I expected that the performance of the interpreter would be much >>> better in Modula-3 than in Java, and I have been testing on two >>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>> and the other is CM3 on a recent Debian system. What I am finding >>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting >>> close to ten times as fast on some tasks at this point), but on >>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>> >>> When I started, with code that was essentially equivalent to JScheme, >>> I found that it was a bit slower than JScheme on Linux/CM3 and >>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>> spend most of its time in (surprise, surprise!) memory allocation >>> and garbage collection. The speedup I have achieved between the >>> first implementation and now was due to the use of Modula-3 constructs >>> that are superior to Java's, such as the use of arrays of RECORDs >>> to make small stacks rather than linked lists. (I get readable >>> code with much fewer memory allocations and GC work.) >>> >>> Now, since this is an interpreter, I as the implementer have limited >>> control over how much memory is allocated and freed, and where it is >>> needed. However, I can sometimes fall back on C-style memory >>> management, >>> but I would like to do it in a safe way. 
For instance, I have >>> special-cased >>> evaluation of Scheme primitives, as follows. >>> >>> Under the "normal" implementation, a list of things to evaluate is >>> built up, passed to an evaluation function, and then the GC is left >>> to sweep up the mess. The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>> just assume that what you put in is going to be dead right after >>> an eval and free it. Instead, I set a flag in the evaluator, which >>> is TRUE if it is OK to free the list after the eval and FALSE if >>> it's unclear (in which case the problem is left up to the GC). >>> >>> For the vast majority of Scheme primitives, one can indeed free the >>> list right after the eval. Now of course I am not interested >>> in unsafe code, so what I do is this: >>> >>> TYPE Pair = OBJECT first, rest : REFANY; END; >>> >>> VAR >>> mu := NEW(MUTEX); >>> free : Pair := NIL; >>> >>> PROCEDURE GetPair() : Pair = >>> BEGIN >>> LOCK mu DO >>> IF free # NIL THEN >>> TRY >>> RETURN free >>> FINALLY >>> free := free.rest >>> END >>> END >>> END; >>> RETURN NEW(Pair) >>> END GetPair; >>> >>> PROCEDURE ReturnPair(cons : Pair) = >>> BEGIN >>> cons.first := NIL; >>> LOCK mu DO >>> cons.rest := free; >>> free := cons >>> END >>> END ReturnPair; >>> >>> my eval code looks like >>> >>> VAR okToFree : BOOLEAN; BEGIN >>> >>> args := GetPair(); ... >>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>> >>> IF okToFree THEN ReturnPair(args) END; >>> RETURN result >>> END >>> >>> and this does work well. In fact it speeds up the Linux >>> implementation >>> by almost 100% to recycle the lists like this *just* for the >>> evaluation of Scheme primitives. >>> >>> But it's still ugly, isn't it? There's a mutex, and a global >>> variable. And yes, the time spent messing with the mutex is >>> noticeable, and I haven't even made the code multi-threaded yet >>> (and that is coming!) 
>>>
>>> So I'm thinking, what I really want is a structure that is attached
>>> to my current Thread.T. I want to be able to access just a single
>>> pointer (like the free list) but be sure it is unique to my current
>>> thread. No locking would be necessary if I could do this.
>>>
>>> Does anyone have an elegant solution that does something like this?
>>> Thread-specific "static" variables? Just one REFANY would be enough
>>> for a lot of uses... seems to me this should be a frequently
>>> occurring problem?
>>>
>>> Best regards,
>>> Mika
>>>

From mika at async.caltech.edu Fri Oct 17 08:32:15 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 23:32:15 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000."
Message-ID: <200810170632.m9H6WFHd078061@camembert.async.caltech.edu>

Well, I was thinking of something even simpler. A Thread.T is an
OBJECT. It's garbage collected just like any other object, is it not?

Why can't the thing that makes new threads simply include a single
globally visible field in every Thread.T, of type REFANY? Call it
"data". Then you can always manipulate Thread.Self().data as you
see fit without any need for locks. There can be no problem with
this as long as it is always manipulated from within that thread.
Of course this can be trivially encapsulated by not revealing "data"
and indeed always accessing it as Thread.Self().data.

You would not normally access this from any other thread. It's
indeed only meant to be used in the idiom

  x := Allocate();
  TRY
    DoSomething(x)
  FINALLY
    Free(x)
  END

It's also not really a "Free" but just returning the object to a
free list (there can be no unsafe behavior here).

As a "nicer" interface, one could register routines with a public
interface, asking it to manufacture some kind of thread globals.
For maximum sanity, they would be visible inside the MODULE that
requested them, but I'm not sure how to accomplish this. And of
course there's not much point in any of this unless it can be made
efficient or else a mutex plus a true global will work just as well.

What I'm talking about I guess could be done by hacking up
Thread.Fork() to return a subtype of Thread.T, but that won't work
for the first thread. But with this method you could have arbitrary
fields (and methods) attached to a Thread.T. How to collect everything
you need is a different story...

I'm not asking for a new language feature... really was just wondering
if anyone had tried anything like this before, and now am rambling a
bit.

    Mika

Jay writes:
>
>Making this per-thread is a fairly classic good improvement.
>
>You need to worry about what happens with many threads, and being sure
>to clean up when a thread dies, and allowing for a free to come in from
>any thread.
>
>A good way to mitigate all those problems is to use a small fixed size
>cache instead of per-thread. Including an array of mutexes.
>
>If "thread ids" have adequate distribution, just use their lower bits
>as an array index. If not, have a global counter that gets assigned
>into the thread on first use per-thread.
>
>The cache could also be more than one element.
>
>How do you manage okToFree?
>
>Windows has __declspec(thread), which is an optimized form of
>TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded
>.dlls before Vista, and isn't __declspec(fiber) like maybe it should be.
>
> - Jay
>
>----------------------------------------
>> To: hosking at cs.purdue.edu
>> Date: Thu, 16 Oct 2008 16:30:01 -0700
>> From: mika at async.caltech.edu
>> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
>> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
>>
>> Hi Tony,
>>
>> I figured you would chime in!
>> >> Yes, @M3noincremental seems to make things consistently a tad bit >> slower (but a very small difference), on both FreeBSD and Linux. >> @M3nogc makes a bigger difference, of course. >> >> Unfortunately I seem to have lost the code that did a lot of memory >> allocations. My tricks (as described in the email---and others!) >> have removed most of the troublesome memory allocations, but now >> I'm stuck with the mutex instead... >> >> Mika >> >> Tony Hosking writes: >>>Have you tried running @M3noincremental? >>> >>>On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>> >>>> Hello Modula-3 people, >>>> >>>> As I mentioned in an earlier email about printing structures (thanks >>>> Darko), I'm in the midst of coding an interpreter embedded in >>>> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's >>>> JScheme for Java (well it was at first strongly based, but more and >>>> more loosely, if you know what I mean...) >>>> >>>> I expected that the performance of the interpreter would be much >>>> better in Modula-3 than in Java, and I have been testing on two >>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>> and the other is CM3 on a recent Debian system. What I am finding >>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting >>> close to ten times as fast on some tasks at this point), but on >>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>> >>>> When I started, with code that was essentially equivalent to JScheme, >>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>> spend most of its time in (surprise, surprise!) memory allocation >>>> and garbage collection. 
The speedup I have achieved between the >>>> first implementation and now was due to the use of Modula-3 constructs >>>> that are superior to Java's, such as the use of arrays of RECORDs >>>> to make small stacks rather than linked lists. (I get readable >>>> code with much fewer memory allocations and GC work.) >>>> >>>> Now, since this is an interpreter, I as the implementer have limited >>>> control over how much memory is allocated and freed, and where it is >>>> needed. However, I can sometimes fall back on C-style memory >>>> management, >>>> but I would like to do it in a safe way. For instance, I have >>>> special-cased >>>> evaluation of Scheme primitives, as follows. >>>> >>>> Under the "normal" implementation, a list of things to evaluate is >>>> built up, passed to an evaluation function, and then the GC is left >>>> to sweep up the mess. The problem is that there are various tricky >>>> routes by which references can escape the evaluator, so you can't >>>> just assume that what you put in is going to be dead right after >>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>> it's unclear (in which case the problem is left up to the GC). >>>> >>>> For the vast majority of Scheme primitives, one can indeed free the >>>> list right after the eval. 
Now of course I am not interested >>>> in unsafe code, so what I do is this: >>>> >>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>> >>>> VAR >>>> mu := NEW(MUTEX); >>>> free : Pair := NIL; >>>> >>>> PROCEDURE GetPair() : Pair = >>>> BEGIN >>>> LOCK mu DO >>>> IF free # NIL THEN >>>> TRY >>>> RETURN free >>>> FINALLY >>>> free := free.rest >>>> END >>>> END >>>> END; >>>> RETURN NEW(Pair) >>>> END GetPair; >>>> >>>> PROCEDURE ReturnPair(cons : Pair) = >>>> BEGIN >>>> cons.first := NIL; >>>> LOCK mu DO >>>> cons.rest := free; >>>> free := cons >>>> END >>>> END ReturnPair; >>>> >>>> my eval code looks like >>>> >>>> VAR okToFree : BOOLEAN; BEGIN >>>> >>>> args := GetPair(); ... >>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>> >>>> IF okToFree THEN ReturnPair(args) END; >>>> RETURN result >>>> END >>>> >>>> and this does work well. In fact it speeds up the Linux >>>> implementation >>>> by almost 100% to recycle the lists like this *just* for the >>>> evaluation of Scheme primitives. >>>> >>>> But it's still ugly, isn't it? There's a mutex, and a global >>>> variable. And yes, the time spent messing with the mutex is >>>> noticeable, and I haven't even made the code multi-threaded yet >>>> (and that is coming!) >>>> >>>> So I'm thinking, what I really want is a structure that is attached >>>> to my current Thread.T. I want to be able to access just a single >>>> pointer (like the free list) but be sure it is unique to my current >>>> thread. No locking would be necessary if I could do this. >>>> >>>> Does anyone have an elegant solution that does something like this? >>>> Thread-specific "static" variables? Just one REFANY would be enough >>>> for a lot of uses... seems to me this should be a frequently >>>> occurring problem? 
>>>> >>>> Best regards, >>>> Mika >>>> >>>> >>>> >>>> >>>> >>>> From hosking at cs.purdue.edu Fri Oct 17 08:35:03 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Fri, 17 Oct 2008 07:35:03 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu> References: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu> Message-ID: <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> I suspect part of the overhead of allocation in the new code is the need for thread-local allocation buffers, which means we need to access thread-local state. We really need an efficient way to do that, but pthreads thread-local accesses may be what is killing you. On 17 Oct 2008, at 00:30, Mika Nystrom wrote: > Hi Tony, > > I figured you would chime in! > > Yes, @M3noincremental seems to make things consistently a tad bit > slower (but a very small difference), on both FreeBSD and Linux. > @M3nogc makes a bigger difference, of course. > > Unfortunately I seem to have lost the code that did a lot of memory > allocations. My tricks (as described in the email---and others!) > have removed most of the troublesome memory allocations, but now > I'm stuck with the mutex instead... > > Mika > > Tony Hosking writes: >> Have you tried running @M3noincremental? >> >> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >> >>> Hello Modula-3 people, >>> >>> As I mentioned in an earlier email about printing structures (thanks >>> Darko), I'm in the midst of coding an interpreter embedded in >>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>> Norvig's >>> JScheme for Java (well it was at first strongly based, but more and >>> more loosely, if you know what I mean...) >>> >>> I expected that the performance of the interpreter would be much >>> better in Modula-3 than in Java, and I have been testing on two >>> different systems. 
One is my ancient FreeBSD-4.11 with an old PM3, >>> and the other is CM3 on a recent Debian system. What I am finding >>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>> (getting >>> close to ten times as fast on some tasks at this point), but on >>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>> >>> When I started, with code that was essentially equivalent to >>> JScheme, >>> I found that it was a bit slower than JScheme on Linux/CM3 and >>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>> spend most of its time in (surprise, surprise!) memory allocation >>> and garbage collection. The speedup I have achieved between the >>> first implementation and now was due to the use of Modula-3 >>> constructs >>> that are superior to Java's, such as the use of arrays of RECORDs >>> to make small stacks rather than linked lists. (I get readable >>> code with much fewer memory allocations and GC work.) >>> >>> Now, since this is an interpreter, I as the implementer have limited >>> control over how much memory is allocated and freed, and where it is >>> needed. However, I can sometimes fall back on C-style memory >>> management, >>> but I would like to do it in a safe way. For instance, I have >>> special-cased >>> evaluation of Scheme primitives, as follows. >>> >>> Under the "normal" implementation, a list of things to evaluate is >>> built up, passed to an evaluation function, and then the GC is left >>> to sweep up the mess. The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>> just assume that what you put in is going to be dead right after >>> an eval and free it. Instead, I set a flag in the evaluator, which >>> is TRUE if it is OK to free the list after the eval and FALSE if >>> it's unclear (in which case the problem is left up to the GC). >>> >>> For the vast majority of Scheme primitives, one can indeed free the >>> list right after the eval. 
Now of course I am not interested >>> in unsafe code, so what I do is this: >>> >>> TYPE Pair = OBJECT first, rest : REFANY; END; >>> >>> VAR >>> mu := NEW(MUTEX); >>> free : Pair := NIL; >>> >>> PROCEDURE GetPair() : Pair = >>> BEGIN >>> LOCK mu DO >>> IF free # NIL THEN >>> TRY >>> RETURN free >>> FINALLY >>> free := free.rest >>> END >>> END >>> END; >>> RETURN NEW(Pair) >>> END GetPair; >>> >>> PROCEDURE ReturnPair(cons : Pair) = >>> BEGIN >>> cons.first := NIL; >>> LOCK mu DO >>> cons.rest := free; >>> free := cons >>> END >>> END ReturnPair; >>> >>> my eval code looks like >>> >>> VAR okToFree : BOOLEAN; BEGIN >>> >>> args := GetPair(); ... >>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>> >>> IF okToFree THEN ReturnPair(args) END; >>> RETURN result >>> END >>> >>> and this does work well. In fact it speeds up the Linux >>> implementation >>> by almost 100% to recycle the lists like this *just* for the >>> evaluation of Scheme primitives. >>> >>> But it's still ugly, isn't it? There's a mutex, and a global >>> variable. And yes, the time spent messing with the mutex is >>> noticeable, and I haven't even made the code multi-threaded yet >>> (and that is coming!) >>> >>> So I'm thinking, what I really want is a structure that is attached >>> to my current Thread.T. I want to be able to access just a single >>> pointer (like the free list) but be sure it is unique to my current >>> thread. No locking would be necessary if I could do this. >>> >>> Does anyone have an elegant solution that does something like this? >>> Thread-specific "static" variables? Just one REFANY would be enough >>> for a lot of uses... seems to me this should be a frequently >>> occurring problem? >>> >>> Best regards, >>> Mika >>> >>> >>> >>> >>> >>> From mika at async.caltech.edu Fri Oct 17 08:50:13 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 16 Oct 2008 23:50:13 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? 
In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000."
Message-ID: <200810170650.m9H6oDU0078549@camembert.async.caltech.edu>

Jay writes:
...
>How do you manage okToFree?
...

I forgot to answer this question. Well, the primitive evaluation in the
interpreter is just a big CASE statement. I really just look at where it
references the list I am making, and if it references the list at all in a
branch, I insert the code "okToFree := FALSE". The first two parameters
are passed in separately.

Here's the code... since you ask! This is the code for the special case
of a two-argument Scheme procedure call, such as (+ x 1).

PROCEDURE Apply2(t : T; interp : Scheme.T; a1, a2 : Object) : Object =
  VAR
    d1, d2 := GetCons();
    free := TRUE;
  BEGIN
    d1.first := a1; d1.rest := d2;
    d2.first := a2; d2.rest := NIL;
    WITH res = Prims(t, interp, d1, a1, a2, free) DO
      IF free THEN ReturnCons(d1); ReturnCons(d2) END;
      RETURN res
    END
  END Apply2;

PROCEDURE Prims(t : T;
                interp : Scheme.T;
                args, x, y : Object;
                VAR free : BOOLEAN) : Object =
  (* The (hopefully temporary) list of arguments is args.
     x and y are the first two elements of args *)
  BEGIN
    CASE VAL(t.idNumber,P) OF
      P.Eq => RETURN NumCompare(args, '=')  (* known not to let args escape *)
    |
      P.List => free := FALSE; RETURN args  (* args escapes, don't know whither *)
    |
      P.Car => RETURN PedanticFirst(x)      (* doesn't even use args *)

    (* and about another 100 cases follow here *)
    END
  END Prims;

Mika

From mika at async.caltech.edu Fri Oct 17 10:03:18 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 01:03:18 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST." <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu>
Message-ID: <200810170803.m9H83IIC080081@camembert.async.caltech.edu>

Ok this suggests that using thread local state to get around the
problem won't help either.

Can I ask a question... I am looking at ThreadPThread.m3...
Why do you have to lock the slotMu in Self()? PROCEDURE Self (): T = (* If not the initial thread and not created by Fork, returns NIL *) (* LL = 0 *) VAR me := GetActivation(); t: T; BEGIN IF me = NIL THEN RETURN NIL END; WITH r = Upthread.mutex_lock(slotMu) DO <*ASSERT r=0*> END; t := slots[me.slot]; WITH r = Upthread.mutex_unlock(slotMu) DO <*ASSERT r=0*> END; IF (t.act # me) THEN Die(ThisLine(), "thread with bad slot!") END; RETURN t; END Self; Is it just because of AssignSlots? If so.. it's actually a very rare event that there would ever be a conflict, no? (Only when "slots" is extended?) Can data be stored in an "Activation"? Not TRACED data, obviously, hmm... Mika Tony Hosking writes: >I suspect part of the overhead of allocation in the new code is the >need for thread-local allocation buffers, which means we need to >access thread-local state. We really need an efficient way to do >that, but pthreads thread-local accesses may be what is killing you. > >On 17 Oct 2008, at 00:30, Mika Nystrom wrote: > >> Hi Tony, >> >> I figured you would chime in! >> >> Yes, @M3noincremental seems to make things consistently a tad bit >> slower (but a very small difference), on both FreeBSD and Linux. >> @M3nogc makes a bigger difference, of course. >> >> Unfortunately I seem to have lost the code that did a lot of memory >> allocations. My tricks (as described in the email---and others!) >> have removed most of the troublesome memory allocations, but now >> I'm stuck with the mutex instead... >> >> Mika >> >> Tony Hosking writes: >>> Have you tried running @M3noincremental? >>> >>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>> >>>> Hello Modula-3 people, >>>> >>>> As I mentioned in an earlier email about printing structures (thanks >>>> Darko), I'm in the midst of coding an interpreter embedded in >>>> Modula-3. 
It's a Scheme interpreter, loosely based on Peter >>>> Norvig's >>>> JScheme for Java (well it was at first strongly based, but more and >>>> more loosely, if you know what I mean...) >>>> >>>> I expected that the performance of the interpreter would be much >>>> better in Modula-3 than in Java, and I have been testing on two >>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>> and the other is CM3 on a recent Debian system. What I am finding >>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>> (getting >>>> close to ten times as fast on some tasks at this point), but on >>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>> >>>> When I started, with code that was essentially equivalent to >>>> JScheme, >>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>> spend most of its time in (surprise, surprise!) memory allocation >>>> and garbage collection. The speedup I have achieved between the >>>> first implementation and now was due to the use of Modula-3 >>>> constructs >>>> that are superior to Java's, such as the use of arrays of RECORDs >>>> to make small stacks rather than linked lists. (I get readable >>>> code with much fewer memory allocations and GC work.) >>>> >>>> Now, since this is an interpreter, I as the implementer have limited >>>> control over how much memory is allocated and freed, and where it is >>>> needed. However, I can sometimes fall back on C-style memory >>>> management, >>>> but I would like to do it in a safe way. For instance, I have >>>> special-cased >>>> evaluation of Scheme primitives, as follows. >>>> >>>> Under the "normal" implementation, a list of things to evaluate is >>>> built up, passed to an evaluation function, and then the GC is left >>>> to sweep up the mess. 
The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>>> just assume that what you put in is going to be dead right after >>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>> it's unclear (in which case the problem is left up to the GC). >>>> >>>> For the vast majority of Scheme primitives, one can indeed free the >>>> list right after the eval. Now of course I am not interested >>>> in unsafe code, so what I do is this: >>>> >>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>> >>>> VAR >>>> mu := NEW(MUTEX); >>>> free : Pair := NIL; >>>> >>>> PROCEDURE GetPair() : Pair = >>>> BEGIN >>>> LOCK mu DO >>>> IF free # NIL THEN >>>> TRY >>>> RETURN free >>>> FINALLY >>>> free := free.rest >>>> END >>>> END >>>> END; >>>> RETURN NEW(Pair) >>>> END GetPair; >>>> >>>> PROCEDURE ReturnPair(cons : Pair) = >>>> BEGIN >>>> cons.first := NIL; >>>> LOCK mu DO >>>> cons.rest := free; >>>> free := cons >>>> END >>>> END ReturnPair; >>>> >>>> my eval code looks like >>>> >>>> VAR okToFree : BOOLEAN; BEGIN >>>> >>>> args := GetPair(); ... >>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>> >>>> IF okToFree THEN ReturnPair(args) END; >>>> RETURN result >>>> END >>>> >>>> and this does work well. In fact it speeds up the Linux >>>> implementation >>>> by almost 100% to recycle the lists like this *just* for the >>>> evaluation of Scheme primitives. >>>> >>>> But it's still ugly, isn't it? There's a mutex, and a global >>>> variable. And yes, the time spent messing with the mutex is >>>> noticeable, and I haven't even made the code multi-threaded yet >>>> (and that is coming!) >>>> >>>> So I'm thinking, what I really want is a structure that is attached >>>> to my current Thread.T. I want to be able to access just a single >>>> pointer (like the free list) but be sure it is unique to my current >>>> thread. 
No locking would be necessary if I could do this. >>>> >>>> Does anyone have an elegant solution that does something like this? >>>> Thread-specific "static" variables? Just one REFANY would be enough >>>> for a lot of uses... seems to me this should be a frequently >>>> occurring problem? >>>> >>>> Best regards, >>>> Mika >>>> >>>> >>>> >>>> >>>> >>>> From mika at async.caltech.edu Fri Oct 17 10:32:28 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Fri, 17 Oct 2008 01:32:28 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST." <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> Message-ID: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> Ok I am sorry I am slow to pick up on this. I take it the problem is actually the Upthread.getspecific routine, which itself calls something get_curthread somewhere inside pthreads, which in turn involves a context switch to the supervisor---the identity of the current thread is just not accessible anywhere in user space. Also explains why this program runs faster with my old PM3, which uses longjmp threads. The only way to avoid it (really) is to pass a pointer to the Thread.T of the currently executing thread in the activation record of *every* procedure, so that allocators can find it when necessary.... but that is very expensive in terms of stack memory. Or I can just make a structure like that that I pass around where I need it in my own program. Thread-specific and user-managed. I believe I have just answered all my own questions, but I hope Tony will correct me if my answers are incorrect. Mika Tony Hosking writes: >I suspect part of the overhead of allocation in the new code is the >need for thread-local allocation buffers, which means we need to >access thread-local state. We really need an efficient way to do >that, but pthreads thread-local accesses may be what is killing you. 
> >On 17 Oct 2008, at 00:30, Mika Nystrom wrote: > >> Hi Tony, >> >> I figured you would chime in! >> >> Yes, @M3noincremental seems to make things consistently a tad bit >> slower (but a very small difference), on both FreeBSD and Linux. >> @M3nogc makes a bigger difference, of course. >> >> Unfortunately I seem to have lost the code that did a lot of memory >> allocations. My tricks (as described in the email---and others!) >> have removed most of the troublesome memory allocations, but now >> I'm stuck with the mutex instead... >> >> Mika >> >> Tony Hosking writes: >>> Have you tried running @M3noincremental? >>> >>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>> >>>> Hello Modula-3 people, >>>> >>>> As I mentioned in an earlier email about printing structures (thanks >>>> Darko), I'm in the midst of coding an interpreter embedded in >>>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>>> Norvig's >>>> JScheme for Java (well it was at first strongly based, but more and >>>> more loosely, if you know what I mean...) >>>> >>>> I expected that the performance of the interpreter would be much >>>> better in Modula-3 than in Java, and I have been testing on two >>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>> and the other is CM3 on a recent Debian system. What I am finding >>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>> (getting >>>> close to ten times as fast on some tasks at this point), but on >>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>> >>>> When I started, with code that was essentially equivalent to >>>> JScheme, >>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>> spend most of its time in (surprise, surprise!) memory allocation >>>> and garbage collection. 
The speedup I have achieved between the >>>> first implementation and now was due to the use of Modula-3 >>>> constructs >>>> that are superior to Java's, such as the use of arrays of RECORDs >>>> to make small stacks rather than linked lists. (I get readable >>>> code with much fewer memory allocations and GC work.) >>>> >>>> Now, since this is an interpreter, I as the implementer have limited >>>> control over how much memory is allocated and freed, and where it is >>>> needed. However, I can sometimes fall back on C-style memory >>>> management, >>>> but I would like to do it in a safe way. For instance, I have >>>> special-cased >>>> evaluation of Scheme primitives, as follows. >>>> >>>> Under the "normal" implementation, a list of things to evaluate is >>>> built up, passed to an evaluation function, and then the GC is left >>>> to sweep up the mess. The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>>> just assume that what you put in is going to be dead right after >>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>> it's unclear (in which case the problem is left up to the GC). >>>> >>>> For the vast majority of Scheme primitives, one can indeed free the >>>> list right after the eval. 
Now of course I am not interested >>>> in unsafe code, so what I do is this: >>>> >>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>> >>>> VAR >>>> mu := NEW(MUTEX); >>>> free : Pair := NIL; >>>> >>>> PROCEDURE GetPair() : Pair = >>>> BEGIN >>>> LOCK mu DO >>>> IF free # NIL THEN >>>> TRY >>>> RETURN free >>>> FINALLY >>>> free := free.rest >>>> END >>>> END >>>> END; >>>> RETURN NEW(Pair) >>>> END GetPair; >>>> >>>> PROCEDURE ReturnPair(cons : Pair) = >>>> BEGIN >>>> cons.first := NIL; >>>> LOCK mu DO >>>> cons.rest := free; >>>> free := cons >>>> END >>>> END ReturnPair; >>>> >>>> my eval code looks like >>>> >>>> VAR okToFree : BOOLEAN; BEGIN >>>> >>>> args := GetPair(); ... >>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>> >>>> IF okToFree THEN ReturnPair(args) END; >>>> RETURN result >>>> END >>>> >>>> and this does work well. In fact it speeds up the Linux >>>> implementation >>>> by almost 100% to recycle the lists like this *just* for the >>>> evaluation of Scheme primitives. >>>> >>>> But it's still ugly, isn't it? There's a mutex, and a global >>>> variable. And yes, the time spent messing with the mutex is >>>> noticeable, and I haven't even made the code multi-threaded yet >>>> (and that is coming!) >>>> >>>> So I'm thinking, what I really want is a structure that is attached >>>> to my current Thread.T. I want to be able to access just a single >>>> pointer (like the free list) but be sure it is unique to my current >>>> thread. No locking would be necessary if I could do this. >>>> >>>> Does anyone have an elegant solution that does something like this? >>>> Thread-specific "static" variables? Just one REFANY would be enough >>>> for a lot of uses... seems to me this should be a frequently >>>> occurring problem? 
>>>>
>>>> Best regards,
>>>> Mika
>>>>
>>>>

From jay.krell at cornell.edu Sat Oct 18 00:42:35 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 22:42:35 +0000
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
References: Your message of <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
Message-ID:

Right and wrong.

Right Tony was referring to Upthread.getspecific. Or on Windows WinBase.TlsGetValue.
Wrong that this necessarily incurs a switch to the supervisor/kernel, and
perhaps wrong to call that a "context switch". It depends on the operating system.

I will explain.

On Windows/x86, the FS register points to a partly documented per-thread data structure.
C and C++ exception handling use FS:0.
Disassemble any code. You'll find it is used. Not by Modula-3 though.

Disassemble TlsGetValue.

cdb /z %windir%\system32\kernel32.dll

0:000> uf kernel32!TlsGetValue
kernel32!TlsGetValue:

typical looking prolog..
7dd813e0 8bff            mov     edi,edi
7dd813e2 55              push    ebp
7dd813e3 8bec            mov     ebp,esp

fs:18 contains a "normal" "linear" pointer to fs:0
Get that pointer.
7dd813e5 64a118000000    mov     eax,dword ptr fs:[00000018h]

get the index
7dd813eb 8b4d08          mov     ecx,dword ptr [ebp+8]

SetLastError(0)
7dd813ee 83603400        and     dword ptr [eax+34h],0

There are 64 preallocated thread local slots -- compare the index to 64.
7dd813f2 83f940          cmp     ecx,40h

If it is above or equal to 64, go use the non preallocated slots.
7dd813f5 0f8353e20200    jae     kernel32!lstrcmpi+0x4b22 (7ddaf64e)

preallocated slots are at fs:e10; get the data and done
kernel32!TlsGetValue+0x1b:
7dd813fb 8b8488100e0000  mov     eax,dword ptr [eax+ecx*4+0E10h]

epilog
kernel32!TlsGetValue+0x22:
7dd81402 5d              pop     ebp
7dd81403 c20400          ret     4

get here for indices >= 64
compare index to 1088 == 1024 + 64, as there are another 1024 more slowly available slots
kernel32!lstrcmpi+0x4b22:
7ddaf64e 81f940040000    cmp     ecx,440h

if it is below 1088, go use those slots
7ddaf654 7211            jb      kernel32!lstrcmpi+0x4b3b (7ddaf667)

index is above or equal to 1088, SetLastError(invalid parameter)
kernel32!lstrcmpi+0x4b2a:
7ddaf656 680d0000c0      push    0C000000Dh
7ddaf65b e80025fdff      call    kernel32!GetProcessHeap+0x12 (7dd81b60)

and return 0 -- 0 is not unambiguously an error -- that's why last error was cleared at the start
kernel32!lstrcmpi+0x4b34:
7ddaf660 33c0            xor     eax,eax
7ddaf662 e99b1dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)

This is where the slots between 64 and 1088 are used.
Get pointer from FS:F94 and compare to null.
If it is null, that is ok, it means nobody has yet called TlsSetValue for
this value, so it just retains its initial 0 value.
kernel32!lstrcmpi+0x4b3b:
7ddaf667 8b80940f0000    mov     eax,dword ptr [eax+0F94h]
7ddaf66d 85c0            test    eax,eax
7ddaf66f 74ef            je      kernel32!lstrcmpi+0x4b34 (7ddaf660)

Index is between 64 and 1088, and there is a non null pointer at FS:F94.
Subtract 64 from index and index into pointer there.
Note it does the subtraction after the multiplication, so subtracts 64*4=0x100.
kernel32!lstrcmpi+0x4b45:
7ddaf671 8b848800ffffff  mov     eax,dword ptr [eax+ecx*4-100h]
7ddaf678 e9851dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)

So, it is a few instructions but there is no context switch into the kernel/supervisor.

Also, calls into the kernel aren't necessarily a "context switch".
Some context is saved, and a bit is twiddled in the processor to indicate
a privilege level change, but no page tables are altered and I believe no
TLBs (translation lookaside buffers) are invalidated, and no thread
scheduling decisions are made -- though upon exit from the kernel, APCs
(asynchronous procedure calls) can be run -- on the calling thread.

A more expensive context switch is when another thread or another process runs.
Switching threads requires saving more context, and switching processes
requires changing the register that points to the page tables.

One detail there -- calling into the x86 NT kernel does not preserve
floating point state -- that's the additional state that a thread switch
has to save, at least.
NT/x86 kernel drivers aren't allowed to use floating point, with some
exception, like if they are video drivers (only certain functions?) or
they explicitly save/restore the floating point registers using public
functions.
I don't know about the other architectures. I think IA64 only preserves
some floating point state, not all.

Now, the question then is how is Upthread.getspecific implemented on other
architectures and operating systems.
We should look into that for various operating systems.

Oh, also, let's see what __declspec(thread) does.

>type t.c
__declspec(thread) int a;
void F1(int);
void F2()
{
 F1(a);
}

cl -c t.c
link -dump -disasm t.obj

Dump of file t.obj

File Type: COFF OBJECT

_F2:
  00000000: 55                 push        ebp
  00000001: 8B EC              mov         ebp,esp
  00000003: A1 00 00 00 00     mov         eax,dword ptr [__tls_index]
  00000008: 64 8B 0D 00 00 00  mov         ecx,dword ptr fs:[__tls_array]
            00
  0000000F: 8B 14 81           mov         edx,dword ptr [ecx+eax*4]
  00000012: 8B 82 00 00 00 00  mov         eax,dword ptr _a[edx]
  00000018: 50                 push        eax
  00000019: E8 00 00 00 00     call        _F1
  0000001E: 83 C4 04           add         esp,4
  00000021: 5D                 pop         ebp
  00000022: C3                 ret

See the compiler generated code reference FS directly.
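
For comparison, here is a minimal sketch of how the same compiler-supported
thread-local storage could carry the per-thread free list asked about at the
start of this thread, so that no mutex is needed. It is only an illustration
under stated assumptions: it uses the C11 _Thread_local storage class as the
portable counterpart of __declspec(thread), and the names Pair, get_pair and
return_pair are hypothetical C stand-ins for the Modula-3 GetPair/ReturnPair
shown earlier. It is not how the Modula-3 runtime does anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical C analogue of the Pair object from the thread. */
typedef struct Pair {
    void *first;
    struct Pair *rest;
} Pair;

/* One free-list head per thread. C11 _Thread_local is assumed here as the
   portable counterpart of __declspec(thread) in t.c. */
static _Thread_local Pair *free_list = NULL;

/* Take a pair from the calling thread's free list, or allocate a new one.
   No lock is taken because no other thread can see this list head. */
Pair *get_pair(void)
{
    Pair *p = free_list;
    if (p != NULL) {
        free_list = p->rest;
        p->rest = NULL;
        return p;
    }
    return calloc(1, sizeof *p);   /* NULL if allocation fails */
}

/* Hand a pair back to the calling thread's free list. */
void return_pair(Pair *p)
{
    assert(p != NULL);
    p->first = NULL;               /* drop the payload reference */
    p->rest = free_list;
    free_list = p;
}
```

A pair returned on one thread is only ever handed out again on that same
thread, which is what makes the lock unnecessary. The cost is that free
pairs cannot migrate between threads.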
The optimized version is: Dump of file t.obj File Type: COFF OBJECT _F2: 00000000: A1 00 00 00 00 mov eax,dword ptr [__tls_index] 00000005: 64 8B 0D 00 00 00 mov ecx,dword ptr fs:[__tls_array] 00 0000000C: 8B 14 81 mov edx,dword ptr [ecx+eax*4] 0000000F: 8B 82 00 00 00 00 mov eax,dword ptr _a[edx] 00000015: 50 push eax 00000016: E8 00 00 00 00 call _F1 0000001B: 59 pop ecx 0000001C: C3 ret - Jay > To: hosking at cs.purdue.edu > Date: Fri, 17 Oct 2008 01:32:28 -0700 > From: mika at async.caltech.edu > CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu > Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? > > Ok I am sorry I am slow to pick up on this. > > I take it the problem is actually the Upthread.getspecific routine, > which itself calls something get_curthread somewhere inside pthreads, > which in turn involves a context switch to the supervisor---the identity > of the current thread is just not accessible anywhere in user space. > Also explains why this program runs faster with my old PM3, which uses > longjmp threads. > > The only way to avoid it (really) is to pass a pointer to the > Thread.T of the currently executing thread in the activation record > of *every* procedure, so that allocators can find it when necessary.... > but that is very expensive in terms of stack memory. > > Or I can just make a structure like that that I pass around where > I need it in my own program. Thread-specific and user-managed. > > I believe I have just answered all my own questions, but I hope > Tony will correct me if my answers are incorrect. > > Mika > > Tony Hosking writes: >>I suspect part of the overhead of allocation in the new code is the >>need for thread-local allocation buffers, which means we need to >>access thread-local state. We really need an efficient way to do >>that, but pthreads thread-local accesses may be what is killing you. 
>> >>On 17 Oct 2008, at 00:30, Mika Nystrom wrote: >> >>> Hi Tony, >>> >>> I figured you would chime in! >>> >>> Yes, @M3noincremental seems to make things consistently a tad bit >>> slower (but a very small difference), on both FreeBSD and Linux. >>> @M3nogc makes a bigger difference, of course. >>> >>> Unfortunately I seem to have lost the code that did a lot of memory >>> allocations. My tricks (as described in the email---and others!) >>> have removed most of the troublesome memory allocations, but now >>> I'm stuck with the mutex instead... >>> >>> Mika >>> >>> Tony Hosking writes: >>>> Have you tried running @M3noincremental? >>>> >>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>>> >>>>> Hello Modula-3 people, >>>>> >>>>> As I mentioned in an earlier email about printing structures (thanks >>>>> Darko), I'm in the midst of coding an interpreter embedded in >>>>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>>>> Norvig's >>>>> JScheme for Java (well it was at first strongly based, but more and >>>>> more loosely, if you know what I mean...) >>>>> >>>>> I expected that the performance of the interpreter would be much >>>>> better in Modula-3 than in Java, and I have been testing on two >>>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>>> and the other is CM3 on a recent Debian system. What I am finding >>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>>> (getting >>>>> close to ten times as fast on some tasks at this point), but on >>>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>>> >>>>> When I started, with code that was essentially equivalent to >>>>> JScheme, >>>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>>> spend most of its time in (surprise, surprise!) memory allocation >>>>> and garbage collection. 
The speedup I have achieved between the >>>>> first implementation and now was due to the use of Modula-3 >>>>> constructs >>>>> that are superior to Java's, such as the use of arrays of RECORDs >>>>> to make small stacks rather than linked lists. (I get readable >>>>> code with much fewer memory allocations and GC work.) >>>>> >>>>> Now, since this is an interpreter, I as the implementer have limited >>>>> control over how much memory is allocated and freed, and where it is >>>>> needed. However, I can sometimes fall back on C-style memory >>>>> management, >>>>> but I would like to do it in a safe way. For instance, I have >>>>> special-cased >>>>> evaluation of Scheme primitives, as follows. >>>>> >>>>> Under the "normal" implementation, a list of things to evaluate is >>>>> built up, passed to an evaluation function, and then the GC is left >>>>> to sweep up the mess. The problem is that there are various tricky >>>> routes by which references can escape the evaluator, so you can't >>>>> just assume that what you put in is going to be dead right after >>>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>>> it's unclear (in which case the problem is left up to the GC). >>>>> >>>>> For the vast majority of Scheme primitives, one can indeed free the >>>>> list right after the eval. 
Now of course I am not interested
>>>>> in unsafe code, so what I do is this:
>>>>>
>>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>>
>>>>> VAR
>>>>>   mu := NEW(MUTEX);
>>>>>   free : Pair := NIL;
>>>>>
>>>>> PROCEDURE GetPair() : Pair =
>>>>>   BEGIN
>>>>>     LOCK mu DO
>>>>>       IF free # NIL THEN
>>>>>         TRY
>>>>>           RETURN free
>>>>>         FINALLY
>>>>>           free := free.rest
>>>>>         END
>>>>>       END
>>>>>     END;
>>>>>     RETURN NEW(Pair)
>>>>>   END GetPair;
>>>>>
>>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>>>   BEGIN
>>>>>     cons.first := NIL;
>>>>>     LOCK mu DO
>>>>>       cons.rest := free;
>>>>>       free := cons
>>>>>     END
>>>>>   END ReturnPair;
>>>>>
>>>>> my eval code looks like
>>>>>
>>>>> VAR okToFree : BOOLEAN; BEGIN
>>>>>
>>>>>   args := GetPair(); ...
>>>>>   result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>>
>>>>>   IF okToFree THEN ReturnPair(args) END;
>>>>>   RETURN result
>>>>> END
>>>>>
>>>>> and this does work well. In fact it speeds up the Linux implementation
>>>>> by almost 100% to recycle the lists like this *just* for the
>>>>> evaluation of Scheme primitives.
>>>>>
>>>>> But it's still ugly, isn't it? There's a mutex, and a global
>>>>> variable. And yes, the time spent messing with the mutex is
>>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>>> (and that is coming!)
>>>>>
>>>>> So I'm thinking, what I really want is a structure that is attached
>>>>> to my current Thread.T. I want to be able to access just a single
>>>>> pointer (like the free list) but be sure it is unique to my current
>>>>> thread. No locking would be necessary if I could do this.
>>>>>
>>>>> Does anyone have an elegant solution that does something like this?
>>>>> Thread-specific "static" variables? Just one REFANY would be enough
>>>>> for a lot of uses... seems to me this should be a frequently
>>>>> occurring problem?
>>>>>
>>>>> Best regards,
>>>>>   Mika
>>>>>

From mika at async.caltech.edu  Sat Oct 18 01:00:28 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 16:00:28 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 22:42:35 -0000."
Message-ID: <200810172300.m9HN0SfN008554@camembert.async.caltech.edu>

No, I didn't mean that it *necessarily* involves a context switch.
Obviously it doesn't, because the user-level threading doesn't ever
need to do a "kernel" context switch (but of course does its own
switching, however I don't see that it would need that to get or
set a variable). I just meant that looking at the (C) implementation
of pthreads I have (on FreeBSD), on that system, it does seem to,
as the code in question is marked as "kernel code".

In any case I think I have been able to solve my particular problem
by identifying a data structure that is inherently only accessed
from a single thread (in my program) and attaching my memory
recycling trickery to that particular structure. I get very little
memory allocation/GC and no need for locks at all, which is precisely
the effect I was going for.

I am still a little bit concerned about the performance of
CM3-generated code but the main culprit appears to be TYPECASE/ISTYPE
now, far from garbage collectors and thread libraries. I'll send
an update if I can find something egregiously inefficient.

    Mika

Jay writes:
>
>Right and wrong.
>
>Right Tony was referring to Upthread.getspecific. Or on Windows WinBase.TlsGetValue.
>Wrong that this necessarily incurs a switch to the supervisor/kernel, and perhaps
>wrong to call that a "context switch". It depends on the operating system.
>
>I will explain.
>
>On Windows/x86, the FS register points to a partly documented per-thread data structure.
>C and C++ exception handling use FS:0.
>Disassemble any code. You'll find it is used.
Not by Modula-3 though. > >Disassemble TlsGetValue. > > cdb /z %windir%\system32\kernel32.dll > >0:000> uf kernel32!TlsGetValue >kernel32!TlsGetValue: ... From mika at async.caltech.edu Sat Oct 18 10:41:30 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Sat, 18 Oct 2008 01:41:30 -0700 Subject: [M3devel] Fortran Message-ID: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu> Ok now in the realm of crazy questions---and I apologize to those whose inboxes I clog with some of my emails... If there is anyone out there in Modula-3-ether who has ever written or heard of ... an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs ... would he please make himself known to me? (I have a Scheme interpreter to trade...) Mika From lemming at henning-thielemann.de Sat Oct 18 17:34:50 2008 From: lemming at henning-thielemann.de (Henning Thielemann) Date: Sat, 18 Oct 2008 17:34:50 +0200 (MEST) Subject: [M3devel] Fortran In-Reply-To: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu> References: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu> Message-ID: On Sat, 18 Oct 2008, Mika Nystrom wrote: > Ok now in the realm of crazy questions---and I apologize to those > whose inboxes I clog with some of my emails... > > If there is anyone out there in Modula-3-ether who has ever written > or heard of ... > > an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs > > ... would he please make himself known to me? (I have a Scheme > interpreter to trade...) I have written a program for generating Modula-3 interfaces for LAPACK (linear algebra routines) using m3coco. But I'm afraid that my Fortran parser works only for LAPACK and no other library. I have just copied the CVS files to http://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/m3/pm3/language/parsing/m3coco/test/?cvsroot=PM3 Before you check this out, I might move it to a different location, maybe cm3/m3-tools, if this is more appropriate. 
(Maybe you also need the revised m3coco version, which I only have on a
branch, and never tried to merge it back to HEAD.)

While searching my own code in the net, I found some nice interviews with
Luca Cardelli:
  http://www.wikio.com/technology/development/modula-3

From mika at async.caltech.edu  Tue Oct 21 13:05:01 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 21 Oct 2008 04:05:01 -0700
Subject: [M3devel] CM3 on Mac OS X Tiger
Message-ID: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>

Hello everyone,

Sorry if I have asked this before---I feel I must have, and Tony
probably answered it, too, but I can't find it anywhere in my email
archives.

It looks like I finally upgraded my Mac to Tiger a half year ago,
and everything broke. (Modula-3, emacs, make, etc etc etc etc.)
I am finally getting around to fixing it. Now I am trying to
compile CM3 in accordance with Tony's instructions as of June 24,
2007:

(short quote here)
> cd ~/cm3-cvs
> mkdir boot
> cd boot
> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz
> ./cminstall

Now you will have some kind of cm3 installed, presumably in
/usr/local/cm3/bin/cm3.

Make sure you have a fresh CVS checkout in directory cm3 (let's
assume this is in your home directory ~/cm3). Also, make sure you
have an up-to-date version of the CM3 backend compiler cm3cg
installed by executing the following:

STEP 0:

export CM3=/usr/local/cm3/bin/cm3
cd ~/cm3/m3-sys/m3cc
$CM3
$CM3 -ship

You can skip this last step if you know your backend compiler is up
to date.

Now, let's build the new compiler from scratch (this is the sequence
I use regularly to test changes to the run-time system whenever I
make them):

STEP 1:

cd ~/cm3/m3-libs/m3core
$CM3
$CM3 -ship
(end short quote, there's much more)

What happens is that when building m3core, my compiler is building
it against the interfaces in /usr/local/cm3, NOT the interfaces
within m3core itself:

--- building in PPC_DARWIN ---

ignoring ../src/m3overrides

new source -> compiling RTCollector.m3
"../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX)
"../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX)
"../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD)
"../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD)
4 errors encountered
stale imports -> compiling RTDebug.m3

Fatal Error: bad version stamps: RTDebug.m3

version stamp mismatch: Compiler.Platform
   => RTDebug.m3
   => Compiler.i3
version stamp mismatch: Compiler.ThisPlatform
  <8b5a6f513e082750> => RTDebug.m3
  <8e110d4fed998051> => Compiler.i3

I feel like I should REALLY know the answer to this, but how do I
get the compiler to use only the local sources and not attempt
to compile things with reference to the already-installed interfaces?

    Mika

From hosking at cs.purdue.edu  Tue Oct 21 13:21:36 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 12:21:36 +0100
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>
References: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>
Message-ID: <27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu>

This is a phase ordering problem that arises when you use an old
compiler to compile newer sources. It really should be fixed somehow.
In any case, the problem is those lines in RTCollector at the bottom (I deleted them yesterday on the main trunk) that refer to values supposedly built in to the compiler (which are not there for the old binary you are using). I think if you delete those lines then you should be OK. Once you have a new compiler bootstrapped (with those configuration values available built in) then you should be able to compile that code (excepting that I just deleted those lines yesterday). On 21 Oct 2008, at 12:05, Mika Nystrom wrote: > Hello everyone, > > Sorry if I have asked this before---I feel I must have, and Tony > probably answered it, too, but I can't find it anywhere in my email > archives. > > It looks like I finally upgraded my Mac to Tiger a half year ago, > and everything broke. (Modula-3, emacs, make, etc etc etc etc.) > I am finally getting around to fixing it. Now I am trying to > compile CM3 in accordance with Tony's instructions as of June 24, > 2007: > > (short quote here) >> cd ~/cm3-cvs >> mkdir boot >> cd boot >> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz >> ./cminstall > > Now you will have some kind of cm3 installed, presumably in /usr/ > local/cm3/bin/cm3. > > Make sure you have a fresh CVS checkout in directory cm3 (let's > assume this is in your home directory ~/cm3). Also, make sure you > have an up-to-date version of the CM3 backend compiler cm3cg > installed by executing the following: > > STEP 0: > > export CM3=/usr/local/cm3/bin/cm3 > cd ~/cm3/m3-sys/m3cc > $CM3 > $CM3 -ship > > You can skip this last step if you know your backend compiler is up > to date. 
> > Now, let's build the new compiler from scratch (this is the sequence > I use regularly to test changes to the run-time system whenever I > make them): > > STEP 1: > > cd ~/cm3/m3-libs/m3core > $CM3 > $CM3 -ship > (end short quote, there's much more) > > What happens is that when building m3core, my compiler is building > it against the interfaces in /usr/local/cm3, NOT the interfaces > within m3core itself: > > --- building in PPC_DARWIN --- > > ignoring ../src/m3overrides > > new source -> compiling RTCollector.m3 > "../src/runtime/common/RTCollector.m3", line 2914: unknown > qualification '.' (AMD64_LINUX) > "../src/runtime/common/RTCollector.m3", line 2915: unknown > qualification '.' (SPARC32_LINUX) > "../src/runtime/common/RTCollector.m3", line 2916: unknown > qualification '.' (SPARC64_OPENBSD) > "../src/runtime/common/RTCollector.m3", line 2917: unknown > qualification '.' (PPC32_OPENBSD) > 4 errors encountered > stale imports -> compiling RTDebug.m3 > > Fatal Error: bad version stamps: RTDebug.m3 > > version stamp mismatch: Compiler.Platform > => RTDebug.m3 > => Compiler.i3 > version stamp mismatch: Compiler.ThisPlatform > <8b5a6f513e082750> => RTDebug.m3 > <8e110d4fed998051> => Compiler.i3 > > I feel like I should REALLY know the answer to this, but how do I > get the compiler to use only the local sources and not attempt > to compile things with reference to the already-installed > interfaces? > > Mika From hosking at cs.purdue.edu Tue Oct 21 16:54:58 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Tue, 21 Oct 2008 15:54:58 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> Message-ID: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu> I have one more question that I forgot to ask before. Did you evaluate performance with -O3 optimization in the backend? 

Generally, I have the following in my m3_backend specs so that
turning on optimization results in -O3 (and lots of lovely inlining):

proc m3_backend (source, object, optimize, debug) is
  local args =
  [
    "-m32",
    "-quiet",
    source,
    "-o",
    object,
    % fPIC really is needed here, despite man gcc saying it is the default.
    % This is because man gcc is about Apple's gcc but m3cg is
    % built from FSF source.
    "-fPIC",
    "-fno-reorder-blocks"
  ]
  if optimize  args += "-O3"  end
  if debug  args += "-gstabs"  end
  if M3_PROFILING  args += "-p"  end
  return try_exec (m3back, args)
end


On 17 Oct 2008, at 09:32, Mika Nystrom wrote:

> Ok I am sorry I am slow to pick up on this.
>
> I take it the problem is actually the Upthread.getspecific routine,
> which itself calls something get_curthread somewhere inside pthreads,
> which in turn involves a context switch to the supervisor---the identity
> of the current thread is just not accessible anywhere in user space.
> Also explains why this program runs faster with my old PM3, which uses
> longjmp threads.
>
> The only way to avoid it (really) is to pass a pointer to the
> Thread.T of the currently executing thread in the activation record
> of *every* procedure, so that allocators can find it when necessary....
> but that is very expensive in terms of stack memory.
>
> Or I can just make a structure like that that I pass around where
> I need it in my own program. Thread-specific and user-managed.
>
> I believe I have just answered all my own questions, but I hope
> Tony will correct me if my answers are incorrect.
>
>     Mika
>
> Tony Hosking writes:
>> I suspect part of the overhead of allocation in the new code is the
>> need for thread-local allocation buffers, which means we need to
>> access thread-local state. We really need an efficient way to do
>> that, but pthreads thread-local accesses may be what is killing you.
>>
>> On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>>
>>> Hi Tony,
>>>
>>> I figured you would chime in!
>>> >>> Yes, @M3noincremental seems to make things consistently a tad bit >>> slower (but a very small difference), on both FreeBSD and Linux. >>> @M3nogc makes a bigger difference, of course. >>> >>> Unfortunately I seem to have lost the code that did a lot of memory >>> allocations. My tricks (as described in the email---and others!) >>> have removed most of the troublesome memory allocations, but now >>> I'm stuck with the mutex instead... >>> >>> Mika >>> >>> Tony Hosking writes: >>>> Have you tried running @M3noincremental? >>>> >>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>>> >>>>> Hello Modula-3 people, >>>>> >>>>> As I mentioned in an earlier email about printing structures >>>>> (thanks >>>>> Darko), I'm in the midst of coding an interpreter embedded in >>>>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>>>> Norvig's >>>>> JScheme for Java (well it was at first strongly based, but more >>>>> and >>>>> more loosely, if you know what I mean...) >>>>> >>>>> I expected that the performance of the interpreter would be much >>>>> better in Modula-3 than in Java, and I have been testing on two >>>>> different systems. One is my ancient FreeBSD-4.11 with an old >>>>> PM3, >>>>> and the other is CM3 on a recent Debian system. What I am finding >>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>>> (getting >>>>> close to ten times as fast on some tasks at this point), but on >>>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>>> >>>>> When I started, with code that was essentially equivalent to >>>>> JScheme, >>>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>>> spend most of its time in (surprise, surprise!) memory allocation >>>>> and garbage collection. 
The speedup I have achieved between the >>>>> first implementation and now was due to the use of Modula-3 >>>>> constructs >>>>> that are superior to Java's, such as the use of arrays of RECORDs >>>>> to make small stacks rather than linked lists. (I get readable >>>>> code with much fewer memory allocations and GC work.) >>>>> >>>>> Now, since this is an interpreter, I as the implementer have >>>>> limited >>>>> control over how much memory is allocated and freed, and where >>>>> it is >>>>> needed. However, I can sometimes fall back on C-style memory >>>>> management, >>>>> but I would like to do it in a safe way. For instance, I have >>>>> special-cased >>>>> evaluation of Scheme primitives, as follows. >>>>> >>>>> Under the "normal" implementation, a list of things to evaluate is >>>>> built up, passed to an evaluation function, and then the GC is >>>>> left >>>>> to sweep up the mess. The problem is that there are various >>>>> tricky >>>> routes by which references can escape the evaluator, so you can't >>>>> just assume that what you put in is going to be dead right after >>>>> an eval and free it. Instead, I set a flag in the evaluator, >>>>> which >>>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>>> it's unclear (in which case the problem is left up to the GC). >>>>> >>>>> For the vast majority of Scheme primitives, one can indeed free >>>>> the >>>>> list right after the eval. 
Now of course I am not interested >>>>> in unsafe code, so what I do is this: >>>>> >>>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>>> >>>>> VAR >>>>> mu := NEW(MUTEX); >>>>> free : Pair := NIL; >>>>> >>>>> PROCEDURE GetPair() : Pair = >>>>> BEGIN >>>>> LOCK mu DO >>>>> IF free # NIL THEN >>>>> TRY >>>>> RETURN free >>>>> FINALLY >>>>> free := free.rest >>>>> END >>>>> END >>>>> END; >>>>> RETURN NEW(Pair) >>>>> END GetPair; >>>>> >>>>> PROCEDURE ReturnPair(cons : Pair) = >>>>> BEGIN >>>>> cons.first := NIL; >>>>> LOCK mu DO >>>>> cons.rest := free; >>>>> free := cons >>>>> END >>>>> END ReturnPair; >>>>> >>>>> my eval code looks like >>>>> >>>>> VAR okToFree : BOOLEAN; BEGIN >>>>> >>>>> args := GetPair(); ... >>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>>> >>>>> IF okToFree THEN ReturnPair(args) END; >>>>> RETURN result >>>>> END >>>>> >>>>> and this does work well. In fact it speeds up the Linux >>>>> implementation >>>>> by almost 100% to recycle the lists like this *just* for the >>>>> evaluation of Scheme primitives. >>>>> >>>>> But it's still ugly, isn't it? There's a mutex, and a global >>>>> variable. And yes, the time spent messing with the mutex is >>>>> noticeable, and I haven't even made the code multi-threaded yet >>>>> (and that is coming!) >>>>> >>>>> So I'm thinking, what I really want is a structure that is >>>>> attached >>>>> to my current Thread.T. I want to be able to access just a single >>>>> pointer (like the free list) but be sure it is unique to my >>>>> current >>>>> thread. No locking would be necessary if I could do this. >>>>> >>>>> Does anyone have an elegant solution that does something like >>>>> this? >>>>> Thread-specific "static" variables? Just one REFANY would be >>>>> enough >>>>> for a lot of uses... seems to me this should be a frequently >>>>> occurring problem? 
>>>>>
>>>>> Best regards,
>>>>>   Mika
>>>>>

From hosking at cs.purdue.edu  Tue Oct 21 17:17:24 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 16:17:24 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>
References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>
Message-ID: <1396C14A-B23D-4D19-804B-B1627B44106F@cs.purdue.edu>

Also, turn off assertions.

On 21 Oct 2008, at 15:54, Tony Hosking wrote:

> I have one more question that I forgot to ask before. Did you
> evaluate performance with -O3 optimization in the backend?
>
> Generally, I have the following in my m3_backend specs so that
> turning on optimization results in -O3 (and lots of lovely inlining):
>
> proc m3_backend (source, object, optimize, debug) is
>   local args =
>   [
>     "-m32",
>     "-quiet",
>     source,
>     "-o",
>     object,
>     % fPIC really is needed here, despite man gcc saying it is the default.
>     % This is because man gcc is about Apple's gcc but m3cg is
>     % built from FSF source.
>     "-fPIC",
>     "-fno-reorder-blocks"
>   ]
>   if optimize  args += "-O3"  end
>   if debug  args += "-gstabs"  end
>   if M3_PROFILING  args += "-p"  end
>   return try_exec (m3back, args)
> end
>
>
> On 17 Oct 2008, at 09:32, Mika Nystrom wrote:
>
>> Ok I am sorry I am slow to pick up on this.
>>
>> I take it the problem is actually the Upthread.getspecific routine,
>> which itself calls something get_curthread somewhere inside pthreads,
>> which in turn involves a context switch to the supervisor---the identity
>> of the current thread is just not accessible anywhere in user space.
>> Also explains why this program runs faster with my old PM3, which uses
>> longjmp threads.
>> >> The only way to avoid it (really) is to pass a pointer to the >> Thread.T of the currently executing thread in the activation record >> of *every* procedure, so that allocators can find it when >> necessary.... >> but that is very expensive in terms of stack memory. >> >> Or I can just make a structure like that that I pass around where >> I need it in my own program. Thread-specific and user-managed. >> >> I believe I have just answered all my own questions, but I hope >> Tony will correct me if my answers are incorrect. >> >> Mika >> >> Tony Hosking writes: >>> I suspect part of the overhead of allocation in the new code is the >>> need for thread-local allocation buffers, which means we need to >>> access thread-local state. We really need an efficient way to do >>> that, but pthreads thread-local accesses may be what is killing you. >>> >>> On 17 Oct 2008, at 00:30, Mika Nystrom wrote: >>> >>>> Hi Tony, >>>> >>>> I figured you would chime in! >>>> >>>> Yes, @M3noincremental seems to make things consistently a tad bit >>>> slower (but a very small difference), on both FreeBSD and Linux. >>>> @M3nogc makes a bigger difference, of course. >>>> >>>> Unfortunately I seem to have lost the code that did a lot of memory >>>> allocations. My tricks (as described in the email---and others!) >>>> have removed most of the troublesome memory allocations, but now >>>> I'm stuck with the mutex instead... >>>> >>>> Mika >>>> >>>> Tony Hosking writes: >>>>> Have you tried running @M3noincremental? >>>>> >>>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>>>> >>>>>> Hello Modula-3 people, >>>>>> >>>>>> As I mentioned in an earlier email about printing structures >>>>>> (thanks >>>>>> Darko), I'm in the midst of coding an interpreter embedded in >>>>>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>>>>> Norvig's >>>>>> JScheme for Java (well it was at first strongly based, but more >>>>>> and >>>>>> more loosely, if you know what I mean...) 
>>>>>> >>>>>> I expected that the performance of the interpreter would be much >>>>>> better in Modula-3 than in Java, and I have been testing on two >>>>>> different systems. One is my ancient FreeBSD-4.11 with an old >>>>>> PM3, >>>>>> and the other is CM3 on a recent Debian system. What I am >>>>>> finding >>>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>>>> (getting >>>>>> close to ten times as fast on some tasks at this point), but on >>>>>> Linux/CM3 it is much closer in speed to JScheme than I would >>>>>> like. >>>>>> >>>>>> When I started, with code that was essentially equivalent to >>>>>> JScheme, >>>>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>>>> spend most of its time in (surprise, surprise!) memory allocation >>>>>> and garbage collection. The speedup I have achieved between the >>>>>> first implementation and now was due to the use of Modula-3 >>>>>> constructs >>>>>> that are superior to Java's, such as the use of arrays of RECORDs >>>>>> to make small stacks rather than linked lists. (I get readable >>>>>> code with much fewer memory allocations and GC work.) >>>>>> >>>>>> Now, since this is an interpreter, I as the implementer have >>>>>> limited >>>>>> control over how much memory is allocated and freed, and where >>>>>> it is >>>>>> needed. However, I can sometimes fall back on C-style memory >>>>>> management, >>>>>> but I would like to do it in a safe way. For instance, I have >>>>>> special-cased >>>>>> evaluation of Scheme primitives, as follows. >>>>>> >>>>>> Under the "normal" implementation, a list of things to evaluate >>>>>> is >>>>>> built up, passed to an evaluation function, and then the GC is >>>>>> left >>>>>> to sweep up the mess. 
The problem is that there are various >>>>>> tricky >>>>> routes by which references can escape the evaluator, so you can't >>>>>> just assume that what you put in is going to be dead right after >>>>>> an eval and free it. Instead, I set a flag in the evaluator, >>>>>> which >>>>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>>>> it's unclear (in which case the problem is left up to the GC). >>>>>> >>>>>> For the vast majority of Scheme primitives, one can indeed free >>>>>> the >>>>>> list right after the eval. Now of course I am not interested >>>>>> in unsafe code, so what I do is this: >>>>>> >>>>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>>>> >>>>>> VAR >>>>>> mu := NEW(MUTEX); >>>>>> free : Pair := NIL; >>>>>> >>>>>> PROCEDURE GetPair() : Pair = >>>>>> BEGIN >>>>>> LOCK mu DO >>>>>> IF free # NIL THEN >>>>>> TRY >>>>>> RETURN free >>>>>> FINALLY >>>>>> free := free.rest >>>>>> END >>>>>> END >>>>>> END; >>>>>> RETURN NEW(Pair) >>>>>> END GetPair; >>>>>> >>>>>> PROCEDURE ReturnPair(cons : Pair) = >>>>>> BEGIN >>>>>> cons.first := NIL; >>>>>> LOCK mu DO >>>>>> cons.rest := free; >>>>>> free := cons >>>>>> END >>>>>> END ReturnPair; >>>>>> >>>>>> my eval code looks like >>>>>> >>>>>> VAR okToFree : BOOLEAN; BEGIN >>>>>> >>>>>> args := GetPair(); ... >>>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>>>> >>>>>> IF okToFree THEN ReturnPair(args) END; >>>>>> RETURN result >>>>>> END >>>>>> >>>>>> and this does work well. In fact it speeds up the Linux >>>>>> implementation >>>>>> by almost 100% to recycle the lists like this *just* for the >>>>>> evaluation of Scheme primitives. >>>>>> >>>>>> But it's still ugly, isn't it? There's a mutex, and a global >>>>>> variable. And yes, the time spent messing with the mutex is >>>>>> noticeable, and I haven't even made the code multi-threaded yet >>>>>> (and that is coming!) 
>>>>>> >>>>>> So I'm thinking, what I really want is a structure that is >>>>>> attached >>>>>> to my current Thread.T. I want to be able to access just a >>>>>> single >>>>>> pointer (like the free list) but be sure it is unique to my >>>>>> current >>>>>> thread. No locking would be necessary if I could do this. >>>>>> >>>>>> Does anyone have an elegant solution that does something like >>>>>> this? >>>>>> Thread-specific "static" variables? Just one REFANY would be >>>>>> enough >>>>>> for a lot of uses... seems to me this should be a frequently >>>>>> occurring problem? >>>>>> >>>>>> Best regards, >>>>>> Mika >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> > From mika at async.caltech.edu Tue Oct 21 22:18:07 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Tue, 21 Oct 2008 13:18:07 -0700 Subject: [M3devel] CM3 on Mac OS X Tiger In-Reply-To: Your message of "Tue, 21 Oct 2008 12:21:36 BST." <27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu> Message-ID: <200810212018.m9LKI81o019865@camembert.async.caltech.edu> Hi Tony, Thanks for helping, as usual! I ran into this now, is this also a bootstrapping problem? (Moving on to building libm3, cleared out existing PPC_DARWIN, have rebuilt m3cc... only see a single version of Compiler.i3 anywhere...) 

Here's the log:

[lapdog:~/cm3/m3-libs/libm3] mika% $CM3 && $CM3 -ship
--- building in PPC_DARWIN ---

ignoring ../src/m3overrides

new source -> compiling Atom.i3
new source -> compiling AtomList.i3
new source -> compiling OSError.i3
new source -> compiling File.i3
new source -> compiling RegularFile.i3
new source -> compiling Pipe.i3
new source -> compiling TextSeq.i3
new source -> compiling Pathname.i3
new source -> compiling FS.i3
new source -> compiling Process.i3
new source -> compiling Socket.i3
new source -> compiling Terminal.i3
new source -> compiling FS.m3
new source -> compiling Terminal.m3
new source -> compiling RegularFile.m3
new source -> compiling Pipe.m3
new source -> compiling Socket.m3
new source -> compiling OSConfig.i3
new source -> compiling OSErrorPosix.i3
new source -> compiling Fmt.i3
new source -> compiling OSErrorPosix.m3
new source -> compiling FilePosix.i3
new source -> compiling FilePosix.m3
new source -> compiling FSPosix.m3
new source -> compiling PipePosix.m3
new source -> compiling PathnamePosix.m3
new source -> compiling SocketPosix.m3

Fatal Error: bad version stamps: SocketPosix.m3

version stamp mismatch: Compiler.Platform
   => SocketPosix.m3
   => Compiler.i3
version stamp mismatch: Compiler.ThisPlatform
  <8b5a6f513e082750> => SocketPosix.m3
  <8e110d4fed998051> => Compiler.i3
[lapdog:~/cm3/m3-libs/libm3] mika%

Tony Hosking writes:
>This is a phase ordering problem that arises when you use an old
>compiler to compile newer sources. It really should be fixed
>somehow. In any case, the problem is those lines in RTCollector at
>the bottom (I deleted them yesterday on the main trunk) that refer to
>values supposedly built in to the compiler (which are not there for
>the old binary you are using). I think if you delete those lines then
>you should be OK.
Once you have a new compiler bootstrapped (with >those configuration values available built in) then you should be able >to compile that code (excepting that I just deleted those lines >yesterday). > > >On 21 Oct 2008, at 12:05, Mika Nystrom wrote: > >> Hello everyone, >> >> Sorry if I have asked this before---I feel I must have, and Tony >> probably answered it, too, but I can't find it anywhere in my email >> archives. >> >> It looks like I finally upgraded my Mac to Tiger a half year ago, >> and everything broke. (Modula-3, emacs, make, etc etc etc etc.) >> I am finally getting around to fixing it. Now I am trying to >> compile CM3 in accordance with Tony's instructions as of June 24, >> 2007: >> >> (short quote here) >>> cd ~/cm3-cvs >>> mkdir boot >>> cd boot >>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz >>> ./cminstall >> >> Now you will have some kind of cm3 installed, presumably in /usr/ >> local/cm3/bin/cm3. >> >> Make sure you have a fresh CVS checkout in directory cm3 (let's >> assume this is in your home directory ~/cm3). Also, make sure you >> have an up-to-date version of the CM3 backend compiler cm3cg >> installed by executing the following: >> >> STEP 0: >> >> export CM3=/usr/local/cm3/bin/cm3 >> cd ~/cm3/m3-sys/m3cc >> $CM3 >> $CM3 -ship >> >> You can skip this last step if you know your backend compiler is up >> to date. 
>> >> Now, let's build the new compiler from scratch (this is the sequence >> I use regularly to test changes to the run-time system whenever I >> make them): >> >> STEP 1: >> >> cd ~/cm3/m3-libs/m3core >> $CM3 >> $CM3 -ship >> (end short quote, there's much more) >> >> What happens is that when building m3core, my compiler is building >> it against the interfaces in /usr/local/cm3, NOT the interfaces >> within m3core itself: >> >> --- building in PPC_DARWIN --- >> >> ignoring ../src/m3overrides >> >> new source -> compiling RTCollector.m3 >> "../src/runtime/common/RTCollector.m3", line 2914: unknown >> qualification '.' (AMD64_LINUX) >> "../src/runtime/common/RTCollector.m3", line 2915: unknown >> qualification '.' (SPARC32_LINUX) >> "../src/runtime/common/RTCollector.m3", line 2916: unknown >> qualification '.' (SPARC64_OPENBSD) >> "../src/runtime/common/RTCollector.m3", line 2917: unknown >> qualification '.' (PPC32_OPENBSD) >> 4 errors encountered >> stale imports -> compiling RTDebug.m3 >> >> Fatal Error: bad version stamps: RTDebug.m3 >> >> version stamp mismatch: Compiler.Platform >> => RTDebug.m3 >> => Compiler.i3 >> version stamp mismatch: Compiler.ThisPlatform >> <8b5a6f513e082750> => RTDebug.m3 >> <8e110d4fed998051> => Compiler.i3 >> >> I feel like I should REALLY know the answer to this, but how do I >> get the compiler to use only the local sources and not attempt >> to compile things with reference to the already-installed >> interfaces? >> >> Mika From hosking at cs.purdue.edu Tue Oct 21 23:29:07 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Tue, 21 Oct 2008 22:29:07 +0100 Subject: [M3devel] CM3 on Mac OS X Tiger In-Reply-To: <200810212018.m9LKI81o019865@camembert.async.caltech.edu> References: <200810212018.m9LKI81o019865@camembert.async.caltech.edu> Message-ID: Hmm. Not sure. Looks like it. On 21 Oct 2008, at 21:18, Mika Nystrom wrote: > Hi Tony, > > Thanks for helping, as usual! 
> > I ran into this now, is this also a bootstrapping problem? (Moving > on to building libm3, cleared out existing PPC_DARWIN, have rebuilt > m3cc... only see a single version of Compiler.i3 anywhere...) > > Here's the log: > > [lapdog:~/cm3/m3-libs/libm3] mika% $CM3 && $CM3 -ship > --- building in PPC_DARWIN --- > > ignoring ../src/m3overrides > > new source -> compiling Atom.i3 > new source -> compiling AtomList.i3 > new source -> compiling OSError.i3 > new source -> compiling File.i3 > new source -> compiling RegularFile.i3 > new source -> compiling Pipe.i3 > new source -> compiling TextSeq.i3 > new source -> compiling Pathname.i3 > new source -> compiling FS.i3 > new source -> compiling Process.i3 > new source -> compiling Socket.i3 > new source -> compiling Terminal.i3 > new source -> compiling FS.m3 > new source -> compiling Terminal.m3 > new source -> compiling RegularFile.m3 > new source -> compiling Pipe.m3 > new source -> compiling Socket.m3 > new source -> compiling OSConfig.i3 > new source -> compiling OSErrorPosix.i3 > new source -> compiling Fmt.i3 > new source -> compiling OSErrorPosix.m3 > new source -> compiling FilePosix.i3 > new source -> compiling FilePosix.m3 > new source -> compiling FSPosix.m3 > new source -> compiling PipePosix.m3 > new source -> compiling PathnamePosix.m3 > new source -> compiling SocketPosix.m3 > > Fatal Error: bad version stamps: SocketPosix.m3 > > version stamp mismatch: Compiler.Platform > => SocketPosix.m3 > => Compiler.i3 > version stamp mismatch: Compiler.ThisPlatform > <8b5a6f513e082750> => SocketPosix.m3 > <8e110d4fed998051> => Compiler.i3 > [lapdog:~/cm3/m3-libs/libm3] mika% > > Tony Hosking writes: >> This is a phase ordering problem that arises when you use an old >> compiler to compile newer sources. It really should be fixed >> somehow. 
In any case, the problem is those lines in RTCollector at >> the bottom (I deleted them yesterday on the main trunk) that refer to >> values supposedly built in to the compiler (which are not there for >> the old binary you are using). I think if you delete those lines >> then >> you should be OK. Once you have a new compiler bootstrapped (with >> those configuration values available built in) then you should be >> able >> to compile that code (excepting that I just deleted those lines >> yesterday). >> >> >> On 21 Oct 2008, at 12:05, Mika Nystrom wrote: >> >>> Hello everyone, >>> >>> Sorry if I have asked this before---I feel I must have, and Tony >>> probably answered it, too, but I can't find it anywhere in my email >>> archives. >>> >>> It looks like I finally upgraded my Mac to Tiger a half year ago, >>> and everything broke. (Modula-3, emacs, make, etc etc etc etc.) >>> I am finally getting around to fixing it. Now I am trying to >>> compile CM3 in accordance with Tony's instructions as of June 24, >>> 2007: >>> >>> (short quote here) >>>> cd ~/cm3-cvs >>>> mkdir boot >>>> cd boot >>>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz >>>> ./cminstall >>> >>> Now you will have some kind of cm3 installed, presumably in /usr/ >>> local/cm3/bin/cm3. >>> >>> Make sure you have a fresh CVS checkout in directory cm3 (let's >>> assume this is in your home directory ~/cm3). Also, make sure you >>> have an up-to-date version of the CM3 backend compiler cm3cg >>> installed by executing the following: >>> >>> STEP 0: >>> >>> export CM3=/usr/local/cm3/bin/cm3 >>> cd ~/cm3/m3-sys/m3cc >>> $CM3 >>> $CM3 -ship >>> >>> You can skip this last step if you know your backend compiler is up >>> to date. 
>>> >>> Now, let's build the new compiler from scratch (this is the sequence >>> I use regularly to test changes to the run-time system whenever I >>> make them): >>> >>> STEP 1: >>> >>> cd ~/cm3/m3-libs/m3core >>> $CM3 >>> $CM3 -ship >>> (end short quote, there's much more) >>> >>> What happens is that when building m3core, my compiler is building >>> it against the interfaces in /usr/local/cm3, NOT the interfaces >>> within m3core itself: >>> >>> --- building in PPC_DARWIN --- >>> >>> ignoring ../src/m3overrides >>> >>> new source -> compiling RTCollector.m3 >>> "../src/runtime/common/RTCollector.m3", line 2914: unknown >>> qualification '.' (AMD64_LINUX) >>> "../src/runtime/common/RTCollector.m3", line 2915: unknown >>> qualification '.' (SPARC32_LINUX) >>> "../src/runtime/common/RTCollector.m3", line 2916: unknown >>> qualification '.' (SPARC64_OPENBSD) >>> "../src/runtime/common/RTCollector.m3", line 2917: unknown >>> qualification '.' (PPC32_OPENBSD) >>> 4 errors encountered >>> stale imports -> compiling RTDebug.m3 >>> >>> Fatal Error: bad version stamps: RTDebug.m3 >>> >>> version stamp mismatch: Compiler.Platform >>> => RTDebug.m3 >>> => Compiler.i3 >>> version stamp mismatch: Compiler.ThisPlatform >>> <8b5a6f513e082750> => RTDebug.m3 >>> <8e110d4fed998051> => Compiler.i3 >>> >>> I feel like I should REALLY know the answer to this, but how do I >>> get the compiler to use only the local sources and not attempt >>> to compile things with reference to the already-installed >>> interfaces? >>> >>> Mika From mika at async.caltech.edu Thu Oct 23 10:24:53 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 23 Oct 2008 01:24:53 -0700 Subject: [M3devel] NEW in RTType.m3 Message-ID: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu> Hello Modula-3 people, Does anyone know whether there is anything that prevents using NEW in RTType.m3? 
I added a lot of memory recycling to the Scheme interpreter I am
working on, and now it seems it is spending a lot of time in Typecase
and IsSubtype. I was wondering if it is possible to memoize IsSubtype
inside RTType.m3... (specifically just replacing IsSubtype with an
array lookup).

It is the nature of the interpreter that it spends a lot of time
checking types and narrowing things back and forth, as Scheme and
Modula-3 references share the same representation.

    Mika

From hosking at cs.purdue.edu Thu Oct 23 12:10:01 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Thu, 23 Oct 2008 11:10:01 +0100
Subject: [M3devel] NEW in RTType.m3
In-Reply-To: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>
References: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>
Message-ID: <7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu>

Could be dangerous depending on module link orderings. Might be
better to cache your own lookups in your interpreter.

On 23 Oct 2008, at 09:24, Mika Nystrom wrote:

> Hello Modula-3 people,
>
> Does anyone know whether there is anything that prevents using NEW
> in RTType.m3?
>
> I added a lot of memory recycling to the Scheme interpreter I am
> working on, and now it seems it is spending a lot of time in Typecase
> and IsSubtype. I was wondering if it is possible to memoize IsSubtype
> inside RTType.m3... (specifically just replacing IsSubtype with an
> array lookup).
>
> It is the nature of the interpreter that it spends a lot of time
> checking types and narrowing things back and forth, as Scheme and
> Modula-3 references share the same representation.
>
> Mika

From mika at async.caltech.edu Thu Oct 23 19:29:50 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 23 Oct 2008 10:29:50 -0700
Subject: [M3devel] NEW in RTType.m3
In-Reply-To: Your message of "Thu, 23 Oct 2008 11:10:01 BST." <7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu>
Message-ID: <200810231729.m9NHToMC080136@camembert.async.caltech.edu>

Well I'm not calling Typecase and IsSubtype directly---the compiler
is inserting the calls. Here's an example of my code:

    IF x # NIL AND ISTYPE(x,Symbol) THEN
      RETURN env.lookup(x)
    ELSIF x = NIL OR NOT ISTYPE(x,Pair) THEN
      RETURN x
    ELSE

this code actually winds up in here (RTType.m3):

    PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
      VAR t: RT0.TypeDefn;
      BEGIN
        IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
        t := Get (a);
        IF (t = NIL) THEN RETURN FALSE; END;
        IF (t.typecode = b) THEN RETURN TRUE END;
        WHILE (t.kind = ORD (TK.Obj)) DO
          IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
          t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
          IF (t = NIL) THEN RETURN FALSE; END;
          IF (t.typecode = b) THEN RETURN TRUE; END;
        END;
        IF (t.traced # 0)
          THEN RETURN (b = RT0.RefanyTypecode);
          ELSE RETURN (b = RT0.AddressTypecode);
        END;
      END IsSubtype;

Again this is an example of something where the CM3 code seems to
be hurting more than PM3, but it could be that for some reason I
have more visibility into the CM3 code, or that there's an optimization
difference (I haven't been able to investigate this fully yet).

In any case, it's clear that if IsSubtype could be replaced with a
table lookup, this kind of code would be accelerated by potentially
a lot.

Note that while in the above example the code might be accelerated
by (in my opinion, less clear) use of TYPECODE (since I never subtype
Symbol or Pair---for now!), this is not so for some NARROWs. The
NARROWs also wind up calling RTType.IsSubtype, and they arise because
I have types that depend on each other, and unless I want to introduce
extra complexity (new partial revelations) or stick everything in
the same interface, I am forced to NARROW something to avoid a
circular dependency of interfaces...

A method of A.T takes a B.T and a method of B.T takes an A.T, so I
make a supertype X.T s.t. A.T <: X.T; then I can declare B.T.m to
take an X.T and NARROW it to A.T within B.T.m... triggering a call
to the above code. (For simplicity's sake, X.T could be REFANY or
ROOT.) An attempt to declare B.T.m as taking A.T would lead to a
circular dependency between A and B.

The code is really rather simple and it's a shame if you have to
make it look much more complicated to avoid issues like these which
might equally well be solved by tweaking the runtime implementation
a bit.

    Mika

Tony Hosking writes:
>Could be dangerous depending on module link orderings. Might be
>better to cache your own lookups in your interpreter.
>
>On 23 Oct 2008, at 09:24, Mika Nystrom wrote:
>
>> Hello Modula-3 people,
>>
>> Does anyone know whether there is anything that prevents using NEW
>> in RTType.m3?
>>
>> I added a lot of memory recycling to the Scheme interpreter I am
>> working on, and now it seems it is spending a lot of time in Typecase
>> and IsSubtype. I was wondering if it is possible to memoize IsSubtype
>> inside RTType.m3... (specifically just replacing IsSubtype with an
>> array lookup).
>>
>> It is the nature of the interpreter that it spends a lot of time
>> checking types and narrowing things back and forth, as Scheme and
>> Modula-3 references share the same representation.
>>
>> Mika

From mika at async.caltech.edu Sat Oct 25 05:16:56 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 24 Oct 2008 20:16:56 -0700
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
Message-ID: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>

Dear Modula-3 people,

I had a crash in my program from a range error that I believe
shouldn't have happened the way it did, although it's not in my
code, so I'm not sure if there's a reason for the way it's done
(matching a C declaration somewhere, maybe??).

Here it is, from ThreadPosix.m3:

    PROCEDURE IOWait(fd: INTEGER; read: BOOLEAN;
                     timeoutInterval: LONGREAL := -1.0D0): WaitResult =
      <*FATAL Alerted*>
      BEGIN
        self.alertable := FALSE;
        RETURN XIOWait(fd, read, timeoutInterval);
      END IOWait;

    PROCEDURE IOAlertWait(fd: INTEGER; read: BOOLEAN;
                          timeoutInterval: LONGREAL := -1.0D0): WaitResult
      RAISES {Alerted} =
      BEGIN
        self.alertable := TRUE;
        RETURN XIOWait(fd, read, timeoutInterval);
      END IOAlertWait;

    PROCEDURE XIOWait (fd: CARDINAL; read: BOOLEAN; interval: LONGREAL): WaitResult
      RAISES {Alerted} =
      VAR res: INTEGER;
          fdindex := fd DIV FDSetSize;
          fdset := FDSet{fd MOD FDSetSize};
      ... rest omitted ...

Note that IOWait calls XIOWait. IOWait is declared as taking an
INTEGER, but XIOWait takes a CARDINAL.

So I really should use a CARDINAL in passing to IOWait, but since
IOWait is the interface function it's not clear that I should do
that (until my program crashes after passing -1 from some carelessly
wrapped C code). I don't like the fact that I get a range error
*inside* the library when it appears unnecessary---it should have
happened in my code, as I make the call.

Suggested improvement: declare all the FDs in SchedulerPosix.i3
(the interface that exports these routines) to be CARDINAL instead
of INTEGER.

    Mika

From hosking at cs.purdue.edu Mon Oct 27 15:28:52 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Mon, 27 Oct 2008 14:28:52 +0000
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
In-Reply-To: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
References: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
Message-ID: <5232F2E4-3B0E-49E5-B1C8-BB4D04C60C33@cs.purdue.edu>

Sounds fair to me.
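
The effect of the suggested change can be sketched in a small standalone
program. This is only an illustration under stated assumptions: WaitOn and
rawFd are hypothetical names, the procedure body is a stub, and nothing here
is taken from SchedulerPosix.i3 or ThreadPosix.m3. The point is that once the
parameter is declared CARDINAL, the range check belongs to the call site, so
the caller is the natural place to reject a negative descriptor.

```modula3
(* Sketch only: WaitOn and rawFd are hypothetical names, not part of
   SchedulerPosix or ThreadPosix. *)
MODULE Main;
IMPORT IO, Fmt;

(* Declaring fd as CARDINAL means a negative argument is checked when
   the call is made, not somewhere deeper inside the library. *)
PROCEDURE WaitOn (fd: CARDINAL): TEXT =
  BEGIN
    RETURN "waiting on fd " & Fmt.Int(fd);
  END WaitOn;

VAR rawFd: INTEGER := -1;  (* e.g. an error value from wrapped C code *)

BEGIN
  IF rawFd < 0 THEN
    (* Validate in the caller, where the bad value originated. *)
    IO.Put("refusing to wait on invalid fd " & Fmt.Int(rawFd) & "\n");
  ELSE
    IO.Put(WaitOn(rawFd) & "\n");
  END;
END Main.
```

Without the explicit test, passing rawFd straight to WaitOn would be expected
to fail the CARDINAL range check at that call, which is where the original
message argues the error should surface.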
On 25 Oct 2008, at 04:16, Mika Nystrom wrote: > > Dear Modula-3 people, > > I had a crash in my program from a range error that I believe > shouldn't have happened the way it did, although it's not in my > code, so I'm not sure if there's a reason for the way it's done > (matching > a C declaration somewhere, maybe??). > > Here it is, from ThreadPosix.m3: > > PROCEDURE IOWait(fd: INTEGER; read: BOOLEAN; > timeoutInterval: LONGREAL := -1.0D0): WaitResult = > <*FATAL Alerted*> > BEGIN > self.alertable := FALSE; > RETURN XIOWait(fd, read, timeoutInterval); > END IOWait; > > PROCEDURE IOAlertWait(fd: INTEGER; read: BOOLEAN; > timeoutInterval: LONGREAL := -1.0D0): WaitResult > RAISES {Alerted} = > BEGIN > self.alertable := TRUE; > RETURN XIOWait(fd, read, timeoutInterval); > END IOAlertWait; > > PROCEDURE XIOWait (fd: CARDINAL; read: BOOLEAN; interval: LONGREAL): > WaitResult > RAISES {Alerted} = > VAR res: INTEGER; > fdindex := fd DIV FDSetSize; > fdset := FDSet{fd MOD FDSetSize}; > ... rest omitted ... > > Note that IOWait calls XIOWait. IOWait is declared as taking an > INTEGER, but XIOWait takes a CARDINAL. > > So I really should use a CARDINAL in passing to IOWait, but since > IOWait is the interface function it's not clear that I should do > that (until my program crashes after passing -1 from some carelessly > wrapped C code). I don't like the fact that I get a range error > *inside* the library when it appears unnecessary---it should have > happened in my code, as I make the call. > > Suggested improvement: declare all the FDs in SchedulerPosix.i3 > (the interface that exports these routines) to be CARDINAL instead > of INTEGER. 
>
> Mika

From jay.krell at cornell.edu Thu Oct 30 22:21:09 2008
From: jay.krell at cornell.edu (Jay)
Date: Thu, 30 Oct 2008 21:21:09 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID:

Please try this:

http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2

std failed to build because stubgen crashed, probably due to gc.
cm3 does crash right away without @M3nogc.

Something like this:

  cd /src
  wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
  cd /cm3
  rm -rf *
  tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
  cd /src/cm3/scripts/python
  ./do-cm3-all.py realclean
  ./upgrade.py
  ./do-cm3-all.py realclean
  ./do-cm3-std.py buildship

=> It will fail at zeus, but it should get far. You'll also need some
X devel packages to get that far; I had a failure for lack of libXaw,
for example. I did not run anything, including any of the GUI packages,
but building itself with itself is a decent test.

I renamed the old AMD64_LINUX archives to "1.0.0".
http://www.opencm3.com/uploaded-archives/

This has the bug fix I committed last night to cm3cg, and therefore a
64 bit hosted cm3cg.

jay at amd64a:/cm3/bin$ file *
AMD64_LINUX: ASCII text
cm3:         ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
cm3.cfg:     ASCII English text
cm3cg:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
m3bundle:    ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
mklib:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
Unix.common: ASCII English text

Built on Debian 4.0r4 (r5 is out).

jay at amd64a:/cm3/bin$ uname -a
Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux
jay at amd64a:/cm3/bin$ dmesg | head
Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Aug 19 04:30:56 UTC 2008

Though really I couldn't do it without Visual C++ on Windows providing
excellent find-in-files and editing; nothing else comes close. I edit
on Windows and scp the files over. :)

 - Jay

________________________________

From: jay.krell at cornell.edu
To: dragisha at m3w.org; m3devel at elegosoft.com
Date: Tue, 9 Sep 2008 09:43:03 +0000
Subject: Re: [M3devel] AMD64_LINUX status

From hosking at cs.purdue.edu Fri Oct 31 11:19:51 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 31 Oct 2008 10:19:51 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID:

Umm, I think I found your bug with GC:

Check out "RTMachine.PointerAlignment". You have it set to
BITSIZE(INTEGER). I suspect what you want is something like
BYTESIZE(ADDRESS).
Also, "RTMachine.StackFrameAlignment" should probably be 2*BYTESIZE(ADDRESS). On 30 Oct 2008, at 21:21, Jay wrote: > > Please try this: > > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2 > > std failed to build because stubgen crashed, probably due to gc. > cm3 does crash right away without @M3nogc. > > Something like this: > cd /src > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2 > cd /cm3 > rm -rf * > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX- > d5.7.0.tar.bz2 > cd /src/cm3/scripts/python > ./do-cm3-all.py realclean > ./upgrade.py > ./do-cm3-all.py realclean > ./do-cm3-std.py buildship > => it will fail, at zeus, but it should get far; you'll also need > some X devel packages to get that far, I had a failure for lack of > libXaw for example. I did not run anything, any of the GUI packages, > but building itself with itself is a decent test. > > I renamed the old AMD64_LINUX archives to "1.0.0". > http://www.opencm3.com/uploaded-archives/ > > This has the bug fix I commited last night to cm3cg, and therefore a > 64 bit hosted cm3cg. > > jay at amd64a:/cm3/bin$ file * > AMD64_LINUX: ASCII text > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > for GNU/Linux 2.6.0, not stripped > cm3.cfg: ASCII English text > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Li > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > 2.6.0, not stripped > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Li > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > 2.6.0, not stripped > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > for GNU/Linux 2.6.0, not stripped > Unix.common: ASCII English text > > Built on Debian 4.0r4 (r5 is out). 
> jay at amd64a:/cm3/bin$ uname -a
> Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux
> jay at amd64a:/cm3/bin$ dmesg | head
> Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
> Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Aug 19 04:30:56 UTC 2008
>
> Though really I couldn't do it without Visual C++ on Windows
> providing excellent find-in-files and editing, nothing else comes
> close, I edit on Windows and scp the files over. :)
>
> - Jay
>
> ________________________________
>
> From: jay.krell at cornell.edu
> To: dragisha at m3w.org; m3devel at elegosoft.com
> Date: Tue, 9 Sep 2008 09:43:03 +0000
> Subject: Re: [M3devel] AMD64_LINUX status
>

From jay.krell at cornell.edu Fri Oct 31 14:52:43 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 13:52:43 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID:

Tony, Excellent, thanks, that helps.
How do you know and confirm the right values? I don't like guessing.

And then cause then of :) :

Symbol
Pickling font metrics...
Done.
/cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
/cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
stubgen: Processing RemoteView.T

***
*** runtime error:
***    NEW() was unable to allocate more memory.
***    file "../src/runtime/common/RTAllocator.m3", line 285
***

"/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB

--procedure--  -line-  -file---
exec           --
_v_netobj      37      /cm3/pkg/netobj/src/netobj.tmpl
netobjv1       44      /cm3/pkg/netobj/src/netobj.tmpl
netobj         64      /cm3/pkg/netobj/src/netobj.tmpl
include_dir    71      /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
               8       /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args

I should debug it, and double check that I upgraded what had to be upgraded.
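
On the question of confirming values rather than guessing, one low-tech
check is to have the target compiler print the sizes involved. The following
is a minimal sketch, assuming only the standard IO and Fmt interfaces; Show
is a hypothetical helper, and the program does not read RTMachine itself, so
its output still has to be compared by hand against the constants declared
there.

```modula3
(* Sketch only: prints the sizes discussed above so they can be compared
   by hand with the constants in RTMachine. Show is a hypothetical helper. *)
MODULE Main;
IMPORT IO, Fmt;

PROCEDURE Show (name: TEXT; value: INTEGER) =
  BEGIN
    IO.Put(name & " = " & Fmt.Int(value) & "\n");
  END Show;

BEGIN
  (* BITSIZE counts bits, BYTESIZE counts bytes; mixing them up gives
     an alignment that is off by a factor of eight. *)
  Show("BITSIZE(INTEGER)", BITSIZE(INTEGER));
  Show("BYTESIZE(INTEGER)", BYTESIZE(INTEGER));
  Show("BYTESIZE(ADDRESS)", BYTESIZE(ADDRESS));
  Show("2*BYTESIZE(ADDRESS)", 2 * BYTESIZE(ADDRESS));
END Main.
```

On a 64-bit target such as AMD64_LINUX, BITSIZE(INTEGER) is 64 while
BYTESIZE(ADDRESS) is 8, which is the difference between the value that was
set and the one suggested.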
- Jay> From: hosking at cs.purdue.edu> To: jay.krell at cornell.edu> Date: Fri, 31 Oct 2008 10:19:51 +0000> CC: m3devel at elegosoft.com> Subject: Re: [M3devel] AMD64_LINUX status> > Umm, I think I found your bug with GC:> > Check out "RTMachine.PointerAlignment". You have it set to > BITSIZE(INTEGER). I suspect what you want is something like > BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should > probably be 2*BYTESIZE(ADDRESS).> > > > On 30 Oct 2008, at 21:21, Jay wrote:> > >> > Please try this:> >> > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2> >> > std failed to build because stubgen crashed, probably due to gc.> > cm3 does crash right away without @M3nogc.> >> > Something like this:> > cd /src> > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2> > cd /cm3> > rm -rf *> > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX- > > d5.7.0.tar.bz2> > cd /src/cm3/scripts/python> > ./do-cm3-all.py realclean> > ./upgrade.py> > ./do-cm3-all.py realclean> > ./do-cm3-std.py buildship> > => it will fail, at zeus, but it should get far; you'll also need > > some X devel packages to get that far, I had a failure for lack of > > libXaw for example. 
I did not run anything, any of the GUI packages, > > but building itself with itself is a decent test.> >> > I renamed the old AMD64_LINUX archives to "1.0.0".> > http://www.opencm3.com/uploaded-archives/> >> > This has the bug fix I commited last night to cm3cg, and therefore a > > 64 bit hosted cm3cg.> >> > jay at amd64a:/cm3/bin$ file *> > AMD64_LINUX: ASCII text> > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > > for GNU/Linux 2.6.0, not stripped> > cm3.cfg: ASCII English text> > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Li> > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > 2.6.0, not stripped> > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Li> > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > 2.6.0, not stripped> > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > > for GNU/Linux 2.6.0, not stripped> > Unix.common: ASCII English text> >> > Built on Debian 4.0r4 (r5 is out).> > jay at amd64a:/cm3/bin$ uname -a> > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 > > x86_64 GNU/Linux> > jay at amd64a:/cm3/bin$ dmesg | head> > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)> > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org > > ) (> > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP > > Tue Aug 19 04:30:56 UTC 2008> >> > Though really I couldn't do it without Visual C++ on Windows > > providing excellent find-in-files and editing, nothing else comes > > close, I edit on Windows and scp the files over. 
:)
> >
> > - Jay
> >
> > ________________________________
> >
> > From: jay.krell at cornell.edu
> > To: dragisha at m3w.org; m3devel at elegosoft.com
> > Date: Tue, 9 Sep 2008 09:43:03 +0000
> > Subject: Re: [M3devel] AMD64_LINUX status
> >

From jay.krell at cornell.edu Fri Oct 31 15:25:13 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 14:25:13 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <1225462205.14482.60.camel@faramir.m3w.org>
References: <1220941880.9421.11.camel@faramir.m3w.org> <1225462205.14482.60.camel@faramir.m3w.org>
Message-ID:

It seems like there's still a problem.
I haven't debugged it yet.
(I'm sure glad Tony found the other problem before I debugged it.)
I updated http://www.opencm3.com/uploaded-archives with Tony's fix.
The older builds are now 0.0.0.1 and 0.0.0.2.

 - Jay

> Subject: Re: [M3devel] AMD64_LINUX status
> From: dragisha at m3w.org
> To: jay.krell at cornell.edu
> CC: hosking at cs.purdue.edu; m3devel at elegosoft.com
> Date: Fri, 31 Oct 2008 15:10:05 +0100
>
> So, we now have fully functional AMD64_LINUX (_with_ GC)?
>
> TIA
>
> On Fri, 2008-10-31 at 13:52 +0000, Jay wrote:
> > Tony, Excellent, thanks, that helps.
> > How do you know and confirm the right values?
I don't like guessing.> > > > And then cause then of :) :> > > > Symbol> > Pickling font metrics...> > Done.> > /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk> > /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB> > stubgen: Processing RemoteView.T> > > > ***> > *** runtime error:> > *** NEW() was unable to allocate more memory.> > *** file "../src/runtime/common/RTAllocator.m3", line 285> > ***> > "/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit> > 1536: /cm3> > /bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB> > --procedure-- -line- -file---> > exec -- > > _v_netobj 37 /cm3/pkg/netobj/src/netobj.tmpl> > netobjv1 44 /cm3/pkg/netobj/src/netobj.tmpl> > netobj 64 /cm3/pkg/netobj/src/netobj.tmpl> > include_dir 71 /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile> > > > 8 /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args> > > > > > I should debug it, and double check that I upgraded what had to be> > upgraded.> > > > - Jay> > > > > > > > > From: hosking at cs.purdue.edu> > > To: jay.krell at cornell.edu> > > Date: Fri, 31 Oct 2008 10:19:51 +0000> > > CC: m3devel at elegosoft.com> > > Subject: Re: [M3devel] AMD64_LINUX status> > > > > > Umm, I think I found your bug with GC:> > > > > > Check out "RTMachine.PointerAlignment". You have it set to > > > BITSIZE(INTEGER). I suspect what you want is something like > > > BYTESIZE(ADDRESS). 
Also, "RTMachine.StackFrameAlignment" should > > > probably be 2*BYTESIZE(ADDRESS).> > > > > > > > > > > > On 30 Oct 2008, at 21:21, Jay wrote:> > > > > > >> > > > Please try this:> > > >> > > >> > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2> > > >> > > > std failed to build because stubgen crashed, probably due to gc.> > > > cm3 does crash right away without @M3nogc.> > > >> > > > Something like this:> > > > cd /src> > > > wget> > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2> > > > cd /cm3> > > > rm -rf *> > > > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX- > > > > d5.7.0.tar.bz2> > > > cd /src/cm3/scripts/python> > > > ./do-cm3-all.py realclean> > > > ./upgrade.py> > > > ./do-cm3-all.py realclean> > > > ./do-cm3-std.py buildship> > > > => it will fail, at zeus, but it should get far; you'll also need > > > > some X devel packages to get that far, I had a failure for lack> > of > > > > libXaw for example. 
I did not run anything, any of the GUI> > packages, > > > > but building itself with itself is a decent test.> > > >> > > > I renamed the old AMD64_LINUX archives to "1.0.0".> > > > http://www.opencm3.com/uploaded-archives/> > > >> > > > This has the bug fix I commited last night to cm3cg, and therefore> > a > > > > 64 bit hosted cm3cg.> > > >> > > > jay at amd64a:/cm3/bin$ file *> > > > AMD64_LINUX: ASCII text> > > > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared> > libs), > > > > for GNU/Linux 2.6.0, not stripped> > > > cm3.cfg: ASCII English text> > > > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > > (SYSV), for GNU/Li> > > > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > > > 2.6.0, not stripped> > > > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > > (SYSV), for GNU/Li> > > > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > > > 2.6.0, not stripped> > > > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared> > libs), > > > > for GNU/Linux 2.6.0, not stripped> > > > Unix.common: ASCII English text> > > >> > > > Built on Debian 4.0r4 (r5 is out).> > > > jay at amd64a:/cm3/bin$ uname -a> > > > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 > > > > x86_64 GNU/Linux> > > > jay at amd64a:/cm3/bin$ dmesg | head> > > > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)> > > > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2)> > (dannf at debian.org > > > > ) (> > > > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP > > > > Tue Aug 19 04:30:56 UTC 2008> > > >> > > > Though really I couldn't do it without Visual C++ on Windows > > > > providing excellent find-in-files and editing, nothing else comes > > > > close, I edit on Windows and scp the files over. 
:)
> > > >
> > > > - Jay
> > > >
> > > > ________________________________
> > > >
> > > > From: jay.krell at cornell.edu
> > > > To: dragisha at m3w.org; m3devel at elegosoft.com
> > > > Date: Tue, 9 Sep 2008 09:43:03 +0000
> > > > Subject: Re: [M3devel] AMD64_LINUX status
> > > >
> > >
> >
> --
> Dragiša Durić

From dragisha at m3w.org Fri Oct 31 15:10:05 2008
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 31 Oct 2008 15:10:05 +0100
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID: <1225462205.14482.60.camel@faramir.m3w.org>

So, we now have fully functional AMD64_LINUX (_with_ GC)?

TIA

On Fri, 2008-10-31 at 13:52 +0000, Jay wrote:
> Tony, Excellent, thanks, that helps.
> How do you know and confirm the right values? I don't like guessing.
>
> And then cause then of :) :
>
> Symbol
> Pickling font metrics...
> Done.
> /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
> /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
> stubgen: Processing RemoteView.T
>
> ***
> *** runtime error:
> ***    NEW() was unable to allocate more memory.
> ***    file "../src/runtime/common/RTAllocator.m3", line 285
> ***
>
> "/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
>
> --procedure--  -line-  -file---
> exec           --
> _v_netobj      37      /cm3/pkg/netobj/src/netobj.tmpl
> netobjv1       44      /cm3/pkg/netobj/src/netobj.tmpl
> netobj         64      /cm3/pkg/netobj/src/netobj.tmpl
> include_dir    71      /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
>                8       /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args
>
> I should debug it, and double check that I upgraded what had to be
> upgraded.
> > - Jay > > > > > From: hosking at cs.purdue.edu > > To: jay.krell at cornell.edu > > Date: Fri, 31 Oct 2008 10:19:51 +0000 > > CC: m3devel at elegosoft.com > > Subject: Re: [M3devel] AMD64_LINUX status > > > > Umm, I think I found your bug with GC: > > > > Check out "RTMachine.PointerAlignment". You have it set to > > BITSIZE(INTEGER). I suspect what you want is something like > > BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should > > probably be 2*BYTESIZE(ADDRESS). > > > > > > > > On 30 Oct 2008, at 21:21, Jay wrote: > > > > > > > > Please try this: > > > > > > > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2 > > > > > > std failed to build because stubgen crashed, probably due to gc. > > > cm3 does crash right away without @M3nogc. > > > > > > Something like this: > > > cd /src > > > wget > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2 > > > cd /cm3 > > > rm -rf * > > > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX- > > > d5.7.0.tar.bz2 > > > cd /src/cm3/scripts/python > > > ./do-cm3-all.py realclean > > > ./upgrade.py > > > ./do-cm3-all.py realclean > > > ./do-cm3-std.py buildship > > > => it will fail, at zeus, but it should get far; you'll also need > > > some X devel packages to get that far, I had a failure for lack > of > > > libXaw for example. I did not run anything, any of the GUI > packages, > > > but building itself with itself is a decent test. > > > > > > I renamed the old AMD64_LINUX archives to "1.0.0". > > > http://www.opencm3.com/uploaded-archives/ > > > > > > This has the bug fix I commited last night to cm3cg, and therefore > a > > > 64 bit hosted cm3cg. 
> > > > > > jay at amd64a:/cm3/bin$ file * > > > AMD64_LINUX: ASCII text > > > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared > libs), > > > for GNU/Linux 2.6.0, not stripped > > > cm3.cfg: ASCII English text > > > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > (SYSV), for GNU/Li > > > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > > 2.6.0, not stripped > > > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > (SYSV), for GNU/Li > > > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > > 2.6.0, not stripped > > > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared > libs), > > > for GNU/Linux 2.6.0, not stripped > > > Unix.common: ASCII English text > > > > > > Built on Debian 4.0r4 (r5 is out). > > > jay at amd64a:/cm3/bin$ uname -a > > > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 > > > x86_64 GNU/Linux > > > jay at amd64a:/cm3/bin$ dmesg | head > > > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805) > > > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) > (dannf at debian.org > > > ) ( > > > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP > > > Tue Aug 19 04:30:56 UTC 2008 > > > > > > Though really I couldn't do it without Visual C++ on Windows > > > providing excellent find-in-files and editing, nothing else comes > > > close, I edit on Windows and scp the files over. :) > > > > > > - Jay > > > > > > ________________________________ > > > > > > From: jay.krell at cornell.edu > > > To: dragisha at m3w.org; m3devel at elegosoft.com > > > Date: Tue, 9 Sep 2008 09:43:03 +0000 > > > Subject: Re: [M3devel] AMD64_LINUX status > > > > > > > > > > > > > > > -- Dragi?a Duri? 
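A note on the alignment fix Tony suggests in the thread above. What follows is a minimal sketch in Python, not code from the Modula-3 runtime. It assumes (and this is an assumption, not something confirmed in the thread) that RTMachine.PointerAlignment acts as the byte stride at which a conservative collector examines stack words for candidate heap pointers. Under that assumption, BITSIZE(INTEGER), which is 64 on AMD64, would make the scan look at only one word in eight, whereas BYTESIZE(ADDRESS), which is 8, looks at every word. The function name scan_for_pointers, the heap bounds, and the fake stack contents are all hypothetical.

```python
import struct

WORD_BYTES = 8               # BYTESIZE(ADDRESS) on a 64-bit target
WORD_BITS = WORD_BYTES * 8   # BITSIZE(INTEGER) on the same target: 64


def scan_for_pointers(stack, heap_lo, heap_hi, stride):
    """Return offsets of 8-byte words in `stack` that fall inside the heap.

    Hypothetical model: one word is examined every `stride` bytes.
    """
    hits = []
    for off in range(0, len(stack) - WORD_BYTES + 1, stride):
        (word,) = struct.unpack_from("<Q", stack, off)
        if heap_lo <= word < heap_hi:
            hits.append(off)
    return hits


# A made-up 128-byte stack region holding two heap references,
# stored at byte offsets 8 and 64.
HEAP_LO, HEAP_HI = 0x10000, 0x20000
buf = bytearray(128)
struct.pack_into("<Q", buf, 8, 0x10040)
struct.pack_into("<Q", buf, 64, 0x18000)
stack = bytes(buf)

found_bytesize = scan_for_pointers(stack, HEAP_LO, HEAP_HI, WORD_BYTES)
found_bitsize = scan_for_pointers(stack, HEAP_LO, HEAP_HI, WORD_BITS)

print(found_bytesize)  # [8, 64]
print(found_bitsize)   # [64]
```

In this toy model the reference stored at offset 8 is never seen with the 64-byte stride, so a collector relying on that scan could treat a live object as garbage. The suggested 2*BYTESIZE(ADDRESS) for StackFrameAlignment works out to 16 bytes on this target, which is the stack alignment the x86-64 System V ABI requires at call sites.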
From jay.krell at cornell.edu Wed Oct 1 01:24:14 2008 From: jay.krell at cornell.edu (Jay) Date: Tue, 30 Sep 2008 23:24:14 +0000 Subject: [M3devel] ARM Darwin In-Reply-To: <7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org> References: <5ED8E753-6B9E-4FED-8689-1D3D317A5A36@cs.purdue.edu> <7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org> Message-ID: Get me a machine and I'll work on it. :) I'll get one before long but I'm bogged down with existing x86, AMD64, PPC, PPC64 (AIX), Mips (Irix) hardware not yet being used for all its meant.. I suspect Apple hasn't pushed their changes up, so be sure to poke around their gcc source. > Apple are building their own ARM GCC and use that to configure the > back end. Then the runtime issues which I imagine might be with the GC gcc -v ? > and threading. I'm not sure there will be any native treading and I'm > sure VM will look very different. I assume it'll look like most any Posix or *_DARWIN or 32bit thereof system. I assume it has pthreads. - Jay > From: darko at darko.org > To: hosking at cs.purdue.edu > Date: Tue, 30 Sep 2008 14:59:39 +0200 > CC: m3devel at elegosoft.com > Subject: Re: [M3devel] ARM Darwin > > Thanks, it should be a bit easier than the normal process since the > compiler doesn't have to be fully bootstrapped, I just have to get a > cross working. I know the first thing is to get the machine > configuration correct, which I'll start when I get my hands on one of > the machines in a couple of days. The other thing is to work out how > Apple are building their own ARM GCC and use that to configure the > back end. Then the runtime issues which I imagine might be with the GC > and threading. I'm not sure there will be any native treading and I'm > sure VM will look very different. > > > On 30/09/2008, at 2:44 PM, Tony Hosking wrote: > >> I can share tips... >> >> On Sep 30, 2008, at 1:41 PM, Darko wrote: >> >>> Is anyone interested in working on an ARM port for Darwin? 
Or maybe >>> just providing some tips as I give it a try? >>> >>> Cheers, >>> Darko. >> > From jay.krell at cornell.edu Wed Oct 1 08:41:03 2008 From: jay.krell at cornell.edu (Jay) Date: Wed, 1 Oct 2008 06:41:03 +0000 Subject: [M3devel] AMD-64 binaries? In-Reply-To: <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu> References: <48BDF24B.900@wichita.edu> <20080903075804.zhep2ichmow00scg@mail.elegosoft.com> <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu> Message-ID: No -- you would know best about AMD64_DARWIN. I'm sure ALPHA_OSF used to work, but it's been so long, I don't think it counts. I'm being lazy. file AMD64_DARWIN/cm3cg => fat binary? I doubt it. => with ppc, i386, amd64? (doubt it) => or just ppc, i386? (doubt it) => or just i386? This is I "suspect". => or just AMD64. This would be somewhat interesting. I'm pretty sure cm3cg is always 32bit "these days". I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in the same way. This was a nice thing to find, that the problem is portable to multiple?all 64 bit hosts. I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same problem. Anyway, I should really get to debugging this soon. It's a bit odd because gcc itself doesn't have this bug and I reviewed a lot of the code and it was ok. I'm just going to have to step through it in parallel on 32bit and 64bit hosts and find where they diverge. A LOT was identical, like the files output by cm3 into cm3cg were identical. I was close a few months ago but sloughed off. - Jay> From: hosking at cs.purdue.edu> To: jay.krell at cornell.edu> Date: Tue, 30 Sep 2008 10:16:41 +0100> CC: m3devel at elegosoft.com> Subject: Re: [M3devel] AMD-64 binaries?> > 64-bit hosted tools? Do you mean only for Linux? 
> I don't quite understand what you are saying.
>
> On Sep 30, 2008, at 9:36 AM, Jay wrote:
>
> >
> > I'm getting back to this now.
> > I didn't realize it till this weekend, but that archive is
> > "relatively incompatible".
> > In particular it has 32bit hosted tools, and won't run on Debian
> > 4.0r4 / AMD64.
> > Something about glibc 2.4, when all I see on my system is 2.3.
> > I'll see what I can do.
> > Probably just rebuild cm3cg.
> > I think it was built on Fedora, but could have been Ubuntu or
> > OpenSuse.
> > Probably just that Debian stable lags the others.
> >
> > The main problem to debug is why 64bit hosted tools "never" work.
> > (Right?)
> >
> >
> > Stay tuned for a bunch more ports "soon", I've got a bunch more
> > hardware,
> > that runs Linux and others (Solaris, AIX, Irix).. :)
> >
> > I'll be able to debug the high dpi gui problems on a friend's laptop
> > soon too.
> > Send me a repro. I expect it is trivial -- like anything with a
> > scrollbar.
> > I can try formsedit, etc.
> >
> >
> > - Jay
> >
> >
> >> Date: Wed, 3 Sep 2008 07:58:04 +0200
> >> From: wagner at elegosoft.com
> >> To: m3devel at elegosoft.com
> >> Subject: Re: [M3devel] AMD-64 binaries?
> >>
> >> Quoting "Rodney M. Bates" :
> >>
> >>> Are there binaries for AMD-64 around that can be used
> >>> to bootstrap a 64-bit Linux compiler?
> >>
> >> Have a look at
> >>
> >> http://www.opencm3.net/uploaded-archives/index.html
> >>
> >> There are some AMD64 archives; I don't know about their status
> >> offhand, though. I think Jay Krell produced them.
> >> AFAIK there is no regular build on this platform yet.
> >>
> >> Olaf
> >> --
> >> Olaf Wagner -- elego Software Solutions GmbH
> >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
> >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
> >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
> >>

From jay.krell at cornell.edu Wed Oct 1 09:02:29 2008
From: jay.krell at cornell.edu (Jay)
Date: Wed, 1 Oct 2008 07:02:29 +0000
Subject: [M3devel] m3cc build fails on older MacOS X
In-Reply-To: <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com>
 <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
Message-ID:

well, I agree and disagree.

"Almost everyone" only cares about C++, C#, Windows, and a little bit of Linux and Java.
"Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc.

Supporting 10.2 and 10.3 "ought not" be so difficult, but ok.

I wiped out the install and won't likely come back to it until
a bunch of other things are done.
e.g.:
  debug 64 bit hosted cm3cg
  move PPC_LINUX to pthreads
  high dpi
  bring up or backup a bunch of targets I have hardware for,
  and some others I don't have yet.

Adding back support for NT4/Win9x probably not hard, though
similar with gcc on Mac, the current Microsoft tools no longer
target them.

It all gets easier with virtualization..
(Which is easiest on x86/amd64.)

 - Jay

> From: darko at darko.org
> To: hosking at cs.purdue.edu
> Date: Tue, 30 Sep 2008 11:50:43 +0200
> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
> Subject: Re: [M3devel] m3cc build fails on older MacOS X
>
> I think supporting the latest version is enough work. I don't see the
> point of supporting older releases.
Also, this seems to be relevant to > development on that version of the system. Anyone who wants to build > can upgrade. > > > On 30/09/2008, at 11:15 AM, Tony Hosking wrote: > >> Does anyone really care about 10.3 now? As I recall, it had some >> pretty broken assumptions. >> >> On Sep 30, 2008, at 9:25 AM, Jay wrote: >> >>> >>> I have a machine running 10.3 now. >>> >>> gcc-4.3.2 (the current release) won't (toplevel) configure on >>> MacOSX 10.3 apparently because its assembler doesn't support >>> ".machine". >>> Current "cctools" won't compile on 10.3 without patches or other >>> updates, due to mucking with ppc64 stuff, though that is easy to fix. >>> >>> A simple wrapper around as for use on 10.3 that strips the .machine >>> directive is probably reasonable, or a patch to gcc to just not >>> emit it for Darwin, except maybe for non-ppc, or subject to a switch. >>> >>> Other than support for more architectures, I never found any of the >>> updates beyond 10.2 very interesting. >>> Though current Firefox and Safari also won't run on 10.3. >>> >>> IF I get this working, maybe I'll bring 10.2 back up also.. >>> >>> - Jay >>> >>> ________________________________ >>> >>> From: jayk123 at hotmail.com >>> To: wagner at elegosoft.com; m3devel at elegosoft.com >>> Subject: RE: [M3devel] m3cc build fails on older MacOS X >>> Date: Tue, 6 May 2008 10:49:11 +0000 >>> >>> >>> >>> >>> I don't know what these Darwin versions are. >>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5? >>> I used to run 10.2 and could perhaps bring it back (though I'd hate >>> to lose my PPC_LINUX install.. :( ) >>> >>>> make[2]: Nothing to be done for `all'. >>>> Makefile:191: *** Insufficient number of arguments (2) to function >>>> `patsubst'. Stop. >>> >>> Hopefully that's enough context though. >>> >>> The rest is a cascade. >>> What happens if you remove all my m3makefile wierdness (which works >>> everywhere else..) and just configure and make? >>> >>> Can I ssh into this? 
>>> >>> - Jay >>> >>> >>> >>> ________________________________ >>> >>> >>>> Date: Tue, 6 May 2008 07:57:54 +0200 >>>> From: wagner at elegosoft.com >>>> To: m3devel at elegosoft.com >>>> Subject: [M3devel] m3cc build fails on older MacOS X >>>> >>>> On % uname -a >>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 >>>> 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power >>>> Macintosh powerpc: >>>> >>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o >>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o >>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./fnmatch.o >>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./getruntime.o >>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o >>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./obstack.o >>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o >>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o >>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o >>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o >>>> ./xstrndup.o> required-list >>>> make[2]: Nothing to be done for `all'. >>>> Makefile:191: *** Insufficient number of arguments (2) to function >>>> `patsubst'. Stop. >>>> make: *** [all-libcpp] Error 2 >>>> /bin/sh: line 1: cd: gcc: No such file or directory >>>> make: *** No rule to make target `s-modes'. Stop. >>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314: quake >>>> runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2 >>>> >>>> --procedure-- -line- -file--- >>>> cp_if -- >>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile >>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile >>>> 9 >>>> /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args >>>> >>>> Fatal Error: package build failed >>>> ==> m3-sys/m3cc done >>>> >>>> Any ideas? 
>>>>
>>>> Olaf
>>>> --
>>>> Olaf Wagner -- elego Software Solutions GmbH
>>>> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
>>>> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
>>>>
>>>
>>
>

From darko at darko.org Wed Oct 1 09:10:35 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 09:10:35 +0200
Subject: [M3devel] m3cc build fails on older MacOS X
In-Reply-To:
References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com>
 <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
Message-ID: <973F196C-4B4A-4526-878C-93942E48E72A@darko.org>

Why bother with it if no one uses it and no-one is going to use it?
Supporting M3 on Macs is good because people will use it into the
future. People aren't moving back to 10.3. I wouldn't bother with it
at all.

On 01/10/2008, at 9:02 AM, Jay wrote:

>
> well, I agree and disagree.
>
> "Almost everyone" only cares about C++, C#, Windows, and a little
> bit of Linux and Java.
> "Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc.
>
> Supporting 10.2 and 10.3 "ought not" be so difficult, but ok.
>
> I wiped out the install and won't likely come back to it until
> a bunch of other things are done.
> e.g.:
>   debug 64 bit hosted cm3cg
>   move PPC_LINUX to pthreads
>   high dpi
>   bring up or backup a bunch of targets I have hardware for,
>   and some others I don't have yet.
>
> Adding back support for NT4/Win9x probably not hard, though
> similar with gcc on Mac, the current Microsoft tools no longer
> target them.
>
> It all gets easier with virtualization..
> (Which is easiest on x86/amd64.)
> > - Jay > > > >> From: darko at darko.org >> To: hosking at cs.purdue.edu >> Date: Tue, 30 Sep 2008 11:50:43 +0200 >> CC: m3devel at elegosoft.com; jay.krell at cornell.edu >> Subject: Re: [M3devel] m3cc build fails on older MacOS X >> >> I think supporting the latest version is enough work. I don't see the >> point of supporting older releases. Also, this seems to be relevant >> to >> development on that version of the system. Anyone who wants to build >> can upgrade. >> >> >> On 30/09/2008, at 11:15 AM, Tony Hosking wrote: >> >>> Does anyone really care about 10.3 now? As I recall, it had some >>> pretty broken assumptions. >>> >>> On Sep 30, 2008, at 9:25 AM, Jay wrote: >>> >>>> >>>> I have a machine running 10.3 now. >>>> >>>> gcc-4.3.2 (the current release) won't (toplevel) configure on >>>> MacOSX 10.3 apparently because its assembler doesn't support >>>> ".machine". >>>> Current "cctools" won't compile on 10.3 without patches or other >>>> updates, due to mucking with ppc64 stuff, though that is easy to >>>> fix. >>>> >>>> A simple wrapper around as for use on 10.3 that strips the .machine >>>> directive is probably reasonable, or a patch to gcc to just not >>>> emit it for Darwin, except maybe for non-ppc, or subject to a >>>> switch. >>>> >>>> Other than support for more architectures, I never found any of the >>>> updates beyond 10.2 very interesting. >>>> Though current Firefox and Safari also won't run on 10.3. >>>> >>>> IF I get this working, maybe I'll bring 10.2 back up also.. >>>> >>>> - Jay >>>> >>>> ________________________________ >>>> >>>> From: jayk123 at hotmail.com >>>> To: wagner at elegosoft.com; m3devel at elegosoft.com >>>> Subject: RE: [M3devel] m3cc build fails on older MacOS X >>>> Date: Tue, 6 May 2008 10:49:11 +0000 >>>> >>>> >>>> >>>> >>>> I don't know what these Darwin versions are. >>>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5? 
>>>> I used to run 10.2 and could perhaps bring it back (though I'd hate >>>> to lose my PPC_LINUX install.. :( ) >>>> >>>>> make[2]: Nothing to be done for `all'. >>>>> Makefile:191: *** Insufficient number of arguments (2) to function >>>>> `patsubst'. Stop. >>>> >>>> Hopefully that's enough context though. >>>> >>>> The rest is a cascade. >>>> What happens if you remove all my m3makefile wierdness (which works >>>> everywhere else..) and just configure and make? >>>> >>>> Can I ssh into this? >>>> >>>> - Jay >>>> >>>> >>>> >>>> ________________________________ >>>> >>>> >>>>> Date: Tue, 6 May 2008 07:57:54 +0200 >>>>> From: wagner at elegosoft.com >>>>> To: m3devel at elegosoft.com >>>>> Subject: [M3devel] m3cc build fails on older MacOS X >>>>> >>>>> On % uname -a >>>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 >>>>> 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power >>>>> Macintosh powerpc: >>>>> >>>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o >>>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o >>>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./ >>>>> fnmatch.o >>>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./ >>>>> getruntime.o >>>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o >>>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./ >>>>> obstack.o >>>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o >>>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o >>>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o >>>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o >>>>> ./xstrndup.o> required-list >>>>> make[2]: Nothing to be done for `all'. >>>>> Makefile:191: *** Insufficient number of arguments (2) to function >>>>> `patsubst'. Stop. >>>>> make: *** [all-libcpp] Error 2 >>>>> /bin/sh: line 1: cd: gcc: No such file or directory >>>>> make: *** No rule to make target `s-modes'. Stop. 
>>>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314: >>>>> quake >>>>> runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2 >>>>> >>>>> --procedure-- -line- -file--- >>>>> cp_if -- >>>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile >>>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile >>>>> 9 >>>>> /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args >>>>> >>>>> Fatal Error: package build failed >>>>> ==> m3-sys/m3cc done >>>>> >>>>> Any ideas? >>>>> >>>>> Olaf >>>>> -- >>>>> Olaf Wagner -- elego Software Solutions GmbH >>>>> Gustav-Meyer-Allee 25 / Geb?ude 12, 13355 Berlin, Germany >>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 >>>>> 45 86 95 >>>>> http://www.elegosoft.com | Gesch?ftsf?hrer: Olaf Wagner | Sitz: >>>>> Berlin >>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: >>>>> DE163214194 >>>>> >>>> >>> >> From darko at darko.org Wed Oct 1 12:03:15 2008 From: darko at darko.org (Darko) Date: Wed, 1 Oct 2008 12:03:15 +0200 Subject: [M3devel] Pretty-printing REFANYs? In-Reply-To: References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu> Message-ID: I've extended one of the modules with a function that formats any allocated value for printing. If you're interested I can clean them up a little and post them. On 28/09/2008, at 8:01 AM, Darko wrote: > As far as I know, yes, they're not in the binary. I'd love to be > proven wrong though, or fix it so they did. I have a module that > reads the .M3WEB file and maps it to types and a module that will > read and write any field within a type safely using a numeric index. > Neither is perfect. You can integrate the two to get what you want > but I seem to remember having some problems mapping type ids (UIDs?) > to typecodes at runtime. > > > On 28/09/2008, at 7:49 AM, Mika Nystrom wrote: > >> Right, I am aware of those interfaces.. just wondering what was >> out there. 
>> Do I really need to look at .M3WEB? I thought
>> that m3gdb could figure out things without anything outside
>> of the binary...
>>
>> I'm looking for essentially what m3gdb offers, say prints
>> at minimum the name of the type (this I recall is trivial with
>> some of the RT* interfaces) but hopefully also with field names
>> and values, but doesn't expand references recursively.. something
>> like that?
>>
>> Mika
>>
>> Darko writes:
>>> You can use RTTipe to read the fields and values within a type. If you
>>> also want the type and field names you can interpret the .M3WEB file.
>>> I have a couple of modules that do something like that but they are
>>> not what you would call finished. What level of detail are you after?
>>>
>>>
>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>
>>>> Hello Modula-3 people,
>>>>
>>>> I am working on writing an interpreter that I'd like to embed in
>>>> various Modula-3 programs. It so happens that this interpreter
>>>> might from time to time be manipulating arbitrary M3 REFs, and just
>>>> from the point of view of providing information to a human user,
>>>> it might be nice to be able to pretty-print these. Does anyone
>>>> have any code that accomplishes this, at least partly? I'm thinking
>>>> that since m3gdb can do it, the information must all be in the
>>>> binary---somehow. (Even enumeration names, right?) And since the
>>>> pickler can pickle things... hmm.
>>>>
>>>> I would greatly appreciate any guidance that's out there...
>>>>
>>>> Best regards,
>>>> Mika Nystrom
>

From hosking at cs.purdue.edu Wed Oct 1 11:59:23 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 10:59:23 +0100
Subject: [M3devel] AMD-64 binaries?
In-Reply-To:
References: <48BDF24B.900@wichita.edu>
 <20080903075804.zhep2ichmow00scg@mail.elegosoft.com>
 <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
Message-ID: <26766FFA-C3B6-45D2-8156-80FD14922882@cs.purdue.edu>

I can definitely vouch for ALPHA_OSF having worked as recently as two
years ago, but without the pthreads native threading system. That
port should have been easy enough I suspect.

On Oct 1, 2008, at 7:41 AM, Jay wrote:

> No -- you would know best about AMD64_DARWIN.
> I'm sure ALPHA_OSF used to work, but it's been so long, I don't
> think it counts.
>
> I'm being lazy.
>
> file AMD64_DARWIN/cm3cg
> => fat binary? I doubt it.
> => with ppc, i386, amd64? (doubt it)
> => or just ppc, i386? (doubt it)
> => or just i386? This is I "suspect".
> => or just AMD64. This would be somewhat interesting.

I believe that is how I configured it.

> I'm pretty sure cm3cg is always 32bit "these days".

Nope, cm3cg on AMD64_DARWIN is 64-bit.

> I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in
> the same way.
> This was a nice thing to find, that the problem is portable to
> multiple?all 64 bit hosts.
>
> I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same
> problem.

Don't think so.

> Anyway, I should really get to debugging this soon.
>
> It's a bit odd because gcc itself doesn't have this bug and I
> reviewed a lot of the code and it was ok. I'm just going to have to
> step through it in parallel on 32bit and 64bit hosts and find where
> they diverge. A LOT was identical, like the files output by cm3 into
> cm3cg were identical.

Yes, the intermediate code should be identical. Any such problems
would be with cm3cg.

> I was close a few months ago but sloughed off.

Good luck.

>
>
> - Jay
>
>
> > From: hosking at cs.purdue.edu
> > To: jay.krell at cornell.edu
> > Date: Tue, 30 Sep 2008 10:16:41 +0100
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] AMD-64 binaries?
> >
> > 64-bit hosted tools? Do you mean only for Linux? I don't quite
> > understand what you are saying.
> >
> > On Sep 30, 2008, at 9:36 AM, Jay wrote:
> >
> > >
> > > I'm getting back to this now.
> > > I didn't realize it till this weekend, but that archive is
> > > "relatively incompatible".
> > > In particular it has 32bit hosted tools, and won't run on Debian
> > > 4.0r4 / AMD64.
> > > Something about glibc 2.4, when all I see on my system is 2.3.
> > > I'll see what I can do.
> > > Probably just rebuild cm3cg.
> > > I think it was built on Fedora, but could have been Ubuntu or
> > > OpenSuse.
> > > Probably just that Debian stable lags the others.
> > >
> > > The main problem to debug is why 64bit hosted tools "never" work.
> > > (Right?)
> > >
> > >
> > > Stay tuned for a bunch more ports "soon", I've got a bunch more
> > > hardware,
> > > that runs Linux and others (Solaris, AIX, Irix).. :)
> > >
> > > I'll be able to debug the high dpi gui problems on a friend's laptop
> > > soon too.
> > > Send me a repro. I expect it is trivial -- like anything with a
> > > scrollbar.
> > > I can try formsedit, etc.
> > >
> > >
> > > - Jay
> > >
> > >
> > >> Date: Wed, 3 Sep 2008 07:58:04 +0200
> > >> From: wagner at elegosoft.com
> > >> To: m3devel at elegosoft.com
> > >> Subject: Re: [M3devel] AMD-64 binaries?
> > >>
> > >> Quoting "Rodney M. Bates" :
> > >>
> > >>> Are there binaries for AMD-64 around that can be used
> > >>> to bootstrap a 64-bit Linux compiler?
> > >>
> > >> Have a look at
> > >>
> > >> http://www.opencm3.net/uploaded-archives/index.html
> > >>
> > >> There are some AMD64 archives; I don't know about their status
> > >> offhand, though. I think Jay Krell produced them.
> > >> AFAIK there is no regular build on this platform yet.
> > >>
> > >> Olaf
> > >> --
> > >> Olaf Wagner -- elego Software Solutions GmbH
> > >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> > >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
> > >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
> > >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
> > >>
> >
>

From hosking at cs.purdue.edu Wed Oct 1 12:07:00 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 11:07:00 +0100
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To:
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
Message-ID: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>

m3gdb makes use of stabs debug information spat out by the backend.
They are only in the binary if compiled -g. There are other ways to
get what you are after, as Darko has observed.

On Oct 1, 2008, at 11:03 AM, Darko wrote:

> I've extended one of the modules with a function that formats any
> allocated value for printing. If you're interested I can clean them
> up a little and post them.
>
>
> On 28/09/2008, at 8:01 AM, Darko wrote:
>
>> As far as I know, yes, they're not in the binary. I'd love to be
>> proven wrong though, or fix it so they did. I have a module that
>> reads the .M3WEB file and maps it to types and a module that will
>> read and write any field within a type safely using a numeric
>> index. Neither is perfect. You can integrate the two to get what
>> you want but I seem to remember having some problems mapping type
>> ids (UIDs?) to typecodes at runtime.
>>
>>
>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>
>>> Right, I am aware of those interfaces.. just wondering what was
>>> out there. Do I really need to look at .M3WEB? I thought
>>> that m3gdb could figure out things without anything outside
>>> of the binary...
>>> >>> I'm looking for essentially what m3gdb offers, say prints >>> at minimum the name of the type (this I recall is trivial with >>> some of the RT* interfaces) but hopefully also with field names >>> and values, but doesn't expand references recursively.. something >>> like that? >>> >>> Mika >>> >>> Darko writes: >>>> You can use RTTipe to read the fields and values within a type. >>>> If you >>>> also want the type and field names you can interpret the .M3WEB >>>> file. >>>> I have a couple of modules that do something like that but they are >>>> not what you would call finished. What level of detail are you >>>> after? >>>> >>>> >>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote: >>>> >>>>> Hello Modula-3 people, >>>>> >>>>> I am working on a writing an interpreter that I'd like to embed in >>>>> various Modula-3 programs. It so happens that this interpreter >>>>> might from time to time be manipulating arbitrary M3 REFs, and >>>>> just >>>>> from the point of view of providing information to a human user, >>>>> it might be nice to be able to pretty-print these. Does anyone >>>>> have any code that accomplishes this, at least partly? I'm >>>>> thinking >>>>> that since m3gdb can do it, the information must all be in the >>>>> binary---somehow. (Even enumeration names, right?) And since the >>>>> pickler can pickle things... hmm. >>>>> >>>>> I would greatly appreciate any guidance that's out there... >>>>> >>>>> Best regards, >>>>> Mika Nystrom >> From darko at darko.org Wed Oct 1 12:35:09 2008 From: darko at darko.org (Darko) Date: Wed, 1 Oct 2008 12:35:09 +0200 Subject: [M3devel] Pretty-printing REFANYs? 
In-Reply-To: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
 <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>
Message-ID:

Here's some info on the stabs format:

http://www.cs.utah.edu/dept/old/texinfo/gdb/stabs_toc.html

On 01/10/2008, at 12:07 PM, Tony Hosking wrote:

> m3gdb makes use of stabs debug information spat out by the backend.
> They are only in the binary if compiled -g. There are other ways to
> get what you are after, as Darko has observed.
>
> On Oct 1, 2008, at 11:03 AM, Darko wrote:
>
>> I've extended one of the modules with a function that formats any
>> allocated value for printing. If you're interested I can clean them
>> up a little and post them.
>>
>>
>> On 28/09/2008, at 8:01 AM, Darko wrote:
>>
>>> As far as I know, yes, they're not in the binary. I'd love to be
>>> proven wrong though, or fix it so they did. I have a module that
>>> reads the .M3WEB file and maps it to types and a module that will
>>> read and write any field within a type safely using a numeric
>>> index. Neither is perfect. You can integrate the two to get what
>>> you want but I seem to remember having some problems mapping type
>>> ids (UIDs?) to typecodes at runtime.
>>>
>>>
>>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>>
>>>> Right, I am aware of those interfaces.. just wondering what was
>>>> out there. Do I really need to look at .M3WEB? I thought
>>>> that m3gdb could figure out things without anything outside
>>>> of the binary...
>>>>
>>>> I'm looking for essentially what m3gdb offers, say prints
>>>> at minimum the name of the type (this I recall is trivial with
>>>> some of the RT* interfaces) but hopefully also with field names
>>>> and values, but doesn't expand references recursively.. something
>>>> like that?
>>>>
>>>> Mika
>>>>
>>>> Darko writes:
>>>>> You can use RTTipe to read the fields and values within a type.
>>>>> If you >>>>> also want the type and field names you can interpret the .M3WEB >>>>> file. >>>>> I have a couple of modules that do something like that but they >>>>> are >>>>> not what you would call finished. What level of detail are you >>>>> after? >>>>> >>>>> >>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote: >>>>> >>>>>> Hello Modula-3 people, >>>>>> >>>>>> I am working on a writing an interpreter that I'd like to embed >>>>>> in >>>>>> various Modula-3 programs. It so happens that this interpreter >>>>>> might from time to time be manipulating arbitrary M3 REFs, and >>>>>> just >>>>>> from the point of view of providing information to a human user, >>>>>> it might be nice to be able to pretty-print these. Does anyone >>>>>> have any code that accomplishes this, at least partly? I'm >>>>>> thinking >>>>>> that since m3gdb can do it, the information must all be in the >>>>>> binary---somehow. (Even enumeration names, right?) And since >>>>>> the >>>>>> pickler can pickle things... hmm. >>>>>> >>>>>> I would greatly appreciate any guidance that's out there... >>>>>> >>>>>> Best regards, >>>>>> Mika Nystrom >>> > From mika at async.caltech.edu Wed Oct 1 20:09:58 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Wed, 01 Oct 2008 11:09:58 -0700 Subject: [M3devel] Pretty-printing REFANYs? In-Reply-To: Your message of "Wed, 01 Oct 2008 12:03:15 +0200." Message-ID: <200810011809.m91I9wxY087739@camembert.async.caltech.edu> Oh, I'd love to give it a try! I'm a little surprised no one has chimed in on the question of whether you really need .M3WEB... I could swear I can get good symbolic debugging with m3gdb on just a binary... Mika Darko writes: >I've extended one of the modules with a function that formats any >allocated value for printing. If you're interested I can clean them up >a little and post them. > > >On 28/09/2008, at 8:01 AM, Darko wrote: > >> As far as I know, yes, they're not in the binary. 
I'd love to be >> proven wrong though, or fix it so they did. I have a module that >> reads the .M3WEB file and maps it to types and a module that will >> read and write any field within a type safely using a numeric index. >> Neither is perfect. You can integrate the two to get what you want >> but I seem to remember having some problems mapping type ids (UIDs?) >> to typecodes at runtime. >> >> >> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote: >> >>> Right, I am aware of those interfaces.. just wondering what was >>> out there. Do I really need to look at .M3WEB? I thought >>> that m3gdb could figure out things without anything outside >>> of the binary... >>> >>> I'm looking for essentially what m3gdb offers, say prints >>> at minimum the name of the type (this I recall is trivial with >>> some of the RT* interfaces) but hopefully also with field names >>> and values, but doesn't expand references recursively.. something >>> like that? >>> >>> Mika >>> >>> Darko writes: >>>> You can use RTTipe to read the fields and values within a type. If >>>> you >>>> also want the type and field names you can interpret the .M3WEB >>>> file. >>>> I have a couple of modules that do something like that but they are >>>> not what you would call finished. What level of detail are you >>>> after? >>>> >>>> >>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote: >>>> >>>>> Hello Modula-3 people, >>>>> >>>>> I am working on a writing an interpreter that I'd like to embed in >>>>> various Modula-3 programs. It so happens that this interpreter >>>>> might from time to time be manipulating arbitrary M3 REFs, and just >>>>> from the point of view of providing information to a human user, >>>>> it might be nice to be able to pretty-print these. Does anyone >>>>> have any code that accomplishes this, at least partly? I'm >>>>> thinking >>>>> that since m3gdb can do it, the information must all be in the >>>>> binary---somehow. (Even enumeration names, right?) 
And since the >>>>> pickler can pickle things... hmm. >>>>> >>>>> I would greatly appreciate any guidance that's out there... >>>>> >>>>> Best regards, >>>>> Mika Nystrom >> From mika at async.caltech.edu Wed Oct 1 20:10:38 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Wed, 01 Oct 2008 11:10:38 -0700 Subject: [M3devel] Pretty-printing REFANYs? In-Reply-To: Your message of "Wed, 01 Oct 2008 11:07:00 BST." <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu> Message-ID: <200810011810.m91IAcDW087832@camembert.async.caltech.edu> Ok, ignore my previous email :-) Tony Hosking writes: >m3gdb makes use of stabs debug information spat out by the backend. >They are only in the binary if compiled -g. There are other ways to >get what you are after, as Darko has observed. > >On Oct 1, 2008, at 11:03 AM, Darko wrote: > >> I've extended one of the modules with a function that formats any >> allocated value for printing. If you're interested I can clean them >> up a little and post them. >> >> >> On 28/09/2008, at 8:01 AM, Darko wrote: >> >>> As far as I know, yes, they're not in the binary. I'd love to be >>> proven wrong though, or fix it so they did. I have a module that >>> reads the .M3WEB file and maps it to types and a module that will >>> read and write any field within a type safely using a numeric >>> index. Neither is perfect. You can integrate the two to get what >>> you want but I seem to remember having some problems mapping type >>> ids (UIDs?) to typecodes at runtime. >>> >>> >>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote: >>> >>>> Right, I am aware of those interfaces.. just wondering what was >>>> out there. Do I really need to look at .M3WEB? I thought >>>> that m3gdb could figure out things without anything outside >>>> of the binary... 
>>>> >>>> I'm looking for essentially what m3gdb offers, say prints >>>> at minimum the name of the type (this I recall is trivial with >>>> some of the RT* interfaces) but hopefully also with field names >>>> and values, but doesn't expand references recursively.. something >>>> like that? >>>> >>>> Mika >>>> >>>> Darko writes: >>>>> You can use RTTipe to read the fields and values within a type. >>>>> If you >>>>> also want the type and field names you can interpret the .M3WEB >>>>> file. >>>>> I have a couple of modules that do something like that but they are >>>>> not what you would call finished. What level of detail are you >>>>> after? >>>>> >>>>> >>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote: >>>>> >>>>>> Hello Modula-3 people, >>>>>> >>>>>> I am working on a writing an interpreter that I'd like to embed in >>>>>> various Modula-3 programs. It so happens that this interpreter >>>>>> might from time to time be manipulating arbitrary M3 REFs, and >>>>>> just >>>>>> from the point of view of providing information to a human user, >>>>>> it might be nice to be able to pretty-print these. Does anyone >>>>>> have any code that accomplishes this, at least partly? I'm >>>>>> thinking >>>>>> that since m3gdb can do it, the information must all be in the >>>>>> binary---somehow. (Even enumeration names, right?) And since the >>>>>> pickler can pickle things... hmm. >>>>>> >>>>>> I would greatly appreciate any guidance that's out there... >>>>>> >>>>>> Best regards, >>>>>> Mika Nystrom >>> From jay.krell at cornell.edu Sun Oct 12 11:51:03 2008 From: jay.krell at cornell.edu (Jay) Date: Sun, 12 Oct 2008 09:51:03 +0000 Subject: [M3devel] a bunch of new/old platform names? Message-ID: I plan on soon bringing "back" some old ports -- building current archives -- and bring up some new ports. Specifically I have hardware: RS/6000 (PPC64/AIX), SGI (MIPS), SPARC64, plus the usual x86/AMD64. Two of the platforms did exist. In particular, "MIPS_IRIX" is "IRIX5". 
Reuse IRIX5, or introduce MIPS_IRIX? PPC_AIX is IBMR2 or such. Same question. Also, must versions really be in platform names? I'm loathe to add a third dimension to the matrix. I did just note that FreeBSD 7.0 64 bit is ABI-incompatible with FreeBSD 6.3 64 bit, lame. SGI claims good ABI across all the 6.5 releases, which is all there will be now. IBM claims good 32 bit ABI compat across AIX 4.x - 6.x and good 64 bit ABI compat across 5.x and 6.x, but incompatibility from 64 bit 4.x. (Microsoft has always been good here, but "behavioral" compat is the actual tricky issue.) And, what do folks think about putting "32" in new 32 bit platform names? I'm considering the following: MIPS32_{IRIX,LINUX,OPENBSD,NETBSD} MIPS64_IRIX (6.5) SPARC{32,64}_{LINUX,*BSD}(probably no SPARC32_*BSD actually, and SPARC32_LINUX is already in, but not building regularly) {SPARC64,I386,AMD64}_SOLARIS PPC{32,64}_AIX (PPC64_LINUX is blocked, Linux has problems booting on the hardware and I have no Mac G5 yet). AMD64_*BSD Also, maybe some of the code should be restructured to separate processor from OS? That might be primarily only pointer size. Any interest in "x86" instead of "I386"? If I make good progress against those 18 (!), I can see about PPC64_DARWIN, HPPA_*, IA64_*, ALPHA_*, ARM_*, which I lack hardware for. PPC_LINUX also should be converted to pthreads imho. Mostly this is all just a matter of installing the OS and configuring gcc. And, yeah, I have the two m3cgs stepping side by side to find the problem there, and will have use of a high dpi Windows laptop for that other problem.. And then of course, if the vast majority of platforms are named like that, there might be pressure to bring the rest in line. :) I386_{NT,LINUX,*BSD,CYGWIN,MINGWIN} - Jay From mika at async.caltech.edu Fri Oct 17 00:32:39 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 16 Oct 2008 15:32:39 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? 
Message-ID: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>

Hello Modula-3 people,

As I mentioned in an earlier email about printing structures (thanks
Darko), I'm in the midst of coding an interpreter embedded in
Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
JScheme for Java (well it was at first strongly based, but more and
more loosely, if you know what I mean...)

I expected that the performance of the interpreter would be much
better in Modula-3 than in Java, and I have been testing on two
different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
and the other is CM3 on a recent Debian system.  What I am finding
is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
close to ten times as fast on some tasks at this point), but on
Linux/CM3 it is much closer in speed to JScheme than I would like.

When I started, with code that was essentially equivalent to JScheme,
I found that it was a bit slower than JScheme on Linux/CM3 and
possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
spend most of its time in (surprise, surprise!) memory allocation
and garbage collection.  The speedup I have achieved between the
first implementation and now was due to the use of Modula-3 constructs
that are superior to Java's, such as the use of arrays of RECORDs
to make small stacks rather than linked lists.  (I get readable
code with much fewer memory allocations and GC work.)

Now, since this is an interpreter, I as the implementer have limited
control over how much memory is allocated and freed, and where it is
needed.  However, I can sometimes fall back on C-style memory management,
but I would like to do it in a safe way.  For instance, I have special-cased
evaluation of Scheme primitives, as follows.

Under the "normal" implementation, a list of things to evaluate is
built up, passed to an evaluation function, and then the GC is left
to sweep up the mess.  The problem is that there are various tricky
routes by which references can escape the evaluator, so you can't
just assume that what you put in is going to be dead right after
an eval and free it.  Instead, I set a flag in the evaluator, which
is TRUE if it is OK to free the list after the eval and FALSE if
it's unclear (in which case the problem is left up to the GC).

For the vast majority of Scheme primitives, one can indeed free the
list right after the eval.  Now of course I am not interested
in unsafe code, so what I do is this:

  TYPE Pair = OBJECT first, rest : REFANY; END;

  VAR
    mu := NEW(MUTEX);
    free : Pair := NIL;

  PROCEDURE GetPair() : Pair =
    BEGIN
      LOCK mu DO
        IF free # NIL THEN
          TRY
            RETURN free
          FINALLY
            free := free.rest
          END
        END
      END;
      RETURN NEW(Pair)
    END GetPair;

  PROCEDURE ReturnPair(cons : Pair) =
    BEGIN
      cons.first := NIL;
      LOCK mu DO
        cons.rest := free;
        free := cons
      END
    END ReturnPair;

my eval code looks like

  VAR okToFree : BOOLEAN; BEGIN

    args := GetPair(); ...
    result := EvalPrimitive(args, (*VAR OUT*) okToFree);

    IF okToFree THEN ReturnPair(args) END;
    RETURN result
  END

and this does work well.  In fact it speeds up the Linux implementation
by almost 100% to recycle the lists like this *just* for the
evaluation of Scheme primitives.

But it's still ugly, isn't it?  There's a mutex, and a global
variable.  And yes, the time spent messing with the mutex is
noticeable, and I haven't even made the code multi-threaded yet
(and that is coming!)

So I'm thinking, what I really want is a structure that is attached
to my current Thread.T.  I want to be able to access just a single
pointer (like the free list) but be sure it is unique to my current
thread.  No locking would be necessary if I could do this.

Does anyone have an elegant solution that does something like this?
Thread-specific "static" variables?  Just one REFANY would be enough
for a lot of uses... seems to me this should be a frequently
occurring problem?
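
For illustration, here is a minimal sketch of one lock-free variant that stays within the safe language and needs no runtime support. It assumes each interpreter thread is started from its own closure object and that the evaluator is handed that object explicitly. The names Interp and Run, and the module layout, are hypothetical and not part of the code above; only Thread.Closure, Thread.Fork and Thread.Join come from the standard Thread interface.

```modula3
MODULE Main;

IMPORT Thread;

TYPE
  Pair = OBJECT first, rest: REFANY := NIL END;

  (* Hypothetical per-thread interpreter state.  Each thread is forked
     with its own Interp, so "free" is only ever touched by that one
     thread and needs no MUTEX. *)
  Interp = Thread.Closure OBJECT
      free: Pair := NIL;
    OVERRIDES
      apply := Run;
    END;

PROCEDURE GetPair (self: Interp): Pair =
  VAR p := self.free;
  BEGIN
    IF p = NIL THEN RETURN NEW(Pair) END;
    self.free := NARROW(p.rest, Pair);  (* pop; type-checked at runtime *)
    p.rest := NIL;
    RETURN p
  END GetPair;

PROCEDURE ReturnPair (self: Interp; p: Pair) =
  BEGIN
    p.first := NIL;
    p.rest := self.free;                (* push onto this thread's list *)
    self.free := p
  END ReturnPair;

PROCEDURE Run (self: Interp): REFANY =
  VAR args := GetPair(self);
  BEGIN
    (* ... fill in args and evaluate a primitive here ... *)
    ReturnPair(self, args);  (* only when the evaluator reports it is safe *)
    RETURN NIL
  END Run;

VAR t1, t2: Thread.T;
BEGIN
  t1 := Thread.Fork(NEW(Interp));
  t2 := Thread.Fork(NEW(Interp));
  EVAL Thread.Join(t1);
  EVAL Thread.Join(t2)
END Main.
```

The price of this approach is that the Interp has to be threaded through every evaluator call instead of being found via the current thread. Code running on the initial thread can allocate its own Interp and call GetPair and ReturnPair on it directly.
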
Best regards, Mika From hosking at cs.purdue.edu Fri Oct 17 00:54:51 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Thu, 16 Oct 2008 23:54:51 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu> References: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu> Message-ID: Have you tried running @M3noincremental? On 16 Oct 2008, at 23:32, Mika Nystrom wrote: > Hello Modula-3 people, > > As I mentioned in an earlier email about printing structures (thanks > Darko), I'm in the midst of coding an interpreter embedded in > Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's > JScheme for Java (well it was at first strongly based, but more and > more loosely, if you know what I mean...) > > I expected that the performance of the interpreter would be much > better in Modula-3 than in Java, and I have been testing on two > different systems. One is my ancient FreeBSD-4.11 with an old PM3, > and the other is CM3 on a recent Debian system. What I am finding > is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting > close to ten times as fast on some tasks at this point), but on > Linux/CM3 it is much closer in speed to JScheme than I would like. > > When I started, with code that was essentially equivalent to JScheme, > I found that it was a bit slower than JScheme on Linux/CM3 and > possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to > spend most of its time in (surprise, surprise!) memory allocation > and garbage collection. The speedup I have achieved between the > first implementation and now was due to the use of Modula-3 constructs > that are superior to Java's, such as the use of arrays of RECORDs > to make small stacks rather than linked lists. (I get readable > code with much fewer memory allocations and GC work.) 
> > Now, since this is an interpreter, I as the implementer have limited > control over how much memory is allocated and freed, and where it is > needed. However, I can sometimes fall back on C-style memory > management, > but I would like to do it in a safe way. For instance, I have > special-cased > evaluation of Scheme primitives, as follows. > > Under the "normal" implementation, a list of things to evaluate is > built up, passed to an evaluation function, and then the GC is left > to sweep up the mess. The problem is that there are various tricky > routes by which references can escape the evaluator, so you can't > just assume that what you put in is going to be dead right after > an eval and free it. Instead, I set a flag in the evaluator, which > is TRUE if it is OK to free the list after the eval and FALSE if > it's unclear (in which case the problem is left up to the GC). > > For the vast majority of Scheme primitives, one can indeed free the > list right after the eval. Now of course I am not interested > in unsafe code, so what I do is this: > > TYPE Pair = OBJECT first, rest : REFANY; END; > > VAR > mu := NEW(MUTEX); > free : Pair := NIL; > > PROCEDURE GetPair() : Pair = > BEGIN > LOCK mu DO > IF free # NIL THEN > TRY > RETURN free > FINALLY > free := free.rest > END > END > END; > RETURN NEW(Pair) > END GetPair; > > PROCEDURE ReturnPair(cons : Pair) = > BEGIN > cons.first := NIL; > LOCK mu DO > cons.rest := free; > free := cons > END > END ReturnPair; > > my eval code looks like > > VAR okToFree : BOOLEAN; BEGIN > > args := GetPair(); ... > result := EvalPrimitive(args, (*VAR OUT*) okToFree); > > IF okToFree THEN ReturnPair(args) END; > RETURN result > END > > and this does work well. In fact it speeds up the Linux > implementation > by almost 100% to recycle the lists like this *just* for the > evaluation of Scheme primitives. > > But it's still ugly, isn't it? There's a mutex, and a global > variable. 
And yes, the time spent messing with the mutex is > noticeable, and I haven't even made the code multi-threaded yet > (and that is coming!) > > So I'm thinking, what I really want is a structure that is attached > to my current Thread.T. I want to be able to access just a single > pointer (like the free list) but be sure it is unique to my current > thread. No locking would be necessary if I could do this. > > Does anyone have an elegant solution that does something like this? > Thread-specific "static" variables? Just one REFANY would be enough > for a lot of uses... seems to me this should be a frequently > occurring problem? > > Best regards, > Mika > > > > > > From mika at async.caltech.edu Fri Oct 17 01:30:01 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 16 Oct 2008 16:30:01 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: Your message of "Thu, 16 Oct 2008 23:54:51 BST." Message-ID: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu> Hi Tony, I figured you would chime in! Yes, @M3noincremental seems to make things consistently a tad bit slower (but a very small difference), on both FreeBSD and Linux. @M3nogc makes a bigger difference, of course. Unfortunately I seem to have lost the code that did a lot of memory allocations. My tricks (as described in the email---and others!) have removed most of the troublesome memory allocations, but now I'm stuck with the mutex instead... Mika Tony Hosking writes: >Have you tried running @M3noincremental? > >On 16 Oct 2008, at 23:32, Mika Nystrom wrote: > >> Hello Modula-3 people, >> >> As I mentioned in an earlier email about printing structures (thanks >> Darko), I'm in the midst of coding an interpreter embedded in >> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's >> JScheme for Java (well it was at first strongly based, but more and >> more loosely, if you know what I mean...) 
>> >> I expected that the performance of the interpreter would be much >> better in Modula-3 than in Java, and I have been testing on two >> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >> and the other is CM3 on a recent Debian system. What I am finding >> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting >> close to ten times as fast on some tasks at this point), but on >> Linux/CM3 it is much closer in speed to JScheme than I would like. >> >> When I started, with code that was essentially equivalent to JScheme, >> I found that it was a bit slower than JScheme on Linux/CM3 and >> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >> spend most of its time in (surprise, surprise!) memory allocation >> and garbage collection. The speedup I have achieved between the >> first implementation and now was due to the use of Modula-3 constructs >> that are superior to Java's, such as the use of arrays of RECORDs >> to make small stacks rather than linked lists. (I get readable >> code with much fewer memory allocations and GC work.) >> >> Now, since this is an interpreter, I as the implementer have limited >> control over how much memory is allocated and freed, and where it is >> needed. However, I can sometimes fall back on C-style memory >> management, >> but I would like to do it in a safe way. For instance, I have >> special-cased >> evaluation of Scheme primitives, as follows. >> >> Under the "normal" implementation, a list of things to evaluate is >> built up, passed to an evaluation function, and then the GC is left >> to sweep up the mess. The problem is that there are various tricky >> routes by which references can escape the evaluator, so you can't >> just assume that what you put in is going to be dead right after >> an eval and free it. 
Instead, I set a flag in the evaluator, which >> is TRUE if it is OK to free the list after the eval and FALSE if >> it's unclear (in which case the problem is left up to the GC). >> >> For the vast majority of Scheme primitives, one can indeed free the >> list right after the eval. Now of course I am not interested >> in unsafe code, so what I do is this: >> >> TYPE Pair = OBJECT first, rest : REFANY; END; >> >> VAR >> mu := NEW(MUTEX); >> free : Pair := NIL; >> >> PROCEDURE GetPair() : Pair = >> BEGIN >> LOCK mu DO >> IF free # NIL THEN >> TRY >> RETURN free >> FINALLY >> free := free.rest >> END >> END >> END; >> RETURN NEW(Pair) >> END GetPair; >> >> PROCEDURE ReturnPair(cons : Pair) = >> BEGIN >> cons.first := NIL; >> LOCK mu DO >> cons.rest := free; >> free := cons >> END >> END ReturnPair; >> >> my eval code looks like >> >> VAR okToFree : BOOLEAN; BEGIN >> >> args := GetPair(); ... >> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >> >> IF okToFree THEN ReturnPair(args) END; >> RETURN result >> END >> >> and this does work well. In fact it speeds up the Linux >> implementation >> by almost 100% to recycle the lists like this *just* for the >> evaluation of Scheme primitives. >> >> But it's still ugly, isn't it? There's a mutex, and a global >> variable. And yes, the time spent messing with the mutex is >> noticeable, and I haven't even made the code multi-threaded yet >> (and that is coming!) >> >> So I'm thinking, what I really want is a structure that is attached >> to my current Thread.T. I want to be able to access just a single >> pointer (like the free list) but be sure it is unique to my current >> thread. No locking would be necessary if I could do this. >> >> Does anyone have an elegant solution that does something like this? >> Thread-specific "static" variables? Just one REFANY would be enough >> for a lot of uses... seems to me this should be a frequently >> occurring problem? 
>>
>> Best regards,
>>    Mika
>>
>>
>>
>>
>>
>>
From jay.krell at cornell.edu Fri Oct 17 06:40:28 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 04:40:28 +0000
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
References: Your message of <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
Message-ID: 

Making this per-thread is a fairly classic good improvement.

You need to worry about what happens with many threads, and being sure to
cleanup when a thread dies, and allowing for a free to come in from any
thread.

A good way to mitigate all those problems is to use a small fixed size
cache instead of per-thread. Including an array of mutexes.

If "thread ids" have adequate distribution, just use their lower bits as an
array index. If not, have a global counter that gets assigned into the
thread on first use per-thread.

The cache could also be more than one element.

How do you manage okToFree?

Windows has __declspec(thread), which is an optimized form of
TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded .dlls
before Vista, and isn't __declspec(fiber) like maybe it should be.

 - Jay

----------------------------------------
> To: hosking at cs.purdue.edu
> Date: Thu, 16 Oct 2008 16:30:01 -0700
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
>
> Hi Tony,
>
> I figured you would chime in!
>
> Yes, @M3noincremental seems to make things consistently a tad bit
> slower (but a very small difference), on both FreeBSD and Linux.
> @M3nogc makes a bigger difference, of course.
>
> Unfortunately I seem to have lost the code that did a lot of memory
> allocations. My tricks (as described in the email---and others!)
> have removed most of the troublesome memory allocations, but now > I'm stuck with the mutex instead... > > Mika > > Tony Hosking writes: >>Have you tried running @M3noincremental? >> >>On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >> >>> Hello Modula-3 people, >>> >>> As I mentioned in an earlier email about printing structures (thanks >>> Darko), I'm in the midst of coding an interpreter embedded in >>> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's >>> JScheme for Java (well it was at first strongly based, but more and >>> more loosely, if you know what I mean...) >>> >>> I expected that the performance of the interpreter would be much >>> better in Modula-3 than in Java, and I have been testing on two >>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>> and the other is CM3 on a recent Debian system. What I am finding >>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting >>> close to ten times as fast on some tasks at this point), but on >>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>> >>> When I started, with code that was essentially equivalent to JScheme, >>> I found that it was a bit slower than JScheme on Linux/CM3 and >>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>> spend most of its time in (surprise, surprise!) memory allocation >>> and garbage collection. The speedup I have achieved between the >>> first implementation and now was due to the use of Modula-3 constructs >>> that are superior to Java's, such as the use of arrays of RECORDs >>> to make small stacks rather than linked lists. (I get readable >>> code with much fewer memory allocations and GC work.) >>> >>> Now, since this is an interpreter, I as the implementer have limited >>> control over how much memory is allocated and freed, and where it is >>> needed. However, I can sometimes fall back on C-style memory >>> management, >>> but I would like to do it in a safe way. 
For instance, I have >>> special-cased >>> evaluation of Scheme primitives, as follows. >>> >>> Under the "normal" implementation, a list of things to evaluate is >>> built up, passed to an evaluation function, and then the GC is left >>> to sweep up the mess. The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>> just assume that what you put in is going to be dead right after >>> an eval and free it. Instead, I set a flag in the evaluator, which >>> is TRUE if it is OK to free the list after the eval and FALSE if >>> it's unclear (in which case the problem is left up to the GC). >>> >>> For the vast majority of Scheme primitives, one can indeed free the >>> list right after the eval. Now of course I am not interested >>> in unsafe code, so what I do is this: >>> >>> TYPE Pair = OBJECT first, rest : REFANY; END; >>> >>> VAR >>> mu := NEW(MUTEX); >>> free : Pair := NIL; >>> >>> PROCEDURE GetPair() : Pair = >>> BEGIN >>> LOCK mu DO >>> IF free # NIL THEN >>> TRY >>> RETURN free >>> FINALLY >>> free := free.rest >>> END >>> END >>> END; >>> RETURN NEW(Pair) >>> END GetPair; >>> >>> PROCEDURE ReturnPair(cons : Pair) = >>> BEGIN >>> cons.first := NIL; >>> LOCK mu DO >>> cons.rest := free; >>> free := cons >>> END >>> END ReturnPair; >>> >>> my eval code looks like >>> >>> VAR okToFree : BOOLEAN; BEGIN >>> >>> args := GetPair(); ... >>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>> >>> IF okToFree THEN ReturnPair(args) END; >>> RETURN result >>> END >>> >>> and this does work well. In fact it speeds up the Linux >>> implementation >>> by almost 100% to recycle the lists like this *just* for the >>> evaluation of Scheme primitives. >>> >>> But it's still ugly, isn't it? There's a mutex, and a global >>> variable. And yes, the time spent messing with the mutex is >>> noticeable, and I haven't even made the code multi-threaded yet >>> (and that is coming!) 
>>> >>> So I'm thinking, what I really want is a structure that is attached >>> to my current Thread.T. I want to be able to access just a single >>> pointer (like the free list) but be sure it is unique to my current >>> thread. No locking would be necessary if I could do this. >>> >>> Does anyone have an elegant solution that does something like this? >>> Thread-specific "static" variables? Just one REFANY would be enough >>> for a lot of uses... seems to me this should be a frequently >>> occurring problem? >>> >>> Best regards, >>> Mika >>> >>> >>> >>> >>> >>> From mika at async.caltech.edu Fri Oct 17 08:32:15 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 16 Oct 2008 23:32:15 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000." Message-ID: <200810170632.m9H6WFHd078061@camembert.async.caltech.edu> Well, I was thinking of something even simpler. A Thread.T is an OBJECT. It's garbage collected just like any other object, is it not? Why can't the thing that makes new threads simply include a single globally visible field in every Thread.T, of type REFANY? Call it "data". Then you can always manipulate Thread.Self().data as you see fit without any need for locks. There can be no problem with this as long as it is always manipulated from within that thread. Of course this can be trivially encapsulated by not revealing "data" and indeed always accessing it as Thread.Self().data. You would not normally access this from any other thread. It's indeed only meant to be used in the idiom x := Allocate(); TRY DoSomething(x) FINALLY Free(x) END It's also not really a "Free" but just returning the object to a free list (there can be no unsafe behavior here). As a "nicer" interface, one could register routines with a public interface, asking it to manufacture some kind of thread globals. 
For maximum sanity, they would be visible inside the MODULE that
requested them, but I'm not sure how to accomplish this.  And of
course there's not much point in any of this unless it can be made
efficient or else a mutex plus a true global will work just as well.

What I'm talking about I guess could be done by hacking up Thread.Fork()
to return a subtype of Thread.T, but that won't work for the first
thread.  But with this method you could have arbitrary fields (and
methods) attached to a Thread.T.  How to collect everything you
need is a different story...

I'm not asking for a new language feature... really was just
wondering if anyone had tried anything like this before, and now
am rambling a bit.

     Mika

Jay writes:
>
>Making this per-thread is a fairly classic good improvement.
>
>You need to worry about what happens with many threads, and being sure to
>cleanup when a thread dies, and allowing for a free to come in from any
>thread.
>
>A good way to mitigate all those problems is to use a small fixed size
>cache instead of per-thread. Including an array of mutexes.
>
>If "thread ids" have adequate distribution, just use their lower bits as an
>array index. If not, have a global counter that gets assigned into the
>thread on first use per-thread.
>
>The cache could also be more than one element.
>
>How do you manage okToFree?
>
>Windows has __declspec(thread), which is an optimized form of
>TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded .dlls
>before Vista, and isn't __declspec(fiber) like maybe it should be.
>
> - Jay
>
>----------------------------------------
>> To: hosking at cs.purdue.edu
>> Date: Thu, 16 Oct 2008 16:30:01 -0700
>> From: mika at async.caltech.edu
>> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
>> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
>>
>> Hi Tony,
>>
>> I figured you would chime in!
>> >> Yes, @M3noincremental seems to make things consistently a tad bit >> slower (but a very small difference), on both FreeBSD and Linux. >> @M3nogc makes a bigger difference, of course. >> >> Unfortunately I seem to have lost the code that did a lot of memory >> allocations. My tricks (as described in the email---and others!) >> have removed most of the troublesome memory allocations, but now >> I'm stuck with the mutex instead... >> >> Mika >> >> Tony Hosking writes: >>>Have you tried running @M3noincremental? >>> >>>On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>> >>>> Hello Modula-3 people, >>>> >>>> As I mentioned in an earlier email about printing structures (thanks >>>> Darko), I'm in the midst of coding an interpreter embedded in >>>> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's >>>> JScheme for Java (well it was at first strongly based, but more and >>>> more loosely, if you know what I mean...) >>>> >>>> I expected that the performance of the interpreter would be much >>>> better in Modula-3 than in Java, and I have been testing on two >>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>> and the other is CM3 on a recent Debian system. What I am finding >>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting >>> close to ten times as fast on some tasks at this point), but on >>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>> >>>> When I started, with code that was essentially equivalent to JScheme, >>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>> spend most of its time in (surprise, surprise!) memory allocation >>>> and garbage collection. 
The speedup I have achieved between the >>>> first implementation and now was due to the use of Modula-3 constructs >>>> that are superior to Java's, such as the use of arrays of RECORDs >>>> to make small stacks rather than linked lists. (I get readable >>>> code with much fewer memory allocations and GC work.) >>>> >>>> Now, since this is an interpreter, I as the implementer have limited >>>> control over how much memory is allocated and freed, and where it is >>>> needed. However, I can sometimes fall back on C-style memory >>>> management, >>>> but I would like to do it in a safe way. For instance, I have >>>> special-cased >>>> evaluation of Scheme primitives, as follows. >>>> >>>> Under the "normal" implementation, a list of things to evaluate is >>>> built up, passed to an evaluation function, and then the GC is left >>>> to sweep up the mess. The problem is that there are various tricky >>>> routes by which references can escape the evaluator, so you can't >>>> just assume that what you put in is going to be dead right after >>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>> it's unclear (in which case the problem is left up to the GC). >>>> >>>> For the vast majority of Scheme primitives, one can indeed free the >>>> list right after the eval. 
Now of course I am not interested >>>> in unsafe code, so what I do is this: >>>> >>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>> >>>> VAR >>>> mu := NEW(MUTEX); >>>> free : Pair := NIL; >>>> >>>> PROCEDURE GetPair() : Pair = >>>> BEGIN >>>> LOCK mu DO >>>> IF free # NIL THEN >>>> TRY >>>> RETURN free >>>> FINALLY >>>> free := free.rest >>>> END >>>> END >>>> END; >>>> RETURN NEW(Pair) >>>> END GetPair; >>>> >>>> PROCEDURE ReturnPair(cons : Pair) = >>>> BEGIN >>>> cons.first := NIL; >>>> LOCK mu DO >>>> cons.rest := free; >>>> free := cons >>>> END >>>> END ReturnPair; >>>> >>>> my eval code looks like >>>> >>>> VAR okToFree : BOOLEAN; BEGIN >>>> >>>> args := GetPair(); ... >>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>> >>>> IF okToFree THEN ReturnPair(args) END; >>>> RETURN result >>>> END >>>> >>>> and this does work well. In fact it speeds up the Linux >>>> implementation >>>> by almost 100% to recycle the lists like this *just* for the >>>> evaluation of Scheme primitives. >>>> >>>> But it's still ugly, isn't it? There's a mutex, and a global >>>> variable. And yes, the time spent messing with the mutex is >>>> noticeable, and I haven't even made the code multi-threaded yet >>>> (and that is coming!) >>>> >>>> So I'm thinking, what I really want is a structure that is attached >>>> to my current Thread.T. I want to be able to access just a single >>>> pointer (like the free list) but be sure it is unique to my current >>>> thread. No locking would be necessary if I could do this. >>>> >>>> Does anyone have an elegant solution that does something like this? >>>> Thread-specific "static" variables? Just one REFANY would be enough >>>> for a lot of uses... seems to me this should be a frequently >>>> occurring problem? 
>>>> >>>> Best regards, >>>> Mika >>>> >>>> >>>> >>>> >>>> >>>> From hosking at cs.purdue.edu Fri Oct 17 08:35:03 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Fri, 17 Oct 2008 07:35:03 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu> References: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu> Message-ID: <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> I suspect part of the overhead of allocation in the new code is the need for thread-local allocation buffers, which means we need to access thread-local state. We really need an efficient way to do that, but pthreads thread-local accesses may be what is killing you. On 17 Oct 2008, at 00:30, Mika Nystrom wrote: > Hi Tony, > > I figured you would chime in! > > Yes, @M3noincremental seems to make things consistently a tad bit > slower (but a very small difference), on both FreeBSD and Linux. > @M3nogc makes a bigger difference, of course. > > Unfortunately I seem to have lost the code that did a lot of memory > allocations. My tricks (as described in the email---and others!) > have removed most of the troublesome memory allocations, but now > I'm stuck with the mutex instead... > > Mika > > Tony Hosking writes: >> Have you tried running @M3noincremental? >> >> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >> >>> Hello Modula-3 people, >>> >>> As I mentioned in an earlier email about printing structures (thanks >>> Darko), I'm in the midst of coding an interpreter embedded in >>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>> Norvig's >>> JScheme for Java (well it was at first strongly based, but more and >>> more loosely, if you know what I mean...) >>> >>> I expected that the performance of the interpreter would be much >>> better in Modula-3 than in Java, and I have been testing on two >>> different systems. 
One is my ancient FreeBSD-4.11 with an old PM3, >>> and the other is CM3 on a recent Debian system. What I am finding >>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>> (getting >>> close to ten times as fast on some tasks at this point), but on >>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>> >>> When I started, with code that was essentially equivalent to >>> JScheme, >>> I found that it was a bit slower than JScheme on Linux/CM3 and >>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>> spend most of its time in (surprise, surprise!) memory allocation >>> and garbage collection. The speedup I have achieved between the >>> first implementation and now was due to the use of Modula-3 >>> constructs >>> that are superior to Java's, such as the use of arrays of RECORDs >>> to make small stacks rather than linked lists. (I get readable >>> code with much fewer memory allocations and GC work.) >>> >>> Now, since this is an interpreter, I as the implementer have limited >>> control over how much memory is allocated and freed, and where it is >>> needed. However, I can sometimes fall back on C-style memory >>> management, >>> but I would like to do it in a safe way. For instance, I have >>> special-cased >>> evaluation of Scheme primitives, as follows. >>> >>> Under the "normal" implementation, a list of things to evaluate is >>> built up, passed to an evaluation function, and then the GC is left >>> to sweep up the mess. The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>> just assume that what you put in is going to be dead right after >>> an eval and free it. Instead, I set a flag in the evaluator, which >>> is TRUE if it is OK to free the list after the eval and FALSE if >>> it's unclear (in which case the problem is left up to the GC). >>> >>> For the vast majority of Scheme primitives, one can indeed free the >>> list right after the eval. 
Now of course I am not interested >>> in unsafe code, so what I do is this: >>> >>> TYPE Pair = OBJECT first, rest : REFANY; END; >>> >>> VAR >>> mu := NEW(MUTEX); >>> free : Pair := NIL; >>> >>> PROCEDURE GetPair() : Pair = >>> BEGIN >>> LOCK mu DO >>> IF free # NIL THEN >>> TRY >>> RETURN free >>> FINALLY >>> free := free.rest >>> END >>> END >>> END; >>> RETURN NEW(Pair) >>> END GetPair; >>> >>> PROCEDURE ReturnPair(cons : Pair) = >>> BEGIN >>> cons.first := NIL; >>> LOCK mu DO >>> cons.rest := free; >>> free := cons >>> END >>> END ReturnPair; >>> >>> my eval code looks like >>> >>> VAR okToFree : BOOLEAN; BEGIN >>> >>> args := GetPair(); ... >>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>> >>> IF okToFree THEN ReturnPair(args) END; >>> RETURN result >>> END >>> >>> and this does work well. In fact it speeds up the Linux >>> implementation >>> by almost 100% to recycle the lists like this *just* for the >>> evaluation of Scheme primitives. >>> >>> But it's still ugly, isn't it? There's a mutex, and a global >>> variable. And yes, the time spent messing with the mutex is >>> noticeable, and I haven't even made the code multi-threaded yet >>> (and that is coming!) >>> >>> So I'm thinking, what I really want is a structure that is attached >>> to my current Thread.T. I want to be able to access just a single >>> pointer (like the free list) but be sure it is unique to my current >>> thread. No locking would be necessary if I could do this. >>> >>> Does anyone have an elegant solution that does something like this? >>> Thread-specific "static" variables? Just one REFANY would be enough >>> for a lot of uses... seems to me this should be a frequently >>> occurring problem? >>> >>> Best regards, >>> Mika >>> >>> >>> >>> >>> >>> From mika at async.caltech.edu Fri Oct 17 08:50:13 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 16 Oct 2008 23:50:13 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? 
In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000."
Message-ID: <200810170650.m9H6oDU0078549@camembert.async.caltech.edu>

Jay writes:
...
>How do you manage okToFree?
...

I forgot to answer this q.

Well, the primitive evaluation in the interpreter is just a big CASE
statement. I really just look at where it references the list I am
making, and if it references the list at all in a branch, I insert
the code "okToFree := FALSE". The first two parameters are passed in
separately.

Here's the code... since you ask!

This is the code for the special case of a two-argument Scheme
procedure call, such as (+ x 1) .

PROCEDURE Apply2(t : T; interp : Scheme.T; a1, a2 : Object) : Object =
  VAR
    d1, d2 := GetCons();
    free := TRUE;
  BEGIN
    d1.first := a1;
    d1.rest := d2;
    d2.first := a2;
    d2.rest := NIL;
    WITH res = Prims(t, interp, d1, a1, a2, free) DO
      IF free THEN
        ReturnCons(d1); ReturnCons(d2)
      END;
      RETURN res
    END
  END Apply2;

PROCEDURE Prims(t : T;
                interp : Scheme.T;
                args, x, y : Object;
                VAR free : BOOLEAN) : Object =
  (* The (hopefully temporary) list of arguments is args.
     x and y are the first two elements of args *)
  BEGIN
    CASE VAL(t.idNumber,P) OF
      P.Eq => RETURN NumCompare(args, '=')
      (* known not to let args escape *)
    |
      P.List => free := FALSE; RETURN args
      (* args escapes, don't know whither *)
    |
      P.Car => RETURN PedanticFirst(x)
      (* doesn't even use args *)

    (* and about another 100 cases follow here *)
    END
  END Prims;

    Mika

From mika at async.caltech.edu Fri Oct 17 10:03:18 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 01:03:18 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST." <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu>
Message-ID: <200810170803.m9H83IIC080081@camembert.async.caltech.edu>

Ok this suggests that using thread local state to get around the
problem won't help either.

Can I ask a question... I am looking at ThreadPThread.m3...
Why do you have to lock the slotMu in Self()?

PROCEDURE Self (): T =
  (* If not the initial thread and not created by Fork, returns NIL *)
  (* LL = 0 *)
  VAR
    me := GetActivation();
    t: T;
  BEGIN
    IF me = NIL THEN RETURN NIL END;
    WITH r = Upthread.mutex_lock(slotMu) DO <*ASSERT r=0*> END;
    t := slots[me.slot];
    WITH r = Upthread.mutex_unlock(slotMu) DO <*ASSERT r=0*> END;
    IF (t.act # me) THEN Die(ThisLine(), "thread with bad slot!") END;
    RETURN t;
  END Self;

Is it just because of AssignSlots? If so.. it's actually a very rare
event that there would ever be a conflict, no? (Only when "slots" is
extended?)

Can data be stored in an "Activation"? Not TRACED data, obviously,
hmm...

    Mika

Tony Hosking writes:
>I suspect part of the overhead of allocation in the new code is the
>need for thread-local allocation buffers, which means we need to
>access thread-local state. We really need an efficient way to do
>that, but pthreads thread-local accesses may be what is killing you.
>
>On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>
>> Hi Tony,
>>
>> I figured you would chime in!
>>
>> Yes, @M3noincremental seems to make things consistently a tad bit
>> slower (but a very small difference), on both FreeBSD and Linux.
>> @M3nogc makes a bigger difference, of course.
>>
>> Unfortunately I seem to have lost the code that did a lot of memory
>> allocations. My tricks (as described in the email---and others!)
>> have removed most of the troublesome memory allocations, but now
>> I'm stuck with the mutex instead...
>>
>> Mika
>>
>> Tony Hosking writes:
>>> Have you tried running @M3noincremental?
>>>
>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>
>>>> Hello Modula-3 people,
>>>>
>>>> As I mentioned in an earlier email about printing structures (thanks
>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>> Modula-3.
It's a Scheme interpreter, loosely based on Peter >>>> Norvig's >>>> JScheme for Java (well it was at first strongly based, but more and >>>> more loosely, if you know what I mean...) >>>> >>>> I expected that the performance of the interpreter would be much >>>> better in Modula-3 than in Java, and I have been testing on two >>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>> and the other is CM3 on a recent Debian system. What I am finding >>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>> (getting >>>> close to ten times as fast on some tasks at this point), but on >>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>> >>>> When I started, with code that was essentially equivalent to >>>> JScheme, >>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>> spend most of its time in (surprise, surprise!) memory allocation >>>> and garbage collection. The speedup I have achieved between the >>>> first implementation and now was due to the use of Modula-3 >>>> constructs >>>> that are superior to Java's, such as the use of arrays of RECORDs >>>> to make small stacks rather than linked lists. (I get readable >>>> code with much fewer memory allocations and GC work.) >>>> >>>> Now, since this is an interpreter, I as the implementer have limited >>>> control over how much memory is allocated and freed, and where it is >>>> needed. However, I can sometimes fall back on C-style memory >>>> management, >>>> but I would like to do it in a safe way. For instance, I have >>>> special-cased >>>> evaluation of Scheme primitives, as follows. >>>> >>>> Under the "normal" implementation, a list of things to evaluate is >>>> built up, passed to an evaluation function, and then the GC is left >>>> to sweep up the mess. 
The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>>> just assume that what you put in is going to be dead right after >>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>> it's unclear (in which case the problem is left up to the GC). >>>> >>>> For the vast majority of Scheme primitives, one can indeed free the >>>> list right after the eval. Now of course I am not interested >>>> in unsafe code, so what I do is this: >>>> >>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>> >>>> VAR >>>> mu := NEW(MUTEX); >>>> free : Pair := NIL; >>>> >>>> PROCEDURE GetPair() : Pair = >>>> BEGIN >>>> LOCK mu DO >>>> IF free # NIL THEN >>>> TRY >>>> RETURN free >>>> FINALLY >>>> free := free.rest >>>> END >>>> END >>>> END; >>>> RETURN NEW(Pair) >>>> END GetPair; >>>> >>>> PROCEDURE ReturnPair(cons : Pair) = >>>> BEGIN >>>> cons.first := NIL; >>>> LOCK mu DO >>>> cons.rest := free; >>>> free := cons >>>> END >>>> END ReturnPair; >>>> >>>> my eval code looks like >>>> >>>> VAR okToFree : BOOLEAN; BEGIN >>>> >>>> args := GetPair(); ... >>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>> >>>> IF okToFree THEN ReturnPair(args) END; >>>> RETURN result >>>> END >>>> >>>> and this does work well. In fact it speeds up the Linux >>>> implementation >>>> by almost 100% to recycle the lists like this *just* for the >>>> evaluation of Scheme primitives. >>>> >>>> But it's still ugly, isn't it? There's a mutex, and a global >>>> variable. And yes, the time spent messing with the mutex is >>>> noticeable, and I haven't even made the code multi-threaded yet >>>> (and that is coming!) >>>> >>>> So I'm thinking, what I really want is a structure that is attached >>>> to my current Thread.T. I want to be able to access just a single >>>> pointer (like the free list) but be sure it is unique to my current >>>> thread. 
No locking would be necessary if I could do this. >>>> >>>> Does anyone have an elegant solution that does something like this? >>>> Thread-specific "static" variables? Just one REFANY would be enough >>>> for a lot of uses... seems to me this should be a frequently >>>> occurring problem? >>>> >>>> Best regards, >>>> Mika >>>> >>>> >>>> >>>> >>>> >>>> From mika at async.caltech.edu Fri Oct 17 10:32:28 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Fri, 17 Oct 2008 01:32:28 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST." <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> Message-ID: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> Ok I am sorry I am slow to pick up on this. I take it the problem is actually the Upthread.getspecific routine, which itself calls something get_curthread somewhere inside pthreads, which in turn involves a context switch to the supervisor---the identity of the current thread is just not accessible anywhere in user space. Also explains why this program runs faster with my old PM3, which uses longjmp threads. The only way to avoid it (really) is to pass a pointer to the Thread.T of the currently executing thread in the activation record of *every* procedure, so that allocators can find it when necessary.... but that is very expensive in terms of stack memory. Or I can just make a structure like that that I pass around where I need it in my own program. Thread-specific and user-managed. I believe I have just answered all my own questions, but I hope Tony will correct me if my answers are incorrect. Mika Tony Hosking writes: >I suspect part of the overhead of allocation in the new code is the >need for thread-local allocation buffers, which means we need to >access thread-local state. We really need an efficient way to do >that, but pthreads thread-local accesses may be what is killing you. 
> >On 17 Oct 2008, at 00:30, Mika Nystrom wrote: > >> Hi Tony, >> >> I figured you would chime in! >> >> Yes, @M3noincremental seems to make things consistently a tad bit >> slower (but a very small difference), on both FreeBSD and Linux. >> @M3nogc makes a bigger difference, of course. >> >> Unfortunately I seem to have lost the code that did a lot of memory >> allocations. My tricks (as described in the email---and others!) >> have removed most of the troublesome memory allocations, but now >> I'm stuck with the mutex instead... >> >> Mika >> >> Tony Hosking writes: >>> Have you tried running @M3noincremental? >>> >>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>> >>>> Hello Modula-3 people, >>>> >>>> As I mentioned in an earlier email about printing structures (thanks >>>> Darko), I'm in the midst of coding an interpreter embedded in >>>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>>> Norvig's >>>> JScheme for Java (well it was at first strongly based, but more and >>>> more loosely, if you know what I mean...) >>>> >>>> I expected that the performance of the interpreter would be much >>>> better in Modula-3 than in Java, and I have been testing on two >>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>> and the other is CM3 on a recent Debian system. What I am finding >>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>> (getting >>>> close to ten times as fast on some tasks at this point), but on >>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>> >>>> When I started, with code that was essentially equivalent to >>>> JScheme, >>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>> spend most of its time in (surprise, surprise!) memory allocation >>>> and garbage collection. 
The speedup I have achieved between the >>>> first implementation and now was due to the use of Modula-3 >>>> constructs >>>> that are superior to Java's, such as the use of arrays of RECORDs >>>> to make small stacks rather than linked lists. (I get readable >>>> code with much fewer memory allocations and GC work.) >>>> >>>> Now, since this is an interpreter, I as the implementer have limited >>>> control over how much memory is allocated and freed, and where it is >>>> needed. However, I can sometimes fall back on C-style memory >>>> management, >>>> but I would like to do it in a safe way. For instance, I have >>>> special-cased >>>> evaluation of Scheme primitives, as follows. >>>> >>>> Under the "normal" implementation, a list of things to evaluate is >>>> built up, passed to an evaluation function, and then the GC is left >>>> to sweep up the mess. The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>>> just assume that what you put in is going to be dead right after >>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>> it's unclear (in which case the problem is left up to the GC). >>>> >>>> For the vast majority of Scheme primitives, one can indeed free the >>>> list right after the eval. 
Now of course I am not interested >>>> in unsafe code, so what I do is this: >>>> >>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>> >>>> VAR >>>> mu := NEW(MUTEX); >>>> free : Pair := NIL; >>>> >>>> PROCEDURE GetPair() : Pair = >>>> BEGIN >>>> LOCK mu DO >>>> IF free # NIL THEN >>>> TRY >>>> RETURN free >>>> FINALLY >>>> free := free.rest >>>> END >>>> END >>>> END; >>>> RETURN NEW(Pair) >>>> END GetPair; >>>> >>>> PROCEDURE ReturnPair(cons : Pair) = >>>> BEGIN >>>> cons.first := NIL; >>>> LOCK mu DO >>>> cons.rest := free; >>>> free := cons >>>> END >>>> END ReturnPair; >>>> >>>> my eval code looks like >>>> >>>> VAR okToFree : BOOLEAN; BEGIN >>>> >>>> args := GetPair(); ... >>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>> >>>> IF okToFree THEN ReturnPair(args) END; >>>> RETURN result >>>> END >>>> >>>> and this does work well. In fact it speeds up the Linux >>>> implementation >>>> by almost 100% to recycle the lists like this *just* for the >>>> evaluation of Scheme primitives. >>>> >>>> But it's still ugly, isn't it? There's a mutex, and a global >>>> variable. And yes, the time spent messing with the mutex is >>>> noticeable, and I haven't even made the code multi-threaded yet >>>> (and that is coming!) >>>> >>>> So I'm thinking, what I really want is a structure that is attached >>>> to my current Thread.T. I want to be able to access just a single >>>> pointer (like the free list) but be sure it is unique to my current >>>> thread. No locking would be necessary if I could do this. >>>> >>>> Does anyone have an elegant solution that does something like this? >>>> Thread-specific "static" variables? Just one REFANY would be enough >>>> for a lot of uses... seems to me this should be a frequently >>>> occurring problem? 
>>>>
>>>> Best regards,
>>>> Mika
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>

From jay.krell at cornell.edu Sat Oct 18 00:42:35 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 22:42:35 +0000
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
References: Your message of <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
Message-ID:

Right and wrong.

Right Tony was referring to Upthread.getspecific. Or on Windows WinBase.TlsGetValue.
Wrong that this necessarily incurs a switch to the supervisor/kernel, and perhaps wrong to call that a "context switch". It depends on the operating system.

I will explain.

On Windows/x86, the FS register points to a partly documented per-thread data structure. C and C++ exception handling use FS:0. Disassemble any code. You'll find it is used. Not by Modula-3 though.

Disassemble TlsGetValue.

cdb /z %windir%\system32\kernel32.dll

0:000> uf kernel32!TlsGetValue
kernel32!TlsGetValue:

typical looking prolog..
7dd813e0 8bff            mov     edi,edi
7dd813e2 55              push    ebp
7dd813e3 8bec            mov     ebp,esp

fs:18 contains a "normal" "linear" pointer to fs:0
Get that pointer.
7dd813e5 64a118000000    mov     eax,dword ptr fs:[00000018h]

get the index
7dd813eb 8b4d08          mov     ecx,dword ptr [ebp+8]

SetLastError(0)
7dd813ee 83603400        and     dword ptr [eax+34h],0

There are 64 preallocated thread local slots -- compare the index to 64.
7dd813f2 83f940          cmp     ecx,40h

If it is above or equal to 64, go use the non preallocated slots.
7dd813f5 0f8353e20200    jae     kernel32!lstrcmpi+0x4b22 (7ddaf64e)

preallocated slots are at fs:e10; get the data and done
kernel32!TlsGetValue+0x1b:
7dd813fb 8b8488100e0000  mov     eax,dword ptr [eax+ecx*4+0E10h]

epilog
kernel32!TlsGetValue+0x22:
7dd81402 5d              pop     ebp
7dd81403 c20400          ret     4

get here for indices >= 64
compare index to 1088 == 1024 + 64, as there are another 1024 more slowly available slots
kernel32!lstrcmpi+0x4b22:
7ddaf64e 81f940040000    cmp     ecx,440h

if it is below 1088, go use those slots
7ddaf654 7211            jb      kernel32!lstrcmpi+0x4b3b (7ddaf667)

index is above or equal to 1088, SetLastError(invalid parameter)
kernel32!lstrcmpi+0x4b2a:
7ddaf656 680d0000c0      push    0C000000Dh
7ddaf65b e80025fdff      call    kernel32!GetProcessHeap+0x12 (7dd81b60)

and return 0 -- 0 is not unambiguously an error -- that's why last error was cleared at the start
kernel32!lstrcmpi+0x4b34:
7ddaf660 33c0            xor     eax,eax
7ddaf662 e99b1dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)

This is where the slots between 64 and 1088 are used.
Get pointer from FS:F94 and compare to null.
If it is null, that is ok, it means nobody has yet called TlsSetValue for this value, so it just retains its initial 0 value.
kernel32!lstrcmpi+0x4b3b:
7ddaf667 8b80940f0000    mov     eax,dword ptr [eax+0F94h]
7ddaf66d 85c0            test    eax,eax
7ddaf66f 74ef            je      kernel32!lstrcmpi+0x4b34 (7ddaf660)

Index is between 64 and 1088, and there is a non null pointer at FS:F94.
Subtract 64 from index and index into pointer there.
Note it does the subtraction after the multiplication, so subtracts 64*4=0x100.
kernel32!lstrcmpi+0x4b45:
7ddaf671 8b848800ffffff  mov     eax,dword ptr [eax+ecx*4-100h]
7ddaf678 e9851dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)

So, it is a few instructions but there is no context switch into the kernel/supervisor.

Also, calls into the kernel aren't necessarily a "context switch".
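To make the control flow of the listing above easier to follow, here is a rough C model of the lookup it performs. This is only a sketch reconstructed from the disassembly: the struct and field names (ThreadBlock, inline_slots, expansion_slots, last_error) are invented for illustration and are not the real Windows definitions. Only the limits 64 and 1088 and the pushed status value come from the listing.

```c
#include <stddef.h>

enum { INLINE_SLOTS = 64, EXPANSION_SLOTS = 1024 };

/* Status value pushed in the listing; the model just records it as-is. */
#define MODEL_INVALID_PARAMETER 0xC000000Du

/* Hypothetical stand-in for the per-thread block that FS points at. */
typedef struct ThreadBlock {
    unsigned last_error;
    void *inline_slots[INLINE_SLOTS];   /* the preallocated slots */
    void **expansion_slots;             /* NULL until first used */
} ThreadBlock;

void *model_tls_get_value(ThreadBlock *tb, unsigned index) {
    /* Clear the error first, because a NULL result is ambiguous. */
    tb->last_error = 0;

    /* Fast path: indices 0..63 live directly in the thread block. */
    if (index < INLINE_SLOTS) {
        return tb->inline_slots[index];
    }

    /* Indices at or above 1088 are invalid. */
    if (index >= INLINE_SLOTS + EXPANSION_SLOTS) {
        tb->last_error = MODEL_INVALID_PARAMETER;
        return NULL;
    }

    /* Indices 64..1087: the expansion array may not exist yet, in
       which case the slot still holds its initial NULL value. */
    if (tb->expansion_slots == NULL) {
        return NULL;
    }
    return tb->expansion_slots[index - INLINE_SLOTS];
}
```

Every path is plain user-mode memory access, which is the point of the walkthrough: nothing here needs a transition into the kernel.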
Some context is saved, and a bit is twiddled in the processor to indicate a privilege level change, but no page tables are altered and I believe no TLBs (translation lookaside buffers) are invalidated, and no thread scheduling decisions are made -- though upon exit from the kernel, APCs (asynchronous procedure calls) can be run -- on the calling thread.

A more expensive context switch is when another thread or another process runs.
Switching threads requires saving more context, and switching processes requires changing the register that points to the page tables.

One detail there -- calling into the x86 NT kernel does not preserve floating point state -- that's the additional state that a thread switch has to save, at least. NT/x86 kernel drivers aren't allowed to use floating point, with some exceptions, like if they are video drivers (only certain functions?) or they explicitly save/restore the floating point registers using public functions. I don't know about the other architectures. I think IA64 only preserves some floating point state, not all.

Now, the question then is how is Upthread.getspecific implemented on other architectures and operating systems.
We should look into that for various operating systems.

Oh, also, let's see what __declspec(thread) does.

>type t.c
__declspec(thread) int a;
void F1(int);
void F2()
{
    F1(a);
}

cl -c t.c
link -dump -disasm t.obj

Dump of file t.obj

File Type: COFF OBJECT

_F2:
  00000000: 55                 push        ebp
  00000001: 8B EC              mov         ebp,esp
  00000003: A1 00 00 00 00     mov         eax,dword ptr [__tls_index]
  00000008: 64 8B 0D 00 00 00  mov         ecx,dword ptr fs:[__tls_array]
            00
  0000000F: 8B 14 81           mov         edx,dword ptr [ecx+eax*4]
  00000012: 8B 82 00 00 00 00  mov         eax,dword ptr _a[edx]
  00000018: 50                 push        eax
  00000019: E8 00 00 00 00     call        _F1
  0000001E: 83 C4 04           add         esp,4
  00000021: 5D                 pop         ebp
  00000022: C3                 ret

See the compiler generated code reference FS directly.
The optimized version is:

Dump of file t.obj

File Type: COFF OBJECT

_F2:
  00000000: A1 00 00 00 00     mov         eax,dword ptr [__tls_index]
  00000005: 64 8B 0D 00 00 00  mov         ecx,dword ptr fs:[__tls_array]
            00
  0000000C: 8B 14 81           mov         edx,dword ptr [ecx+eax*4]
  0000000F: 8B 82 00 00 00 00  mov         eax,dword ptr _a[edx]
  00000015: 50                 push        eax
  00000016: E8 00 00 00 00     call        _F1
  0000001B: 59                 pop         ecx
  0000001C: C3                 ret

 - Jay

> To: hosking at cs.purdue.edu
> Date: Fri, 17 Oct 2008 01:32:28 -0700
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
>
> Ok I am sorry I am slow to pick up on this.
>
> I take it the problem is actually the Upthread.getspecific routine,
> which itself calls something get_curthread somewhere inside pthreads,
> which in turn involves a context switch to the supervisor---the identity
> of the current thread is just not accessible anywhere in user space.
> Also explains why this program runs faster with my old PM3, which uses
> longjmp threads.
>
> The only way to avoid it (really) is to pass a pointer to the
> Thread.T of the currently executing thread in the activation record
> of *every* procedure, so that allocators can find it when necessary....
> but that is very expensive in terms of stack memory.
>
> Or I can just make a structure like that that I pass around where
> I need it in my own program. Thread-specific and user-managed.
>
> I believe I have just answered all my own questions, but I hope
> Tony will correct me if my answers are incorrect.
>
> Mika
>
> Tony Hosking writes:
>>I suspect part of the overhead of allocation in the new code is the
>>need for thread-local allocation buffers, which means we need to
>>access thread-local state. We really need an efficient way to do
>>that, but pthreads thread-local accesses may be what is killing you.
>> >>On 17 Oct 2008, at 00:30, Mika Nystrom wrote: >> >>> Hi Tony, >>> >>> I figured you would chime in! >>> >>> Yes, @M3noincremental seems to make things consistently a tad bit >>> slower (but a very small difference), on both FreeBSD and Linux. >>> @M3nogc makes a bigger difference, of course. >>> >>> Unfortunately I seem to have lost the code that did a lot of memory >>> allocations. My tricks (as described in the email---and others!) >>> have removed most of the troublesome memory allocations, but now >>> I'm stuck with the mutex instead... >>> >>> Mika >>> >>> Tony Hosking writes: >>>> Have you tried running @M3noincremental? >>>> >>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>>> >>>>> Hello Modula-3 people, >>>>> >>>>> As I mentioned in an earlier email about printing structures (thanks >>>>> Darko), I'm in the midst of coding an interpreter embedded in >>>>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>>>> Norvig's >>>>> JScheme for Java (well it was at first strongly based, but more and >>>>> more loosely, if you know what I mean...) >>>>> >>>>> I expected that the performance of the interpreter would be much >>>>> better in Modula-3 than in Java, and I have been testing on two >>>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>>> and the other is CM3 on a recent Debian system. What I am finding >>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>>> (getting >>>>> close to ten times as fast on some tasks at this point), but on >>>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>>> >>>>> When I started, with code that was essentially equivalent to >>>>> JScheme, >>>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>>> spend most of its time in (surprise, surprise!) memory allocation >>>>> and garbage collection. 
The speedup I have achieved between the >>>>> first implementation and now was due to the use of Modula-3 >>>>> constructs >>>>> that are superior to Java's, such as the use of arrays of RECORDs >>>>> to make small stacks rather than linked lists. (I get readable >>>>> code with much fewer memory allocations and GC work.) >>>>> >>>>> Now, since this is an interpreter, I as the implementer have limited >>>>> control over how much memory is allocated and freed, and where it is >>>>> needed. However, I can sometimes fall back on C-style memory >>>>> management, >>>>> but I would like to do it in a safe way. For instance, I have >>>>> special-cased >>>>> evaluation of Scheme primitives, as follows. >>>>> >>>>> Under the "normal" implementation, a list of things to evaluate is >>>>> built up, passed to an evaluation function, and then the GC is left >>>>> to sweep up the mess. The problem is that there are various tricky >>>> routes by which references can escape the evaluator, so you can't >>>>> just assume that what you put in is going to be dead right after >>>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>>> it's unclear (in which case the problem is left up to the GC). >>>>> >>>>> For the vast majority of Scheme primitives, one can indeed free the >>>>> list right after the eval. 
Now of course I am not interested >>>>> in unsafe code, so what I do is this: >>>>> >>>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>>> >>>>> VAR >>>>> mu := NEW(MUTEX); >>>>> free : Pair := NIL; >>>>> >>>>> PROCEDURE GetPair() : Pair = >>>>> BEGIN >>>>> LOCK mu DO >>>>> IF free # NIL THEN >>>>> TRY >>>>> RETURN free >>>>> FINALLY >>>>> free := free.rest >>>>> END >>>>> END >>>>> END; >>>>> RETURN NEW(Pair) >>>>> END GetPair; >>>>> >>>>> PROCEDURE ReturnPair(cons : Pair) = >>>>> BEGIN >>>>> cons.first := NIL; >>>>> LOCK mu DO >>>>> cons.rest := free; >>>>> free := cons >>>>> END >>>>> END ReturnPair; >>>>> >>>>> my eval code looks like >>>>> >>>>> VAR okToFree : BOOLEAN; BEGIN >>>>> >>>>> args := GetPair(); ... >>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>>> >>>>> IF okToFree THEN ReturnPair(args) END; >>>>> RETURN result >>>>> END >>>>> >>>>> and this does work well. In fact it speeds up the Linux >>>>> implementation >>>>> by almost 100% to recycle the lists like this *just* for the >>>>> evaluation of Scheme primitives. >>>>> >>>>> But it's still ugly, isn't it? There's a mutex, and a global >>>>> variable. And yes, the time spent messing with the mutex is >>>>> noticeable, and I haven't even made the code multi-threaded yet >>>>> (and that is coming!) >>>>> >>>>> So I'm thinking, what I really want is a structure that is attached >>>>> to my current Thread.T. I want to be able to access just a single >>>>> pointer (like the free list) but be sure it is unique to my current >>>>> thread. No locking would be necessary if I could do this. >>>>> >>>>> Does anyone have an elegant solution that does something like this? >>>>> Thread-specific "static" variables? Just one REFANY would be enough >>>>> for a lot of uses... seems to me this should be a frequently >>>>> occurring problem? 
>>>>>
>>>>> Best regards,
>>>>> Mika
>>>>>

From mika at async.caltech.edu Sat Oct 18 01:00:28 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 16:00:28 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 22:42:35 -0000."
Message-ID: <200810172300.m9HN0SfN008554@camembert.async.caltech.edu>

No, I didn't mean that it *necessarily* involves a context switch. Obviously it doesn't, because the user-level threading doesn't ever need to do a "kernel" context switch (but of course does its own switching, however I don't see that it would need that to get or set a variable). I just meant that looking at the (C) implementation of pthreads I have (on FreeBSD), on that system, it does seem to, as the code in question is marked as "kernel code".

In any case I think I have been able to solve my particular problem by identifying a data structure that is inherently only accessed from a single thread (in my program) and attaching my memory recycling trickery to that particular structure. I get very little memory allocation/GC and no need for locks at all, which is precisely the effect I was going for.

I am still a little bit concerned about the performance of CM3-generated code but the main culprit appears to be TYPECASE/ISTYPE now, far from garbage collectors and thread libraries. I'll send an update if I can find something egregiously inefficient.

    Mika

Jay writes:
>
>Right and wrong.
>
>Right Tony was referring to Upthread.getspecific. Or on Windows WinBase.TlsGetValue.
>Wrong that this necessarily incurs a switch to the supervisor/kernel, and perhaps
>wrong to call that a "context switch". It depends on the operating system.
>
>I will explain.
>
>On Windows/x86, the FS register points to a partly documented per-thread data
>structure.
>C and C++ exception handling use FS:0.
>Disassemble any code. You'll find it is used. Not by Modula-3 though.
Not by Modula-3 though. > >Disassemble TlsGetValue. > > cdb /z %windir%\system32\kernel32.dll > >0:000> uf kernel32!TlsGetValue >kernel32!TlsGetValue: ... From mika at async.caltech.edu Sat Oct 18 10:41:30 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Sat, 18 Oct 2008 01:41:30 -0700 Subject: [M3devel] Fortran Message-ID: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu> Ok now in the realm of crazy questions---and I apologize to those whose inboxes I clog with some of my emails... If there is anyone out there in Modula-3-ether who has ever written or heard of ... an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs ... would he please make himself known to me? (I have a Scheme interpreter to trade...) Mika From lemming at henning-thielemann.de Sat Oct 18 17:34:50 2008 From: lemming at henning-thielemann.de (Henning Thielemann) Date: Sat, 18 Oct 2008 17:34:50 +0200 (MEST) Subject: [M3devel] Fortran In-Reply-To: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu> References: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu> Message-ID: On Sat, 18 Oct 2008, Mika Nystrom wrote: > Ok now in the realm of crazy questions---and I apologize to those > whose inboxes I clog with some of my emails... > > If there is anyone out there in Modula-3-ether who has ever written > or heard of ... > > an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs > > ... would he please make himself known to me? (I have a Scheme > interpreter to trade...) I have written a program for generating Modula-3 interfaces for LAPACK (linear algebra routines) using m3coco. But I'm afraid that my Fortran parser works only for LAPACK and no other library. I have just copied the CVS files to http://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/m3/pm3/language/parsing/m3coco/test/?cvsroot=PM3 Before you check this out, I might move it to a different location, maybe cm3/m3-tools, if this is more appropriate. 
(Maybe you also need the revised m3coco version, which I only have on a branch, and never tried to merge it back to HEAD.) While searching my own code in the net, I found some nice interviews with Luca Cardelli: http://www.wikio.com/technology/development/modula-3 From mika at async.caltech.edu Tue Oct 21 13:05:01 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Tue, 21 Oct 2008 04:05:01 -0700 Subject: [M3devel] CM3 on Mac OS X Tiger Message-ID: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu> Hello everyone, Sorry if I have asked this before---I feel I must have, and Tony probably answered it, too, but I can't find it anywhere in my email archives. It looks like I finally upgraded my Mac to Tiger a half year ago, and everything broke. (Modula-3, emacs, make, etc etc etc etc.) I am finally getting around to fixing it. Now I am trying to compile CM3 in accordance with Tony's instructions as of June 24, 2007: (short quote here) > cd ~/cm3-cvs > mkdir boot > cd boot > tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz > ./cminstall Now you will have some kind of cm3 installed, presumably in /usr/ local/cm3/bin/cm3. Make sure you have a fresh CVS checkout in directory cm3 (let's assume this is in your home directory ~/cm3). Also, make sure you have an up-to-date version of the CM3 backend compiler cm3cg installed by executing the following: STEP 0: export CM3=/usr/local/cm3/bin/cm3 cd ~/cm3/m3-sys/m3cc $CM3 $CM3 -ship You can skip this last step if you know your backend compiler is up to date. 
Now, let's build the new compiler from scratch (this is the sequence I use regularly to test changes to the run-time system whenever I make them): STEP 1: cd ~/cm3/m3-libs/m3core $CM3 $CM3 -ship (end short quote, there's much more) What happens is that when building m3core, my compiler is building it against the interfaces in /usr/local/cm3, NOT the interfaces within m3core itself: --- building in PPC_DARWIN --- ignoring ../src/m3overrides new source -> compiling RTCollector.m3 "../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX) "../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX) "../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD) "../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD) 4 errors encountered stale imports -> compiling RTDebug.m3 Fatal Error: bad version stamps: RTDebug.m3 version stamp mismatch: Compiler.Platform => RTDebug.m3 => Compiler.i3 version stamp mismatch: Compiler.ThisPlatform <8b5a6f513e082750> => RTDebug.m3 <8e110d4fed998051> => Compiler.i3 I feel like I should REALLY know the answer to this, but how do I get the compiler to use only the local sources and not attempt to compile things with reference to the already-installed interfaces? Mika From hosking at cs.purdue.edu Tue Oct 21 13:21:36 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Tue, 21 Oct 2008 12:21:36 +0100 Subject: [M3devel] CM3 on Mac OS X Tiger In-Reply-To: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu> References: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu> Message-ID: <27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu> This is a phase ordering problem that arises when you use an old compiler to compile newer sources. It really should be fixed somehow. 
In any case, the problem is those lines in RTCollector at the bottom (I deleted them yesterday on the main trunk) that refer to values supposedly built in to the compiler (which are not there for the old binary you are using). I think if you delete those lines then you should be OK. Once you have a new compiler bootstrapped (with those configuration values available built in) then you should be able to compile that code (excepting that I just deleted those lines yesterday). On 21 Oct 2008, at 12:05, Mika Nystrom wrote: > Hello everyone, > > Sorry if I have asked this before---I feel I must have, and Tony > probably answered it, too, but I can't find it anywhere in my email > archives. > > It looks like I finally upgraded my Mac to Tiger a half year ago, > and everything broke. (Modula-3, emacs, make, etc etc etc etc.) > I am finally getting around to fixing it. Now I am trying to > compile CM3 in accordance with Tony's instructions as of June 24, > 2007: > > (short quote here) >> cd ~/cm3-cvs >> mkdir boot >> cd boot >> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz >> ./cminstall > > Now you will have some kind of cm3 installed, presumably in /usr/ > local/cm3/bin/cm3. > > Make sure you have a fresh CVS checkout in directory cm3 (let's > assume this is in your home directory ~/cm3). Also, make sure you > have an up-to-date version of the CM3 backend compiler cm3cg > installed by executing the following: > > STEP 0: > > export CM3=/usr/local/cm3/bin/cm3 > cd ~/cm3/m3-sys/m3cc > $CM3 > $CM3 -ship > > You can skip this last step if you know your backend compiler is up > to date. 
> > Now, let's build the new compiler from scratch (this is the sequence > I use regularly to test changes to the run-time system whenever I > make them): > > STEP 1: > > cd ~/cm3/m3-libs/m3core > $CM3 > $CM3 -ship > (end short quote, there's much more) > > What happens is that when building m3core, my compiler is building > it against the interfaces in /usr/local/cm3, NOT the interfaces > within m3core itself: > > --- building in PPC_DARWIN --- > > ignoring ../src/m3overrides > > new source -> compiling RTCollector.m3 > "../src/runtime/common/RTCollector.m3", line 2914: unknown > qualification '.' (AMD64_LINUX) > "../src/runtime/common/RTCollector.m3", line 2915: unknown > qualification '.' (SPARC32_LINUX) > "../src/runtime/common/RTCollector.m3", line 2916: unknown > qualification '.' (SPARC64_OPENBSD) > "../src/runtime/common/RTCollector.m3", line 2917: unknown > qualification '.' (PPC32_OPENBSD) > 4 errors encountered > stale imports -> compiling RTDebug.m3 > > Fatal Error: bad version stamps: RTDebug.m3 > > version stamp mismatch: Compiler.Platform > => RTDebug.m3 > => Compiler.i3 > version stamp mismatch: Compiler.ThisPlatform > <8b5a6f513e082750> => RTDebug.m3 > <8e110d4fed998051> => Compiler.i3 > > I feel like I should REALLY know the answer to this, but how do I > get the compiler to use only the local sources and not attempt > to compile things with reference to the already-installed > interfaces? > > Mika From hosking at cs.purdue.edu Tue Oct 21 16:54:58 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Tue, 21 Oct 2008 15:54:58 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> Message-ID: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu> I have one more question that I forgot to ask before. Did you evaluate performance with -O3 optimization in the backend? 

Generally, I have the following in my m3_backend specs so that turning on optimization results in -O3 (and lots of lovely inlining):

proc m3_backend (source, object, optimize, debug) is
  local args =
  [
    "-m32",
    "-quiet",
    source,
    "-o",
    object,
    % fPIC really is needed here, despite man gcc saying it is the default.
    % This is because man gcc is about Apple's gcc but m3cg is
    % built from FSF source.
    "-fPIC",
    "-fno-reorder-blocks"
  ]
  if optimize args += "-O3" end
  if debug args += "-gstabs" end
  if M3_PROFILING args += "-p" end
  return try_exec (m3back, args)
end

On 17 Oct 2008, at 09:32, Mika Nystrom wrote:

> Ok I am sorry I am slow to pick up on this.
>
> I take it the problem is actually the Upthread.getspecific routine,
> which itself calls something get_curthread somewhere inside pthreads,
> which in turn involves a context switch to the supervisor---the identity
> of the current thread is just not accessible anywhere in user space.
> Also explains why this program runs faster with my old PM3, which uses
> longjmp threads.
>
> The only way to avoid it (really) is to pass a pointer to the
> Thread.T of the currently executing thread in the activation record
> of *every* procedure, so that allocators can find it when necessary....
> but that is very expensive in terms of stack memory.
>
> Or I can just make a structure like that that I pass around where
> I need it in my own program. Thread-specific and user-managed.
>
> I believe I have just answered all my own questions, but I hope
> Tony will correct me if my answers are incorrect.
>
> Mika
>
> Tony Hosking writes:
>> I suspect part of the overhead of allocation in the new code is the
>> need for thread-local allocation buffers, which means we need to
>> access thread-local state. We really need an efficient way to do
>> that, but pthreads thread-local accesses may be what is killing you.
>>
>> On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>>
>>> Hi Tony,
>>>
>>> I figured you would chime in!
>>>>> >>>>> Best regards, >>>>> Mika >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> From hosking at cs.purdue.edu Tue Oct 21 17:17:24 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Tue, 21 Oct 2008 16:17:24 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu> References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu> Message-ID: <1396C14A-B23D-4D19-804B-B1627B44106F@cs.purdue.edu> Also, turn off assertions. On 21 Oct 2008, at 15:54, Tony Hosking wrote: > I have one more question that I forgot to ask before. Did you > evaluate performance with -O3 optimization in the backend? > > Generally, I have the following in my m3_backend specs so that > turning on optimization results in -O3 (and lots of lovely inlining): > > proc m3_backend (source, object, optimize, debug) is > local args = > [ > "-m32", > "-quiet", > source, > "-o", > object, > % fPIC really is needed here, despite man gcc saying it is the > default. > % This is because man gcc is about Apple's gcc but m3cg is > % built from FSF source. > "-fPIC", > "-fno-reorder-blocks" > ] > if optimize args += "-O3" end > if debug args += "-gstabs" end > if M3_PROFILING args += "-p" end > return try_exec (m3back, args) > end > > > On 17 Oct 2008, at 09:32, Mika Nystrom wrote: > >> Ok I am sorry I am slow to pick up on this. >> >> I take it the problem is actually the Upthread.getspecific routine, >> which itself calls something get_curthread somewhere inside pthreads, >> which in turn involves a context switch to the supervisor---the >> identity >> of the current thread is just not accessible anywhere in user space. >> Also explains why this program runs faster with my old PM3, which >> uses >> longjmp threads. 
>>>>>> >>>>>> So I'm thinking, what I really want is a structure that is >>>>>> attached >>>>>> to my current Thread.T. I want to be able to access just a >>>>>> single >>>>>> pointer (like the free list) but be sure it is unique to my >>>>>> current >>>>>> thread. No locking would be necessary if I could do this. >>>>>> >>>>>> Does anyone have an elegant solution that does something like >>>>>> this? >>>>>> Thread-specific "static" variables? Just one REFANY would be >>>>>> enough >>>>>> for a lot of uses... seems to me this should be a frequently >>>>>> occurring problem? >>>>>> >>>>>> Best regards, >>>>>> Mika >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> > From mika at async.caltech.edu Tue Oct 21 22:18:07 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Tue, 21 Oct 2008 13:18:07 -0700 Subject: [M3devel] CM3 on Mac OS X Tiger In-Reply-To: Your message of "Tue, 21 Oct 2008 12:21:36 BST." <27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu> Message-ID: <200810212018.m9LKI81o019865@camembert.async.caltech.edu> Hi Tony, Thanks for helping, as usual! I ran into this now, is this also a bootstrapping problem? (Moving on to building libm3, cleared out existing PPC_DARWIN, have rebuilt m3cc... only see a single version of Compiler.i3 anywhere...) 
Here's the log: [lapdog:~/cm3/m3-libs/libm3] mika% $CM3 && $CM3 -ship --- building in PPC_DARWIN --- ignoring ../src/m3overrides new source -> compiling Atom.i3 new source -> compiling AtomList.i3 new source -> compiling OSError.i3 new source -> compiling File.i3 new source -> compiling RegularFile.i3 new source -> compiling Pipe.i3 new source -> compiling TextSeq.i3 new source -> compiling Pathname.i3 new source -> compiling FS.i3 new source -> compiling Process.i3 new source -> compiling Socket.i3 new source -> compiling Terminal.i3 new source -> compiling FS.m3 new source -> compiling Terminal.m3 new source -> compiling RegularFile.m3 new source -> compiling Pipe.m3 new source -> compiling Socket.m3 new source -> compiling OSConfig.i3 new source -> compiling OSErrorPosix.i3 new source -> compiling Fmt.i3 new source -> compiling OSErrorPosix.m3 new source -> compiling FilePosix.i3 new source -> compiling FilePosix.m3 new source -> compiling FSPosix.m3 new source -> compiling PipePosix.m3 new source -> compiling PathnamePosix.m3 new source -> compiling SocketPosix.m3 Fatal Error: bad version stamps: SocketPosix.m3 version stamp mismatch: Compiler.Platform => SocketPosix.m3 => Compiler.i3 version stamp mismatch: Compiler.ThisPlatform <8b5a6f513e082750> => SocketPosix.m3 <8e110d4fed998051> => Compiler.i3 [lapdog:~/cm3/m3-libs/libm3] mika% Tony Hosking writes: >This is a phase ordering problem that arises when you use an old >compiler to compile newer sources. It really should be fixed >somehow. In any case, the problem is those lines in RTCollector at >the bottom (I deleted them yesterday on the main trunk) that refer to >values supposedly built in to the compiler (which are not there for >the old binary you are using). I think if you delete those lines then >you should be OK. 
Once you have a new compiler bootstrapped (with >those configuration values available built in) then you should be able >to compile that code (excepting that I just deleted those lines >yesterday). > > >On 21 Oct 2008, at 12:05, Mika Nystrom wrote: > >> Hello everyone, >> >> Sorry if I have asked this before---I feel I must have, and Tony >> probably answered it, too, but I can't find it anywhere in my email >> archives. >> >> It looks like I finally upgraded my Mac to Tiger a half year ago, >> and everything broke. (Modula-3, emacs, make, etc etc etc etc.) >> I am finally getting around to fixing it. Now I am trying to >> compile CM3 in accordance with Tony's instructions as of June 24, >> 2007: >> >> (short quote here) >>> cd ~/cm3-cvs >>> mkdir boot >>> cd boot >>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz >>> ./cminstall >> >> Now you will have some kind of cm3 installed, presumably in /usr/ >> local/cm3/bin/cm3. >> >> Make sure you have a fresh CVS checkout in directory cm3 (let's >> assume this is in your home directory ~/cm3). Also, make sure you >> have an up-to-date version of the CM3 backend compiler cm3cg >> installed by executing the following: >> >> STEP 0: >> >> export CM3=/usr/local/cm3/bin/cm3 >> cd ~/cm3/m3-sys/m3cc >> $CM3 >> $CM3 -ship >> >> You can skip this last step if you know your backend compiler is up >> to date. 
>> >> Now, let's build the new compiler from scratch (this is the sequence >> I use regularly to test changes to the run-time system whenever I >> make them): >> >> STEP 1: >> >> cd ~/cm3/m3-libs/m3core >> $CM3 >> $CM3 -ship >> (end short quote, there's much more) >> >> What happens is that when building m3core, my compiler is building >> it against the interfaces in /usr/local/cm3, NOT the interfaces >> within m3core itself: >> >> --- building in PPC_DARWIN --- >> >> ignoring ../src/m3overrides >> >> new source -> compiling RTCollector.m3 >> "../src/runtime/common/RTCollector.m3", line 2914: unknown >> qualification '.' (AMD64_LINUX) >> "../src/runtime/common/RTCollector.m3", line 2915: unknown >> qualification '.' (SPARC32_LINUX) >> "../src/runtime/common/RTCollector.m3", line 2916: unknown >> qualification '.' (SPARC64_OPENBSD) >> "../src/runtime/common/RTCollector.m3", line 2917: unknown >> qualification '.' (PPC32_OPENBSD) >> 4 errors encountered >> stale imports -> compiling RTDebug.m3 >> >> Fatal Error: bad version stamps: RTDebug.m3 >> >> version stamp mismatch: Compiler.Platform >> => RTDebug.m3 >> => Compiler.i3 >> version stamp mismatch: Compiler.ThisPlatform >> <8b5a6f513e082750> => RTDebug.m3 >> <8e110d4fed998051> => Compiler.i3 >> >> I feel like I should REALLY know the answer to this, but how do I >> get the compiler to use only the local sources and not attempt >> to compile things with reference to the already-installed >> interfaces? >> >> Mika From hosking at cs.purdue.edu Tue Oct 21 23:29:07 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Tue, 21 Oct 2008 22:29:07 +0100 Subject: [M3devel] CM3 on Mac OS X Tiger In-Reply-To: <200810212018.m9LKI81o019865@camembert.async.caltech.edu> References: <200810212018.m9LKI81o019865@camembert.async.caltech.edu> Message-ID: Hmm. Not sure. Looks like it. On 21 Oct 2008, at 21:18, Mika Nystrom wrote: > Hi Tony, > > Thanks for helping, as usual! 
> > I ran into this now, is this also a bootstrapping problem? (Moving > on to building libm3, cleared out existing PPC_DARWIN, have rebuilt > m3cc... only see a single version of Compiler.i3 anywhere...) > > Here's the log: > > [lapdog:~/cm3/m3-libs/libm3] mika% $CM3 && $CM3 -ship > --- building in PPC_DARWIN --- > > ignoring ../src/m3overrides > > new source -> compiling Atom.i3 > new source -> compiling AtomList.i3 > new source -> compiling OSError.i3 > new source -> compiling File.i3 > new source -> compiling RegularFile.i3 > new source -> compiling Pipe.i3 > new source -> compiling TextSeq.i3 > new source -> compiling Pathname.i3 > new source -> compiling FS.i3 > new source -> compiling Process.i3 > new source -> compiling Socket.i3 > new source -> compiling Terminal.i3 > new source -> compiling FS.m3 > new source -> compiling Terminal.m3 > new source -> compiling RegularFile.m3 > new source -> compiling Pipe.m3 > new source -> compiling Socket.m3 > new source -> compiling OSConfig.i3 > new source -> compiling OSErrorPosix.i3 > new source -> compiling Fmt.i3 > new source -> compiling OSErrorPosix.m3 > new source -> compiling FilePosix.i3 > new source -> compiling FilePosix.m3 > new source -> compiling FSPosix.m3 > new source -> compiling PipePosix.m3 > new source -> compiling PathnamePosix.m3 > new source -> compiling SocketPosix.m3 > > Fatal Error: bad version stamps: SocketPosix.m3 > > version stamp mismatch: Compiler.Platform > => SocketPosix.m3 > => Compiler.i3 > version stamp mismatch: Compiler.ThisPlatform > <8b5a6f513e082750> => SocketPosix.m3 > <8e110d4fed998051> => Compiler.i3 > [lapdog:~/cm3/m3-libs/libm3] mika% > > Tony Hosking writes: >> This is a phase ordering problem that arises when you use an old >> compiler to compile newer sources. It really should be fixed >> somehow. 
In any case, the problem is those lines in RTCollector at >> the bottom (I deleted them yesterday on the main trunk) that refer to >> values supposedly built in to the compiler (which are not there for >> the old binary you are using). I think if you delete those lines >> then >> you should be OK. Once you have a new compiler bootstrapped (with >> those configuration values available built in) then you should be >> able >> to compile that code (excepting that I just deleted those lines >> yesterday). >> >> >> On 21 Oct 2008, at 12:05, Mika Nystrom wrote: >> >>> Hello everyone, >>> >>> Sorry if I have asked this before---I feel I must have, and Tony >>> probably answered it, too, but I can't find it anywhere in my email >>> archives. >>> >>> It looks like I finally upgraded my Mac to Tiger a half year ago, >>> and everything broke. (Modula-3, emacs, make, etc etc etc etc.) >>> I am finally getting around to fixing it. Now I am trying to >>> compile CM3 in accordance with Tony's instructions as of June 24, >>> 2007: >>> >>> (short quote here) >>>> cd ~/cm3-cvs >>>> mkdir boot >>>> cd boot >>>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz >>>> ./cminstall >>> >>> Now you will have some kind of cm3 installed, presumably in /usr/ >>> local/cm3/bin/cm3. >>> >>> Make sure you have a fresh CVS checkout in directory cm3 (let's >>> assume this is in your home directory ~/cm3). Also, make sure you >>> have an up-to-date version of the CM3 backend compiler cm3cg >>> installed by executing the following: >>> >>> STEP 0: >>> >>> export CM3=/usr/local/cm3/bin/cm3 >>> cd ~/cm3/m3-sys/m3cc >>> $CM3 >>> $CM3 -ship >>> >>> You can skip this last step if you know your backend compiler is up >>> to date. 
>>> >>> Now, let's build the new compiler from scratch (this is the sequence >>> I use regularly to test changes to the run-time system whenever I >>> make them): >>> >>> STEP 1: >>> >>> cd ~/cm3/m3-libs/m3core >>> $CM3 >>> $CM3 -ship >>> (end short quote, there's much more) >>> >>> What happens is that when building m3core, my compiler is building >>> it against the interfaces in /usr/local/cm3, NOT the interfaces >>> within m3core itself: >>> >>> --- building in PPC_DARWIN --- >>> >>> ignoring ../src/m3overrides >>> >>> new source -> compiling RTCollector.m3 >>> "../src/runtime/common/RTCollector.m3", line 2914: unknown >>> qualification '.' (AMD64_LINUX) >>> "../src/runtime/common/RTCollector.m3", line 2915: unknown >>> qualification '.' (SPARC32_LINUX) >>> "../src/runtime/common/RTCollector.m3", line 2916: unknown >>> qualification '.' (SPARC64_OPENBSD) >>> "../src/runtime/common/RTCollector.m3", line 2917: unknown >>> qualification '.' (PPC32_OPENBSD) >>> 4 errors encountered >>> stale imports -> compiling RTDebug.m3 >>> >>> Fatal Error: bad version stamps: RTDebug.m3 >>> >>> version stamp mismatch: Compiler.Platform >>> => RTDebug.m3 >>> => Compiler.i3 >>> version stamp mismatch: Compiler.ThisPlatform >>> <8b5a6f513e082750> => RTDebug.m3 >>> <8e110d4fed998051> => Compiler.i3 >>> >>> I feel like I should REALLY know the answer to this, but how do I >>> get the compiler to use only the local sources and not attempt >>> to compile things with reference to the already-installed >>> interfaces? >>> >>> Mika From mika at async.caltech.edu Thu Oct 23 10:24:53 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 23 Oct 2008 01:24:53 -0700 Subject: [M3devel] NEW in RTType.m3 Message-ID: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu> Hello Modula-3 people, Does anyone know whether there is anything that prevents using NEW in RTType.m3? 
I added a lot of memory recycling to the Scheme interpreter I am
working on, and now it seems it is spending a lot of time in Typecase
and IsSubtype. I was wondering if it is possible to memoize IsSubtype
inside RTType.m3... (specifically just replacing IsSubtype with an
array lookup).

It is the nature of the interpreter that it spends a lot of time
checking types and narrowing things back and forth, as Scheme and
Modula-3 references share the same representation.

    Mika

From hosking at cs.purdue.edu Thu Oct 23 12:10:01 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Thu, 23 Oct 2008 11:10:01 +0100
Subject: [M3devel] NEW in RTType.m3
In-Reply-To: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>
References: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>
Message-ID: <7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu>

Could be dangerous depending on module link orderings. Might be
better to cache your own lookups in your interpreter.

On 23 Oct 2008, at 09:24, Mika Nystrom wrote:

> Hello Modula-3 people,
>
> Does anyone know whether there is anything that prevents using NEW
> in RTType.m3?
>
> I added a lot of memory recycling to the Scheme interpreter I am
> working on, and now it seems it is spending a lot of time in Typecase
> and IsSubtype. I was wondering if it is possible to memoize IsSubtype
> inside RTType.m3... (specifically just replacing IsSubtype with an
> array lookup).
>
> It is the nature of the interpreter that it spends a lot of time
> checking types and narrowing things back and forth, as Scheme and
> Modula-3 references share the same representation.
>
>    Mika

From mika at async.caltech.edu Thu Oct 23 19:29:50 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 23 Oct 2008 10:29:50 -0700
Subject: [M3devel] NEW in RTType.m3
In-Reply-To: Your message of "Thu, 23 Oct 2008 11:10:01 BST."
             <7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu>
Message-ID: <200810231729.m9NHToMC080136@camembert.async.caltech.edu>

Well I'm not calling Typecase and IsSubtype directly---the compiler is
inserting the calls. Here's an example of my code:

    IF x # NIL AND ISTYPE(x,Symbol) THEN
      RETURN env.lookup(x)
    ELSIF x = NIL OR NOT ISTYPE(x,Pair) THEN
      RETURN x
    ELSE

this code actually winds up in here (RTType.m3):

    PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
      VAR t: RT0.TypeDefn;
      BEGIN
        IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
        t := Get (a);
        IF (t = NIL) THEN RETURN FALSE; END;
        IF (t.typecode = b) THEN RETURN TRUE END;
        WHILE (t.kind = ORD (TK.Obj)) DO
          IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
          t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
          IF (t = NIL) THEN RETURN FALSE; END;
          IF (t.typecode = b) THEN RETURN TRUE; END;
        END;
        IF (t.traced # 0)
          THEN RETURN (b = RT0.RefanyTypecode);
          ELSE RETURN (b = RT0.AddressTypecode);
        END;
      END IsSubtype;

Again this is an example of something where the CM3 code seems to be
hurting more than PM3, but it could be that for some reason I have more
visibility into the CM3 code, or that there's an optimization difference
(I haven't been able to investigate this fully yet).

In any case, it's clear that if IsSubtype could be replaced with a table
lookup, this kind of code would be accelerated by potentially a lot.

Note that while in the above example the code might be accelerated by
(in my opinion, less clear) use of TYPECODE (since I never subtype
Symbol or Pair---for now!), this is not so for some NARROWs. The NARROWs
also wind up calling RTType.IsSubtype, and they arise because I have
types that depend on each other, and unless I want to introduce extra
complexity (new partial revelations) or stick everything in the same
interface, I am forced to NARROW something to avoid a circular
dependency of interfaces... A method of A.T takes a B.T and a method of
B.T takes an A.T, so I make a supertype X.T s.t.
A.T <: X.T ; then I can declare B.T.m to take an X.T and NARROW it to
A.T within B.T.m... triggering a call to the above code. (For
simplicity's sake, X.T could be REFANY or ROOT.) An attempt to declare
B.T.m as taking A.T would lead to a circular dependency between A and B.

The code is really rather simple and it's a shame if you have to make
it look much more complicated to avoid issues like these which might
equally well be solved by tweaking the runtime implementation a bit.

    Mika

Tony Hosking writes:
>Could be dangerous depending on module link orderings. Might be
>better to cache your own lookups in your interpreter.
>
>On 23 Oct 2008, at 09:24, Mika Nystrom wrote:
>
>> Hello Modula-3 people,
>>
>> Does anyone know whether there is anything that prevents using NEW
>> in RTType.m3?
>>
>> I added a lot of memory recycling to the Scheme interpreter I am
>> working on, and now it seems it is spending a lot of time in Typecase
>> and IsSubtype. I was wondering if it is possible to memoize IsSubtype
>> inside RTType.m3... (specifically just replacing IsSubtype with an
>> array lookup).
>>
>> It is the nature of the interpreter that it spends a lot of time
>> checking types and narrowing things back and forth, as Scheme and
>> Modula-3 references share the same representation.
>>
>>    Mika

From mika at async.caltech.edu Sat Oct 25 05:16:56 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 24 Oct 2008 20:16:56 -0700
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
Message-ID: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>

Dear Modula-3 people,

I had a crash in my program from a range error that I believe
shouldn't have happened the way it did, although it's not in my
code, so I'm not sure if there's a reason for the way it's done
(matching a C declaration somewhere, maybe??).
Here it is, from ThreadPosix.m3:

    PROCEDURE IOWait(fd: INTEGER; read: BOOLEAN;
                     timeoutInterval: LONGREAL := -1.0D0): WaitResult =
      <*FATAL Alerted*>
      BEGIN
        self.alertable := FALSE;
        RETURN XIOWait(fd, read, timeoutInterval);
      END IOWait;

    PROCEDURE IOAlertWait(fd: INTEGER; read: BOOLEAN;
                          timeoutInterval: LONGREAL := -1.0D0): WaitResult
      RAISES {Alerted} =
      BEGIN
        self.alertable := TRUE;
        RETURN XIOWait(fd, read, timeoutInterval);
      END IOAlertWait;

    PROCEDURE XIOWait (fd: CARDINAL; read: BOOLEAN; interval: LONGREAL): WaitResult
      RAISES {Alerted} =
      VAR res: INTEGER;
          fdindex := fd DIV FDSetSize;
          fdset := FDSet{fd MOD FDSetSize};
      ... rest omitted ...

Note that IOWait calls XIOWait. IOWait is declared as taking an
INTEGER, but XIOWait takes a CARDINAL.

So I really should use a CARDINAL in passing to IOWait, but since
IOWait is the interface function it's not clear that I should do
that (until my program crashes after passing -1 from some carelessly
wrapped C code). I don't like the fact that I get a range error
*inside* the library when it appears unnecessary---it should have
happened in my code, as I make the call.

Suggested improvement: declare all the FDs in SchedulerPosix.i3
(the interface that exports these routines) to be CARDINAL instead
of INTEGER.

    Mika

From hosking at cs.purdue.edu Mon Oct 27 15:28:52 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Mon, 27 Oct 2008 14:28:52 +0000
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
In-Reply-To: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
References: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
Message-ID: <5232F2E4-3B0E-49E5-B1C8-BB4D04C60C33@cs.purdue.edu>

Sounds fair to me.
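
The effect of the suggested CARDINAL declaration can be seen in a small
stand-alone sketch. It does not use the real SchedulerPosix interface:
WaitOn and FromC are hypothetical stand-ins for IOWait and for a wrapped
C call, and the program assumes only the standard libm3 IO and Fmt
interfaces. With the parameter declared CARDINAL, the range check on a
negative descriptor belongs to the caller's own call, so the caller is
the one who has to test the value:

```modula3
MODULE FdRange EXPORTS Main;
(* Illustrative sketch only; WaitOn and FromC are hypothetical names,
   not part of SchedulerPosix or ThreadPosix. *)

IMPORT IO, Fmt;

PROCEDURE WaitOn (fd: CARDINAL): CARDINAL =
  (* Stand-in for IOWait with the proposed CARDINAL parameter. *)
  BEGIN
    RETURN fd
  END WaitOn;

PROCEDURE FromC (): INTEGER =
  (* Stand-in for a wrapped C call that reports failure as -1. *)
  BEGIN
    RETURN -1
  END FromC;

VAR fd: INTEGER;

BEGIN
  fd := FromC ();
  IF fd >= 0 THEN
    (* Passing an INTEGER to a CARDINAL formal is checked at this call. *)
    IO.Put ("waiting on fd " & Fmt.Int (WaitOn (fd)) & "\n")
  ELSE
    IO.Put ("rejected descriptor " & Fmt.Int (fd) & "\n")
  END
END FdRange.
```

Without the guard, the range error for -1 would be raised at the call to
WaitOn in the caller's module rather than at an inner call such as the
one to XIOWait.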

On 25 Oct 2008, at 04:16, Mika Nystrom wrote:

>
> Dear Modula-3 people,
>
> I had a crash in my program from a range error that I believe
> shouldn't have happened the way it did, although it's not in my
> code, so I'm not sure if there's a reason for the way it's done
> (matching a C declaration somewhere, maybe??).
>
> Here it is, from ThreadPosix.m3:
>
>     PROCEDURE IOWait(fd: INTEGER; read: BOOLEAN;
>                      timeoutInterval: LONGREAL := -1.0D0): WaitResult =
>       <*FATAL Alerted*>
>       BEGIN
>         self.alertable := FALSE;
>         RETURN XIOWait(fd, read, timeoutInterval);
>       END IOWait;
>
>     PROCEDURE IOAlertWait(fd: INTEGER; read: BOOLEAN;
>                           timeoutInterval: LONGREAL := -1.0D0): WaitResult
>       RAISES {Alerted} =
>       BEGIN
>         self.alertable := TRUE;
>         RETURN XIOWait(fd, read, timeoutInterval);
>       END IOAlertWait;
>
>     PROCEDURE XIOWait (fd: CARDINAL; read: BOOLEAN; interval: LONGREAL): WaitResult
>       RAISES {Alerted} =
>       VAR res: INTEGER;
>           fdindex := fd DIV FDSetSize;
>           fdset := FDSet{fd MOD FDSetSize};
>       ... rest omitted ...
>
> Note that IOWait calls XIOWait. IOWait is declared as taking an
> INTEGER, but XIOWait takes a CARDINAL.
>
> So I really should use a CARDINAL in passing to IOWait, but since
> IOWait is the interface function it's not clear that I should do
> that (until my program crashes after passing -1 from some carelessly
> wrapped C code). I don't like the fact that I get a range error
> *inside* the library when it appears unnecessary---it should have
> happened in my code, as I make the call.
>
> Suggested improvement: declare all the FDs in SchedulerPosix.i3
> (the interface that exports these routines) to be CARDINAL instead
> of INTEGER.
>
>     Mika

From jay.krell at cornell.edu Thu Oct 30 22:21:09 2008
From: jay.krell at cornell.edu (Jay)
Date: Thu, 30 Oct 2008 21:21:09 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID:

Please try this:

http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2

std failed to build because stubgen crashed, probably due to gc.
cm3 does crash right away without @M3nogc.

Something like this:

  cd /src
  wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
  cd /cm3
  rm -rf *
  tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
  cd /src/cm3/scripts/python
  ./do-cm3-all.py realclean
  ./upgrade.py
  ./do-cm3-all.py realclean
  ./do-cm3-std.py buildship

=> it will fail, at zeus, but it should get far; you'll also need some
X devel packages to get that far, I had a failure for lack of libXaw
for example. I did not run anything, any of the GUI packages, but
building itself with itself is a decent test.

I renamed the old AMD64_LINUX archives to "1.0.0".
http://www.opencm3.com/uploaded-archives/

This has the bug fix I committed last night to cm3cg, and therefore a
64 bit hosted cm3cg.

jay at amd64a:/cm3/bin$ file *
AMD64_LINUX: ASCII text
cm3:         ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
cm3.cfg:     ASCII English text
cm3cg:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
m3bundle:    ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
mklib:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
Unix.common: ASCII English text

Built on Debian 4.0r4 (r5 is out).

jay at amd64a:/cm3/bin$ uname -a
Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux
jay at amd64a:/cm3/bin$ dmesg | head
Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Aug 19 04:30:56 UTC 2008

Though really I couldn't do it without Visual C++ on Windows providing
excellent find-in-files and editing, nothing else comes close, I edit
on Windows and scp the files over. :)

 - Jay

________________________________
From: jay.krell at cornell.edu
To: dragisha at m3w.org; m3devel at elegosoft.com
Date: Tue, 9 Sep 2008 09:43:03 +0000
Subject: Re: [M3devel] AMD64_LINUX status

From hosking at cs.purdue.edu Fri Oct 31 11:19:51 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 31 Oct 2008 10:19:51 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID:

Umm, I think I found your bug with GC:

Check out "RTMachine.PointerAlignment". You have it set to
BITSIZE(INTEGER). I suspect what you want is something like
BYTESIZE(ADDRESS).
Also, "RTMachine.StackFrameAlignment" should probably be 2*BYTESIZE(ADDRESS). On 30 Oct 2008, at 21:21, Jay wrote: > > Please try this: > > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2 > > std failed to build because stubgen crashed, probably due to gc. > cm3 does crash right away without @M3nogc. > > Something like this: > cd /src > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2 > cd /cm3 > rm -rf * > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX- > d5.7.0.tar.bz2 > cd /src/cm3/scripts/python > ./do-cm3-all.py realclean > ./upgrade.py > ./do-cm3-all.py realclean > ./do-cm3-std.py buildship > => it will fail, at zeus, but it should get far; you'll also need > some X devel packages to get that far, I had a failure for lack of > libXaw for example. I did not run anything, any of the GUI packages, > but building itself with itself is a decent test. > > I renamed the old AMD64_LINUX archives to "1.0.0". > http://www.opencm3.com/uploaded-archives/ > > This has the bug fix I commited last night to cm3cg, and therefore a > 64 bit hosted cm3cg. > > jay at amd64a:/cm3/bin$ file * > AMD64_LINUX: ASCII text > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > for GNU/Linux 2.6.0, not stripped > cm3.cfg: ASCII English text > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Li > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > 2.6.0, not stripped > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Li > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > 2.6.0, not stripped > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > for GNU/Linux 2.6.0, not stripped > Unix.common: ASCII English text > > Built on Debian 4.0r4 (r5 is out). 
> jay at amd64a:/cm3/bin$ uname -a
> Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008
> x86_64 GNU/Linux
> jay at amd64a:/cm3/bin$ dmesg | head
> Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
> Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org
> ) (
> gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP
> Tue Aug 19 04:30:56 UTC 2008
>
> Though really I couldn't do it without Visual C++ on Windows
> providing excellent find-in-files and editing, nothing else comes
> close, I edit on Windows and scp the files over. :)
>
> - Jay
>
> ________________________________
>
> From: jay.krell at cornell.edu
> To: dragisha at m3w.org; m3devel at elegosoft.com
> Date: Tue, 9 Sep 2008 09:43:03 +0000
> Subject: Re: [M3devel] AMD64_LINUX status
>
>
>

From jay.krell at cornell.edu Fri Oct 31 14:52:43 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 13:52:43 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID:

Tony, Excellent, thanks, that helps.
How do you know and confirm the right values? I don't like guessing.

And then cause then of :) :

Symbol
Pickling font metrics...
Done.
/cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
/cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
stubgen: Processing RemoteView.T

***
*** runtime error:
***    NEW() was unable to allocate more memory.
***    file "../src/runtime/common/RTAllocator.m3", line 285
***

"/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB

--procedure--  -line-  -file---
exec               --
_v_netobj          37  /cm3/pkg/netobj/src/netobj.tmpl
netobjv1           44  /cm3/pkg/netobj/src/netobj.tmpl
netobj             64  /cm3/pkg/netobj/src/netobj.tmpl
include_dir        71  /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
                    8  /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args

I should debug it, and double check that I upgraded what had to be
upgraded.
- Jay> From: hosking at cs.purdue.edu> To: jay.krell at cornell.edu> Date: Fri, 31 Oct 2008 10:19:51 +0000> CC: m3devel at elegosoft.com> Subject: Re: [M3devel] AMD64_LINUX status> > Umm, I think I found your bug with GC:> > Check out "RTMachine.PointerAlignment". You have it set to > BITSIZE(INTEGER). I suspect what you want is something like > BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should > probably be 2*BYTESIZE(ADDRESS).> > > > On 30 Oct 2008, at 21:21, Jay wrote:> > >> > Please try this:> >> > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2> >> > std failed to build because stubgen crashed, probably due to gc.> > cm3 does crash right away without @M3nogc.> >> > Something like this:> > cd /src> > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2> > cd /cm3> > rm -rf *> > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX- > > d5.7.0.tar.bz2> > cd /src/cm3/scripts/python> > ./do-cm3-all.py realclean> > ./upgrade.py> > ./do-cm3-all.py realclean> > ./do-cm3-std.py buildship> > => it will fail, at zeus, but it should get far; you'll also need > > some X devel packages to get that far, I had a failure for lack of > > libXaw for example. 
I did not run anything, any of the GUI packages, > > but building itself with itself is a decent test.> >> > I renamed the old AMD64_LINUX archives to "1.0.0".> > http://www.opencm3.com/uploaded-archives/> >> > This has the bug fix I commited last night to cm3cg, and therefore a > > 64 bit hosted cm3cg.> >> > jay at amd64a:/cm3/bin$ file *> > AMD64_LINUX: ASCII text> > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > > for GNU/Linux 2.6.0, not stripped> > cm3.cfg: ASCII English text> > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Li> > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > 2.6.0, not stripped> > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Li> > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > 2.6.0, not stripped> > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > > for GNU/Linux 2.6.0, not stripped> > Unix.common: ASCII English text> >> > Built on Debian 4.0r4 (r5 is out).> > jay at amd64a:/cm3/bin$ uname -a> > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 > > x86_64 GNU/Linux> > jay at amd64a:/cm3/bin$ dmesg | head> > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)> > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org > > ) (> > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP > > Tue Aug 19 04:30:56 UTC 2008> >> > Though really I couldn't do it without Visual C++ on Windows > > providing excellent find-in-files and editing, nothing else comes > > close, I edit on Windows and scp the files over. 
:)
> >
> > - Jay
> >
> > ________________________________
> >
> > From: jay.krell at cornell.edu
> > To: dragisha at m3w.org; m3devel at elegosoft.com
> > Date: Tue, 9 Sep 2008 09:43:03 +0000
> > Subject: Re: [M3devel] AMD64_LINUX status

From jay.krell at cornell.edu Fri Oct 31 15:25:13 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 14:25:13 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <1225462205.14482.60.camel@faramir.m3w.org>
References: <1220941880.9421.11.camel@faramir.m3w.org> <1225462205.14482.60.camel@faramir.m3w.org>
Message-ID:

It seems like there's still a problem. I haven't debugged it yet.
(I'm sure glad Tony found the other problem before I debugged it.)

I updated http://www.opencm3.com/uploaded-archives with Tony's fix.
The older builds are now 0.0.0.1 and 0.0.0.2.

 - Jay

> Subject: Re: [M3devel] AMD64_LINUX status
> From: dragisha at m3w.org
> To: jay.krell at cornell.edu
> CC: hosking at cs.purdue.edu; m3devel at elegosoft.com
> Date: Fri, 31 Oct 2008 15:10:05 +0100
>
> So, we now have fully functional AMD64_LINUX (_with_ GC)?
>
> TIA
>
> On Fri, 2008-10-31 at 13:52 +0000, Jay wrote:
> > Tony, Excellent, thanks, that helps.
> > How do you know and confirm the right values?
I don't like guessing.> > > > And then cause then of :) :> > > > Symbol> > Pickling font metrics...> > Done.> > /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk> > /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB> > stubgen: Processing RemoteView.T> > > > ***> > *** runtime error:> > *** NEW() was unable to allocate more memory.> > *** file "../src/runtime/common/RTAllocator.m3", line 285> > ***> > "/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit> > 1536: /cm3> > /bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB> > --procedure-- -line- -file---> > exec -- > > _v_netobj 37 /cm3/pkg/netobj/src/netobj.tmpl> > netobjv1 44 /cm3/pkg/netobj/src/netobj.tmpl> > netobj 64 /cm3/pkg/netobj/src/netobj.tmpl> > include_dir 71 /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile> > > > 8 /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args> > > > > > I should debug it, and double check that I upgraded what had to be> > upgraded.> > > > - Jay> > > > > > > > > From: hosking at cs.purdue.edu> > > To: jay.krell at cornell.edu> > > Date: Fri, 31 Oct 2008 10:19:51 +0000> > > CC: m3devel at elegosoft.com> > > Subject: Re: [M3devel] AMD64_LINUX status> > > > > > Umm, I think I found your bug with GC:> > > > > > Check out "RTMachine.PointerAlignment". You have it set to > > > BITSIZE(INTEGER). I suspect what you want is something like > > > BYTESIZE(ADDRESS). 
Also, "RTMachine.StackFrameAlignment" should > > > probably be 2*BYTESIZE(ADDRESS).> > > > > > > > > > > > On 30 Oct 2008, at 21:21, Jay wrote:> > > > > > >> > > > Please try this:> > > >> > > >> > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2> > > >> > > > std failed to build because stubgen crashed, probably due to gc.> > > > cm3 does crash right away without @M3nogc.> > > >> > > > Something like this:> > > > cd /src> > > > wget> > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2> > > > cd /cm3> > > > rm -rf *> > > > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX- > > > > d5.7.0.tar.bz2> > > > cd /src/cm3/scripts/python> > > > ./do-cm3-all.py realclean> > > > ./upgrade.py> > > > ./do-cm3-all.py realclean> > > > ./do-cm3-std.py buildship> > > > => it will fail, at zeus, but it should get far; you'll also need > > > > some X devel packages to get that far, I had a failure for lack> > of > > > > libXaw for example. 
I did not run anything, any of the GUI> > packages, > > > > but building itself with itself is a decent test.> > > >> > > > I renamed the old AMD64_LINUX archives to "1.0.0".> > > > http://www.opencm3.com/uploaded-archives/> > > >> > > > This has the bug fix I commited last night to cm3cg, and therefore> > a > > > > 64 bit hosted cm3cg.> > > >> > > > jay at amd64a:/cm3/bin$ file *> > > > AMD64_LINUX: ASCII text> > > > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared> > libs), > > > > for GNU/Linux 2.6.0, not stripped> > > > cm3.cfg: ASCII English text> > > > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > > (SYSV), for GNU/Li> > > > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > > > 2.6.0, not stripped> > > > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > > (SYSV), for GNU/Li> > > > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > > > 2.6.0, not stripped> > > > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared> > libs), > > > > for GNU/Linux 2.6.0, not stripped> > > > Unix.common: ASCII English text> > > >> > > > Built on Debian 4.0r4 (r5 is out).> > > > jay at amd64a:/cm3/bin$ uname -a> > > > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 > > > > x86_64 GNU/Linux> > > > jay at amd64a:/cm3/bin$ dmesg | head> > > > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)> > > > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2)> > (dannf at debian.org > > > > ) (> > > > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP > > > > Tue Aug 19 04:30:56 UTC 2008> > > >> > > > Though really I couldn't do it without Visual C++ on Windows > > > > providing excellent find-in-files and editing, nothing else comes > > > > close, I edit on Windows and scp the files over. 
:)> > > >> > > > - Jay> > > >> > > > ________________________________> > > >> > > > From: jay.krell at cornell.edu> > > > To: dragisha at m3w.org; m3devel at elegosoft.com> > > > Date: Tue, 9 Sep 2008 09:43:03 +0000> > > > Subject: Re: [M3devel] AMD64_LINUX status> > > >> > > >> > > >> > > >> > > > > > -- > Dragiša Durić > From dragisha at m3w.org Fri Oct 31 15:10:05 2008 From: dragisha at m3w.org (Dragiša Durić) Date: Fri, 31 Oct 2008 15:10:05 +0100 Subject: [M3devel] AMD64_LINUX status In-Reply-To: References: <1220941880.9421.11.camel@faramir.m3w.org> Message-ID: <1225462205.14482.60.camel@faramir.m3w.org> So, we now have fully functional AMD64_LINUX (_with_ GC)? TIA On Fri, 2008-10-31 at 13:52 +0000, Jay wrote: > Tony, Excellent, thanks, that helps. > How do you know and confirm the right values? I don't like guessing. > > And then cause then of :) : > > Symbol > Pickling font metrics... > Done. > /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk > /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB > stubgen: Processing RemoteView.T > > *** > *** runtime error: > *** NEW() was unable to allocate more memory. > *** file "../src/runtime/common/RTAllocator.m3", line 285 > *** > "/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit > 1536: /cm3 > /bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB > --procedure-- -line- -file--- > exec -- > _v_netobj 37 /cm3/pkg/netobj/src/netobj.tmpl > netobjv1 44 /cm3/pkg/netobj/src/netobj.tmpl > netobj 64 /cm3/pkg/netobj/src/netobj.tmpl > include_dir 71 /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile > > 8 /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args > > > I should debug it, and double check that I upgraded what had to be > upgraded. 
> > - Jay > > > > > From: hosking at cs.purdue.edu > > To: jay.krell at cornell.edu > > Date: Fri, 31 Oct 2008 10:19:51 +0000 > > CC: m3devel at elegosoft.com > > Subject: Re: [M3devel] AMD64_LINUX status > > > > Umm, I think I found your bug with GC: > > > > Check out "RTMachine.PointerAlignment". You have it set to > > BITSIZE(INTEGER). I suspect what you want is something like > > BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should > > probably be 2*BYTESIZE(ADDRESS). > > > > > > > > On 30 Oct 2008, at 21:21, Jay wrote: > > > > > > > > Please try this: > > > > > > > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2 > > > > > > std failed to build because stubgen crashed, probably due to gc. > > > cm3 does crash right away without @M3nogc. > > > > > > Something like this: > > > cd /src > > > wget > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2 > > > cd /cm3 > > > rm -rf * > > > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX- > > > d5.7.0.tar.bz2 > > > cd /src/cm3/scripts/python > > > ./do-cm3-all.py realclean > > > ./upgrade.py > > > ./do-cm3-all.py realclean > > > ./do-cm3-std.py buildship > > > => it will fail, at zeus, but it should get far; you'll also need > > > some X devel packages to get that far, I had a failure for lack > of > > > libXaw for example. I did not run anything, any of the GUI > packages, > > > but building itself with itself is a decent test. > > > > > > I renamed the old AMD64_LINUX archives to "1.0.0". > > > http://www.opencm3.com/uploaded-archives/ > > > > > > This has the bug fix I commited last night to cm3cg, and therefore > a > > > 64 bit hosted cm3cg. 
> > > > > > jay at amd64a:/cm3/bin$ file * > > > AMD64_LINUX: ASCII text > > > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared > libs), > > > for GNU/Linux 2.6.0, not stripped > > > cm3.cfg: ASCII English text > > > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > > 2.6.0, not stripped > > > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > > 2.6.0, not stripped > > > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared > libs), > > > for GNU/Linux 2.6.0, not stripped > > > Unix.common: ASCII English text > > > > > > Built on Debian 4.0r4 (r5 is out). > > > jay at amd64a:/cm3/bin$ uname -a > > > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 > > > x86_64 GNU/Linux > > > jay at amd64a:/cm3/bin$ dmesg | head > > > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805) > > > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) > (dannf at debian.org > > > ) ( > > > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP > > > Tue Aug 19 04:30:56 UTC 2008 > > > > > > Though really I couldn't do it without Visual C++ on Windows > > > providing excellent find-in-files and editing, nothing else comes > > > close, I edit on Windows and scp the files over. :) > > > > > > - Jay > > > > > > ________________________________ > > > > > > From: jay.krell at cornell.edu > > > To: dragisha at m3w.org; m3devel at elegosoft.com > > > Date: Tue, 9 Sep 2008 09:43:03 +0000 > > > Subject: Re: [M3devel] AMD64_LINUX status > > > > > > > > > > > > > > > -- Dragiša Durić 
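To make the PointerAlignment point in the quoted message above a little more concrete, here is a minimal Python sketch. It is a toy model, not the CM3 runtime: it assumes that a conservative stack scan only inspects byte offsets that are multiples of the configured alignment, and the names scanned_offsets and missed_slots are hypothetical. Under that assumption, using the value of BITSIZE(INTEGER) (64 on a 64-bit target) where a byte count such as BYTESIZE(ADDRESS) (8) is expected would leave most pointer-sized stack slots unexamined, which could plausibly let a collector miss live references.

```python
# Toy model only (an assumption, not code from the CM3 runtime):
# a conservative scan looks at stack offsets that are multiples of
# "alignment" bytes and treats each such slot as a possible pointer.

WORD_BYTES = 8   # BYTESIZE(ADDRESS) on a 64-bit target
WORD_BITS = 64   # BITSIZE(INTEGER) on a 64-bit target


def scanned_offsets(stack_bytes, alignment):
    """Return the byte offsets the modelled scan would inspect."""
    # The last inspected slot must still hold a whole pointer.
    return list(range(0, stack_bytes - WORD_BYTES + 1, alignment))


def missed_slots(stack_bytes, alignment):
    """Return word-aligned offsets that the given alignment skips."""
    every_word = set(scanned_offsets(stack_bytes, WORD_BYTES))
    inspected = set(scanned_offsets(stack_bytes, alignment))
    return sorted(every_word - inspected)


if __name__ == "__main__":
    frame = 128  # hypothetical stack region size in bytes
    print("word-aligned scan:", len(scanned_offsets(frame, WORD_BYTES)), "slots")
    print("64-byte stride scan:", len(scanned_offsets(frame, WORD_BITS)), "slots")
    print("slots never inspected:", missed_slots(frame, WORD_BITS))
```

In this model the only difference between the two configurations is the stride, so the sketch isolates the bits-versus-bytes mix-up and says nothing about how the real collector treats StackFrameAlignment.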
From jay.krell at cornell.edu Wed Oct 1 01:24:14 2008 From: jay.krell at cornell.edu (Jay) Date: Tue, 30 Sep 2008 23:24:14 +0000 Subject: [M3devel] ARM Darwin In-Reply-To: <7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org> References: <5ED8E753-6B9E-4FED-8689-1D3D317A5A36@cs.purdue.edu> <7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org> Message-ID: Get me a machine and I'll work on it. :) I'll get one before long but I'm bogged down with existing x86, AMD64, PPC, PPC64 (AIX), Mips (Irix) hardware not yet being used for all it's meant for. I suspect Apple hasn't pushed their changes up, so be sure to poke around their gcc source. > Apple are building their own ARM GCC and use that to configure the > back end. Then the runtime issues which I imagine might be with the GC gcc -v ? > and threading. I'm not sure there will be any native threading and I'm > sure VM will look very different. I assume it'll look like most any Posix or *_DARWIN or 32bit thereof system. I assume it has pthreads. - Jay > From: darko at darko.org > To: hosking at cs.purdue.edu > Date: Tue, 30 Sep 2008 14:59:39 +0200 > CC: m3devel at elegosoft.com > Subject: Re: [M3devel] ARM Darwin > > Thanks, it should be a bit easier than the normal process since the > compiler doesn't have to be fully bootstrapped, I just have to get a > cross working. I know the first thing is to get the machine > configuration correct, which I'll start when I get my hands on one of > the machines in a couple of days. The other thing is to work out how > Apple are building their own ARM GCC and use that to configure the > back end. Then the runtime issues which I imagine might be with the GC > and threading. I'm not sure there will be any native threading and I'm > sure VM will look very different. > > > On 30/09/2008, at 2:44 PM, Tony Hosking wrote: > >> I can share tips... >> >> On Sep 30, 2008, at 1:41 PM, Darko wrote: >> >>> Is anyone interested in working on an ARM port for Darwin? 
Or maybe >>> just providing some tips as I give it a try? >>> >>> Cheers, >>> Darko. >> > From jay.krell at cornell.edu Wed Oct 1 08:41:03 2008 From: jay.krell at cornell.edu (Jay) Date: Wed, 1 Oct 2008 06:41:03 +0000 Subject: [M3devel] AMD-64 binaries? In-Reply-To: <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu> References: <48BDF24B.900@wichita.edu> <20080903075804.zhep2ichmow00scg@mail.elegosoft.com> <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu> Message-ID: No -- you would know best about AMD64_DARWIN. I'm sure ALPHA_OSF used to work, but it's been so long, I don't think it counts. I'm being lazy. file AMD64_DARWIN/cm3cg => fat binary? I doubt it. => with ppc, i386, amd64? (doubt it) => or just ppc, i386? (doubt it) => or just i386? This is I "suspect". => or just AMD64. This would be somewhat interesting. I'm pretty sure cm3cg is always 32bit "these days". I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in the same way. This was a nice thing to find, that the problem is portable to multiple?all 64 bit hosts. I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same problem. Anyway, I should really get to debugging this soon. It's a bit odd because gcc itself doesn't have this bug and I reviewed a lot of the code and it was ok. I'm just going to have to step through it in parallel on 32bit and 64bit hosts and find where they diverge. A LOT was identical, like the files output by cm3 into cm3cg were identical. I was close a few months ago but sloughed off. - Jay> From: hosking at cs.purdue.edu> To: jay.krell at cornell.edu> Date: Tue, 30 Sep 2008 10:16:41 +0100> CC: m3devel at elegosoft.com> Subject: Re: [M3devel] AMD-64 binaries?> > 64-bit hosted tools? Do you mean only for Linux? 
I don't quite > understand what you are saying.> > On Sep 30, 2008, at 9:36 AM, Jay wrote:> > >> > I'm getting back to this now.> > I didn't realize it till this weekend, but that archive is > > "relatively incompatible".> > In particular it has 32bit hosted tools, and won't run on Debian > > 4.0r4 / AMD64.> > Something about glibc 2.4, when all I see on my system is 2.3.> > I'll see what I can do.> > Probably just rebuild cm3cg.> > I think it was built on Fedora, but could have been Ubuntu or > > OpenSuse.> > Probably just that Debian stable lags the others.> >> > The main problem to debug is why 64bit hosted tools "never" work.> > (Right?)> >> >> > Stay tuned for a bunch more ports "soon", I've got a bunch more > > hardware,> > that runs Linux and others (Solaris, AIX, Irix).. :)> >> > I'll be able to debug the high dpi gui problems on a friend's laptop > > soon too.> > Send me a repro. I expect it is trivial -- like anything with a > > scrollbar.> > I can try formsedit, etc.> >> >> > - Jay> >> >> >> Date: Wed, 3 Sep 2008 07:58:04 +0200> >> From: wagner at elegosoft.com> >> To: m3devel at elegosoft.com> >> Subject: Re: [M3devel] AMD-64 binaries?> >>> >> Quoting "Rodney M. Bates" :> >>> >>> Are there binaries for AMD-64 around that can be used> >>> to bootstrap a 64-bit Linux compiler?> >>> >> Have a look at> >>> >> http://www.opencm3.net/uploaded-archives/index.html> >>> >> There are some AMD64 archives; I don't know about their status> >> offhand, though. 
I think Jay Krell produced them.> >> AFAIK there is no regular build on this platform yet.> >>> >> Olaf> >> --> >> Olaf Wagner -- elego Software Solutions GmbH> >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany> >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 > >> 45 86 95> >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: > >> Berlin> >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: > >> DE163214194> >>> From jay.krell at cornell.edu Wed Oct 1 09:02:29 2008 From: jay.krell at cornell.edu (Jay) Date: Wed, 1 Oct 2008 07:02:29 +0000 Subject: [M3devel] m3cc build fails on older MacOS X In-Reply-To: <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org> References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com> <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org> Message-ID: well, I agree and disagree. "Almost everyone" only cares about C++, C#, Windows, and a little bit of Linux and Java. "Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc. Supporting 10.2 and 10.3 "ought not" be so difficult, but ok. I wiped out the install and won't likely come back to it until a bunch of other things are done. e.g.: debug 64 bit hosted cm3cg move PPC_LINUX to pthreads high dpi bring up or backup a bunch of targets I have hardware for, and some others I don't have yet. Adding back support for NT4/Win9x probably not hard, though similar with gcc on Mac, the current Microsoft tools no longer target them. It all gets easier with virtualization.. (Which is easiest on x86/amd64.) - Jay > From: darko at darko.org > To: hosking at cs.purdue.edu > Date: Tue, 30 Sep 2008 11:50:43 +0200 > CC: m3devel at elegosoft.com; jay.krell at cornell.edu > Subject: Re: [M3devel] m3cc build fails on older MacOS X > > I think supporting the latest version is enough work. I don't see the > point of supporting older releases. 
Also, this seems to be relevant to > development on that version of the system. Anyone who wants to build > can upgrade. > > > On 30/09/2008, at 11:15 AM, Tony Hosking wrote: > >> Does anyone really care about 10.3 now? As I recall, it had some >> pretty broken assumptions. >> >> On Sep 30, 2008, at 9:25 AM, Jay wrote: >> >>> >>> I have a machine running 10.3 now. >>> >>> gcc-4.3.2 (the current release) won't (toplevel) configure on >>> MacOSX 10.3 apparently because its assembler doesn't support >>> ".machine". >>> Current "cctools" won't compile on 10.3 without patches or other >>> updates, due to mucking with ppc64 stuff, though that is easy to fix. >>> >>> A simple wrapper around as for use on 10.3 that strips the .machine >>> directive is probably reasonable, or a patch to gcc to just not >>> emit it for Darwin, except maybe for non-ppc, or subject to a switch. >>> >>> Other than support for more architectures, I never found any of the >>> updates beyond 10.2 very interesting. >>> Though current Firefox and Safari also won't run on 10.3. >>> >>> IF I get this working, maybe I'll bring 10.2 back up also.. >>> >>> - Jay >>> >>> ________________________________ >>> >>> From: jayk123 at hotmail.com >>> To: wagner at elegosoft.com; m3devel at elegosoft.com >>> Subject: RE: [M3devel] m3cc build fails on older MacOS X >>> Date: Tue, 6 May 2008 10:49:11 +0000 >>> >>> >>> >>> >>> I don't know what these Darwin versions are. >>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5? >>> I used to run 10.2 and could perhaps bring it back (though I'd hate >>> to lose my PPC_LINUX install.. :( ) >>> >>>> make[2]: Nothing to be done for `all'. >>>> Makefile:191: *** Insufficient number of arguments (2) to function >>>> `patsubst'. Stop. >>> >>> Hopefully that's enough context though. >>> >>> The rest is a cascade. >>> What happens if you remove all my m3makefile wierdness (which works >>> everywhere else..) and just configure and make? >>> >>> Can I ssh into this? 
>>> >>> - Jay >>> >>> >>> >>> ________________________________ >>> >>> >>>> Date: Tue, 6 May 2008 07:57:54 +0200 >>>> From: wagner at elegosoft.com >>>> To: m3devel at elegosoft.com >>>> Subject: [M3devel] m3cc build fails on older MacOS X >>>> >>>> On % uname -a >>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 >>>> 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power >>>> Macintosh powerpc: >>>> >>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o >>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o >>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./fnmatch.o >>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./getruntime.o >>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o >>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./obstack.o >>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o >>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o >>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o >>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o >>>> ./xstrndup.o> required-list >>>> make[2]: Nothing to be done for `all'. >>>> Makefile:191: *** Insufficient number of arguments (2) to function >>>> `patsubst'. Stop. >>>> make: *** [all-libcpp] Error 2 >>>> /bin/sh: line 1: cd: gcc: No such file or directory >>>> make: *** No rule to make target `s-modes'. Stop. >>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314: quake >>>> runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2 >>>> >>>> --procedure-- -line- -file--- >>>> cp_if -- >>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile >>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile >>>> 9 >>>> /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args >>>> >>>> Fatal Error: package build failed >>>> ==> m3-sys/m3cc done >>>> >>>> Any ideas? 
>>>> >>>> Olaf >>>> -- >>>> Olaf Wagner -- elego Software Solutions GmbH >>>> Gustav-Meyer-Allee 25 / Geb?ude 12, 13355 Berlin, Germany >>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 >>>> 45 86 95 >>>> http://www.elegosoft.com | Gesch?ftsf?hrer: Olaf Wagner | Sitz: >>>> Berlin >>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: >>>> DE163214194 >>>> >>> >> > From darko at darko.org Wed Oct 1 09:10:35 2008 From: darko at darko.org (Darko) Date: Wed, 1 Oct 2008 09:10:35 +0200 Subject: [M3devel] m3cc build fails on older MacOS X In-Reply-To: References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com> <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org> Message-ID: <973F196C-4B4A-4526-878C-93942E48E72A@darko.org> Why bother with it if no one uses it and no-one is going to use it? Supporting M3 on Macs is good because people will use it into the future. People aren't moving back to 10.3. I wouldn't bother with it at all. On 01/10/2008, at 9:02 AM, Jay wrote: > > well, I agree and disagree. > > "Almost everyone" only cares about C++, C#, Windows, and a little > bit of Linux and Java. > "Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc. > > Supporting 10.2 and 10.3 "ought not" be so difficult, but ok. > > I wiped out the install and won't likely come back to it until > a bunch of other things are done. > e.g.: > debug 64 bit hosted cm3cg > move PPC_LINUX to pthreads > high dpi > bring up or backup a bunch of targets I have hardware for, > and some others I don't have yet. > > Adding back support for NT4/Win9x probably not hard, though > similar with gcc on Mac, the current Microsoft tools no longer > target them. > > It all gets easier with virtualization.. > (Which is easiest on x86/amd64.) 
> > - Jay > > > >> From: darko at darko.org >> To: hosking at cs.purdue.edu >> Date: Tue, 30 Sep 2008 11:50:43 +0200 >> CC: m3devel at elegosoft.com; jay.krell at cornell.edu >> Subject: Re: [M3devel] m3cc build fails on older MacOS X >> >> I think supporting the latest version is enough work. I don't see the >> point of supporting older releases. Also, this seems to be relevant >> to >> development on that version of the system. Anyone who wants to build >> can upgrade. >> >> >> On 30/09/2008, at 11:15 AM, Tony Hosking wrote: >> >>> Does anyone really care about 10.3 now? As I recall, it had some >>> pretty broken assumptions. >>> >>> On Sep 30, 2008, at 9:25 AM, Jay wrote: >>> >>>> >>>> I have a machine running 10.3 now. >>>> >>>> gcc-4.3.2 (the current release) won't (toplevel) configure on >>>> MacOSX 10.3 apparently because its assembler doesn't support >>>> ".machine". >>>> Current "cctools" won't compile on 10.3 without patches or other >>>> updates, due to mucking with ppc64 stuff, though that is easy to >>>> fix. >>>> >>>> A simple wrapper around as for use on 10.3 that strips the .machine >>>> directive is probably reasonable, or a patch to gcc to just not >>>> emit it for Darwin, except maybe for non-ppc, or subject to a >>>> switch. >>>> >>>> Other than support for more architectures, I never found any of the >>>> updates beyond 10.2 very interesting. >>>> Though current Firefox and Safari also won't run on 10.3. >>>> >>>> IF I get this working, maybe I'll bring 10.2 back up also.. >>>> >>>> - Jay >>>> >>>> ________________________________ >>>> >>>> From: jayk123 at hotmail.com >>>> To: wagner at elegosoft.com; m3devel at elegosoft.com >>>> Subject: RE: [M3devel] m3cc build fails on older MacOS X >>>> Date: Tue, 6 May 2008 10:49:11 +0000 >>>> >>>> >>>> >>>> >>>> I don't know what these Darwin versions are. >>>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5? 
>>>> I used to run 10.2 and could perhaps bring it back (though I'd hate >>>> to lose my PPC_LINUX install.. :( ) >>>> >>>>> make[2]: Nothing to be done for `all'. >>>>> Makefile:191: *** Insufficient number of arguments (2) to function >>>>> `patsubst'. Stop. >>>> >>>> Hopefully that's enough context though. >>>> >>>> The rest is a cascade. >>>> What happens if you remove all my m3makefile wierdness (which works >>>> everywhere else..) and just configure and make? >>>> >>>> Can I ssh into this? >>>> >>>> - Jay >>>> >>>> >>>> >>>> ________________________________ >>>> >>>> >>>>> Date: Tue, 6 May 2008 07:57:54 +0200 >>>>> From: wagner at elegosoft.com >>>>> To: m3devel at elegosoft.com >>>>> Subject: [M3devel] m3cc build fails on older MacOS X >>>>> >>>>> On % uname -a >>>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30 >>>>> 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power >>>>> Macintosh powerpc: >>>>> >>>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o >>>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o >>>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./ >>>>> fnmatch.o >>>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./ >>>>> getruntime.o >>>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o >>>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./ >>>>> obstack.o >>>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o >>>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o >>>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o >>>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o >>>>> ./xstrndup.o> required-list >>>>> make[2]: Nothing to be done for `all'. >>>>> Makefile:191: *** Insufficient number of arguments (2) to function >>>>> `patsubst'. Stop. >>>>> make: *** [all-libcpp] Error 2 >>>>> /bin/sh: line 1: cd: gcc: No such file or directory >>>>> make: *** No rule to make target `s-modes'. Stop. 
>>>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314: >>>>> quake >>>>> runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2 >>>>> >>>>> --procedure-- -line- -file--- >>>>> cp_if -- >>>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile >>>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile >>>>> 9 >>>>> /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args >>>>> >>>>> Fatal Error: package build failed >>>>> ==> m3-sys/m3cc done >>>>> >>>>> Any ideas? >>>>> >>>>> Olaf >>>>> -- >>>>> Olaf Wagner -- elego Software Solutions GmbH >>>>> Gustav-Meyer-Allee 25 / Geb?ude 12, 13355 Berlin, Germany >>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 >>>>> 45 86 95 >>>>> http://www.elegosoft.com | Gesch?ftsf?hrer: Olaf Wagner | Sitz: >>>>> Berlin >>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: >>>>> DE163214194 >>>>> >>>> >>> >> From darko at darko.org Wed Oct 1 12:03:15 2008 From: darko at darko.org (Darko) Date: Wed, 1 Oct 2008 12:03:15 +0200 Subject: [M3devel] Pretty-printing REFANYs? In-Reply-To: References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu> Message-ID: I've extended one of the modules with a function that formats any allocated value for printing. If you're interested I can clean them up a little and post them. On 28/09/2008, at 8:01 AM, Darko wrote: > As far as I know, yes, they're not in the binary. I'd love to be > proven wrong though, or fix it so they did. I have a module that > reads the .M3WEB file and maps it to types and a module that will > read and write any field within a type safely using a numeric index. > Neither is perfect. You can integrate the two to get what you want > but I seem to remember having some problems mapping type ids (UIDs?) > to typecodes at runtime. > > > On 28/09/2008, at 7:49 AM, Mika Nystrom wrote: > >> Right, I am aware of those interfaces.. just wondering what was >> out there. 
Do I really need to look at .M3WEB? I thought >> that m3gdb could figure out things without anything outside >> of the binary... >> >> I'm looking for essentially what m3gdb offers, say prints >> at minimum the name of the type (this I recall is trivial with >> some of the RT* interfaces) but hopefully also with field names >> and values, but doesn't expand references recursively.. something >> like that? >> >> Mika >> >> Darko writes: >>> You can use RTTipe to read the fields and values within a type. If >>> you >>> also want the type and field names you can interpret the .M3WEB >>> file. >>> I have a couple of modules that do something like that but they are >>> not what you would call finished. What level of detail are you >>> after? >>> >>> >>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote: >>> >>>> Hello Modula-3 people, >>>> >>>> I am working on a writing an interpreter that I'd like to embed in >>>> various Modula-3 programs. It so happens that this interpreter >>>> might from time to time be manipulating arbitrary M3 REFs, and just >>>> from the point of view of providing information to a human user, >>>> it might be nice to be able to pretty-print these. Does anyone >>>> have any code that accomplishes this, at least partly? I'm >>>> thinking >>>> that since m3gdb can do it, the information must all be in the >>>> binary---somehow. (Even enumeration names, right?) And since the >>>> pickler can pickle things... hmm. >>>> >>>> I would greatly appreciate any guidance that's out there... >>>> >>>> Best regards, >>>> Mika Nystrom > From hosking at cs.purdue.edu Wed Oct 1 11:59:23 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Wed, 1 Oct 2008 10:59:23 +0100 Subject: [M3devel] AMD-64 binaries? 
In-Reply-To: References: <48BDF24B.900@wichita.edu> <20080903075804.zhep2ichmow00scg@mail.elegosoft.com> <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu> Message-ID: <26766FFA-C3B6-45D2-8156-80FD14922882@cs.purdue.edu> I can definitely vouch for ALPHA_OSF having worked as recently as two years ago, but without the pthreads native threading system. That port should have been easy enough I suspect. On Oct 1, 2008, at 7:41 AM, Jay wrote: > No -- you would know best about AMD64_DARWIN. > I'm sure ALPHA_OSF used to work, but it's been so long, I don't > think it counts. > > I'm being lazy. > > file AMD64_DARWIN/cm3cg > => fat binary? I doubt it. > => with ppc, i386, amd64? (doubt it) > => or just ppc, i386? (doubt it) > => or just i386? This is what I "suspect". > => or just AMD64. This would be somewhat interesting. I believe that is how I configured it. > I'm pretty sure cm3cg is always 32bit "these days". Nope, cm3cg on AMD64_DARWIN is 64-bit. > I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in > the same way. > This was a nice thing to find, that the problem is portable to > multiple?all 64 bit hosts. > > I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same > problem. Don't think so. > Anyway, I should really get to debugging this soon. > > It's a bit odd because gcc itself doesn't have this bug and I > reviewed a lot of the code and it was ok. I'm just going to have to > step through it in parallel on 32bit and 64bit hosts and find where > they diverge. A LOT was identical, like the files output by cm3 into > cm3cg were identical. Yes, the intermediate code should be identical. Any such problems would be with cm3cg. > I was close a few months ago but sloughed off. Good luck. > > > - Jay > > > > From: hosking at cs.purdue.edu > > To: jay.krell at cornell.edu > > Date: Tue, 30 Sep 2008 10:16:41 +0100 > > CC: m3devel at elegosoft.com > > Subject: Re: [M3devel] AMD-64 binaries? > > > > 64-bit hosted tools? Do you mean only for Linux? 
I don't quite > > understand what you are saying. > > > > On Sep 30, 2008, at 9:36 AM, Jay wrote: > > > > > > > > I'm getting back to this now. > > > I didn't realize it till this weekend, but that archive is > > > "relatively incompatible". > > > In particular it has 32bit hosted tools, and won't run on Debian > > > 4.0r4 / AMD64. > > > Something about glibc 2.4, when all I see on my system is 2.3. > > > I'll see what I can do. > > > Probably just rebuild cm3cg. > > > I think it was built on Fedora, but could have been Ubuntu or > > > OpenSuse. > > > Probably just that Debian stable lags the others. > > > > > > The main problem to debug is why 64bit hosted tools "never" work. > > > (Right?) > > > > > > > > > Stay tuned for a bunch more ports "soon", I've got a bunch more > > > hardware, > > > that runs Linux and others (Solaris, AIX, Irix).. :) > > > > > > I'll be able to debug the high dpi gui problems on a friend's > laptop > > > soon too. > > > Send me a repro. I expect it is trivial -- like anything with a > > > scrollbar. > > > I can try formsedit, etc. > > > > > > > > > - Jay > > > > > > > > >> Date: Wed, 3 Sep 2008 07:58:04 +0200 > > >> From: wagner at elegosoft.com > > >> To: m3devel at elegosoft.com > > >> Subject: Re: [M3devel] AMD-64 binaries? > > >> > > >> Quoting "Rodney M. Bates" : > > >> > > >>> Are there binaries for AMD-64 around that can be used > > >>> to bootstrap a 64-bit Linux compiler? > > >> > > >> Have a look at > > >> > > >> http://www.opencm3.net/uploaded-archives/index.html > > >> > > >> There are some AMD64 archives; I don't know about their status > > >> offhand, though. I think Jay Krell produced them. > > >> AFAIK there is no regular build on this platform yet. 
> > >>
> > >> Olaf
> > >> --
> > >> Olaf Wagner -- elego Software Solutions GmbH
> > >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> > >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
> > >> 45 86 95
> > >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
> > >> Berlin
> > >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
> > >> DE163214194
> > >>
> > >

From hosking at cs.purdue.edu Wed Oct 1 12:07:00 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 11:07:00 +0100
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To:
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
Message-ID: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>

m3gdb makes use of stabs debug information spat out by the backend. They are only in the binary if compiled -g. There are other ways to get what you are after, as Darko has observed.

On Oct 1, 2008, at 11:03 AM, Darko wrote:

> I've extended one of the modules with a function that formats any
> allocated value for printing. If you're interested I can clean them
> up a little and post them.
>
>
> On 28/09/2008, at 8:01 AM, Darko wrote:
>
>> As far as I know, yes, they're not in the binary. I'd love to be
>> proven wrong though, or fix it so they did. I have a module that
>> reads the .M3WEB file and maps it to types and a module that will
>> read and write any field within a type safely using a numeric
>> index. Neither is perfect. You can integrate the two to get what
>> you want but I seem to remember having some problems mapping type
>> ids (UIDs?) to typecodes at runtime.
>>
>>
>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>
>>> Right, I am aware of those interfaces.. just wondering what was
>>> out there. Do I really need to look at .M3WEB? I thought
>>> that m3gdb could figure out things without anything outside
>>> of the binary...
>>> >>> I'm looking for essentially what m3gdb offers, say prints >>> at minimum the name of the type (this I recall is trivial with >>> some of the RT* interfaces) but hopefully also with field names >>> and values, but doesn't expand references recursively.. something >>> like that? >>> >>> Mika >>> >>> Darko writes: >>>> You can use RTTipe to read the fields and values within a type. >>>> If you >>>> also want the type and field names you can interpret the .M3WEB >>>> file. >>>> I have a couple of modules that do something like that but they are >>>> not what you would call finished. What level of detail are you >>>> after? >>>> >>>> >>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote: >>>> >>>>> Hello Modula-3 people, >>>>> >>>>> I am working on a writing an interpreter that I'd like to embed in >>>>> various Modula-3 programs. It so happens that this interpreter >>>>> might from time to time be manipulating arbitrary M3 REFs, and >>>>> just >>>>> from the point of view of providing information to a human user, >>>>> it might be nice to be able to pretty-print these. Does anyone >>>>> have any code that accomplishes this, at least partly? I'm >>>>> thinking >>>>> that since m3gdb can do it, the information must all be in the >>>>> binary---somehow. (Even enumeration names, right?) And since the >>>>> pickler can pickle things... hmm. >>>>> >>>>> I would greatly appreciate any guidance that's out there... >>>>> >>>>> Best regards, >>>>> Mika Nystrom >> From darko at darko.org Wed Oct 1 12:35:09 2008 From: darko at darko.org (Darko) Date: Wed, 1 Oct 2008 12:35:09 +0200 Subject: [M3devel] Pretty-printing REFANYs? 
In-Reply-To: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu> References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu> <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu> Message-ID: Here's some info on the stabs format: http://www.cs.utah.edu/dept/old/texinfo/gdb/stabs_toc.html On 01/10/2008, at 12:07 PM, Tony Hosking wrote: > m3gdb makes use of stabs debug information spat out by the backend. > They are only in the binary if compiled -g. There are other ways to > get what you are after, as Darko has observed. > > On Oct 1, 2008, at 11:03 AM, Darko wrote: > >> I've extended one of the modules with a function that formats any >> allocated value for printing. If you're interested I can clean them >> up a little and post them. >> >> >> On 28/09/2008, at 8:01 AM, Darko wrote: >> >>> As far as I know, yes, they're not in the binary. I'd love to be >>> proven wrong though, or fix it so they did. I have a module that >>> reads the .M3WEB file and maps it to types and a module that will >>> read and write any field within a type safely using a numeric >>> index. Neither is perfect. You can integrate the two to get what >>> you want but I seem to remember having some problems mapping type >>> ids (UIDs?) to typecodes at runtime. >>> >>> >>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote: >>> >>>> Right, I am aware of those interfaces.. just wondering what was >>>> out there. Do I really need to look at .M3WEB? I thought >>>> that m3gdb could figure out things without anything outside >>>> of the binary... >>>> >>>> I'm looking for essentially what m3gdb offers, say prints >>>> at minimum the name of the type (this I recall is trivial with >>>> some of the RT* interfaces) but hopefully also with field names >>>> and values, but doesn't expand references recursively.. something >>>> like that? >>>> >>>> Mika >>>> >>>> Darko writes: >>>>> You can use RTTipe to read the fields and values within a type. 
>>>>> If you >>>>> also want the type and field names you can interpret the .M3WEB >>>>> file. >>>>> I have a couple of modules that do something like that but they >>>>> are >>>>> not what you would call finished. What level of detail are you >>>>> after? >>>>> >>>>> >>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote: >>>>> >>>>>> Hello Modula-3 people, >>>>>> >>>>>> I am working on a writing an interpreter that I'd like to embed >>>>>> in >>>>>> various Modula-3 programs. It so happens that this interpreter >>>>>> might from time to time be manipulating arbitrary M3 REFs, and >>>>>> just >>>>>> from the point of view of providing information to a human user, >>>>>> it might be nice to be able to pretty-print these. Does anyone >>>>>> have any code that accomplishes this, at least partly? I'm >>>>>> thinking >>>>>> that since m3gdb can do it, the information must all be in the >>>>>> binary---somehow. (Even enumeration names, right?) And since >>>>>> the >>>>>> pickler can pickle things... hmm. >>>>>> >>>>>> I would greatly appreciate any guidance that's out there... >>>>>> >>>>>> Best regards, >>>>>> Mika Nystrom >>> > From mika at async.caltech.edu Wed Oct 1 20:09:58 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Wed, 01 Oct 2008 11:09:58 -0700 Subject: [M3devel] Pretty-printing REFANYs? In-Reply-To: Your message of "Wed, 01 Oct 2008 12:03:15 +0200." Message-ID: <200810011809.m91I9wxY087739@camembert.async.caltech.edu> Oh, I'd love to give it a try! I'm a little surprised no one has chimed in on the question of whether you really need .M3WEB... I could swear I can get good symbolic debugging with m3gdb on just a binary... Mika Darko writes: >I've extended one of the modules with a function that formats any >allocated value for printing. If you're interested I can clean them up >a little and post them. > > >On 28/09/2008, at 8:01 AM, Darko wrote: > >> As far as I know, yes, they're not in the binary. 
I'd love to be >> proven wrong though, or fix it so they did. I have a module that >> reads the .M3WEB file and maps it to types and a module that will >> read and write any field within a type safely using a numeric index. >> Neither is perfect. You can integrate the two to get what you want >> but I seem to remember having some problems mapping type ids (UIDs?) >> to typecodes at runtime. >> >> >> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote: >> >>> Right, I am aware of those interfaces.. just wondering what was >>> out there. Do I really need to look at .M3WEB? I thought >>> that m3gdb could figure out things without anything outside >>> of the binary... >>> >>> I'm looking for essentially what m3gdb offers, say prints >>> at minimum the name of the type (this I recall is trivial with >>> some of the RT* interfaces) but hopefully also with field names >>> and values, but doesn't expand references recursively.. something >>> like that? >>> >>> Mika >>> >>> Darko writes: >>>> You can use RTTipe to read the fields and values within a type. If >>>> you >>>> also want the type and field names you can interpret the .M3WEB >>>> file. >>>> I have a couple of modules that do something like that but they are >>>> not what you would call finished. What level of detail are you >>>> after? >>>> >>>> >>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote: >>>> >>>>> Hello Modula-3 people, >>>>> >>>>> I am working on a writing an interpreter that I'd like to embed in >>>>> various Modula-3 programs. It so happens that this interpreter >>>>> might from time to time be manipulating arbitrary M3 REFs, and just >>>>> from the point of view of providing information to a human user, >>>>> it might be nice to be able to pretty-print these. Does anyone >>>>> have any code that accomplishes this, at least partly? I'm >>>>> thinking >>>>> that since m3gdb can do it, the information must all be in the >>>>> binary---somehow. (Even enumeration names, right?) 
And since the >>>>> pickler can pickle things... hmm. >>>>> >>>>> I would greatly appreciate any guidance that's out there... >>>>> >>>>> Best regards, >>>>> Mika Nystrom >> From mika at async.caltech.edu Wed Oct 1 20:10:38 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Wed, 01 Oct 2008 11:10:38 -0700 Subject: [M3devel] Pretty-printing REFANYs? In-Reply-To: Your message of "Wed, 01 Oct 2008 11:07:00 BST." <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu> Message-ID: <200810011810.m91IAcDW087832@camembert.async.caltech.edu> Ok, ignore my previous email :-) Tony Hosking writes: >m3gdb makes use of stabs debug information spat out by the backend. >They are only in the binary if compiled -g. There are other ways to >get what you are after, as Darko has observed. > >On Oct 1, 2008, at 11:03 AM, Darko wrote: > >> I've extended one of the modules with a function that formats any >> allocated value for printing. If you're interested I can clean them >> up a little and post them. >> >> >> On 28/09/2008, at 8:01 AM, Darko wrote: >> >>> As far as I know, yes, they're not in the binary. I'd love to be >>> proven wrong though, or fix it so they did. I have a module that >>> reads the .M3WEB file and maps it to types and a module that will >>> read and write any field within a type safely using a numeric >>> index. Neither is perfect. You can integrate the two to get what >>> you want but I seem to remember having some problems mapping type >>> ids (UIDs?) to typecodes at runtime. >>> >>> >>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote: >>> >>>> Right, I am aware of those interfaces.. just wondering what was >>>> out there. Do I really need to look at .M3WEB? I thought >>>> that m3gdb could figure out things without anything outside >>>> of the binary... 
>>>> >>>> I'm looking for essentially what m3gdb offers, say prints >>>> at minimum the name of the type (this I recall is trivial with >>>> some of the RT* interfaces) but hopefully also with field names >>>> and values, but doesn't expand references recursively.. something >>>> like that? >>>> >>>> Mika >>>> >>>> Darko writes: >>>>> You can use RTTipe to read the fields and values within a type. >>>>> If you >>>>> also want the type and field names you can interpret the .M3WEB >>>>> file. >>>>> I have a couple of modules that do something like that but they are >>>>> not what you would call finished. What level of detail are you >>>>> after? >>>>> >>>>> >>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote: >>>>> >>>>>> Hello Modula-3 people, >>>>>> >>>>>> I am working on a writing an interpreter that I'd like to embed in >>>>>> various Modula-3 programs. It so happens that this interpreter >>>>>> might from time to time be manipulating arbitrary M3 REFs, and >>>>>> just >>>>>> from the point of view of providing information to a human user, >>>>>> it might be nice to be able to pretty-print these. Does anyone >>>>>> have any code that accomplishes this, at least partly? I'm >>>>>> thinking >>>>>> that since m3gdb can do it, the information must all be in the >>>>>> binary---somehow. (Even enumeration names, right?) And since the >>>>>> pickler can pickle things... hmm. >>>>>> >>>>>> I would greatly appreciate any guidance that's out there... >>>>>> >>>>>> Best regards, >>>>>> Mika Nystrom >>> From jay.krell at cornell.edu Sun Oct 12 11:51:03 2008 From: jay.krell at cornell.edu (Jay) Date: Sun, 12 Oct 2008 09:51:03 +0000 Subject: [M3devel] a bunch of new/old platform names? Message-ID: I plan on soon bringing "back" some old ports -- building current archives -- and bring up some new ports. Specifically I have hardware: RS/6000 (PPC64/AIX), SGI (MIPS), SPARC64, plus the usual x86/AMD64. Two of the platforms did exist. In particular, "MIPS_IRIX" is "IRIX5". 
Reuse IRIX5, or introduce MIPS_IRIX?
PPC_AIX is IBMR2 or such. Same question.

Also, must versions really be in platform names? I'm loath to add a third dimension to the matrix.
I did just note that FreeBSD 7.0 64 bit is ABI-incompatible with FreeBSD 6.3 64 bit, lame.
SGI claims good ABI across all the 6.5 releases, which is all there will be now.
IBM claims good 32 bit ABI compat across AIX 4.x - 6.x and good 64 bit ABI compat across 5.x and 6.x, but incompatibility from 64 bit 4.x.
(Microsoft has always been good here, but "behavioral" compat is the actual tricky issue.)

And, what do folks think about putting "32" in new 32 bit platform names?

I'm considering the following:

MIPS32_{IRIX,LINUX,OPENBSD,NETBSD}
MIPS64_IRIX (6.5)
SPARC{32,64}_{LINUX,*BSD} (probably no SPARC32_*BSD actually, and SPARC32_LINUX is already in, but not building regularly)
{SPARC64,I386,AMD64}_SOLARIS
PPC{32,64}_AIX (PPC64_LINUX is blocked, Linux has problems booting on the hardware and I have no Mac G5 yet).
AMD64_*BSD

Also, maybe some of the code should be restructured to separate processor from OS? That might be primarily only pointer size.

Any interest in "x86" instead of "I386"?

If I make good progress against those 18 (!), I can see about PPC64_DARWIN, HPPA_*, IA64_*, ALPHA_*, ARM_*, which I lack hardware for.

PPC_LINUX also should be converted to pthreads imho.

Mostly this is all just a matter of installing the OS and configuring gcc.

And, yeah, I have the two m3cgs stepping side by side to find the problem there, and will have use of a high dpi Windows laptop for that other problem..

And then of course, if the vast majority of platforms are named like that, there might be pressure to bring the rest in line. :)
I386_{NT,LINUX,*BSD,CYGWIN,MINGWIN}

 - Jay

From mika at async.caltech.edu Fri Oct 17 00:32:39 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 15:32:39 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
Message-ID: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>

Hello Modula-3 people,

As I mentioned in an earlier email about printing structures (thanks Darko), I'm in the midst of coding an interpreter embedded in Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's JScheme for Java (well it was at first strongly based, but more and more loosely, if you know what I mean...)

I expected that the performance of the interpreter would be much better in Modula-3 than in Java, and I have been testing on two different systems. One is my ancient FreeBSD-4.11 with an old PM3, and the other is CM3 on a recent Debian system. What I am finding is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting close to ten times as fast on some tasks at this point), but on Linux/CM3 it is much closer in speed to JScheme than I would like.

When I started, with code that was essentially equivalent to JScheme, I found that it was a bit slower than JScheme on Linux/CM3 and possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to spend most of its time in (surprise, surprise!) memory allocation and garbage collection. The speedup I have achieved between the first implementation and now was due to the use of Modula-3 constructs that are superior to Java's, such as the use of arrays of RECORDs to make small stacks rather than linked lists. (I get readable code with much fewer memory allocations and GC work.)

Now, since this is an interpreter, I as the implementer have limited control over how much memory is allocated and freed, and where it is needed. However, I can sometimes fall back on C-style memory management, but I would like to do it in a safe way. For instance, I have special-cased evaluation of Scheme primitives, as follows.

Under the "normal" implementation, a list of things to evaluate is built up, passed to an evaluation function, and then the GC is left to sweep up the mess. The problem is that there are various tricky routes by which references can escape the evaluator, so you can't just assume that what you put in is going to be dead right after an eval and free it. Instead, I set a flag in the evaluator, which is TRUE if it is OK to free the list after the eval and FALSE if it's unclear (in which case the problem is left up to the GC).

For the vast majority of Scheme primitives, one can indeed free the list right after the eval. Now of course I am not interested in unsafe code, so what I do is this:

  TYPE Pair = OBJECT first, rest : REFANY; END;

  VAR
    mu := NEW(MUTEX);
    free : Pair := NIL;

  PROCEDURE GetPair() : Pair =
    BEGIN
      LOCK mu DO
        IF free # NIL THEN
          TRY
            RETURN free
          FINALLY
            free := free.rest
          END
        END
      END;
      RETURN NEW(Pair)
    END GetPair;

  PROCEDURE ReturnPair(cons : Pair) =
    BEGIN
      cons.first := NIL;
      LOCK mu DO
        cons.rest := free;
        free := cons
      END
    END ReturnPair;

my eval code looks like

  VAR okToFree : BOOLEAN; BEGIN

    args := GetPair(); ...
    result := EvalPrimitive(args, (*VAR OUT*) okToFree);

    IF okToFree THEN ReturnPair(args) END;
    RETURN result
  END

and this does work well. In fact it speeds up the Linux implementation by almost 100% to recycle the lists like this *just* for the evaluation of Scheme primitives.

But it's still ugly, isn't it? There's a mutex, and a global variable. And yes, the time spent messing with the mutex is noticeable, and I haven't even made the code multi-threaded yet (and that is coming!)

So I'm thinking, what I really want is a structure that is attached to my current Thread.T. I want to be able to access just a single pointer (like the free list) but be sure it is unique to my current thread. No locking would be necessary if I could do this.

Does anyone have an elegant solution that does something like this? Thread-specific "static" variables? Just one REFANY would be enough for a lot of uses... seems to me this should be a frequently occurring problem?
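
One way to drop the mutex without any thread-local storage support is to hang the free list off an object that only one thread ever touches, for example the interpreter instance itself. What follows is only a minimal sketch under that assumption (one interpreter per thread, never shared); `Pool`, its `fresh` counter and the driver loop are hypothetical names introduced for illustration and are not part of the code above.

```modula3
MODULE Main;

IMPORT IO, Fmt;

TYPE
  Pair = OBJECT first, rest: REFANY := NIL END;

  (* Hypothetical: one Pool per interpreter instance.  The sketch assumes
     each instance is used by exactly one thread, so no MUTEX is taken. *)
  Pool = OBJECT
    free : Pair     := NIL;   (* head of the recycled-pair list *)
    fresh: CARDINAL := 0;     (* counts NEW(Pair) calls, for illustration *)
  END;

PROCEDURE GetPair (pool: Pool): Pair =
  VAR p := pool.free;
  BEGIN
    IF p = NIL THEN
      INC(pool.fresh);
      RETURN NEW(Pair)
    END;
    pool.free := NARROW(p.rest, Pair);   (* unlink the head of the list *)
    p.rest := NIL;
    RETURN p
  END GetPair;

PROCEDURE ReturnPair (pool: Pool; p: Pair) =
  BEGIN
    p.first := NIL;          (* drop the payload so the pool does not pin it *)
    p.rest := pool.free;
    pool.free := p
  END ReturnPair;

VAR
  pool: Pool;
  p   : Pair;
BEGIN
  pool := NEW(Pool);
  FOR i := 1 TO 1000 DO
    p := GetPair(pool);
    p.first := "arg";
    ReturnPair(pool, p)
  END;
  IO.Put("pairs allocated: " & Fmt.Int(pool.fresh) & "\n")
END Main.
```

With this loop the pool should hand the same pair back on every iteration after the first. The cost of the approach is that the pool has to be passed down to every caller of `GetPair` and `ReturnPair`, which is exactly the plumbing a per-thread variable would avoid.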
Best regards, Mika From hosking at cs.purdue.edu Fri Oct 17 00:54:51 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Thu, 16 Oct 2008 23:54:51 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu> References: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu> Message-ID: Have you tried running @M3noincremental? On 16 Oct 2008, at 23:32, Mika Nystrom wrote: > Hello Modula-3 people, > > As I mentioned in an earlier email about printing structures (thanks > Darko), I'm in the midst of coding an interpreter embedded in > Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's > JScheme for Java (well it was at first strongly based, but more and > more loosely, if you know what I mean...) > > I expected that the performance of the interpreter would be much > better in Modula-3 than in Java, and I have been testing on two > different systems. One is my ancient FreeBSD-4.11 with an old PM3, > and the other is CM3 on a recent Debian system. What I am finding > is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting > close to ten times as fast on some tasks at this point), but on > Linux/CM3 it is much closer in speed to JScheme than I would like. > > When I started, with code that was essentially equivalent to JScheme, > I found that it was a bit slower than JScheme on Linux/CM3 and > possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to > spend most of its time in (surprise, surprise!) memory allocation > and garbage collection. The speedup I have achieved between the > first implementation and now was due to the use of Modula-3 constructs > that are superior to Java's, such as the use of arrays of RECORDs > to make small stacks rather than linked lists. (I get readable > code with much fewer memory allocations and GC work.) 
> > Now, since this is an interpreter, I as the implementer have limited > control over how much memory is allocated and freed, and where it is > needed. However, I can sometimes fall back on C-style memory > management, > but I would like to do it in a safe way. For instance, I have > special-cased > evaluation of Scheme primitives, as follows. > > Under the "normal" implementation, a list of things to evaluate is > built up, passed to an evaluation function, and then the GC is left > to sweep up the mess. The problem is that there are various tricky > routes by which references can escape the evaluator, so you can't > just assume that what you put in is going to be dead right after > an eval and free it. Instead, I set a flag in the evaluator, which > is TRUE if it is OK to free the list after the eval and FALSE if > it's unclear (in which case the problem is left up to the GC). > > For the vast majority of Scheme primitives, one can indeed free the > list right after the eval. Now of course I am not interested > in unsafe code, so what I do is this: > > TYPE Pair = OBJECT first, rest : REFANY; END; > > VAR > mu := NEW(MUTEX); > free : Pair := NIL; > > PROCEDURE GetPair() : Pair = > BEGIN > LOCK mu DO > IF free # NIL THEN > TRY > RETURN free > FINALLY > free := free.rest > END > END > END; > RETURN NEW(Pair) > END GetPair; > > PROCEDURE ReturnPair(cons : Pair) = > BEGIN > cons.first := NIL; > LOCK mu DO > cons.rest := free; > free := cons > END > END ReturnPair; > > my eval code looks like > > VAR okToFree : BOOLEAN; BEGIN > > args := GetPair(); ... > result := EvalPrimitive(args, (*VAR OUT*) okToFree); > > IF okToFree THEN ReturnPair(args) END; > RETURN result > END > > and this does work well. In fact it speeds up the Linux > implementation > by almost 100% to recycle the lists like this *just* for the > evaluation of Scheme primitives. > > But it's still ugly, isn't it? There's a mutex, and a global > variable. 
And yes, the time spent messing with the mutex is > noticeable, and I haven't even made the code multi-threaded yet > (and that is coming!) > > So I'm thinking, what I really want is a structure that is attached > to my current Thread.T. I want to be able to access just a single > pointer (like the free list) but be sure it is unique to my current > thread. No locking would be necessary if I could do this. > > Does anyone have an elegant solution that does something like this? > Thread-specific "static" variables? Just one REFANY would be enough > for a lot of uses... seems to me this should be a frequently > occurring problem? > > Best regards, > Mika > > > > > > From mika at async.caltech.edu Fri Oct 17 01:30:01 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 16 Oct 2008 16:30:01 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: Your message of "Thu, 16 Oct 2008 23:54:51 BST." Message-ID: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu> Hi Tony, I figured you would chime in! Yes, @M3noincremental seems to make things consistently a tad bit slower (but a very small difference), on both FreeBSD and Linux. @M3nogc makes a bigger difference, of course. Unfortunately I seem to have lost the code that did a lot of memory allocations. My tricks (as described in the email---and others!) have removed most of the troublesome memory allocations, but now I'm stuck with the mutex instead... Mika Tony Hosking writes: >Have you tried running @M3noincremental? > >On 16 Oct 2008, at 23:32, Mika Nystrom wrote: > >> Hello Modula-3 people, >> >> As I mentioned in an earlier email about printing structures (thanks >> Darko), I'm in the midst of coding an interpreter embedded in >> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's >> JScheme for Java (well it was at first strongly based, but more and >> more loosely, if you know what I mean...) 
>> >> I expected that the performance of the interpreter would be much >> better in Modula-3 than in Java, and I have been testing on two >> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >> and the other is CM3 on a recent Debian system. What I am finding >> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting >> close to ten times as fast on some tasks at this point), but on >> Linux/CM3 it is much closer in speed to JScheme than I would like. >> >> When I started, with code that was essentially equivalent to JScheme, >> I found that it was a bit slower than JScheme on Linux/CM3 and >> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >> spend most of its time in (surprise, surprise!) memory allocation >> and garbage collection. The speedup I have achieved between the >> first implementation and now was due to the use of Modula-3 constructs >> that are superior to Java's, such as the use of arrays of RECORDs >> to make small stacks rather than linked lists. (I get readable >> code with much fewer memory allocations and GC work.) >> >> Now, since this is an interpreter, I as the implementer have limited >> control over how much memory is allocated and freed, and where it is >> needed. However, I can sometimes fall back on C-style memory >> management, >> but I would like to do it in a safe way. For instance, I have >> special-cased >> evaluation of Scheme primitives, as follows. >> >> Under the "normal" implementation, a list of things to evaluate is >> built up, passed to an evaluation function, and then the GC is left >> to sweep up the mess. The problem is that there are various tricky >> routes by which references can escape the evaluator, so you can't >> just assume that what you put in is going to be dead right after >> an eval and free it. 
Instead, I set a flag in the evaluator, which >> is TRUE if it is OK to free the list after the eval and FALSE if >> it's unclear (in which case the problem is left up to the GC). >> >> For the vast majority of Scheme primitives, one can indeed free the >> list right after the eval. Now of course I am not interested >> in unsafe code, so what I do is this: >> >> TYPE Pair = OBJECT first, rest : REFANY; END; >> >> VAR >> mu := NEW(MUTEX); >> free : Pair := NIL; >> >> PROCEDURE GetPair() : Pair = >> BEGIN >> LOCK mu DO >> IF free # NIL THEN >> TRY >> RETURN free >> FINALLY >> free := free.rest >> END >> END >> END; >> RETURN NEW(Pair) >> END GetPair; >> >> PROCEDURE ReturnPair(cons : Pair) = >> BEGIN >> cons.first := NIL; >> LOCK mu DO >> cons.rest := free; >> free := cons >> END >> END ReturnPair; >> >> my eval code looks like >> >> VAR okToFree : BOOLEAN; BEGIN >> >> args := GetPair(); ... >> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >> >> IF okToFree THEN ReturnPair(args) END; >> RETURN result >> END >> >> and this does work well. In fact it speeds up the Linux >> implementation >> by almost 100% to recycle the lists like this *just* for the >> evaluation of Scheme primitives. >> >> But it's still ugly, isn't it? There's a mutex, and a global >> variable. And yes, the time spent messing with the mutex is >> noticeable, and I haven't even made the code multi-threaded yet >> (and that is coming!) >> >> So I'm thinking, what I really want is a structure that is attached >> to my current Thread.T. I want to be able to access just a single >> pointer (like the free list) but be sure it is unique to my current >> thread. No locking would be necessary if I could do this. >> >> Does anyone have an elegant solution that does something like this? >> Thread-specific "static" variables? Just one REFANY would be enough >> for a lot of uses... seems to me this should be a frequently >> occurring problem? 
>>
>> Best regards,
>> Mika
>>

From jay.krell at cornell.edu Fri Oct 17 06:40:28 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 04:40:28 +0000
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
References: Your message of <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
Message-ID:

Making this per-thread is a fairly classic good improvement.

You need to worry about what happens with many threads, and being sure to clean up when a thread dies, and allowing for a free to come in from any thread.
A good way to mitigate all those problems is to use a small fixed-size cache instead of a per-thread one, including an array of mutexes.
If "thread ids" have adequate distribution, just use their lower bits as an array index.
If not, have a global counter that gets assigned into the thread on first use per-thread.
The cache could also be more than one element.

How do you manage okToFree?

Windows has __declspec(thread), which is an optimized form of TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded .dlls before Vista, and isn't __declspec(fiber) like maybe it should be.

 - Jay

----------------------------------------
> To: hosking at cs.purdue.edu
> Date: Thu, 16 Oct 2008 16:30:01 -0700
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
>
> Hi Tony,
>
> I figured you would chime in!
>
> Yes, @M3noincremental seems to make things consistently a tad bit
> slower (but a very small difference), on both FreeBSD and Linux.
> @M3nogc makes a bigger difference, of course.
>
> Unfortunately I seem to have lost the code that did a lot of memory
> allocations. My tricks (as described in the email---and others!)
> have removed most of the troublesome memory allocations, but now > I'm stuck with the mutex instead... > > Mika > > Tony Hosking writes: >>Have you tried running @M3noincremental? >> >>On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >> >>> Hello Modula-3 people, >>> >>> As I mentioned in an earlier email about printing structures (thanks >>> Darko), I'm in the midst of coding an interpreter embedded in >>> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's >>> JScheme for Java (well it was at first strongly based, but more and >>> more loosely, if you know what I mean...) >>> >>> I expected that the performance of the interpreter would be much >>> better in Modula-3 than in Java, and I have been testing on two >>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>> and the other is CM3 on a recent Debian system. What I am finding >>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting >>> close to ten times as fast on some tasks at this point), but on >>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>> >>> When I started, with code that was essentially equivalent to JScheme, >>> I found that it was a bit slower than JScheme on Linux/CM3 and >>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>> spend most of its time in (surprise, surprise!) memory allocation >>> and garbage collection. The speedup I have achieved between the >>> first implementation and now was due to the use of Modula-3 constructs >>> that are superior to Java's, such as the use of arrays of RECORDs >>> to make small stacks rather than linked lists. (I get readable >>> code with much fewer memory allocations and GC work.) >>> >>> Now, since this is an interpreter, I as the implementer have limited >>> control over how much memory is allocated and freed, and where it is >>> needed. However, I can sometimes fall back on C-style memory >>> management, >>> but I would like to do it in a safe way. 
For instance, I have >>> special-cased >>> evaluation of Scheme primitives, as follows. >>> >>> Under the "normal" implementation, a list of things to evaluate is >>> built up, passed to an evaluation function, and then the GC is left >>> to sweep up the mess. The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>> just assume that what you put in is going to be dead right after >>> an eval and free it. Instead, I set a flag in the evaluator, which >>> is TRUE if it is OK to free the list after the eval and FALSE if >>> it's unclear (in which case the problem is left up to the GC). >>> >>> For the vast majority of Scheme primitives, one can indeed free the >>> list right after the eval. Now of course I am not interested >>> in unsafe code, so what I do is this: >>> >>> TYPE Pair = OBJECT first, rest : REFANY; END; >>> >>> VAR >>> mu := NEW(MUTEX); >>> free : Pair := NIL; >>> >>> PROCEDURE GetPair() : Pair = >>> BEGIN >>> LOCK mu DO >>> IF free # NIL THEN >>> TRY >>> RETURN free >>> FINALLY >>> free := free.rest >>> END >>> END >>> END; >>> RETURN NEW(Pair) >>> END GetPair; >>> >>> PROCEDURE ReturnPair(cons : Pair) = >>> BEGIN >>> cons.first := NIL; >>> LOCK mu DO >>> cons.rest := free; >>> free := cons >>> END >>> END ReturnPair; >>> >>> my eval code looks like >>> >>> VAR okToFree : BOOLEAN; BEGIN >>> >>> args := GetPair(); ... >>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>> >>> IF okToFree THEN ReturnPair(args) END; >>> RETURN result >>> END >>> >>> and this does work well. In fact it speeds up the Linux >>> implementation >>> by almost 100% to recycle the lists like this *just* for the >>> evaluation of Scheme primitives. >>> >>> But it's still ugly, isn't it? There's a mutex, and a global >>> variable. And yes, the time spent messing with the mutex is >>> noticeable, and I haven't even made the code multi-threaded yet >>> (and that is coming!) 
>>> >>> So I'm thinking, what I really want is a structure that is attached >>> to my current Thread.T. I want to be able to access just a single >>> pointer (like the free list) but be sure it is unique to my current >>> thread. No locking would be necessary if I could do this. >>> >>> Does anyone have an elegant solution that does something like this? >>> Thread-specific "static" variables? Just one REFANY would be enough >>> for a lot of uses... seems to me this should be a frequently >>> occurring problem? >>> >>> Best regards, >>> Mika >>> >>> >>> >>> >>> >>> From mika at async.caltech.edu Fri Oct 17 08:32:15 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 16 Oct 2008 23:32:15 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000." Message-ID: <200810170632.m9H6WFHd078061@camembert.async.caltech.edu> Well, I was thinking of something even simpler. A Thread.T is an OBJECT. It's garbage collected just like any other object, is it not? Why can't the thing that makes new threads simply include a single globally visible field in every Thread.T, of type REFANY? Call it "data". Then you can always manipulate Thread.Self().data as you see fit without any need for locks. There can be no problem with this as long as it is always manipulated from within that thread. Of course this can be trivially encapsulated by not revealing "data" and indeed always accessing it as Thread.Self().data. You would not normally access this from any other thread. It's indeed only meant to be used in the idiom x := Allocate(); TRY DoSomething(x) FINALLY Free(x) END It's also not really a "Free" but just returning the object to a free list (there can be no unsafe behavior here). As a "nicer" interface, one could register routines with a public interface, asking it to manufacture some kind of thread globals. 
For maximum sanity, they would be visible inside the MODULE that requested them, but I'm not sure how to accomplish this. And of course there's not much point in any of this unless it can be made efficient; otherwise a mutex plus a true global will work just as well.

What I'm talking about I guess could be done by hacking up Thread.Fork() to return a subtype of Thread.T, but that won't work for the first thread. But with this method you could have arbitrary fields (and methods) attached to a Thread.T. How to collect everything you need is a different story...

I'm not asking for a new language feature... really was just wondering if anyone had tried anything like this before, and now am rambling a bit.

    Mika

Jay writes:
>
>Making this per-thread is a fairly classic good improvement.
>
>You need to worry about what happens with many threads, and being sure to cleanup when a thread dies, and allowing for a free to come in from any thread.
>
>A good way to mitigate all those problems is to use a small fixed size cache instead of per-thread. Including an array of mutexes.
>
>If "thread ids" have adequate distribution, just use their lower bits as an array index. If not, have a global counter that gets assigned into the thread on first use per-thread.
>
>The cache could also be more than one element.
>
>How do you manage okToFree?
>
>Windows has __declspec(thread), which is an optimized form of TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded .dlls before Vista, and isn't __declspec(fiber) like maybe it should be.
>
> - Jay
>
>----------------------------------------
>> To: hosking at cs.purdue.edu
>> Date: Thu, 16 Oct 2008 16:30:01 -0700
>> From: mika at async.caltech.edu
>> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
>> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
>>
>> Hi Tony,
>>
>> I figured you would chime in!
>> >> Yes, @M3noincremental seems to make things consistently a tad bit >> slower (but a very small difference), on both FreeBSD and Linux. >> @M3nogc makes a bigger difference, of course. >> >> Unfortunately I seem to have lost the code that did a lot of memory >> allocations. My tricks (as described in the email---and others!) >> have removed most of the troublesome memory allocations, but now >> I'm stuck with the mutex instead... >> >> Mika >> >> Tony Hosking writes: >>>Have you tried running @M3noincremental? >>> >>>On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>> >>>> Hello Modula-3 people, >>>> >>>> As I mentioned in an earlier email about printing structures (thanks >>>> Darko), I'm in the midst of coding an interpreter embedded in >>>> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's >>>> JScheme for Java (well it was at first strongly based, but more and >>>> more loosely, if you know what I mean...) >>>> >>>> I expected that the performance of the interpreter would be much >>>> better in Modula-3 than in Java, and I have been testing on two >>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>> and the other is CM3 on a recent Debian system. What I am finding >>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting >>> close to ten times as fast on some tasks at this point), but on >>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>> >>>> When I started, with code that was essentially equivalent to JScheme, >>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>> spend most of its time in (surprise, surprise!) memory allocation >>>> and garbage collection. 
The speedup I have achieved between the >>>> first implementation and now was due to the use of Modula-3 constructs >>>> that are superior to Java's, such as the use of arrays of RECORDs >>>> to make small stacks rather than linked lists. (I get readable >>>> code with much fewer memory allocations and GC work.) >>>> >>>> Now, since this is an interpreter, I as the implementer have limited >>>> control over how much memory is allocated and freed, and where it is >>>> needed. However, I can sometimes fall back on C-style memory >>>> management, >>>> but I would like to do it in a safe way. For instance, I have >>>> special-cased >>>> evaluation of Scheme primitives, as follows. >>>> >>>> Under the "normal" implementation, a list of things to evaluate is >>>> built up, passed to an evaluation function, and then the GC is left >>>> to sweep up the mess. The problem is that there are various tricky >>>> routes by which references can escape the evaluator, so you can't >>>> just assume that what you put in is going to be dead right after >>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>> it's unclear (in which case the problem is left up to the GC). >>>> >>>> For the vast majority of Scheme primitives, one can indeed free the >>>> list right after the eval. 
Now of course I am not interested >>>> in unsafe code, so what I do is this: >>>> >>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>> >>>> VAR >>>> mu := NEW(MUTEX); >>>> free : Pair := NIL; >>>> >>>> PROCEDURE GetPair() : Pair = >>>> BEGIN >>>> LOCK mu DO >>>> IF free # NIL THEN >>>> TRY >>>> RETURN free >>>> FINALLY >>>> free := free.rest >>>> END >>>> END >>>> END; >>>> RETURN NEW(Pair) >>>> END GetPair; >>>> >>>> PROCEDURE ReturnPair(cons : Pair) = >>>> BEGIN >>>> cons.first := NIL; >>>> LOCK mu DO >>>> cons.rest := free; >>>> free := cons >>>> END >>>> END ReturnPair; >>>> >>>> my eval code looks like >>>> >>>> VAR okToFree : BOOLEAN; BEGIN >>>> >>>> args := GetPair(); ... >>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>> >>>> IF okToFree THEN ReturnPair(args) END; >>>> RETURN result >>>> END >>>> >>>> and this does work well. In fact it speeds up the Linux >>>> implementation >>>> by almost 100% to recycle the lists like this *just* for the >>>> evaluation of Scheme primitives. >>>> >>>> But it's still ugly, isn't it? There's a mutex, and a global >>>> variable. And yes, the time spent messing with the mutex is >>>> noticeable, and I haven't even made the code multi-threaded yet >>>> (and that is coming!) >>>> >>>> So I'm thinking, what I really want is a structure that is attached >>>> to my current Thread.T. I want to be able to access just a single >>>> pointer (like the free list) but be sure it is unique to my current >>>> thread. No locking would be necessary if I could do this. >>>> >>>> Does anyone have an elegant solution that does something like this? >>>> Thread-specific "static" variables? Just one REFANY would be enough >>>> for a lot of uses... seems to me this should be a frequently >>>> occurring problem? 
>>>> >>>> Best regards, >>>> Mika >>>> >>>> >>>> >>>> >>>> >>>> From hosking at cs.purdue.edu Fri Oct 17 08:35:03 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Fri, 17 Oct 2008 07:35:03 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu> References: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu> Message-ID: <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> I suspect part of the overhead of allocation in the new code is the need for thread-local allocation buffers, which means we need to access thread-local state. We really need an efficient way to do that, but pthreads thread-local accesses may be what is killing you. On 17 Oct 2008, at 00:30, Mika Nystrom wrote: > Hi Tony, > > I figured you would chime in! > > Yes, @M3noincremental seems to make things consistently a tad bit > slower (but a very small difference), on both FreeBSD and Linux. > @M3nogc makes a bigger difference, of course. > > Unfortunately I seem to have lost the code that did a lot of memory > allocations. My tricks (as described in the email---and others!) > have removed most of the troublesome memory allocations, but now > I'm stuck with the mutex instead... > > Mika > > Tony Hosking writes: >> Have you tried running @M3noincremental? >> >> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >> >>> Hello Modula-3 people, >>> >>> As I mentioned in an earlier email about printing structures (thanks >>> Darko), I'm in the midst of coding an interpreter embedded in >>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>> Norvig's >>> JScheme for Java (well it was at first strongly based, but more and >>> more loosely, if you know what I mean...) >>> >>> I expected that the performance of the interpreter would be much >>> better in Modula-3 than in Java, and I have been testing on two >>> different systems. 
One is my ancient FreeBSD-4.11 with an old PM3, >>> and the other is CM3 on a recent Debian system. What I am finding >>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>> (getting >>> close to ten times as fast on some tasks at this point), but on >>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>> >>> When I started, with code that was essentially equivalent to >>> JScheme, >>> I found that it was a bit slower than JScheme on Linux/CM3 and >>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>> spend most of its time in (surprise, surprise!) memory allocation >>> and garbage collection. The speedup I have achieved between the >>> first implementation and now was due to the use of Modula-3 >>> constructs >>> that are superior to Java's, such as the use of arrays of RECORDs >>> to make small stacks rather than linked lists. (I get readable >>> code with much fewer memory allocations and GC work.) >>> >>> Now, since this is an interpreter, I as the implementer have limited >>> control over how much memory is allocated and freed, and where it is >>> needed. However, I can sometimes fall back on C-style memory >>> management, >>> but I would like to do it in a safe way. For instance, I have >>> special-cased >>> evaluation of Scheme primitives, as follows. >>> >>> Under the "normal" implementation, a list of things to evaluate is >>> built up, passed to an evaluation function, and then the GC is left >>> to sweep up the mess. The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>> just assume that what you put in is going to be dead right after >>> an eval and free it. Instead, I set a flag in the evaluator, which >>> is TRUE if it is OK to free the list after the eval and FALSE if >>> it's unclear (in which case the problem is left up to the GC). >>> >>> For the vast majority of Scheme primitives, one can indeed free the >>> list right after the eval. 
Now of course I am not interested >>> in unsafe code, so what I do is this: >>> >>> TYPE Pair = OBJECT first, rest : REFANY; END; >>> >>> VAR >>> mu := NEW(MUTEX); >>> free : Pair := NIL; >>> >>> PROCEDURE GetPair() : Pair = >>> BEGIN >>> LOCK mu DO >>> IF free # NIL THEN >>> TRY >>> RETURN free >>> FINALLY >>> free := free.rest >>> END >>> END >>> END; >>> RETURN NEW(Pair) >>> END GetPair; >>> >>> PROCEDURE ReturnPair(cons : Pair) = >>> BEGIN >>> cons.first := NIL; >>> LOCK mu DO >>> cons.rest := free; >>> free := cons >>> END >>> END ReturnPair; >>> >>> my eval code looks like >>> >>> VAR okToFree : BOOLEAN; BEGIN >>> >>> args := GetPair(); ... >>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>> >>> IF okToFree THEN ReturnPair(args) END; >>> RETURN result >>> END >>> >>> and this does work well. In fact it speeds up the Linux >>> implementation >>> by almost 100% to recycle the lists like this *just* for the >>> evaluation of Scheme primitives. >>> >>> But it's still ugly, isn't it? There's a mutex, and a global >>> variable. And yes, the time spent messing with the mutex is >>> noticeable, and I haven't even made the code multi-threaded yet >>> (and that is coming!) >>> >>> So I'm thinking, what I really want is a structure that is attached >>> to my current Thread.T. I want to be able to access just a single >>> pointer (like the free list) but be sure it is unique to my current >>> thread. No locking would be necessary if I could do this. >>> >>> Does anyone have an elegant solution that does something like this? >>> Thread-specific "static" variables? Just one REFANY would be enough >>> for a lot of uses... seems to me this should be a frequently >>> occurring problem? >>> >>> Best regards, >>> Mika >>> >>> >>> >>> >>> >>> From mika at async.caltech.edu Fri Oct 17 08:50:13 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 16 Oct 2008 23:50:13 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? 
In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000."
Message-ID: <200810170650.m9H6oDU0078549@camembert.async.caltech.edu>

Jay writes:
...
>How do you manage okToFree?
...

I forgot to answer this question.

Well, the primitive evaluation in the interpreter is just a big CASE statement. I really just look at where it references the list I am making, and if a branch references the list at all, I insert the code "okToFree := FALSE" in that branch. The first two parameters are passed in separately.

Here's the code... since you ask! This is the code for the special case of a two-argument Scheme procedure call, such as (+ x 1).

  PROCEDURE Apply2(t : T; interp : Scheme.T; a1, a2 : Object) : Object =
    VAR
      d1, d2 := GetCons();   (* each variable gets its own cell *)
      free := TRUE;
    BEGIN
      d1.first := a1; d1.rest := d2;
      d2.first := a2; d2.rest := NIL;
      WITH res = Prims(t, interp, d1, a1, a2, free) DO
        IF free THEN
          ReturnCons(d1); ReturnCons(d2)
        END;
        RETURN res
      END
    END Apply2;

  PROCEDURE Prims(t : T;
                  interp : Scheme.T;
                  args, x, y : Object;
                  VAR free : BOOLEAN) : Object =
    (* The (hopefully temporary) list of arguments is args.
       x and y are the first two elements of args *)
    BEGIN
      CASE VAL(t.idNumber,P) OF
        P.Eq   => RETURN NumCompare(args, '=')   (* known not to let args escape *)
      | P.List => free := FALSE; RETURN args     (* args escapes, don't know whither *)
      | P.Car  => RETURN PedanticFirst(x)        (* doesn't even use args *)
      (* and about another 100 cases follow here *)
      END
    END Prims;

    Mika

From mika at async.caltech.edu Fri Oct 17 10:03:18 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 01:03:18 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST." <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu>
Message-ID: <200810170803.m9H83IIC080081@camembert.async.caltech.edu>

Ok this suggests that using thread local state to get around the problem won't help either.

Can I ask a question... I am looking at ThreadPThread.m3...
Why do you have to lock the slotMu in Self()?

  PROCEDURE Self (): T =
    (* If not the initial thread and not created by Fork, returns NIL *)
    (* LL = 0 *)
    VAR
      me := GetActivation();
      t: T;
    BEGIN
      IF me = NIL THEN RETURN NIL END;
      WITH r = Upthread.mutex_lock(slotMu) DO <*ASSERT r=0*> END;
      t := slots[me.slot];
      WITH r = Upthread.mutex_unlock(slotMu) DO <*ASSERT r=0*> END;
      IF (t.act # me) THEN Die(ThisLine(), "thread with bad slot!") END;
      RETURN t;
    END Self;

Is it just because of AssignSlots? If so.. it's actually a very rare event that there would ever be a conflict, no? (Only when "slots" is extended?)

Can data be stored in an "Activation"? Not TRACED data, obviously, hmm...

    Mika

Tony Hosking writes:
>I suspect part of the overhead of allocation in the new code is the
>need for thread-local allocation buffers, which means we need to
>access thread-local state. We really need an efficient way to do
>that, but pthreads thread-local accesses may be what is killing you.
>
>On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>
>> Hi Tony,
>>
>> I figured you would chime in!
>>
>> Yes, @M3noincremental seems to make things consistently a tad bit
>> slower (but a very small difference), on both FreeBSD and Linux.
>> @M3nogc makes a bigger difference, of course.
>>
>> Unfortunately I seem to have lost the code that did a lot of memory
>> allocations. My tricks (as described in the email---and others!)
>> have removed most of the troublesome memory allocations, but now
>> I'm stuck with the mutex instead...
>>
>> Mika
>>
>> Tony Hosking writes:
>>> Have you tried running @M3noincremental?
>>>
>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>
>>>> Hello Modula-3 people,
>>>>
>>>> As I mentioned in an earlier email about printing structures (thanks
>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>> Modula-3.
It's a Scheme interpreter, loosely based on Peter >>>> Norvig's >>>> JScheme for Java (well it was at first strongly based, but more and >>>> more loosely, if you know what I mean...) >>>> >>>> I expected that the performance of the interpreter would be much >>>> better in Modula-3 than in Java, and I have been testing on two >>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>> and the other is CM3 on a recent Debian system. What I am finding >>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>> (getting >>>> close to ten times as fast on some tasks at this point), but on >>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>> >>>> When I started, with code that was essentially equivalent to >>>> JScheme, >>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>> spend most of its time in (surprise, surprise!) memory allocation >>>> and garbage collection. The speedup I have achieved between the >>>> first implementation and now was due to the use of Modula-3 >>>> constructs >>>> that are superior to Java's, such as the use of arrays of RECORDs >>>> to make small stacks rather than linked lists. (I get readable >>>> code with much fewer memory allocations and GC work.) >>>> >>>> Now, since this is an interpreter, I as the implementer have limited >>>> control over how much memory is allocated and freed, and where it is >>>> needed. However, I can sometimes fall back on C-style memory >>>> management, >>>> but I would like to do it in a safe way. For instance, I have >>>> special-cased >>>> evaluation of Scheme primitives, as follows. >>>> >>>> Under the "normal" implementation, a list of things to evaluate is >>>> built up, passed to an evaluation function, and then the GC is left >>>> to sweep up the mess. 
The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>>> just assume that what you put in is going to be dead right after >>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>> it's unclear (in which case the problem is left up to the GC). >>>> >>>> For the vast majority of Scheme primitives, one can indeed free the >>>> list right after the eval. Now of course I am not interested >>>> in unsafe code, so what I do is this: >>>> >>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>> >>>> VAR >>>> mu := NEW(MUTEX); >>>> free : Pair := NIL; >>>> >>>> PROCEDURE GetPair() : Pair = >>>> BEGIN >>>> LOCK mu DO >>>> IF free # NIL THEN >>>> TRY >>>> RETURN free >>>> FINALLY >>>> free := free.rest >>>> END >>>> END >>>> END; >>>> RETURN NEW(Pair) >>>> END GetPair; >>>> >>>> PROCEDURE ReturnPair(cons : Pair) = >>>> BEGIN >>>> cons.first := NIL; >>>> LOCK mu DO >>>> cons.rest := free; >>>> free := cons >>>> END >>>> END ReturnPair; >>>> >>>> my eval code looks like >>>> >>>> VAR okToFree : BOOLEAN; BEGIN >>>> >>>> args := GetPair(); ... >>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>> >>>> IF okToFree THEN ReturnPair(args) END; >>>> RETURN result >>>> END >>>> >>>> and this does work well. In fact it speeds up the Linux >>>> implementation >>>> by almost 100% to recycle the lists like this *just* for the >>>> evaluation of Scheme primitives. >>>> >>>> But it's still ugly, isn't it? There's a mutex, and a global >>>> variable. And yes, the time spent messing with the mutex is >>>> noticeable, and I haven't even made the code multi-threaded yet >>>> (and that is coming!) >>>> >>>> So I'm thinking, what I really want is a structure that is attached >>>> to my current Thread.T. I want to be able to access just a single >>>> pointer (like the free list) but be sure it is unique to my current >>>> thread. 
No locking would be necessary if I could do this.
>>>>
>>>> Does anyone have an elegant solution that does something like this?
>>>> Thread-specific "static" variables? Just one REFANY would be enough
>>>> for a lot of uses... seems to me this should be a frequently
>>>> occurring problem?
>>>>
>>>> Best regards,
>>>> Mika
>>>>

From mika at async.caltech.edu Fri Oct 17 10:32:28 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 01:32:28 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST." <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu>
Message-ID: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>

OK, I am sorry I am slow to pick up on this.

I take it the problem is actually the Upthread.getspecific routine, which itself calls something like get_curthread somewhere inside pthreads, which in turn involves a context switch to the supervisor---the identity of the current thread is just not accessible anywhere in user space. That also explains why this program runs faster with my old PM3, which uses longjmp threads.

The only way to avoid it (really) is to pass a pointer to the Thread.T of the currently executing thread in the activation record of *every* procedure, so that allocators can find it when necessary.... but that is very expensive in terms of stack memory.

Or I can just make a structure like that and pass it around where I need it in my own program. Thread-specific and user-managed.

I believe I have just answered all my own questions, but I hope Tony will correct me if my answers are incorrect.

    Mika

Tony Hosking writes:
>I suspect part of the overhead of allocation in the new code is the
>need for thread-local allocation buffers, which means we need to
>access thread-local state. We really need an efficient way to do
>that, but pthreads thread-local accesses may be what is killing you.
> >On 17 Oct 2008, at 00:30, Mika Nystrom wrote: > >> Hi Tony, >> >> I figured you would chime in! >> >> Yes, @M3noincremental seems to make things consistently a tad bit >> slower (but a very small difference), on both FreeBSD and Linux. >> @M3nogc makes a bigger difference, of course. >> >> Unfortunately I seem to have lost the code that did a lot of memory >> allocations. My tricks (as described in the email---and others!) >> have removed most of the troublesome memory allocations, but now >> I'm stuck with the mutex instead... >> >> Mika >> >> Tony Hosking writes: >>> Have you tried running @M3noincremental? >>> >>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>> >>>> Hello Modula-3 people, >>>> >>>> As I mentioned in an earlier email about printing structures (thanks >>>> Darko), I'm in the midst of coding an interpreter embedded in >>>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>>> Norvig's >>>> JScheme for Java (well it was at first strongly based, but more and >>>> more loosely, if you know what I mean...) >>>> >>>> I expected that the performance of the interpreter would be much >>>> better in Modula-3 than in Java, and I have been testing on two >>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>> and the other is CM3 on a recent Debian system. What I am finding >>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>> (getting >>>> close to ten times as fast on some tasks at this point), but on >>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>> >>>> When I started, with code that was essentially equivalent to >>>> JScheme, >>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>> spend most of its time in (surprise, surprise!) memory allocation >>>> and garbage collection. 
The speedup I have achieved between the >>>> first implementation and now was due to the use of Modula-3 >>>> constructs >>>> that are superior to Java's, such as the use of arrays of RECORDs >>>> to make small stacks rather than linked lists. (I get readable >>>> code with much fewer memory allocations and GC work.) >>>> >>>> Now, since this is an interpreter, I as the implementer have limited >>>> control over how much memory is allocated and freed, and where it is >>>> needed. However, I can sometimes fall back on C-style memory >>>> management, >>>> but I would like to do it in a safe way. For instance, I have >>>> special-cased >>>> evaluation of Scheme primitives, as follows. >>>> >>>> Under the "normal" implementation, a list of things to evaluate is >>>> built up, passed to an evaluation function, and then the GC is left >>>> to sweep up the mess. The problem is that there are various tricky >>> routes by which references can escape the evaluator, so you can't >>>> just assume that what you put in is going to be dead right after >>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>> it's unclear (in which case the problem is left up to the GC). >>>> >>>> For the vast majority of Scheme primitives, one can indeed free the >>>> list right after the eval. 
Now of course I am not interested >>>> in unsafe code, so what I do is this: >>>> >>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>> >>>> VAR >>>> mu := NEW(MUTEX); >>>> free : Pair := NIL; >>>> >>>> PROCEDURE GetPair() : Pair = >>>> BEGIN >>>> LOCK mu DO >>>> IF free # NIL THEN >>>> TRY >>>> RETURN free >>>> FINALLY >>>> free := free.rest >>>> END >>>> END >>>> END; >>>> RETURN NEW(Pair) >>>> END GetPair; >>>> >>>> PROCEDURE ReturnPair(cons : Pair) = >>>> BEGIN >>>> cons.first := NIL; >>>> LOCK mu DO >>>> cons.rest := free; >>>> free := cons >>>> END >>>> END ReturnPair; >>>> >>>> my eval code looks like >>>> >>>> VAR okToFree : BOOLEAN; BEGIN >>>> >>>> args := GetPair(); ... >>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>> >>>> IF okToFree THEN ReturnPair(args) END; >>>> RETURN result >>>> END >>>> >>>> and this does work well. In fact it speeds up the Linux >>>> implementation >>>> by almost 100% to recycle the lists like this *just* for the >>>> evaluation of Scheme primitives. >>>> >>>> But it's still ugly, isn't it? There's a mutex, and a global >>>> variable. And yes, the time spent messing with the mutex is >>>> noticeable, and I haven't even made the code multi-threaded yet >>>> (and that is coming!) >>>> >>>> So I'm thinking, what I really want is a structure that is attached >>>> to my current Thread.T. I want to be able to access just a single >>>> pointer (like the free list) but be sure it is unique to my current >>>> thread. No locking would be necessary if I could do this. >>>> >>>> Does anyone have an elegant solution that does something like this? >>>> Thread-specific "static" variables? Just one REFANY would be enough >>>> for a lot of uses... seems to me this should be a frequently >>>> occurring problem? 
>>>>
>>>> Best regards,
>>>> Mika
>>>>

From jay.krell at cornell.edu Sat Oct 18 00:42:35 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 22:42:35 +0000
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
References: Your message of <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
Message-ID:

Right and wrong.

Right that Tony was referring to Upthread.getspecific, or on Windows WinBase.TlsGetValue.

Wrong that this necessarily incurs a switch to the supervisor/kernel, and perhaps wrong to call that a "context switch". It depends on the operating system.

I will explain.

On Windows/x86, the FS register points to a partly documented per-thread data structure. C and C++ exception handling use FS:0. Disassemble any code. You'll find it is used. Not by Modula-3 though.

Disassemble TlsGetValue.

cdb /z %windir%\system32\kernel32.dll

0:000> uf kernel32!TlsGetValue
kernel32!TlsGetValue:

typical looking prolog..
    7dd813e0 8bff            mov     edi,edi
    7dd813e2 55              push    ebp
    7dd813e3 8bec            mov     ebp,esp

fs:18 contains a "normal" "linear" pointer to fs:0
Get that pointer.
    7dd813e5 64a118000000    mov     eax,dword ptr fs:[00000018h]

get the index
    7dd813eb 8b4d08          mov     ecx,dword ptr [ebp+8]

SetLastError(0)
    7dd813ee 83603400        and     dword ptr [eax+34h],0

There are 64 preallocated thread local slots -- compare the index to 64.
    7dd813f2 83f940          cmp     ecx,40h

If it is above or equal to 64, go use the non preallocated slots.
7dd813f5 0f8353e20200    jae     kernel32!lstrcmpi+0x4b22 (7ddaf64e)

Preallocated slots are at fs:e10; get the data and done.

kernel32!TlsGetValue+0x1b:
7dd813fb 8b8488100e0000  mov     eax,dword ptr [eax+ecx*4+0E10h]

Epilog.

kernel32!TlsGetValue+0x22:
7dd81402 5d              pop     ebp
7dd81403 c20400          ret     4

Get here for indices >= 64.
Compare index to 1088 == 1024 + 64 (0x440), as there are another 1024 more slowly available slots.

kernel32!lstrcmpi+0x4b22:
7ddaf64e 81f940040000    cmp     ecx,440h

If it is below 1088, go use those slots.

7ddaf654 7211            jb      kernel32!lstrcmpi+0x4b3b (7ddaf667)

Index is above or equal to 1088: SetLastError(invalid parameter)

kernel32!lstrcmpi+0x4b2a:
7ddaf656 680d0000c0      push    0C000000Dh
7ddaf65b e80025fdff      call    kernel32!GetProcessHeap+0x12 (7dd81b60)

and return 0 -- 0 is not unambiguously an error -- that's why last error was cleared at the start.

kernel32!lstrcmpi+0x4b34:
7ddaf660 33c0            xor     eax,eax
7ddaf662 e99b1dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)

This is where the slots between 64 and 1088 are used.
Get pointer from FS:F94 and compare to null.
If it is null, that is ok; it means nobody has yet called TlsSetValue for this slot, so it just retains its initial 0 value.

kernel32!lstrcmpi+0x4b3b:
7ddaf667 8b80940f0000    mov     eax,dword ptr [eax+0F94h]
7ddaf66d 85c0            test    eax,eax
7ddaf66f 74ef            je      kernel32!lstrcmpi+0x4b34 (7ddaf660)

Index is between 64 and 1088, and there is a non-null pointer at FS:F94.
Subtract 64 from the index and index into the pointer there.
Note it does the subtraction after the multiplication, so it subtracts 64*4=0x100.

kernel32!lstrcmpi+0x4b45:
7ddaf671 8b848800ffffff  mov     eax,dword ptr [eax+ecx*4-100h]
7ddaf678 e9851dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)

So, it is a few instructions but there is no context switch into the kernel/supervisor.

Also, calls into the kernel aren't necessarily a "context switch".
Some context is saved, and a bit is twiddled in the processor to indicate a privilege level change, but no page tables are altered and I believe no TLB (translation lookaside buffer) entries are invalidated, and no thread scheduling decisions are made -- though upon exit from the kernel, APCs (asynchronous procedure calls) can be run -- on the calling thread.

A more expensive context switch is when another thread or another process runs.
Switching threads requires saving more context, and switching processes requires changing the register that points to the page tables.

One detail there -- calling into the x86 NT kernel does not preserve floating point state -- that's the additional state that a thread switch has to save, at least. NT/x86 kernel drivers aren't allowed to use floating point, with some exceptions, like if they are video drivers (only certain functions?) or they explicitly save/restore the floating point registers using public functions. I don't know about the other architectures. I think IA64 only preserves some floating point state, not all.

Now, the question then is how Upthread.getspecific is implemented on other architectures and operating systems.
We should look into that for various operating systems.

Oh, also, let's see what __declspec(thread) does.

>type t.c
__declspec(thread) int a;
void F1(int);
void F2() { F1(a); }

cl -c t.c
link -dump -disasm t.obj

Dump of file t.obj
File Type: COFF OBJECT
_F2:
  00000000: 55                 push        ebp
  00000001: 8B EC              mov         ebp,esp
  00000003: A1 00 00 00 00     mov         eax,dword ptr [__tls_index]
  00000008: 64 8B 0D 00 00 00  mov         ecx,dword ptr fs:[__tls_array]
            00
  0000000F: 8B 14 81           mov         edx,dword ptr [ecx+eax*4]
  00000012: 8B 82 00 00 00 00  mov         eax,dword ptr _a[edx]
  00000018: 50                 push        eax
  00000019: E8 00 00 00 00     call        _F1
  0000001E: 83 C4 04           add         esp,4
  00000021: 5D                 pop         ebp
  00000022: C3                 ret

See the compiler generated code reference FS directly.
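On the POSIX side of that question, here is a minimal sketch in C of the per-thread free list Mika asked for, built on pthread_getspecific. It is an illustration only, not what m3core or Upthread actually does: the names Pair, get_pair, return_pair and free_key are made up for the example, and error checking is omitted.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the Modula-3 Pair object in Mika's mail. */
typedef struct Pair {
    void *first;
    struct Pair *rest;
} Pair;

static pthread_key_t free_key;                       /* one key per process */
static pthread_once_t free_once = PTHREAD_ONCE_INIT; /* guards key creation */

static void make_key(void) {
    /* No destructor is registered, so cells still on a thread's list
       when that thread exits are simply leaked in this sketch. */
    pthread_key_create(&free_key, NULL);
}

/* Pop a cell from the calling thread's free list, or allocate a new one. */
Pair *get_pair(void) {
    pthread_once(&free_once, make_key);
    Pair *p = pthread_getspecific(free_key);
    if (p != NULL) {
        pthread_setspecific(free_key, p->rest);
        p->rest = NULL;
        return p;
    }
    return calloc(1, sizeof *p);
}

/* Push a cell back onto the calling thread's free list. */
void return_pair(Pair *p) {
    pthread_once(&free_once, make_key);
    p->first = NULL;
    p->rest = pthread_getspecific(free_key);
    pthread_setspecific(free_key, p);
}
```

Each thread sees only its own list head, so no mutex is needed around get_pair and return_pair. Whether pthread_getspecific itself is cheap depends on the pthreads implementation, which is what would need checking per platform. Returning to t.c: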
The optimized version is: Dump of file t.obj File Type: COFF OBJECT _F2: 00000000: A1 00 00 00 00 mov eax,dword ptr [__tls_index] 00000005: 64 8B 0D 00 00 00 mov ecx,dword ptr fs:[__tls_array] 00 0000000C: 8B 14 81 mov edx,dword ptr [ecx+eax*4] 0000000F: 8B 82 00 00 00 00 mov eax,dword ptr _a[edx] 00000015: 50 push eax 00000016: E8 00 00 00 00 call _F1 0000001B: 59 pop ecx 0000001C: C3 ret - Jay > To: hosking at cs.purdue.edu > Date: Fri, 17 Oct 2008 01:32:28 -0700 > From: mika at async.caltech.edu > CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu > Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? > > Ok I am sorry I am slow to pick up on this. > > I take it the problem is actually the Upthread.getspecific routine, > which itself calls something get_curthread somewhere inside pthreads, > which in turn involves a context switch to the supervisor---the identity > of the current thread is just not accessible anywhere in user space. > Also explains why this program runs faster with my old PM3, which uses > longjmp threads. > > The only way to avoid it (really) is to pass a pointer to the > Thread.T of the currently executing thread in the activation record > of *every* procedure, so that allocators can find it when necessary.... > but that is very expensive in terms of stack memory. > > Or I can just make a structure like that that I pass around where > I need it in my own program. Thread-specific and user-managed. > > I believe I have just answered all my own questions, but I hope > Tony will correct me if my answers are incorrect. > > Mika > > Tony Hosking writes: >>I suspect part of the overhead of allocation in the new code is the >>need for thread-local allocation buffers, which means we need to >>access thread-local state. We really need an efficient way to do >>that, but pthreads thread-local accesses may be what is killing you. 
>> >>On 17 Oct 2008, at 00:30, Mika Nystrom wrote: >> >>> Hi Tony, >>> >>> I figured you would chime in! >>> >>> Yes, @M3noincremental seems to make things consistently a tad bit >>> slower (but a very small difference), on both FreeBSD and Linux. >>> @M3nogc makes a bigger difference, of course. >>> >>> Unfortunately I seem to have lost the code that did a lot of memory >>> allocations. My tricks (as described in the email---and others!) >>> have removed most of the troublesome memory allocations, but now >>> I'm stuck with the mutex instead... >>> >>> Mika >>> >>> Tony Hosking writes: >>>> Have you tried running @M3noincremental? >>>> >>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>>> >>>>> Hello Modula-3 people, >>>>> >>>>> As I mentioned in an earlier email about printing structures (thanks >>>>> Darko), I'm in the midst of coding an interpreter embedded in >>>>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>>>> Norvig's >>>>> JScheme for Java (well it was at first strongly based, but more and >>>>> more loosely, if you know what I mean...) >>>>> >>>>> I expected that the performance of the interpreter would be much >>>>> better in Modula-3 than in Java, and I have been testing on two >>>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3, >>>>> and the other is CM3 on a recent Debian system. What I am finding >>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>>> (getting >>>>> close to ten times as fast on some tasks at this point), but on >>>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>>> >>>>> When I started, with code that was essentially equivalent to >>>>> JScheme, >>>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>>> spend most of its time in (surprise, surprise!) memory allocation >>>>> and garbage collection. 
The speedup I have achieved between the >>>>> first implementation and now was due to the use of Modula-3 >>>>> constructs >>>>> that are superior to Java's, such as the use of arrays of RECORDs >>>>> to make small stacks rather than linked lists. (I get readable >>>>> code with much fewer memory allocations and GC work.) >>>>> >>>>> Now, since this is an interpreter, I as the implementer have limited >>>>> control over how much memory is allocated and freed, and where it is >>>>> needed. However, I can sometimes fall back on C-style memory >>>>> management, >>>>> but I would like to do it in a safe way. For instance, I have >>>>> special-cased >>>>> evaluation of Scheme primitives, as follows. >>>>> >>>>> Under the "normal" implementation, a list of things to evaluate is >>>>> built up, passed to an evaluation function, and then the GC is left >>>>> to sweep up the mess. The problem is that there are various tricky >>>> routes by which references can escape the evaluator, so you can't >>>>> just assume that what you put in is going to be dead right after >>>>> an eval and free it. Instead, I set a flag in the evaluator, which >>>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>>> it's unclear (in which case the problem is left up to the GC). >>>>> >>>>> For the vast majority of Scheme primitives, one can indeed free the >>>>> list right after the eval. 
Now of course I am not interested >>>>> in unsafe code, so what I do is this: >>>>> >>>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>>> >>>>> VAR >>>>> mu := NEW(MUTEX); >>>>> free : Pair := NIL; >>>>> >>>>> PROCEDURE GetPair() : Pair = >>>>> BEGIN >>>>> LOCK mu DO >>>>> IF free # NIL THEN >>>>> TRY >>>>> RETURN free >>>>> FINALLY >>>>> free := free.rest >>>>> END >>>>> END >>>>> END; >>>>> RETURN NEW(Pair) >>>>> END GetPair; >>>>> >>>>> PROCEDURE ReturnPair(cons : Pair) = >>>>> BEGIN >>>>> cons.first := NIL; >>>>> LOCK mu DO >>>>> cons.rest := free; >>>>> free := cons >>>>> END >>>>> END ReturnPair; >>>>> >>>>> my eval code looks like >>>>> >>>>> VAR okToFree : BOOLEAN; BEGIN >>>>> >>>>> args := GetPair(); ... >>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>>> >>>>> IF okToFree THEN ReturnPair(args) END; >>>>> RETURN result >>>>> END >>>>> >>>>> and this does work well. In fact it speeds up the Linux >>>>> implementation >>>>> by almost 100% to recycle the lists like this *just* for the >>>>> evaluation of Scheme primitives. >>>>> >>>>> But it's still ugly, isn't it? There's a mutex, and a global >>>>> variable. And yes, the time spent messing with the mutex is >>>>> noticeable, and I haven't even made the code multi-threaded yet >>>>> (and that is coming!) >>>>> >>>>> So I'm thinking, what I really want is a structure that is attached >>>>> to my current Thread.T. I want to be able to access just a single >>>>> pointer (like the free list) but be sure it is unique to my current >>>>> thread. No locking would be necessary if I could do this. >>>>> >>>>> Does anyone have an elegant solution that does something like this? >>>>> Thread-specific "static" variables? Just one REFANY would be enough >>>>> for a lot of uses... seems to me this should be a frequently >>>>> occurring problem? 
>>>>> >>>>> Best regards, >>>>> Mika >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> From mika at async.caltech.edu Sat Oct 18 01:00:28 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Fri, 17 Oct 2008 16:00:28 -0700 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: Your message of "Fri, 17 Oct 2008 22:42:35 -0000." Message-ID: <200810172300.m9HN0SfN008554@camembert.async.caltech.edu> No, I didn't mean that it *necessarily* involves a context switch. Obviously it doesn't, because the user-level threading doesn't ever need to do a "kernel" context switch (but of course does its own switching, however I don't see that it would need that to get or set a variable). I just meant that looking at the (C) implementation of pthreads I have (on FreeBSD), on that system, it does seem to, as the code in question is marked as "kernel code". In any case I think I have been able to solve my particular problem by identifying a data structure that is inherently only accessed from a single thread (in my program) and attaching my memory recycling trickery to that particular structure. I get very little memory allocation/GC and no need for locks at all, which is precisely the effect I was going for. I am still a little bit concerned about the performance of CM3-generated code but the main culprit appears to be TYPECASE/ISTYPE now, far from garbage collectors and thread libraries. I'll send an update if I can find something egregiously inefficient. Mika Jay writes: > >Right and wrong. > >Right Tony was referring to Upthread.getspecific. Or on Windows WinBase.TlsGet >Value. >Wrong that this necessarily incurs a switch to the supervisor/kernel, and perh >aps wrong to call that at a "context switch". It depends on the operating syst >em. > >I will explain. > >On Windows/x86, the FS register points to a partly documented per-thread data >structure. >C and C++ exception handling use FS:0. >Disassemble any code. You'll find it is used. 
Not by Modula-3 though. > >Disassemble TlsGetValue. > > cdb /z %windir%\system32\kernel32.dll > >0:000> uf kernel32!TlsGetValue >kernel32!TlsGetValue: ... From mika at async.caltech.edu Sat Oct 18 10:41:30 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Sat, 18 Oct 2008 01:41:30 -0700 Subject: [M3devel] Fortran Message-ID: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu> Ok now in the realm of crazy questions---and I apologize to those whose inboxes I clog with some of my emails... If there is anyone out there in Modula-3-ether who has ever written or heard of ... an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs ... would he please make himself known to me? (I have a Scheme interpreter to trade...) Mika From lemming at henning-thielemann.de Sat Oct 18 17:34:50 2008 From: lemming at henning-thielemann.de (Henning Thielemann) Date: Sat, 18 Oct 2008 17:34:50 +0200 (MEST) Subject: [M3devel] Fortran In-Reply-To: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu> References: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu> Message-ID: On Sat, 18 Oct 2008, Mika Nystrom wrote: > Ok now in the realm of crazy questions---and I apologize to those > whose inboxes I clog with some of my emails... > > If there is anyone out there in Modula-3-ether who has ever written > or heard of ... > > an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs > > ... would he please make himself known to me? (I have a Scheme > interpreter to trade...) I have written a program for generating Modula-3 interfaces for LAPACK (linear algebra routines) using m3coco. But I'm afraid that my Fortran parser works only for LAPACK and no other library. I have just copied the CVS files to http://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/m3/pm3/language/parsing/m3coco/test/?cvsroot=PM3 Before you check this out, I might move it to a different location, maybe cm3/m3-tools, if this is more appropriate. 
(Maybe you also need the revised m3coco version, which I only have on a branch, and never tried to merge it back to HEAD.) While searching my own code in the net, I found some nice interviews with Luca Cardelli: http://www.wikio.com/technology/development/modula-3 From mika at async.caltech.edu Tue Oct 21 13:05:01 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Tue, 21 Oct 2008 04:05:01 -0700 Subject: [M3devel] CM3 on Mac OS X Tiger Message-ID: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu> Hello everyone, Sorry if I have asked this before---I feel I must have, and Tony probably answered it, too, but I can't find it anywhere in my email archives. It looks like I finally upgraded my Mac to Tiger a half year ago, and everything broke. (Modula-3, emacs, make, etc etc etc etc.) I am finally getting around to fixing it. Now I am trying to compile CM3 in accordance with Tony's instructions as of June 24, 2007: (short quote here) > cd ~/cm3-cvs > mkdir boot > cd boot > tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz > ./cminstall Now you will have some kind of cm3 installed, presumably in /usr/ local/cm3/bin/cm3. Make sure you have a fresh CVS checkout in directory cm3 (let's assume this is in your home directory ~/cm3). Also, make sure you have an up-to-date version of the CM3 backend compiler cm3cg installed by executing the following: STEP 0: export CM3=/usr/local/cm3/bin/cm3 cd ~/cm3/m3-sys/m3cc $CM3 $CM3 -ship You can skip this last step if you know your backend compiler is up to date. 
Now, let's build the new compiler from scratch (this is the sequence I use regularly to test changes to the run-time system whenever I make them):

STEP 1:

export CM3=/usr/local/cm3/bin/cm3
cd ~/cm3/m3-libs/m3core
$CM3
$CM3 -ship

(end short quote, there's much more)

What happens is that when building m3core, my compiler is building it against the interfaces in /usr/local/cm3, NOT the interfaces within m3core itself:

--- building in PPC_DARWIN ---

ignoring ../src/m3overrides

new source -> compiling RTCollector.m3
"../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX)
"../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX)
"../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD)
"../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD)
4 errors encountered
stale imports -> compiling RTDebug.m3

Fatal Error: bad version stamps: RTDebug.m3

version stamp mismatch: Compiler.Platform
  => RTDebug.m3
  => Compiler.i3
version stamp mismatch: Compiler.ThisPlatform
  <8b5a6f513e082750> => RTDebug.m3
  <8e110d4fed998051> => Compiler.i3

I feel like I should REALLY know the answer to this, but how do I get the compiler to use only the local sources and not attempt to compile things with reference to the already-installed interfaces?

    Mika

From hosking at cs.purdue.edu Tue Oct 21 13:21:36 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 12:21:36 +0100
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>
References: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>
Message-ID: <27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu>

This is a phase ordering problem that arises when you use an old compiler to compile newer sources. It really should be fixed somehow.
In any case, the problem is those lines in RTCollector at the bottom (I deleted them yesterday on the main trunk) that refer to values supposedly built in to the compiler (which are not there for the old binary you are using). I think if you delete those lines then you should be OK. Once you have a new compiler bootstrapped (with those configuration values available built in) then you should be able to compile that code (excepting that I just deleted those lines yesterday). On 21 Oct 2008, at 12:05, Mika Nystrom wrote: > Hello everyone, > > Sorry if I have asked this before---I feel I must have, and Tony > probably answered it, too, but I can't find it anywhere in my email > archives. > > It looks like I finally upgraded my Mac to Tiger a half year ago, > and everything broke. (Modula-3, emacs, make, etc etc etc etc.) > I am finally getting around to fixing it. Now I am trying to > compile CM3 in accordance with Tony's instructions as of June 24, > 2007: > > (short quote here) >> cd ~/cm3-cvs >> mkdir boot >> cd boot >> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz >> ./cminstall > > Now you will have some kind of cm3 installed, presumably in /usr/ > local/cm3/bin/cm3. > > Make sure you have a fresh CVS checkout in directory cm3 (let's > assume this is in your home directory ~/cm3). Also, make sure you > have an up-to-date version of the CM3 backend compiler cm3cg > installed by executing the following: > > STEP 0: > > export CM3=/usr/local/cm3/bin/cm3 > cd ~/cm3/m3-sys/m3cc > $CM3 > $CM3 -ship > > You can skip this last step if you know your backend compiler is up > to date. 
> > Now, let's build the new compiler from scratch (this is the sequence > I use regularly to test changes to the run-time system whenever I > make them): > > STEP 1: > > cd ~/cm3/m3-libs/m3core > $CM3 > $CM3 -ship > (end short quote, there's much more) > > What happens is that when building m3core, my compiler is building > it against the interfaces in /usr/local/cm3, NOT the interfaces > within m3core itself: > > --- building in PPC_DARWIN --- > > ignoring ../src/m3overrides > > new source -> compiling RTCollector.m3 > "../src/runtime/common/RTCollector.m3", line 2914: unknown > qualification '.' (AMD64_LINUX) > "../src/runtime/common/RTCollector.m3", line 2915: unknown > qualification '.' (SPARC32_LINUX) > "../src/runtime/common/RTCollector.m3", line 2916: unknown > qualification '.' (SPARC64_OPENBSD) > "../src/runtime/common/RTCollector.m3", line 2917: unknown > qualification '.' (PPC32_OPENBSD) > 4 errors encountered > stale imports -> compiling RTDebug.m3 > > Fatal Error: bad version stamps: RTDebug.m3 > > version stamp mismatch: Compiler.Platform > => RTDebug.m3 > => Compiler.i3 > version stamp mismatch: Compiler.ThisPlatform > <8b5a6f513e082750> => RTDebug.m3 > <8e110d4fed998051> => Compiler.i3 > > I feel like I should REALLY know the answer to this, but how do I > get the compiler to use only the local sources and not attempt > to compile things with reference to the already-installed > interfaces? > > Mika From hosking at cs.purdue.edu Tue Oct 21 16:54:58 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Tue, 21 Oct 2008 15:54:58 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> Message-ID: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu> I have one more question that I forgot to ask before. Did you evaluate performance with -O3 optimization in the backend? 
Generally, I have the following in my m3_backend specs so that turning on optimization results in -O3 (and lots of lovely inlining):

proc m3_backend (source, object, optimize, debug) is
  local args =
  [
    "-m32",
    "-quiet",
    source,
    "-o",
    object,
    % fPIC really is needed here, despite man gcc saying it is the default.
    % This is because man gcc is about Apple's gcc but m3cg is
    % built from FSF source.
    "-fPIC",
    "-fno-reorder-blocks"
  ]
  if optimize args += "-O3" end
  if debug args += "-gstabs" end
  if M3_PROFILING args += "-p" end
  return try_exec (m3back, args)
end

On 17 Oct 2008, at 09:32, Mika Nystrom wrote:

> Ok I am sorry I am slow to pick up on this.
>
> I take it the problem is actually the Upthread.getspecific routine,
> which itself calls something get_curthread somewhere inside pthreads,
> which in turn involves a context switch to the supervisor---the
> identity
> of the current thread is just not accessible anywhere in user space.
> Also explains why this program runs faster with my old PM3, which uses
> longjmp threads.
>
> The only way to avoid it (really) is to pass a pointer to the
> Thread.T of the currently executing thread in the activation record
> of *every* procedure, so that allocators can find it when
> necessary....
> but that is very expensive in terms of stack memory.
>
> Or I can just make a structure like that that I pass around where
> I need it in my own program. Thread-specific and user-managed.
>
> I believe I have just answered all my own questions, but I hope
> Tony will correct me if my answers are incorrect.
>
> Mika
>
> Tony Hosking writes:
>> I suspect part of the overhead of allocation in the new code is the
>> need for thread-local allocation buffers, which means we need to
>> access thread-local state. We really need an efficient way to do
>> that, but pthreads thread-local accesses may be what is killing you.
>>
>> On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>>
>>> Hi Tony,
>>>
>>> I figured you would chime in!
>>> >>> Yes, @M3noincremental seems to make things consistently a tad bit >>> slower (but a very small difference), on both FreeBSD and Linux. >>> @M3nogc makes a bigger difference, of course. >>> >>> Unfortunately I seem to have lost the code that did a lot of memory >>> allocations. My tricks (as described in the email---and others!) >>> have removed most of the troublesome memory allocations, but now >>> I'm stuck with the mutex instead... >>> >>> Mika >>> >>> Tony Hosking writes: >>>> Have you tried running @M3noincremental? >>>> >>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>>> >>>>> Hello Modula-3 people, >>>>> >>>>> As I mentioned in an earlier email about printing structures >>>>> (thanks >>>>> Darko), I'm in the midst of coding an interpreter embedded in >>>>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>>>> Norvig's >>>>> JScheme for Java (well it was at first strongly based, but more >>>>> and >>>>> more loosely, if you know what I mean...) >>>>> >>>>> I expected that the performance of the interpreter would be much >>>>> better in Modula-3 than in Java, and I have been testing on two >>>>> different systems. One is my ancient FreeBSD-4.11 with an old >>>>> PM3, >>>>> and the other is CM3 on a recent Debian system. What I am finding >>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>>> (getting >>>>> close to ten times as fast on some tasks at this point), but on >>>>> Linux/CM3 it is much closer in speed to JScheme than I would like. >>>>> >>>>> When I started, with code that was essentially equivalent to >>>>> JScheme, >>>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>>> spend most of its time in (surprise, surprise!) memory allocation >>>>> and garbage collection. 
The speedup I have achieved between the >>>>> first implementation and now was due to the use of Modula-3 >>>>> constructs >>>>> that are superior to Java's, such as the use of arrays of RECORDs >>>>> to make small stacks rather than linked lists. (I get readable >>>>> code with much fewer memory allocations and GC work.) >>>>> >>>>> Now, since this is an interpreter, I as the implementer have >>>>> limited >>>>> control over how much memory is allocated and freed, and where >>>>> it is >>>>> needed. However, I can sometimes fall back on C-style memory >>>>> management, >>>>> but I would like to do it in a safe way. For instance, I have >>>>> special-cased >>>>> evaluation of Scheme primitives, as follows. >>>>> >>>>> Under the "normal" implementation, a list of things to evaluate is >>>>> built up, passed to an evaluation function, and then the GC is >>>>> left >>>>> to sweep up the mess. The problem is that there are various >>>>> tricky >>>> routes by which references can escape the evaluator, so you can't >>>>> just assume that what you put in is going to be dead right after >>>>> an eval and free it. Instead, I set a flag in the evaluator, >>>>> which >>>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>>> it's unclear (in which case the problem is left up to the GC). >>>>> >>>>> For the vast majority of Scheme primitives, one can indeed free >>>>> the >>>>> list right after the eval. 
Now of course I am not interested >>>>> in unsafe code, so what I do is this: >>>>> >>>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>>> >>>>> VAR >>>>> mu := NEW(MUTEX); >>>>> free : Pair := NIL; >>>>> >>>>> PROCEDURE GetPair() : Pair = >>>>> BEGIN >>>>> LOCK mu DO >>>>> IF free # NIL THEN >>>>> TRY >>>>> RETURN free >>>>> FINALLY >>>>> free := free.rest >>>>> END >>>>> END >>>>> END; >>>>> RETURN NEW(Pair) >>>>> END GetPair; >>>>> >>>>> PROCEDURE ReturnPair(cons : Pair) = >>>>> BEGIN >>>>> cons.first := NIL; >>>>> LOCK mu DO >>>>> cons.rest := free; >>>>> free := cons >>>>> END >>>>> END ReturnPair; >>>>> >>>>> my eval code looks like >>>>> >>>>> VAR okToFree : BOOLEAN; BEGIN >>>>> >>>>> args := GetPair(); ... >>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>>> >>>>> IF okToFree THEN ReturnPair(args) END; >>>>> RETURN result >>>>> END >>>>> >>>>> and this does work well. In fact it speeds up the Linux >>>>> implementation >>>>> by almost 100% to recycle the lists like this *just* for the >>>>> evaluation of Scheme primitives. >>>>> >>>>> But it's still ugly, isn't it? There's a mutex, and a global >>>>> variable. And yes, the time spent messing with the mutex is >>>>> noticeable, and I haven't even made the code multi-threaded yet >>>>> (and that is coming!) >>>>> >>>>> So I'm thinking, what I really want is a structure that is >>>>> attached >>>>> to my current Thread.T. I want to be able to access just a single >>>>> pointer (like the free list) but be sure it is unique to my >>>>> current >>>>> thread. No locking would be necessary if I could do this. >>>>> >>>>> Does anyone have an elegant solution that does something like >>>>> this? >>>>> Thread-specific "static" variables? Just one REFANY would be >>>>> enough >>>>> for a lot of uses... seems to me this should be a frequently >>>>> occurring problem? 
>>>>> >>>>> Best regards, >>>>> Mika >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> From hosking at cs.purdue.edu Tue Oct 21 17:17:24 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Tue, 21 Oct 2008 16:17:24 +0100 Subject: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas? In-Reply-To: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu> References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu> <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu> Message-ID: <1396C14A-B23D-4D19-804B-B1627B44106F@cs.purdue.edu> Also, turn off assertions. On 21 Oct 2008, at 15:54, Tony Hosking wrote: > I have one more question that I forgot to ask before. Did you > evaluate performance with -O3 optimization in the backend? > > Generally, I have the following in my m3_backend specs so that > turning on optimization results in -O3 (and lots of lovely inlining): > > proc m3_backend (source, object, optimize, debug) is > local args = > [ > "-m32", > "-quiet", > source, > "-o", > object, > % fPIC really is needed here, despite man gcc saying it is the > default. > % This is because man gcc is about Apple's gcc but m3cg is > % built from FSF source. > "-fPIC", > "-fno-reorder-blocks" > ] > if optimize args += "-O3" end > if debug args += "-gstabs" end > if M3_PROFILING args += "-p" end > return try_exec (m3back, args) > end > > > On 17 Oct 2008, at 09:32, Mika Nystrom wrote: > >> Ok I am sorry I am slow to pick up on this. >> >> I take it the problem is actually the Upthread.getspecific routine, >> which itself calls something get_curthread somewhere inside pthreads, >> which in turn involves a context switch to the supervisor---the >> identity >> of the current thread is just not accessible anywhere in user space. >> Also explains why this program runs faster with my old PM3, which >> uses >> longjmp threads. 
>> >> The only way to avoid it (really) is to pass a pointer to the >> Thread.T of the currently executing thread in the activation record >> of *every* procedure, so that allocators can find it when >> necessary.... >> but that is very expensive in terms of stack memory. >> >> Or I can just make a structure like that that I pass around where >> I need it in my own program. Thread-specific and user-managed. >> >> I believe I have just answered all my own questions, but I hope >> Tony will correct me if my answers are incorrect. >> >> Mika >> >> Tony Hosking writes: >>> I suspect part of the overhead of allocation in the new code is the >>> need for thread-local allocation buffers, which means we need to >>> access thread-local state. We really need an efficient way to do >>> that, but pthreads thread-local accesses may be what is killing you. >>> >>> On 17 Oct 2008, at 00:30, Mika Nystrom wrote: >>> >>>> Hi Tony, >>>> >>>> I figured you would chime in! >>>> >>>> Yes, @M3noincremental seems to make things consistently a tad bit >>>> slower (but a very small difference), on both FreeBSD and Linux. >>>> @M3nogc makes a bigger difference, of course. >>>> >>>> Unfortunately I seem to have lost the code that did a lot of memory >>>> allocations. My tricks (as described in the email---and others!) >>>> have removed most of the troublesome memory allocations, but now >>>> I'm stuck with the mutex instead... >>>> >>>> Mika >>>> >>>> Tony Hosking writes: >>>>> Have you tried running @M3noincremental? >>>>> >>>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote: >>>>> >>>>>> Hello Modula-3 people, >>>>>> >>>>>> As I mentioned in an earlier email about printing structures >>>>>> (thanks >>>>>> Darko), I'm in the midst of coding an interpreter embedded in >>>>>> Modula-3. It's a Scheme interpreter, loosely based on Peter >>>>>> Norvig's >>>>>> JScheme for Java (well it was at first strongly based, but more >>>>>> and >>>>>> more loosely, if you know what I mean...) 
>>>>>> >>>>>> I expected that the performance of the interpreter would be much >>>>>> better in Modula-3 than in Java, and I have been testing on two >>>>>> different systems. One is my ancient FreeBSD-4.11 with an old >>>>>> PM3, >>>>>> and the other is CM3 on a recent Debian system. What I am >>>>>> finding >>>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 >>>>>> (getting >>>>>> close to ten times as fast on some tasks at this point), but on >>>>>> Linux/CM3 it is much closer in speed to JScheme than I would >>>>>> like. >>>>>> >>>>>> When I started, with code that was essentially equivalent to >>>>>> JScheme, >>>>>> I found that it was a bit slower than JScheme on Linux/CM3 and >>>>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to >>>>>> spend most of its time in (surprise, surprise!) memory allocation >>>>>> and garbage collection. The speedup I have achieved between the >>>>>> first implementation and now was due to the use of Modula-3 >>>>>> constructs >>>>>> that are superior to Java's, such as the use of arrays of RECORDs >>>>>> to make small stacks rather than linked lists. (I get readable >>>>>> code with much fewer memory allocations and GC work.) >>>>>> >>>>>> Now, since this is an interpreter, I as the implementer have >>>>>> limited >>>>>> control over how much memory is allocated and freed, and where >>>>>> it is >>>>>> needed. However, I can sometimes fall back on C-style memory >>>>>> management, >>>>>> but I would like to do it in a safe way. For instance, I have >>>>>> special-cased >>>>>> evaluation of Scheme primitives, as follows. >>>>>> >>>>>> Under the "normal" implementation, a list of things to evaluate >>>>>> is >>>>>> built up, passed to an evaluation function, and then the GC is >>>>>> left >>>>>> to sweep up the mess. 
The problem is that there are various >>>>>> tricky >>>>> routes by which references can escape the evaluator, so you can't >>>>>> just assume that what you put in is going to be dead right after >>>>>> an eval and free it. Instead, I set a flag in the evaluator, >>>>>> which >>>>>> is TRUE if it is OK to free the list after the eval and FALSE if >>>>>> it's unclear (in which case the problem is left up to the GC). >>>>>> >>>>>> For the vast majority of Scheme primitives, one can indeed free >>>>>> the >>>>>> list right after the eval. Now of course I am not interested >>>>>> in unsafe code, so what I do is this: >>>>>> >>>>>> TYPE Pair = OBJECT first, rest : REFANY; END; >>>>>> >>>>>> VAR >>>>>> mu := NEW(MUTEX); >>>>>> free : Pair := NIL; >>>>>> >>>>>> PROCEDURE GetPair() : Pair = >>>>>> BEGIN >>>>>> LOCK mu DO >>>>>> IF free # NIL THEN >>>>>> TRY >>>>>> RETURN free >>>>>> FINALLY >>>>>> free := free.rest >>>>>> END >>>>>> END >>>>>> END; >>>>>> RETURN NEW(Pair) >>>>>> END GetPair; >>>>>> >>>>>> PROCEDURE ReturnPair(cons : Pair) = >>>>>> BEGIN >>>>>> cons.first := NIL; >>>>>> LOCK mu DO >>>>>> cons.rest := free; >>>>>> free := cons >>>>>> END >>>>>> END ReturnPair; >>>>>> >>>>>> my eval code looks like >>>>>> >>>>>> VAR okToFree : BOOLEAN; BEGIN >>>>>> >>>>>> args := GetPair(); ... >>>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree); >>>>>> >>>>>> IF okToFree THEN ReturnPair(args) END; >>>>>> RETURN result >>>>>> END >>>>>> >>>>>> and this does work well. In fact it speeds up the Linux >>>>>> implementation >>>>>> by almost 100% to recycle the lists like this *just* for the >>>>>> evaluation of Scheme primitives. >>>>>> >>>>>> But it's still ugly, isn't it? There's a mutex, and a global >>>>>> variable. And yes, the time spent messing with the mutex is >>>>>> noticeable, and I haven't even made the code multi-threaded yet >>>>>> (and that is coming!) 
>>>>>> >>>>>> So I'm thinking, what I really want is a structure that is >>>>>> attached >>>>>> to my current Thread.T. I want to be able to access just a >>>>>> single >>>>>> pointer (like the free list) but be sure it is unique to my >>>>>> current >>>>>> thread. No locking would be necessary if I could do this. >>>>>> >>>>>> Does anyone have an elegant solution that does something like >>>>>> this? >>>>>> Thread-specific "static" variables? Just one REFANY would be >>>>>> enough >>>>>> for a lot of uses... seems to me this should be a frequently >>>>>> occurring problem? >>>>>> >>>>>> Best regards, >>>>>> Mika >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> > From mika at async.caltech.edu Tue Oct 21 22:18:07 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Tue, 21 Oct 2008 13:18:07 -0700 Subject: [M3devel] CM3 on Mac OS X Tiger In-Reply-To: Your message of "Tue, 21 Oct 2008 12:21:36 BST." <27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu> Message-ID: <200810212018.m9LKI81o019865@camembert.async.caltech.edu> Hi Tony, Thanks for helping, as usual! I ran into this now, is this also a bootstrapping problem? (Moving on to building libm3, cleared out existing PPC_DARWIN, have rebuilt m3cc... only see a single version of Compiler.i3 anywhere...) 
Here's the log: [lapdog:~/cm3/m3-libs/libm3] mika% $CM3 && $CM3 -ship --- building in PPC_DARWIN --- ignoring ../src/m3overrides new source -> compiling Atom.i3 new source -> compiling AtomList.i3 new source -> compiling OSError.i3 new source -> compiling File.i3 new source -> compiling RegularFile.i3 new source -> compiling Pipe.i3 new source -> compiling TextSeq.i3 new source -> compiling Pathname.i3 new source -> compiling FS.i3 new source -> compiling Process.i3 new source -> compiling Socket.i3 new source -> compiling Terminal.i3 new source -> compiling FS.m3 new source -> compiling Terminal.m3 new source -> compiling RegularFile.m3 new source -> compiling Pipe.m3 new source -> compiling Socket.m3 new source -> compiling OSConfig.i3 new source -> compiling OSErrorPosix.i3 new source -> compiling Fmt.i3 new source -> compiling OSErrorPosix.m3 new source -> compiling FilePosix.i3 new source -> compiling FilePosix.m3 new source -> compiling FSPosix.m3 new source -> compiling PipePosix.m3 new source -> compiling PathnamePosix.m3 new source -> compiling SocketPosix.m3 Fatal Error: bad version stamps: SocketPosix.m3 version stamp mismatch: Compiler.Platform => SocketPosix.m3 => Compiler.i3 version stamp mismatch: Compiler.ThisPlatform <8b5a6f513e082750> => SocketPosix.m3 <8e110d4fed998051> => Compiler.i3 [lapdog:~/cm3/m3-libs/libm3] mika% Tony Hosking writes: >This is a phase ordering problem that arises when you use an old >compiler to compile newer sources. It really should be fixed >somehow. In any case, the problem is those lines in RTCollector at >the bottom (I deleted them yesterday on the main trunk) that refer to >values supposedly built in to the compiler (which are not there for >the old binary you are using). I think if you delete those lines then >you should be OK. 
Once you have a new compiler bootstrapped (with >those configuration values available built in) then you should be able >to compile that code (excepting that I just deleted those lines >yesterday). > > >On 21 Oct 2008, at 12:05, Mika Nystrom wrote: > >> Hello everyone, >> >> Sorry if I have asked this before---I feel I must have, and Tony >> probably answered it, too, but I can't find it anywhere in my email >> archives. >> >> It looks like I finally upgraded my Mac to Tiger a half year ago, >> and everything broke. (Modula-3, emacs, make, etc etc etc etc.) >> I am finally getting around to fixing it. Now I am trying to >> compile CM3 in accordance with Tony's instructions as of June 24, >> 2007: >> >> (short quote here) >>> cd ~/cm3-cvs >>> mkdir boot >>> cd boot >>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz >>> ./cminstall >> >> Now you will have some kind of cm3 installed, presumably in /usr/ >> local/cm3/bin/cm3. >> >> Make sure you have a fresh CVS checkout in directory cm3 (let's >> assume this is in your home directory ~/cm3). Also, make sure you >> have an up-to-date version of the CM3 backend compiler cm3cg >> installed by executing the following: >> >> STEP 0: >> >> export CM3=/usr/local/cm3/bin/cm3 >> cd ~/cm3/m3-sys/m3cc >> $CM3 >> $CM3 -ship >> >> You can skip this last step if you know your backend compiler is up >> to date. 
>> >> Now, let's build the new compiler from scratch (this is the sequence >> I use regularly to test changes to the run-time system whenever I >> make them): >> >> STEP 1: >> >> cd ~/cm3/m3-libs/m3core >> $CM3 >> $CM3 -ship >> (end short quote, there's much more) >> >> What happens is that when building m3core, my compiler is building >> it against the interfaces in /usr/local/cm3, NOT the interfaces >> within m3core itself: >> >> --- building in PPC_DARWIN --- >> >> ignoring ../src/m3overrides >> >> new source -> compiling RTCollector.m3 >> "../src/runtime/common/RTCollector.m3", line 2914: unknown >> qualification '.' (AMD64_LINUX) >> "../src/runtime/common/RTCollector.m3", line 2915: unknown >> qualification '.' (SPARC32_LINUX) >> "../src/runtime/common/RTCollector.m3", line 2916: unknown >> qualification '.' (SPARC64_OPENBSD) >> "../src/runtime/common/RTCollector.m3", line 2917: unknown >> qualification '.' (PPC32_OPENBSD) >> 4 errors encountered >> stale imports -> compiling RTDebug.m3 >> >> Fatal Error: bad version stamps: RTDebug.m3 >> >> version stamp mismatch: Compiler.Platform >> => RTDebug.m3 >> => Compiler.i3 >> version stamp mismatch: Compiler.ThisPlatform >> <8b5a6f513e082750> => RTDebug.m3 >> <8e110d4fed998051> => Compiler.i3 >> >> I feel like I should REALLY know the answer to this, but how do I >> get the compiler to use only the local sources and not attempt >> to compile things with reference to the already-installed >> interfaces? >> >> Mika From hosking at cs.purdue.edu Tue Oct 21 23:29:07 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Tue, 21 Oct 2008 22:29:07 +0100 Subject: [M3devel] CM3 on Mac OS X Tiger In-Reply-To: <200810212018.m9LKI81o019865@camembert.async.caltech.edu> References: <200810212018.m9LKI81o019865@camembert.async.caltech.edu> Message-ID: Hmm. Not sure. Looks like it. On 21 Oct 2008, at 21:18, Mika Nystrom wrote: > Hi Tony, > > Thanks for helping, as usual! 
> > I ran into this now, is this also a bootstrapping problem? (Moving > on to building libm3, cleared out existing PPC_DARWIN, have rebuilt > m3cc... only see a single version of Compiler.i3 anywhere...) > > Here's the log: > > [lapdog:~/cm3/m3-libs/libm3] mika% $CM3 && $CM3 -ship > --- building in PPC_DARWIN --- > > ignoring ../src/m3overrides > > new source -> compiling Atom.i3 > new source -> compiling AtomList.i3 > new source -> compiling OSError.i3 > new source -> compiling File.i3 > new source -> compiling RegularFile.i3 > new source -> compiling Pipe.i3 > new source -> compiling TextSeq.i3 > new source -> compiling Pathname.i3 > new source -> compiling FS.i3 > new source -> compiling Process.i3 > new source -> compiling Socket.i3 > new source -> compiling Terminal.i3 > new source -> compiling FS.m3 > new source -> compiling Terminal.m3 > new source -> compiling RegularFile.m3 > new source -> compiling Pipe.m3 > new source -> compiling Socket.m3 > new source -> compiling OSConfig.i3 > new source -> compiling OSErrorPosix.i3 > new source -> compiling Fmt.i3 > new source -> compiling OSErrorPosix.m3 > new source -> compiling FilePosix.i3 > new source -> compiling FilePosix.m3 > new source -> compiling FSPosix.m3 > new source -> compiling PipePosix.m3 > new source -> compiling PathnamePosix.m3 > new source -> compiling SocketPosix.m3 > > Fatal Error: bad version stamps: SocketPosix.m3 > > version stamp mismatch: Compiler.Platform > => SocketPosix.m3 > => Compiler.i3 > version stamp mismatch: Compiler.ThisPlatform > <8b5a6f513e082750> => SocketPosix.m3 > <8e110d4fed998051> => Compiler.i3 > [lapdog:~/cm3/m3-libs/libm3] mika% > > Tony Hosking writes: >> This is a phase ordering problem that arises when you use an old >> compiler to compile newer sources. It really should be fixed >> somehow. 
In any case, the problem is those lines in RTCollector at >> the bottom (I deleted them yesterday on the main trunk) that refer to >> values supposedly built in to the compiler (which are not there for >> the old binary you are using). I think if you delete those lines >> then >> you should be OK. Once you have a new compiler bootstrapped (with >> those configuration values available built in) then you should be >> able >> to compile that code (excepting that I just deleted those lines >> yesterday). >> >> >> On 21 Oct 2008, at 12:05, Mika Nystrom wrote: >> >>> Hello everyone, >>> >>> Sorry if I have asked this before---I feel I must have, and Tony >>> probably answered it, too, but I can't find it anywhere in my email >>> archives. >>> >>> It looks like I finally upgraded my Mac to Tiger a half year ago, >>> and everything broke. (Modula-3, emacs, make, etc etc etc etc.) >>> I am finally getting around to fixing it. Now I am trying to >>> compile CM3 in accordance with Tony's instructions as of June 24, >>> 2007: >>> >>> (short quote here) >>>> cd ~/cm3-cvs >>>> mkdir boot >>>> cd boot >>>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz >>>> ./cminstall >>> >>> Now you will have some kind of cm3 installed, presumably in /usr/ >>> local/cm3/bin/cm3. >>> >>> Make sure you have a fresh CVS checkout in directory cm3 (let's >>> assume this is in your home directory ~/cm3). Also, make sure you >>> have an up-to-date version of the CM3 backend compiler cm3cg >>> installed by executing the following: >>> >>> STEP 0: >>> >>> export CM3=/usr/local/cm3/bin/cm3 >>> cd ~/cm3/m3-sys/m3cc >>> $CM3 >>> $CM3 -ship >>> >>> You can skip this last step if you know your backend compiler is up >>> to date. 
>>> >>> Now, let's build the new compiler from scratch (this is the sequence >>> I use regularly to test changes to the run-time system whenever I >>> make them): >>> >>> STEP 1: >>> >>> cd ~/cm3/m3-libs/m3core >>> $CM3 >>> $CM3 -ship >>> (end short quote, there's much more) >>> >>> What happens is that when building m3core, my compiler is building >>> it against the interfaces in /usr/local/cm3, NOT the interfaces >>> within m3core itself: >>> >>> --- building in PPC_DARWIN --- >>> >>> ignoring ../src/m3overrides >>> >>> new source -> compiling RTCollector.m3 >>> "../src/runtime/common/RTCollector.m3", line 2914: unknown >>> qualification '.' (AMD64_LINUX) >>> "../src/runtime/common/RTCollector.m3", line 2915: unknown >>> qualification '.' (SPARC32_LINUX) >>> "../src/runtime/common/RTCollector.m3", line 2916: unknown >>> qualification '.' (SPARC64_OPENBSD) >>> "../src/runtime/common/RTCollector.m3", line 2917: unknown >>> qualification '.' (PPC32_OPENBSD) >>> 4 errors encountered >>> stale imports -> compiling RTDebug.m3 >>> >>> Fatal Error: bad version stamps: RTDebug.m3 >>> >>> version stamp mismatch: Compiler.Platform >>> => RTDebug.m3 >>> => Compiler.i3 >>> version stamp mismatch: Compiler.ThisPlatform >>> <8b5a6f513e082750> => RTDebug.m3 >>> <8e110d4fed998051> => Compiler.i3 >>> >>> I feel like I should REALLY know the answer to this, but how do I >>> get the compiler to use only the local sources and not attempt >>> to compile things with reference to the already-installed >>> interfaces? >>> >>> Mika From mika at async.caltech.edu Thu Oct 23 10:24:53 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 23 Oct 2008 01:24:53 -0700 Subject: [M3devel] NEW in RTType.m3 Message-ID: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu> Hello Modula-3 people, Does anyone know whether there is anything that prevents using NEW in RTType.m3? 
I added a lot of memory recycling to the Scheme interpreter I am working on, and now it seems it is spending a lot of time in Typecase and IsSubtype. I was wondering if it is possible to memoize IsSubtype inside RTType.m3... (specifically just replacing IsSubtype with an array lookup). It is the nature of the interpreter that it spends a lot of time checking types and narrowing things back and forth, as Scheme and Modula-3 references share the same representation. Mika From hosking at cs.purdue.edu Thu Oct 23 12:10:01 2008 From: hosking at cs.purdue.edu (Tony Hosking) Date: Thu, 23 Oct 2008 11:10:01 +0100 Subject: [M3devel] NEW in RTType.m3 In-Reply-To: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu> References: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu> Message-ID: <7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu> Could be dangerous depending on module link orderings. Might be better to cache your own lookups in your interpreter. On 23 Oct 2008, at 09:24, Mika Nystrom wrote: > Hello Modula-3 people, > > Does anyone know whether there is anything that prevents using NEW > in RTType.m3? > > I added a lot of memory recycling to the Scheme interpreter I am > working on, and now it seems it is spending a lot of time in Typecase > and IsSubtype. I was wondering if it is possible to memoize IsSubtype > inside RTType.m3... (specifically just replacing IsSubtype with an > array lookup). > > It is the nature of the interpreter that it spends a lot of time > checking types and narrowing things back and forth, as Scheme and > Modula-3 references share the same representation. > > Mika From mika at async.caltech.edu Thu Oct 23 19:29:50 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 23 Oct 2008 10:29:50 -0700 Subject: [M3devel] NEW in RTType.m3 In-Reply-To: Your message of "Thu, 23 Oct 2008 11:10:01 BST." 
<7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu> Message-ID: <200810231729.m9NHToMC080136@camembert.async.caltech.edu>

Well I'm not calling Typecase and IsSubtype directly---the compiler is inserting the calls. Here's an example of my code:

    IF x # NIL AND ISTYPE(x,Symbol) THEN
      RETURN env.lookup(x)
    ELSIF x = NIL OR NOT ISTYPE(x,Pair) THEN
      RETURN x
    ELSE

this code actually winds up in here (RTType.m3):

    PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
      VAR t: RT0.TypeDefn;
      BEGIN
        IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
        t := Get (a);
        IF (t = NIL) THEN RETURN FALSE; END;
        IF (t.typecode = b) THEN RETURN TRUE END;
        WHILE (t.kind = ORD (TK.Obj)) DO
          IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
          t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
          IF (t = NIL) THEN RETURN FALSE; END;
          IF (t.typecode = b) THEN RETURN TRUE; END;
        END;
        IF (t.traced # 0)
          THEN RETURN (b = RT0.RefanyTypecode);
          ELSE RETURN (b = RT0.AddressTypecode);
        END;
      END IsSubtype;

Again this is an example of something where the CM3 code seems to be hurting more than PM3, but it could be that for some reason I have more visibility into the CM3 code, or that there's an optimization difference (I haven't been able to investigate this fully yet).

In any case, it's clear that if IsSubtype could be replaced with a table lookup, this kind of code would be accelerated by potentially a lot.

Note that while in the above example the code might be accelerated by (in my opinion, less clear) use of TYPECODE (since I never subtype Symbol or Pair---for now!), this is not so for some NARROWs. The NARROWs also wind up calling RTType.IsSubtype, and they arise because I have types that depend on each other, and unless I want to introduce extra complexity (new partial revelations) or stick everything in the same interface, I am forced to NARROW something to avoid a circular dependency of interfaces... A method of A.T takes a B.T and a method of B.T takes an A.T, so I make a supertype X.T s.t.
A.T <: X.T ; then I can declare B.T.m to take an X.T and NARROW it to A.T within B.T.m... triggering a call to the above code. (For simplicity's sake, X.T could be REFANY or ROOT.) An attempt to declare B.T.m as taking A.T would lead to a circular dependency between A and B. The code is really rather simple and it's a shame if you have to make it look much more complicated to avoid issues like these which might equally well be solved by tweaking the runtime implementation a bit. Mika Tony Hosking writes: >Could be dangerous depending on module link orderings. Might be >better to cache your own lookups in your interpreter. > >On 23 Oct 2008, at 09:24, Mika Nystrom wrote: > >> Hello Modula-3 people, >> >> Does anyone know whether there is anything that prevents using NEW >> in RTType.m3? >> >> I added a lot of memory recycling to the Scheme interpreter I am >> working on, and now it seems it is spending a lot of time in Typecase >> and IsSubtype. I was wondering if it is possible to memoize IsSubtype >> inside RTType.m3... (specifically just replacing IsSubtype with an >> array lookup). >> >> It is the nature of the interpreter that it spends a lot of time >> checking types and narrowing things back and forth, as Scheme and >> Modula-3 references share the same representation. >> >> Mika From mika at async.caltech.edu Sat Oct 25 05:16:56 2008 From: mika at async.caltech.edu (Mika Nystrom) Date: Fri, 24 Oct 2008 20:16:56 -0700 Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3 Message-ID: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu> Dear Modula-3 people, I had a crash in my program from a range error that I believe shouldn't have happened the way it did, although it's not in my code, so I'm not sure if there's a reason for the way it's done (matching a C declaration somewhere, maybe??). 
Here it is, from ThreadPosix.m3:

    PROCEDURE IOWait(fd: INTEGER; read: BOOLEAN;
                     timeoutInterval: LONGREAL := -1.0D0): WaitResult =
      <*FATAL Alerted*>
      BEGIN
        self.alertable := FALSE;
        RETURN XIOWait(fd, read, timeoutInterval);
      END IOWait;

    PROCEDURE IOAlertWait(fd: INTEGER; read: BOOLEAN;
                          timeoutInterval: LONGREAL := -1.0D0): WaitResult
      RAISES {Alerted} =
      BEGIN
        self.alertable := TRUE;
        RETURN XIOWait(fd, read, timeoutInterval);
      END IOAlertWait;

    PROCEDURE XIOWait (fd: CARDINAL; read: BOOLEAN; interval: LONGREAL): WaitResult
      RAISES {Alerted} =
      VAR res: INTEGER;
          fdindex := fd DIV FDSetSize;
          fdset := FDSet{fd MOD FDSetSize};
      ... rest omitted ...

Note that IOWait calls XIOWait. IOWait is declared as taking an INTEGER, but XIOWait takes a CARDINAL.

So I really should use a CARDINAL in passing to IOWait, but since IOWait is the interface function it's not clear that I should do that (until my program crashes after passing -1 from some carelessly wrapped C code). I don't like the fact that I get a range error *inside* the library when it appears unnecessary---it should have happened in my code, as I make the call.

Suggested improvement: declare all the FDs in SchedulerPosix.i3 (the interface that exports these routines) to be CARDINAL instead of INTEGER.

Mika

From hosking at cs.purdue.edu Mon Oct 27 15:28:52 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Mon, 27 Oct 2008 14:28:52 +0000
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
In-Reply-To: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
References: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
Message-ID: <5232F2E4-3B0E-49E5-B1C8-BB4D04C60C33@cs.purdue.edu>

Sounds fair to me.
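A side note on the -1 case in this thread: since the bad descriptor came from wrapped C code, one option is to reject it on the C side before it ever reaches IOWait. The following is only a sketch under that assumption; `to_cardinal_fd` is a hypothetical helper name, not something found in CM3 or libc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of CM3 or libc): accept a raw C file
   descriptor only if it is non-negative, i.e. if it would fit a
   Modula-3 CARDINAL.  Calls such as open() report failure as -1, and
   that is the value that must not be passed on as a descriptor. */
bool to_cardinal_fd(int raw_fd, unsigned *out_fd)
{
    if (out_fd == NULL || raw_fd < 0) {
        return false;          /* caller handles the error here */
    }
    *out_fd = (unsigned)raw_fd;
    return true;
}
```

A wrapper would call this right after obtaining the descriptor and report failure to its own caller, so that the range problem shows up at the call site rather than inside XIOWait.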
On 25 Oct 2008, at 04:16, Mika Nystrom wrote: > > Dear Modula-3 people, > > I had a crash in my program from a range error that I believe > shouldn't have happened the way it did, although it's not in my > code, so I'm not sure if there's a reason for the way it's done > (matching > a C declaration somewhere, maybe??). > > Here it is, from ThreadPosix.m3: > > PROCEDURE IOWait(fd: INTEGER; read: BOOLEAN; > timeoutInterval: LONGREAL := -1.0D0): WaitResult = > <*FATAL Alerted*> > BEGIN > self.alertable := FALSE; > RETURN XIOWait(fd, read, timeoutInterval); > END IOWait; > > PROCEDURE IOAlertWait(fd: INTEGER; read: BOOLEAN; > timeoutInterval: LONGREAL := -1.0D0): WaitResult > RAISES {Alerted} = > BEGIN > self.alertable := TRUE; > RETURN XIOWait(fd, read, timeoutInterval); > END IOAlertWait; > > PROCEDURE XIOWait (fd: CARDINAL; read: BOOLEAN; interval: LONGREAL): > WaitResult > RAISES {Alerted} = > VAR res: INTEGER; > fdindex := fd DIV FDSetSize; > fdset := FDSet{fd MOD FDSetSize}; > ... rest omitted ... > > Note that IOWait calls XIOWait. IOWait is declared as taking an > INTEGER, but XIOWait takes a CARDINAL. > > So I really should use a CARDINAL in passing to IOWait, but since > IOWait is the interface function it's not clear that I should do > that (until my program crashes after passing -1 from some carelessly > wrapped C code). I don't like the fact that I get a range error > *inside* the library when it appears unnecessary---it should have > happened in my code, as I make the call. > > Suggested improvement: declare all the FDs in SchedulerPosix.i3 > (the interface that exports these routines) to be CARDINAL instead > of INTEGER. 
>
> Mika

From jay.krell at cornell.edu Thu Oct 30 22:21:09 2008
From: jay.krell at cornell.edu (Jay)
Date: Thu, 30 Oct 2008 21:21:09 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID:

Please try this:

http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2

std failed to build because stubgen crashed, probably due to gc.
cm3 does crash right away without @M3nogc.

Something like this:
cd /src
wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
cd /cm3
rm -rf *
tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
cd /src/cm3/scripts/python
./do-cm3-all.py realclean
./upgrade.py
./do-cm3-all.py realclean
./do-cm3-std.py buildship
=> it will fail, at zeus, but it should get far; you'll also need some X devel packages to get that far, I had a failure for lack of libXaw for example. I did not run anything, any of the GUI packages, but building itself with itself is a decent test.

I renamed the old AMD64_LINUX archives to "1.0.0".
http://www.opencm3.com/uploaded-archives/

This has the bug fix I committed last night to cm3cg, and therefore a 64 bit hosted cm3cg.

jay at amd64a:/cm3/bin$ file *
AMD64_LINUX: ASCII text
cm3:         ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
cm3.cfg:     ASCII English text
cm3cg:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
m3bundle:    ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
mklib:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
Unix.common: ASCII English text

Built on Debian 4.0r4 (r5 is out).
jay at amd64a:/cm3/bin$ uname -a
Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux
jay at amd64a:/cm3/bin$ dmesg | head
Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Aug 19 04:30:56 UTC 2008

Though really I couldn't do it without Visual C++ on Windows providing excellent find-in-files and editing, nothing else comes close, I edit on Windows and scp the files over. :)

 - Jay

________________________________
From: jay.krell at cornell.edu
To: dragisha at m3w.org; m3devel at elegosoft.com
Date: Tue, 9 Sep 2008 09:43:03 +0000
Subject: Re: [M3devel] AMD64_LINUX status

From hosking at cs.purdue.edu Fri Oct 31 11:19:51 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 31 Oct 2008 10:19:51 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID:

Umm, I think I found your bug with GC:

Check out "RTMachine.PointerAlignment". You have it set to BITSIZE(INTEGER). I suspect what you want is something like BYTESIZE(ADDRESS).
Also, "RTMachine.StackFrameAlignment" should probably be 2*BYTESIZE(ADDRESS). On 30 Oct 2008, at 21:21, Jay wrote: > > Please try this: > > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2 > > std failed to build because stubgen crashed, probably due to gc. > cm3 does crash right away without @M3nogc. > > Something like this: > cd /src > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2 > cd /cm3 > rm -rf * > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX- > d5.7.0.tar.bz2 > cd /src/cm3/scripts/python > ./do-cm3-all.py realclean > ./upgrade.py > ./do-cm3-all.py realclean > ./do-cm3-std.py buildship > => it will fail, at zeus, but it should get far; you'll also need > some X devel packages to get that far, I had a failure for lack of > libXaw for example. I did not run anything, any of the GUI packages, > but building itself with itself is a decent test. > > I renamed the old AMD64_LINUX archives to "1.0.0". > http://www.opencm3.com/uploaded-archives/ > > This has the bug fix I commited last night to cm3cg, and therefore a > 64 bit hosted cm3cg. > > jay at amd64a:/cm3/bin$ file * > AMD64_LINUX: ASCII text > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > for GNU/Linux 2.6.0, not stripped > cm3.cfg: ASCII English text > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Li > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > 2.6.0, not stripped > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Li > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > 2.6.0, not stripped > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > for GNU/Linux 2.6.0, not stripped > Unix.common: ASCII English text > > Built on Debian 4.0r4 (r5 is out). 
> jay at amd64a:/cm3/bin$ uname -a
> Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008
> x86_64 GNU/Linux
> jay at amd64a:/cm3/bin$ dmesg | head
> Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
> Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org
> ) (
> gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP
> Tue Aug 19 04:30:56 UTC 2008
>
> Though really I couldn't do it without Visual C++ on Windows
> providing excellent find-in-files and editing, nothing else comes
> close, I edit on Windows and scp the files over. :)
>
> - Jay
>
> ________________________________
>
> From: jay.krell at cornell.edu
> To: dragisha at m3w.org; m3devel at elegosoft.com
> Date: Tue, 9 Sep 2008 09:43:03 +0000
> Subject: Re: [M3devel] AMD64_LINUX status
>
>
>

From jay.krell at cornell.edu Fri Oct 31 14:52:43 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 13:52:43 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID:

Tony, Excellent, thanks, that helps.
How do you know and confirm the right values? I don't like guessing.

And then cause then of :) : Symbol
Pickling font metrics...Done.
/cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
/cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
stubgen: Processing RemoteView.T

***
*** runtime error:
***    NEW() was unable to allocate more memory.
***    file "../src/runtime/common/RTAllocator.m3", line 285
***

"/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB

--procedure--  -line-  -file---
exec           --
_v_netobj      37      /cm3/pkg/netobj/src/netobj.tmpl
netobjv1       44      /cm3/pkg/netobj/src/netobj.tmpl
netobj         64      /cm3/pkg/netobj/src/netobj.tmpl
include_dir    71      /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
               8       /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args

I should debug it, and double check that I upgraded what had to be upgraded.
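On confirming the values without guessing: one way to sanity-check the numbers, assuming (as Tony suspects) that these RTMachine constants are meant to be in bytes, is to look at the C-side sizes on the target. This is a sketch only; the function names are made up for illustration and nothing here is taken from RTMachine itself.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Hypothetical helpers for illustration only. */

/* Size of a pointer in bytes: the C counterpart of a BYTESIZE-style
   value for an address. */
size_t pointer_bytes(void)
{
    return sizeof(void *);
}

/* Size of a pointer in bits: a BITSIZE-style value, which is CHAR_BIT
   times larger than the byte count. */
size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}

/* Non-zero when addr is a multiple of align (align must be non-zero). */
int is_aligned(size_t addr, size_t align)
{
    return addr % align == 0;
}
```

On a typical AMD64 Linux system the first value is 8 and the second is 64. An alignment given in bits where bytes are expected is therefore eight times too strict: `is_aligned(8, 64)` is false even though address 8 is a multiple of the pointer size, which could plausibly explain pointers being overlooked.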
 - Jay

> From: hosking at cs.purdue.edu
> To: jay.krell at cornell.edu
> Date: Fri, 31 Oct 2008 10:19:51 +0000
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] AMD64_LINUX status
>
> Umm, I think I found your bug with GC:
>
> Check out "RTMachine.PointerAlignment". You have it set to
> BITSIZE(INTEGER). I suspect what you want is something like
> BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should
> probably be 2*BYTESIZE(ADDRESS).
>
>
>
> On 30 Oct 2008, at 21:21, Jay wrote:
>
> >
> > Please try this:
> >
> > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> >
> > std failed to build because stubgen crashed, probably due to gc.
> > cm3 does crash right away without @M3nogc.
> >
> > Something like this:
> > cd /src
> > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > cd /cm3
> > rm -rf *
> > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > cd /src/cm3/scripts/python
> > ./do-cm3-all.py realclean
> > ./upgrade.py
> > ./do-cm3-all.py realclean
> > ./do-cm3-std.py buildship
> > => it will fail, at zeus, but it should get far; you'll also need
> > some X devel packages to get that far, I had a failure for lack of
> > libXaw for example.
I did not run anything, any of the GUI packages, > > but building itself with itself is a decent test.> >> > I renamed the old AMD64_LINUX archives to "1.0.0".> > http://www.opencm3.com/uploaded-archives/> >> > This has the bug fix I commited last night to cm3cg, and therefore a > > 64 bit hosted cm3cg.> >> > jay at amd64a:/cm3/bin$ file *> > AMD64_LINUX: ASCII text> > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > > for GNU/Linux 2.6.0, not stripped> > cm3.cfg: ASCII English text> > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Li> > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > 2.6.0, not stripped> > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Li> > nux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux > > 2.6.0, not stripped> > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), > > for GNU/Linux 2.6.0, not stripped> > Unix.common: ASCII English text> >> > Built on Debian 4.0r4 (r5 is out).> > jay at amd64a:/cm3/bin$ uname -a> > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 > > x86_64 GNU/Linux> > jay at amd64a:/cm3/bin$ dmesg | head> > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)> > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org > > ) (> > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP > > Tue Aug 19 04:30:56 UTC 2008> >> > Though really I couldn't do it without Visual C++ on Windows > > providing excellent find-in-files and editing, nothing else comes > > close, I edit on Windows and scp the files over. 
:)> >> > - Jay> >> > ________________________________> >> > From: jay.krell at cornell.edu> > To: dragisha at m3w.org; m3devel at elegosoft.com> > Date: Tue, 9 Sep 2008 09:43:03 +0000> > Subject: Re: [M3devel] AMD64_LINUX status> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay.krell at cornell.edu Fri Oct 31 15:25:13 2008 From: jay.krell at cornell.edu (Jay) Date: Fri, 31 Oct 2008 14:25:13 +0000 Subject: [M3devel] AMD64_LINUX status In-Reply-To: <1225462205.14482.60.camel@faramir.m3w.org> References: <1220941880.9421.11.camel@faramir.m3w.org> <1225462205.14482.60.camel@faramir.m3w.org> Message-ID: It seems like there's still a problem. I haven't debugged it yet. (I'm sure glad Tony found the other problem before I debugged it.) I updated http://www.opencm3.com/uploaded-archives with Tony's fix. The older builds are now 0.0.0.1 and 0.0.0.2. - Jay> Subject: Re: [M3devel] AMD64_LINUX status> From: dragisha at m3w.org> To: jay.krell at cornell.edu> CC: hosking at cs.purdue.edu; m3devel at elegosoft.com> Date: Fri, 31 Oct 2008 15:10:05 +0100> > So, we now have fully functional AMD64_LINUX (_with_ GC)?> > TIA> > On Fri, 2008-10-31 at 13:52 +0000, Jay wrote:> > Tony, Excellent, thanks, that helps.> > How do you know and confirm the right values? 
> > I don't like guessing.
> >
> > And then cause then of :) :
> >
> > Symbol
> > Pickling font metrics...
> > Done.
> > /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
> > /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
> > stubgen: Processing RemoteView.T
> >
> > ***
> > *** runtime error:
> > *** NEW() was unable to allocate more memory.
> > *** file "../src/runtime/common/RTAllocator.m3", line 285
> > ***
> > "/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
> > --procedure--  -line-  -file---
> > exec               --
> > _v_netobj          37  /cm3/pkg/netobj/src/netobj.tmpl
> > netobjv1           44  /cm3/pkg/netobj/src/netobj.tmpl
> > netobj             64  /cm3/pkg/netobj/src/netobj.tmpl
> > include_dir        71  /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
> >                     8  /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args
> >
> > I should debug it, and double check that I upgraded what had to be
> > upgraded.
> >
> > - Jay
> >
> >
> >
> > > From: hosking at cs.purdue.edu
> > > To: jay.krell at cornell.edu
> > > Date: Fri, 31 Oct 2008 10:19:51 +0000
> > > CC: m3devel at elegosoft.com
> > > Subject: Re: [M3devel] AMD64_LINUX status
> > >
> > > Umm, I think I found your bug with GC:
> > >
> > > Check out "RTMachine.PointerAlignment". You have it set to
> > > BITSIZE(INTEGER). I suspect what you want is something like
> > > BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should
> > > probably be 2*BYTESIZE(ADDRESS).
> > >
> > >
> > >
> > > On 30 Oct 2008, at 21:21, Jay wrote:
> > >
> > > >
> > > > Please try this:
> > > >
> > > > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > >
> > > > std failed to build because stubgen crashed, probably due to gc.
> > > > cm3 does crash right away without @M3nogc.
> > > >
> > > > Something like this:
> > > > cd /src
> > > > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > > cd /cm3
> > > > rm -rf *
> > > > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > > cd /src/cm3/scripts/python
> > > > ./do-cm3-all.py realclean
> > > > ./upgrade.py
> > > > ./do-cm3-all.py realclean
> > > > ./do-cm3-std.py buildship
> > > > => it will fail, at zeus, but it should get far; you'll also need
> > > > some X devel packages to get that far, I had a failure for lack of
> > > > libXaw for example. I did not run anything, any of the GUI packages,
> > > > but building itself with itself is a decent test.
> > > >
> > > > I renamed the old AMD64_LINUX archives to "1.0.0".
> > > > http://www.opencm3.com/uploaded-archives/
> > > >
> > > > This has the bug fix I committed last night to cm3cg, and therefore a
> > > > 64 bit hosted cm3cg.
> > > >
> > > > jay at amd64a:/cm3/bin$ file *
> > > > AMD64_LINUX: ASCII text
> > > > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> > > > cm3.cfg: ASCII English text
> > > > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> > > > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> > > > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> > > > Unix.common: ASCII English text
> > > >
> > > > Built on Debian 4.0r4 (r5 is out).
> > > > jay at amd64a:/cm3/bin$ uname -a
> > > > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux
> > > > jay at amd64a:/cm3/bin$ dmesg | head
> > > > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
> > > > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Aug 19 04:30:56 UTC 2008
> > > >
> > > > Though really I couldn't do it without Visual C++ on Windows
> > > > providing excellent find-in-files and editing, nothing else comes
> > > > close, I edit on Windows and scp the files over. :)
> > > >
> > > > - Jay
> > > >
> > > > ________________________________
> > > >
> > > > From: jay.krell at cornell.edu
> > > > To: dragisha at m3w.org; m3devel at elegosoft.com
> > > > Date: Tue, 9 Sep 2008 09:43:03 +0000
> > > > Subject: Re: [M3devel] AMD64_LINUX status
> > > >
> > > >
> > > >
> > >
> --
> Dragiša Durić
>

From dragisha at m3w.org Fri Oct 31 15:10:05 2008
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 31 Oct 2008 15:10:05 +0100
Subject: [M3devel] AMD64_LINUX status
In-Reply-To:
References: <1220941880.9421.11.camel@faramir.m3w.org>
Message-ID: <1225462205.14482.60.camel@faramir.m3w.org>

So, we now have fully functional AMD64_LINUX (_with_ GC)?

TIA

On Fri, 2008-10-31 at 13:52 +0000, Jay wrote:
> Tony, Excellent, thanks, that helps.
> How do you know and confirm the right values? I don't like guessing.
>
> And then cause then of :) :
>
> Symbol
> Pickling font metrics...
> Done.
> /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
> /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
> stubgen: Processing RemoteView.T
>
> ***
> *** runtime error:
> *** NEW() was unable to allocate more memory.
> *** file "../src/runtime/common/RTAllocator.m3", line 285
> ***
> "/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
> --procedure--  -line-  -file---
> exec               --
> _v_netobj          37  /cm3/pkg/netobj/src/netobj.tmpl
> netobjv1           44  /cm3/pkg/netobj/src/netobj.tmpl
> netobj             64  /cm3/pkg/netobj/src/netobj.tmpl
> include_dir        71  /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
>                     8  /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args
>
> I should debug it, and double check that I upgraded what had to be
> upgraded.
>
> - Jay
>
>
>
> > From: hosking at cs.purdue.edu
> > To: jay.krell at cornell.edu
> > Date: Fri, 31 Oct 2008 10:19:51 +0000
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] AMD64_LINUX status
> >
> > Umm, I think I found your bug with GC:
> >
> > Check out "RTMachine.PointerAlignment". You have it set to
> > BITSIZE(INTEGER). I suspect what you want is something like
> > BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should
> > probably be 2*BYTESIZE(ADDRESS).
> >
> >
> >
> > On 30 Oct 2008, at 21:21, Jay wrote:
> >
> > >
> > > Please try this:
> > >
> > > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > >
> > > std failed to build because stubgen crashed, probably due to gc.
> > > cm3 does crash right away without @M3nogc.
> > >
> > > Something like this:
> > > cd /src
> > > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > cd /cm3
> > > rm -rf *
> > > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > cd /src/cm3/scripts/python
> > > ./do-cm3-all.py realclean
> > > ./upgrade.py
> > > ./do-cm3-all.py realclean
> > > ./do-cm3-std.py buildship
> > > => it will fail, at zeus, but it should get far; you'll also need
> > > some X devel packages to get that far, I had a failure for lack of
> > > libXaw for example. I did not run anything, any of the GUI packages,
> > > but building itself with itself is a decent test.
> > >
> > > I renamed the old AMD64_LINUX archives to "1.0.0".
> > > http://www.opencm3.com/uploaded-archives/
> > >
> > > This has the bug fix I committed last night to cm3cg, and therefore a
> > > 64 bit hosted cm3cg.
> > >
> > > jay at amd64a:/cm3/bin$ file *
> > > AMD64_LINUX: ASCII text
> > > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> > > cm3.cfg: ASCII English text
> > > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> > > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> > > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> > > Unix.common: ASCII English text
> > >
> > > Built on Debian 4.0r4 (r5 is out).
> > > jay at amd64a:/cm3/bin$ uname -a
> > > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux
> > > jay at amd64a:/cm3/bin$ dmesg | head
> > > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
> > > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Aug 19 04:30:56 UTC 2008
> > >
> > > Though really I couldn't do it without Visual C++ on Windows
> > > providing excellent find-in-files and editing, nothing else comes
> > > close, I edit on Windows and scp the files over. :)
> > >
> > > - Jay
> > >
> > > ________________________________
> > >
> > > From: jay.krell at cornell.edu
> > > To: dragisha at m3w.org; m3devel at elegosoft.com
> > > Date: Tue, 9 Sep 2008 09:43:03 +0000
> > > Subject: Re: [M3devel] AMD64_LINUX status
> > >
> > >
> > >
> >
--
Dragiša Durić
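
Tony's suggestion in this thread comes down to units: on a 64-bit target BITSIZE(INTEGER) is 64, while BYTESIZE(ADDRESS) is 8. The Python sketch below is only a toy model. It assumes (the thread does not say this) that the collector uses RTMachine.PointerAlignment as a byte stride when it looks for candidate pointers in a thread stack. The function name, the heap bounds and the sample stack are all made up for illustration and are not taken from the CM3 runtime.

```python
"""Toy model of a conservative stack scan with a configurable pointer stride."""

WORD_BYTES = 8          # BYTESIZE(ADDRESS) on a 64-bit target
BITS_PER_INTEGER = 64   # BITSIZE(INTEGER) on a 64-bit target


def find_candidate_pointers(stack_words, heap_lo, heap_hi, pointer_alignment):
    """Return the stack words that look like heap addresses.

    stack_words is a list of 8-byte words (hypothetical stack contents).
    pointer_alignment is the stride, in bytes, between the locations the
    scan is willing to inspect; it must be a multiple of WORD_BYTES here.
    """
    found = []
    region_bytes = len(stack_words) * WORD_BYTES
    for offset in range(0, region_bytes, pointer_alignment):
        value = stack_words[offset // WORD_BYTES]
        if heap_lo <= value < heap_hi:
            found.append(value)
    return found


if __name__ == "__main__":
    heap_lo, heap_hi = 0x1000, 0x2000
    # Sixteen stack words, every one of them a valid (made-up) heap address.
    stack = [heap_lo + 16 * i for i in range(16)]
    good = find_candidate_pointers(stack, heap_lo, heap_hi, WORD_BYTES)
    bad = find_candidate_pointers(stack, heap_lo, heap_hi, BITS_PER_INTEGER)
    print(len(good), "of", len(stack), "pointers seen with an 8-byte stride")
    print(len(bad), "of", len(stack), "pointers seen with a 64-byte stride")
```

With this sample stack the 8-byte stride sees all 16 words and the 64-byte stride sees only 2. If the real collector scans this way, it could free objects that are still referenced, which would fit the crashes seen without @M3nogc, but the thread does not confirm the exact mechanism.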