From jay.krell at cornell.edu  Wed Oct  1 01:24:14 2008
From: jay.krell at cornell.edu (Jay)
Date: Tue, 30 Sep 2008 23:24:14 +0000
Subject: [M3devel] ARM Darwin
In-Reply-To: <7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org>
References: <F29CC4D9-0043-48B9-84F1-93E9F3336D40@darko.org>
	<5ED8E753-6B9E-4FED-8689-1D3D317A5A36@cs.purdue.edu> 
	<7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org>
Message-ID: <COL101-W3460EC073E17115925F24CE6430@phx.gbl>


Get me a machine and I'll work on it. :)
I'll get one before long, but I'm bogged down with existing x86, AMD64, PPC, PPC64 (AIX), and MIPS (IRIX) hardware that isn't yet being used for all it's meant for.

I suspect Apple hasn't pushed their changes up, so be sure to poke around their gcc source.

> Apple are building their own ARM GCC and use that to configure the
> back end. Then the runtime issues which I imagine might be with the GC

gcc -v ?

> and threading. I'm not sure there will be any native threading and I'm
> sure VM will look very different.

I assume it'll look like almost any POSIX or *_DARWIN system, or the 32-bit variant thereof.
I assume it has pthreads.

 - Jay


> From: darko at darko.org
> To: hosking at cs.purdue.edu
> Date: Tue, 30 Sep 2008 14:59:39 +0200
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] ARM Darwin
>
> Thanks, it should be a bit easier than the normal process since the
> compiler doesn't have to be fully bootstrapped, I just have to get a
> cross working. I know the first thing is to get the machine
> configuration correct, which I'll start when I get my hands on one of
> the machines in a couple of days. The other thing is to work out how
> Apple are building their own ARM GCC and use that to configure the
> back end. Then the runtime issues which I imagine might be with the GC
> and threading. I'm not sure there will be any native threading and I'm
> sure VM will look very different.
>
>
> On 30/09/2008, at 2:44 PM, Tony Hosking wrote:
>
>> I can share tips...
>>
>> On Sep 30, 2008, at 1:41 PM, Darko wrote:
>>
>>> Is anyone interested in working on an ARM port for Darwin? Or maybe
>>> just providing some tips as I give it a try?
>>>
>>> Cheers,
>>> Darko.
>>
>


From jay.krell at cornell.edu  Wed Oct  1 08:41:03 2008
From: jay.krell at cornell.edu (Jay)
Date: Wed, 1 Oct 2008 06:41:03 +0000
Subject: [M3devel] AMD-64 binaries?
In-Reply-To: <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
References: <48BDF24B.900@wichita.edu>
	<20080903075804.zhep2ichmow00scg@mail.elegosoft.com>
	<COL101-W839FDBE447569C4D9BACC6E6430@phx.gbl> 
	<30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
Message-ID: <COL101-W281BD9E78E32E04348F400E6420@phx.gbl>


No -- you would know best about AMD64_DARWIN.
I'm sure ALPHA_OSF used to work, but it's been so long, I don't think it counts.
 
I'm being lazy.
 
file AMD64_DARWIN/cm3cg
 => fat binary? I doubt it. 
 => with ppc, i386, amd64? (doubt it) 
 => or just ppc, i386?  (doubt it) 
 => or just i386? This is what I "suspect".
 => or just AMD64. This would be somewhat interesting. 
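For the question above, `file` decides by reading the Mach-O header. Below is a minimal Python sketch of the same check. It assumes only the magic numbers and CPU-type constants from Apple's `<mach-o/loader.h>` and `<mach-o/fat.h>`. `describe_macho` is a hypothetical name, and only a few CPU types are listed.

```python
import struct
import sys

# Mach-O and fat-binary magic numbers, as seen when the first word is read big-endian
# (constants from <mach-o/loader.h> and <mach-o/fat.h>).
MH_MAGIC, MH_CIGAM = 0xFEEDFACE, 0xCEFAEDFE        # 32-bit: big-endian file / little-endian file
MH_MAGIC_64, MH_CIGAM_64 = 0xFEEDFACF, 0xCFFAEDFE  # 64-bit: big-endian file / little-endian file
FAT_MAGIC = 0xCAFEBABE                             # fat headers are always big-endian

CPU_NAMES = {7: "i386", 0x01000007: "x86_64", 18: "ppc", 0x01000012: "ppc64", 12: "arm"}


def describe_macho(header: bytes) -> str:
    """Classify the first bytes of a file roughly the way `file` does for Mach-O."""
    if len(header) < 8:
        return "not Mach-O"
    (magic,) = struct.unpack(">I", header[:4])
    if magic == FAT_MAGIC:
        (nfat,) = struct.unpack(">I", header[4:8])
        archs = []
        for i in range(nfat):
            offset = 8 + 20 * i  # each fat_arch is five big-endian uint32 fields
            (cputype,) = struct.unpack(">I", header[offset:offset + 4])
            archs.append(CPU_NAMES.get(cputype, hex(cputype)))
        return "fat: " + ", ".join(archs)
    if magic in (MH_MAGIC, MH_MAGIC_64):
        endian = ">"
    elif magic in (MH_CIGAM, MH_CIGAM_64):
        endian = "<"  # little-endian file, e.g. an i386 or x86_64 binary
    else:
        return "not Mach-O"
    bits = 64 if magic in (MH_MAGIC_64, MH_CIGAM_64) else 32
    (cputype,) = struct.unpack(endian + "I", header[4:8])
    return "%d-bit %s" % (bits, CPU_NAMES.get(cputype, hex(cputype)))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:  # e.g. AMD64_DARWIN/cm3cg
        print(describe_macho(f.read(4096)))
```

For example, running the script on `AMD64_DARWIN/cm3cg` should print "64-bit x86_64" if the binary is a 64-bit Mach-O, as Tony's reply later in this thread says it is.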
 
I'm pretty sure cm3cg is always 32bit "these days".
I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in the same way.
This was a nice thing to find, that the problem is portable to multiple, possibly all, 64-bit hosts.
 
I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same problem.
 
Anyway, I should really get to debugging this soon.
 
It's a bit odd because gcc itself doesn't have this bug and I reviewed a lot of the code and it was ok. I'm just going to have to step through it in parallel on 32bit and 64bit hosts and find where they diverge. A LOT was identical, like the files output by cm3 into cm3cg were identical.
I was close a few months ago but slacked off.
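The parallel 32-bit/64-bit comparison described above comes down to finding the first point where two traces differ. Below is a minimal sketch. It assumes each host's run of cm3cg has been captured to a plain-text trace, for example debugger or dump output. The trace files and `first_divergence` are hypothetical.

```python
import itertools
import sys


def first_divergence(lines_a, lines_b):
    """Return (index, line_a, line_b) for the first position where the two
    sequences differ, or None if they are identical. A missing line is None."""
    for i, (a, b) in enumerate(itertools.zip_longest(lines_a, lines_b)):
        if a != b:
            return i, a, b
    return None


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: compare traces captured on a 32-bit and a 64-bit host.
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        result = first_divergence(fa.read().splitlines(), fb.read().splitlines())
    if result is None:
        print("identical")
    else:
        line_no, a, b = result
        print("first difference at line %d:\n  32-bit: %r\n  64-bit: %r" % (line_no + 1, a, b))
```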
 
 - Jay

> From: hosking at cs.purdue.edu
> To: jay.krell at cornell.edu
> Date: Tue, 30 Sep 2008 10:16:41 +0100
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] AMD-64 binaries?
>
> 64-bit hosted tools? Do you mean only for Linux? I don't quite
> understand what you are saying.
>
> On Sep 30, 2008, at 9:36 AM, Jay wrote:
>
> >
> > I'm getting back to this now.
> > I didn't realize it till this weekend, but that archive is
> > "relatively incompatible".
> > In particular it has 32bit hosted tools, and won't run on Debian
> > 4.0r4 / AMD64.
> > Something about glibc 2.4, when all I see on my system is 2.3.
> > I'll see what I can do.
> > Probably just rebuild cm3cg.
> > I think it was built on Fedora, but could have been Ubuntu or
> > OpenSuse.
> > Probably just that Debian stable lags the others.
> >
> > The main problem to debug is why 64bit hosted tools "never" work.
> > (Right?)
> >
> >
> > Stay tuned for a bunch more ports "soon", I've got a bunch more
> > hardware,
> > that runs Linux and others (Solaris, AIX, Irix).. :)
> >
> > I'll be able to debug the high dpi gui problems on a friend's laptop
> > soon too.
> > Send me a repro. I expect it is trivial -- like anything with a
> > scrollbar.
> > I can try formsedit, etc.
> >
> >
> > - Jay
> >
> >
> >> Date: Wed, 3 Sep 2008 07:58:04 +0200
> >> From: wagner at elegosoft.com
> >> To: m3devel at elegosoft.com
> >> Subject: Re: [M3devel] AMD-64 binaries?
> >>
> >> Quoting "Rodney M. Bates" :
> >>
> >>> Are there binaries for AMD-64 around that can be used
> >>> to bootstrap a 64-bit Linux compiler?
> >>
> >> Have a look at
> >>
> >> http://www.opencm3.net/uploaded-archives/index.html
> >>
> >> There are some AMD64 archives; I don't know about their status
> >> offhand, though. I think Jay Krell produced them.
> >> AFAIK there is no regular build on this platform yet.
> >>
> >> Olaf
> >> --
> >> Olaf Wagner -- elego Software Solutions GmbH
> >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
> >> 45 86 95
> >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
> >> Berlin
> >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
> >> DE163214194
> >>

From jay.krell at cornell.edu  Wed Oct  1 09:02:29 2008
From: jay.krell at cornell.edu (Jay)
Date: Wed, 1 Oct 2008 07:02:29 +0000
Subject: [M3devel] m3cc build fails on older MacOS X
In-Reply-To: <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com>
	<COL101-W243B2B91162A39B280C4AFE6430@phx.gbl>
	<CEDFF837-1CFA-4C43-B287-D480AE19B889@cs.purdue.edu> 
	<5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
Message-ID: <COL101-W62EB6B61A3107DDC347580E6420@phx.gbl>


well, I agree and disagree.

"Almost everyone" only cares about C++, C#, Windows, and a little bit of Linux and Java.
"Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc.

Supporting 10.2 and 10.3 "ought not" be so difficult, but ok.
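This relates to the Darwin-version question raised in the quoted messages below. The `uname -a` output there shows Darwin 7.9.0. Darwin kernel major versions 6 through 9 correspond to Mac OS X 10.2 through 10.5. Below is a small sketch of that mapping. `macosx_release` is a hypothetical helper, and it deliberately covers only those four releases.

```python
def macosx_release(darwin_version: str) -> str:
    """Map a Darwin kernel version string (as in `uname -r`) to the Mac OS X
    release it belongs to. Only Darwin 6-9 (10.2-10.5) are covered."""
    major = int(darwin_version.split(".")[0])
    if 6 <= major <= 9:
        return "10.%d" % (major - 4)
    raise ValueError("Darwin version not covered by this table: " + darwin_version)


print(macosx_release("7.9.0"))  # the kernel in the quoted uname output -> 10.3
```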

I wiped out the install and won't likely come back to it until
a bunch of other things are done.
e.g.:
 debug 64 bit hosted cm3cg 
 move PPC_LINUX to pthreads 
 high dpi 
 bring up, or bring back up, a bunch of targets I have hardware for,
  and some others I don't have yet.

Adding back support for NT4/Win9x is probably not hard, though,
 similar to gcc on Mac, the current Microsoft tools no longer
 target them.

It all gets easier with virtualization..
(Which is easiest on x86/amd64.)

 - Jay
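The quoted 10.3 message below suggests a wrapper around `as` that strips the `.machine` directive. Below is a sketch of one. It makes three assumptions: Python 3 is available, the real assembler lives at `/usr/bin/as`, and gcc hands the assembler a `.s` file rather than piping through stdin (`-pipe` is not handled). On a real 10.3 box a short shell script may be more practical.

```python
import os
import subprocess
import sys
import tempfile

REAL_AS = "/usr/bin/as"  # assumption: location of the system assembler on 10.3


def strip_machine(asm: str) -> str:
    """Drop `.machine` directives, which the 10.3 assembler rejects."""
    return "".join(
        line for line in asm.splitlines(keepends=True)
        if not line.lstrip().startswith(".machine")
    )


def main(argv):
    new_args, temps = [], []
    for arg in argv[1:]:
        if arg.endswith(".s") and os.path.isfile(arg):
            # Copy the assembly input to a temp file without .machine lines.
            with open(arg) as f:
                cleaned = strip_machine(f.read())
            fd, path = tempfile.mkstemp(suffix=".s")
            with os.fdopen(fd, "w") as out:
                out.write(cleaned)
            temps.append(path)
            new_args.append(path)
        else:
            new_args.append(arg)  # flags such as -arch and -o pass through unchanged
    try:
        return subprocess.call([REAL_AS] + new_args)
    finally:
        for path in temps:
            os.remove(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```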


> From: darko at darko.org
> To: hosking at cs.purdue.edu
> Date: Tue, 30 Sep 2008 11:50:43 +0200
> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
> Subject: Re: [M3devel] m3cc build fails on older MacOS X
>
> I think supporting the latest version is enough work. I don't see the
> point of supporting older releases. Also, this seems to be relevant to
> development on that version of the system. Anyone who wants to build
> can upgrade.
>
>
> On 30/09/2008, at 11:15 AM, Tony Hosking wrote:
>
>> Does anyone really care about 10.3 now? As I recall, it had some
>> pretty broken assumptions.
>>
>> On Sep 30, 2008, at 9:25 AM, Jay wrote:
>>
>>>
>>> I have a machine running 10.3 now.
>>>
>>> gcc-4.3.2 (the current release) won't (toplevel) configure on
>>> MacOSX 10.3 apparently because its assembler doesn't support
>>> ".machine".
>>> Current "cctools" won't compile on 10.3 without patches or other
>>> updates, due to mucking with ppc64 stuff, though that is easy to fix.
>>>
>>> A simple wrapper around as for use on 10.3 that strips the .machine
>>> directive is probably reasonable, or a patch to gcc to just not
>>> emit it for Darwin, except maybe for non-ppc, or subject to a switch.
>>>
>>> Other than support for more architectures, I never found any of the
>>> updates beyond 10.2 very interesting.
>>> Though current Firefox and Safari also won't run on 10.3.
>>>
>>> IF I get this working, maybe I'll bring 10.2 back up also..
>>>
>>> - Jay
>>>
>>> ________________________________
>>>
>>> From: jayk123 at hotmail.com
>>> To: wagner at elegosoft.com; m3devel at elegosoft.com
>>> Subject: RE: [M3devel] m3cc build fails on older MacOS X
>>> Date: Tue, 6 May 2008 10:49:11 +0000
>>>
>>>
>>>
>>>
>>> I don't know what these Darwin versions are.
>>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5?
>>> I used to run 10.2 and could perhaps bring it back (though I'd hate
>>> to lose my PPC_LINUX install.. :( )
>>>
>>>> make[2]: Nothing to be done for `all'.
>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>> `patsubst'. Stop.
>>>
>>> Hopefully that's enough context though.
>>>
>>> The rest is a cascade.
>>> What happens if you remove all my m3makefile weirdness (which works
>>> everywhere else..) and just configure and make?
>>>
>>> Can I ssh into this?
>>>
>>> - Jay
>>>
>>>
>>>
>>> ________________________________
>>>
>>>
>>>> Date: Tue, 6 May 2008 07:57:54 +0200
>>>> From: wagner at elegosoft.com
>>>> To: m3devel at elegosoft.com
>>>> Subject: [M3devel] m3cc build fails on older MacOS X
>>>>
>>>> On % uname -a
>>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30
>>>> 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power
>>>> Macintosh powerpc:
>>>>
>>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o
>>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o
>>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./fnmatch.o
>>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./getruntime.o
>>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o
>>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./obstack.o
>>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o
>>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o
>>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o
>>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o
>>>> ./xstrndup.o> required-list
>>>> make[2]: Nothing to be done for `all'.
>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>> `patsubst'. Stop.
>>>> make: *** [all-libcpp] Error 2
>>>> /bin/sh: line 1: cd: gcc: No such file or directory
>>>> make: *** No rule to make target `s-modes'. Stop.
>>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314: quake
>>>> runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2
>>>>
>>>> --procedure-- -line- -file---
>>>> cp_if --
>>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>> 9
>>>> /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args
>>>>
>>>> Fatal Error: package build failed
>>>> ==> m3-sys/m3cc done
>>>>
>>>> Any ideas?
>>>>
>>>> Olaf
>>>> --
>>>> Olaf Wagner -- elego Software Solutions GmbH
>>>> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
>>>> 45 86 95
>>>> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
>>>> Berlin
>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
>>>> DE163214194
>>>>
>>>
>>
>


From darko at darko.org  Wed Oct  1 09:10:35 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 09:10:35 +0200
Subject: [M3devel] m3cc build fails on older MacOS X
In-Reply-To: <COL101-W62EB6B61A3107DDC347580E6420@phx.gbl>
References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com>
	<COL101-W243B2B91162A39B280C4AFE6430@phx.gbl>
	<CEDFF837-1CFA-4C43-B287-D480AE19B889@cs.purdue.edu>
	<5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
	<COL101-W62EB6B61A3107DDC347580E6420@phx.gbl>
Message-ID: <973F196C-4B4A-4526-878C-93942E48E72A@darko.org>

Why bother with it if no one uses it and no-one is going to use it?  
Supporting M3 on Macs is good because people will use it into the  
future. People aren't moving back to 10.3. I wouldn't bother with it  
at all.

On 01/10/2008, at 9:02 AM, Jay wrote:

>
> well, I agree and disagree.
>
> "Almost everyone" only cares about C++, C#, Windows, and a little  
> bit of Linux and Java.
> "Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc.
>
> Supporting 10.2 and 10.3 "ought not" be so difficult, but ok.
>
> I wiped out the install and won't likely come back to it until
> a bunch of other things are done.
> e.g.:
> debug 64 bit hosted cm3cg
> move PPC_LINUX to pthreads
> high dpi
> bring up, or bring back up, a bunch of targets I have hardware for,
>  and some others I don't have yet.
>
> Adding back support for NT4/Win9x is probably not hard, though,
> similar to gcc on Mac, the current Microsoft tools no longer
> target them.
>
> It all gets easier with virtualization..
> (Which is easiest on x86/amd64.)
>
> - Jay
>
>
>
>> From: darko at darko.org
>> To: hosking at cs.purdue.edu
>> Date: Tue, 30 Sep 2008 11:50:43 +0200
>> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
>> Subject: Re: [M3devel] m3cc build fails on older MacOS X
>>
>> I think supporting the latest version is enough work. I don't see the
>> point of supporting older releases. Also, this seems to be relevant to
>> development on that version of the system. Anyone who wants to build
>> can upgrade.
>>
>>
>> On 30/09/2008, at 11:15 AM, Tony Hosking wrote:
>>
>>> Does anyone really care about 10.3 now? As I recall, it had some
>>> pretty broken assumptions.
>>>
>>> On Sep 30, 2008, at 9:25 AM, Jay wrote:
>>>
>>>>
>>>> I have a machine running 10.3 now.
>>>>
>>>> gcc-4.3.2 (the current release) won't (toplevel) configure on
>>>> MacOSX 10.3 apparently because its assembler doesn't support
>>>> ".machine".
>>>> Current "cctools" won't compile on 10.3 without patches or other
>>>> updates, due to mucking with ppc64 stuff, though that is easy to fix.
>>>>
>>>> A simple wrapper around as for use on 10.3 that strips the .machine
>>>> directive is probably reasonable, or a patch to gcc to just not
>>>> emit it for Darwin, except maybe for non-ppc, or subject to a switch.
>>>>
>>>> Other than support for more architectures, I never found any of the
>>>> updates beyond 10.2 very interesting.
>>>> Though current Firefox and Safari also won't run on 10.3.
>>>>
>>>> IF I get this working, maybe I'll bring 10.2 back up also..
>>>>
>>>> - Jay
>>>>
>>>> ________________________________
>>>>
>>>> From: jayk123 at hotmail.com
>>>> To: wagner at elegosoft.com; m3devel at elegosoft.com
>>>> Subject: RE: [M3devel] m3cc build fails on older MacOS X
>>>> Date: Tue, 6 May 2008 10:49:11 +0000
>>>>
>>>>
>>>>
>>>>
>>>> I don't know what these Darwin versions are.
>>>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5?
>>>> I used to run 10.2 and could perhaps bring it back (though I'd hate
>>>> to lose my PPC_LINUX install.. :( )
>>>>
>>>>> make[2]: Nothing to be done for `all'.
>>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>>> `patsubst'. Stop.
>>>>
>>>> Hopefully that's enough context though.
>>>>
>>>> The rest is a cascade.
>>>> What happens if you remove all my m3makefile weirdness (which works
>>>> everywhere else..) and just configure and make?
>>>>
>>>> Can I ssh into this?
>>>>
>>>> - Jay
>>>>
>>>>
>>>>
>>>> ________________________________
>>>>
>>>>
>>>>> Date: Tue, 6 May 2008 07:57:54 +0200
>>>>> From: wagner at elegosoft.com
>>>>> To: m3devel at elegosoft.com
>>>>> Subject: [M3devel] m3cc build fails on older MacOS X
>>>>>
>>>>> On % uname -a
>>>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30
>>>>> 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power
>>>>> Macintosh powerpc:
>>>>>
>>>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o
>>>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o
>>>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./fnmatch.o
>>>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./getruntime.o
>>>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o
>>>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./obstack.o
>>>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o
>>>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o
>>>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o
>>>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o
>>>>> ./xstrndup.o> required-list
>>>>> make[2]: Nothing to be done for `all'.
>>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>>> `patsubst'. Stop.
>>>>> make: *** [all-libcpp] Error 2
>>>>> /bin/sh: line 1: cd: gcc: No such file or directory
>>>>> make: *** No rule to make target `s-modes'. Stop.
>>>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314: quake
>>>>> runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2
>>>>>
>>>>> --procedure-- -line- -file---
>>>>> cp_if --
>>>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>>> 9
>>>>> /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args
>>>>>
>>>>> Fatal Error: package build failed
>>>>> ==> m3-sys/m3cc done
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Olaf
>>>>> --
>>>>> Olaf Wagner -- elego Software Solutions GmbH
>>>>> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
>>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
>>>>> 45 86 95
>>>>> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
>>>>> Berlin
>>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
>>>>> DE163214194
>>>>>
>>>>
>>>
>>


From darko at darko.org  Wed Oct  1 12:03:15 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 12:03:15 +0200
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: <DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
	<DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
Message-ID: <B971C9C9-251C-4F79-A12F-622F47883781@darko.org>

I've extended one of the modules with a function that formats any  
allocated value for printing. If you're interested I can clean them up  
a little and post them.
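Darko's formatter itself is written in Modula-3 and isn't shown in the thread. The Python analogue below only illustrates the kind of output Mika asks for in the quoted message below: the type name, field names, and values, with references not expanded recursively. It uses `dataclasses` introspection in place of Modula-3 runtime type information. `shallow_format` and `Node` are hypothetical names.

```python
from dataclasses import dataclass, fields, is_dataclass


def shallow_format(obj) -> str:
    """Format a record as TypeName{field=value, ...}. Nested records are shown
    only by type name and identity, not expanded recursively."""
    if not is_dataclass(obj) or isinstance(obj, type):
        return repr(obj)
    parts = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        if is_dataclass(value) and not isinstance(value, type):
            # A reference to another record: print its type, don't follow it.
            parts.append("%s=<%s @ %#x>" % (f.name, type(value).__name__, id(value)))
        else:
            parts.append("%s=%r" % (f.name, value))
    return type(obj).__name__ + "{" + ", ".join(parts) + "}"


@dataclass
class Node:  # hypothetical example record
    name: str
    next: object = None


print(shallow_format(Node("a", Node("b"))))
```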


On 28/09/2008, at 8:01 AM, Darko wrote:

> As far as I know, yes, they're not in the binary. I'd love to be  
> proven wrong though, or fix it so they did. I have a module that  
> reads the .M3WEB file and maps it to types and a module that will  
> read and write any field within a type safely using a numeric index.  
> Neither is perfect. You can integrate the two to get what you want  
> but I seem to remember having some problems mapping type ids (UIDs?)  
> to typecodes at runtime.
>
>
> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>
>> Right, I am aware of those interfaces.. just wondering what was
>> out there.  Do I really need to look at .M3WEB?  I thought
>> that m3gdb could figure out things without anything outside
>> of the binary...
>>
>> I'm looking for essentially what m3gdb offers, say prints
>> at minimum the name of the type (this I recall is trivial with
>> some of the RT* interfaces) but hopefully also with field names
>> and values, but doesn't expand references recursively.. something
>> like that?
>>
>>   Mika
>>
>> Darko writes:
>>> You can use RTType to read the fields and values within a type. If you
>>> also want the type and field names you can interpret the .M3WEB file.
>>> I have a couple of modules that do something like that but they are
>>> not what you would call finished. What level of detail are you after?
>>>
>>>
>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>
>>>> Hello Modula-3 people,
>>>>
>>>> I am working on writing an interpreter that I'd like to embed in
>>>> various Modula-3 programs.  It so happens that this interpreter
>>>> might from time to time be manipulating arbitrary M3 REFs, and just
>>>> from the point of view of providing information to a human user,
>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>> have any code that accomplishes this, at least partly?  I'm thinking
>>>> that since m3gdb can do it, the information must all be in the
>>>> binary---somehow.  (Even enumeration names, right?)  And since the
>>>> pickler can pickle things... hmm.
>>>>
>>>> I would greatly appreciate any guidance that's out there...
>>>>
>>>>  Best regards,
>>>>     Mika Nystrom
>


From hosking at cs.purdue.edu  Wed Oct  1 11:59:23 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 10:59:23 +0100
Subject: [M3devel] AMD-64 binaries?
In-Reply-To: <COL101-W281BD9E78E32E04348F400E6420@phx.gbl>
References: <48BDF24B.900@wichita.edu>
	<20080903075804.zhep2ichmow00scg@mail.elegosoft.com>
	<COL101-W839FDBE447569C4D9BACC6E6430@phx.gbl>
	<30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
	<COL101-W281BD9E78E32E04348F400E6420@phx.gbl>
Message-ID: <26766FFA-C3B6-45D2-8156-80FD14922882@cs.purdue.edu>

I can definitely vouch for ALPHA_OSF having worked as recently as two  
years ago, but without the pthreads native threading system.  That  
port should have been easy enough I suspect.

On Oct 1, 2008, at 7:41 AM, Jay wrote:

> No -- you would know best about AMD64_DARWIN.
> I'm sure ALPHA_OSF used to work, but it's been so long, I don't  
> think it counts.
>
> I'm being lazy.
>
> file AMD64_DARWIN/cm3cg
>  => fat binary? I doubt it.
>  => with ppc, i386, amd64? (doubt it)
>  => or just ppc, i386?  (doubt it)
>  => or just i386? This is what I "suspect".
>  => or just AMD64. This would be somewhat interesting.

I believe that is how I configured it.

> I'm pretty sure cm3cg is always 32bit "these days".

Nope, cm3cg on AMD64_DARWIN is 64-bit.

> I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in  
> the same way.
> This was a nice thing to find, that the problem is portable to  
> multiple, possibly all, 64-bit hosts.
>
> I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same  
> problem.

Don't think so.

> Anyway, I should really get to debugging this soon.
>
> It's a bit odd because gcc itself doesn't have this bug and I  
> reviewed a lot of the code and it was ok. I'm just going to have to  
> step through it in parallel on 32bit and 64bit hosts and find where  
> they diverge. A LOT was identical, like the files output by cm3 into  
> cm3cg were identical.

Yes, the intermediate code should be identical.  Any such problems  
would be with cm3cg.

> I was close a few months ago but slacked off.

Good luck.

>
>
>  - Jay
>
>
> > From: hosking at cs.purdue.edu
> > To: jay.krell at cornell.edu
> > Date: Tue, 30 Sep 2008 10:16:41 +0100
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] AMD-64 binaries?
> >
> > 64-bit hosted tools? Do you mean only for Linux? I don't quite
> > understand what you are saying.
> >
> > On Sep 30, 2008, at 9:36 AM, Jay wrote:
> >
> > >
> > > I'm getting back to this now.
> > > I didn't realize it till this weekend, but that archive is
> > > "relatively incompatible".
> > > In particular it has 32bit hosted tools, and won't run on Debian
> > > 4.0r4 / AMD64.
> > > Something about glibc 2.4, when all I see on my system is 2.3.
> > > I'll see what I can do.
> > > Probably just rebuild cm3cg.
> > > I think it was built on Fedora, but could have been Ubuntu or
> > > OpenSuse.
> > > Probably just that Debian stable lags the others.
> > >
> > > The main problem to debug is why 64bit hosted tools "never" work.
> > > (Right?)
> > >
> > >
> > > Stay tuned for a bunch more ports "soon", I've got a bunch more
> > > hardware,
> > > that runs Linux and others (Solaris, AIX, Irix).. :)
> > >
> > > I'll be able to debug the high dpi gui problems on a friend's laptop
> > > soon too.
> > > Send me a repro. I expect it is trivial -- like anything with a
> > > scrollbar.
> > > I can try formsedit, etc.
> > >
> > >
> > > - Jay
> > >
> > >
> > >> Date: Wed, 3 Sep 2008 07:58:04 +0200
> > >> From: wagner at elegosoft.com
> > >> To: m3devel at elegosoft.com
> > >> Subject: Re: [M3devel] AMD-64 binaries?
> > >>
> > >> Quoting "Rodney M. Bates" :
> > >>
> > >>> Are there binaries for AMD-64 around that can be used
> > >>> to bootstrap a 64-bit Linux compiler?
> > >>
> > >> Have a look at
> > >>
> > >> http://www.opencm3.net/uploaded-archives/index.html
> > >>
> > >> There are some AMD64 archives; I don't know about their status
> > >> offhand, though. I think Jay Krell produced them.
> > >> AFAIK there is no regular build on this platform yet.
> > >>
> > >> Olaf
> > >> --
> > >> Olaf Wagner -- elego Software Solutions GmbH
> > >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> > >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
> > >> 45 86 95
> > >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
> > >> Berlin
> > >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
> > >> DE163214194
> > >>
> >
>


From hosking at cs.purdue.edu  Wed Oct  1 12:07:00 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 11:07:00 +0100
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: <B971C9C9-251C-4F79-A12F-622F47883781@darko.org>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
	<DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
	<B971C9C9-251C-4F79-A12F-622F47883781@darko.org>
Message-ID: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>

m3gdb makes use of stabs debug information spat out by the backend.   
They are only in the binary if compiled -g.  There are other ways to  
get what you are after, as Darko has observed.

On Oct 1, 2008, at 11:03 AM, Darko wrote:

> I've extended one of the modules with a function that formats any  
> allocated value for printing. If you're interested I can clean them  
> up a little and post them.
>
>
> On 28/09/2008, at 8:01 AM, Darko wrote:
>
>> As far as I know, yes, they're not in the binary. I'd love to be  
>> proven wrong though, or fix it so they did. I have a module that  
>> reads the .M3WEB file and maps it to types and a module that will  
>> read and write any field within a type safely using a numeric  
>> index. Neither is perfect. You can integrate the two to get what  
>> you want but I seem to remember having some problems mapping type  
>> ids (UIDs?) to typecodes at runtime.
>>
>>
>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>
>>> Right, I am aware of those interfaces.. just wondering what was
>>> out there.  Do I really need to look at .M3WEB?  I thought
>>> that m3gdb could figure out things without anything outside
>>> of the binary...
>>>
>>> I'm looking for essentially what m3gdb offers, say prints
>>> at minimum the name of the type (this I recall is trivial with
>>> some of the RT* interfaces) but hopefully also with field names
>>> and values, but doesn't expand references recursively.. something
>>> like that?
>>>
>>>  Mika
>>>
>>> Darko writes:
>>>> You can use RTType to read the fields and values within a type. If you
>>>> also want the type and field names you can interpret the .M3WEB file.
>>>> I have a couple of modules that do something like that but they are
>>>> not what you would call finished. What level of detail are you after?
>>>>
>>>>
>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> I am working on writing an interpreter that I'd like to embed in
>>>>> various Modula-3 programs.  It so happens that this interpreter
>>>>> might from time to time be manipulating arbitrary M3 REFs, and just
>>>>> from the point of view of providing information to a human user,
>>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>>> have any code that accomplishes this, at least partly?  I'm thinking
>>>>> that since m3gdb can do it, the information must all be in the
>>>>> binary---somehow.  (Even enumeration names, right?)  And since the
>>>>> pickler can pickle things... hmm.
>>>>>
>>>>> I would greatly appreciate any guidance that's out there...
>>>>>
>>>>> Best regards,
>>>>>    Mika Nystrom
>>


From darko at darko.org  Wed Oct  1 12:35:09 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 12:35:09 +0200
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
	<DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
	<B971C9C9-251C-4F79-A12F-622F47883781@darko.org>
	<2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>
Message-ID: <B26C3B35-ADAA-4289-8006-F32D5CCCA407@darko.org>

Here's some info on the stabs format: http://www.cs.utah.edu/dept/old/texinfo/gdb/stabs_toc.html


On 01/10/2008, at 12:07 PM, Tony Hosking wrote:

> m3gdb makes use of stabs debug information spat out by the backend.   
> They are only in the binary if compiled -g.  There are other ways to  
> get what you are after, as Darko has observed.
>
> On Oct 1, 2008, at 11:03 AM, Darko wrote:
>
>> I've extended one of the modules with a function that formats any  
>> allocated value for printing. If you're interested I can clean them  
>> up a little and post them.
>>
>>
>> On 28/09/2008, at 8:01 AM, Darko wrote:
>>
>>> As far as I know, yes, they're not in the binary. I'd love to be  
>>> proven wrong though, or fix it so they did. I have a module that  
>>> reads the .M3WEB file and maps it to types and a module that will  
>>> read and write any field within a type safely using a numeric  
>>> index. Neither is perfect. You can integrate the two to get what  
>>> you want but I seem to remember having some problems mapping type  
>>> ids (UIDs?) to typecodes at runtime.
>>>
>>>
>>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>>
>>>> Right, I am aware of those interfaces.. just wondering what was
>>>> out there.  Do I really need to look at .M3WEB?  I thought
>>>> that m3gdb could figure out things without anything outside
>>>> of the binary...
>>>>
>>>> I'm looking for essentially what m3gdb offers, say prints
>>>> at minimum the name of the type (this I recall is trivial with
>>>> some of the RT* interfaces) but hopefully also with field names
>>>> and values, but doesn't expand references recursively.. something
>>>> like that?
>>>>
>>>> Mika
>>>>
>>>> Darko writes:
>>>>> You can use RTType to read the fields and values within a type. If you
>>>>> also want the type and field names you can interpret the .M3WEB file.
>>>>> I have a couple of modules that do something like that but they are
>>>>> not what you would call finished. What level of detail are you after?
>>>>>
>>>>>
>>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>>
>>>>>> Hello Modula-3 people,
>>>>>>
>>>>>> I am working on writing an interpreter that I'd like to embed in
>>>>>> various Modula-3 programs.  It so happens that this interpreter
>>>>>> might from time to time be manipulating arbitrary M3 REFs, and just
>>>>>> from the point of view of providing information to a human user,
>>>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>>>> have any code that accomplishes this, at least partly?  I'm thinking
>>>>>> that since m3gdb can do it, the information must all be in the
>>>>>> binary---somehow.  (Even enumeration names, right?)  And since the
>>>>>> pickler can pickle things... hmm.
>>>>>>
>>>>>> I would greatly appreciate any guidance that's out there...
>>>>>>
>>>>>> Best regards,
>>>>>>   Mika Nystrom
>>>
>


From mika at async.caltech.edu  Wed Oct  1 20:09:58 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Wed, 01 Oct 2008 11:09:58 -0700
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: Your message of "Wed, 01 Oct 2008 12:03:15 +0200."
	<B971C9C9-251C-4F79-A12F-622F47883781@darko.org> 
Message-ID: <200810011809.m91I9wxY087739@camembert.async.caltech.edu>

Oh, I'd love to give it a try!

I'm a little surprised no one has chimed in on the question of
whether you really need .M3WEB... I could swear I can get good
symbolic debugging with m3gdb on just a binary...

     Mika



From mika at async.caltech.edu  Wed Oct  1 20:10:38 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Wed, 01 Oct 2008 11:10:38 -0700
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: Your message of "Wed, 01 Oct 2008 11:07:00 BST."
	<2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu> 
Message-ID: <200810011810.m91IAcDW087832@camembert.async.caltech.edu>

Ok, ignore my previous email :-)

Tony Hosking writes:
>m3gdb makes use of stabs debug information spat out by the backend.   
>They are only in the binary if compiled -g.  There are other ways to  
>get what you are after, as Darko has observed.


From jay.krell at cornell.edu  Sun Oct 12 11:51:03 2008
From: jay.krell at cornell.edu (Jay)
Date: Sun, 12 Oct 2008 09:51:03 +0000
Subject: [M3devel] a bunch of new/old platform names?
Message-ID: <COL101-W614506DC49BC7BC3640D65E6370@phx.gbl>


I plan on soon bringing "back" some old ports -- building current archives -- and bringing up some new ports.

Specifically I have hardware: RS/6000 (PPC64/AIX), SGI (MIPS), SPARC64, plus the usual x86/AMD64.

Two of the platforms did exist.

In particular, "MIPS_IRIX" is "IRIX5".
  Reuse IRIX5, or introduce MIPS_IRIX?

PPC_AIX is IBMR2 or such.
  Same question.

Also, must versions really be in platform names?
I'm loath to add a third dimension to the matrix.
I did just note that FreeBSD 7.0 64 bit is ABI-incompatible with FreeBSD 6.3 64 bit, lame.

SGI claims good ABI across all the 6.5 releases, which is all there will be now.
IBM claims good 32 bit ABI compat across AIX 4.x - 6.x and good 64 bit ABI compat across 5.x and 6.x, but incompatibility from 64 bit 4.x.
(Microsoft has always been good here, but "behavioral" compat is the actual tricky issue.)

And, what do folks think about putting "32" in new 32 bit platform names?

I'm considering the following:
  MIPS32_{IRIX,LINUX,OPENBSD,NETBSD} 
  MIPS64_IRIX (6.5) 
  SPARC{32,64}_{LINUX,*BSD}(probably no SPARC32_*BSD actually, and SPARC32_LINUX is already in, but not building regularly) 
  {SPARC64,I386,AMD64}_SOLARIS 
  PPC{32,64}_AIX 
    (PPC64_LINUX is blocked, Linux has problems booting on the hardware and I have no Mac G5 yet). 
  AMD64_*BSD

Also, maybe some of the code should be restructured to separate processor from OS?
That might be primarily only pointer size.

Any interest in "x86" instead of "I386"?

If I make good progress against those 18 (!), I can see about PPC64_DARWIN, HPPA_*, IA64_*, ALPHA_*, ARM_*, which I lack hardware for. PPC_LINUX also should be converted to pthreads imho.
Mostly this is all just a matter of installing the OS and configuring gcc.
 
And, yeah, I have the two m3cgs stepping side by side to find the problem there, and will have use of a high dpi Windows laptop for that other problem..

And then of course, if the vast majority of platforms are named like that, there might be pressure to bring the rest in line. :) I386_{NT,LINUX,*BSD,CYGWIN,MINGWIN}

 - Jay

From mika at async.caltech.edu  Fri Oct 17 00:32:39 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 15:32:39 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
Message-ID: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>

Hello Modula-3 people,

As I mentioned in an earlier email about printing structures (thanks
Darko), I'm in the midst of coding an interpreter embedded in
Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
JScheme for Java (well it was at first strongly based, but more and
more loosely, if you know what I mean...)

I expected that the performance of the interpreter would be much
better in Modula-3 than in Java, and I have been testing on two
different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
and the other is CM3 on a recent Debian system.  What I am finding
is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
close to ten times as fast on some tasks at this point), but on
Linux/CM3 it is much closer in speed to JScheme than I would like.

When I started, with code that was essentially equivalent to JScheme,
I found that it was a bit slower than JScheme on Linux/CM3 and
possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
spend most of its time in (surprise, surprise!) memory allocation
and garbage collection.  The speedup I have achieved between the
first implementation and now was due to the use of Modula-3 constructs
that are superior to Java's, such as the use of arrays of RECORDs
to make small stacks rather than linked lists.  (I get readable
code with far fewer memory allocations and less GC work.)

Now, since this is an interpreter, I as the implementer have limited
control over how much memory is allocated and freed, and where it is
needed.  However, I can sometimes fall back on C-style memory management,
but I would like to do it in a safe way.  For instance, I have special-cased
evaluation of Scheme primitives, as follows.

Under the "normal" implementation, a list of things to evaluate is
built up, passed to an evaluation function, and then the GC is left
to sweep up the mess.  The problem is that there are various tricky
routes by which references can escape the evaluator, so you can't
just assume that what you put in is going to be dead right after
an eval and free it.  Instead, I set a flag in the evaluator, which
is TRUE if it is OK to free the list after the eval and FALSE if
it's unclear (in which case the problem is left up to the GC).

For the vast majority of Scheme primitives, one can indeed free the
list right after the eval.  Now of course I am not interested
in unsafe code, so what I do is this:

TYPE Pair = OBJECT first, rest : REFANY; END;

VAR
  mu := NEW(MUTEX);
  free : Pair := NIL;

PROCEDURE GetPair() : Pair =
  BEGIN
    LOCK mu DO
      IF free # NIL THEN
        TRY
          RETURN free
        FINALLY
          free := free.rest
        END
      END
    END;
    RETURN NEW(Pair)
  END GetPair;

PROCEDURE ReturnPair(cons : Pair) = 
  BEGIN
    cons.first := NIL;
    LOCK mu DO
      cons.rest := free;
      free := cons
    END
  END ReturnPair;

my eval code looks like

VAR okToFree : BOOLEAN; BEGIN

   args := GetPair(); ...
   result := EvalPrimitive(args, (*VAR OUT*) okToFree);

   IF okToFree THEN ReturnPair(args) END;
   RETURN result
END

and this does work well.  In fact it speeds up the Linux implementation
by almost 100% to recycle the lists like this *just* for the
evaluation of Scheme primitives.

But it's still ugly, isn't it?  There's a mutex, and a global
variable.  And yes, the time spent messing with the mutex is
noticeable, and I haven't even made the code multi-threaded yet
(and that is coming!)

So I'm thinking, what I really want is a structure that is attached
to my current Thread.T.  I want to be able to access just a single 
pointer (like the free list) but be sure it is unique to my current
thread.  No locking would be necessary if I could do this.

Does anyone have an elegant solution that does something like this?
Thread-specific "static" variables?  Just one REFANY would be enough
for a lot of uses...  seems to me this should be a frequently
occurring problem?

     Best regards,
       Mika
    

From hosking at cs.purdue.edu  Fri Oct 17 00:54:51 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Thu, 16 Oct 2008 23:54:51 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>
References: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>
Message-ID: <C17F2003-446E-466C-84DC-DA8E23A96726@cs.purdue.edu>

Have you tried running @M3noincremental?



From mika at async.caltech.edu  Fri Oct 17 01:30:01 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 16:30:01 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Thu, 16 Oct 2008 23:54:51 BST."
	<C17F2003-446E-466C-84DC-DA8E23A96726@cs.purdue.edu> 
Message-ID: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>

Hi Tony,

I figured you would chime in!

Yes, @M3noincremental seems to make things consistently a tad bit
slower (but a very small difference), on both FreeBSD and Linux.
@M3nogc makes a bigger difference, of course.

Unfortunately I seem to have lost the code that did a lot of memory
allocations.  My tricks (as described in the email---and others!)
have removed most of the troublesome memory allocations, but now
I'm stuck with the mutex instead...

      Mika

Tony Hosking writes:
>Have you tried running @M3noincremental?


From jay.krell at cornell.edu  Fri Oct 17 06:40:28 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 04:40:28 +0000
Subject: [M3devel] M3 programming problem : GC efficiency /
	per-thread	storage areas?
In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
References: Your message of 
	<200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
Message-ID: <COL101-W4964BD437A46A53516DAA3E6320@phx.gbl>


Making this per-thread is a fairly classic good improvement.

You need to worry about what happens with many threads, about cleaning up when a thread dies, and about allowing a free to come in from any thread.

A good way to mitigate all those problems is to use a small fixed-size cache instead of per-thread storage, including an array of mutexes.

If "thread ids" have adequate distribution, just use their lower bits as an array index. If not, have a global counter that gets assigned into the thread on first use per-thread.

The cache could also be more than one element.
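A minimal Modula-3 sketch of the fixed-size striped cache described above. It rests on assumptions that are not in the original message: no usable numeric thread id is assumed to be available, so each thread obtains a stripe number once from a global counter and keeps it; NStripes, Stripe, NewStripeIndex, GetPair and ReturnPair are hypothetical names (the last two mirror Mika's procedures).

```modula3
MODULE Main;
(* Sketch of a striped free-list cache.  All names here are
   illustrative, not part of any Modula-3 library. *)

CONST NStripes = 8;  (* assumed size; tune to the expected thread count *)

TYPE
  Pair = OBJECT first, rest: REFANY END;
  Stripe = RECORD
    mu: MUTEX;          (* created in the module body below *)
    free: Pair := NIL;  (* this stripe's free list *)
  END;

VAR
  stripes: ARRAY [0 .. NStripes - 1] OF Stripe;
  counterMu := NEW(MUTEX);
  nextStripe: CARDINAL := 0;

(* Hand out stripe numbers round-robin; a thread calls this once and
   keeps the result (the "global counter" variant). *)
PROCEDURE NewStripeIndex(): CARDINAL =
  VAR i: CARDINAL;
  BEGIN
    LOCK counterMu DO
      i := nextStripe;
      nextStripe := (nextStripe + 1) MOD NStripes
    END;
    RETURN i
  END NewStripeIndex;

PROCEDURE GetPair(s: CARDINAL): Pair =
  VAR p: Pair := NIL;
  BEGIN
    WITH st = stripes[s] DO
      LOCK st.mu DO
        IF st.free # NIL THEN
          p := st.free;
          st.free := p.rest  (* REFANY narrowed back to Pair *)
        END
      END
    END;
    IF p = NIL THEN RETURN NEW(Pair) END;
    p.rest := NIL;
    RETURN p
  END GetPair;

(* Any thread may return a Pair to any stripe. *)
PROCEDURE ReturnPair(s: CARDINAL; p: Pair) =
  BEGIN
    p.first := NIL;
    WITH st = stripes[s] DO
      LOCK st.mu DO
        p.rest := st.free;
        st.free := p
      END
    END
  END ReturnPair;

BEGIN
  FOR i := FIRST(stripes) TO LAST(stripes) DO
    stripes[i].mu := NEW(MUTEX)
  END;
  WITH s = NewStripeIndex(), a = GetPair(s) DO
    ReturnPair(s, a);
    <* ASSERT GetPair(s) = a *>  (* recycled rather than reallocated *)
  END
END Main.
```

Because a Pair can go back to any stripe, frees may come from any thread and nothing needs cleaning up when a thread exits; the trade-off is that threads sharing a stripe still contend for its lock.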

How do you manage okToFree?

Windows has __declspec(thread), which is an optimized form of TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded .dlls before Vista, and isn't __declspec(fiber) like maybe it should be.
 
 - Jay



From mika at async.caltech.edu  Fri Oct 17 08:32:15 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 23:32:15 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000."
	<COL101-W4964BD437A46A53516DAA3E6320@phx.gbl> 
Message-ID: <200810170632.m9H6WFHd078061@camembert.async.caltech.edu>


Well, I was thinking of something even simpler.  A Thread.T is an
OBJECT.  It's garbage collected just like any other object, is it
not?  

Why can't the thing that makes new threads simply include a single
globally visible field in every Thread.T, of type REFANY?  Call it "data".

Then you can always manipulate Thread.Self().data as you see fit
without any need for locks.  There can be no problem with this as
long as it is always manipulated from within that thread.
Of course this can be trivially encapsulated by not revealing "data"
and indeed always accessing it as Thread.Self().data.

You would not normally access this from any other thread.  It's indeed
only meant to be used in the idiom

  x := Allocate();
  TRY
    DoSomething(x)
  FINALLY
    Free(x)
  END

It's also not really a "Free" but just returning the object to a free
list (there can be no unsafe behavior here).

As a "nicer" interface, one could register routines with a public
interface, asking it to manufacture some kind of thread globals.
For maximum sanity, they would be visible inside the MODULE that
requested them, but I'm not sure how to accomplish this.  And of
course there's not much point in any of this unless it can be made
efficient or else a mutex plus a true global will work just as well.

What I'm talking about I guess could be done by hacking up Thread.Fork()
to return a subtype of Thread.T, but that won't work for the first
thread.  But with this method you could have arbitrary fields (and
methods) attached to a Thread.T.  How to collect everything you need
is a different story...
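A related approach that needs no change to Thread.T is to keep the per-thread state in the Thread.Closure subtype handed to Thread.Fork and pass it to the evaluator explicitly. The sketch below assumes that style; Pool, Worker, Get, Put and Run are hypothetical names. The main thread was not forked, so it simply allocates its own Pool, which sidesteps the first-thread problem. Each Pool is touched by exactly one thread, so no MUTEX is needed.

```modula3
MODULE Main;
(* Sketch: a per-thread free list carried by a Thread.Closure subtype.
   All names except Thread.Closure, Thread.Fork and Thread.Join are
   illustrative. *)
IMPORT Thread;

TYPE
  Pair = OBJECT first, rest: REFANY END;

  (* Owned by a single thread, so it is never locked. *)
  Pool = OBJECT
    free: Pair := NIL;
    news: CARDINAL := 0;  (* count of fresh allocations *)
  END;

  Worker = Thread.Closure OBJECT
    pool: Pool;
  OVERRIDES
    apply := Run;
  END;

PROCEDURE Get(p: Pool): Pair =
  VAR c := p.free;
  BEGIN
    IF c = NIL THEN
      INC(p.news);
      RETURN NEW(Pair)
    END;
    p.free := c.rest;  (* REFANY narrowed back to Pair *)
    c.rest := NIL;
    RETURN c
  END Get;

PROCEDURE Put(p: Pool; c: Pair) =
  BEGIN
    c.first := NIL;
    c.rest := p.free;
    p.free := c
  END Put;

(* The Allocate/TRY/FINALLY/Free idiom from the message, with the
   pool passed explicitly instead of read from Thread.Self(). *)
PROCEDURE Run(self: Worker): REFANY =
  VAR x: Pair;
  BEGIN
    FOR i := 1 TO 1000 DO
      x := Get(self.pool);
      TRY
        x.first := self  (* stand-in for DoSomething(x) *)
      FINALLY
        Put(self.pool, x)
      END
    END;
    <* ASSERT self.pool.news = 1 *>  (* one cell, reused 1000 times *)
    RETURN NIL
  END Run;

VAR
  mainPool := NEW(Pool);  (* the main thread makes its own Pool *)
  t: Thread.T;

BEGIN
  WITH a = Get(mainPool) DO
    Put(mainPool, a);
    <* ASSERT Get(mainPool) = a *>
  END;
  t := Thread.Fork(NEW(Worker, pool := NEW(Pool)));
  EVAL Thread.Join(t)
END Main.
```

The cost of this design is plumbing: every procedure that allocates Pairs must receive the Pool as a parameter (or via an interpreter-state object), in exchange for lock-free access without any runtime support.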

I'm not asking for a new language feature... really was just wondering
if anyone had tried anything like this before, and now am rambling a
bit.
 
     Mika

Jay writes:
>
>Making this per-thread is a fairly classic good improvement.
>
>You need to worry about what happens with many threads, being sure to clean up when a thread dies, and allowing for a free to come in from any thread.
>
>A good way to mitigate all those problems is to use a small fixed-size cache instead of per-thread storage, including an array of mutexes.
>
>If "thread ids" have adequate distribution, just use their lower bits as an array index. If not, have a global counter that gets assigned into the thread on first use per-thread.
>
>The cache could also be more than one element.
>
>How do you manage okToFree?
>
>Windows has __declspec(thread), which is an optimized form of TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded .dlls before Vista, and it isn't __declspec(fiber) as maybe it should be.
> 
> - Jay
>
>----------------------------------------
>> To: hosking at cs.purdue.edu
>> Date: Thu, 16 Oct 2008 16:30:01 -0700
>> From: mika at async.caltech.edu
>> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
>> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread	storage areas?
>> 
>> Hi Tony,
>> 
>> I figured you would chime in!
>> 
>> Yes, @M3noincremental seems to make things consistently a tad bit
>> slower (but a very small difference), on both FreeBSD and Linux.
>> @M3nogc makes a bigger difference, of course.
>> 
>> Unfortunately I seem to have lost the code that did a lot of memory
>> allocations.  My tricks (as described in the email---and others!)
>> have removed most of the troublesome memory allocations, but now
>> I'm stuck with the mutex instead...
>> 
>>       Mika
>> 
>> Tony Hosking writes:
>>>Have you tried running @M3noincremental?
>>>
>>>On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>
>>>> Hello Modula-3 people,
>>>>
>>>> As I mentioned in an earlier email about printing structures (thanks
>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>> Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
>>>> JScheme for Java (well it was at first strongly based, but more and
>>>> more loosely, if you know what I mean...)
>>>>
>>>> I expected that the performance of the interpreter would be much
>>>> better in Modula-3 than in Java, and I have been testing on two
>>>> different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
>>>> and the other is CM3 on a recent Debian system.  What I am finding
>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>>>> close to ten times as fast on some tasks at this point), but on
>>>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>>>
>>>> When I started, with code that was essentially equivalent to JScheme,
>>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>>>> spend most of its time in (surprise, surprise!) memory allocation
>>>> and garbage collection.  The speedup I have achieved between the
>>>> first implementation and now was due to the use of Modula-3 constructs
>>>> that are superior to Java's, such as the use of arrays of RECORDs
>>>> to make small stacks rather than linked lists.  (I get readable
>>>> code with far fewer memory allocations and less GC work.)
>>>>
>>>> Now, since this is an interpreter, I as the implementer have limited
>>>> control over how much memory is allocated and freed, and where it is
>>>> needed.  However, I can sometimes fall back on C-style memory
>>>> management, but I would like to do it in a safe way.  For instance,
>>>> I have special-cased evaluation of Scheme primitives, as follows.
>>>>
>>>> Under the "normal" implementation, a list of things to evaluate is
>>>> built up, passed to an evaluation function, and then the GC is left
>>>> to sweep up the mess.  The problem is that there are various tricky
>>>> routes by which references can escape the evaluator, so you can't
>>>> just assume that what you put in is going to be dead right after
>>>> an eval and free it.  Instead, I set a flag in the evaluator, which
>>>> is TRUE if it is OK to free the list after the eval and FALSE if
>>>> it's unclear (in which case the problem is left up to the GC).
>>>>
>>>> For the vast majority of Scheme primitives, one can indeed free the
>>>> list right after the eval.  Now of course I am not interested
>>>> in unsafe code, so what I do is this:
>>>>
>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>
>>>> VAR
>>>>  mu := NEW(MUTEX);
>>>>  free : Pair := NIL;
>>>>
>>>> PROCEDURE GetPair() : Pair =
>>>>  BEGIN
>>>>    LOCK mu DO
>>>>      IF free # NIL THEN
>>>>        TRY
>>>>          RETURN free
>>>>        FINALLY
>>>>          free := free.rest
>>>>        END
>>>>      END
>>>>    END;
>>>>    RETURN NEW(Pair)
>>>>  END GetPair;
>>>>
>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>>  BEGIN
>>>>    cons.first := NIL;
>>>>    LOCK mu DO
>>>>      cons.rest := free;
>>>>      free := cons
>>>>    END
>>>>  END ReturnPair;
>>>>
>>>> my eval code looks like
>>>>
>>>> VAR okToFree : BOOLEAN; BEGIN
>>>>
>>>>   args := GetPair(); ...
>>>>   result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>
>>>>   IF okToFree THEN ReturnPair(args) END;
>>>>   RETURN result
>>>> END
>>>>
>>>> and this does work well.  In fact it speeds up the Linux
>>>> implementation by almost 100% to recycle the lists like this *just*
>>>> for the evaluation of Scheme primitives.
>>>>
>>>> But it's still ugly, isn't it?  There's a mutex, and a global
>>>> variable.  And yes, the time spent messing with the mutex is
>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>> (and that is coming!)
>>>>
>>>> So I'm thinking, what I really want is a structure that is attached
>>>> to my current Thread.T.  I want to be able to access just a single
>>>> pointer (like the free list) but be sure it is unique to my current
>>>> thread.  No locking would be necessary if I could do this.
>>>>
>>>> Does anyone have an elegant solution that does something like this?
>>>> Thread-specific "static" variables?  Just one REFANY would be enough
>>>> for a lot of uses...  seems to me this should be a frequently
>>>> occurring problem?
>>>>
>>>>     Best regards,
>>>>       Mika
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
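Jay's alternative quoted above is a small fixed set of free lists, each with its own mutex, indexed by a per-thread number handed out from a global counter on first use. It might look like this in C. This is a sketch only: NSTRIPES, Cell, get_cell and put_cell are hypothetical. It assumes C11 atomics, and it uses _Thread_local only for a small integer, so there is nothing to clean up when a thread dies.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NSTRIPES 16              /* assumed: a small fixed number of caches */

typedef struct Cell {
    void *first;
    struct Cell *rest;
} Cell;

typedef struct {
    pthread_mutex_t mu;          /* one mutex per stripe, not one global mutex */
    Cell *head;
} Stripe;

static Stripe stripes[NSTRIPES];
static pthread_once_t stripes_once = PTHREAD_ONCE_INIT;
static atomic_uint next_stripe;  /* global counter handing out stripe numbers */
static _Thread_local int my_stripe = -1;

static void init_stripes(void) {
    for (int i = 0; i < NSTRIPES; i++) {
        pthread_mutex_init(&stripes[i].mu, NULL);
        stripes[i].head = NULL;
    }
}

/* On a thread's first use, take the next number from the global counter. */
static Stripe *current_stripe(void) {
    if (my_stripe < 0) {
        pthread_once(&stripes_once, init_stripes);
        my_stripe = (int)(atomic_fetch_add(&next_stripe, 1u) % NSTRIPES);
    }
    return &stripes[my_stripe];
}

Cell *get_cell(void) {
    Stripe *s = current_stripe();
    pthread_mutex_lock(&s->mu);
    Cell *c = s->head;
    if (c != NULL) s->head = c->rest;
    pthread_mutex_unlock(&s->mu);
    if (c == NULL && (c = malloc(sizeof *c)) == NULL) abort();
    c->first = NULL;
    c->rest = NULL;
    return c;
}

/* May be called from any thread; the cell joins the caller's stripe. */
void put_cell(Cell *c) {
    Stripe *s = current_stripe();
    c->first = NULL;
    pthread_mutex_lock(&s->mu);
    c->rest = s->head;
    s->head = c;
    pthread_mutex_unlock(&s->mu);
}
```

A cell freed by a different thread simply lands on that thread's stripe, which is harmless. Each lock is rarely contended, but it is still a lock.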


From hosking at cs.purdue.edu  Fri Oct 17 08:35:03 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 17 Oct 2008 07:35:03 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
References: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
Message-ID: <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu>

I suspect part of the overhead of allocation in the new code is the  
need for thread-local allocation buffers, which means we need to  
access thread-local state.  We really need an efficient way to do  
that, but pthreads thread-local accesses may be what is killing you.
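In its simplest form, a thread-local allocation buffer is a per-thread bump pointer that takes the shared lock only when its chunk runs out. The C sketch below illustrates the idea and is not CM3's allocator. CHUNK_BYTES, take_chunk and tlab_alloc are hypothetical, memory is never reclaimed (a collector would do that), and large objects are simply rejected. The fast path is only as cheap as the reads of tlab_cur and tlab_end. If those reads went through something like pthread_getspecific rather than a compiler-supported thread-local variable, they could well dominate.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_BYTES 4096          /* assumed refill size */

static pthread_mutex_t heap_mu = PTHREAD_MUTEX_INITIALIZER;
static size_t chunks_taken = 0;   /* slow-path refills, guarded by heap_mu */

/* Slow path: grab a fresh chunk from the shared heap under the lock. */
static char *take_chunk(void) {
    pthread_mutex_lock(&heap_mu);
    char *chunk = malloc(CHUNK_BYTES);
    if (chunk != NULL) chunks_taken++;
    pthread_mutex_unlock(&heap_mu);
    if (chunk == NULL) abort();
    return chunk;
}

/* This thread's private buffer: free space is [tlab_cur, tlab_end). */
static _Thread_local char *tlab_cur = NULL;
static _Thread_local char *tlab_end = NULL;

void *tlab_alloc(size_t n) {
    n = (n + 15) & ~(size_t)15;                  /* round up to 16 bytes */
    if (n == 0 || n > CHUNK_BYTES) return NULL;  /* large objects: out of scope */
    if (tlab_cur == NULL || (size_t)(tlab_end - tlab_cur) < n) {
        tlab_cur = take_chunk();                 /* old tail is abandoned */
        tlab_end = tlab_cur + CHUNK_BYTES;
    }
    void *p = tlab_cur;                          /* fast path: bump, no lock */
    tlab_cur += n;
    return p;
}
```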



From mika at async.caltech.edu  Fri Oct 17 08:50:13 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 23:50:13 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000."
	<COL101-W4964BD437A46A53516DAA3E6320@phx.gbl> 
Message-ID: <200810170650.m9H6oDU0078549@camembert.async.caltech.edu>

Jay writes:
...
>How do you manage okToFree?
...

I forgot to answer this q.

Well, the primitive evaluation in the interpreter is just a big
CASE statement.  I really just look at where it references the list
I am making, and if it references the list at all in a branch, I
insert the code "okToFree := FALSE".  The first two parameters are
passed in separately.  

Here's the code... since you ask!

This is the code for the special case of a two-argument Scheme procedure call,
such as (+ x 1) .

PROCEDURE Apply2(t : T; interp : Scheme.T; a1, a2 : Object) : Object =
  VAR
      d1, d2 := GetCons();
      free := TRUE;
  BEGIN
      d1.first := a1; d1.rest := d2;
      d2.first := a2; d2.rest := NIL;

      WITH res = Prims(t, interp, d1, a1, a2, free) DO
        IF free THEN
          ReturnCons(d1); ReturnCons(d2)
        END;
        RETURN res
      END
  END Apply2;

PROCEDURE Prims(t : T; interp : Scheme.T; args, x, y : Object;
                VAR free : BOOLEAN) : Object =

   (* The (hopefully temporary) list of arguments is args.  x and
      y are the first two elements of args *)

   BEGIN
      CASE VAL(t.idNumber,P) OF
          P.Eq => RETURN NumCompare(args, '=')  (* known not to let args escape *)
        |
          P.List => free := FALSE; RETURN args  (* args escapes, don't know whither *)
        |
          P.Car => RETURN PedanticFirst(x)  (* doesn't even use args *)

        (* and about another 100 cases follow here *)

      END
   END Prims;
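The okToFree protocol from Apply2/Prims can be transcribed into C as a sketch. Object, Cons, Prim, prims and apply2 are hypothetical stand-ins. Only two primitives are shown: one that never touches the argument list and one that returns it. The pool is a single unlocked global list, so this version is single-threaded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *Object;                        /* stand-in for the interpreter's Object */
typedef struct Cons { Object first; struct Cons *rest; } Cons;
typedef enum { PRIM_FIRST, PRIM_LIST } Prim; /* two of the ~100 primitives */

static Cons *free_list = NULL;               /* single-threaded pool for brevity */

static Cons *get_cons(void) {
    Cons *c = free_list;
    if (c != NULL) free_list = c->rest;
    else if ((c = malloc(sizeof *c)) == NULL) abort();
    return c;
}

static void return_cons(Cons *c) {
    c->first = NULL;
    c->rest = free_list;
    free_list = c;
}

static size_t pool_size(void) {
    size_t n = 0;
    for (Cons *c = free_list; c != NULL; c = c->rest) n++;
    return n;
}

/* x and y are the first two elements of args. Any branch that lets
   args escape must clear *free_ok. */
static Object prims(Prim id, Cons *args, Object x, Object y, bool *free_ok) {
    (void)y;
    switch (id) {
    case PRIM_FIRST:
        return x;                            /* does not touch args at all */
    case PRIM_LIST:
        *free_ok = false;                    /* args escapes to the caller */
        return args;
    }
    return NULL;
}

/* Two-argument call, as in Apply2: build the list, evaluate, recycle if safe. */
Object apply2(Prim id, Object a1, Object a2) {
    Cons *d1 = get_cons();
    Cons *d2 = get_cons();
    bool free_ok = true;
    d1->first = a1; d1->rest = d2;
    d2->first = a2; d2->rest = NULL;
    Object res = prims(id, d1, a1, a2, &free_ok);
    if (free_ok) {
        return_cons(d1);
        return_cons(d2);
    }
    return res;
}
```

After a PRIM_FIRST call both cells are back in the pool. After a PRIM_LIST call they belong to the returned list and are left to the caller (in Modula-3, to the collector).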

       Mika


From mika at async.caltech.edu  Fri Oct 17 10:03:18 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 01:03:18 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST."
	<0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> 
Message-ID: <200810170803.m9H83IIC080081@camembert.async.caltech.edu>

Ok this suggests that using thread local state to get around the
problem won't help either.

Can I ask a question... I am looking at ThreadPThread.m3...

Why do you have to lock the slotMu in Self()?

PROCEDURE Self (): T =
  (* If not the initial thread and not created by Fork, returns NIL *)
  (* LL = 0 *)
  VAR
    me := GetActivation();
    t: T;
  BEGIN
    IF me = NIL THEN RETURN NIL END;
    WITH r = Upthread.mutex_lock(slotMu) DO <*ASSERT r=0*> END;
      t := slots[me.slot];
    WITH r = Upthread.mutex_unlock(slotMu) DO <*ASSERT r=0*> END;
    IF (t.act # me) THEN Die(ThisLine(), "thread with bad slot!") END;
    RETURN t;
  END Self;

Is it just because of AssignSlots?  If so, it's actually a very rare
event that there would ever be a conflict, no?  (Only when "slots" is
extended?)

Can data be stored in an "Activation"?  Not TRACED data, obviously, hmm...
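One way to sidestep the slot table entirely is to keep a pointer to the thread's own record in a pthread key. Self() then becomes a single pthread_getspecific call with no mutex, and the key's destructor cleans up when the thread exits. This C sketch is not how ThreadPThread.m3 is written. ThreadRec, thread_register, thread_self and worker are hypothetical. Given the caveat above about TRACED data, a traced Thread.T reachable only through pthread-specific data would presumably not be seen by the collector, so the record here is plain malloc'd memory.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread record standing in for a Thread.T. */
typedef struct ThreadRec {
    int id;
    void *data;                   /* the single "data" slot proposed earlier */
} ThreadRec;

static pthread_key_t self_key;
static pthread_once_t self_once = PTHREAD_ONCE_INIT;
static atomic_int records_freed;  /* counts destructor runs, for the demo */

/* Runs automatically when a thread that set the key exits. */
static void free_rec(void *p) {
    free(p);
    atomic_fetch_add(&records_freed, 1);
}

static void make_key(void) {
    if (pthread_key_create(&self_key, free_rec) != 0) abort();
}

/* Call once at thread start, e.g. from whatever wraps Fork's closure. */
ThreadRec *thread_register(int id) {
    pthread_once(&self_once, make_key);
    ThreadRec *t = calloc(1, sizeof *t);
    if (t == NULL || pthread_setspecific(self_key, t) != 0) abort();
    t->id = id;
    return t;
}

/* Self(): one thread-specific lookup, no shared slot table, no mutex.
   Returns NULL for a thread that never registered. */
ThreadRec *thread_self(void) {
    pthread_once(&self_once, make_key);
    return pthread_getspecific(self_key);
}

/* Example thread body: register, then check that Self() finds our record. */
static void *worker(void *arg) {
    ThreadRec *t = thread_register(*(int *)arg);
    return (void *)(intptr_t)(thread_self() == t);
}
```

How cheap pthread_getspecific itself is depends on the C library and OS.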

     Mika


Tony Hosking writes:
>I suspect part of the overhead of allocation in the new code is the  
>need for thread-local allocation buffers, which means we need to  
>access thread-local state.  We really need an efficient way to do  
>that, but pthreads thread-local accesses may be what is killing you.


From mika at async.caltech.edu  Fri Oct 17 10:32:28 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 01:32:28 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST."
	<0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> 
Message-ID: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>

Ok I am sorry I am slow to pick up on this.

I take it the problem is actually the Upthread.getspecific routine,
which itself calls something like get_curthread inside pthreads,
which in turn involves a context switch to the supervisor---the identity
of the current thread is just not accessible anywhere in user space.
That would also explain why this program runs faster with my old PM3,
which uses longjmp threads.

The only way to avoid it (really) is to pass a pointer to the
Thread.T of the currently executing thread in the activation record
of *every* procedure, so that allocators can find it when necessary....
but that is very expensive in terms of stack memory.

Or I can just make a structure like that that I pass around where
I need it in my own program.  Thread-specific and user-managed.
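The "pass it around myself" option can be sketched in C. The caller owns a context record and passes it explicitly, so finding the free list costs one pointer parameter instead of a thread-local lookup. Ctx, ctx_get_pair and ctx_return_pair are hypothetical names.

```c
#include <stdlib.h>

typedef struct Pair { void *first; struct Pair *rest; } Pair;

/* Hypothetical per-thread interpreter context, created by whoever starts
   the thread and passed down explicitly instead of found via TLS. */
typedef struct Ctx {
    Pair *free_pairs;
    unsigned long reused;         /* requests satisfied from the free list */
} Ctx;

Pair *ctx_get_pair(Ctx *cx) {
    Pair *p = cx->free_pairs;
    if (p != NULL) {
        cx->free_pairs = p->rest;
        cx->reused++;
    } else if ((p = malloc(sizeof *p)) == NULL) {
        abort();
    }
    p->first = NULL;
    p->rest = NULL;
    return p;
}

void ctx_return_pair(Ctx *cx, Pair *p) {
    p->first = NULL;
    p->rest = cx->free_pairs;
    cx->free_pairs = p;
}
```

This needs no locking as long as each Ctx is used by one thread at a time. The price is threading the extra parameter through every procedure that allocates.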

I believe I have just answered all my own questions, but I hope
Tony will correct me if my answers are incorrect.

    Mika

Tony Hosking writes:
>I suspect part of the overhead of allocation in the new code is the  
>need for thread-local allocation buffers, which means we need to  
>access thread-local state.  We really need an efficient way to do  
>that, but pthreads thread-local accesses may be what is killing you.


From jay.krell at cornell.edu  Sat Oct 18 00:42:35 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 22:42:35 +0000
Subject: [M3devel] M3 programming problem : GC efficiency /
	per-thread	storage areas?
In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
References: Your message of 
	<200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
Message-ID: <COL101-W48200DF8FB7269A7B2E229E6320@phx.gbl>


Right and wrong.

Right: Tony was referring to Upthread.getspecific (or, on Windows, WinBase.TlsGetValue).
Wrong that this necessarily incurs a switch to the supervisor/kernel, and perhaps wrong to call that a "context switch". It depends on the operating system.

I will explain.

On Windows/x86, the FS register points to a partly documented per-thread data structure.
C and C++ exception handling use FS:0.
Disassemble almost any C or C++ code and you'll find it used. Not by Modula-3, though.

Disassemble TlsGetValue.

 cdb /z %windir%\system32\kernel32.dll  

0:000> uf kernel32!TlsGetValue
kernel32!TlsGetValue:

 typical looking prolog.. 
7dd813e0 8bff            mov     edi,edi
7dd813e2 55              push    ebp
7dd813e3 8bec            mov     ebp,esp

 fs:18 contains a "normal" "linear" pointer to fs:0 
 Get that pointer. 
7dd813e5 64a118000000    mov     eax,dword ptr fs:[00000018h]

 get the index 
7dd813eb 8b4d08          mov     ecx,dword ptr [ebp+8]

 SetLastError(0) 
7dd813ee 83603400        and     dword ptr [eax+34h],0

  There are 64 preallocated thread local slots -- compare the index to 64. 
7dd813f2 83f940          cmp     ecx,40h   

  If it is above or equal to 64, go use the non-preallocated slots. 
7dd813f5 0f8353e20200    jae     kernel32!lstrcmpi+0x4b22 (7ddaf64e)

  preallocated slots are at fs:e10; get the data and done  
kernel32!TlsGetValue+0x1b:
7dd813fb 8b8488100e0000  mov     eax,dword ptr [eax+ecx*4+0E10h]

 epilog 

kernel32!TlsGetValue+0x22:
7dd81402 5d              pop     ebp
7dd81403 c20400          ret     4

 get here for indices>= 64
 compare index to 1088 == 1024 + 64, as there are another 1024 more slowly available slots  

kernel32!lstrcmpi+0x4b22:
7ddaf64e 81f940040000    cmp     ecx,440h

 if it is below 1088, go use those slots 

7ddaf654 7211            jb      kernel32!lstrcmpi+0x4b3b (7ddaf667)

 index is above or equal to 1088, SetLastError(invalid parameter) 

kernel32!lstrcmpi+0x4b2a:
7ddaf656 680d0000c0      push    0C000000Dh
7ddaf65b e80025fdff      call    kernel32!GetProcessHeap+0x12 (7dd81b60)

 and return 0 -- 0 is not unambiguously an error -- that's why last error was cleared at the start 

kernel32!lstrcmpi+0x4b34:
7ddaf660 33c0            xor     eax,eax
7ddaf662 e99b1dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)

 This is where the slots between 64 and 1088 are used. 
 Get pointer from FS:F94 and compare to null.
  If it is null, that is OK: it means nobody has yet called TlsSetValue for this index on this thread,
  so it just retains its initial 0 value. 
kernel32!lstrcmpi+0x4b3b:
7ddaf667 8b80940f0000    mov     eax,dword ptr [eax+0F94h]
7ddaf66d 85c0            test    eax,eax
7ddaf66f 74ef            je      kernel32!lstrcmpi+0x4b34 (7ddaf660)

 Index is between 64 and 1088, and there is a non null pointer at FS:F94.
 Subtract 64 from index and index into pointer there. 
 Note it does the subtraction after the multiplication, so subtracts 64*4=0x100.

kernel32!lstrcmpi+0x4b45:
7ddaf671 8b848800ffffff  mov     eax,dword ptr [eax+ecx*4-100h]
7ddaf678 e9851dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)


So, it is a few instructions but there is no context switch into the kernel/supervisor.
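The lookup walked through above can be restated in C, which may make the control flow easier to follow. This is a model reconstructed from the listing, not Windows source. Teb and its field names are invented, and only the three offsets the listing touches are represented. The value 87 (the Win32 ERROR_INVALID_PARAMETER code) is assumed as what the out-of-range path leaves as the last error.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_INLINE_SLOTS 64                 /* cmp ecx,40h */
#define TLS_EXPANSION_SLOTS 1024            /* 0x440 - 0x40 */
#define MODEL_ERROR_INVALID_PARAMETER 87u   /* assumed last-error value */

/* Invented model of the per-thread block; only the fields the listing uses. */
typedef struct Teb {
    uint32_t last_error;                    /* [eax+34h] */
    void *tls_slots[TLS_INLINE_SLOTS];      /* [eax+ecx*4+0E10h] */
    void **tls_expansion;                   /* [eax+0F94h], NULL until needed */
} Teb;

void *model_tls_get_value(Teb *teb, uint32_t index) {
    teb->last_error = 0;                    /* a 0 result is otherwise ambiguous */
    if (index < TLS_INLINE_SLOTS)
        return teb->tls_slots[index];       /* common case: one indexed load */
    if (index >= TLS_INLINE_SLOTS + TLS_EXPANSION_SLOTS) {
        teb->last_error = MODEL_ERROR_INVALID_PARAMETER;
        return NULL;
    }
    if (teb->tls_expansion == NULL)         /* no expansion block yet: value is 0 */
        return NULL;
    return teb->tls_expansion[index - TLS_INLINE_SLOTS];   /* the -100h displacement */
}
```

Every path is a handful of loads, compares and branches in user mode.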

Also, calls into the kernel aren't necessarily a "context switch".
Some context is saved, and a bit is twiddled in the processor to indicate a privilege level change, but no page tables are altered and I believe no TLBs (translation lookaside buffer) are invalidated, and no thread scheduling decisions are made -- though upon exit from the kernel, APCs (asynchronous procedure call) can be run -- on the calling thread. 

A more expensive context switch is when another thread or another process runs.
Switching threads requires saving more context, and switching processes requires changing the register that points to the page tables.
One detail there -- calling into the x86 NT kernel does not preserve floating point state -- that's the additional state that a thread switch has to save, at least. NT/x86 kernel drivers aren't allowed to use floating point, with some exceptions, such as video drivers (only certain functions?) or drivers that explicitly save/restore the floating point registers using public functions.
I don't know about the other architectures. I think IA64 only preserves some floating point state, not all.


Now, the question is how Upthread.getspecific is implemented on other architectures and operating systems.
We should look into that for the various operating systems.
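For comparison, the POSIX counterpart of TlsGetValue is pthread_getspecific. Below is a minimal sketch, in C rather than Modula-3, of the lock-free per-thread free list Mika asked about, built on pthread_key_create, pthread_getspecific and pthread_setspecific. The Pair type and the get_pair/return_pair names are hypothetical, modeled on his GetPair/ReturnPair; how cheap pthread_getspecific actually is varies by platform, which is exactly the open question here.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical C analogue of the Modula-3 Pair from the thread. */
typedef struct Pair {
    void *first;
    struct Pair *rest;
} Pair;

static pthread_key_t free_list_key;
static pthread_once_t free_list_once = PTHREAD_ONCE_INIT;

/* Called at thread exit for threads whose key value is non-NULL. */
static void free_list_destroy(void *head)
{
    Pair *p = head;
    while (p != NULL) {
        Pair *next = p->rest;
        free(p);
        p = next;
    }
}

static void free_list_init(void)
{
    pthread_key_create(&free_list_key, free_list_destroy);
}

/* No mutex: each thread only ever touches its own list. */
static Pair *get_pair(void)
{
    pthread_once(&free_list_once, free_list_init);
    Pair *head = pthread_getspecific(free_list_key);
    if (head != NULL) {
        pthread_setspecific(free_list_key, head->rest);
        head->rest = NULL;
        return head;
    }
    return calloc(1, sizeof(Pair));
}

static void return_pair(Pair *p)
{
    pthread_once(&free_list_once, free_list_init);
    p->first = NULL;
    p->rest = pthread_getspecific(free_list_key);
    pthread_setspecific(free_list_key, p);
}
```

Each get/return costs one or two getspecific/setspecific calls instead of a lock acquire and release.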


Oh, also, let's see what __declspec(thread) does.

>type t.c


__declspec(thread) int a;

void F1(int);

void F2() { F1(a); }

cl -c t.c

link -dump -disasm t.obj


Dump of file t.obj

File Type: COFF OBJECT

_F2:
  00000000: 55                 push        ebp
  00000001: 8B EC              mov         ebp,esp
  00000003: A1 00 00 00 00     mov         eax,dword ptr [__tls_index]
  00000008: 64 8B 0D 00 00 00  mov         ecx,dword ptr fs:[__tls_array]
            00
  0000000F: 8B 14 81           mov         edx,dword ptr [ecx+eax*4]
  00000012: 8B 82 00 00 00 00  mov         eax,dword ptr _a[edx]
  00000018: 50                 push        eax
  00000019: E8 00 00 00 00     call        _F1
  0000001E: 83 C4 04           add         esp,4
  00000021: 5D                 pop         ebp
  00000022: C3                 ret

Note that the compiler-generated code references FS directly.
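In C terms, the three loads in that listing amount to the address computation below. This is a hypothetical model, not real declarations: tls_array stands for the per-thread pointer array reached through fs:[__tls_array], tls_index for the module's __tls_index, and offset_of_a for the displacement the linker fills in for _a[edx].

```c
#include <stddef.h>

/* Hypothetical model of:  mov eax,[__tls_index]
                           mov ecx,fs:[__tls_array]
                           mov edx,[ecx+eax*4]
                           mov eax,_a[edx]              */
static int model_load_tls_int(char **tls_array, size_t tls_index,
                              size_t offset_of_a)
{
    char *module_block = tls_array[tls_index];    /* edx: this module's TLS block */
    return *(int *)(module_block + offset_of_a);  /* eax: this thread's copy of a */
}
```

As with TlsGetValue, it is all user-mode loads; the difference is that the compiler inlines them rather than calling a function.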

The optimized version is:

Dump of file t.obj

File Type: COFF OBJECT

_F2:
  00000000: A1 00 00 00 00     mov         eax,dword ptr [__tls_index]
  00000005: 64 8B 0D 00 00 00  mov         ecx,dword ptr fs:[__tls_array]
            00
  0000000C: 8B 14 81           mov         edx,dword ptr [ecx+eax*4]
  0000000F: 8B 82 00 00 00 00  mov         eax,dword ptr _a[edx]
  00000015: 50                 push        eax
  00000016: E8 00 00 00 00     call        _F1
  0000001B: 59                 pop         ecx
  0000001C: C3                 ret

 - Jay


> To: hosking at cs.purdue.edu
> Date: Fri, 17 Oct 2008 01:32:28 -0700
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
>
> Ok I am sorry I am slow to pick up on this.
>
> I take it the problem is actually the Upthread.getspecific routine,
> which itself calls something get_curthread somewhere inside pthreads,
> which in turn involves a context switch to the supervisor---the identity
> of the current thread is just not accessible anywhere in user space.
> Also explains why this program runs faster with my old PM3, which uses
> longjmp threads.
>
> The only way to avoid it (really) is to pass a pointer to the
> Thread.T of the currently executing thread in the activation record
> of *every* procedure, so that allocators can find it when necessary....
> but that is very expensive in terms of stack memory.
>
> Or I can just make a structure like that that I pass around where
> I need it in my own program. Thread-specific and user-managed.
>
> I believe I have just answered all my own questions, but I hope
> Tony will correct me if my answers are incorrect.
>
> Mika
>
> Tony Hosking writes:
>>I suspect part of the overhead of allocation in the new code is the
>>need for thread-local allocation buffers, which means we need to
>>access thread-local state. We really need an efficient way to do
>>that, but pthreads thread-local accesses may be what is killing you.
>>
>>On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>>
>>> Hi Tony,
>>>
>>> I figured you would chime in!
>>>
>>> Yes, @M3noincremental seems to make things consistently a tad bit
>>> slower (but a very small difference), on both FreeBSD and Linux.
>>> @M3nogc makes a bigger difference, of course.
>>>
>>> Unfortunately I seem to have lost the code that did a lot of memory
>>> allocations. My tricks (as described in the email---and others!)
>>> have removed most of the troublesome memory allocations, but now
>>> I'm stuck with the mutex instead...
>>>
>>> Mika
>>>
>>> Tony Hosking writes:
>>>> Have you tried running @M3noincremental?
>>>>
>>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> As I mentioned in an earlier email about printing structures (thanks
>>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>>> Modula-3. It's a Scheme interpreter, loosely based on Peter
>>>>> Norvig's
>>>>> JScheme for Java (well it was at first strongly based, but more and
>>>>> more loosely, if you know what I mean...)
>>>>>
>>>>> I expected that the performance of the interpreter would be much
>>>>> better in Modula-3 than in Java, and I have been testing on two
>>>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3,
>>>>> and the other is CM3 on a recent Debian system. What I am finding
>>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3
>>>>> (getting
>>>>> close to ten times as fast on some tasks at this point), but on
>>>>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>>>>
>>>>> When I started, with code that was essentially equivalent to
>>>>> JScheme,
>>>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to
>>>>> spend most of its time in (surprise, surprise!) memory allocation
>>>>> and garbage collection. The speedup I have achieved between the
>>>>> first implementation and now was due to the use of Modula-3
>>>>> constructs
>>>>> that are superior to Java's, such as the use of arrays of RECORDs
>>>>> to make small stacks rather than linked lists. (I get readable
>>>>> code with much fewer memory allocations and GC work.)
>>>>>
>>>>> Now, since this is an interpreter, I as the implementer have limited
>>>>> control over how much memory is allocated and freed, and where it is
>>>>> needed. However, I can sometimes fall back on C-style memory
>>>>> management,
>>>>> but I would like to do it in a safe way. For instance, I have
>>>>> special-cased
>>>>> evaluation of Scheme primitives, as follows.
>>>>>
>>>>> Under the "normal" implementation, a list of things to evaluate is
>>>>> built up, passed to an evaluation function, and then the GC is left
>>>>> to sweep up the mess. The problem is that there are various tricky
>>>> routes by which references can escape the evaluator, so you can't
>>>>> just assume that what you put in is going to be dead right after
>>>>> an eval and free it. Instead, I set a flag in the evaluator, which
>>>>> is TRUE if it is OK to free the list after the eval and FALSE if
>>>>> it's unclear (in which case the problem is left up to the GC).
>>>>>
>>>>> For the vast majority of Scheme primitives, one can indeed free the
>>>>> list right after the eval. Now of course I am not interested
>>>>> in unsafe code, so what I do is this:
>>>>>
>>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>>
>>>>> VAR
>>>>> mu := NEW(MUTEX);
>>>>> free : Pair := NIL;
>>>>>
>>>>> PROCEDURE GetPair() : Pair =
>>>>> BEGIN
>>>>> LOCK mu DO
>>>>> IF free # NIL THEN
>>>>> TRY
>>>>> RETURN free
>>>>> FINALLY
>>>>> free := free.rest
>>>>> END
>>>>> END
>>>>> END;
>>>>> RETURN NEW(Pair)
>>>>> END GetPair;
>>>>>
>>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>>> BEGIN
>>>>> cons.first := NIL;
>>>>> LOCK mu DO
>>>>> cons.rest := free;
>>>>> free := cons
>>>>> END
>>>>> END ReturnPair;
>>>>>
>>>>> my eval code looks like
>>>>>
>>>>> VAR okToFree : BOOLEAN; BEGIN
>>>>>
>>>>> args := GetPair(); ...
>>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>>
>>>>> IF okToFree THEN ReturnPair(args) END;
>>>>> RETURN result
>>>>> END
>>>>>
>>>>> and this does work well. In fact it speeds up the Linux
>>>>> implementation
>>>>> by almost 100% to recycle the lists like this *just* for the
>>>>> evaluation of Scheme primitives.
>>>>>
>>>>> But it's still ugly, isn't it? There's a mutex, and a global
>>>>> variable. And yes, the time spent messing with the mutex is
>>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>>> (and that is coming!)
>>>>>
>>>>> So I'm thinking, what I really want is a structure that is attached
>>>>> to my current Thread.T. I want to be able to access just a single
>>>>> pointer (like the free list) but be sure it is unique to my current
>>>>> thread. No locking would be necessary if I could do this.
>>>>>
>>>>> Does anyone have an elegant solution that does something like this?
>>>>> Thread-specific "static" variables? Just one REFANY would be enough
>>>>> for a lot of uses... seems to me this should be a frequently
>>>>> occurring problem?
>>>>>
>>>>> Best regards,
>>>>> Mika
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>


From mika at async.caltech.edu  Sat Oct 18 01:00:28 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 16:00:28 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 22:42:35 -0000."
	<COL101-W48200DF8FB7269A7B2E229E6320@phx.gbl> 
Message-ID: <200810172300.m9HN0SfN008554@camembert.async.caltech.edu>


No, I didn't mean that it *necessarily* involves a context switch.
Obviously it doesn't, because user-level threading never needs to do
a "kernel" context switch (it does its own switching, of course, but
I don't see why it would need that just to get or set a variable).

I just meant that looking at the (C) implementation of pthreads I
have (on FreeBSD), on that system, it does seem to, as the code in
question is marked as "kernel code".

In any case I think I have been able to solve my particular problem
by identifying a data structure that is inherently only accessed
from a single thread (in my program) and attaching my memory recycling
trickery to that particular structure.  I get very little memory
allocation/GC and no need for locks at all, which is precisely the
effect I was going for.

I am still a little bit concerned about the performance of CM3-generated
code but the main culprit appears to be TYPECASE/ISTYPE now, far
from garbage collectors and thread libraries.  I'll send an update
if I can find something egregiously inefficient.

    Mika

Jay writes:
>
>Right and wrong.
>
>Right that Tony was referring to Upthread.getspecific. Or on Windows WinBase.TlsGetValue.
>Wrong that this necessarily incurs a switch to the supervisor/kernel, and perhaps wrong to call that a "context switch". It depends on the operating system.
>
>I will explain.
>
>On Windows/x86, the FS register points to a partly documented per-thread data structure.
>C and C++ exception handling use FS:0.
>Disassemble any code. You'll find it is used. Not by Modula-3 though.
>
>Disassemble TlsGetValue.
>
> cdb /z %windir%\system32\kernel32.dll  
>
>0:000> uf kernel32!TlsGetValue
>kernel32!TlsGetValue:
...


From mika at async.caltech.edu  Sat Oct 18 10:41:30 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Sat, 18 Oct 2008 01:41:30 -0700
Subject: [M3devel] Fortran
Message-ID: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu>


Ok now in the realm of crazy questions---and I apologize to those
whose inboxes I clog with some of my emails...

If there is anyone out there in Modula-3-ether who has ever written
or heard of ...

  an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs

... would he please make himself known to me?  (I have a Scheme
interpreter to trade...)

    Mika


From lemming at henning-thielemann.de  Sat Oct 18 17:34:50 2008
From: lemming at henning-thielemann.de (Henning Thielemann)
Date: Sat, 18 Oct 2008 17:34:50 +0200 (MEST)
Subject: [M3devel] Fortran
In-Reply-To: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu>
References: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu>
Message-ID: <Pine.SOC.4.64.0810181646120.28054@haydn.informatik.uni-halle.de>


On Sat, 18 Oct 2008, Mika Nystrom wrote:

> Ok now in the realm of crazy questions---and I apologize to those
> whose inboxes I clog with some of my emails...
>
> If there is anyone out there in Modula-3-ether who has ever written
> or heard of ...
>
>  an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs
>
> ... would he please make himself known to me?  (I have a Scheme
> interpreter to trade...)

I have written a program for generating Modula-3 interfaces for LAPACK 
(linear algebra routines) using m3coco. But I'm afraid that my Fortran 
parser works only for LAPACK and no other library. I have just copied the 
CVS files to
    http://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/m3/pm3/language/parsing/m3coco/test/?cvsroot=PM3
   Before you check this out, I might move it to a different location, 
maybe cm3/m3-tools, if this is more appropriate. (Maybe you also need the 
revised m3coco version, which I only have on a branch, and never tried to 
merge it back to HEAD.)


While searching my own code in the net, I found some nice interviews with 
Luca Cardelli:
   http://www.wikio.com/technology/development/modula-3


From mika at async.caltech.edu  Tue Oct 21 13:05:01 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 21 Oct 2008 04:05:01 -0700
Subject: [M3devel] CM3 on Mac OS X Tiger
Message-ID: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>

Hello everyone,

Sorry if I have asked this before---I feel I must have, and Tony
probably answered it, too, but I can't find it anywhere in my email
archives.

It looks like I finally upgraded my Mac to Tiger a half year ago,
and everything broke.  (Modula-3, emacs, make, etc etc etc etc.)
I am finally getting around to fixing it.  Now I am trying to
compile CM3 in accordance with Tony's instructions as of June 24, 2007:

(short quote here)
> cd ~/cm3-cvs
> mkdir boot
> cd boot
> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz
> ./cminstall

Now you will have some kind of cm3 installed, presumably in
/usr/local/cm3/bin/cm3.

Make sure you have a fresh CVS checkout in directory cm3 (let's
assume this is in your home directory ~/cm3).  Also, make sure you
have an up-to-date version of the CM3 backend compiler cm3cg
installed by executing the following:

STEP 0:

export CM3=/usr/local/cm3/bin/cm3
cd ~/cm3/m3-sys/m3cc
$CM3
$CM3 -ship

You can skip this last step if you know your backend compiler is up
to date.

Now, let's build the new compiler from scratch (this is the sequence
I use regularly to test changes to the run-time system whenever I
make them):

STEP 1:

cd ~/cm3/m3-libs/m3core
$CM3
$CM3 -ship
(end short quote, there's much more)

What happens is that when building m3core, my compiler is building
it against the interfaces in /usr/local/cm3, NOT the interfaces
within m3core itself:

--- building in PPC_DARWIN ---

ignoring ../src/m3overrides

new source -> compiling RTCollector.m3
"../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX)
"../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX)
"../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD)
"../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD)
4 errors encountered
stale imports -> compiling RTDebug.m3

Fatal Error: bad version stamps: RTDebug.m3

version stamp mismatch: Compiler.Platform
  <df3c2b13d1d385ee> => RTDebug.m3
  <da77490d024222ef> => Compiler.i3  
version stamp mismatch: Compiler.ThisPlatform
  <8b5a6f513e082750> => RTDebug.m3
  <8e110d4fed998051> => Compiler.i3  

I feel like I should REALLY know the answer to this, but how do I 
get the compiler to use only the local sources and not attempt
to compile things with reference to the already-installed 
interfaces?

    Mika


From hosking at cs.purdue.edu  Tue Oct 21 13:21:36 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 12:21:36 +0100
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>
References: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>
Message-ID: <27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu>

This is a phase ordering problem that arises when you use an old  
compiler to compile newer sources.  It really should be fixed  
somehow.  In any case, the problem is those lines in RTCollector at  
the bottom (I deleted them yesterday on the main trunk) that refer to  
values supposedly built in to the compiler (which are not there for  
the old binary you are using).  I think if you delete those lines then  
you should be OK.  Once you have a new compiler bootstrapped (with
those configuration values built in), you should be able to compile
that code, although on the main trunk those lines are gone as of
yesterday anyway.


On 21 Oct 2008, at 12:05, Mika Nystrom wrote:

> Hello everyone,
>
> Sorry if I have asked this before---I feel I must have, and Tony
> probably answered it, too, but I can't find it anywhere in my email
> archives.
>
> It looks like I finally upgraded my Mac to Tiger a half year ago,
> and everything broke.  (Modula-3, emacs, make, etc etc etc etc.)
> I am finally getting around to fixing it.  Now I am trying to
> compile CM3 in accordance with Tony's instructions as of June 24,  
> 2007:
>
> (short quote here)
>> cd ~/cm3-cvs
>> mkdir boot
>> cd boot
>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz
>> ./cminstall
>
> Now you will have some kind of cm3 installed, presumably in
> /usr/local/cm3/bin/cm3.
>
> Make sure you have a fresh CVS checkout in directory cm3 (let's
> assume this is in your home directory ~/cm3).  Also, make sure you
> have an up-to-date version of the CM3 backend compiler cm3cg
> installed by executing the following:
>
> STEP 0:
>
> export CM3=/usr/local/cm3/bin/cm3
> cd ~/cm3/m3-sys/m3cc
> $CM3
> $CM3 -ship
>
> You can skip this last step if you know your backend compiler is up
> to date.
>
> Now, let's build the new compiler from scratch (this is the sequence
> I use regularly to test changes to the run-time system whenever I
> make them):
>
> STEP 1:
>
> cd ~/cm3/m3-libs/m3core
> $CM3
> $CM3 -ship
> (end short quote, there's much more)
>
> What happens is that when building m3core, my compiler is building
> it against the interfaces in /usr/local/cm3, NOT the interfaces
> within m3core itself:
>
> --- building in PPC_DARWIN ---
>
> ignoring ../src/m3overrides
>
> new source -> compiling RTCollector.m3
> "../src/runtime/common/RTCollector.m3", line 2914: unknown  
> qualification '.' (AMD64_LINUX)
> "../src/runtime/common/RTCollector.m3", line 2915: unknown  
> qualification '.' (SPARC32_LINUX)
> "../src/runtime/common/RTCollector.m3", line 2916: unknown  
> qualification '.' (SPARC64_OPENBSD)
> "../src/runtime/common/RTCollector.m3", line 2917: unknown  
> qualification '.' (PPC32_OPENBSD)
> 4 errors encountered
> stale imports -> compiling RTDebug.m3
>
> Fatal Error: bad version stamps: RTDebug.m3
>
> version stamp mismatch: Compiler.Platform
>  <df3c2b13d1d385ee> => RTDebug.m3
>  <da77490d024222ef> => Compiler.i3
> version stamp mismatch: Compiler.ThisPlatform
>  <8b5a6f513e082750> => RTDebug.m3
>  <8e110d4fed998051> => Compiler.i3
>
> I feel like I should REALLY know the answer to this, but how do I
> get the compiler to use only the local sources and not attempt
> to compile things with reference to the already-installed
> interfaces?
>
>    Mika


From hosking at cs.purdue.edu  Tue Oct 21 16:54:58 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 15:54:58 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
Message-ID: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>

I have one more question that I forgot to ask before.  Did you  
evaluate performance with -O3 optimization in the backend?

Generally, I have the following in my m3_backend specs so that turning  
on optimization results in -O3 (and lots of lovely inlining):

proc m3_backend (source, object, optimize, debug) is
   local args =
   [
     "-m32",
     "-quiet",
     source,
     "-o",
     object,
     % fPIC really is needed here, despite man gcc saying it is the default.
     % This is because man gcc is about Apple's gcc but m3cg is
     % built from FSF source.
     "-fPIC",
     "-fno-reorder-blocks"
   ]
   if optimize  args += "-O3"  end
   if debug     args += "-gstabs"  end
   if M3_PROFILING args += "-p" end
   return try_exec (m3back, args)
end


On 17 Oct 2008, at 09:32, Mika Nystrom wrote:

> Ok I am sorry I am slow to pick up on this.
>
> I take it the problem is actually the Upthread.getspecific routine,
> which itself calls something get_curthread somewhere inside pthreads,
> which in turn involves a context switch to the supervisor---the  
> identity
> of the current thread is just not accessible anywhere in user space.
> Also explains why this program runs faster with my old PM3, which uses
> longjmp threads.
>
> The only way to avoid it (really) is to pass a pointer to the
> Thread.T of the currently executing thread in the activation record
> of *every* procedure, so that allocators can find it when  
> necessary....
> but that is very expensive in terms of stack memory.
>
> Or I can just make a structure like that that I pass around where
> I need it in my own program.  Thread-specific and user-managed.
>
> I believe I have just answered all my own questions, but I hope
> Tony will correct me if my answers are incorrect.
>
>    Mika
>
> Tony Hosking writes:
>> I suspect part of the overhead of allocation in the new code is the
>> need for thread-local allocation buffers, which means we need to
>> access thread-local state.  We really need an efficient way to do
>> that, but pthreads thread-local accesses may be what is killing you.
>>
>> On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>>
>>> Hi Tony,
>>>
>>> I figured you would chime in!
>>>
>>> Yes, @M3noincremental seems to make things consistently a tad bit
>>> slower (but a very small difference), on both FreeBSD and Linux.
>>> @M3nogc makes a bigger difference, of course.
>>>
>>> Unfortunately I seem to have lost the code that did a lot of memory
>>> allocations.  My tricks (as described in the email---and others!)
>>> have removed most of the troublesome memory allocations, but now
>>> I'm stuck with the mutex instead...
>>>
>>>     Mika
>>>
>>> Tony Hosking writes:
>>>> Have you tried running @M3noincremental?
>>>>
>>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> As I mentioned in an earlier email about printing structures  
>>>>> (thanks
>>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>>> Modula-3.  It's a Scheme interpreter, loosely based on Peter
>>>>> Norvig's
>>>>> JScheme for Java (well it was at first strongly based, but more  
>>>>> and
>>>>> more loosely, if you know what I mean...)
>>>>>
>>>>> I expected that the performance of the interpreter would be much
>>>>> better in Modula-3 than in Java, and I have been testing on two
>>>>> different systems.  One is my ancient FreeBSD-4.11 with an old  
>>>>> PM3,
>>>>> and the other is CM3 on a recent Debian system.  What I am finding
>>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3
>>>>> (getting
>>>>> close to ten times as fast on some tasks at this point), but on
>>>>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>>>>
>>>>> When I started, with code that was essentially equivalent to
>>>>> JScheme,
>>>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>>>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>>>>> spend most of its time in (surprise, surprise!) memory allocation
>>>>> and garbage collection.  The speedup I have achieved between the
>>>>> first implementation and now was due to the use of Modula-3
>>>>> constructs
>>>>> that are superior to Java's, such as the use of arrays of RECORDs
>>>>> to make small stacks rather than linked lists.  (I get readable
>>>>> code with much fewer memory allocations and GC work.)
>>>>>
>>>>> Now, since this is an interpreter, I as the implementer have  
>>>>> limited
>>>>> control over how much memory is allocated and freed, and where  
>>>>> it is
>>>>> needed.  However, I can sometimes fall back on C-style memory
>>>>> management,
>>>>> but I would like to do it in a safe way.  For instance, I have
>>>>> special-cased
>>>>> evaluation of Scheme primitives, as follows.
>>>>>
>>>>> Under the "normal" implementation, a list of things to evaluate is
>>>>> built up, passed to an evaluation function, and then the GC is  
>>>>> left
>>>>> to sweep up the mess.  The problem is that there are various  
>>>>> tricky
>>>> routes by which references can escape the evaluator, so you can't
>>>>> just assume that what you put in is going to be dead right after
>>>>> an eval and free it.  Instead, I set a flag in the evaluator,  
>>>>> which
>>>>> is TRUE if it is OK to free the list after the eval and FALSE if
>>>>> it's unclear (in which case the problem is left up to the GC).
>>>>>
>>>>> For the vast majority of Scheme primitives, one can indeed free  
>>>>> the
>>>>> list right after the eval.  Now of course I am not interested
>>>>> in unsafe code, so what I do is this:
>>>>>
>>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>>
>>>>> VAR
>>>>> mu := NEW(MUTEX);
>>>>> free : Pair := NIL;
>>>>>
>>>>> PROCEDURE GetPair() : Pair =
>>>>> BEGIN
>>>>>  LOCK mu DO
>>>>>    IF free # NIL THEN
>>>>>      TRY
>>>>>        RETURN free
>>>>>      FINALLY
>>>>>        free := free.rest
>>>>>      END
>>>>>    END
>>>>>  END;
>>>>>  RETURN NEW(Pair)
>>>>> END GetPair;
>>>>>
>>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>>> BEGIN
>>>>>  cons.first := NIL;
>>>>>  LOCK mu DO
>>>>>    cons.rest := free;
>>>>>    free := cons
>>>>>  END
>>>>> END ReturnPair;
>>>>>
>>>>> my eval code looks like
>>>>>
>>>>> VAR okToFree : BOOLEAN; BEGIN
>>>>>
>>>>> args := GetPair(); ...
>>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>>
>>>>> IF okToFree THEN ReturnPair(args) END;
>>>>> RETURN result
>>>>> END
>>>>>
>>>>> and this does work well.  In fact it speeds up the Linux
>>>>> implementation
>>>>> by almost 100% to recycle the lists like this *just* for the
>>>>> evaluation of Scheme primitives.
>>>>>
>>>>> But it's still ugly, isn't it?  There's a mutex, and a global
>>>>> variable.  And yes, the time spent messing with the mutex is
>>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>>> (and that is coming!)
>>>>>
>>>>> So I'm thinking, what I really want is a structure that is  
>>>>> attached
>>>>> to my current Thread.T.  I want to be able to access just a single
>>>>> pointer (like the free list) but be sure it is unique to my  
>>>>> current
>>>>> thread.  No locking would be necessary if I could do this.
>>>>>
>>>>> Does anyone have an elegant solution that does something like  
>>>>> this?
>>>>> Thread-specific "static" variables?  Just one REFANY would be  
>>>>> enough
>>>>> for a lot of uses...  seems to me this should be a frequently
>>>>> occurring problem?
>>>>>
>>>>>   Best regards,
>>>>>     Mika
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>


From hosking at cs.purdue.edu  Tue Oct 21 17:17:24 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 16:17:24 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>
References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
	<34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>
Message-ID: <1396C14A-B23D-4D19-804B-B1627B44106F@cs.purdue.edu>

Also, turn off assertions.

On 21 Oct 2008, at 15:54, Tony Hosking wrote:

> I have one more question that I forgot to ask before.  Did you  
> evaluate performance with -O3 optimization in the backend?
>
> Generally, I have the following in my m3_backend specs so that  
> turning on optimization results in -O3 (and lots of lovely inlining):
>
> proc m3_backend (source, object, optimize, debug) is
>  local args =
>  [
>    "-m32",
>    "-quiet",
>    source,
>    "-o",
>    object,
>    % fPIC really is needed here, despite man gcc saying it is the default.
>    % This is because man gcc is about Apple's gcc but m3cg is
>    % built from FSF source.
>    "-fPIC",
>    "-fno-reorder-blocks"
>  ]
>  if optimize  args += "-O3"  end
>  if debug     args += "-gstabs"  end
>  if M3_PROFILING args += "-p" end
>  return try_exec (m3back, args)
> end
>
>
> On 17 Oct 2008, at 09:32, Mika Nystrom wrote:
>
>> Ok I am sorry I am slow to pick up on this.
>>
>> I take it the problem is actually the Upthread.getspecific routine,
>> which itself calls something get_curthread somewhere inside pthreads,
>> which in turn involves a context switch to the supervisor---the  
>> identity
>> of the current thread is just not accessible anywhere in user space.
>> Also explains why this program runs faster with my old PM3, which  
>> uses
>> longjmp threads.
>>
>> The only way to avoid it (really) is to pass a pointer to the
>> Thread.T of the currently executing thread in the activation record
>> of *every* procedure, so that allocators can find it when  
>> necessary....
>> but that is very expensive in terms of stack memory.
>>
>> Or I can just make a structure like that that I pass around where
>> I need it in my own program.  Thread-specific and user-managed.
>>
>> I believe I have just answered all my own questions, but I hope
>> Tony will correct me if my answers are incorrect.
>>
>>   Mika
>>
>> Tony Hosking writes:
>>> I suspect part of the overhead of allocation in the new code is the
>>> need for thread-local allocation buffers, which means we need to
>>> access thread-local state.  We really need an efficient way to do
>>> that, but pthreads thread-local accesses may be what is killing you.
>>>
>>> On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>>>
>>>> Hi Tony,
>>>>
>>>> I figured you would chime in!
>>>>
>>>> Yes, @M3noincremental seems to make things consistently a tad bit
>>>> slower (but a very small difference), on both FreeBSD and Linux.
>>>> @M3nogc makes a bigger difference, of course.
>>>>
>>>> Unfortunately I seem to have lost the code that did a lot of memory
>>>> allocations.  My tricks (as described in the email---and others!)
>>>> have removed most of the troublesome memory allocations, but now
>>>> I'm stuck with the mutex instead...
>>>>
>>>>    Mika
>>>>
>>>> Tony Hosking writes:
>>>>> Have you tried running @M3noincremental?
>>>>>
>>>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>>>
>>>>>> Hello Modula-3 people,
>>>>>>
>>>>>> As I mentioned in an earlier email about printing structures  
>>>>>> (thanks
>>>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>>>> Modula-3.  It's a Scheme interpreter, loosely based on Peter
>>>>>> Norvig's
>>>>>> JScheme for Java (well it was at first strongly based, but more  
>>>>>> and
>>>>>> more loosely, if you know what I mean...)
>>>>>>
>>>>>> I expected that the performance of the interpreter would be much
>>>>>> better in Modula-3 than in Java, and I have been testing on two
>>>>>> different systems.  One is my ancient FreeBSD-4.11 with an old  
>>>>>> PM3,
>>>>>> and the other is CM3 on a recent Debian system.  What I am  
>>>>>> finding
>>>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3
>>>>>> (getting
>>>>>> close to ten times as fast on some tasks at this point), but on
>>>>>> Linux/CM3 it is much closer in speed to JScheme than I would  
>>>>>> like.
>>>>>>
>>>>>> When I started, with code that was essentially equivalent to JScheme,
>>>>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>>>>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>>>>>> spend most of its time in (surprise, surprise!) memory allocation
>>>>>> and garbage collection.  The speedup I have achieved between the
>>>>>> first implementation and now was due to the use of Modula-3 constructs
>>>>>> that are superior to Java's, such as the use of arrays of RECORDs
>>>>>> to make small stacks rather than linked lists.  (I get readable
>>>>>> code with far fewer memory allocations and less GC work.)
>>>>>>
>>>>>> Now, since this is an interpreter, I as the implementer have limited
>>>>>> control over how much memory is allocated and freed, and where it is
>>>>>> needed.  However, I can sometimes fall back on C-style memory
>>>>>> management, but I would like to do it in a safe way.  For instance,
>>>>>> I have special-cased evaluation of Scheme primitives, as follows.
>>>>>>
>>>>>> Under the "normal" implementation, a list of things to evaluate is
>>>>>> built up, passed to an evaluation function, and then the GC is left
>>>>>> to sweep up the mess.  The problem is that there are various tricky
>>>>>> routes by which references can escape the evaluator, so you can't
>>>>>> just assume that what you put in is going to be dead right after
>>>>>> an eval and free it.  Instead, I set a flag in the evaluator, which
>>>>>> is TRUE if it is OK to free the list after the eval and FALSE if
>>>>>> it's unclear (in which case the problem is left up to the GC).
>>>>>>
>>>>>> For the vast majority of Scheme primitives, one can indeed free the
>>>>>> list right after the eval.  Now of course I am not interested
>>>>>> in unsafe code, so what I do is this:
>>>>>>
>>>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>>>
>>>>>> VAR
>>>>>> mu := NEW(MUTEX);
>>>>>> free : Pair := NIL;
>>>>>>
>>>>>> PROCEDURE GetPair() : Pair =
>>>>>> BEGIN
>>>>>> LOCK mu DO
>>>>>>   IF free # NIL THEN
>>>>>>     TRY
>>>>>>       RETURN free
>>>>>>     FINALLY
>>>>>>       free := free.rest
>>>>>>     END
>>>>>>   END
>>>>>> END;
>>>>>> RETURN NEW(Pair)
>>>>>> END GetPair;
>>>>>>
>>>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>>>> BEGIN
>>>>>> cons.first := NIL;
>>>>>> LOCK mu DO
>>>>>>   cons.rest := free;
>>>>>>   free := cons
>>>>>> END
>>>>>> END ReturnPair;
>>>>>>
>>>>>> my eval code looks like
>>>>>>
>>>>>> VAR okToFree : BOOLEAN; BEGIN
>>>>>>
>>>>>> args := GetPair(); ...
>>>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>>>
>>>>>> IF okToFree THEN ReturnPair(args) END;
>>>>>> RETURN result
>>>>>> END
>>>>>>
>>>>>> and this does work well.  In fact, recycling the lists like this
>>>>>> *just* for the evaluation of Scheme primitives speeds up the Linux
>>>>>> implementation by almost 100%.
>>>>>>
>>>>>> But it's still ugly, isn't it?  There's a mutex, and a global
>>>>>> variable.  And yes, the time spent messing with the mutex is
>>>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>>>> (and that is coming!)
>>>>>>
>>>>>> So I'm thinking, what I really want is a structure that is attached
>>>>>> to my current Thread.T.  I want to be able to access just a single
>>>>>> pointer (like the free list) but be sure it is unique to my current
>>>>>> thread.  No locking would be necessary if I could do this.
>>>>>>
>>>>>> Does anyone have an elegant solution that does something like this?
>>>>>> Thread-specific "static" variables?  Just one REFANY would be enough
>>>>>> for a lot of uses...  seems to me this should be a frequently
>>>>>> occurring problem?
>>>>>>
>>>>>>  Best regards,
>>>>>>    Mika
>>>>>>
>
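A possible alternative to true thread-local storage (this sketch does not assume CM3 provides it) is to make the free list an ordinary object owned by a single interpreter, and to run at most one interpreter per thread. With no sharing there is nothing to lock. `Pool`, `Get`, `Put` and the extra `nextFree` link field are hypothetical names used for illustration only. Because the separate link field is typed `Pair`, popping the list never has to narrow a REFANY back to `Pair`.

```modula3
MODULE Main;
(* Sketch: a Pair free list owned by one interpreter/thread, so Get and
   Put need no MUTEX.  Pool, Get, Put and the nextFree field are
   hypothetical; they are not part of any CM3 library. *)

IMPORT IO, Fmt;

TYPE
  Pair = OBJECT
    first, rest: REFANY;
    nextFree: Pair := NIL;   (* link used only while on the free list *)
  END;

  (* Each thread allocates its own Pool and never shares it. *)
  Pool = REF RECORD
    free: Pair := NIL;
    recycled: CARDINAL := 0;
  END;

PROCEDURE Get (pool: Pool): Pair =
  VAR p := pool.free;
  BEGIN
    IF p = NIL THEN RETURN NEW(Pair) END;  (* list empty: allocate *)
    pool.free := p.nextFree;               (* pop, no locking needed *)
    p.nextFree := NIL;
    INC(pool.recycled);
    RETURN p;
  END Get;

PROCEDURE Put (pool: Pool; p: Pair) =
  BEGIN
    p.first := NIL;          (* drop references so the GC can reclaim them *)
    p.rest := NIL;
    p.nextFree := pool.free; (* push onto this thread's list *)
    pool.free := p;
  END Put;

VAR
  myPool := NEW(Pool);
  a, b, c: Pair;

BEGIN
  a := Get(myPool);          (* empty pool: fresh allocation *)
  Put(myPool, a);            (* a goes back on this thread's list *)
  b := Get(myPool);          (* reuses a *)
  c := Get(myPool);          (* pool empty again: fresh allocation *)
  IO.Put("b reuses a: " & Fmt.Bool(b = a) & "\n");
  IO.Put("c is fresh: " & Fmt.Bool(c # a) & "\n");
  IO.Put("recycled: " & Fmt.Int(myPool.recycled) & "\n");
END Main.
```

The catch is the ownership rule: if a `Pair` taken from one thread's pool could ever be returned to another thread's pool, the lock (or some other synchronization) is needed again.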


From mika at async.caltech.edu  Tue Oct 21 22:18:07 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 21 Oct 2008 13:18:07 -0700
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: Your message of "Tue, 21 Oct 2008 12:21:36 BST."
	<27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu> 
Message-ID: <200810212018.m9LKI81o019865@camembert.async.caltech.edu>

Hi Tony,

Thanks for helping, as usual!

I ran into this now, is this also a bootstrapping problem?  (Moving
on to building libm3, cleared out existing PPC_DARWIN, have rebuilt
m3cc... only see a single version of Compiler.i3 anywhere...)

Here's the log:

[lapdog:~/cm3/m3-libs/libm3] mika% $CM3 && $CM3 -ship
--- building in PPC_DARWIN ---

ignoring ../src/m3overrides

new source -> compiling Atom.i3
new source -> compiling AtomList.i3
new source -> compiling OSError.i3
new source -> compiling File.i3
new source -> compiling RegularFile.i3
new source -> compiling Pipe.i3
new source -> compiling TextSeq.i3
new source -> compiling Pathname.i3
new source -> compiling FS.i3
new source -> compiling Process.i3
new source -> compiling Socket.i3
new source -> compiling Terminal.i3
new source -> compiling FS.m3
new source -> compiling Terminal.m3
new source -> compiling RegularFile.m3
new source -> compiling Pipe.m3
new source -> compiling Socket.m3
new source -> compiling OSConfig.i3
new source -> compiling OSErrorPosix.i3
new source -> compiling Fmt.i3
new source -> compiling OSErrorPosix.m3
new source -> compiling FilePosix.i3
new source -> compiling FilePosix.m3
new source -> compiling FSPosix.m3
new source -> compiling PipePosix.m3
new source -> compiling PathnamePosix.m3
new source -> compiling SocketPosix.m3

Fatal Error: bad version stamps: SocketPosix.m3

version stamp mismatch: Compiler.Platform
  <df3c2b13d1d385ee> => SocketPosix.m3
  <da77490d024222ef> => Compiler.i3  
version stamp mismatch: Compiler.ThisPlatform
  <8b5a6f513e082750> => SocketPosix.m3
  <8e110d4fed998051> => Compiler.i3  
[lapdog:~/cm3/m3-libs/libm3] mika% 

Tony Hosking writes:
>This is a phase ordering problem that arises when you use an old  
>compiler to compile newer sources.  It really should be fixed  
>somehow.  In any case, the problem is those lines in RTCollector at  
>the bottom (I deleted them yesterday on the main trunk) that refer to  
>values supposedly built in to the compiler (which are not there for  
>the old binary you are using).  I think if you delete those lines then  
>you should be OK.  Once you have a new compiler bootstrapped (with  
>those configuration values available built in) then you should be able  
>to compile that code (excepting that I just deleted those lines  
>yesterday).
>
>
>On 21 Oct 2008, at 12:05, Mika Nystrom wrote:
>
>> Hello everyone,
>>
>> Sorry if I have asked this before---I feel I must have, and Tony
>> probably answered it, too, but I can't find it anywhere in my email
>> archives.
>>
>> It looks like I finally upgraded my Mac to Tiger a half year ago,
>> and everything broke.  (Modula-3, emacs, make, etc etc etc etc.)
>> I am finally getting around to fixing it.  Now I am trying to
>> compile CM3 in accordance with Tony's instructions as of June 24, 2007:
>>
>> (short quote here)
>>> cd ~/cm3-cvs
>>> mkdir boot
>>> cd boot
>>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz
>>> ./cminstall
>>
>> Now you will have some kind of cm3 installed, presumably in
>> /usr/local/cm3/bin/cm3.
>>
>> Make sure you have a fresh CVS checkout in directory cm3 (let's
>> assume this is in your home directory ~/cm3).  Also, make sure you
>> have an up-to-date version of the CM3 backend compiler cm3cg
>> installed by executing the following:
>>
>> STEP 0:
>>
>> export CM3=/usr/local/cm3/bin/cm3
>> cd ~/cm3/m3-sys/m3cc
>> $CM3
>> $CM3 -ship
>>
>> You can skip this last step if you know your backend compiler is up
>> to date.
>>
>> Now, let's build the new compiler from scratch (this is the sequence
>> I use regularly to test changes to the run-time system whenever I
>> make them):
>>
>> STEP 1:
>>
>> cd ~/cm3/m3-libs/m3core
>> $CM3
>> $CM3 -ship
>> (end short quote, there's much more)
>>
>> What happens is that when building m3core, my compiler is building
>> it against the interfaces in /usr/local/cm3, NOT the interfaces
>> within m3core itself:
>>
>> --- building in PPC_DARWIN ---
>>
>> ignoring ../src/m3overrides
>>
>> new source -> compiling RTCollector.m3
>> "../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX)
>> "../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX)
>> "../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD)
>> "../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD)
>> 4 errors encountered
>> stale imports -> compiling RTDebug.m3
>>
>> Fatal Error: bad version stamps: RTDebug.m3
>>
>> version stamp mismatch: Compiler.Platform
>>  <df3c2b13d1d385ee> => RTDebug.m3
>>  <da77490d024222ef> => Compiler.i3
>> version stamp mismatch: Compiler.ThisPlatform
>>  <8b5a6f513e082750> => RTDebug.m3
>>  <8e110d4fed998051> => Compiler.i3
>>
>> I feel like I should REALLY know the answer to this, but how do I
>> get the compiler to use only the local sources and not attempt
>> to compile things with reference to the already-installed
>> interfaces?
>>
>>    Mika


From hosking at cs.purdue.edu  Tue Oct 21 23:29:07 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 22:29:07 +0100
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: <200810212018.m9LKI81o019865@camembert.async.caltech.edu>
References: <200810212018.m9LKI81o019865@camembert.async.caltech.edu>
Message-ID: <BF077330-03E9-45CB-8F30-27066330331B@cs.purdue.edu>

Hmm.  Not sure.  Looks like it.



From mika at async.caltech.edu  Thu Oct 23 10:24:53 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 23 Oct 2008 01:24:53 -0700
Subject: [M3devel] NEW in RTType.m3
Message-ID: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>

Hello Modula-3 people,

Does anyone know whether there is anything that prevents using NEW
in RTType.m3?

I added a lot of memory recycling to the Scheme interpreter I am
working on, and now it seems it is spending a lot of time in Typecase
and IsSubtype.  I was wondering if it is possible to memoize IsSubtype
inside RTType.m3...  (specifically just replacing IsSubtype with an
array lookup).  

It is the nature of the interpreter that it spends a lot of time
checking types and narrowing things back and forth, as Scheme and
Modula-3 references share the same representation.

      Mika


From hosking at cs.purdue.edu  Thu Oct 23 12:10:01 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Thu, 23 Oct 2008 11:10:01 +0100
Subject: [M3devel] NEW in RTType.m3
In-Reply-To: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>
References: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>
Message-ID: <7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu>

Could be dangerous depending on module link orderings.  Might be  
better to cache your own lookups in your interpreter.
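Along the lines of that suggestion, an interpreter can memoize its own type tests, keyed on `TYPECODE(x)`, so that the general subtype test runs once per distinct typecode and later tests become an array lookup. This is a sketch under stated assumptions: the cache is touched by one thread only, typecodes do not change while the program runs, and `IsPair`, `Answer`, `Cache` and the starting size of 64 are hypothetical. Whether it actually beats the check the compiler emits for `ISTYPE` would need to be measured on the CM3 version in question.

```modula3
MODULE Main;
(* Sketch: interpreter-side cache of ISTYPE(x, Pair) results, indexed by
   TYPECODE(x).  Assumes single-threaded use and that typecodes do not
   change while the program runs.  All names here are hypothetical. *)

IMPORT IO, Fmt;

TYPE
  Pair   = OBJECT first, rest: REFANY END;
  Symbol = OBJECT name: TEXT END;
  Answer = {Unknown, Yes, No};
  Cache  = REF ARRAY OF Answer;

VAR
  pairCache := NewCache(64);   (* initial size is arbitrary *)
  slowPath  := 0;              (* how often the real ISTYPE test ran *)

PROCEDURE NewCache (n: CARDINAL): Cache =
  VAR c := NEW(Cache, n);
  BEGIN
    (* Set every entry explicitly rather than relying on NEW. *)
    FOR i := 0 TO n - 1 DO c[i] := Answer.Unknown END;
    RETURN c;
  END NewCache;

PROCEDURE Grow (tc: CARDINAL) =
  VAR bigger := NewCache(MAX(2 * NUMBER(pairCache^), tc + 1));
  BEGIN
    SUBARRAY(bigger^, 0, NUMBER(pairCache^)) := pairCache^;
    pairCache := bigger;
  END Grow;

(* TRUE iff x is a non-NIL Pair (note that ISTYPE(NIL, Pair) is TRUE). *)
PROCEDURE IsPair (x: REFANY): BOOLEAN =
  VAR tc: CARDINAL;
  BEGIN
    IF x = NIL THEN RETURN FALSE END;
    tc := TYPECODE(x);
    IF tc >= NUMBER(pairCache^) THEN Grow(tc) END;
    IF pairCache[tc] = Answer.Unknown THEN
      INC(slowPath);
      IF ISTYPE(x, Pair) THEN pairCache[tc] := Answer.Yes
      ELSE pairCache[tc] := Answer.No
      END;
    END;
    RETURN pairCache[tc] = Answer.Yes;
  END IsPair;

VAR
  p: REFANY := NEW(Pair);
  s: REFANY := NEW(Symbol, name := "x");

BEGIN
  FOR i := 1 TO 1000 DO EVAL IsPair(p); EVAL IsPair(s) END;
  IO.Put("IsPair(p) = " & Fmt.Bool(IsPair(p)) & "\n");
  IO.Put("IsPair(s) = " & Fmt.Bool(IsPair(s)) & "\n");
  (* one slow-path test per distinct typecode seen *)
  IO.Put("slow-path tests: " & Fmt.Int(slowPath) & "\n");
END Main.
```

A separate table per target type (one for `Pair`, one for `Symbol`, and so on) keeps each test to a single index, at the cost of one small array per type tested.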



From mika at async.caltech.edu  Thu Oct 23 19:29:50 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 23 Oct 2008 10:29:50 -0700
Subject: [M3devel] NEW in RTType.m3
In-Reply-To: Your message of "Thu, 23 Oct 2008 11:10:01 BST."
	<7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu> 
Message-ID: <200810231729.m9NHToMC080136@camembert.async.caltech.edu>


Well I'm not calling Typecase and IsSubtype directly---the compiler
is inserting the calls.

Here's an example of my code:

          IF x # NIL AND ISTYPE(x,Symbol) THEN
            RETURN env.lookup(x)
          ELSIF x = NIL OR NOT ISTYPE(x,Pair) THEN
            RETURN x
          ELSE

this code actually winds up in here (RTType.m3):

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t: RT0.TypeDefn;
  BEGIN
    IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
    t := Get (a);
    IF (t = NIL) THEN RETURN FALSE; END;
    IF (t.typecode = b) THEN RETURN TRUE END;
    WHILE (t.kind = ORD (TK.Obj)) DO
      IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
      t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
      IF (t = NIL) THEN RETURN FALSE; END;
      IF (t.typecode = b) THEN RETURN TRUE; END;
    END;
    IF (t.traced # 0)
      THEN RETURN (b = RT0.RefanyTypecode);
      ELSE RETURN (b = RT0.AddressTypecode);
    END;
  END IsSubtype;

Again this is an example of something where the CM3 code seems to
be hurting more than PM3, but it could be that for some reason I
have more visibility into the CM3 code, or that there's an optimization
difference (I haven't been able to investigate this fully yet).  In
any case, it's clear that if IsSubtype could be replaced with a
table lookup, this kind of code would be accelerated by potentially
a lot.

Note that while in the above example the code might be accelerated
by (in my opinion, less clear) use of TYPECODE (since I never subtype
Symbol or Pair---for now!), this is not so for some NARROWs.  The
NARROWs also wind up calling RTType.IsSubtype, and they arise because
I have types that depend on each other, and unless I want to introduce
extra complexity (new partial revelations) or stick everything in
the same interface, I am forced to NARROW something to avoid a
circular dependency of interfaces...  A method of A.T takes a B.T
and a method of B.T takes an A.T, so I make a supertype X.T s.t.
A.T <: X.T ; then I can declare B.T.m to take an X.T and NARROW it
to A.T within B.T.m... triggering a call to the above code.  (For
simplicity's sake, X.T could be REFANY or ROOT.)  An attempt to
declare B.T.m as taking A.T would lead to a circular dependency
between A and B.  The code is really rather simple and it's a shame
if you have to make it look much more complicated to avoid issues
like these which might equally well be solved by tweaking the runtime
implementation a bit.

     Mika



From mika at async.caltech.edu  Sat Oct 25 05:16:56 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 24 Oct 2008 20:16:56 -0700
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
Message-ID: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>


Dear Modula-3 people,

I had a crash in my program from a range error that I believe
shouldn't have happened the way it did, although it's not in my
code, so I'm not sure if there's a reason for the way it's done (matching
a C declaration somewhere, maybe??).

Here it is, from ThreadPosix.m3:

PROCEDURE IOWait(fd: INTEGER; read: BOOLEAN;
                  timeoutInterval: LONGREAL := -1.0D0): WaitResult =
  <*FATAL Alerted*>
  BEGIN
    self.alertable := FALSE;
    RETURN XIOWait(fd, read, timeoutInterval);
  END IOWait;

PROCEDURE IOAlertWait(fd: INTEGER; read: BOOLEAN;
                  timeoutInterval: LONGREAL := -1.0D0): WaitResult
                  RAISES {Alerted} =
  BEGIN
    self.alertable := TRUE;
    RETURN XIOWait(fd, read, timeoutInterval);
  END IOAlertWait;

PROCEDURE XIOWait (fd: CARDINAL; read: BOOLEAN; interval: LONGREAL): WaitResult
    RAISES {Alerted} =
  VAR res: INTEGER;
      fdindex := fd DIV FDSetSize;
      fdset := FDSet{fd MOD FDSetSize};
... rest omitted ...

Note that IOWait calls XIOWait.  IOWait is declared as taking an
INTEGER, but XIOWait takes a CARDINAL.

So I really should use a CARDINAL in passing to IOWait, but since
IOWait is the interface function it's not clear that I should do
that (until my program crashes after passing -1 from some carelessly
wrapped C code).  I don't like the fact that I get a range error
*inside* the library when it appears unnecessary---it should have
happened in my code, as I make the call.

Suggested improvement: declare all the FDs in SchedulerPosix.i3
(the interface that exports these routines) to be CARDINAL instead
of INTEGER.
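The point of the proposal can be shown with a small stand-alone program. `UseFD` is a hypothetical stand-in for an exported procedure whose descriptor parameter is CARDINAL (32 is an arbitrary stand-in for FDSetSize), and `CheckFD`/`BadFD` are hypothetical helpers a client could use today on descriptors coming back from wrapped C code. With a CARDINAL formal, the range check on the argument belongs to the call itself, so a negative value should be reported against the caller's code rather than inside the library.

```modula3
MODULE Main;
(* Sketch: reject a bad descriptor before it reaches a procedure whose
   fd parameter is CARDINAL.  UseFD, CheckFD, BadFD and WordBits are
   hypothetical; none of them is part of SchedulerPosix. *)

IMPORT IO, Fmt;

EXCEPTION BadFD(INTEGER);

CONST WordBits = 32;   (* arbitrary stand-in for FDSetSize *)

(* Stand-in for the proposed interface: the formal is CARDINAL, so a
   negative actual fails the range check at the call. *)
PROCEDURE UseFD (fd: CARDINAL): CARDINAL =
  BEGIN
    RETURN fd DIV WordBits;   (* like XIOWait's fdindex computation *)
  END UseFD;

(* Caller-side guard for values coming back from C. *)
PROCEDURE CheckFD (fd: INTEGER): CARDINAL RAISES {BadFD} =
  BEGIN
    IF fd < 0 THEN RAISE BadFD(fd) END;
    RETURN fd;
  END CheckFD;

PROCEDURE Attempt (raw: INTEGER) =
  BEGIN
    TRY
      WITH word = UseFD(CheckFD(raw)) DO
        IO.Put("fd " & Fmt.Int(raw) & " -> fd_set word " & Fmt.Int(word) & "\n");
      END;
    EXCEPT
    | BadFD(v) => IO.Put("rejected fd " & Fmt.Int(v) & " before the call\n");
    END;
  END Attempt;

BEGIN
  Attempt(5);
  Attempt(-1);   (* e.g. an error result from carelessly wrapped C code *)
END Main.
```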

     Mika


From hosking at cs.purdue.edu  Mon Oct 27 15:28:52 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Mon, 27 Oct 2008 14:28:52 +0000
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
In-Reply-To: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
References: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
Message-ID: <5232F2E4-3B0E-49E5-B1C8-BB4D04C60C33@cs.purdue.edu>

Sounds fair to me.



From jay.krell at cornell.edu  Thu Oct 30 22:21:09 2008
From: jay.krell at cornell.edu (Jay)
Date: Thu, 30 Oct 2008 21:21:09 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
Message-ID: <COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>


Please try this:

 http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2

std failed to build because stubgen crashed, probably due to gc.
cm3 does crash right away without @M3nogc.

Something like this:
    cd /src 
    wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2  
    cd /cm3  
    rm -rf *  
    tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2  
    cd /src/cm3/scripts/python  
    ./do-cm3-all.py realclean  
    ./upgrade.py  
    ./do-cm3-all.py realclean  
    ./do-cm3-std.py buildship  
    => it will fail at zeus, but it should get far; you'll also need some X devel packages to get that far (I had a failure for lack of libXaw, for example). I did not run any of the GUI packages, but having it build itself with itself is a decent test.

I renamed the old AMD64_LINUX archives to "1.0.0".
 http://www.opencm3.com/uploaded-archives/

This has the bug fix I committed last night to cm3cg, and therefore a 64-bit hosted cm3cg.

jay at amd64a:/cm3/bin$ file *
AMD64_LINUX: ASCII text
cm3:         ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
cm3.cfg:     ASCII English text
cm3cg:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
m3bundle:    ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
mklib:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
Unix.common: ASCII English text

Built on Debian 4.0r4 (r5 is out).
jay at amd64a:/cm3/bin$ uname -a
Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux
jay at amd64a:/cm3/bin$ dmesg | head
Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Aug 19 04:30:56 UTC 2008

Though really I couldn't do it without Visual C++ on Windows, which provides excellent find-in-files and editing; nothing else comes close. I edit on Windows and scp the files over. :)

 - Jay

________________________________

From: jay.krell at cornell.edu
To: dragisha at m3w.org; m3devel at elegosoft.com
Date: Tue, 9 Sep 2008 09:43:03 +0000
Subject: Re: [M3devel] AMD64_LINUX status


From hosking at cs.purdue.edu  Fri Oct 31 11:19:51 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 31 Oct 2008 10:19:51 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
Message-ID: <BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>

Umm, I think I found your bug with GC:

Check out "RTMachine.PointerAlignment".  You have it set to  
BITSIZE(INTEGER).  I suspect what you want is something like  
BYTESIZE(ADDRESS).  Also, "RTMachine.StackFrameAlignment" should  
probably be 2*BYTESIZE(ADDRESS).
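To see how far apart the two candidate values are on AMD64, a small program can simply print them. It assumes, as the suggestion implies, that PointerAlignment is a byte stride at which the collector probes thread stacks for possible pointers; the 4096-byte region is an arbitrary example. If that reading is right, BITSIZE(INTEGER) (64, a bit count) as a byte stride probes only one in eight pointer-sized slots, so the collector could miss live references and free storage that is still in use, which would be consistent with cm3 crashing unless run with @M3nogc.

```modula3
MODULE Main;
(* Sketch: print the alignment values discussed for AMD64_LINUX and how
   many candidate pointer slots a stack scan would probe at each stride.
   Assumes PointerAlignment is measured in bytes; StackBytes is an
   arbitrary example size. *)

IMPORT IO, Fmt;

CONST
  AsConfigured = BITSIZE(INTEGER);     (* a bit count: 64 on AMD64 *)
  Suggested    = BYTESIZE(ADDRESS);    (* a byte count: 8 on AMD64 *)
  FrameAlign   = 2 * BYTESIZE(ADDRESS);
  StackBytes   = 4096;

PROCEDURE Slots (stride: CARDINAL): CARDINAL =
  BEGIN
    RETURN StackBytes DIV stride;      (* slots probed in StackBytes *)
  END Slots;

BEGIN
  IO.Put("BITSIZE(INTEGER)    = " & Fmt.Int(AsConfigured) & "\n");
  IO.Put("BYTESIZE(ADDRESS)   = " & Fmt.Int(Suggested) & "\n");
  IO.Put("2*BYTESIZE(ADDRESS) = " & Fmt.Int(FrameAlign) & "\n");
  IO.Put("slots probed per " & Fmt.Int(StackBytes) & " bytes: "
         & Fmt.Int(Slots(Suggested)) & " at the suggested stride, "
         & Fmt.Int(Slots(AsConfigured)) & " at the configured one\n");
END Main.
```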




From jay.krell at cornell.edu  Fri Oct 31 14:52:43 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 13:52:43 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl> 
	<BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
Message-ID: <COL101-W42D125019F4B7531A26BA6E6200@phx.gbl>


Tony, Excellent, thanks, that helps.
How do you know and confirm the right values? I don't like guessing.
 
And then cause then of :) :
 
  Symbol
  Pickling font metrics...Done.
  /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
  /cm3/bin/stubgen -v1 -sno RemoteView.T   -T.M3IMPTAB
  stubgen: Processing RemoteView.T
***
*** runtime error:
***    NEW() was unable to allocate more memory.
***    file "../src/runtime/common/RTAllocator.m3", line 285
***
"/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T   -T.M3IMPTAB
--procedure--  -line-  -file---
exec               --  <builtin>
_v_netobj          37  /cm3/pkg/netobj/src/netobj.tmpl
netobjv1           44  /cm3/pkg/netobj/src/netobj.tmpl
netobj             64  /cm3/pkg/netobj/src/netobj.tmpl
include_dir        71  /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
                    8  /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args
 
 
I should debug it, and double check that I upgraded what had to be upgraded.
 
 - Jay

> From: hosking at cs.purdue.edu
> To: jay.krell at cornell.edu
> Date: Fri, 31 Oct 2008 10:19:51 +0000
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] AMD64_LINUX status
>
> Umm, I think I found your bug with GC:
>
> Check out "RTMachine.PointerAlignment". You have it set to
> BITSIZE(INTEGER). I suspect what you want is something like
> BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should
> probably be 2*BYTESIZE(ADDRESS).
>
> On 30 Oct 2008, at 21:21, Jay wrote:
>
> > Please try this:
> >
> > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> >
> > std failed to build because stubgen crashed, probably due to gc.
> > cm3 does crash right away without @M3nogc.
> >
> > Something like this:
> > cd /src
> > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > cd /cm3
> > rm -rf *
> > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > cd /src/cm3/scripts/python
> > ./do-cm3-all.py realclean
> > ./upgrade.py
> > ./do-cm3-all.py realclean
> > ./do-cm3-std.py buildship
> > => it will fail, at zeus, but it should get far; you'll also need
> > some X devel packages to get that far, I had a failure for lack of
> > libXaw for example. I did not run anything, any of the GUI packages,
> > but building itself with itself is a decent test.
> >
> > I renamed the old AMD64_LINUX archives to "1.0.0".
> > http://www.opencm3.com/uploaded-archives/
> >
> > This has the bug fix I commited last night to cm3cg, and therefore a
> > 64 bit hosted cm3cg.
> >
> > jay at amd64a:/cm3/bin$ file *
> > AMD64_LINUX: ASCII text
> > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1
> > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > for GNU/Linux 2.6.0, not stripped
> > cm3.cfg: ASCII English text
> > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1
> > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > for GNU/Linux 2.6.0, not stripped
> > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1
> > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > for GNU/Linux 2.6.0, not stripped
> > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1
> > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > for GNU/Linux 2.6.0, not stripped
> > Unix.common: ASCII English text
> >
> > Built on Debian 4.0r4 (r5 is out).
> > jay at amd64a:/cm3/bin$ uname -a
> > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008
> > x86_64 GNU/Linux
> > jay at amd64a:/cm3/bin$ dmesg | head
> > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
> > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (
> > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP
> > Tue Aug 19 04:30:56 UTC 2008
> >
> > Though really I couldn't do it without Visual C++ on Windows
> > providing excellent find-in-files and editing, nothing else comes
> > close, I edit on Windows and scp the files over. :)
> >
> > - Jay
> >
> > ________________________________
> >
> > From: jay.krell at cornell.edu
> > To: dragisha at m3w.org; m3devel at elegosoft.com
> > Date: Tue, 9 Sep 2008 09:43:03 +0000
> > Subject: Re: [M3devel] AMD64_LINUX status

From jay.krell at cornell.edu  Fri Oct 31 15:25:13 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 14:25:13 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <1225462205.14482.60.camel@faramir.m3w.org>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
	<BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
	<COL101-W42D125019F4B7531A26BA6E6200@phx.gbl> 
	<1225462205.14482.60.camel@faramir.m3w.org>
Message-ID: <COL101-W728C265A8AF283199F0034E6200@phx.gbl>


It seems like there's still a problem. I haven't debugged it yet.
(I'm sure glad Tony found the other problem before I debugged it.)
I updated http://www.opencm3.com/uploaded-archives with Tony's fix.
The older builds are now 0.0.0.1 and 0.0.0.2.
 
 - Jay

> Subject: Re: [M3devel] AMD64_LINUX status
> From: dragisha at m3w.org
> To: jay.krell at cornell.edu
> CC: hosking at cs.purdue.edu; m3devel at elegosoft.com
> Date: Fri, 31 Oct 2008 15:10:05 +0100
>
> So, we now have fully functional AMD64_LINUX (_with_ GC)?
>
> TIA
>
> On Fri, 2008-10-31 at 13:52 +0000, Jay wrote:
> > Tony, Excellent, thanks, that helps.
> > How do you know and confirm the right values? I don't like guessing.
> >
> > And then cause then of :) :
> >
> > Symbol
> > Pickling font metrics...
> > Done.
> > /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
> > /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
> > stubgen: Processing RemoteView.T
> >
> > ***
> > *** runtime error:
> > *** NEW() was unable to allocate more memory.
> > *** file "../src/runtime/common/RTAllocator.m3", line 285
> > ***
> > "/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit
> > 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T -T.M3IMPTAB
> > --procedure-- -line- -file---
> > exec -- <builtin>
> > _v_netobj 37 /cm3/pkg/netobj/src/netobj.tmpl
> > netobjv1 44 /cm3/pkg/netobj/src/netobj.tmpl
> > netobj 64 /cm3/pkg/netobj/src/netobj.tmpl
> > include_dir 71 /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
> > 8 /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args
> >
> > I should debug it, and double check that I upgraded what had to be
> > upgraded.
> >
> > - Jay
> >
> > > From: hosking at cs.purdue.edu
> > > To: jay.krell at cornell.edu
> > > Date: Fri, 31 Oct 2008 10:19:51 +0000
> > > CC: m3devel at elegosoft.com
> > > Subject: Re: [M3devel] AMD64_LINUX status
> > >
> > > Umm, I think I found your bug with GC:
> > >
> > > Check out "RTMachine.PointerAlignment". You have it set to
> > > BITSIZE(INTEGER). I suspect what you want is something like
> > > BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should
> > > probably be 2*BYTESIZE(ADDRESS).
> > >
> > > On 30 Oct 2008, at 21:21, Jay wrote:
> > >
> > > > Please try this:
> > > >
> > > > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > >
> > > > std failed to build because stubgen crashed, probably due to gc.
> > > > cm3 does crash right away without @M3nogc.
> > > >
> > > > Something like this:
> > > > cd /src
> > > > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > > cd /cm3
> > > > rm -rf *
> > > > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > > cd /src/cm3/scripts/python
> > > > ./do-cm3-all.py realclean
> > > > ./upgrade.py
> > > > ./do-cm3-all.py realclean
> > > > ./do-cm3-std.py buildship
> > > > => it will fail, at zeus, but it should get far; you'll also need
> > > > some X devel packages to get that far, I had a failure for lack of
> > > > libXaw for example. I did not run anything, any of the GUI packages,
> > > > but building itself with itself is a decent test.
> > > >
> > > > I renamed the old AMD64_LINUX archives to "1.0.0".
> > > > http://www.opencm3.com/uploaded-archives/
> > > >
> > > > This has the bug fix I commited last night to cm3cg, and therefore a
> > > > 64 bit hosted cm3cg.
> > > >
> > > > jay at amd64a:/cm3/bin$ file *
> > > > AMD64_LINUX: ASCII text
> > > > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > > for GNU/Linux 2.6.0, not stripped
> > > > cm3.cfg: ASCII English text
> > > > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > > for GNU/Linux 2.6.0, not stripped
> > > > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > > for GNU/Linux 2.6.0, not stripped
> > > > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > > for GNU/Linux 2.6.0, not stripped
> > > > Unix.common: ASCII English text
> > > >
> > > > Built on Debian 4.0r4 (r5 is out).
> > > > jay at amd64a:/cm3/bin$ uname -a
> > > > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008
> > > > x86_64 GNU/Linux
> > > > jay at amd64a:/cm3/bin$ dmesg | head
> > > > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
> > > > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (
> > > > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP
> > > > Tue Aug 19 04:30:56 UTC 2008
> > > >
> > > > Though really I couldn't do it without Visual C++ on Windows
> > > > providing excellent find-in-files and editing, nothing else comes
> > > > close, I edit on Windows and scp the files over. :)
> > > >
> > > > - Jay
> > > >
> > > > ________________________________
> > > >
> > > > From: jay.krell at cornell.edu
> > > > To: dragisha at m3w.org; m3devel at elegosoft.com
> > > > Date: Tue, 9 Sep 2008 09:43:03 +0000
> > > > Subject: Re: [M3devel] AMD64_LINUX status
>
> --
> Dragiša Durić <dragisha at m3w.org>

From dragisha at m3w.org  Fri Oct 31 15:10:05 2008
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 31 Oct 2008 15:10:05 +0100
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <COL101-W42D125019F4B7531A26BA6E6200@phx.gbl>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
	<BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
	<COL101-W42D125019F4B7531A26BA6E6200@phx.gbl>
Message-ID: <1225462205.14482.60.camel@faramir.m3w.org>

So, we now have fully functional AMD64_LINUX (_with_ GC)?

TIA

On Fri, 2008-10-31 at 13:52 +0000, Jay wrote:
> Tony, Excellent, thanks, that helps.
> How do you know and confirm the right values? I don't like guessing.
>  
> And then cause then of :) :
>  
>   Symbol
> Pickling font metrics...
> Done.
> /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
> /cm3/bin/stubgen -v1 -sno RemoteView.T   -T.M3IMPTAB
> stubgen: Processing RemoteView.T
> 
> ***
> *** runtime error:
> ***    NEW() was unable to allocate more memory.
> ***    file "../src/runtime/common/RTAllocator.m3", line 285
> ***
> "/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit
> 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T   -T.M3IMPTAB
> --procedure--  -line-  -file---
> exec               --  <builtin>
> _v_netobj          37  /cm3/pkg/netobj/src/netobj.tmpl
> netobjv1           44  /cm3/pkg/netobj/src/netobj.tmpl
> netobj             64  /cm3/pkg/netobj/src/netobj.tmpl
> include_dir        71  /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
>                     8  /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args
>  
>  
> I should debug it, and double check that I upgraded what had to be
> upgraded.
>  
>  - Jay
> 
> 
> 
> > From: hosking at cs.purdue.edu
> > To: jay.krell at cornell.edu
> > Date: Fri, 31 Oct 2008 10:19:51 +0000
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] AMD64_LINUX status
> > 
> > Umm, I think I found your bug with GC:
> > 
> > Check out "RTMachine.PointerAlignment". You have it set to 
> > BITSIZE(INTEGER). I suspect what you want is something like 
> > BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should 
> > probably be 2*BYTESIZE(ADDRESS).
> > 
> > 
> > 
> > On 30 Oct 2008, at 21:21, Jay wrote:
> > 
> > >
> > > Please try this:
> > >
> > > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > >
> > > std failed to build because stubgen crashed, probably due to gc.
> > > cm3 does crash right away without @M3nogc.
> > >
> > > Something like this:
> > > cd /src
> > > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > cd /cm3
> > > rm -rf *
> > > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > cd /src/cm3/scripts/python
> > > ./do-cm3-all.py realclean
> > > ./upgrade.py
> > > ./do-cm3-all.py realclean
> > > ./do-cm3-std.py buildship
> > > => it will fail, at zeus, but it should get far; you'll also need
> > > some X devel packages to get that far, I had a failure for lack of
> > > libXaw for example. I did not run anything, any of the GUI packages,
> > > but building itself with itself is a decent test.
> > >
> > > I renamed the old AMD64_LINUX archives to "1.0.0".
> > > http://www.opencm3.com/uploaded-archives/
> > >
> > > This has the bug fix I commited last night to cm3cg, and therefore a
> > > 64 bit hosted cm3cg.
> > >
> > > jay at amd64a:/cm3/bin$ file *
> > > AMD64_LINUX: ASCII text
> > > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > for GNU/Linux 2.6.0, not stripped
> > > cm3.cfg: ASCII English text
> > > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > for GNU/Linux 2.6.0, not stripped
> > > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > for GNU/Linux 2.6.0, not stripped
> > > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > for GNU/Linux 2.6.0, not stripped
> > > Unix.common: ASCII English text
> > >
> > > Built on Debian 4.0r4 (r5 is out).
> > > jay at amd64a:/cm3/bin$ uname -a
> > > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 
> > > x86_64 GNU/Linux
> > > jay at amd64a:/cm3/bin$ dmesg | head
> > > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
> > > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (
> > > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP 
> > > Tue Aug 19 04:30:56 UTC 2008
> > >
> > > Though really I couldn't do it without Visual C++ on Windows 
> > > providing excellent find-in-files and editing, nothing else comes 
> > > close, I edit on Windows and scp the files over. :)
> > >
> > > - Jay
> > >
> > > ________________________________
> > >
> > > From: jay.krell at cornell.edu
> > > To: dragisha at m3w.org; m3devel at elegosoft.com
> > > Date: Tue, 9 Sep 2008 09:43:03 +0000
> > > Subject: Re: [M3devel] AMD64_LINUX status
> > >
> > >
> > >
> > >
> > 
> 
-- 
Dragiša Durić <dragisha at m3w.org>


From jay.krell at cornell.edu  Wed Oct  1 01:24:14 2008
From: jay.krell at cornell.edu (Jay)
Date: Tue, 30 Sep 2008 23:24:14 +0000
Subject: [M3devel] ARM Darwin
In-Reply-To: <7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org>
References: <F29CC4D9-0043-48B9-84F1-93E9F3336D40@darko.org>
	<5ED8E753-6B9E-4FED-8689-1D3D317A5A36@cs.purdue.edu> 
	<7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org>
Message-ID: <COL101-W3460EC073E17115925F24CE6430@phx.gbl>


Get me a machine and I'll work on it. :)
I'll get one before long, but I'm bogged down with existing x86, AMD64, PPC, PPC64 (AIX), and MIPS (IRIX) hardware that isn't yet being used for everything it's meant for.

I suspect Apple hasn't pushed their changes upstream, so be sure to poke around in their gcc source.

> Apple are building their own ARM GCC and use that to configure the
> back end. Then the runtime issues which I imagine might be with the GC

gcc -v ?
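`gcc -v` prints its banner on stderr, including a `Target:` line, the `Configured with:` line showing how the compiler was built, and a `Thread model:` line. That output is one place to see how Apple configured its ARM compiler. The sketch below pulls those three items out. The function names are hypothetical helpers, and the sample text in the test is illustrative, not real output from Apple's toolchain.

```python
import shlex
import subprocess

def parse_gcc_v(text):
    """Extract the target triple, thread model and configure options from `gcc -v` output."""
    info = {"target": None, "thread_model": None, "configure_options": []}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Target":
            info["target"] = value.strip()
        elif key == "Thread model":
            info["thread_model"] = value.strip()
        elif key == "Configured with":
            words = shlex.split(value)
            # The first word is the path to configure itself; keep only the options.
            info["configure_options"] = [w for w in words if w.startswith("-")]
    return info

def gcc_build_config(gcc="gcc"):
    """Run `<gcc> -v` and parse it; gcc writes this banner to stderr, not stdout."""
    result = subprocess.run([gcc, "-v"], capture_output=True, text=True)
    return parse_gcc_v(result.stderr)
```

Pointing `gcc_build_config()` at Apple's ARM compiler driver, whatever it is named on that machine, would show its `--target` option. It would also show its `Thread model:`, which is one quick hint on the pthreads question.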

> and threading. I'm not sure there will be any native threading and I'm
> sure VM will look very different.

I assume it'll look like almost any POSIX system, or like the existing *_DARWIN targets (the 32-bit ones in particular).
I assume it has pthreads.

 - Jay


> From: darko at darko.org
> To: hosking at cs.purdue.edu
> Date: Tue, 30 Sep 2008 14:59:39 +0200
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] ARM Darwin
>
> Thanks, it should be a bit easier than the normal process since the
> compiler doesn't have to be fully bootstrapped; I just have to get a
> cross-compiler working. I know the first thing is to get the machine
> configuration correct, which I'll start when I get my hands on one of
> the machines in a couple of days. The other thing is to work out how
> Apple are building their own ARM GCC and use that to configure the
> back end. Then the runtime issues which I imagine might be with the GC
> and threading. I'm not sure there will be any native threading and I'm
> sure VM will look very different.
>
>
> On 30/09/2008, at 2:44 PM, Tony Hosking wrote:
>
>> I can share tips...
>>
>> On Sep 30, 2008, at 1:41 PM, Darko wrote:
>>
>>> Is anyone interested in working on an ARM port for Darwin? Or maybe
>>> just providing some tips as I give it a try?
>>>
>>> Cheers,
>>> Darko.
>>
>


From jay.krell at cornell.edu  Wed Oct  1 08:41:03 2008
From: jay.krell at cornell.edu (Jay)
Date: Wed, 1 Oct 2008 06:41:03 +0000
Subject: [M3devel] AMD-64 binaries?
In-Reply-To: <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
References: <48BDF24B.900@wichita.edu>
	<20080903075804.zhep2ichmow00scg@mail.elegosoft.com>
	<COL101-W839FDBE447569C4D9BACC6E6430@phx.gbl> 
	<30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
Message-ID: <COL101-W281BD9E78E32E04348F400E6420@phx.gbl>


No -- you would know best about AMD64_DARWIN.
I'm sure ALPHA_OSF used to work, but it's been so long, I don't think it counts.
 
I'm being lazy.
 
file AMD64_DARWIN/cm3cg
 => fat binary? I doubt it. 
 => with ppc, i386, amd64? (doubt it) 
 => or just ppc, i386?  (doubt it) 
 => or just i386? This is what I "suspect".
 => or just AMD64. This would be somewhat interesting. 
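`file` answers this directly. As a rough illustration of what it looks at, the sketch below classifies a binary from its first bytes. It checks the ELF class byte, the thin Mach-O magic numbers (`MH_MAGIC`, `MH_MAGIC_64`), and the big-endian fat (universal) header with its per-slice CPU types. The function names are hypothetical, and only the CPU types relevant to this thread are named.

```python
import struct

# CPU types from <mach/machine.h>; CPU_ARCH_ABI64 marks the 64-bit variants.
CPU_ARCH_ABI64 = 0x01000000
CPU_NAMES = {
    7: "i386", 7 | CPU_ARCH_ABI64: "x86_64",
    18: "ppc", 18 | CPU_ARCH_ABI64: "ppc64",
    12: "arm",
}
MH_MAGIC, MH_MAGIC_64, FAT_MAGIC = 0xFEEDFACE, 0xFEEDFACF, 0xCAFEBABE

def classify_binary(header):
    """Describe an executable from its first bytes (the first 4 KB is plenty)."""
    if len(header) < 8:
        return "unknown"
    if header[:4] == b"\x7fELF":
        # e_ident[EI_CLASS]: 1 = ELFCLASS32, 2 = ELFCLASS64
        return {1: "ELF 32-bit", 2: "ELF 64-bit"}.get(header[4], "ELF (unknown class)")
    magic_be, = struct.unpack_from(">I", header, 0)
    if magic_be == FAT_MAGIC:
        # Fat header is always big-endian: magic, nfat_arch, then one 20-byte
        # fat_arch record (cputype, cpusubtype, offset, size, align) per slice.
        # Java .class files share this magic; this sketch does not tell them apart.
        nfat, = struct.unpack_from(">I", header, 4)
        if 8 + 20 * nfat > len(header):
            return "fat/universal (header truncated)"
        cputypes = [struct.unpack_from(">i", header, 8 + 20 * i)[0] for i in range(nfat)]
        return "Mach-O fat: " + ", ".join(CPU_NAMES.get(c, hex(c)) for c in cputypes)
    # Thin Mach-O headers are stored in the target's own byte order.
    for endian in (">", "<"):
        magic, cputype = struct.unpack_from(endian + "Ii", header, 0)
        if magic in (MH_MAGIC, MH_MAGIC_64):
            bits = "64-bit" if magic == MH_MAGIC_64 else "32-bit"
            return f"Mach-O {bits} {CPU_NAMES.get(cputype, hex(cputype))}"
    return "unknown"

def classify_file(path, nbytes=4096):
    with open(path, "rb") as f:
        return classify_binary(f.read(nbytes))
```

For example, `classify_file("AMD64_DARWIN/cm3cg")` would distinguish the fat, i386-only, and AMD64-only cases listed above.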
 
I'm pretty sure cm3cg is always 32bit "these days".
I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in the same way.
This was a nice thing to find: the problem is portable to multiple (possibly all) 64-bit hosts.
 
I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same problem.
 
Anyway, I should really get to debugging this soon.
 
It's a bit odd, because gcc itself doesn't have this bug, and I reviewed a lot of the code and it was ok. I'm just going to have to step through it in parallel on 32-bit and 64-bit hosts and find where they diverge. A LOT was identical; for example, the files output by cm3 for cm3cg were identical.
I was close a few months ago but sloughed off.
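The "step through it in parallel and find where they diverge" approach can be partly mechanized once each host can emit a comparable text trace. The trace itself, for example a debug dump from cm3cg, is an assumption here, as are the helper names. The sketch below reports the first line at which two traces differ and treats a missing line in the shorter trace as `None`.

```python
from itertools import zip_longest

def first_divergence(lines_a, lines_b):
    """Return (line_number, line_a, line_b) for the first differing line, or None.
    If one input is shorter, its missing line is reported as None."""
    for number, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return number, a, b
    return None

def compare_traces(path_32bit, path_64bit):
    """Hypothetical usage: compare traces captured on a 32-bit and a 64-bit host."""
    with open(path_32bit, encoding="utf-8", errors="replace") as fa, \
         open(path_64bit, encoding="utf-8", errors="replace") as fb:
        return first_divergence(fa, fb)
```

Stopping at the first difference, rather than producing a full diff, fits this case: once the two hosts diverge, everything after that point tends to differ as well.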
 
 - Jay

> From: hosking at cs.purdue.edu
> To: jay.krell at cornell.edu
> Date: Tue, 30 Sep 2008 10:16:41 +0100
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] AMD-64 binaries?
>
> 64-bit hosted tools? Do you mean only for Linux? I don't quite
> understand what you are saying.
>
> On Sep 30, 2008, at 9:36 AM, Jay wrote:
>
> > I'm getting back to this now.
> > I didn't realize it till this weekend, but that archive is
> > "relatively incompatible".
> > In particular it has 32bit hosted tools, and won't run on Debian
> > 4.0r4 / AMD64.
> > Something about glibc 2.4, when all I see on my system is 2.3.
> > I'll see what I can do.
> > Probably just rebuild cm3cg.
> > I think it was built on Fedora, but could have been Ubuntu or
> > OpenSuse.
> > Probably just that Debian stable lags the others.
> >
> > The main problem to debug is why 64bit hosted tools "never" work.
> > (Right?)
> >
> > Stay tuned for a bunch more ports "soon", I've got a bunch more
> > hardware, that runs Linux and others (Solaris, AIX, Irix).. :)
> >
> > I'll be able to debug the high dpi gui problems on a friend's laptop
> > soon too.
> > Send me a repro. I expect it is trivial -- like anything with a
> > scrollbar.
> > I can try formsedit, etc.
> >
> > - Jay
> >
> >> Date: Wed, 3 Sep 2008 07:58:04 +0200
> >> From: wagner at elegosoft.com
> >> To: m3devel at elegosoft.com
> >> Subject: Re: [M3devel] AMD-64 binaries?
> >>
> >> Quoting "Rodney M. Bates" :
> >>
> >>> Are there binaries for AMD-64 around that can be used
> >>> to bootstrap a 64-bit Linux compiler?
> >>
> >> Have a look at
> >>
> >> http://www.opencm3.net/uploaded-archives/index.html
> >>
> >> There are some AMD64 archives; I don't know about their status
> >> offhand, though. I think Jay Krell produced them.
> >> AFAIK there is no regular build on this platform yet.
> >>
> >> Olaf
> >> --
> >> Olaf Wagner -- elego Software Solutions GmbH
> >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
> >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
> >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194

From jay.krell at cornell.edu  Wed Oct  1 09:02:29 2008
From: jay.krell at cornell.edu (Jay)
Date: Wed, 1 Oct 2008 07:02:29 +0000
Subject: [M3devel] m3cc build fails on older MacOS X
In-Reply-To: <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com>
	<COL101-W243B2B91162A39B280C4AFE6430@phx.gbl>
	<CEDFF837-1CFA-4C43-B287-D480AE19B889@cs.purdue.edu> 
	<5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
Message-ID: <COL101-W62EB6B61A3107DDC347580E6420@phx.gbl>


Well, I agree and disagree.

"Almost everyone" only cares about C++, C#, Windows, and a little bit of Linux and Java.
"Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc.

Supporting 10.2 and 10.3 "ought not" be so difficult, but ok.
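The 10.3 workaround proposed earlier in this thread (quoted below) is a wrapper around `as` that strips the `.machine` directive. The minimal sketch below shows it, assuming the input `.s` file is the last argument and that the real assembler lives at `/usr/bin/as`. Both are assumptions to adjust, and the wrapper itself is hypothetical.

```python
import re
import subprocess
import sys

# A whole-line ".machine <cpu>" directive, e.g. ".machine ppc7400".
MACHINE_DIRECTIVE = re.compile(r"\s*\.machine\b")

def strip_machine_directives(asm_text):
    """Return the assembly source with every .machine directive line removed."""
    return "".join(line for line in asm_text.splitlines(keepends=True)
                   if not MACHINE_DIRECTIVE.match(line))

def main(argv):
    # Assumed calling convention: [as options...] input.s
    *options, source = argv
    with open(source) as f:
        filtered = strip_machine_directives(f.read())
    filtered_path = source + ".nomachine.s"
    with open(filtered_path, "w") as f:
        f.write(filtered)
    # Assumed location of the real assembler.
    return subprocess.call(["/usr/bin/as", *options, filtered_path])

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

gcc would need to find the wrapper before the system `as`, for example through gcc's `-B` option. On 10.3 itself, a short shell script around `sed` may be the more practical form of the same idea.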

I wiped out the install and won't likely come back to it until
a bunch of other things are done.
e.g.:
 debug 64 bit hosted cm3cg 
 move PPC_LINUX to pthreads 
 high dpi 
 bring up, or bring back up, a bunch of targets I have hardware for,
  and some others I don't have yet.

Adding back support for NT4/Win9x is probably not hard, though,
 as with gcc on the Mac, the current Microsoft tools no longer
 target them.

It all gets easier with virtualization..
(Which is easiest on x86/amd64.)

 - Jay


> From: darko at darko.org
> To: hosking at cs.purdue.edu
> Date: Tue, 30 Sep 2008 11:50:43 +0200
> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
> Subject: Re: [M3devel] m3cc build fails on older MacOS X
>
> I think supporting the latest version is enough work. I don't see the
> point of supporting older releases. Also, this seems to be relevant to
> development on that version of the system. Anyone who wants to build
> can upgrade.
>
>
> On 30/09/2008, at 11:15 AM, Tony Hosking wrote:
>
>> Does anyone really care about 10.3 now? As I recall, it had some
>> pretty broken assumptions.
>>
>> On Sep 30, 2008, at 9:25 AM, Jay wrote:
>>
>>>
>>> I have a machine running 10.3 now.
>>>
>>> gcc-4.3.2 (the current release) won't (toplevel) configure on
>>> MacOSX 10.3 apparently because its assembler doesn't support
>>> ".machine".
>>> Current "cctools" won't compile on 10.3 without patches or other
>>> updates, due to mucking with ppc64 stuff, though that is easy to fix.
>>>
>>> A simple wrapper around as for use on 10.3 that strips the .machine
>>> directive is probably reasonable, or a patch to gcc to just not
>>> emit it for Darwin, except maybe for non-ppc, or subject to a switch.
>>>
>>> Other than support for more architectures, I never found any of the
>>> updates beyond 10.2 very interesting.
>>> Though current Firefox and Safari also won't run on 10.3.
>>>
>>> IF I get this working, maybe I'll bring 10.2 back up also..
>>>
>>> - Jay
>>>
>>> ________________________________
>>>
>>> From: jayk123 at hotmail.com
>>> To: wagner at elegosoft.com; m3devel at elegosoft.com
>>> Subject: RE: [M3devel] m3cc build fails on older MacOS X
>>> Date: Tue, 6 May 2008 10:49:11 +0000
>>>
>>>
>>>
>>>
>>> I don't know what these Darwin versions are.
>>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5?
>>> I used to run 10.2 and could perhaps bring it back (though I'd hate
>>> to lose my PPC_LINUX install.. :( )
>>>
>>>> make[2]: Nothing to be done for `all'.
>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>> `patsubst'. Stop.
>>>
>>> Hopefully that's enough context though.
>>>
>>> The rest is a cascade.
>>> What happens if you remove all my m3makefile wierdness (which works
>>> everywhere else..) and just configure and make?
>>>
>>> Can I ssh into this?
>>>
>>> - Jay
>>>
>>>
>>>
>>> ________________________________
>>>
>>>
>>>> Date: Tue, 6 May 2008 07:57:54 +0200
>>>> From: wagner at elegosoft.com
>>>> To: m3devel at elegosoft.com
>>>> Subject: [M3devel] m3cc build fails on older MacOS X
>>>>
>>>> On % uname -a
>>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30
>>>> 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power
>>>> Macintosh powerpc:
>>>>
>>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o
>>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o
>>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./fnmatch.o
>>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./getruntime.o
>>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o
>>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./obstack.o
>>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o
>>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o
>>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o
>>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o
>>>> ./xstrndup.o> required-list
>>>> make[2]: Nothing to be done for `all'.
>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>> `patsubst'. Stop.
>>>> make: *** [all-libcpp] Error 2
>>>> /bin/sh: line 1: cd: gcc: No such file or directory
>>>> make: *** No rule to make target `s-modes'. Stop.
>>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314: quake
>>>> runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2
>>>>
>>>> --procedure-- -line- -file---
>>>> cp_if --
>>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>> 9
>>>> /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args
>>>>
>>>> Fatal Error: package build failed
>>>> ==> m3-sys/m3cc done
>>>>
>>>> Any ideas?
>>>>
>>>> Olaf
>>>> --
>>>> Olaf Wagner -- elego Software Solutions GmbH
>>>> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
>>>> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
>>>>
>>>
>>
>


From darko at darko.org  Wed Oct  1 09:10:35 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 09:10:35 +0200
Subject: [M3devel] m3cc build fails on older MacOS X
In-Reply-To: <COL101-W62EB6B61A3107DDC347580E6420@phx.gbl>
References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com>
	<COL101-W243B2B91162A39B280C4AFE6430@phx.gbl>
	<CEDFF837-1CFA-4C43-B287-D480AE19B889@cs.purdue.edu>
	<5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
	<COL101-W62EB6B61A3107DDC347580E6420@phx.gbl>
Message-ID: <973F196C-4B4A-4526-878C-93942E48E72A@darko.org>

Why bother with it if no one uses it and no-one is going to use it?  
Supporting M3 on Macs is good because people will use it into the  
future. People aren't moving back to 10.3. I wouldn't bother with it  
at all.

On 01/10/2008, at 9:02 AM, Jay wrote:

>
> well, I agree and disagree.
>
> "Almost everyone" only cares about C++, C#, Windows, and a little  
> bit of Linux and Java.
> "Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc.
>
> Supporting 10.2 and 10.3 "ought not" be so difficult, but ok.
>
> I wiped out the install and won't likely come back to it until
> a bunch of other things are done.
> e.g.:
> debug 64 bit hosted cm3cg
> move PPC_LINUX to pthreads
> high dpi
> bring up or backup a bunch of targets I have hardware for,
>  and some others I don't have yet.
>
> Adding back support for NT4/Win9x probably not hard, though
> similar with gcc on Mac, the current Microsoft tools no longer
> target them.
>
> It all gets easier with virtualization..
> (Which is easiest on x86/amd64.)
>
> - Jay
>
>
>
>> From: darko at darko.org
>> To: hosking at cs.purdue.edu
>> Date: Tue, 30 Sep 2008 11:50:43 +0200
>> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
>> Subject: Re: [M3devel] m3cc build fails on older MacOS X
>>
>> I think supporting the latest version is enough work. I don't see the
>> point of supporting older releases. Also, this seems to be relevant to
>> development on that version of the system. Anyone who wants to build
>> can upgrade.
>>
>>
>> On 30/09/2008, at 11:15 AM, Tony Hosking wrote:
>>
>>> Does anyone really care about 10.3 now? As I recall, it had some
>>> pretty broken assumptions.
>>>
>>> On Sep 30, 2008, at 9:25 AM, Jay wrote:
>>>
>>>>
>>>> I have a machine running 10.3 now.
>>>>
>>>> gcc-4.3.2 (the current release) won't (toplevel) configure on
>>>> MacOSX 10.3 apparently because its assembler doesn't support
>>>> ".machine".
>>>> Current "cctools" won't compile on 10.3 without patches or other
>>>> updates, due to mucking with ppc64 stuff, though that is easy to fix.
>>>>
>>>> A simple wrapper around as for use on 10.3 that strips the .machine
>>>> directive is probably reasonable, or a patch to gcc to just not
>>>> emit it for Darwin, except maybe for non-ppc, or subject to a switch.
>>>>
>>>> Other than support for more architectures, I never found any of the
>>>> updates beyond 10.2 very interesting.
>>>> Though current Firefox and Safari also won't run on 10.3.
>>>>
>>>> IF I get this working, maybe I'll bring 10.2 back up also..
>>>>
>>>> - Jay
>>>>
>>>> ________________________________
>>>>
>>>> From: jayk123 at hotmail.com
>>>> To: wagner at elegosoft.com; m3devel at elegosoft.com
>>>> Subject: RE: [M3devel] m3cc build fails on older MacOS X
>>>> Date: Tue, 6 May 2008 10:49:11 +0000
>>>>
>>>>
>>>>
>>>>
>>>> I don't know what these Darwin versions are.
>>>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5?
>>>> I used to run 10.2 and could perhaps bring it back (though I'd hate
>>>> to lose my PPC_LINUX install.. :( )
>>>>
>>>>> make[2]: Nothing to be done for `all'.
>>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>>> `patsubst'. Stop.
>>>>
>>>> Hopefully that's enough context though.
>>>>
>>>> The rest is a cascade.
>>>> What happens if you remove all my m3makefile wierdness (which works
>>>> everywhere else..) and just configure and make?
>>>>
>>>> Can I ssh into this?
>>>>
>>>> - Jay
>>>>
>>>>
>>>>
>>>> ________________________________
>>>>
>>>>
>>>>> Date: Tue, 6 May 2008 07:57:54 +0200
>>>>> From: wagner at elegosoft.com
>>>>> To: m3devel at elegosoft.com
>>>>> Subject: [M3devel] m3cc build fails on older MacOS X
>>>>>
>>>>> On % uname -a
>>>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30
>>>>> 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power
>>>>> Macintosh powerpc:
>>>>>
>>>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o
>>>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o
>>>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./fnmatch.o
>>>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./getruntime.o
>>>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o
>>>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./obstack.o
>>>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o
>>>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o
>>>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o
>>>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o
>>>>> ./xstrndup.o> required-list
>>>>> make[2]: Nothing to be done for `all'.
>>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>>> `patsubst'. Stop.
>>>>> make: *** [all-libcpp] Error 2
>>>>> /bin/sh: line 1: cd: gcc: No such file or directory
>>>>> make: *** No rule to make target `s-modes'. Stop.
>>>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314:  
>>>>> quake
>>>>> runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2
>>>>>
>>>>> --procedure-- -line- -file---
>>>>> cp_if --
>>>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>>> 9
>>>>> /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args
>>>>>
>>>>> Fatal Error: package build failed
>>>>> ==> m3-sys/m3cc done
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Olaf
>>>>> --
>>>>> Olaf Wagner -- elego Software Solutions GmbH
>>>>> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
>>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
>>>>> 45 86 95
>>>>> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
>>>>> Berlin
>>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
>>>>> DE163214194
>>>>>
>>>>
>>>
>>


From darko at darko.org  Wed Oct  1 12:03:15 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 12:03:15 +0200
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: <DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
	<DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
Message-ID: <B971C9C9-251C-4F79-A12F-622F47883781@darko.org>

I've extended one of the modules with a function that formats any  
allocated value for printing. If you're interested I can clean them up  
a little and post them.


On 28/09/2008, at 8:01 AM, Darko wrote:

> As far as I know, yes, they're not in the binary. I'd love to be  
> proven wrong though, or fix it so they did. I have a module that  
> reads the .M3WEB file and maps it to types and a module that will  
> read and write any field within a type safely using a numeric index.  
> Neither is perfect. You can integrate the two to get what you want  
> but I seem to remember having some problems mapping type ids (UIDs?)  
> to typecodes at runtime.
>
>
> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>
>> Right, I am aware of those interfaces.. just wondering what was
>> out there.  Do I really need to look at .M3WEB?  I thought
>> that m3gdb could figure out things without anything outside
>> of the binary...
>>
>> I'm looking for essentially what m3gdb offers, say prints
>> at minimum the name of the type (this I recall is trivial with
>> some of the RT* interfaces) but hopefully also with field names
>> and values, but doesn't expand references recursively.. something
>> like that?
>>
>>   Mika
>>
>> Darko writes:
>>> You can use RTType to read the fields and values within a type. If  
>>> you
>>> also want the type and field names you can interpret the .M3WEB  
>>> file.
>>> I have a couple of modules that do something like that but they are
>>> not what you would call finished. What level of detail are you  
>>> after?
>>>
>>>
>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>
>>>> Hello Modula-3 people,
>>>>
>>>> I am working on writing an interpreter that I'd like to embed in
>>>> various Modula-3 programs.  It so happens that this interpreter
>>>> might from time to time be manipulating arbitrary M3 REFs, and just
>>>> from the point of view of providing information to a human user,
>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>> have any code that accomplishes this, at least partly?  I'm  
>>>> thinking
>>>> that since m3gdb can do it, the information must all be in the
>>>> binary---somehow.  (Even enumeration names, right?)  And since the
>>>> pickler can pickle things... hmm.
>>>>
>>>> I would greatly appreciate any guidance that's out there...
>>>>
>>>>  Best regards,
>>>>     Mika Nystrom
>


From hosking at cs.purdue.edu  Wed Oct  1 11:59:23 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 10:59:23 +0100
Subject: [M3devel] AMD-64 binaries?
In-Reply-To: <COL101-W281BD9E78E32E04348F400E6420@phx.gbl>
References: <48BDF24B.900@wichita.edu>
	<20080903075804.zhep2ichmow00scg@mail.elegosoft.com>
	<COL101-W839FDBE447569C4D9BACC6E6430@phx.gbl>
	<30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
	<COL101-W281BD9E78E32E04348F400E6420@phx.gbl>
Message-ID: <26766FFA-C3B6-45D2-8156-80FD14922882@cs.purdue.edu>

I can definitely vouch for ALPHA_OSF having worked as recently as two  
years ago, but without the pthreads native threading system.  That  
port should have been easy enough I suspect.

On Oct 1, 2008, at 7:41 AM, Jay wrote:

> No -- you would know best about AMD64_DARWIN.
> I'm sure ALPHA_OSF used to work, but it's been so long, I don't  
> think it counts.
>
> I'm being lazy.
>
> file AMD64_DARWIN/cm3cg
>  => fat binary? I doubt it.
>  => with ppc, i386, amd64? (doubt it)
>  => or just ppc, i386?  (doubt it)
>  => or just i386? This is I "suspect".
>  => or just AMD64. This would be somewhat interesting.

I believe that is how I configured it.

> I'm pretty sure cm3cg is always 32bit "these days".

Nope, cm3cg on AMD64_DARWIN is 64-bit.

> I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in  
> the same way.
> This was a nice thing to find, that the problem is portable to  
> multiple (possibly all) 64 bit hosts.
>
> I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same  
> problem.

Don't think so.

> Anyway, I should really get to debugging this soon.
>
> It's a bit odd because gcc itself doesn't have this bug and I  
> reviewed a lot of the code and it was ok. I'm just going to have to  
> step through it in parallel on 32bit and 64bit hosts and find where  
> they diverge. A LOT was identical, like the files output by cm3 into  
> cm3cg were identical.

Yes, the intermediate code should be identical.  Any such problems  
would be with cm3cg.

> I was close a few months ago but slacked off.

Good luck.

>
>
>  - Jay
>
>
> > From: hosking at cs.purdue.edu
> > To: jay.krell at cornell.edu
> > Date: Tue, 30 Sep 2008 10:16:41 +0100
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] AMD-64 binaries?
> >
> > 64-bit hosted tools? Do you mean only for Linux? I don't quite
> > understand what you are saying.
> >
> > On Sep 30, 2008, at 9:36 AM, Jay wrote:
> >
> > >
> > > I'm getting back to this now.
> > > I didn't realize it till this weekend, but that archive is
> > > "relatively incompatible".
> > > In particular it has 32bit hosted tools, and won't run on Debian
> > > 4.0r4 / AMD64.
> > > Something about glibc 2.4, when all I see on my system is 2.3.
> > > I'll see what I can do.
> > > Probably just rebuild cm3cg.
> > > I think it was built on Fedora, but could have been Ubuntu or
> > > OpenSuse.
> > > Probably just that Debian stable lags the others.
> > >
> > > The main problem to debug is why 64bit hosted tools "never" work.
> > > (Right?)
> > >
> > >
> > > Stay tuned for a bunch more ports "soon", I've got a bunch more
> > > hardware,
> > > that runs Linux and others (Solaris, AIX, Irix).. :)
> > >
> > > I'll be able to debug the high dpi gui problems on a friend's  
> laptop
> > > soon too.
> > > Send me a repro. I expect it is trivial -- like anything with a
> > > scrollbar.
> > > I can try formsedit, etc.
> > >
> > >
> > > - Jay
> > >
> > >
> > >> Date: Wed, 3 Sep 2008 07:58:04 +0200
> > >> From: wagner at elegosoft.com
> > >> To: m3devel at elegosoft.com
> > >> Subject: Re: [M3devel] AMD-64 binaries?
> > >>
> > >> Quoting "Rodney M. Bates" :
> > >>
> > >>> Are there binaries for AMD-64 around that can be used
> > >>> to bootstrap a 64-bit Linux compiler?
> > >>
> > >> Have a look at
> > >>
> > >> http://www.opencm3.net/uploaded-archives/index.html
> > >>
> > >> There are some AMD64 archives; I don't know about their status
> > >> offhand, though. I think Jay Krell produced them.
> > >> AFAIK there is no regular build on this platform yet.
> > >>
> > >> Olaf
> > >> --
> > >> Olaf Wagner -- elego Software Solutions GmbH
> > >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> > >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
> > >> 45 86 95
> > >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
> > >> Berlin
> > >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
> > >> DE163214194
> > >>
> >
>


From hosking at cs.purdue.edu  Wed Oct  1 12:07:00 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 11:07:00 +0100
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: <B971C9C9-251C-4F79-A12F-622F47883781@darko.org>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
	<DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
	<B971C9C9-251C-4F79-A12F-622F47883781@darko.org>
Message-ID: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>

m3gdb makes use of stabs debug information spat out by the backend.   
They are only in the binary if compiled -g.  There are other ways to  
get what you are after, as Darko has observed.
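For the "at minimum the name of the type" part of the question, a runtime-only route can be sketched as below. It is a minimal sketch, assuming the CM3 m3core interface `RTTypeSRC` with `TypeName(ref: REFANY): TEXT` (check that interface in your installation before relying on it); `Describe`, `Pair` and `IntRef` are hypothetical names for illustration. `TYPECODE` is a language builtin. Field names and values still need the RTType/.M3WEB (or stabs) approaches discussed in this thread.

```modula3
MODULE Main;

(* Sketch: report the dynamic type of an arbitrary REFANY.
   Describe is a hypothetical helper.  RTTypeSRC.TypeName is assumed
   to exist in the CM3 m3core runtime; verify before use. *)

IMPORT IO, Fmt, RTTypeSRC;

TYPE
  Pair = OBJECT first, rest: REFANY; END;
  IntRef = REF INTEGER;

PROCEDURE Describe (r: REFANY): TEXT =
  BEGIN
    IF r = NIL THEN RETURN "NIL" END;
    (* TYPECODE is builtin; the name lookup is runtime support (assumed). *)
    RETURN RTTypeSRC.TypeName(r)
             & " (typecode " & Fmt.Int(TYPECODE(r)) & ")"
  END Describe;

BEGIN
  IO.Put(Describe(NEW(Pair)) & "\n");
  IO.Put(Describe(NEW(IntRef)) & "\n");
  IO.Put(Describe(NIL) & "\n")
END Main.
```

Walking into fields would stop at one level here, which matches the "doesn't expand references recursively" requirement.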

On Oct 1, 2008, at 11:03 AM, Darko wrote:

> I've extended one of the modules with a function that formats any  
> allocated value for printing. If you're interested I can clean them  
> up a little and post them.
>
>
> On 28/09/2008, at 8:01 AM, Darko wrote:
>
>> As far as I know, yes, they're not in the binary. I'd love to be  
>> proven wrong though, or fix it so they did. I have a module that  
>> reads the .M3WEB file and maps it to types and a module that will  
>> read and write any field within a type safely using a numeric  
>> index. Neither is perfect. You can integrate the two to get what  
>> you want but I seem to remember having some problems mapping type  
>> ids (UIDs?) to typecodes at runtime.
>>
>>
>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>
>>> Right, I am aware of those interfaces.. just wondering what was
>>> out there.  Do I really need to look at .M3WEB?  I thought
>>> that m3gdb could figure out things without anything outside
>>> of the binary...
>>>
>>> I'm looking for essentially what m3gdb offers, say prints
>>> at minimum the name of the type (this I recall is trivial with
>>> some of the RT* interfaces) but hopefully also with field names
>>> and values, but doesn't expand references recursively.. something
>>> like that?
>>>
>>>  Mika
>>>
>>> Darko writes:
>>>> You can use RTType to read the fields and values within a type.  
>>>> If you
>>>> also want the type and field names you can interpret the .M3WEB  
>>>> file.
>>>> I have a couple of modules that do something like that but they are
>>>> not what you would call finished. What level of detail are you  
>>>> after?
>>>>
>>>>
>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> I am working on writing an interpreter that I'd like to embed in
>>>>> various Modula-3 programs.  It so happens that this interpreter
>>>>> might from time to time be manipulating arbitrary M3 REFs, and  
>>>>> just
>>>>> from the point of view of providing information to a human user,
>>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>>> have any code that accomplishes this, at least partly?  I'm  
>>>>> thinking
>>>>> that since m3gdb can do it, the information must all be in the
>>>>> binary---somehow.  (Even enumeration names, right?)  And since the
>>>>> pickler can pickle things... hmm.
>>>>>
>>>>> I would greatly appreciate any guidance that's out there...
>>>>>
>>>>> Best regards,
>>>>>    Mika Nystrom
>>


From darko at darko.org  Wed Oct  1 12:35:09 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 12:35:09 +0200
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
	<DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
	<B971C9C9-251C-4F79-A12F-622F47883781@darko.org>
	<2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>
Message-ID: <B26C3B35-ADAA-4289-8006-F32D5CCCA407@darko.org>

Here's some info on the stabs format: http://www.cs.utah.edu/dept/old/texinfo/gdb/stabs_toc.html


On 01/10/2008, at 12:07 PM, Tony Hosking wrote:

> m3gdb makes use of stabs debug information spat out by the backend.   
> They are only in the binary if compiled -g.  There are other ways to  
> get what you are after, as Darko has observed.
>
> On Oct 1, 2008, at 11:03 AM, Darko wrote:
>
>> I've extended one of the modules with a function that formats any  
>> allocated value for printing. If you're interested I can clean them  
>> up a little and post them.
>>
>>
>> On 28/09/2008, at 8:01 AM, Darko wrote:
>>
>>> As far as I know, yes, they're not in the binary. I'd love to be  
>>> proven wrong though, or fix it so they did. I have a module that  
>>> reads the .M3WEB file and maps it to types and a module that will  
>>> read and write any field within a type safely using a numeric  
>>> index. Neither is perfect. You can integrate the two to get what  
>>> you want but I seem to remember having some problems mapping type  
>>> ids (UIDs?) to typecodes at runtime.
>>>
>>>
>>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>>
>>>> Right, I am aware of those interfaces.. just wondering what was
>>>> out there.  Do I really need to look at .M3WEB?  I thought
>>>> that m3gdb could figure out things without anything outside
>>>> of the binary...
>>>>
>>>> I'm looking for essentially what m3gdb offers, say prints
>>>> at minimum the name of the type (this I recall is trivial with
>>>> some of the RT* interfaces) but hopefully also with field names
>>>> and values, but doesn't expand references recursively.. something
>>>> like that?
>>>>
>>>> Mika
>>>>
>>>> Darko writes:
>>>>> You can use RTType to read the fields and values within a type.  
>>>>> If you
>>>>> also want the type and field names you can interpret the .M3WEB  
>>>>> file.
>>>>> I have a couple of modules that do something like that but they  
>>>>> are
>>>>> not what you would call finished. What level of detail are you  
>>>>> after?
>>>>>
>>>>>
>>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>>
>>>>>> Hello Modula-3 people,
>>>>>>
>>>>>> I am working on writing an interpreter that I'd like to embed  
>>>>>> in
>>>>>> various Modula-3 programs.  It so happens that this interpreter
>>>>>> might from time to time be manipulating arbitrary M3 REFs, and  
>>>>>> just
>>>>>> from the point of view of providing information to a human user,
>>>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>>>> have any code that accomplishes this, at least partly?  I'm  
>>>>>> thinking
>>>>>> that since m3gdb can do it, the information must all be in the
>>>>>> binary---somehow.  (Even enumeration names, right?)  And since  
>>>>>> the
>>>>>> pickler can pickle things... hmm.
>>>>>>
>>>>>> I would greatly appreciate any guidance that's out there...
>>>>>>
>>>>>> Best regards,
>>>>>>   Mika Nystrom
>>>
>


From mika at async.caltech.edu  Wed Oct  1 20:09:58 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Wed, 01 Oct 2008 11:09:58 -0700
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: Your message of "Wed, 01 Oct 2008 12:03:15 +0200."
	<B971C9C9-251C-4F79-A12F-622F47883781@darko.org> 
Message-ID: <200810011809.m91I9wxY087739@camembert.async.caltech.edu>

Oh, I'd love to give it a try!

I'm a little surprised no one has chimed in on the question of
whether you really need .M3WEB... I could swear I can get good
symbolic debugging with m3gdb on just a binary...

     Mika

Darko writes:
>I've extended one of the modules with a function that formats any  
>allocated value for printing. If you're interested I can clean them up  
>a little and post them.
>
>
>On 28/09/2008, at 8:01 AM, Darko wrote:
>
>> As far as I know, yes, they're not in the binary. I'd love to be  
>> proven wrong though, or fix it so they did. I have a module that  
>> reads the .M3WEB file and maps it to types and a module that will  
>> read and write any field within a type safely using a numeric index.  
>> Neither is perfect. You can integrate the two to get what you want  
>> but I seem to remember having some problems mapping type ids (UIDs?)  
>> to typecodes at runtime.
>>
>>
>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>
>>> Right, I am aware of those interfaces.. just wondering what was
>>> out there.  Do I really need to look at .M3WEB?  I thought
>>> that m3gdb could figure out things without anything outside
>>> of the binary...
>>>
>>> I'm looking for essentially what m3gdb offers, say prints
>>> at minimum the name of the type (this I recall is trivial with
>>> some of the RT* interfaces) but hopefully also with field names
>>> and values, but doesn't expand references recursively.. something
>>> like that?
>>>
>>>   Mika
>>>
>>> Darko writes:
>>>> You can use RTType to read the fields and values within a type. If  
>>>> you
>>>> also want the type and field names you can interpret the .M3WEB  
>>>> file.
>>>> I have a couple of modules that do something like that but they are
>>>> not what you would call finished. What level of detail are you  
>>>> after?
>>>>
>>>>
>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> I am working on writing an interpreter that I'd like to embed in
>>>>> various Modula-3 programs.  It so happens that this interpreter
>>>>> might from time to time be manipulating arbitrary M3 REFs, and just
>>>>> from the point of view of providing information to a human user,
>>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>>> have any code that accomplishes this, at least partly?  I'm  
>>>>> thinking
>>>>> that since m3gdb can do it, the information must all be in the
>>>>> binary---somehow.  (Even enumeration names, right?)  And since the
>>>>> pickler can pickle things... hmm.
>>>>>
>>>>> I would greatly appreciate any guidance that's out there...
>>>>>
>>>>>  Best regards,
>>>>>     Mika Nystrom
>>


From mika at async.caltech.edu  Wed Oct  1 20:10:38 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Wed, 01 Oct 2008 11:10:38 -0700
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: Your message of "Wed, 01 Oct 2008 11:07:00 BST."
	<2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu> 
Message-ID: <200810011810.m91IAcDW087832@camembert.async.caltech.edu>

Ok, ignore my previous email :-)

Tony Hosking writes:
>m3gdb makes use of stabs debug information spat out by the backend.   
>They are only in the binary if compiled -g.  There are other ways to  
>get what you are after, as Darko has observed.
>
>On Oct 1, 2008, at 11:03 AM, Darko wrote:
>
>> I've extended one of the modules with a function that formats any  
>> allocated value for printing. If you're interested I can clean them  
>> up a little and post them.
>>
>>
>> On 28/09/2008, at 8:01 AM, Darko wrote:
>>
>>> As far as I know, yes, they're not in the binary. I'd love to be  
>>> proven wrong though, or fix it so they did. I have a module that  
>>> reads the .M3WEB file and maps it to types and a module that will  
>>> read and write any field within a type safely using a numeric  
>>> index. Neither is perfect. You can integrate the two to get what  
>>> you want but I seem to remember having some problems mapping type  
>>> ids (UIDs?) to typecodes at runtime.
>>>
>>>
>>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>>
>>>> Right, I am aware of those interfaces.. just wondering what was
>>>> out there.  Do I really need to look at .M3WEB?  I thought
>>>> that m3gdb could figure out things without anything outside
>>>> of the binary...
>>>>
>>>> I'm looking for essentially what m3gdb offers, say prints
>>>> at minimum the name of the type (this I recall is trivial with
>>>> some of the RT* interfaces) but hopefully also with field names
>>>> and values, but doesn't expand references recursively.. something
>>>> like that?
>>>>
>>>>  Mika
>>>>
>>>> Darko writes:
>>>>> You can use RTType to read the fields and values within a type.  
>>>>> If you
>>>>> also want the type and field names you can interpret the .M3WEB  
>>>>> file.
>>>>> I have a couple of modules that do something like that but they are
>>>>> not what you would call finished. What level of detail are you  
>>>>> after?
>>>>>
>>>>>
>>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>>
>>>>>> Hello Modula-3 people,
>>>>>>
>>>>>> I am working on writing an interpreter that I'd like to embed in
>>>>>> various Modula-3 programs.  It so happens that this interpreter
>>>>>> might from time to time be manipulating arbitrary M3 REFs, and  
>>>>>> just
>>>>>> from the point of view of providing information to a human user,
>>>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>>>> have any code that accomplishes this, at least partly?  I'm  
>>>>>> thinking
>>>>>> that since m3gdb can do it, the information must all be in the
>>>>>> binary---somehow.  (Even enumeration names, right?)  And since the
>>>>>> pickler can pickle things... hmm.
>>>>>>
>>>>>> I would greatly appreciate any guidance that's out there...
>>>>>>
>>>>>> Best regards,
>>>>>>    Mika Nystrom
>>>


From jay.krell at cornell.edu  Sun Oct 12 11:51:03 2008
From: jay.krell at cornell.edu (Jay)
Date: Sun, 12 Oct 2008 09:51:03 +0000
Subject: [M3devel] a bunch of new/old platform names?
Message-ID: <COL101-W614506DC49BC7BC3640D65E6370@phx.gbl>


I plan soon to bring "back" some old ports -- building current archives -- and to bring up some new ports.

Specifically I have hardware: RS/6000 (PPC64/AIX), SGI (MIPS), SPARC64, plus the usual x86/AMD64.

Two of the platforms did exist.

In particular, "MIPS_IRIX" is "IRIX5".
  Reuse IRIX5, or introduce MIPS_IRIX?

PPC_AIX is IBMR2 or such.
  Same question.

Also, must versions really be in platform names?
I'm loath to add a third dimension to the matrix.
I did just note that FreeBSD 7.0 64 bit is ABI-incompatible with FreeBSD 6.3 64 bit, lame.

SGI claims good ABI across all the 6.5 releases, which is all there will be now.
IBM claims good 32 bit ABI compat across AIX 4.x - 6.x and good 64 bit ABI compat across 5.x and 6.x, but incompatibility from 64 bit 4.x.
(Microsoft has always been good here, but "behavioral" compat is the actual tricky issue.)

And, what do folks think about putting "32" in new 32 bit platform names?

I'm considering the following:
  MIPS32_{IRIX,LINUX,OPENBSD,NETBSD} 
  MIPS64_IRIX (6.5) 
  SPARC{32,64}_{LINUX,*BSD}(probably no SPARC32_*BSD actually, and SPARC32_LINUX is already in, but not building regularly) 
  {SPARC64,I386,AMD64}_SOLARIS 
  PPC{32,64}_AIX 
    (PPC64_LINUX is blocked, Linux has problems booting on the hardware and I have no Mac G5 yet). 
  AMD64_*BSD 

Also, maybe some of the code should be restructured to separate processor from OS?
That might primarily be just pointer size.

Any interest in "x86" instead of "I386"?

If I make good progress against those 18 (!), I can see about PPC64_DARWIN, HPPA_*, IA64_*, ALPHA_*, ARM_*, which I lack hardware for. PPC_LINUX also should be converted to pthreads imho.
Mostly this is all just a matter of installing the OS and configuring gcc.
 
And, yeah, I have the two m3cgs stepping side by side to find the problem there, and will have use of a high dpi Windows laptop for that other problem..

And then of course, if the vast majority of platforms are named like that, there might be pressure to bring the rest in line. :) I386_{NT,LINUX,*BSD,CYGWIN,MINGWIN}

 - Jay

From mika at async.caltech.edu  Fri Oct 17 00:32:39 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 15:32:39 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
Message-ID: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>

Hello Modula-3 people,

As I mentioned in an earlier email about printing structures (thanks
Darko), I'm in the midst of coding an interpreter embedded in
Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
JScheme for Java (well it was at first strongly based, but more and
more loosely, if you know what I mean...)

I expected that the performance of the interpreter would be much
better in Modula-3 than in Java, and I have been testing on two
different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
and the other is CM3 on a recent Debian system.  What I am finding
is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
close to ten times as fast on some tasks at this point), but on
Linux/CM3 it is much closer in speed to JScheme than I would like.

When I started, with code that was essentially equivalent to JScheme,
I found that it was a bit slower than JScheme on Linux/CM3 and
possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
spend most of its time in (surprise, surprise!) memory allocation
and garbage collection.  The speedup I have achieved between the
first implementation and now was due to the use of Modula-3 constructs
that are superior to Java's, such as the use of arrays of RECORDs
to make small stacks rather than linked lists.  (I get readable
code with far fewer memory allocations and less GC work.)

Now, since this is an interpreter, I as the implementer have limited
control over how much memory is allocated and freed, and where it is
needed.  However, I can sometimes fall back on C-style memory management,
but I would like to do it in a safe way.  For instance, I have special-cased
evaluation of Scheme primitives, as follows.

Under the "normal" implementation, a list of things to evaluate is
built up, passed to an evaluation function, and then the GC is left
to sweep up the mess.  The problem is that there are various tricky
routes by which references can escape the evaluator, so you can't
just assume that what you put in is going to be dead right after
an eval and free it.  Instead, I set a flag in the evaluator, which
is TRUE if it is OK to free the list after the eval and FALSE if
it's unclear (in which case the problem is left up to the GC).

For the vast majority of Scheme primitives, one can indeed free the
list right after the eval.  Now of course I am not interested
in unsafe code, so what I do is this:

TYPE Pair = OBJECT first, rest : REFANY; END;

VAR
  mu := NEW(MUTEX);
  free : Pair := NIL;

PROCEDURE GetPair() : Pair =
  BEGIN
    LOCK mu DO
      IF free # NIL THEN
        TRY
          RETURN free
        FINALLY
          free := free.rest
        END
      END
    END;
    RETURN NEW(Pair)
  END GetPair;

PROCEDURE ReturnPair(cons : Pair) = 
  BEGIN
    cons.first := NIL;
    LOCK mu DO
      cons.rest := free;
      free := cons
    END
  END ReturnPair;

my eval code looks like

VAR okToFree : BOOLEAN; BEGIN

   args := GetPair(); ...
   result := EvalPrimitive(args, (*VAR OUT*) okToFree);

   IF okToFree THEN ReturnPair(args) END;
   RETURN result
END

and this does work well.  In fact it speeds up the Linux implementation
by almost 100% to recycle the lists like this *just* for the
evaluation of Scheme primitives.

But it's still ugly, isn't it?  There's a mutex, and a global
variable.  And yes, the time spent messing with the mutex is
noticeable, and I haven't even made the code multi-threaded yet
(and that is coming!)

So I'm thinking, what I really want is a structure that is attached
to my current Thread.T.  I want to be able to access just a single 
pointer (like the free list) but be sure it is unique to my current
thread.  No locking would be necessary if I could do this.

Does anyone have an elegant solution that does something like this?
Thread-specific "static" variables?  Just one REFANY would be enough
for a lot of uses...  seems to me this should be a frequently
occurring problem?

     Best regards,
       Mika
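One way to get the effect of a per-thread free list without any thread-local facility is to hang the pool off the `Thread.Closure` each thread is forked with, and pass it down to the evaluator explicitly. A minimal sketch, assuming each interpreter thread owns exactly one pool and never shares it; `PairPool`, `Worker`, `Get`, `Put` and `Run` are hypothetical names, not an existing library API.

```modula3
MODULE Main;

(* Sketch only: per-thread Pair free lists with no MUTEX.  PairPool,
   Worker, Get, Put and Run are hypothetical names.  Assumption: each
   thread owns exactly one pool and never shares it. *)

IMPORT Thread, IO, Fmt;

TYPE
  Pair = OBJECT first, rest: REFANY; END;

  (* Free list owned by a single thread, so no locking is required. *)
  PairPool = OBJECT
    free: Pair := NIL;
    reused: CARDINAL := 0;      (* allocations served from the list *)
  METHODS
    get (): Pair := Get;
    put (p: Pair) := Put;
  END;

  (* Each forked thread gets its own closure object, and with it a pool. *)
  Worker = Thread.Closure OBJECT
    pool: PairPool;
  OVERRIDES
    apply := Run;
  END;

PROCEDURE Get (self: PairPool): Pair =
  VAR p := self.free;
  BEGIN
    IF p = NIL THEN RETURN NEW(Pair) END;
    self.free := NARROW(p.rest, Pair);
    p.rest := NIL;              (* don't hand out a link into the free list *)
    INC(self.reused);
    RETURN p
  END Get;

PROCEDURE Put (self: PairPool; p: Pair) =
  BEGIN
    p.first := NIL;             (* drop the payload so the GC can reclaim it *)
    p.rest := self.free;
    self.free := p
  END Put;

PROCEDURE Run (self: Worker): REFANY =
  VAR args: Pair;
  BEGIN
    FOR i := 1 TO 1000 DO
      args := self.pool.get();  (* stands in for building an argument list *)
      args.first := self;       (* ... evaluate the primitive here ... *)
      self.pool.put(args)       (* stands in for ReturnPair when okToFree *)
    END;
    RETURN NIL
  END Run;

VAR
  a := NEW(Worker, pool := NEW(PairPool));
  b := NEW(Worker, pool := NEW(PairPool));
  ta, tb: Thread.T;

BEGIN
  ta := Thread.Fork(a);
  tb := Thread.Fork(b);
  EVAL Thread.Join(ta);
  EVAL Thread.Join(tb);
  (* Each worker allocates once, then recycles: prints "reused: 999 999" *)
  IO.Put("reused: " & Fmt.Int(a.pool.reused) & " "
         & Fmt.Int(b.pool.reused) & "\n")
END Main.
```

The price of this design is that the pool must be threaded through the evaluator (for example as a field of a per-thread interpreter object); in exchange, the get/return path needs no LOCK at all.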
    

From hosking at cs.purdue.edu  Fri Oct 17 00:54:51 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Thu, 16 Oct 2008 23:54:51 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>
References: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>
Message-ID: <C17F2003-446E-466C-84DC-DA8E23A96726@cs.purdue.edu>

Have you tried running @M3noincremental?

On 16 Oct 2008, at 23:32, Mika Nystrom wrote:

> Hello Modula-3 people,
>
> As I mentioned in an earlier email about printing structures (thanks
> Darko), I'm in the midst of coding an interpreter embedded in
> Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
> JScheme for Java (well it was at first strongly based, but more and
> more loosely, if you know what I mean...)
>
> I expected that the performance of the interpreter would be much
> better in Modula-3 than in Java, and I have been testing on two
> different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
> and the other is CM3 on a recent Debian system.  What I am finding
> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
> close to ten times as fast on some tasks at this point), but on
> Linux/CM3 it is much closer in speed to JScheme than I would like.
>
> When I started, with code that was essentially equivalent to JScheme,
> I found that it was a bit slower than JScheme on Linux/CM3 and
> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
> spend most of its time in (surprise, surprise!) memory allocation
> and garbage collection.  The speedup I have achieved between the
> first implementation and now was due to the use of Modula-3 constructs
> that are superior to Java's, such as the use of arrays of RECORDs
> to make small stacks rather than linked lists.  (I get readable
> code with much fewer memory allocations and GC work.)
>
> Now, since this is an interpreter, I as the implementer have limited
> control over how much memory is allocated and freed, and where it is
> needed.  However, I can sometimes fall back on C-style memory  
> management,
> but I would like to do it in a safe way.  For instance, I have  
> special-cased
> evaluation of Scheme primitives, as follows.
>
> Under the "normal" implementation, a list of things to evaluate is
> built up, passed to an evaluation function, and then the GC is left
> to sweep up the mess.  The problem is that there are various tricky
> routes by which references can escape the evaluator, so you can't
> just assume that what you put in is going to be dead right after
> an eval and free it.  Instead, I set a flag in the evaluator, which
> is TRUE if it is OK to free the list after the eval and FALSE if
> it's unclear (in which case the problem is left up to the GC).
>
> For the vast majority of Scheme primitives, one can indeed free the
> list right after the eval.  Now of course I am not interested
> in unsafe code, so what I do is this:
>
> TYPE Pair = OBJECT first, rest : REFANY; END;
>
> VAR
>  mu := NEW(MUTEX);
>  free : Pair := NIL;
>
> PROCEDURE GetPair() : Pair =
>  BEGIN
>    LOCK mu DO
>      IF free # NIL THEN
>        TRY
>          RETURN free
>        FINALLY
>          free := free.rest
>        END
>      END
>    END;
>    RETURN NEW(Pair)
>  END GetPair;
>
> PROCEDURE ReturnPair(cons : Pair) =
>  BEGIN
>    cons.first := NIL;
>    LOCK mu DO
>      cons.rest := free;
>      free := cons
>    END
>  END ReturnPair;
>
> my eval code looks like
>
> VAR okToFree : BOOLEAN; BEGIN
>
>   args := GetPair(); ...
>   result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>
>   IF okToFree THEN ReturnPair(args) END;
>   RETURN result
> END
>
> and this does work well.  In fact it speeds up the Linux implementation
> by almost 100% to recycle the lists like this *just* for the
> evaluation of Scheme primitives.
>
> But it's still ugly, isn't it?  There's a mutex, and a global
> variable.  And yes, the time spent messing with the mutex is
> noticeable, and I haven't even made the code multi-threaded yet
> (and that is coming!)
>
> So I'm thinking, what I really want is a structure that is attached
> to my current Thread.T.  I want to be able to access just a single
> pointer (like the free list) but be sure it is unique to my current
> thread.  No locking would be necessary if I could do this.
>
> Does anyone have an elegant solution that does something like this?
> Thread-specific "static" variables?  Just one REFANY would be enough
> for a lot of uses...  seems to me this should be a frequently
> occurring problem?
>
>     Best regards,
>       Mika
>
>
>
>
>
>
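Below is a minimal sketch of the per-thread free list asked for above, written in C rather than Modula-3. A C11 `_Thread_local` variable gives each thread its own list head, so taking and returning a pair needs no mutex. The `Pair` struct and the `get_pair`/`return_pair` names are hypothetical stand-ins for `GetPair`/`ReturnPair`. None of this is CM3 or PM3 code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical C analogue of the Modula-3 Pair OBJECT. */
typedef struct Pair {
    void *first;
    struct Pair *rest;
} Pair;

/* One free-list head per thread (C11 thread-local storage).
   Only the owning thread ever touches it, so no lock is needed. */
static _Thread_local Pair *free_list = NULL;

/* Pop a recycled pair from this thread's list, or allocate a new one. */
Pair *get_pair(void) {
    Pair *p = free_list;
    if (p != NULL) {
        free_list = p->rest;          /* same effect as the FINALLY clause in GetPair */
    } else {
        p = malloc(sizeof *p);
        if (p == NULL)
            return NULL;
    }
    p->first = NULL;                  /* clear both fields so a recycled pair */
    p->rest = NULL;                   /* never points into the free list */
    return p;
}

/* Push a pair the caller knows is dead back onto this thread's list. */
void return_pair(Pair *p) {
    assert(p != NULL);
    p->first = NULL;                  /* drop the payload, as ReturnPair does */
    p->rest = free_list;
    free_list = p;
}
```

The trade-offs of true per-thread storage still apply. A pair returned on another thread goes onto that thread's list, and in this sketch any pairs still on a list when its thread exits are leaked.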


From mika at async.caltech.edu  Fri Oct 17 01:30:01 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 16:30:01 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Thu, 16 Oct 2008 23:54:51 BST."
	<C17F2003-446E-466C-84DC-DA8E23A96726@cs.purdue.edu> 
Message-ID: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>

Hi Tony,

I figured you would chime in!

Yes, @M3noincremental seems to make things consistently a tad bit
slower (but a very small difference), on both FreeBSD and Linux.
@M3nogc makes a bigger difference, of course.

Unfortunately I seem to have lost the code that did a lot of memory
allocations.  My tricks (as described in the email---and others!)
have removed most of the troublesome memory allocations, but now
I'm stuck with the mutex instead...

      Mika

Tony Hosking writes:
>Have you tried running @M3noincremental?
>
>On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>
>> Hello Modula-3 people,
>>
>> As I mentioned in an earlier email about printing structures (thanks
>> Darko), I'm in the midst of coding an interpreter embedded in
>> Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
>> JScheme for Java (well it was at first strongly based, but more and
>> more loosely, if you know what I mean...)
>>
>> I expected that the performance of the interpreter would be much
>> better in Modula-3 than in Java, and I have been testing on two
>> different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
>> and the other is CM3 on a recent Debian system.  What I am finding
>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>> close to ten times as fast on some tasks at this point), but on
>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>
>> When I started, with code that was essentially equivalent to JScheme,
>> I found that it was a bit slower than JScheme on Linux/CM3 and
>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>> spend most of its time in (surprise, surprise!) memory allocation
>> and garbage collection.  The speedup I have achieved between the
>> first implementation and now was due to the use of Modula-3 constructs
>> that are superior to Java's, such as the use of arrays of RECORDs
>> to make small stacks rather than linked lists.  (I get readable
>> code with far fewer memory allocations and GC work.)
>>
>> Now, since this is an interpreter, I as the implementer have limited
>> control over how much memory is allocated and freed, and where it is
>> needed.  However, I can sometimes fall back on C-style memory management,
>> but I would like to do it in a safe way.  For instance, I have special-cased
>> evaluation of Scheme primitives, as follows.
>>
>> Under the "normal" implementation, a list of things to evaluate is
>> built up, passed to an evaluation function, and then the GC is left
>> to sweep up the mess.  The problem is that there are various tricky
>> routes by which references can escape the evaluator, so you can't
>> just assume that what you put in is going to be dead right after
>> an eval and free it.  Instead, I set a flag in the evaluator, which
>> is TRUE if it is OK to free the list after the eval and FALSE if
>> it's unclear (in which case the problem is left up to the GC).
>>
>> For the vast majority of Scheme primitives, one can indeed free the
>> list right after the eval.  Now of course I am not interested
>> in unsafe code, so what I do is this:
>>
>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>
>> VAR
>>  mu := NEW(MUTEX);
>>  free : Pair := NIL;
>>
>> PROCEDURE GetPair() : Pair =
>>  BEGIN
>>    LOCK mu DO
>>      IF free # NIL THEN
>>        TRY
>>          RETURN free
>>        FINALLY
>>          free := free.rest
>>        END
>>      END
>>    END;
>>    RETURN NEW(Pair)
>>  END GetPair;
>>
>> PROCEDURE ReturnPair(cons : Pair) =
>>  BEGIN
>>    cons.first := NIL;
>>    LOCK mu DO
>>      cons.rest := free;
>>      free := cons
>>    END
>>  END ReturnPair;
>>
>> my eval code looks like
>>
>> VAR okToFree : BOOLEAN; BEGIN
>>
>>   args := GetPair(); ...
>>   result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>
>>   IF okToFree THEN ReturnPair(args) END;
>>   RETURN result
>> END
>>
>> and this does work well.  In fact it speeds up the Linux implementation
>> by almost 100% to recycle the lists like this *just* for the
>> evaluation of Scheme primitives.
>>
>> But it's still ugly, isn't it?  There's a mutex, and a global
>> variable.  And yes, the time spent messing with the mutex is
>> noticeable, and I haven't even made the code multi-threaded yet
>> (and that is coming!)
>>
>> So I'm thinking, what I really want is a structure that is attached
>> to my current Thread.T.  I want to be able to access just a single
>> pointer (like the free list) but be sure it is unique to my current
>> thread.  No locking would be necessary if I could do this.
>>
>> Does anyone have an elegant solution that does something like this?
>> Thread-specific "static" variables?  Just one REFANY would be enough
>> for a lot of uses...  seems to me this should be a frequently
>> occurring problem?
>>
>>     Best regards,
>>       Mika
>>
>>
>>
>>
>>
>>


From jay.krell at cornell.edu  Fri Oct 17 06:40:28 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 04:40:28 +0000
Subject: [M3devel] M3 programming problem : GC efficiency /
	per-thread	storage areas?
In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
References: Your message of 
	<200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
Message-ID: <COL101-W4964BD437A46A53516DAA3E6320@phx.gbl>


Making this per-thread is a fairly classic good improvement.

You need to worry about what happens with many threads, and being sure to clean up when a thread dies, and allowing for a free to come in from any thread.

A good way to mitigate all those problems is to use a small fixed size cache instead of per-thread. Including an array of mutexes.

If "thread ids" have adequate distribution, just use their lower bits as an array index. If not, have a global counter that gets assigned into the thread on first use per-thread.

The cache could also be more than one element.

How do you manage okToFree?

Windows has __declspec(thread), which is an optimized form of a TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded .dlls before Vista, and isn't __declspec(fiber) like maybe it should be.
 
 - Jay
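Jay's striped cache can be sketched in C as follows. All of these details are hypothetical. There are 8 slots. Each thread gets its slot from a global atomic counter on first use, which is his fallback for poorly distributed thread ids. Each slot holds a short free list behind its own mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct Pair { void *first; struct Pair *rest; } Pair;

#define NSLOTS 8   /* hypothetical; a small power of two */

typedef struct Slot {
    pthread_mutex_t mu;   /* one mutex per slot instead of one global mutex */
    Pair *head;           /* a short free list: "more than one element" */
} Slot;

#define SLOT_INIT { PTHREAD_MUTEX_INITIALIZER, NULL }
static Slot cache[NSLOTS] = {
    SLOT_INIT, SLOT_INIT, SLOT_INIT, SLOT_INIT,
    SLOT_INIT, SLOT_INIT, SLOT_INIT, SLOT_INIT
};

static atomic_uint next_index;            /* global counter, starts at 0 */
static _Thread_local int my_index = -1;   /* this thread's slot, set on first use */

static unsigned my_slot(void) {
    if (my_index < 0)
        my_index = (int)(atomic_fetch_add(&next_index, 1u) % NSLOTS);
    return (unsigned)my_index;
}

/* Take a pair from this thread's slot, or allocate one if the slot is empty. */
Pair *cache_get(void) {
    unsigned i = my_slot();
    pthread_mutex_lock(&cache[i].mu);
    Pair *p = cache[i].head;
    if (p != NULL)
        cache[i].head = p->rest;
    pthread_mutex_unlock(&cache[i].mu);
    if (p == NULL)
        p = malloc(sizeof *p);
    if (p != NULL) {
        p->first = NULL;
        p->rest = NULL;
    }
    return p;
}

/* Return a pair to whichever slot the calling thread maps to. */
void cache_put(Pair *p) {
    unsigned i = my_slot();
    assert(p != NULL);
    p->first = NULL;
    pthread_mutex_lock(&cache[i].mu);
    p->rest = cache[i].head;
    cache[i].head = p;
    pthread_mutex_unlock(&cache[i].mu);
}
```

A pair can be freed from any thread, and nothing needs to be cleaned up when a thread dies. The cost is one lock per operation, which is usually uncontended as long as threads do not greatly outnumber slots.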

----------------------------------------
> To: hosking at cs.purdue.edu
> Date: Thu, 16 Oct 2008 16:30:01 -0700
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread	storage areas?
> 
> Hi Tony,
> 
> I figured you would chime in!
> 
> Yes, @M3noincremental seems to make things consistently a tad bit
> slower (but a very small difference), on both FreeBSD and Linux.
> @M3nogc makes a bigger difference, of course.
> 
> Unfortunately I seem to have lost the code that did a lot of memory
> allocations.  My tricks (as described in the email---and others!)
> have removed most of the troublesome memory allocations, but now
> I'm stuck with the mutex instead...
> 
>       Mika


From mika at async.caltech.edu  Fri Oct 17 08:32:15 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 23:32:15 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000."
	<COL101-W4964BD437A46A53516DAA3E6320@phx.gbl> 
Message-ID: <200810170632.m9H6WFHd078061@camembert.async.caltech.edu>


Well, I was thinking of something even simpler.  A Thread.T is an
OBJECT.  It's garbage collected just like any other object, is it
not?  

Why can't the thing that makes new threads simply include a single
globally visible field in every Thread.T, of type REFANY?  Call it "data".

Then you can always manipulate Thread.Self().data as you see fit
without any need for locks.  There can be no problem with this as
long as it is always manipulated from within that thread.
Of course this can be trivially encapsulated by not revealing "data"
and indeed always accessing it as Thread.Self().data.

You would not normally access this from any other thread.  It's indeed
only meant to be used in the idiom

  x := Allocate();
  TRY
    DoSomething(x)
  FINALLY
    Free(x)
  END

It's also not really a "Free" but just returning the object to a free
list (there can be no unsafe behavior here).

As a "nicer" interface, one could register routines with a public
interface, asking it to manufacture some kind of thread globals.
For maximum sanity, they would be visible inside the MODULE that
requested them, but I'm not sure how to accomplish this.  And of
course there's not much point in any of this unless it can be made
efficient or else a mutex plus a true global will work just as well.

What I'm talking about I guess could be done by hacking up Thread.Fork()
to return a subtype of Thread.T, but that won't work for the first
thread.  But with this method you could have arbitrary fields (and
methods) attached to a Thread.T.  How to collect everything you need
is a different story...

I'm not asking for a new language feature... really was just wondering
if anyone had tried anything like this before, and now am rambling a
bit.
 
     Mika
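In pthreads terms, the single per-thread slot proposed here corresponds to a thread-specific data key. The C sketch below is only an analogy. `thread_data()` is a hypothetical stand-in for `Thread.Self().data`, and the `ThreadData` record is an assumption. It uses the standard `pthread_once`, `pthread_key_create`, `pthread_getspecific` and `pthread_setspecific` calls. The key's destructor frees each thread's record when that thread exits, which addresses Jay's cleanup concern.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread record; free_list plays the role of the
   single REFANY "data" field proposed for Thread.T. */
typedef struct ThreadData {
    void *free_list;
} ThreadData;

static pthread_key_t data_key;
static pthread_once_t data_once = PTHREAD_ONCE_INIT;

/* Called when a thread that stored a non-NULL value exits
   (via pthread_exit or by returning from its start routine). */
static void destroy_data(void *p) { free(p); }

static void make_key(void) {
    int r = pthread_key_create(&data_key, destroy_data);
    assert(r == 0);
    (void)r;
}

/* Analogue of Thread.Self().data: created lazily on first use per thread,
   and only ever read or written by the owning thread, so no lock is needed. */
ThreadData *thread_data(void) {
    pthread_once(&data_once, make_key);
    ThreadData *d = pthread_getspecific(data_key);
    if (d == NULL) {
        d = calloc(1, sizeof *d);
        if (d != NULL && pthread_setspecific(data_key, d) != 0) {
            free(d);
            d = NULL;
        }
    }
    return d;
}
```

Every access still pays for a `pthread_getspecific` lookup. This approach removes the lock but not all of the per-access cost.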

Jay writes:
>
>Making this per-thread is a fairly classic good improvement.
>
>You need to worry about what happens with many threads, and being sure to clean up when a thread dies, and allowing for a free to come in from any thread.
>
>A good way to mitigate all those problems is to use a small fixed size cache instead of per-thread. Including an array of mutexes.
>
>If "thread ids" have adequate distribution, just use their lower bits as an array index. If not, have a global counter that gets assigned into the thread on first use per-thread.
>
>The cache could also be more than one element.
>
>How do you manage okToFree?
>
>Windows has __declspec(thread), which is an optimized form of a TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded .dlls before Vista, and isn't __declspec(fiber) like maybe it should be.
> 
> - Jay


From hosking at cs.purdue.edu  Fri Oct 17 08:35:03 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 17 Oct 2008 07:35:03 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
References: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
Message-ID: <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu>

I suspect part of the overhead of allocation in the new code is the  
need for thread-local allocation buffers, which means we need to  
access thread-local state.  We really need an efficient way to do  
that, but pthreads thread-local accesses may be what is killing you.
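A thread-local allocation buffer can be illustrated with a generic bump-pointer sketch in C. This is not the CM3 allocator. `CHUNK_SIZE`, `tlab_alloc`, and the use of `malloc` as the shared heap are all assumptions. The fast path touches only thread-local state, and the global lock is taken only when a thread's chunk runs out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096   /* hypothetical refill size in bytes */

/* This thread's buffer: the bytes in [tlab_cur, tlab_end) are unused. */
static _Thread_local char *tlab_cur = NULL;
static _Thread_local char *tlab_end = NULL;

/* Global lock, taken only on the slow refill path. */
static pthread_mutex_t heap_mu = PTHREAD_MUTEX_INITIALIZER;

/* Slow path: fetch a fresh chunk from the shared heap under the lock.
   malloc stands in for the collector's shared heap here. */
static char *refill(void) {
    pthread_mutex_lock(&heap_mu);
    char *chunk = malloc(CHUNK_SIZE);
    pthread_mutex_unlock(&heap_mu);
    return chunk;
}

/* Fast path: bump the thread-local pointer; no lock unless the chunk is full. */
void *tlab_alloc(size_t n) {
    const size_t align = _Alignof(max_align_t);
    if (n == 0 || n > CHUNK_SIZE)
        return NULL;                      /* large objects would go elsewhere */
    n = (n + align - 1) & ~(align - 1);   /* keep every object aligned */
    if (tlab_cur == NULL || (size_t)(tlab_end - tlab_cur) < n) {
        char *chunk = refill();
        if (chunk == NULL)
            return NULL;
        tlab_cur = chunk;                 /* the old chunk's tail is abandoned; */
        tlab_end = chunk + CHUNK_SIZE;    /* chunks are never freed in this sketch */
    }
    void *p = tlab_cur;
    tlab_cur += n;
    return p;
}
```

The fast path still reads two thread-local variables on every allocation. Its speed therefore depends on how cheaply the platform implements thread-local access.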

On 17 Oct 2008, at 00:30, Mika Nystrom wrote:

> Hi Tony,
>
> I figured you would chime in!
>
> Yes, @M3noincremental seems to make things consistently a tad bit
> slower (but a very small difference), on both FreeBSD and Linux.
> @M3nogc makes a bigger difference, of course.
>
> Unfortunately I seem to have lost the code that did a lot of memory
> allocations.  My tricks (as described in the email---and others!)
> have removed most of the troublesome memory allocations, but now
> I'm stuck with the mutex instead...
>
>      Mika


From mika at async.caltech.edu  Fri Oct 17 08:50:13 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 23:50:13 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000."
	<COL101-W4964BD437A46A53516DAA3E6320@phx.gbl> 
Message-ID: <200810170650.m9H6oDU0078549@camembert.async.caltech.edu>

Jay writes:
...
>How do you manage okToFree?
...

I forgot to answer this q.

Well, the primitive evaluation in the interpreter is just a big
CASE statement.  I really just look at where it references the list
I am making, and if it references the list at all in a branch, I
insert the code "okToFree := FALSE".  The first two parameters are
passed in separately.  

Here's the code... since you ask!

This is the code for the special case of a two-argument Scheme procedure call,
such as (+ x 1).

PROCEDURE Apply2(t : T; interp : Scheme.T; a1, a2 : Object) : Object =
  VAR
      d1, d2 := GetCons();
      free := TRUE;
  BEGIN
      d1.first := a1; d1.rest := d2;
      d2.first := a2; d2.rest := NIL;

      WITH res = Prims(t, interp, d1, a1, a2, free) DO
        IF free THEN
          ReturnCons(d1); ReturnCons(d2)
        END;
        RETURN res
      END
  END Apply2;

PROCEDURE Prims(t : T; interp : Scheme.T; args, x, y : Object;
                VAR free : BOOLEAN) : Object =

   (* The (hopefully temporary) list of arguments is args.  x and
      y are the first two elements of args *)

   BEGIN
      CASE VAL(t.idNumber,P) OF
          P.Eq => RETURN NumCompare(args, '=')  (* known not to let args escape *)
        |
          P.List => free := FALSE; RETURN args  (* args escapes, don't know whither *)
        |
          P.Car => RETURN PedanticFirst(x)  (* doesn't even use args *)

        (* and about another 100 cases follow here *)

      END
   END Prims;

       Mika
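The `free` flag protocol works as follows. The callee clears a by-reference flag in every branch that lets the argument list escape, and the caller recycles the list only if the flag is still set. Below it is reduced to a small, single-threaded C sketch. `PRIM_ADD`, `PRIM_LIST`, `Value`, and the helper names are hypothetical; only the overall shape follows `Apply2` and `Prims`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum { PRIM_ADD, PRIM_LIST } Prim;   /* hypothetical primitive codes */

typedef struct Cons { long first; struct Cons *rest; } Cons;

typedef struct { long num; Cons *list; } Value;   /* a number or a list */

static Cons *free_cells = NULL;   /* single-threaded free list, for brevity */

static Cons *get_cons(void) {
    Cons *c = free_cells;
    if (c != NULL) free_cells = c->rest;
    else c = malloc(sizeof *c);
    return c;
}

static void return_cons(Cons *c) { c->rest = free_cells; free_cells = c; }

/* Like Prims: *free_ok must be cleared in every branch where args escapes. */
static Value prims(Prim p, Cons *args, long x, long y, bool *free_ok) {
    Value v = { 0, NULL };
    switch (p) {
    case PRIM_ADD:               /* uses only x and y, so args does not escape */
        v.num = x + y;
        break;
    case PRIM_LIST:              /* returns args itself, so it escapes */
        *free_ok = false;
        v.list = args;
        break;
    }
    return v;
}

/* Like Apply2: build a two-element argument list, recycle it if allowed. */
Value apply2(Prim p, long a1, long a2) {
    Cons *d1 = get_cons(), *d2 = get_cons();
    bool free_ok = true;
    assert(d1 != NULL && d2 != NULL);
    d1->first = a1; d1->rest = d2;
    d2->first = a2; d2->rest = NULL;
    Value res = prims(p, d1, a1, a2, &free_ok);
    if (free_ok) { return_cons(d1); return_cons(d2); }
    return res;
}
```

The protocol is conservative by design. Clearing the flag unnecessarily only leaves some garbage for the collector. Forgetting to clear it in an escaping branch, by contrast, would recycle a list that is still reachable.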


From mika at async.caltech.edu  Fri Oct 17 10:03:18 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 01:03:18 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST."
	<0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> 
Message-ID: <200810170803.m9H83IIC080081@camembert.async.caltech.edu>

OK, this suggests that using thread-local state to get around the
problem won't help either.

Can I ask a question... I am looking at ThreadPThread.m3...

Why do you have to lock the slotMu in Self()?

PROCEDURE Self (): T =
  (* If not the initial thread and not created by Fork, returns NIL *)
  (* LL = 0 *)
  VAR
    me := GetActivation();
    t: T;
  BEGIN
    IF me = NIL THEN RETURN NIL END;
    WITH r = Upthread.mutex_lock(slotMu) DO <*ASSERT r=0*> END;
      t := slots[me.slot];
    WITH r = Upthread.mutex_unlock(slotMu) DO <*ASSERT r=0*> END;
    IF (t.act # me) THEN Die(ThisLine(), "thread with bad slot!") END;
    RETURN t;
  END Self;

Is it just because of AssignSlots?  If so, it's actually a very rare
event that there would ever be a conflict, no?  (Only when "slots" is
extended?)

Can data be stored in an "Activation"?  Not TRACED data, obviously, hmm...

     Mika
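One generic way to make a `Self()`-style call lock-free is to keep the shared table for registration only, and to cache each thread's own record in a thread-local variable. The C sketch below uses hypothetical names (`ThreadRec`, `register_self`, `self`) and is not a patch for ThreadPThread.m3. In C, growing the table with `realloc` can free the old array while another thread is still indexing it. Even if growth is rare, unlocked readers of the table would therefore be unsafe. Reading the cached pointer avoids the table entirely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct ThreadRec { int slot; void *data; } ThreadRec;  /* hypothetical */

static pthread_mutex_t slot_mu = PTHREAD_MUTEX_INITIALIZER;
static ThreadRec **slots = NULL;   /* shared table; may be reallocated when it grows */
static int n_slots = 0, n_used = 0;

static _Thread_local ThreadRec *me = NULL;   /* per-thread cached pointer */

/* Slow path, once per thread: add a record to the shared table under the lock. */
ThreadRec *register_self(void) {
    ThreadRec *t = calloc(1, sizeof *t);
    if (t == NULL) return NULL;
    pthread_mutex_lock(&slot_mu);
    if (n_used == n_slots) {                 /* growth may move the array, which */
        int n = n_slots ? 2 * n_slots : 8;   /* is why table readers need the lock */
        ThreadRec **s = realloc(slots, (size_t)n * sizeof *s);
        if (s == NULL) { pthread_mutex_unlock(&slot_mu); free(t); return NULL; }
        slots = s;
        n_slots = n;
    }
    t->slot = n_used;
    slots[n_used++] = t;
    pthread_mutex_unlock(&slot_mu);
    me = t;
    return t;
}

/* Fast path: no lock and no table lookup; NULL if this thread never registered. */
ThreadRec *self(void) { return me; }
```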


Tony Hosking writes:
>I suspect part of the overhead of allocation in the new code is the  
>need for thread-local allocation buffers, which means we need to  
>access thread-local state.  We really need an efficient way to do  
>that, but pthreads thread-local accesses may be what is killing you.
>
>On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>
>> Hi Tony,
>>
>> I figured you would chime in!
>>
>> Yes, @M3noincremental seems to make things consistently a tad bit
>> slower (but a very small difference), on both FreeBSD and Linux.
>> @M3nogc makes a bigger difference, of course.
>>
>> Unfortunately I seem to have lost the code that did a lot of memory
>> allocations.  My tricks (as described in the email---and others!)
>> have removed most of the troublesome memory allocations, but now
>> I'm stuck with the mutex instead...
>>
>>      Mika
>>
>> Tony Hosking writes:
>>> Have you tried running @M3noincremental?
>>>
>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>
>>>> Hello Modula-3 people,
>>>>
>>>> As I mentioned in an earlier email about printing structures (thanks
>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>> Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
>>>> JScheme for Java (well it was at first strongly based, but more and
>>>> more loosely, if you know what I mean...)
>>>>
>>>> I expected that the performance of the interpreter would be much
>>>> better in Modula-3 than in Java, and I have been testing on two
>>>> different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
>>>> and the other is CM3 on a recent Debian system.  What I am finding
>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>>>> close to ten times as fast on some tasks at this point), but on
>>>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>>>
>>>> When I started, with code that was essentially equivalent to JScheme,
>>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>>>> spend most of its time in (surprise, surprise!) memory allocation
>>>> and garbage collection.  The speedup I have achieved between the
>>>> first implementation and now was due to the use of Modula-3 constructs
>>>> that are superior to Java's, such as the use of arrays of RECORDs
>>>> to make small stacks rather than linked lists.  (I get readable
>>>> code with far fewer memory allocations and less GC work.)
>>>>
>>>> Now, since this is an interpreter, I as the implementer have limited
>>>> control over how much memory is allocated and freed, and where it is
>>>> needed.  However, I can sometimes fall back on C-style memory
>>>> management, but I would like to do it in a safe way.  For instance,
>>>> I have special-cased evaluation of Scheme primitives, as follows.
>>>>
>>>> Under the "normal" implementation, a list of things to evaluate is
>>>> built up, passed to an evaluation function, and then the GC is left
>>>> to sweep up the mess.  The problem is that there are various tricky
>>>> routes by which references can escape the evaluator, so you can't
>>>> just assume that what you put in is going to be dead right after
>>>> an eval and free it.  Instead, I set a flag in the evaluator, which
>>>> is TRUE if it is OK to free the list after the eval and FALSE if
>>>> it's unclear (in which case the problem is left up to the GC).
>>>>
>>>> For the vast majority of Scheme primitives, one can indeed free the
>>>> list right after the eval.  Now of course I am not interested
>>>> in unsafe code, so what I do is this:
>>>>
>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>
>>>> VAR
>>>> mu := NEW(MUTEX);
>>>> free : Pair := NIL;
>>>>
>>>> PROCEDURE GetPair() : Pair =
>>>> BEGIN
>>>>   LOCK mu DO
>>>>     IF free # NIL THEN
>>>>       TRY
>>>>         RETURN free
>>>>       FINALLY
>>>>         free := free.rest
>>>>       END
>>>>     END
>>>>   END;
>>>>   RETURN NEW(Pair)
>>>> END GetPair;
>>>>
>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>> BEGIN
>>>>   cons.first := NIL;
>>>>   LOCK mu DO
>>>>     cons.rest := free;
>>>>     free := cons
>>>>   END
>>>> END ReturnPair;
>>>>
>>>> my eval code looks like
>>>>
>>>> VAR okToFree : BOOLEAN; BEGIN
>>>>
>>>>  args := GetPair(); ...
>>>>  result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>
>>>>  IF okToFree THEN ReturnPair(args) END;
>>>>  RETURN result
>>>> END
>>>>
>>>> and this does work well.  In fact, recycling the lists like this
>>>> *just* for the evaluation of Scheme primitives speeds up the Linux
>>>> implementation by almost 100%.
>>>>
>>>> But it's still ugly, isn't it?  There's a mutex, and a global
>>>> variable.  And yes, the time spent messing with the mutex is
>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>> (and that is coming!)
>>>>
>>>> So I'm thinking, what I really want is a structure that is attached
>>>> to my current Thread.T.  I want to be able to access just a single
>>>> pointer (like the free list) but be sure it is unique to my current
>>>> thread.  No locking would be necessary if I could do this.
>>>>
>>>> Does anyone have an elegant solution that does something like this?
>>>> Thread-specific "static" variables?  Just one REFANY would be enough
>>>> for a lot of uses...  seems to me this should be a frequently
>>>> occurring problem?
>>>>
>>>>    Best regards,
>>>>      Mika
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
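A minimal sketch of the per-thread free list asked for above follows. It is written in C for illustration and is not how CM3 or PM3 implement anything. A C11 `_Thread_local` head pointer gives each thread its own list, so the hypothetical `get_pair` and `return_pair` (analogues of GetPair/ReturnPair) need no mutex.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical C analogue of the Modula-3 Pair object above. */
struct pair {
    void *first;
    struct pair *rest;
};

/* One free-list head per thread (C11 thread storage duration).
   Only the owning thread touches its own head, so no lock is needed. */
static _Thread_local struct pair *free_pairs = NULL;

struct pair *get_pair(void)
{
    struct pair *p = free_pairs;
    if (p != NULL) {
        free_pairs = p->rest;   /* pop from this thread's list */
        p->rest = NULL;         /* don't hand the rest of the list to the caller */
        return p;
    }
    p = malloc(sizeof *p);      /* list empty: fall back to the allocator */
    if (p != NULL) {
        p->first = NULL;
        p->rest = NULL;
    }
    return p;
}

void return_pair(struct pair *p)
{
    assert(p != NULL);
    p->first = NULL;            /* drop the reference, as ReturnPair does */
    p->rest = free_pairs;       /* push onto this thread's list */
    free_pairs = p;
}
```

How cheap the `_Thread_local` access is depends on the compiler and platform. That is the question the rest of this thread goes on to examine.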


From mika at async.caltech.edu  Fri Oct 17 10:32:28 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 01:32:28 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST."
	<0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> 
Message-ID: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>

Ok I am sorry I am slow to pick up on this.

I take it the problem is actually the Upthread.getspecific routine,
which itself calls something called get_curthread somewhere inside pthreads,
which in turn involves a context switch to the supervisor---the identity
of the current thread is just not accessible anywhere in user space.
Also explains why this program runs faster with my old PM3, which uses
longjmp threads.

The only way to avoid it (really) is to pass a pointer to the
Thread.T of the currently executing thread in the activation record
of *every* procedure, so that allocators can find it when necessary....
but that is very expensive in terms of stack memory.

Or I can just make such a structure myself and pass it around where
I need it in my own program.  Thread-specific and user-managed.

I believe I have just answered all my own questions, but I hope
Tony will correct me if my answers are incorrect.

    Mika



From jay.krell at cornell.edu  Sat Oct 18 00:42:35 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 22:42:35 +0000
Subject: [M3devel] M3 programming problem : GC efficiency /
	per-thread	storage areas?
In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
References: Your message of 
	<200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
Message-ID: <COL101-W48200DF8FB7269A7B2E229E6320@phx.gbl>


Right and wrong.

Right: Tony was referring to Upthread.getspecific (or, on Windows, WinBase.TlsGetValue).
Wrong that this necessarily incurs a switch to the supervisor/kernel, and perhaps wrong to call that a "context switch". It depends on the operating system.

I will explain.

On Windows/x86, the FS register points to a partly documented per-thread data structure.
C and C++ exception handling use FS:0.
Disassemble any C or C++ code that uses exception handling and you'll find it used. Not by Modula-3, though.

Disassemble TlsGetValue.

 cdb /z %windir%\system32\kernel32.dll  

0:000> uf kernel32!TlsGetValue
kernel32!TlsGetValue:

 typical-looking prolog 
7dd813e0 8bff            mov     edi,edi
7dd813e2 55              push    ebp
7dd813e3 8bec            mov     ebp,esp

 fs:18 contains a "normal" "linear" pointer to fs:0 
 Get that pointer. 
7dd813e5 64a118000000    mov     eax,dword ptr fs:[00000018h]

 get the index 
7dd813eb 8b4d08          mov     ecx,dword ptr [ebp+8]

 SetLastError(0) 
7dd813ee 83603400        and     dword ptr [eax+34h],0

  There are 64 preallocated thread local slots -- compare the index to 64. 
7dd813f2 83f940          cmp     ecx,40h   

  If it is above or equal to 64, go use the non-preallocated slots. 
7dd813f5 0f8353e20200    jae     kernel32!lstrcmpi+0x4b22 (7ddaf64e)

  preallocated slots are at fs:e10; get the data and done  
kernel32!TlsGetValue+0x1b:
7dd813fb 8b8488100e0000  mov     eax,dword ptr [eax+ecx*4+0E10h]

 epilog 

kernel32!TlsGetValue+0x22:
7dd81402 5d              pop     ebp
7dd81403 c20400          ret     4

 get here for indices >= 64
 compare index to 1088 == 1024 + 64, as there are another 1024, more slowly accessed, slots  

kernel32!lstrcmpi+0x4b22:
7ddaf64e 81f940040000    cmp     ecx,440h

 if it is below 1088, go use those slots 

7ddaf654 7211            jb      kernel32!lstrcmpi+0x4b3b (7ddaf667)

 index is above or equal to 1088, SetLastError(invalid parameter) 

kernel32!lstrcmpi+0x4b2a:
7ddaf656 680d0000c0      push    0C000000Dh
7ddaf65b e80025fdff      call    kernel32!GetProcessHeap+0x12 (7dd81b60)

 and return 0 -- 0 is not unambiguously an error -- that's why last error was cleared at the start 

kernel32!lstrcmpi+0x4b34:
7ddaf660 33c0            xor     eax,eax
7ddaf662 e99b1dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)

 This is where the slots between 64 and 1088 are used. 
 Get pointer from FS:F94 and compare to null.
  If it is null, that is OK: it means nobody has called TlsSetValue for this value yet,
  so it still has its initial 0 value. 
kernel32!lstrcmpi+0x4b3b:
7ddaf667 8b80940f0000    mov     eax,dword ptr [eax+0F94h]
7ddaf66d 85c0            test    eax,eax
7ddaf66f 74ef            je      kernel32!lstrcmpi+0x4b34 (7ddaf660)

 Index is between 64 and 1088, and there is a non null pointer at FS:F94.
 Subtract 64 from index and index into pointer there. 
 Note it does the subtraction after the multiplication, so subtracts 64*4=0x100.

kernel32!lstrcmpi+0x4b45:
7ddaf671 8b848800ffffff  mov     eax,dword ptr [eax+ecx*4-100h]
7ddaf678 e9851dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)


So, it is a few instructions but there is no context switch into the kernel/supervisor.
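The lookup path above can be made concrete with a user-space C model. It assumes the layout read off the listing:

- 64 inline slots at offset 0xE10.
- A lazily allocated array of 1024 more, reached through the pointer at offset 0xF94.
- Last-error cleared on entry.

`struct fake_teb` and `model_tls_get_value` are invented for illustration. This is not the Windows implementation. The error value 87 is simply `ERROR_INVALID_PARAMETER`'s numeric value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_INLINE_SLOTS    64    /* cmp ecx,40h        */
#define TLS_EXPANSION_SLOTS 1024  /* 0x440 - 0x40       */

/* Hypothetical stand-in for the per-thread block FS points to.
   Field names are invented; only the lookup logic mirrors the listing. */
struct fake_teb {
    uint32_t last_error;                    /* [eax+34h]             */
    void    *tls_slots[TLS_INLINE_SLOTS];   /* [eax+0E10h]           */
    void   **tls_expansion;                 /* [eax+0F94h], may be NULL */
};

enum { FAKE_ERROR_INVALID_PARAMETER = 87 };

void *model_tls_get_value(struct fake_teb *teb, uint32_t index)
{
    assert(teb != NULL);
    teb->last_error = 0;                    /* 0 is a legal slot value, so clear first */
    if (index < TLS_INLINE_SLOTS)
        return teb->tls_slots[index];       /* fast path: a single load */
    if (index >= TLS_INLINE_SLOTS + TLS_EXPANSION_SLOTS) {
        teb->last_error = FAKE_ERROR_INVALID_PARAMETER;
        return NULL;
    }
    if (teb->tls_expansion == NULL)         /* never set on this thread: still 0 */
        return NULL;
    return teb->tls_expansion[index - TLS_INLINE_SLOTS];
}
```

Every branch is a compare or a load from per-thread memory, so this model contains nothing that needs the kernel.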

Also, calls into the kernel aren't necessarily a "context switch".
Some context is saved, and a bit is twiddled in the processor to indicate a privilege level change, but no page tables are altered and I believe no TLBs (translation lookaside buffer) are invalidated, and no thread scheduling decisions are made -- though upon exit from the kernel, APCs (asynchronous procedure call) can be run -- on the calling thread. 

A more expensive context switch is when another thread or another process runs.
Switching threads requires saving more context, and switching processes requires changing the register that points to the page tables.
One detail there -- calling into the x86 NT kernel does not preserve floating point state -- that's the additional state that a thread switch has to save, at least. NT/x86 kernel drivers aren't allowed to use floating point, with some exceptions: video drivers (only certain functions?), or drivers that explicitly save/restore the floating point registers using public functions.
I don't know about the other architectures. I think IA64 only preserves some floating point state, not all.


Now, the question is how Upthread.getspecific is implemented on other architectures and operating systems.
We should look into that for various operating systems.


Oh, also, let's see what __declspec(thread) does.

>type t.c


__declspec(thread) int a;

void F1(int);

void F2() { F1(a); }

cl -c t.c

link -dump -disasm t.obj


Dump of file t.obj

File Type: COFF OBJECT

_F2:
  00000000: 55                 push        ebp
  00000001: 8B EC              mov         ebp,esp
  00000003: A1 00 00 00 00     mov         eax,dword ptr [__tls_index]
  00000008: 64 8B 0D 00 00 00  mov         ecx,dword ptr fs:[__tls_array]
            00
  0000000F: 8B 14 81           mov         edx,dword ptr [ecx+eax*4]
  00000012: 8B 82 00 00 00 00  mov         eax,dword ptr _a[edx]
  00000018: 50                 push        eax
  00000019: E8 00 00 00 00     call        _F1
  0000001E: 83 C4 04           add         esp,4
  00000021: 5D                 pop         ebp
  00000022: C3                 ret

Note that the compiler-generated code references FS directly.

The optimized version is:

Dump of file t.obj

File Type: COFF OBJECT

_F2:
  00000000: A1 00 00 00 00     mov         eax,dword ptr [__tls_index]
  00000005: 64 8B 0D 00 00 00  mov         ecx,dword ptr fs:[__tls_array]
            00
  0000000C: 8B 14 81           mov         edx,dword ptr [ecx+eax*4]
  0000000F: 8B 82 00 00 00 00  mov         eax,dword ptr _a[edx]
  00000015: 50                 push        eax
  00000016: E8 00 00 00 00     call        _F1
  0000001B: 59                 pop         ecx
  0000001C: C3                 ret
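For the POSIX side of the question, here is a minimal sketch of a single per-thread pointer slot (the "just one REFANY" asked for earlier). It uses `pthread_key_create`, `pthread_getspecific` and `pthread_setspecific`, the C calls that `Upthread.getspecific` presumably binds to. `thread_slot_get` and `thread_slot_set` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One per-thread pointer slot: the POSIX counterpart of a TlsAlloc index. */
static pthread_key_t slot_key;
static pthread_once_t slot_once = PTHREAD_ONCE_INIT;

static void slot_init(void)
{
    int r = pthread_key_create(&slot_key, NULL);  /* no destructor */
    assert(r == 0);
    (void)r;
}

/* Each access is a library call; whether it stays in user space
   depends on the platform's thread library. */
void *thread_slot_get(void)
{
    pthread_once(&slot_once, slot_init);
    return pthread_getspecific(slot_key);         /* NULL until set in this thread */
}

int thread_slot_set(void *value)
{
    pthread_once(&slot_once, slot_init);
    return pthread_setspecific(slot_key, value);  /* 0 on success */
}
```

Compared with the `__declspec(thread)` code above, every access here is an out-of-line call, plus a `pthread_once` check in this sketch. That per-access call is the kind of overhead a runtime's thread-local allocation path would want to avoid.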

 - Jay




From mika at async.caltech.edu  Sat Oct 18 01:00:28 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 16:00:28 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 22:42:35 -0000."
	<COL101-W48200DF8FB7269A7B2E229E6320@phx.gbl> 
Message-ID: <200810172300.m9HN0SfN008554@camembert.async.caltech.edu>


No, I didn't mean that it *necessarily* involves a context switch.
Obviously it doesn't, because user-level threading never needs to
do a "kernel" context switch (it does its own switching, of course,
but I don't see why it would need to switch just to get or set a
variable).

I just meant that looking at the (C) implementation of pthreads I
have (on FreeBSD), on that system, it does seem to, as the code in
question is marked as "kernel code".

In any case I think I have been able to solve my particular problem
by identifying a data structure that is inherently only accessed
from a single thread (in my program) and attaching my memory recycling
trickery to that particular structure.  I get very little memory
allocation/GC and no need for locks at all, which is precisely the
effect I was going for.
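That approach can be sketched in C, for brevity. The free list lives in a context object that only one thread ever uses, here a hypothetical interpreter state. Recycling therefore needs neither locks nor thread-local lookups. `struct interp`, `interp_get_cell` and `interp_put_cell` are invented names, not Mika's actual code.

```c
#include <assert.h>
#include <stdlib.h>

struct cell {
    void *first;
    struct cell *rest;
};

/* Hypothetical per-interpreter state. By construction it is used by
   exactly one thread, so its free list needs no synchronization. */
struct interp {
    struct cell *free_cells;
    size_t recycled;            /* allocations avoided so far */
};

struct cell *interp_get_cell(struct interp *in)
{
    struct cell *c = in->free_cells;
    if (c != NULL) {
        in->free_cells = c->rest;
        c->rest = NULL;
        in->recycled++;
        return c;
    }
    return calloc(1, sizeof *c);   /* list empty: allocate a zeroed cell */
}

/* Call only when the evaluator reports that no reference escaped
   (the okToFree flag above); otherwise do not recycle the cell.
   In the Modula-3 original the garbage collector reclaims it later. */
void interp_put_cell(struct interp *in, struct cell *c)
{
    assert(c != NULL);
    c->first = NULL;
    c->rest = in->free_cells;
    in->free_cells = c;
}
```

The owner pointer is passed explicitly, so the cost is one extra argument rather than a thread-local lookup per allocation.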

I am still a little bit concerned about the performance of CM3-generated
code but the main culprit appears to be TYPECASE/ISTYPE now, far
from garbage collectors and thread libraries.  I'll send an update
if I can find something egregiously inefficient.

    Mika



From mika at async.caltech.edu  Sat Oct 18 10:41:30 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Sat, 18 Oct 2008 01:41:30 -0700
Subject: [M3devel] Fortran
Message-ID: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu>


Ok now in the realm of crazy questions---and I apologize to those
whose inboxes I clog with some of my emails...

If there is anyone out there in Modula-3-ether who has ever written
or heard of ...

  an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs

... would he please make himself known to me?  (I have a Scheme
interpreter to trade...)

    Mika


From lemming at henning-thielemann.de  Sat Oct 18 17:34:50 2008
From: lemming at henning-thielemann.de (Henning Thielemann)
Date: Sat, 18 Oct 2008 17:34:50 +0200 (MEST)
Subject: [M3devel] Fortran
In-Reply-To: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu>
References: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu>
Message-ID: <Pine.SOC.4.64.0810181646120.28054@haydn.informatik.uni-halle.de>


On Sat, 18 Oct 2008, Mika Nystrom wrote:

> Ok now in the realm of crazy questions---and I apologize to those
> whose inboxes I clog with some of my emails...
>
> If there is anyone out there in Modula-3-ether who has ever written
> or heard of ...
>
>  an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs
>
> ... would he please make himself known to me?  (I have a Scheme
> interpreter to trade...)

I have written a program for generating Modula-3 interfaces for LAPACK 
(linear algebra routines) using m3coco. But I'm afraid that my Fortran 
parser works only for LAPACK and no other library. I have just copied the 
CVS files to
    http://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/m3/pm3/language/parsing/m3coco/test/?cvsroot=PM3
Before you check it out: I might move it to a different location,
maybe cm3/m3-tools, if that is more appropriate. (You may also need the
revised m3coco version, which I only have on a branch and never tried to
merge back to HEAD.)


While searching for my own code on the net, I found some nice interviews with
Luca Cardelli:
   http://www.wikio.com/technology/development/modula-3


From mika at async.caltech.edu  Tue Oct 21 13:05:01 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 21 Oct 2008 04:05:01 -0700
Subject: [M3devel] CM3 on Mac OS X Tiger
Message-ID: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>

Hello everyone,

Sorry if I have asked this before---I feel I must have, and Tony
probably answered it, too, but I can't find it anywhere in my email
archives.

It looks like I finally upgraded my Mac to Tiger a half year ago,
and everything broke.  (Modula-3, emacs, make, etc etc etc etc.)
I am finally getting around to fixing it.  Now I am trying to
compile CM3 in accordance with Tony's instructions as of June 24, 2007:

(short quote here)
> cd ~/cm3-cvs
> mkdir boot
> cd boot
> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz
> ./cminstall

Now you will have some kind of cm3 installed, presumably in
/usr/local/cm3/bin/cm3.

Make sure you have a fresh CVS checkout in directory cm3 (let's
assume this is in your home directory ~/cm3).  Also, make sure you
have an up-to-date version of the CM3 backend compiler cm3cg
installed by executing the following:

STEP 0:

export CM3=/usr/local/cm3/bin/cm3
cd ~/cm3/m3-sys/m3cc
$CM3
$CM3 -ship

You can skip this last step if you know your backend compiler is up
to date.

Now, let's build the new compiler from scratch (this is the sequence
I use regularly to test changes to the run-time system whenever I
make them):

STEP 1:

cd ~/cm3/m3-libs/m3core
$CM3
$CM3 -ship
(end short quote, there's much more)

What happens is that when building m3core, my compiler is building
it against the interfaces in /usr/local/cm3, NOT the interfaces
within m3core itself:

--- building in PPC_DARWIN ---

ignoring ../src/m3overrides

new source -> compiling RTCollector.m3
"../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX)
"../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX)
"../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD)
"../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD)
4 errors encountered
stale imports -> compiling RTDebug.m3

Fatal Error: bad version stamps: RTDebug.m3

version stamp mismatch: Compiler.Platform
  <df3c2b13d1d385ee> => RTDebug.m3
  <da77490d024222ef> => Compiler.i3  
version stamp mismatch: Compiler.ThisPlatform
  <8b5a6f513e082750> => RTDebug.m3
  <8e110d4fed998051> => Compiler.i3  

I feel like I should REALLY know the answer to this, but how do I 
get the compiler to use only the local sources and not attempt
to compile things with reference to the already-installed 
interfaces?

    Mika


From hosking at cs.purdue.edu  Tue Oct 21 13:21:36 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 12:21:36 +0100
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>
References: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>
Message-ID: <27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu>

This is a phase ordering problem that arises when you use an old  
compiler to compile newer sources.  It really should be fixed  
somehow.  In any case, the problem is those lines in RTCollector at  
the bottom (I deleted them yesterday on the main trunk) that refer to  
values supposedly built in to the compiler (which are not there for  
the old binary you are using).  I think if you delete those lines then  
you should be OK.  Once you have a new compiler bootstrapped (with  
those configuration values available built in) then you should be able  
to compile that code (excepting that I just deleted those lines  
yesterday).


On 21 Oct 2008, at 12:05, Mika Nystrom wrote:

> Hello everyone,
>
> Sorry if I have asked this before---I feel I must have, and Tony
> probably answered it, too, but I can't find it anywhere in my email
> archives.
>
> It looks like I finally upgraded my Mac to Tiger a half year ago,
> and everything broke.  (Modula-3, emacs, make, etc etc etc etc.)
> I am finally getting around to fixing it.  Now I am trying to
> compile CM3 in accordance with Tony's instructions as of June 24,  
> 2007:
>
> (short quote here)
>> cd ~/cm3-cvs
>> mkdir boot
>> cd boot
>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz
>> ./cminstall
>
> Now you will have some kind of cm3 installed, presumably in /usr/
> local/cm3/bin/cm3.
>
> Make sure you have a fresh CVS checkout in directory cm3 (let's
> assume this is in your home directory ~/cm3).  Also, make sure you
> have an up-to-date version of the CM3 backend compiler cm3cg
> installed by executing the following:
>
> STEP 0:
>
> export CM3=/usr/local/cm3/bin/cm3
> cd ~/cm3/m3-sys/m3cc
> $CM3
> $CM3 -ship
>
> You can skip this last step if you know your backend compiler is up
> to date.
>
> Now, let's build the new compiler from scratch (this is the sequence
> I use regularly to test changes to the run-time system whenever I
> make them):
>
> STEP 1:
>
> cd ~/cm3/m3-libs/m3core
> $CM3
> $CM3 -ship
> (end short quote, there's much more)
>
> What happens is that when building m3core, my compiler is building
> it against the interfaces in /usr/local/cm3, NOT the interfaces
> within m3core itself:
>
> --- building in PPC_DARWIN ---
>
> ignoring ../src/m3overrides
>
> new source -> compiling RTCollector.m3
> "../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX)
> "../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX)
> "../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD)
> "../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD)
> 4 errors encountered
> stale imports -> compiling RTDebug.m3
>
> Fatal Error: bad version stamps: RTDebug.m3
>
> version stamp mismatch: Compiler.Platform
>  <df3c2b13d1d385ee> => RTDebug.m3
>  <da77490d024222ef> => Compiler.i3
> version stamp mismatch: Compiler.ThisPlatform
>  <8b5a6f513e082750> => RTDebug.m3
>  <8e110d4fed998051> => Compiler.i3
>
> I feel like I should REALLY know the answer to this, but how do I
> get the compiler to use only the local sources and not attempt
> to compile things with reference to the already-installed
> interfaces?
>
>    Mika


From hosking at cs.purdue.edu  Tue Oct 21 16:54:58 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 15:54:58 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
Message-ID: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>

I have one more question that I forgot to ask before.  Did you  
evaluate performance with -O3 optimization in the backend?

Generally, I have the following in my m3_backend specs so that turning  
on optimization results in -O3 (and lots of lovely inlining):

proc m3_backend (source, object, optimize, debug) is
   local args =
   [
     "-m32",
     "-quiet",
     source,
     "-o",
     object,
     % fPIC really is needed here, despite man gcc saying it is the default.
     % This is because man gcc is about Apple's gcc but m3cg is
     % built from FSF source.
     "-fPIC",
     "-fno-reorder-blocks"
   ]
   if optimize  args += "-O3"  end
   if debug     args += "-gstabs"  end
   if M3_PROFILING args += "-p" end
   return try_exec (m3back, args)
end


On 17 Oct 2008, at 09:32, Mika Nystrom wrote:

> Ok I am sorry I am slow to pick up on this.
>
> I take it the problem is actually the Upthread.getspecific routine,
> which itself calls something get_curthread somewhere inside pthreads,
> which in turn involves a context switch to the supervisor---the
> identity of the current thread is just not accessible anywhere in
> user space.  Also explains why this program runs faster with my old
> PM3, which uses longjmp threads.
>
> The only way to avoid it (really) is to pass a pointer to the
> Thread.T of the currently executing thread in the activation record
> of *every* procedure, so that allocators can find it when
> necessary.... but that is very expensive in terms of stack memory.
>
> Or I can just make a structure like that that I pass around where
> I need it in my own program.  Thread-specific and user-managed.
>
> I believe I have just answered all my own questions, but I hope
> Tony will correct me if my answers are incorrect.
>
>    Mika
>
> Tony Hosking writes:
>> I suspect part of the overhead of allocation in the new code is the
>> need for thread-local allocation buffers, which means we need to
>> access thread-local state.  We really need an efficient way to do
>> that, but pthreads thread-local accesses may be what is killing you.
>>
>> On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>>
>>> Hi Tony,
>>>
>>> I figured you would chime in!
>>>
>>> Yes, @M3noincremental seems to make things consistently a tad bit
>>> slower (but a very small difference), on both FreeBSD and Linux.
>>> @M3nogc makes a bigger difference, of course.
>>>
>>> Unfortunately I seem to have lost the code that did a lot of memory
>>> allocations.  My tricks (as described in the email---and others!)
>>> have removed most of the troublesome memory allocations, but now
>>> I'm stuck with the mutex instead...
>>>
>>>     Mika
>>>
>>> Tony Hosking writes:
>>>> Have you tried running @M3noincremental?
>>>>
>>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> As I mentioned in an earlier email about printing structures (thanks
>>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>>> Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
>>>>> JScheme for Java (well, it was at first strongly based, but more and
>>>>> more loosely, if you know what I mean...)
>>>>>
>>>>> I expected that the performance of the interpreter would be much
>>>>> better in Modula-3 than in Java, and I have been testing on two
>>>>> different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
>>>>> and the other is CM3 on a recent Debian system.  What I am finding
>>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>>>>> close to ten times as fast on some tasks at this point), but on
>>>>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>>>>
>>>>> When I started, with code that was essentially equivalent to JScheme,
>>>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>>>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>>>>> spend most of its time in (surprise, surprise!) memory allocation
>>>>> and garbage collection.  The speedup I have achieved between the
>>>>> first implementation and now was due to the use of Modula-3
>>>>> constructs that are superior to Java's, such as the use of arrays of
>>>>> RECORDs to make small stacks rather than linked lists.  (I get
>>>>> readable code with far fewer memory allocations and less GC work.)
>>>>>
>>>>> Now, since this is an interpreter, I as the implementer have limited
>>>>> control over how much memory is allocated and freed, and where it is
>>>>> needed.  However, I can sometimes fall back on C-style memory
>>>>> management, but I would like to do it in a safe way.  For instance,
>>>>> I have special-cased evaluation of Scheme primitives, as follows.
>>>>>
>>>>> Under the "normal" implementation, a list of things to evaluate is
>>>>> built up, passed to an evaluation function, and then the GC is left
>>>>> to sweep up the mess.  The problem is that there are various tricky
>>>>> routes by which references can escape the evaluator, so you can't
>>>>> just assume that what you put in is going to be dead right after an
>>>>> eval and free it.  Instead, I set a flag in the evaluator, which is
>>>>> TRUE if it is OK to free the list after the eval and FALSE if it's
>>>>> unclear (in which case the problem is left up to the GC).
>>>>>
>>>>> For the vast majority of Scheme primitives, one can indeed free the
>>>>> list right after the eval.  Now of course I am not interested in
>>>>> unsafe code, so what I do is this:
>>>>>
>>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>>
>>>>> VAR
>>>>> mu := NEW(MUTEX);
>>>>> free : Pair := NIL;
>>>>>
>>>>> PROCEDURE GetPair() : Pair =
>>>>> BEGIN
>>>>>  LOCK mu DO
>>>>>    IF free # NIL THEN
>>>>>      TRY
>>>>>        RETURN free
>>>>>      FINALLY
>>>>>        free := free.rest
>>>>>      END
>>>>>    END
>>>>>  END;
>>>>>  RETURN NEW(Pair)
>>>>> END GetPair;
>>>>>
>>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>>> BEGIN
>>>>>  cons.first := NIL;
>>>>>  LOCK mu DO
>>>>>    cons.rest := free;
>>>>>    free := cons
>>>>>  END
>>>>> END ReturnPair;
>>>>>
>>>>> my eval code looks like
>>>>>
>>>>> VAR okToFree : BOOLEAN; BEGIN
>>>>>
>>>>> args := GetPair(); ...
>>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>>
>>>>> IF okToFree THEN ReturnPair(args) END;
>>>>> RETURN result
>>>>> END
>>>>>
>>>>> and this does work well.  In fact it speeds up the Linux
>>>>> implementation by almost 100% to recycle the lists like this *just*
>>>>> for the evaluation of Scheme primitives.
>>>>>
>>>>> But it's still ugly, isn't it?  There's a mutex, and a global
>>>>> variable.  And yes, the time spent messing with the mutex is
>>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>>> (and that is coming!)
>>>>>
>>>>> So I'm thinking, what I really want is a structure that is attached
>>>>> to my current Thread.T.  I want to be able to access just a single
>>>>> pointer (like the free list) but be sure it is unique to my current
>>>>> thread.  No locking would be necessary if I could do this.
>>>>>
>>>>> Does anyone have an elegant solution that does something like this?
>>>>> Thread-specific "static" variables?  Just one REFANY would be enough
>>>>> for a lot of uses...  seems to me this should be a frequently
>>>>> occurring problem?
>>>>>
>>>>>   Best regards,
>>>>>     Mika
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
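
Mika's own conclusion in the thread above, a thread-specific, user-managed structure passed explicitly to where it is needed, can be sketched roughly as follows. This is a hypothetical illustration, not code from the interpreter: the names Context, GetPair, ReturnPair and Worker, and the 256-entry pool size, are all assumptions. Each forked thread owns its own Context, so the MUTEX and the global free list disappear. The pool is a small array-based stack (in the spirit of the arrays-of-RECORDs trick mentioned above), which also avoids assigning the REFANY field rest back to a Pair variable, an assignment that implies a runtime type check.

```modula3
MODULE PairPool EXPORTS Main;

(* Sketch only: Context, GetPair, ReturnPair and Worker are hypothetical
   names.  Each thread owns its Context, so no MUTEX is needed as long
   as a Context is never shared between threads. *)

IMPORT Thread;

TYPE
  Pair = OBJECT first, rest: REFANY; END;

  (* Per-thread pool: a small array-based stack of recycled pairs. *)
  Context = REF RECORD
    n: CARDINAL := 0;                 (* number of cached pairs *)
    cache: ARRAY [0 .. 255] OF Pair;  (* slots 0 .. n-1 are valid *)
  END;

  Worker = Thread.Closure OBJECT
    ctx: Context;                     (* owned by this thread only *)
  OVERRIDES
    apply := Run;
  END;

PROCEDURE GetPair (ctx: Context): Pair =
  BEGIN
    IF ctx.n > 0 THEN
      DEC(ctx.n);
      RETURN ctx.cache[ctx.n];        (* reuse a recycled pair *)
    END;
    RETURN NEW(Pair);                 (* pool empty: allocate *)
  END GetPair;

PROCEDURE ReturnPair (ctx: Context; p: Pair) =
  BEGIN
    p.first := NIL;                   (* drop references so the GC *)
    p.rest := NIL;                    (* can reclaim what they held *)
    IF ctx.n < NUMBER(ctx.cache) THEN
      ctx.cache[ctx.n] := p;
      INC(ctx.n);
    END;                              (* pool full: leave p to the GC *)
  END ReturnPair;

PROCEDURE Run (self: Worker): REFANY =
  VAR args: Pair;
  BEGIN
    FOR i := 1 TO 1000 DO
      args := GetPair(self.ctx);
      (* ... evaluate a primitive; recycle only if it is safe ... *)
      ReturnPair(self.ctx, args);
    END;
    RETURN NIL;
  END Run;

VAR
  t1 := Thread.Fork(NEW(Worker, ctx := NEW(Context)));
  t2 := Thread.Fork(NEW(Worker, ctx := NEW(Context)));

BEGIN
  EVAL Thread.Join(t1);
  EVAL Thread.Join(t2);
END PairPool.
```

The price is the one Mika names: the Context has to be threaded through every call on the hot path that allocates, instead of being found implicitly from the current Thread.T.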


From hosking at cs.purdue.edu  Tue Oct 21 17:17:24 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 16:17:24 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>
References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
	<34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>
Message-ID: <1396C14A-B23D-4D19-804B-B1627B44106F@cs.purdue.edu>

Also, turn off assertions.

On 21 Oct 2008, at 15:54, Tony Hosking wrote:

> I have one more question that I forgot to ask before.  Did you
> evaluate performance with -O3 optimization in the backend?


From mika at async.caltech.edu  Tue Oct 21 22:18:07 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 21 Oct 2008 13:18:07 -0700
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: Your message of "Tue, 21 Oct 2008 12:21:36 BST."
	<27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu> 
Message-ID: <200810212018.m9LKI81o019865@camembert.async.caltech.edu>

Hi Tony,

Thanks for helping, as usual!

I ran into this now, is this also a bootstrapping problem?  (Moving
on to building libm3, cleared out existing PPC_DARWIN, have rebuilt
m3cc... only see a single version of Compiler.i3 anywhere...)

Here's the log:

[lapdog:~/cm3/m3-libs/libm3] mika% $CM3 && $CM3 -ship
--- building in PPC_DARWIN ---

ignoring ../src/m3overrides

new source -> compiling Atom.i3
new source -> compiling AtomList.i3
new source -> compiling OSError.i3
new source -> compiling File.i3
new source -> compiling RegularFile.i3
new source -> compiling Pipe.i3
new source -> compiling TextSeq.i3
new source -> compiling Pathname.i3
new source -> compiling FS.i3
new source -> compiling Process.i3
new source -> compiling Socket.i3
new source -> compiling Terminal.i3
new source -> compiling FS.m3
new source -> compiling Terminal.m3
new source -> compiling RegularFile.m3
new source -> compiling Pipe.m3
new source -> compiling Socket.m3
new source -> compiling OSConfig.i3
new source -> compiling OSErrorPosix.i3
new source -> compiling Fmt.i3
new source -> compiling OSErrorPosix.m3
new source -> compiling FilePosix.i3
new source -> compiling FilePosix.m3
new source -> compiling FSPosix.m3
new source -> compiling PipePosix.m3
new source -> compiling PathnamePosix.m3
new source -> compiling SocketPosix.m3

Fatal Error: bad version stamps: SocketPosix.m3

version stamp mismatch: Compiler.Platform
  <df3c2b13d1d385ee> => SocketPosix.m3
  <da77490d024222ef> => Compiler.i3  
version stamp mismatch: Compiler.ThisPlatform
  <8b5a6f513e082750> => SocketPosix.m3
  <8e110d4fed998051> => Compiler.i3  
[lapdog:~/cm3/m3-libs/libm3] mika% 

Tony Hosking writes:
>This is a phase ordering problem that arises when you use an old  
>compiler to compile newer sources.  It really should be fixed  
>somehow.  In any case, the problem is those lines in RTCollector at  
>the bottom (I deleted them yesterday on the main trunk) that refer to  
>values supposedly built in to the compiler (which are not there for  
>the old binary you are using).  I think if you delete those lines then  
>you should be OK.  Once you have a new compiler bootstrapped (with  
>those configuration values available built in) then you should be able  
>to compile that code (excepting that I just deleted those lines  
>yesterday).


From hosking at cs.purdue.edu  Tue Oct 21 23:29:07 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 22:29:07 +0100
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: <200810212018.m9LKI81o019865@camembert.async.caltech.edu>
References: <200810212018.m9LKI81o019865@camembert.async.caltech.edu>
Message-ID: <BF077330-03E9-45CB-8F30-27066330331B@cs.purdue.edu>

Hmm.  Not sure.  Looks like it.

On 21 Oct 2008, at 21:18, Mika Nystrom wrote:

> Hi Tony,
>
> Thanks for helping, as usual!
>
> I ran into this now, is this also a bootstrapping problem?  (Moving
> on to building libm3, cleared out existing PPC_DARWIN, have rebuilt
> m3cc... only see a single version of Compiler.i3 anywhere...)


From mika at async.caltech.edu  Thu Oct 23 10:24:53 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 23 Oct 2008 01:24:53 -0700
Subject: [M3devel] NEW in RTType.m3
Message-ID: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>

Hello Modula-3 people,

Does anyone know whether there is anything that prevents using NEW
in RTType.m3?

I added a lot of memory recycling to the Scheme interpreter I am
working on, and now it seems it is spending a lot of time in Typecase
and IsSubtype.  I was wondering if it is possible to memoize IsSubtype
inside RTType.m3...  (specifically just replacing IsSubtype with an
array lookup).  

It is the nature of the interpreter that it spends a lot of time
checking types and narrowing things back and forth, as Scheme and
Modula-3 references share the same representation.

      Mika


From hosking at cs.purdue.edu  Thu Oct 23 12:10:01 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Thu, 23 Oct 2008 11:10:01 +0100
Subject: [M3devel] NEW in RTType.m3
In-Reply-To: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>
References: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>
Message-ID: <7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu>

Could be dangerous depending on module link orderings.  Might be  
better to cache your own lookups in your interpreter.

On 23 Oct 2008, at 09:24, Mika Nystrom wrote:

> Hello Modula-3 people,
>
> Does anyone know whether there is anything that prevents using NEW
> in RTType.m3?
>
> I added a lot of memory recycling to the Scheme interpreter I am
> working on, and now it seems it is spending a lot of time in Typecase
> and IsSubtype.  I was wondering if it is possible to memoize IsSubtype
> inside RTType.m3...  (specifically just replacing IsSubtype with an
> array lookup).
>
> It is the nature of the interpreter that it spends a lot of time
> checking types and narrowing things back and forth, as Scheme and
> Modula-3 references share the same representation.
>
>      Mika
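
Tony's suggestion of caching lookups in the interpreter itself, rather than changing RTType.m3, might look something like the following sketch. It assumes, hypothetically, that the hot type tests can be reduced to classifying a REFANY into a few kinds; Kind, Classify, Grow and Name are illustrative names, not part of CM3. Because the result of ISTYPE depends only on the allocated type of the referent, it can be memoized per TYPECODE, so the runtime subtype check is paid only the first time each typecode is seen.

```modula3
MODULE KindCache EXPORTS Main;

(* Sketch only: Symbol, Pair, Kind, Classify, Grow and Name are
   hypothetical.  ISTYPE depends only on the allocated type of x,
   so its answer can be memoized per TYPECODE.  Not thread-safe:
   give each interpreter thread its own cache, or guard it. *)

IMPORT IO;

TYPE
  Symbol = OBJECT name: TEXT; END;
  Pair   = OBJECT first, rest: REFANY; END;
  Kind   = {Unknown, Nil, Sym, Cons, Other};

VAR cache := NEW(REF ARRAY OF Kind, 64);   (* indexed by TYPECODE *)

PROCEDURE Grow (tc: CARDINAL) =
  VAR bigger := NEW(REF ARRAY OF Kind, 2 * (tc + 1));
  BEGIN
    FOR i := FIRST(bigger^) TO LAST(bigger^) DO
      bigger[i] := Kind.Unknown;
    END;
    SUBARRAY(bigger^, 0, NUMBER(cache^)) := cache^;  (* keep old entries *)
    cache := bigger;
  END Grow;

PROCEDURE Classify (x: REFANY): Kind =
  VAR tc: CARDINAL; k: Kind;
  BEGIN
    IF x = NIL THEN RETURN Kind.Nil END;  (* ISTYPE(NIL, T) is always TRUE *)
    tc := TYPECODE(x);
    IF tc > LAST(cache^) THEN Grow(tc) END;
    k := cache[tc];
    IF k = Kind.Unknown THEN
      (* First sighting of this typecode: pay for ISTYPE once. *)
      IF ISTYPE(x, Symbol) THEN k := Kind.Sym
      ELSIF ISTYPE(x, Pair) THEN k := Kind.Cons
      ELSE k := Kind.Other
      END;
      cache[tc] := k;
    END;
    RETURN k;
  END Classify;

PROCEDURE Name (k: Kind): TEXT =
  BEGIN
    CASE k OF
    | Kind.Unknown => RETURN "unknown";
    | Kind.Nil     => RETURN "nil";
    | Kind.Sym     => RETURN "symbol";
    | Kind.Cons    => RETURN "pair";
    | Kind.Other   => RETURN "other";
    END;
  END Name;

BEGIN
  FOR i := FIRST(cache^) TO LAST(cache^) DO cache[i] := Kind.Unknown END;
  IO.Put(Name(Classify(NEW(Symbol, name := "car"))) & "\n");
  IO.Put(Name(Classify(NEW(Pair))) & "\n");          (* computed *)
  IO.Put(Name(Classify(NEW(Pair))) & "\n");          (* cache hit *)
  IO.Put(Name(Classify(NEW(REF INTEGER))) & "\n");
  IO.Put(Name(Classify(NIL)) & "\n");
END KindCache.
```

Run as written, this should print symbol, pair, pair, other and nil on separate lines. The cache lives entirely in user code, which sidesteps the link-ordering concern about allocating inside RTType.m3.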


From mika at async.caltech.edu  Thu Oct 23 19:29:50 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 23 Oct 2008 10:29:50 -0700
Subject: [M3devel] NEW in RTType.m3
In-Reply-To: Your message of "Thu, 23 Oct 2008 11:10:01 BST."
	<7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu> 
Message-ID: <200810231729.m9NHToMC080136@camembert.async.caltech.edu>


Well I'm not calling Typecase and IsSubtype directly---the compiler
is inserting the calls.

Here's an example of my code:

          IF x # NIL AND ISTYPE(x,Symbol) THEN
            RETURN env.lookup(x)
          ELSIF x = NIL OR NOT ISTYPE(x,Pair) THEN
            RETURN x
          ELSE

This code actually winds up here (RTType.m3):

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t: RT0.TypeDefn;
  BEGIN
    IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
    t := Get (a);
    IF (t = NIL) THEN RETURN FALSE; END;
    IF (t.typecode = b) THEN RETURN TRUE END;
    WHILE (t.kind = ORD (TK.Obj)) DO
      IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
      t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
      IF (t = NIL) THEN RETURN FALSE; END;
      IF (t.typecode = b) THEN RETURN TRUE; END;
    END;
    IF (t.traced # 0)
      THEN RETURN (b = RT0.RefanyTypecode);
      ELSE RETURN (b = RT0.AddressTypecode);
    END;
  END IsSubtype;

Again this is an example of something where the CM3 code seems to
be hurting more than PM3, but it could be that for some reason I
have more visibility into the CM3 code, or that there's an optimization
difference (I haven't been able to investigate this fully yet).  In
any case, it's clear that if IsSubtype could be replaced with a
table lookup, this kind of code would be accelerated by potentially
a lot.

Note that while in the above example the code might be accelerated
by (in my opinion, less clear) use of TYPECODE (since I never subtype
Symbol or Pair---for now!), this is not so for some NARROWs.  The
NARROWs also wind up calling RTType.IsSubtype, and they arise because
I have types that depend on each other, and unless I want to introduce
extra complexity (new partial revelations) or stick everything in
the same interface, I am forced to NARROW something to avoid a
circular dependency of interfaces...  A method of A.T takes a B.T
and a method of B.T takes an A.T, so I make a supertype X.T s.t.
A.T <: X.T ; then I can declare B.T.m to take an X.T and NARROW it
to A.T within B.T.m... triggering a call to the above code.  (For
simplicity's sake, X.T could be REFANY or ROOT.)  An attempt to
declare B.T.m as taking A.T would lead to a circular dependency
between A and B.  The code is really rather simple and it's a shame
if you have to make it look much more complicated to avoid issues
like these which might equally well be solved by tweaking the runtime
implementation a bit.

     Mika



From mika at async.caltech.edu  Sat Oct 25 05:16:56 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 24 Oct 2008 20:16:56 -0700
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
Message-ID: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>


Dear Modula-3 people,

I had a crash in my program from a range error that I believe
shouldn't have happened the way it did, although it's not in my
code, so I'm not sure if there's a reason for the way it's done (matching
a C declaration somewhere, maybe??).

Here it is, from ThreadPosix.m3:

PROCEDURE IOWait(fd: INTEGER; read: BOOLEAN;
                  timeoutInterval: LONGREAL := -1.0D0): WaitResult =
  <*FATAL Alerted*>
  BEGIN
    self.alertable := FALSE;
    RETURN XIOWait(fd, read, timeoutInterval);
  END IOWait;

PROCEDURE IOAlertWait(fd: INTEGER; read: BOOLEAN;
                  timeoutInterval: LONGREAL := -1.0D0): WaitResult
                  RAISES {Alerted} =
  BEGIN
    self.alertable := TRUE;
    RETURN XIOWait(fd, read, timeoutInterval);
  END IOAlertWait;

PROCEDURE XIOWait (fd: CARDINAL; read: BOOLEAN; interval: LONGREAL): WaitResult
    RAISES {Alerted} =
  VAR res: INTEGER;
      fdindex := fd DIV FDSetSize;
      fdset := FDSet{fd MOD FDSetSize};
... rest omitted ...

Note that IOWait calls XIOWait.  IOWait is declared as taking an
INTEGER, but XIOWait takes a CARDINAL.

So I really should use a CARDINAL in passing to IOWait, but since
IOWait is the interface function it's not clear that I should do
that (until my program crashes after passing -1 from some carelessly
wrapped C code).  I don't like the fact that I get a range error
*inside* the library when it appears unnecessary---it should have
happened in my code, as I make the call.

Suggested improvement: declare all the FDs in SchedulerPosix.i3
(the interface that exports these routines) to be CARDINAL instead
of INTEGER.
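
Here is a sketch of what the change might look like for IOWait. It is not the committed interface. WaitResult is assumed to be declared in SchedulerPosix.i3, as the signatures above imply, and the implementations in ThreadPosix.m3 would have to change to match. With a CARDINAL formal, the range check is part of binding the actual argument to the formal at the call, so a stray -1 should be reported at the caller's line rather than inside XIOWait.

```modula3
(* SchedulerPosix.i3, sketch of the suggested change: fd becomes CARDINAL. *)
PROCEDURE IOWait (fd: CARDINAL; read: BOOLEAN;
                  timeoutInterval: LONGREAL := -1.0D0): WaitResult;
(* IOAlertWait, and any other procedure here that takes an fd,
   would change the same way. *)

(* Caller side, with fd an INTEGER holding -1 from a wrapped C call:
     EVAL SchedulerPosix.IOWait(fd, TRUE);
   should now fail the range check at this call. *)
```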

     Mika


From hosking at cs.purdue.edu  Mon Oct 27 15:28:52 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Mon, 27 Oct 2008 14:28:52 +0000
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
In-Reply-To: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
References: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
Message-ID: <5232F2E4-3B0E-49E5-B1C8-BB4D04C60C33@cs.purdue.edu>

Sounds fair to me.



From jay.krell at cornell.edu  Thu Oct 30 22:21:09 2008
From: jay.krell at cornell.edu (Jay)
Date: Thu, 30 Oct 2008 21:21:09 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
Message-ID: <COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>


Please try this:

 http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2

std failed to build because stubgen crashed, probably due to gc.
cm3 does crash right away without @M3nogc.

Something like this:
    cd /src 
    wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2  
    cd /cm3  
    rm -rf *  
    tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2  
    cd /src/cm3/scripts/python  
    ./do-cm3-all.py realclean  
    ./upgrade.py  
    ./do-cm3-all.py realclean  
    ./do-cm3-std.py buildship  
    => It will fail at zeus, but it should get far. You'll also need some X devel packages to get that far; I had a failure for lack of libXaw, for example. I did not run anything (any of the GUI packages), but building itself with itself is a decent test.

I renamed the old AMD64_LINUX archives to "1.0.0".
 http://www.opencm3.com/uploaded-archives/

This has the bug fix I committed last night to cm3cg, and therefore a 64-bit hosted cm3cg.

jay at amd64a:/cm3/bin$ file *
AMD64_LINUX: ASCII text
cm3:         ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
cm3.cfg:     ASCII English text
cm3cg:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
m3bundle:    ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
mklib:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
Unix.common: ASCII English text

Built on Debian 4.0r4 (r5 is out).
jay at amd64a:/cm3/bin$ uname -a
Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux
jay at amd64a:/cm3/bin$ dmesg | head
Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Aug 19 04:30:56 UTC 2008

Though really I couldn't do it without Visual C++ on Windows, which provides excellent find-in-files and editing; nothing else comes close. I edit on Windows and scp the files over. :)

 - Jay

________________________________

From: jay.krell at cornell.edu
To: dragisha at m3w.org; m3devel at elegosoft.com
Date: Tue, 9 Sep 2008 09:43:03 +0000
Subject: Re: [M3devel] AMD64_LINUX status


From hosking at cs.purdue.edu  Fri Oct 31 11:19:51 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 31 Oct 2008 10:19:51 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
Message-ID: <BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>

Umm, I think I found your bug with GC:

Check out "RTMachine.PointerAlignment".  You have it set to  
BITSIZE(INTEGER).  I suspect what you want is something like  
BYTESIZE(ADDRESS).  Also, "RTMachine.StackFrameAlignment" should  
probably be 2*BYTESIZE(ADDRESS).
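
Tony's point can be made concrete. RTMachine.PointerAlignment is a byte count: it is the step at which the collector looks for possible pointers when scanning thread stacks. On AMD64, BITSIZE(INTEGER) is 64 where 8 is meant, so the scan would consider only one of every eight pointer-sized slots and could miss live roots. That would plausibly explain the crashes without @M3nogc. Below is a sketch of the suggested constants, written as an excerpt of an AMD64_LINUX RTMachine.i3 rather than the committed file.

```modula3
(* Excerpt-style sketch for an AMD64_LINUX RTMachine.i3; not a complete interface. *)
CONST
  (* Bytes between candidate pointer slots when the collector scans thread
     stacks: BYTESIZE(ADDRESS) = 8 here, whereas BITSIZE(INTEGER) = 64 would
     make it look at only one slot in eight. *)
  PointerAlignment = BYTESIZE(ADDRESS);

  (* 2 * 8 = 16, the stack alignment required by the System V AMD64 ABI. *)
  StackFrameAlignment = 2 * BYTESIZE(ADDRESS);
```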




From jay.krell at cornell.edu  Fri Oct 31 14:52:43 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 13:52:43 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl> 
	<BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
Message-ID: <COL101-W42D125019F4B7531A26BA6E6200@phx.gbl>


Tony, excellent, thanks, that helps.
How do you know and confirm the right values? I don't like guessing.
 
And then there's this :)
 
  Symbol
Pickling font metrics...
Done.
/cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
/cm3/bin/stubgen -v1 -sno RemoteView.T   -T.M3IMPTAB
stubgen: Processing RemoteView.T

***
*** runtime error:
***    NEW() was unable to allocate more memory.
***    file "../src/runtime/common/RTAllocator.m3", line 285
***
"/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T   -T.M3IMPTAB
--procedure--  -line-  -file---
exec               --  <builtin>
_v_netobj          37  /cm3/pkg/netobj/src/netobj.tmpl
netobjv1           44  /cm3/pkg/netobj/src/netobj.tmpl
netobj             64  /cm3/pkg/netobj/src/netobj.tmpl
include_dir        71  /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
                    8  /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args
 
 
I should debug it, and double check that I upgraded what had to be upgraded.
 
 - Jay

From jay.krell at cornell.edu  Fri Oct 31 15:25:13 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 14:25:13 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <1225462205.14482.60.camel@faramir.m3w.org>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
	<BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
	<COL101-W42D125019F4B7531A26BA6E6200@phx.gbl> 
	<1225462205.14482.60.camel@faramir.m3w.org>
Message-ID: <COL101-W728C265A8AF283199F0034E6200@phx.gbl>


It seems like there's still a problem. I haven't debugged it yet.
(I'm sure glad Tony found the other problem before I debugged it.)
I updated http://www.opencm3.com/uploaded-archives with Tony's fix.
The older builds are now 0.0.0.1 and 0.0.0.2.
 
 - Jay

From dragisha at m3w.org  Fri Oct 31 15:10:05 2008
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 31 Oct 2008 15:10:05 +0100
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <COL101-W42D125019F4B7531A26BA6E6200@phx.gbl>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
	<BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
	<COL101-W42D125019F4B7531A26BA6E6200@phx.gbl>
Message-ID: <1225462205.14482.60.camel@faramir.m3w.org>

So, we now have fully functional AMD64_LINUX (_with_ GC)?

TIA

-- 
Dragiša Durić <dragisha at m3w.org>


From jay.krell at cornell.edu  Wed Oct  1 01:24:14 2008
From: jay.krell at cornell.edu (Jay)
Date: Tue, 30 Sep 2008 23:24:14 +0000
Subject: [M3devel] ARM Darwin
In-Reply-To: <7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org>
References: <F29CC4D9-0043-48B9-84F1-93E9F3336D40@darko.org>
	<5ED8E753-6B9E-4FED-8689-1D3D317A5A36@cs.purdue.edu> 
	<7F80509C-337F-46E7-93FB-D34AA7F8B4DF@darko.org>
Message-ID: <COL101-W3460EC073E17115925F24CE6430@phx.gbl>


Get me a machine and I'll work on it. :)
I'll get one before long but I'm bogged down with existing x86, AMD64, PPC, PPC64 (AIX), Mips (Irix) hardware not yet being used for all its meant..

I suspect Apple hasn't pushed their changes up, so be sure to poke around their gcc source.

> Apple are building their own ARM GCC and use that to configure the
> back end. Then the runtime issues which I imagine might be with the GC

gcc -v ?

> and threading. I'm not sure there will be any native treading and I'm
> sure VM will look very different.

I assume it'll look like most any Posix or *_DARWIN or 32bit thereof system.
I assume it has pthreads.

 - Jay


> From: darko at darko.org
> To: hosking at cs.purdue.edu
> Date: Tue, 30 Sep 2008 14:59:39 +0200
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] ARM Darwin
>
> Thanks, it should be a bit easier than the normal process since the
> compiler doesn't have to be fully bootstrapped, I just have to get a
> cross working. I know the first thing is to get the machine
> configuration correct, which I'll start when I get my hands on one of
> the machines in a couple of days. The other thing is to work out how
> Apple are building their own ARM GCC and use that to configure the
> back end. Then the runtime issues which I imagine might be with the GC
> and threading. I'm not sure there will be any native threading and I'm
> sure VM will look very different.
>
>
> On 30/09/2008, at 2:44 PM, Tony Hosking wrote:
>
>> I can share tips...
>>
>> On Sep 30, 2008, at 1:41 PM, Darko wrote:
>>
>>> Is anyone interested in working on an ARM port for Darwin? Or maybe
>>> just providing some tips as I give it a try?
>>>
>>> Cheers,
>>> Darko.
>>
>


From jay.krell at cornell.edu  Wed Oct  1 08:41:03 2008
From: jay.krell at cornell.edu (Jay)
Date: Wed, 1 Oct 2008 06:41:03 +0000
Subject: [M3devel] AMD-64 binaries?
In-Reply-To: <30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
References: <48BDF24B.900@wichita.edu>
	<20080903075804.zhep2ichmow00scg@mail.elegosoft.com>
	<COL101-W839FDBE447569C4D9BACC6E6430@phx.gbl> 
	<30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
Message-ID: <COL101-W281BD9E78E32E04348F400E6420@phx.gbl>


No -- you would know best about AMD64_DARWIN.
I'm sure ALPHA_OSF used to work, but it's been so long, I don't think it counts.
 
I'm being lazy.
 
file AMD64_DARWIN/cm3cg
 => fat binary? I doubt it. 
 => with ppc, i386, amd64? (doubt it) 
 => or just ppc, i386?  (doubt it) 
 => or just i386? This is what I "suspect".
 => or just AMD64. This would be somewhat interesting. 
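The header of the binary answers this without guessing. A minimal Python 3 sketch of what `file` reports here: it reads the magic number to tell ELF from Mach-O, 32-bit from 64-bit, and recognizes fat (universal) Mach-O files, listing the architectures they contain. The constants come from the ELF specification and Apple's `<mach-o/loader.h>`, `<mach-o/fat.h>` and `<mach/machine.h>`; the script name `binkind.py` and the `describe` function are hypothetical, and only the four CPU types mentioned in this thread are named.

```python
import struct
import sys

# Mach-O CPU types from <mach/machine.h>; CPU_ARCH_ABI64 marks the 64-bit variants.
CPU_ARCH_ABI64 = 0x01000000
CPU_NAMES = {
    7: "i386",
    7 | CPU_ARCH_ABI64: "x86_64",
    18: "ppc",
    18 | CPU_ARCH_ABI64: "ppc64",
}


def describe(header: bytes) -> str:
    """Roughly classify an executable from its first bytes, the way `file` would."""
    if len(header) < 8:
        return "unknown"
    if header[:4] == b"\x7fELF":
        # e_ident[EI_CLASS] at offset 4: 1 = ELFCLASS32, 2 = ELFCLASS64.
        return {1: "ELF 32-bit", 2: "ELF 64-bit"}.get(header[4], "ELF (unknown class)")
    (magic_be,) = struct.unpack(">I", header[:4])
    if magic_be == 0xCAFEBABE:
        # Fat header is always big-endian: magic, nfat_arch, then nfat_arch
        # fat_arch records of five uint32s each, cputype first.
        # Java class files share this magic; they are not distinguished here.
        (nfat,) = struct.unpack(">I", header[4:8])
        arches = []
        for i in range(nfat):
            start = 8 + 20 * i
            if start + 4 > len(header):
                break  # buffer too short to hold every record
            (cputype,) = struct.unpack(">I", header[start:start + 4])
            arches.append(CPU_NAMES.get(cputype, hex(cputype)))
        return "Mach-O fat: " + ", ".join(arches)
    # Thin Mach-O magic is stored in the target's byte order, so try both.
    (magic_le,) = struct.unpack("<I", header[:4])
    if 0xFEEDFACE in (magic_be, magic_le):
        return "Mach-O 32-bit"
    if 0xFEEDFACF in (magic_be, magic_le):
        return "Mach-O 64-bit"
    return "unknown"


if __name__ == "__main__":
    # Usage: python3 binkind.py AMD64_DARWIN/cm3cg
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path + ":", describe(f.read(4096)))
```

Reading 4096 bytes is enough for the fat header of any realistic universal binary; a thin file only needs the first eight bytes.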
 
I'm pretty sure cm3cg is always 32bit "these days".
I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in the same way.
This was a nice thing to find, that the problem is portable to multiple/all 64 bit hosts.
 
I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same problem.
 
Anyway, I should really get to debugging this soon.
 
It's a bit odd because gcc itself doesn't have this bug and I reviewed a lot of the code and it was ok. I'm just going to have to step through it in parallel on 32bit and 64bit hosts and find where they diverge. A LOT was identical, like the files output by cm3 into cm3cg were identical.
I was close a few months ago but sloughed off.
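Finding where the 32-bit-hosted and 64-bit-hosted runs diverge can start before the debugger. Assuming both hosts compile the same input for the same target, the first differing byte of cm3cg's output narrows down what to step through. A minimal Python 3 sketch; `first_difference`, `firstdiff.py` and the file names are hypothetical, not part of cm3.

```python
import sys


def first_difference(a: bytes, b: bytes):
    """Return (byte_offset, line_number) of the first mismatch, or None if identical.

    line_number is 1-based and counted in `a`.
    """
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            return i, a.count(b"\n", 0, i) + 1
    if len(a) != len(b):
        # One file is a prefix of the other; they diverge where the shorter ends.
        return common, a.count(b"\n", 0, common) + 1
    return None


if __name__ == "__main__":
    # Usage: python3 firstdiff.py out-from-32bit-host.s out-from-64bit-host.s
    if len(sys.argv) == 3:
        with open(sys.argv[1], "rb") as f32, open(sys.argv[2], "rb") as f64:
            result = first_difference(f32.read(), f64.read())
        if result is None:
            print("identical")
        else:
            print("first difference at byte %d (line %d)" % result)
```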
 
 - Jay

> From: hosking at cs.purdue.edu
> To: jay.krell at cornell.edu
> Date: Tue, 30 Sep 2008 10:16:41 +0100
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] AMD-64 binaries?
>
> 64-bit hosted tools? Do you mean only for Linux? I don't quite
> understand what you are saying.
>
> On Sep 30, 2008, at 9:36 AM, Jay wrote:
>
> >
> > I'm getting back to this now.
> > I didn't realize it till this weekend, but that archive is
> > "relatively incompatible".
> > In particular it has 32bit hosted tools, and won't run on Debian
> > 4.0r4 / AMD64.
> > Something about glibc 2.4, when all I see on my system is 2.3.
> > I'll see what I can do.
> > Probably just rebuild cm3cg.
> > I think it was built on Fedora, but could have been Ubuntu or
> > OpenSuse.
> > Probably just that Debian stable lags the others.
> >
> > The main problem to debug is why 64bit hosted tools "never" work.
> > (Right?)
> >
> >
> > Stay tuned for a bunch more ports "soon", I've got a bunch more
> > hardware,
> > that runs Linux and others (Solaris, AIX, Irix).. :)
> >
> > I'll be able to debug the high dpi gui problems on a friend's laptop
> > soon too.
> > Send me a repro. I expect it is trivial -- like anything with a
> > scrollbar.
> > I can try formsedit, etc.
> >
> >
> > - Jay
> >
> >
> >> Date: Wed, 3 Sep 2008 07:58:04 +0200
> >> From: wagner at elegosoft.com
> >> To: m3devel at elegosoft.com
> >> Subject: Re: [M3devel] AMD-64 binaries?
> >>
> >> Quoting "Rodney M. Bates" :
> >>
> >>> Are there binaries for AMD-64 around that can be used
> >>> to bootstrap a 64-bit Linux compiler?
> >>
> >> Have a look at
> >>
> >> http://www.opencm3.net/uploaded-archives/index.html
> >>
> >> There are some AMD64 archives; I don't know about their status
> >> offhand, though. I think Jay Krell produced them.
> >> AFAIK there is no regular build on this platform yet.
> >>
> >> Olaf
> >> --
> >> Olaf Wagner -- elego Software Solutions GmbH
> >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
> >> 45 86 95
> >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
> >> Berlin
> >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
> >> DE163214194
> >>

From jay.krell at cornell.edu  Wed Oct  1 09:02:29 2008
From: jay.krell at cornell.edu (Jay)
Date: Wed, 1 Oct 2008 07:02:29 +0000
Subject: [M3devel] m3cc build fails on older MacOS X
In-Reply-To: <5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com>
	<COL101-W243B2B91162A39B280C4AFE6430@phx.gbl>
	<CEDFF837-1CFA-4C43-B287-D480AE19B889@cs.purdue.edu> 
	<5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
Message-ID: <COL101-W62EB6B61A3107DDC347580E6420@phx.gbl>


well, I agree and disagree.

"Almost everyone" only cares about C++, C#, Windows, and a little bit of Linux and Java.
"Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc.

Supporting 10.2 and 10.3 "ought not" be so difficult, but ok.

I wiped out the install and won't likely come back to it until
a bunch of other things are done.
e.g.:
 debug 64 bit hosted cm3cg 
 move PPC_LINUX to pthreads 
 high dpi 
 bring up or backup a bunch of targets I have hardware for,
  and some others I don't have yet.

Adding back support for NT4/Win9x is probably not hard, though,
 as with gcc on Mac, the current Microsoft tools no longer
 target them.

It all gets easier with virtualization..
(Which is easiest on x86/amd64.)

 - Jay


> From: darko at darko.org
> To: hosking at cs.purdue.edu
> Date: Tue, 30 Sep 2008 11:50:43 +0200
> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
> Subject: Re: [M3devel] m3cc build fails on older MacOS X
>
> I think supporting the latest version is enough work. I don't see the
> point of supporting older releases. Also, this seems to be relevant to
> development on that version of the system. Anyone who wants to build
> can upgrade.
>
>
> On 30/09/2008, at 11:15 AM, Tony Hosking wrote:
>
>> Does anyone really care about 10.3 now? As I recall, it had some
>> pretty broken assumptions.
>>
>> On Sep 30, 2008, at 9:25 AM, Jay wrote:
>>
>>>
>>> I have a machine running 10.3 now.
>>>
>>> gcc-4.3.2 (the current release) won't (toplevel) configure on
>>> MacOSX 10.3 apparently because its assembler doesn't support
>>> ".machine".
>>> Current "cctools" won't compile on 10.3 without patches or other
>>> updates, due to mucking with ppc64 stuff, though that is easy to fix.
>>>
>>> A simple wrapper around as for use on 10.3 that strips the .machine
>>> directive is probably reasonable, or a patch to gcc to just not
>>> emit it for Darwin, except maybe for non-ppc, or subject to a switch.
>>>
>>> Other than support for more architectures, I never found any of the
>>> updates beyond 10.2 very interesting.
>>> Though current Firefox and Safari also won't run on 10.3.
>>>
>>> IF I get this working, maybe I'll bring 10.2 back up also..
>>>
>>> - Jay
>>>
>>> ________________________________
>>>
>>> From: jayk123 at hotmail.com
>>> To: wagner at elegosoft.com; m3devel at elegosoft.com
>>> Subject: RE: [M3devel] m3cc build fails on older MacOS X
>>> Date: Tue, 6 May 2008 10:49:11 +0000
>>>
>>>
>>>
>>>
>>> I don't know what these Darwin versions are.
>>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5?
>>> I used to run 10.2 and could perhaps bring it back (though I'd hate
>>> to lose my PPC_LINUX install.. :( )
>>>
>>>> make[2]: Nothing to be done for `all'.
>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>> `patsubst'. Stop.
>>>
>>> Hopefully that's enough context though.
>>>
>>> The rest is a cascade.
>>> What happens if you remove all my m3makefile weirdness (which works
>>> everywhere else..) and just configure and make?
>>>
>>> Can I ssh into this?
>>>
>>> - Jay
>>>
>>>
>>>
>>> ________________________________
>>>
>>>
>>>> Date: Tue, 6 May 2008 07:57:54 +0200
>>>> From: wagner at elegosoft.com
>>>> To: m3devel at elegosoft.com
>>>> Subject: [M3devel] m3cc build fails on older MacOS X
>>>>
>>>> On % uname -a
>>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30
>>>> 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power
>>>> Macintosh powerpc:
>>>>
>>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o
>>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o
>>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./fnmatch.o
>>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./getruntime.o
>>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o
>>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./obstack.o
>>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o
>>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o
>>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o
>>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o
>>>> ./xstrndup.o> required-list
>>>> make[2]: Nothing to be done for `all'.
>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>> `patsubst'. Stop.
>>>> make: *** [all-libcpp] Error 2
>>>> /bin/sh: line 1: cd: gcc: No such file or directory
>>>> make: *** No rule to make target `s-modes'. Stop.
>>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314: quake
>>>> runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2
>>>>
>>>> --procedure-- -line- -file---
>>>> cp_if --
>>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>> 9
>>>> /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args
>>>>
>>>> Fatal Error: package build failed
>>>> ==> m3-sys/m3cc done
>>>>
>>>> Any ideas?
>>>>
>>>> Olaf
>>>> --
>>>> Olaf Wagner -- elego Software Solutions GmbH
>>>> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
>>>> 45 86 95
>>>> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
>>>> Berlin
>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
>>>> DE163214194
>>>>
>>>
>>
>
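The wrapper proposed in the quoted message above for 10.3 (strip the `.machine` directive, then run the real assembler) could be sketched as follows. Assumptions, all hypothetical: the original `as` has been moved to the path in `REAL_AS`, only `.s` files named on the command line are cleaned (input piped on stdin reaches the real assembler unchanged), and a Python 3 interpreter is available. A stock 10.3 system does not ship Python 3, so treat this as an outline to port to `sh`/`sed` rather than a drop-in tool.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical path: wherever the system assembler was moved so this wrapper can take its place.
REAL_AS = "/usr/bin/as.orig"


def is_machine_directive(line: str) -> bool:
    """True for lines whose first token is the `.machine` directive."""
    fields = line.split()
    return bool(fields) and fields[0] == ".machine"


def strip_machine(text: str) -> str:
    """Remove every `.machine` line, leaving the rest of the assembly untouched."""
    return "".join(
        line for line in text.splitlines(keepends=True)
        if not is_machine_directive(line)
    )


def main(argv):
    args, temps = [], []
    for arg in argv:
        # Only .s files named on the command line are rewritten into temp copies.
        if arg.endswith(".s") and os.path.isfile(arg):
            with open(arg) as src:
                cleaned = strip_machine(src.read())
            fd, tmp = tempfile.mkstemp(suffix=".s")
            with os.fdopen(fd, "w") as dst:
                dst.write(cleaned)
            temps.append(tmp)
            args.append(tmp)
        else:
            args.append(arg)
    try:
        return subprocess.call([REAL_AS] + args)
    finally:
        for tmp in temps:
            os.remove(tmp)


# With no arguments this sketch does nothing (the real `as` would read stdin).
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

One side effect of copying to a temporary file: assembler diagnostics name the temporary file rather than the original `.s`.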


From darko at darko.org  Wed Oct  1 09:10:35 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 09:10:35 +0200
Subject: [M3devel] m3cc build fails on older MacOS X
In-Reply-To: <COL101-W62EB6B61A3107DDC347580E6420@phx.gbl>
References: <20080506075754.o24j7xhx4wgokwwo@mail.elegosoft.com>
	<COL101-W243B2B91162A39B280C4AFE6430@phx.gbl>
	<CEDFF837-1CFA-4C43-B287-D480AE19B889@cs.purdue.edu>
	<5302F72A-11E4-4EC0-BD6C-53816834C1A6@darko.org>
	<COL101-W62EB6B61A3107DDC347580E6420@phx.gbl>
Message-ID: <973F196C-4B4A-4526-878C-93942E48E72A@darko.org>

Why bother with it if no one uses it and no-one is going to use it?  
Supporting M3 on Macs is good because people will use it into the  
future. People aren't moving back to 10.3. I wouldn't bother with it  
at all.

On 01/10/2008, at 9:02 AM, Jay wrote:

>
> well, I agree and disagree.
>
> "Almost everyone" only cares about C++, C#, Windows, and a little  
> bit of Linux and Java.
> "Almost nobody" cares about Modula-3, Mac, PowerPC, Unix, Linux, etc.
>
> Supporting 10.2 and 10.3 "ought not" be so difficult, but ok.
>
> I wiped out the install and won't likely come back to it until
> a bunch of other things are done.
> e.g.:
> debug 64 bit hosted cm3cg
> move PPC_LINUX to pthreads
> high dpi
> bring up or backup a bunch of targets I have hardware for,
>  and some others I don't have yet.
>
> Adding back support for NT4/Win9x is probably not hard, though,
> as with gcc on Mac, the current Microsoft tools no longer
> target them.
>
> It all gets easier with virtualization..
> (Which is easiest on x86/amd64.)
>
> - Jay
>
>
>
>> From: darko at darko.org
>> To: hosking at cs.purdue.edu
>> Date: Tue, 30 Sep 2008 11:50:43 +0200
>> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
>> Subject: Re: [M3devel] m3cc build fails on older MacOS X
>>
>> I think supporting the latest version is enough work. I don't see the
>> point of supporting older releases. Also, this seems to be relevant  
>> to
>> development on that version of the system. Anyone who wants to build
>> can upgrade.
>>
>>
>> On 30/09/2008, at 11:15 AM, Tony Hosking wrote:
>>
>>> Does anyone really care about 10.3 now? As I recall, it had some
>>> pretty broken assumptions.
>>>
>>> On Sep 30, 2008, at 9:25 AM, Jay wrote:
>>>
>>>>
>>>> I have a machine running 10.3 now.
>>>>
>>>> gcc-4.3.2 (the current release) won't (toplevel) configure on
>>>> MacOSX 10.3 apparently because its assembler doesn't support
>>>> ".machine".
>>>> Current "cctools" won't compile on 10.3 without patches or other
>>>> updates, due to mucking with ppc64 stuff, though that is easy to  
>>>> fix.
>>>>
>>>> A simple wrapper around as for use on 10.3 that strips the .machine
>>>> directive is probably reasonable, or a patch to gcc to just not
>>>> emit it for Darwin, except maybe for non-ppc, or subject to a  
>>>> switch.
>>>>
>>>> Other than support for more architectures, I never found any of the
>>>> updates beyond 10.2 very interesting.
>>>> Though current Firefox and Safari also won't run on 10.3.
>>>>
>>>> IF I get this working, maybe I'll bring 10.2 back up also..
>>>>
>>>> - Jay
>>>>
>>>> ________________________________
>>>>
>>>> From: jayk123 at hotmail.com
>>>> To: wagner at elegosoft.com; m3devel at elegosoft.com
>>>> Subject: RE: [M3devel] m3cc build fails on older MacOS X
>>>> Date: Tue, 6 May 2008 10:49:11 +0000
>>>>
>>>>
>>>>
>>>>
>>>> I don't know what these Darwin versions are.
>>>> Mac OSX 10.0? 10.1? 10.2? 10.3? 10.4? 10.5?
>>>> I used to run 10.2 and could perhaps bring it back (though I'd hate
>>>> to lose my PPC_LINUX install.. :( )
>>>>
>>>>> make[2]: Nothing to be done for `all'.
>>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>>> `patsubst'. Stop.
>>>>
>>>> Hopefully that's enough context though.
>>>>
>>>> The rest is a cascade.
>>>> What happens if you remove all my m3makefile weirdness (which works
>>>> everywhere else..) and just configure and make?
>>>>
>>>> Can I ssh into this?
>>>>
>>>> - Jay
>>>>
>>>>
>>>>
>>>> ________________________________
>>>>
>>>>
>>>>> Date: Tue, 6 May 2008 07:57:54 +0200
>>>>> From: wagner at elegosoft.com
>>>>> To: m3devel at elegosoft.com
>>>>> Subject: [M3devel] m3cc build fails on older MacOS X
>>>>>
>>>>> On % uname -a
>>>>> Darwin apple.local 7.9.0 Darwin Kernel Version 7.9.0: Wed Mar 30
>>>>> 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC Power
>>>>> Macintosh powerpc:
>>>>>
>>>>> echo ./regex.o ./cplus-dem.o ./cp-demangle.o ./md5.o ./alloca.o
>>>>> ./argv.o ./choose-temp.o ./concat.o ./cp-demint.o ./dyn-string.o
>>>>> ./fdmatch.o ./fibheap.o ./filename_cmp.o ./floatformat.o ./fnmatch.o
>>>>> ./fopen_unlocked.o ./getopt.o ./getopt1.o ./getpwd.o ./getruntime.o
>>>>> ./hashtab.o ./hex.o ./lbasename.o ./lrealpath.o
>>>>> ./make-relative-prefix.o ./make-temp-file.o ./objalloc.o ./obstack.o
>>>>> ./partition.o ./pexecute.o ./physmem.o ./pex-common.o ./pex-one.o
>>>>> ./pex-unix.o ./safe-ctype.o ./sort.o ./spaces.o ./splay-tree.o
>>>>> ./strerror.o ./strsignal.o ./unlink-if-ordinary.o ./xatexit.o
>>>>> ./xexit.o ./xmalloc.o ./xmemdup.o ./xstrdup.o ./xstrerror.o
>>>>> ./xstrndup.o> required-list
>>>>> make[2]: Nothing to be done for `all'.
>>>>> Makefile:191: *** Insufficient number of arguments (2) to function
>>>>> `patsubst'. Stop.
>>>>> make: *** [all-libcpp] Error 2
>>>>> /bin/sh: line 1: cd: gcc: No such file or directory
>>>>> make: *** No rule to make target `s-modes'. Stop.
>>>>> "/Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile", line 314:  
>>>>> quake
>>>>> runtime error: unable to copy "./gcc/m3cgc1" to "./cm3cg": errno=2
>>>>>
>>>>> --procedure-- -line- -file---
>>>>> cp_if --
>>>>> postcp 314 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>>> include_dir 360 /Users/wagner/work/cm3/m3-sys/m3cc/src/m3makefile
>>>>> 9
>>>>> /Users/wagner/work/cm3/m3-sys/m3cc/PPC_DARWIN/m3make.args
>>>>>
>>>>> Fatal Error: package build failed
>>>>> ==> m3-sys/m3cc done
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Olaf
>>>>> --
>>>>> Olaf Wagner -- elego Software Solutions GmbH
>>>>> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
>>>>> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
>>>>> 45 86 95
>>>>> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
>>>>> Berlin
>>>>> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
>>>>> DE163214194
>>>>>
>>>>
>>>
>>


From darko at darko.org  Wed Oct  1 12:03:15 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 12:03:15 +0200
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: <DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
	<DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
Message-ID: <B971C9C9-251C-4F79-A12F-622F47883781@darko.org>

I've extended one of the modules with a function that formats any  
allocated value for printing. If you're interested I can clean them up  
a little and post them.


On 28/09/2008, at 8:01 AM, Darko wrote:

> As far as I know, yes, they're not in the binary. I'd love to be  
> proven wrong though, or fix it so they did. I have a module that  
> reads the .M3WEB file and maps it to types and a module that will  
> read and write any field within a type safely using a numeric index.  
> Neither is perfect. You can integrate the two to get what you want  
> but I seem to remember having some problems mapping type ids (UIDs?)  
> to typecodes at runtime.
>
>
> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>
>> Right, I am aware of those interfaces.. just wondering what was
>> out there.  Do I really need to look at .M3WEB?  I thought
>> that m3gdb could figure out things without anything outside
>> of the binary...
>>
>> I'm looking for essentially what m3gdb offers, say prints
>> at minimum the name of the type (this I recall is trivial with
>> some of the RT* interfaces) but hopefully also with field names
>> and values, but doesn't expand references recursively.. something
>> like that?
>>
>>   Mika
>>
>> Darko writes:
>>> You can use RTType to read the fields and values within a type. If
>>> you
>>> also want the type and field names you can interpret the .M3WEB  
>>> file.
>>> I have a couple of modules that do something like that but they are
>>> not what you would call finished. What level of detail are you  
>>> after?
>>>
>>>
>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>
>>>> Hello Modula-3 people,
>>>>
>>>> I am working on writing an interpreter that I'd like to embed in
>>>> various Modula-3 programs.  It so happens that this interpreter
>>>> might from time to time be manipulating arbitrary M3 REFs, and just
>>>> from the point of view of providing information to a human user,
>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>> have any code that accomplishes this, at least partly?  I'm  
>>>> thinking
>>>> that since m3gdb can do it, the information must all be in the
>>>> binary---somehow.  (Even enumeration names, right?)  And since the
>>>> pickler can pickle things... hmm.
>>>>
>>>> I would greatly appreciate any guidance that's out there...
>>>>
>>>>  Best regards,
>>>>     Mika Nystrom
>


From hosking at cs.purdue.edu  Wed Oct  1 11:59:23 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 10:59:23 +0100
Subject: [M3devel] AMD-64 binaries?
In-Reply-To: <COL101-W281BD9E78E32E04348F400E6420@phx.gbl>
References: <48BDF24B.900@wichita.edu>
	<20080903075804.zhep2ichmow00scg@mail.elegosoft.com>
	<COL101-W839FDBE447569C4D9BACC6E6430@phx.gbl>
	<30A598AF-F712-4284-A776-6C14C1B69606@cs.purdue.edu>
	<COL101-W281BD9E78E32E04348F400E6420@phx.gbl>
Message-ID: <26766FFA-C3B6-45D2-8156-80FD14922882@cs.purdue.edu>

I can definitely vouch for ALPHA_OSF having worked as recently as two  
years ago, but without the pthreads native threading system.  That  
port should have been easy enough I suspect.

On Oct 1, 2008, at 7:41 AM, Jay wrote:

> No -- you would know best about AMD64_DARWIN.
> I'm sure ALPHA_OSF used to work, but it's been so long, I don't  
> think it counts.
>
> I'm being lazy.
>
> file AMD64_DARWIN/cm3cg
>  => fat binary? I doubt it.
>  => with ppc, i386, amd64? (doubt it)
>  => or just ppc, i386?  (doubt it)
>  => or just i386? This is what I "suspect".
>  => or just AMD64. This would be somewhat interesting.

I believe that is how I configured it.

> I'm pretty sure cm3cg is always 32bit "these days".

Nope, cm3cg on AMD64_DARWIN is 64-bit.

> I've tried SPARC64_OPENBSD and AMD64_LINUX and they both failed in  
> the same way.
> This was a nice thing to find, that the problem is portable to  
> multiple/all 64 bit hosts.
>
> I'm ASSUMING but trying to confirm that AMD64_DARWIN has the same  
> problem.

Don't think so.

> Anyway, I should really get to debugging this soon.
>
> It's a bit odd because gcc itself doesn't have this bug and I  
> reviewed a lot of the code and it was ok. I'm just going to have to  
> step through it in parallel on 32bit and 64bit hosts and find where  
> they diverge. A LOT was identical, like the files output by cm3 into  
> cm3cg were identical.

Yes, the intermediate code should be identical.  Any such problems  
would be with cm3cg.

> I was close a few months ago but sloughed off.

Good luck.

>
>
>  - Jay
>
>
> > From: hosking at cs.purdue.edu
> > To: jay.krell at cornell.edu
> > Date: Tue, 30 Sep 2008 10:16:41 +0100
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] AMD-64 binaries?
> >
> > 64-bit hosted tools? Do you mean only for Linux? I don't quite
> > understand what you are saying.
> >
> > On Sep 30, 2008, at 9:36 AM, Jay wrote:
> >
> > >
> > > I'm getting back to this now.
> > > I didn't realize it till this weekend, but that archive is
> > > "relatively incompatible".
> > > In particular it has 32bit hosted tools, and won't run on Debian
> > > 4.0r4 / AMD64.
> > > Something about glibc 2.4, when all I see on my system is 2.3.
> > > I'll see what I can do.
> > > Probably just rebuild cm3cg.
> > > I think it was built on Fedora, but could have been Ubuntu or
> > > OpenSuse.
> > > Probably just that Debian stable lags the others.
> > >
> > > The main problem to debug is why 64bit hosted tools "never" work.
> > > (Right?)
> > >
> > >
> > > Stay tuned for a bunch more ports "soon", I've got a bunch more
> > > hardware,
> > > that runs Linux and others (Solaris, AIX, Irix).. :)
> > >
> > > I'll be able to debug the high dpi gui problems on a friend's laptop
> > > soon too.
> > > Send me a repro. I expect it is trivial -- like anything with a
> > > scrollbar.
> > > I can try formsedit, etc.
> > >
> > >
> > > - Jay
> > >
> > >
> > >> Date: Wed, 3 Sep 2008 07:58:04 +0200
> > >> From: wagner at elegosoft.com
> > >> To: m3devel at elegosoft.com
> > >> Subject: Re: [M3devel] AMD-64 binaries?
> > >>
> > >> Quoting "Rodney M. Bates" :
> > >>
> > >>> Are there binaries for AMD-64 around that can be used
> > >>> to bootstrap a 64-bit Linux compiler?
> > >>
> > >> Have a look at
> > >>
> > >> http://www.opencm3.net/uploaded-archives/index.html
> > >>
> > >> There are some AMD64 archives; I don't know about their status
> > >> offhand, though. I think Jay Krell produced them.
> > >> AFAIK there is no regular build on this platform yet.
> > >>
> > >> Olaf
> > >> --
> > >> Olaf Wagner -- elego Software Solutions GmbH
> > >> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> > >> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23
> > >> 45 86 95
> > >> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz:
> > >> Berlin
> > >> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr:
> > >> DE163214194
> > >>
> >
>


From hosking at cs.purdue.edu  Wed Oct  1 12:07:00 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Wed, 1 Oct 2008 11:07:00 +0100
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: <B971C9C9-251C-4F79-A12F-622F47883781@darko.org>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
	<DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
	<B971C9C9-251C-4F79-A12F-622F47883781@darko.org>
Message-ID: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>

m3gdb makes use of stabs debug information spat out by the backend.   
They are only in the binary if compiled -g.  There are other ways to  
get what you are after, as Darko has observed.

On Oct 1, 2008, at 11:03 AM, Darko wrote:

> I've extended one of the modules with a function that formats any  
> allocated value for printing. If you're interested I can clean them  
> up a little and post them.
>
>
> On 28/09/2008, at 8:01 AM, Darko wrote:
>
>> As far as I know, yes, they're not in the binary. I'd love to be  
>> proven wrong though, or fix it so they did. I have a module that  
>> reads the .M3WEB file and maps it to types and a module that will  
>> read and write any field within a type safely using a numeric  
>> index. Neither is perfect. You can integrate the two to get what  
>> you want but I seem to remember having some problems mapping type  
>> ids (UIDs?) to typecodes at runtime.
>>
>>
>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>
>>> Right, I am aware of those interfaces.. just wondering what was
>>> out there.  Do I really need to look at .M3WEB?  I thought
>>> that m3gdb could figure out things without anything outside
>>> of the binary...
>>>
>>> I'm looking for essentially what m3gdb offers, say prints
>>> at minimum the name of the type (this I recall is trivial with
>>> some of the RT* interfaces) but hopefully also with field names
>>> and values, but doesn't expand references recursively.. something
>>> like that?
>>>
>>>  Mika
>>>
>>> Darko writes:
>>>> You can use RTType to read the fields and values within a type.
>>>> If you
>>>> also want the type and field names you can interpret the .M3WEB  
>>>> file.
>>>> I have a couple of modules that do something like that but they are
>>>> not what you would call finished. What level of detail are you  
>>>> after?
>>>>
>>>>
>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> I am working on writing an interpreter that I'd like to embed in
>>>>> various Modula-3 programs.  It so happens that this interpreter
>>>>> might from time to time be manipulating arbitrary M3 REFs, and  
>>>>> just
>>>>> from the point of view of providing information to a human user,
>>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>>> have any code that accomplishes this, at least partly?  I'm  
>>>>> thinking
>>>>> that since m3gdb can do it, the information must all be in the
>>>>> binary---somehow.  (Even enumeration names, right?)  And since the
>>>>> pickler can pickle things... hmm.
>>>>>
>>>>> I would greatly appreciate any guidance that's out there...
>>>>>
>>>>> Best regards,
>>>>>    Mika Nystrom
>>


From darko at darko.org  Wed Oct  1 12:35:09 2008
From: darko at darko.org (Darko)
Date: Wed, 1 Oct 2008 12:35:09 +0200
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: <2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>
References: <200809280549.m8S5nwbx069465@camembert.async.caltech.edu>
	<DA74F6CA-4488-42C5-BED2-CFE8162469F0@darko.org>
	<B971C9C9-251C-4F79-A12F-622F47883781@darko.org>
	<2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu>
Message-ID: <B26C3B35-ADAA-4289-8006-F32D5CCCA407@darko.org>

Here's some info on the stabs format: http://www.cs.utah.edu/dept/old/texinfo/gdb/stabs_toc.html


On 01/10/2008, at 12:07 PM, Tony Hosking wrote:

> m3gdb makes use of stabs debug information spat out by the backend.   
> They are only in the binary if compiled -g.  There are other ways to  
> get what you are after, as Darko has observed.
>
> On Oct 1, 2008, at 11:03 AM, Darko wrote:
>
>> I've extended one of the modules with a function that formats any  
>> allocated value for printing. If you're interested I can clean them  
>> up a little and post them.
>>
>>
>> On 28/09/2008, at 8:01 AM, Darko wrote:
>>
>>> As far as I know, yes, they're not in the binary. I'd love to be  
>>> proven wrong though, or fix it so they did. I have a module that  
>>> reads the .M3WEB file and maps it to types and a module that will  
>>> read and write any field within a type safely using a numeric  
>>> index. Neither is perfect. You can integrate the two to get what  
>>> you want but I seem to remember having some problems mapping type  
>>> ids (UIDs?) to typecodes at runtime.
>>>
>>>
>>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>>
>>>> Right, I am aware of those interfaces.. just wondering what was
>>>> out there.  Do I really need to look at .M3WEB?  I thought
>>>> that m3gdb could figure out things without anything outside
>>>> of the binary...
>>>>
>>>> I'm looking for essentially what m3gdb offers, say prints
>>>> at minimum the name of the type (this I recall is trivial with
>>>> some of the RT* interfaces) but hopefully also with field names
>>>> and values, but doesn't expand references recursively.. something
>>>> like that?
>>>>
>>>> Mika
>>>>
>>>> Darko writes:
>>>>> You can use RTType to read the fields and values within a type.
>>>>> If you
>>>>> also want the type and field names you can interpret the .M3WEB  
>>>>> file.
>>>>> I have a couple of modules that do something like that but they  
>>>>> are
>>>>> not what you would call finished. What level of detail are you  
>>>>> after?
>>>>>
>>>>>
>>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>>
>>>>>> Hello Modula-3 people,
>>>>>>
>>>>>> I am working on writing an interpreter that I'd like to embed
>>>>>> in
>>>>>> various Modula-3 programs.  It so happens that this interpreter
>>>>>> might from time to time be manipulating arbitrary M3 REFs, and  
>>>>>> just
>>>>>> from the point of view of providing information to a human user,
>>>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>>>> have any code that accomplishes this, at least partly?  I'm  
>>>>>> thinking
>>>>>> that since m3gdb can do it, the information must all be in the
>>>>>> binary---somehow.  (Even enumeration names, right?)  And since  
>>>>>> the
>>>>>> pickler can pickle things... hmm.
>>>>>>
>>>>>> I would greatly appreciate any guidance that's out there...
>>>>>>
>>>>>> Best regards,
>>>>>>   Mika Nystrom
>>>
>


From mika at async.caltech.edu  Wed Oct  1 20:09:58 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Wed, 01 Oct 2008 11:09:58 -0700
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: Your message of "Wed, 01 Oct 2008 12:03:15 +0200."
	<B971C9C9-251C-4F79-A12F-622F47883781@darko.org> 
Message-ID: <200810011809.m91I9wxY087739@camembert.async.caltech.edu>

Oh, I'd love to give it a try!

I'm a little surprised no one has chimed in on the question of
whether you really need .M3WEB... I could swear I can get good
symbolic debugging with m3gdb on just a binary...

     Mika

Darko writes:
>I've extended one of the modules with a function that formats any  
>allocated value for printing. If you're interested I can clean them up  
>a little and post them.
>
>
>On 28/09/2008, at 8:01 AM, Darko wrote:
>
>> As far as I know, yes, they're not in the binary. I'd love to be  
>> proven wrong though, or fix it so they did. I have a module that  
>> reads the .M3WEB file and maps it to types and a module that will  
>> read and write any field within a type safely using a numeric index.  
>> Neither is perfect. You can integrate the two to get what you want  
>> but I seem to remember having some problems mapping type ids (UIDs?)  
>> to typecodes at runtime.
>>
>>
>> On 28/09/2008, at 7:49 AM, Mika Nystrom wrote:
>>
>>> Right, I am aware of those interfaces.. just wondering what was
>>> out there.  Do I really need to look at .M3WEB?  I thought
>>> that m3gdb could figure out things without anything outside
>>> of the binary...
>>>
>>> I'm looking for essentially what m3gdb offers: say, something that prints
>>> at minimum the name of the type (this I recall is trivial with
>>> some of the RT* interfaces), but hopefully also the field names
>>> and values, without expanding references recursively... something
>>> like that?
>>>
>>>   Mika
>>>
>>> Darko writes:
>>>> You can use RTTipe to read the fields and values within a type. If you
>>>> also want the type and field names you can interpret the .M3WEB file.
>>>> I have a couple of modules that do something like that but they are
>>>> not what you would call finished. What level of detail are you after?
>>>>
>>>>
>>>> On 28/09/2008, at 6:45 AM, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> I am working on writing an interpreter that I'd like to embed in
>>>>> various Modula-3 programs.  It so happens that this interpreter
>>>>> might from time to time be manipulating arbitrary M3 REFs, and just
>>>>> from the point of view of providing information to a human user,
>>>>> it might be nice to be able to pretty-print these.  Does anyone
>>>>> have any code that accomplishes this, at least partly?  I'm thinking
>>>>> that since m3gdb can do it, the information must all be in the
>>>>> binary---somehow.  (Even enumeration names, right?)  And since the
>>>>> pickler can pickle things... hmm.
>>>>>
>>>>> I would greatly appreciate any guidance that's out there...
>>>>>
>>>>>  Best regards,
>>>>>     Mika Nystrom
>>
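A minimal Python sketch of the one-level pretty-printer discussed above: the type name, then field names and values, with references shown only by their referent's type and never expanded. `TypeDesc`, `HeapObject` and the `TYPES` table are hypothetical stand-ins for the layout information RTTipe provides and the names found in .M3WEB (or in stabs, if compiled -g); this is not the actual API of either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeDesc:
    """Hypothetical runtime type descriptor: type name plus (name, kind) per field."""
    name: str
    fields: tuple  # ((field_name, kind), ...); kind is "int", "bool", "text" or "ref"


@dataclass
class HeapObject:
    """Hypothetical stand-in for a traced heap object: typecode plus raw field values."""
    typecode: int
    values: list


# Hypothetical typecode -> descriptor table (the part .M3WEB/RTTipe would supply).
TYPES = {
    1: TypeDesc("Pair", (("first", "ref"), ("rest", "ref"))),
    2: TypeDesc("Counter", (("name", "text"), ("count", "int"), ("active", "bool"))),
}


def type_name(obj):
    desc = TYPES.get(obj.typecode)
    return desc.name if desc else "typecode %d" % obj.typecode


def format_ref(obj):
    """Format obj's type name and fields without following references."""
    if obj is None:
        return "NIL"
    desc = TYPES.get(obj.typecode)
    if desc is None:
        return "<%s>" % type_name(obj)
    parts = []
    for (fname, kind), value in zip(desc.fields, obj.values):
        if kind == "ref":
            # One level only: show the referent's type, never its contents.
            shown = "NIL" if value is None else "<%s>" % type_name(value)
        elif kind == "bool":
            shown = "TRUE" if value else "FALSE"
        elif kind == "text":
            shown = '"%s"' % value
        else:
            shown = str(value)
        parts.append("%s=%s" % (fname, shown))
    return "%s{%s}" % (desc.name, ", ".join(parts))


counter = HeapObject(2, ["hits", 3, True])
pair = HeapObject(1, [counter, None])
print(format_ref(pair))     # Pair{first=<Counter>, rest=NIL}
print(format_ref(counter))  # Counter{name="hits", count=3, active=TRUE}
```

Stopping at one level keeps the output bounded even for cyclic structures; a caller who wants more can call `format_ref` again on a chosen field.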


From mika at async.caltech.edu  Wed Oct  1 20:10:38 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Wed, 01 Oct 2008 11:10:38 -0700
Subject: [M3devel] Pretty-printing REFANYs?
In-Reply-To: Your message of "Wed, 01 Oct 2008 11:07:00 BST."
	<2A7B7ADE-62C4-429D-9A70-671E044195AD@cs.purdue.edu> 
Message-ID: <200810011810.m91IAcDW087832@camembert.async.caltech.edu>

Ok, ignore my previous email :-)

Tony Hosking writes:
>m3gdb makes use of stabs debug information spat out by the backend.   
>They are only in the binary if compiled -g.  There are other ways to  
>get what you are after, as Darko has observed.


From jay.krell at cornell.edu  Sun Oct 12 11:51:03 2008
From: jay.krell at cornell.edu (Jay)
Date: Sun, 12 Oct 2008 09:51:03 +0000
Subject: [M3devel] a bunch of new/old platform names?
Message-ID: <COL101-W614506DC49BC7BC3640D65E6370@phx.gbl>


I plan on soon bringing "back" some old ports -- building current archives -- and bringing up some new ports.

Specifically, I have hardware: RS/6000 (PPC64/AIX), SGI (MIPS), SPARC64, plus the usual x86/AMD64.

Two of these platforms existed before.

In particular, "MIPS_IRIX" is "IRIX5".
  Reuse IRIX5, or introduce MIPS_IRIX?

PPC_AIX is IBMR2 or such.
  Same question.

Also, must versions really be in platform names?
I'm loath to add a third dimension to the matrix.
I did just notice that 64-bit FreeBSD 7.0 is ABI-incompatible with 64-bit FreeBSD 6.3, which is lame.

SGI claims ABI compatibility across all the 6.5 releases, which is all there will be now.
IBM claims 32-bit ABI compatibility across AIX 4.x - 6.x and 64-bit ABI compatibility across 5.x and 6.x, but not with 64-bit 4.x.
(Microsoft has always been good here, but "behavioral" compatibility is the actually tricky issue.)

And what do folks think about putting "32" in new 32-bit platform names?

I'm considering the following:
  MIPS32_{IRIX,LINUX,OPENBSD,NETBSD}
  MIPS64_IRIX (6.5)
  SPARC{32,64}_{LINUX,*BSD} (probably no SPARC32_*BSD actually; SPARC32_LINUX is already in, but not building regularly)
  {SPARC64,I386,AMD64}_SOLARIS
  PPC{32,64}_AIX
    (PPC64_LINUX is blocked: Linux has problems booting on the hardware, and I have no Mac G5 yet.)
  AMD64_*BSD

Also, maybe some of the code should be restructured to separate processor from OS?
That might mostly come down to pointer size.

Any interest in "x86" instead of "I386"?
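A minimal Python sketch of the processor/OS split suggested above, assuming PROCESSOR_OS names like the ones listed (for example MIPS64_IRIX or PPC32_AIX). `POINTER_BITS` and `split_platform` are hypothetical illustrations, not CM3's configuration machinery; the point is only that once the processor is its own name component, the pointer size can be looked up from it independently of the OS.

```python
# Hypothetical table of pointer sizes for the processor half of the proposed
# PROCESSOR_OS names; not taken from CM3's configuration files.
POINTER_BITS = {
    "I386": 32, "AMD64": 64,
    "MIPS32": 32, "MIPS64": 64,
    "SPARC32": 32, "SPARC64": 64,
    "PPC32": 32, "PPC64": 64,
}


def split_platform(name):
    """Split e.g. "SPARC64_SOLARIS" into ("SPARC64", "SOLARIS", 64)."""
    processor, sep, os_name = name.partition("_")
    if not sep or not os_name:
        raise ValueError("not a PROCESSOR_OS name: %r" % name)
    if processor not in POINTER_BITS:
        raise ValueError("unknown processor: %r" % processor)
    return processor, os_name, POINTER_BITS[processor]


print(split_platform("MIPS64_IRIX"))  # ('MIPS64', 'IRIX', 64)
print(split_platform("PPC32_AIX"))    # ('PPC32', 'AIX', 32)
```

Legacy names such as IRIX5 carry no processor prefix, so this sketch rejects them; keeping them would need a separate alias table.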

If I make good progress against those 18 (!), I can see about PPC64_DARWIN, HPPA_*, IA64_*, ALPHA_*, ARM_*, which I lack hardware for. PPC_LINUX also should be converted to pthreads imho.
Mostly this is all just a matter of installing the OS and configuring gcc.
 
And, yeah, I have the two m3cgs stepping side by side to find the problem there, and will have use of a high-DPI Windows laptop for that other problem.

And then of course, if the vast majority of platforms are named like that, there might be pressure to bring the rest in line. :) I386_{NT,LINUX,*BSD,CYGWIN,MINGWIN}

 - Jay

From mika at async.caltech.edu  Fri Oct 17 00:32:39 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 15:32:39 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
Message-ID: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>

Hello Modula-3 people,

As I mentioned in an earlier email about printing structures (thanks
Darko), I'm in the midst of coding an interpreter embedded in
Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
JScheme for Java (well it was at first strongly based, but more and
more loosely, if you know what I mean...)

I expected that the performance of the interpreter would be much
better in Modula-3 than in Java, and I have been testing on two
different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
and the other is CM3 on a recent Debian system.  What I am finding
is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
close to ten times as fast on some tasks at this point), but on
Linux/CM3 it is much closer in speed to JScheme than I would like.

When I started, with code that was essentially equivalent to JScheme,
I found that it was a bit slower than JScheme on Linux/CM3 and
possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
spend most of its time in (surprise, surprise!) memory allocation
and garbage collection.  The speedup I have achieved between the
first implementation and now was due to the use of Modula-3 constructs
that are superior to Java's, such as the use of arrays of RECORDs
to make small stacks rather than linked lists.  (I get readable
code with far fewer memory allocations and less GC work.)

Now, since this is an interpreter, I as the implementer have limited
control over how much memory is allocated and freed, and where it is
needed.  However, I can sometimes fall back on C-style memory management,
and I would like to do so safely.  For instance, I have special-cased
the evaluation of Scheme primitives, as follows.

Under the "normal" implementation, a list of things to evaluate is
built up, passed to an evaluation function, and then the GC is left
to sweep up the mess.  The problem is that there are various tricky
routes by which references can escape the evaluator, so you can't
just assume that what you put in is going to be dead right after
an eval and free it.  Instead, I set a flag in the evaluator, which
is TRUE if it is OK to free the list after the eval and FALSE if
it's unclear (in which case the problem is left up to the GC).

For the vast majority of Scheme primitives, one can indeed free the
list right after the eval.  Now of course I am not interested
in unsafe code, so what I do is this:

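TYPE Pair = OBJECT first, rest : REFANY; END;

VAR
  mu := NEW(MUTEX);    (* protects free *)
  free : Pair := NIL;  (* shared free list, linked through rest *)

PROCEDURE GetPair() : Pair =
  (* Pop a recycled Pair off the free list, or allocate a new one. *)
  BEGIN
    LOCK mu DO
      IF free # NIL THEN
        VAR p := free; BEGIN
          free := p.rest;  (* REFANY -> Pair: implicit runtime-checked narrow *)
          p.rest := NIL;   (* don't hand the rest of the free list to the caller *)
          RETURN p
        END
      END
    END;
    RETURN NEW(Pair)
  END GetPair;

PROCEDURE ReturnPair(cons : Pair) =
  (* Push cons onto the free list.  Correct only if nothing else still
     refers to cons (the okToFree case); memory safety is unaffected. *)
  BEGIN
    cons.first := NIL;  (* drop the payload so the GC can reclaim it *)
    LOCK mu DO
      cons.rest := free;
      free := cons
    END
  END ReturnPair;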

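My eval code looks like this:

VAR okToFree : BOOLEAN; BEGIN

   args := GetPair(); ...   (* build the argument list *)
   result := EvalPrimitive(args, (*VAR OUT*) okToFree);

   (* okToFree is TRUE only if no reference to args escaped the primitive *)
   IF okToFree THEN ReturnPair(args) END;
   RETURN result
END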

and this does work well.  In fact, recycling the lists like this *just* for
the evaluation of Scheme primitives speeds up the Linux implementation
by almost 100%.

But it's still ugly, isn't it?  There's a mutex, and a global
variable.  And yes, the time spent messing with the mutex is
noticeable, and I haven't even made the code multi-threaded yet
(and that is coming!)

So I'm thinking, what I really want is a structure that is attached
to my current Thread.T.  I want to be able to access just a single 
pointer (like the free list) but be sure it is unique to my current
thread.  No locking would be necessary if I could do this.
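For comparison, here is a minimal Python sketch of the lock-free, per-thread version of GetPair/ReturnPair described above, using Python's `threading.local` as the per-thread slot. Python's global interpreter lock means it demonstrates only the ownership structure (each thread pushes and pops its own list, so no mutex is needed), not CM3 performance. In Modula-3 the equivalent would need some per-thread storage hook, which is exactly what is being asked for here.

```python
import threading


class Pair:
    """Cons cell mirroring the Modula-3 Pair OBJECT (first, rest)."""
    __slots__ = ("first", "rest")

    def __init__(self, first=None, rest=None):
        self.first = first
        self.rest = rest


_local = threading.local()  # each thread sees its own attributes on this object


def get_pair():
    """Pop a Pair from the calling thread's free list, or allocate one. No lock."""
    head = getattr(_local, "free", None)
    if head is None:
        return Pair()
    _local.free = head.rest
    head.rest = None  # don't leak the rest of the free list to the caller
    return head


def return_pair(p):
    """Push p onto the calling thread's free list.
    Only call this when nothing else can still refer to p (the okToFree case)."""
    p.first = None
    p.rest = getattr(_local, "free", None)
    _local.free = p
```

A Pair freed by a thread other than the one that allocated it simply joins the freeing thread's list, which is still correct.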

Does anyone have an elegant solution that does something like this?
Thread-specific "static" variables?  Just one REFANY would be enough
for a lot of uses...  seems to me this should be a frequently
occurring problem?

     Best regards,
       Mika
    

From hosking at cs.purdue.edu  Fri Oct 17 00:54:51 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Thu, 16 Oct 2008 23:54:51 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>
References: <200810162232.m9GMWdtJ067248@camembert.async.caltech.edu>
Message-ID: <C17F2003-446E-466C-84DC-DA8E23A96726@cs.purdue.edu>

Have you tried running @M3noincremental?


From mika at async.caltech.edu  Fri Oct 17 01:30:01 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 16:30:01 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Thu, 16 Oct 2008 23:54:51 BST."
	<C17F2003-446E-466C-84DC-DA8E23A96726@cs.purdue.edu> 
Message-ID: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>

Hi Tony,

I figured you would chime in!

Yes, @M3noincremental seems to make things consistently a tad slower
(though the difference is very small), on both FreeBSD and Linux.
@M3nogc makes a bigger difference, of course.

Unfortunately I seem to have lost the code that did a lot of memory
allocations.  My tricks (as described in the email---and others!)
have removed most of the troublesome memory allocations, but now
I'm stuck with the mutex instead...

      Mika

Tony Hosking writes:
>Have you tried running @M3noincremental?


From jay.krell at cornell.edu  Fri Oct 17 06:40:28 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 04:40:28 +0000
Subject: [M3devel] M3 programming problem : GC efficiency /
	per-thread	storage areas?
In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
References: Your message of 
	<200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
Message-ID: <COL101-W4964BD437A46A53516DAA3E6320@phx.gbl>


Making this per-thread is a classic, and good, improvement.

You need to worry about what happens with many threads, make sure to clean up when a thread dies, and allow for a free to come in from any thread.

A good way to mitigate all those problems is to use a small fixed-size cache, with an array of mutexes, instead of a truly per-thread one.

If "thread ids" have adequate distribution, just use their lower bits as an array index. If not, keep a global counter and assign its next value to each thread the first time that thread uses the cache.

The cache could also be more than one element.

How do you manage okToFree?

Windows has __declspec(thread), which is an optimized form of TlsGetValue/TlsSetValue, but it doesn't work in dynamically loaded DLLs before Vista, and it isn't per-fiber (a "__declspec(fiber)"), as maybe it should be.
 
 - Jay


From mika at async.caltech.edu  Fri Oct 17 08:32:15 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 23:32:15 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000."
	<COL101-W4964BD437A46A53516DAA3E6320@phx.gbl> 
Message-ID: <200810170632.m9H6WFHd078061@camembert.async.caltech.edu>


Well, I was thinking of something even simpler.  A Thread.T is an
OBJECT.  It's garbage collected just like any other object, is it
not?  

Why can't the thing that makes new threads simply include a single
globally visible field in every Thread.T, of type REFANY?  Call it "data".

Then you can always manipulate Thread.Self().data as you see fit
without any need for locks.  There can be no problem with this as
long as it is always manipulated from within that thread.
Of course this can be trivially encapsulated by not revealing "data"
and indeed always accessing it as Thread.Self().data.

You would not normally access this from any other thread.  It's indeed
only meant to be used in the idiom

  x := Allocate();
  TRY
    DoSomething(x)
  FINALLY
    Free(x)
  END

It's also not really a "Free" but just returning the object to a free
list (there can be no unsafe behavior here).

As a "nicer" interface, one could register routines with a public
interface, asking it to manufacture some kind of thread globals.
For maximum sanity, they would be visible only inside the MODULE that
requested them, but I'm not sure how to accomplish this.  And of
course there's not much point in any of this unless it can be made
efficient; otherwise a mutex plus a true global will work just as well.

What I'm talking about I guess could be done by hacking up Thread.Fork()
to return a subtype of Thread.T, but that won't work for the first
thread.  But with this method you could have arbitrary fields (and
methods) attached to a Thread.T.  How to collect everything you need
is a different story...
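The "nicer interface" idea above (modules register a request and get back a private handle to a per-thread global) can be sketched in Python as follows. `ThreadGlobals`, `register` and `get` are hypothetical names, and `threading.local` stands in for whatever per-thread hook a Modula-3 runtime would provide. Values are created lazily on the first `get`, so the first (main) thread needs no special treatment, unlike a Thread.T subtype produced by Fork.

```python
import threading

_UNSET = object()  # marks a slot whose value has not been created yet


class ThreadGlobals:
    """Hypothetical registry of per-thread globals (not a CM3 interface).
    A module registers a factory once and keeps the returned key private."""

    def __init__(self):
        self._factories = []
        self._lock = threading.Lock()  # guards registration only
        self._local = threading.local()

    def register(self, factory):
        with self._lock:
            self._factories.append(factory)
            return len(self._factories) - 1

    def get(self, key):
        """Return the calling thread's value for key, creating it on first use.
        No lock here: each thread only touches its own values list."""
        values = getattr(self._local, "values", None)
        if values is None:
            values = self._local.values = []
        if len(values) <= key:
            values.extend([_UNSET] * (key + 1 - len(values)))
        if values[key] is _UNSET:
            values[key] = self._factories[key]()
        return values[key]
```

A module would call `register` once during initialization and keep the key in an unexported variable, which gives the "visible only inside the requesting MODULE" property by convention.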

I'm not asking for a new language feature... really was just wondering
if anyone had tried anything like this before, and now am rambling a
bit.
 
     Mika

Jay writes:
>
>Making this per-thread is a fairly classic good improvement.
>
>You need to worry about what happens with many threads, and being sure to cleanup when a thread dies, and allowing for a free to come in from any thread.
>
>A good way to mitigate all those problems is to use a small fixed size cache instead of per-thread. Including an array of mutexes.
>
>If "thread ids" have adequate distribution, just use their lower bits as an array index. If not, have a global counter that gets assigned into the thread on first use per-thread.
>
>The cache could also be more than one element.
>
>How do you manage okToFree?
>
>Windows has __declspec(thread), which is an optimized form of TlsGetValue/TlsSetValue, but it doesn't work with dynamically loaded .dlls before Vista, and isn't __declspec(fiber) like maybe it should be.
> 
> - Jay
>
>----------------------------------------
>> To: hosking at cs.purdue.edu
>> Date: Thu, 16 Oct 2008 16:30:01 -0700
>> From: mika at async.caltech.edu
>> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
>> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread	storage areas?
>> 
>> Hi Tony,
>> 
>> I figured you would chime in!
>> 
>> Yes, @M3noincremental seems to make things consistently a tad bit
>> slower (but a very small difference), on both FreeBSD and Linux.
>> @M3nogc makes a bigger difference, of course.
>> 
>> Unfortunately I seem to have lost the code that did a lot of memory
>> allocations.  My tricks (as described in the email---and others!)
>> have removed most of the troublesome memory allocations, but now
>> I'm stuck with the mutex instead...
>> 
>>       Mika
>> 
>> Tony Hosking writes:
>>>Have you tried running @M3noincremental?
>>>
>>>On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>
>>>> Hello Modula-3 people,
>>>>
>>>> As I mentioned in an earlier email about printing structures (thanks
>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>> Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
>>>> JScheme for Java (well it was at first strongly based, but more and
>>>> more loosely, if you know what I mean...)
>>>>
>>>> I expected that the performance of the interpreter would be much
>>>> better in Modula-3 than in Java, and I have been testing on two
>>>> different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
>>>> and the other is CM3 on a recent Debian system.  What I am finding
>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>>>> close to ten times as fast on some tasks at this point), but on
>>>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>>>
>>>> When I started, with code that was essentially equivalent to JScheme,
>>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>>>> spend most of its time in (surprise, surprise!) memory allocation
>>>> and garbage collection.  The speedup I have achieved between the
>>>> first implementation and now was due to the use of Modula-3 constructs
>>>> that are superior to Java's, such as the use of arrays of RECORDs
>>>> to make small stacks rather than linked lists.  (I get readable
>>>> code with far fewer memory allocations and less GC work.)
>>>>
>>>> Now, since this is an interpreter, I as the implementer have limited
>>>> control over how much memory is allocated and freed, and where it is
>>>> needed.  However, I can sometimes fall back on C-style memory
>>>> management, but I would like to do it in a safe way.  For instance,
>>>> I have special-cased evaluation of Scheme primitives, as follows.
>>>>
>>>> Under the "normal" implementation, a list of things to evaluate is
>>>> built up, passed to an evaluation function, and then the GC is left
>>>> to sweep up the mess.  The problem is that there are various tricky
>>>> routes by which references can escape the evaluator, so you can't
>>>> just assume that what you put in is going to be dead right after
>>>> an eval and free it.  Instead, I set a flag in the evaluator, which
>>>> is TRUE if it is OK to free the list after the eval and FALSE if
>>>> it's unclear (in which case the problem is left up to the GC).
>>>>
>>>> For the vast majority of Scheme primitives, one can indeed free the
>>>> list right after the eval.  Now of course I am not interested
>>>> in unsafe code, so what I do is this:
>>>>
>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>
>>>> VAR
>>>>  mu := NEW(MUTEX);
>>>>  free : Pair := NIL;
>>>>
>>>> PROCEDURE GetPair() : Pair =
>>>>  BEGIN
>>>>    LOCK mu DO
>>>>      IF free # NIL THEN
>>>>        TRY
>>>>          RETURN free
>>>>        FINALLY
>>>>          free := free.rest
>>>>        END
>>>>      END
>>>>    END;
>>>>    RETURN NEW(Pair)
>>>>  END GetPair;
>>>>
>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>>  BEGIN
>>>>    cons.first := NIL;
>>>>    LOCK mu DO
>>>>      cons.rest := free;
>>>>      free := cons
>>>>    END
>>>>  END ReturnPair;
>>>>
>>>> my eval code looks like
>>>>
>>>> VAR okToFree : BOOLEAN; BEGIN
>>>>
>>>>   args := GetPair(); ...
>>>>   result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>
>>>>   IF okToFree THEN ReturnPair(args) END;
>>>>   RETURN result
>>>> END
>>>>
>>>> and this does work well.  In fact, recycling the lists like this
>>>> *just* for the evaluation of Scheme primitives speeds up the Linux
>>>> implementation by almost 100%.
>>>>
>>>> But it's still ugly, isn't it?  There's a mutex, and a global
>>>> variable.  And yes, the time spent messing with the mutex is
>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>> (and that is coming!)
>>>>
>>>> So I'm thinking, what I really want is a structure that is attached
>>>> to my current Thread.T.  I want to be able to access just a single
>>>> pointer (like the free list) but be sure it is unique to my current
>>>> thread.  No locking would be necessary if I could do this.
>>>>
>>>> Does anyone have an elegant solution that does something like this?
>>>> Thread-specific "static" variables?  Just one REFANY would be enough
>>>> for a lot of uses...  seems to me this should be a frequently
>>>> occurring problem?
>>>>
>>>>     Best regards,
>>>>       Mika

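Jay's suggestion of a small fixed-size cache in place of true per-thread storage could look roughly like the following C sketch. It is not code from CM3 or from the interpreter: the names (`Stripe`, `get_pair`, `return_pair`, `NSTRIPES`) are hypothetical, and it assumes each thread already has a small integer id, however obtained.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cons cell standing in for the Modula-3 Pair object. */
typedef struct Pair {
    void *first;
    struct Pair *rest;
} Pair;

/* A small fixed number of free lists ("stripes"), each with its own
   mutex.  16 is an arbitrary power of two chosen for illustration. */
#define NSTRIPES 16

typedef struct {
    pthread_mutex_t mu;
    Pair *free;
} Stripe;

static Stripe stripes[NSTRIPES];

/* Call once before any get_pair/return_pair. */
static void stripes_init(void) {
    for (int i = 0; i < NSTRIPES; i++) {
        int r = pthread_mutex_init(&stripes[i].mu, NULL);
        assert(r == 0);
        (void)r;
        stripes[i].free = NULL;
    }
}

/* The low bits of a caller-supplied thread id select the stripe.
   Obtaining that id (e.g. a global counter assigned on first use)
   is left to the caller. */
static Stripe *stripe_for(unsigned thread_id) {
    return &stripes[thread_id & (NSTRIPES - 1)];
}

static Pair *get_pair(unsigned thread_id) {
    Stripe *s = stripe_for(thread_id);
    pthread_mutex_lock(&s->mu);
    Pair *p = s->free;
    if (p != NULL)
        s->free = p->rest;
    pthread_mutex_unlock(&s->mu);
    if (p == NULL)
        p = malloc(sizeof *p);      /* cache miss: fall back to the allocator */
    if (p != NULL) {
        p->first = NULL;
        p->rest = NULL;
    }
    return p;
}

/* A pair may be handed back by any thread; it joins that thread's
   stripe, which is why each stripe still needs a mutex. */
static void return_pair(unsigned thread_id, Pair *p) {
    Stripe *s = stripe_for(thread_id);
    p->first = NULL;
    pthread_mutex_lock(&s->mu);
    p->rest = s->free;
    s->free = p;
    pthread_mutex_unlock(&s->mu);
}
```

Threads whose ids differ in the low bits contend on different mutexes, but every operation still takes a lock; that is the price of allowing a free to come in from any thread.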

From hosking at cs.purdue.edu  Fri Oct 17 08:35:03 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 17 Oct 2008 07:35:03 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
References: <200810162330.m9GNU1Zm068614@camembert.async.caltech.edu>
Message-ID: <0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu>

I suspect part of the overhead of allocation in the new code is the  
need for thread-local allocation buffers, which means we need to  
access thread-local state.  We really need an efficient way to do  
that, but pthreads thread-local accesses may be what is killing you.
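To make the point concrete: a thread-local allocation buffer (TLAB) lets each thread bump-allocate from a private chunk and take a lock only when it refills, but every allocation must first locate the current thread's buffer. The C sketch below does that lookup with `pthread_getspecific`. It is a simplified illustration, not the CM3 allocator; `Tlab`, `tlab_alloc` and `CHUNK_BYTES` are hypothetical names, and `malloc` stands in for the collector's heap.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_BYTES 4096   /* illustrative chunk size */

/* Per-thread allocation buffer: a bump pointer into a private chunk. */
typedef struct {
    char *next;
    char *limit;
} Tlab;

static pthread_key_t tlab_key;
static pthread_once_t tlab_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t heap_mu = PTHREAD_MUTEX_INITIALIZER;

static void tlab_key_init(void) {
    /* On thread exit, free the Tlab record itself.  Chunks are not
       freed here: in a collector they belong to the shared heap. */
    int r = pthread_key_create(&tlab_key, free);
    assert(r == 0);
    (void)r;
}

/* The only locked step: take a fresh chunk from the shared heap. */
static int tlab_refill(Tlab *t) {
    pthread_mutex_lock(&heap_mu);
    char *chunk = malloc(CHUNK_BYTES);
    pthread_mutex_unlock(&heap_mu);
    if (chunk == NULL)
        return 0;
    t->next = chunk;
    t->limit = chunk + CHUNK_BYTES;
    return 1;
}

static void *tlab_alloc(size_t n) {
    /* Every allocation first finds this thread's buffer; with pthreads
       that means a pthread_getspecific call on each allocation. */
    pthread_once(&tlab_once, tlab_key_init);
    Tlab *t = pthread_getspecific(tlab_key);
    if (t == NULL) {
        t = calloc(1, sizeof *t);
        if (t == NULL || pthread_setspecific(tlab_key, t) != 0) {
            free(t);
            return NULL;
        }
    }
    n = (n + 15) & ~(size_t)15;          /* round up to 16-byte steps */
    if (n > CHUNK_BYTES)
        return NULL;                     /* large objects: not handled here */
    if (t->next == NULL || (size_t)(t->limit - t->next) < n) {
        if (!tlab_refill(t))
            return NULL;
    }
    void *p = t->next;
    t->next += n;                        /* the fast path: a pointer bump */
    return p;
}
```

The bump itself is a couple of instructions; whether the `pthread_getspecific` call in front of it dominates depends on how the platform's thread library implements it.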

On 17 Oct 2008, at 00:30, Mika Nystrom wrote:

> Hi Tony,
>
> I figured you would chime in!
>
> Yes, @M3noincremental seems to make things consistently a tad bit
> slower (but a very small difference), on both FreeBSD and Linux.
> @M3nogc makes a bigger difference, of course.
>
> Unfortunately I seem to have lost the code that did a lot of memory
> allocations.  My tricks (as described in the email---and others!)
> have removed most of the troublesome memory allocations, but now
> I'm stuck with the mutex instead...
>
>      Mika
>
> Tony Hosking writes:
>> Have you tried running @M3noincremental?


From mika at async.caltech.edu  Fri Oct 17 08:50:13 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 16 Oct 2008 23:50:13 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 04:40:28 -0000."
	<COL101-W4964BD437A46A53516DAA3E6320@phx.gbl> 
Message-ID: <200810170650.m9H6oDU0078549@camembert.async.caltech.edu>

Jay writes:
...
>How do you manage okToFree?
...

I forgot to answer this q.

Well, the primitive evaluation in the interpreter is just a big
CASE statement.  I really just look at where each branch references
the list I am making, and if a branch references the list at all, I
insert the code "okToFree := FALSE" (the flag is called free in the
code below).  The first two arguments are also passed in separately.

Here's the code... since you ask!

This is the code for the special case of a two-argument Scheme procedure call,
such as (+ x 1) .

PROCEDURE Apply2(t : T; interp : Scheme.T; a1, a2 : Object) : Object =
  VAR
      d1, d2 := GetCons();
      free := TRUE;
  BEGIN
      d1.first := a1; d1.rest := d2;
      d2.first := a2; d2.rest := NIL;

      WITH res = Prims(t, interp, d1, a1, a2, free) DO
        IF free THEN
          ReturnCons(d1); ReturnCons(d2)
        END;
        RETURN res
      END
  END Apply2;

PROCEDURE Prims(t : T; interp : Scheme.T; args, x, y : Object;
                VAR free : BOOLEAN) : Object =

   (* The (hopefully temporary) list of arguments is args.  x and
      y are the first two elements of args *)

   BEGIN
      CASE VAL(t.idNumber,P) OF
          P.Eq => RETURN NumCompare(args, '=')  (* known not to let args escape *)
        |
          P.List => free := FALSE; RETURN args  (* args escapes, don't know whither *)
        |
          P.Car => RETURN PedanticFirst(x)  (* doesn't even use args *)

        (* and about another 100 cases follow here *)

      END
   END Prims;
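The same escape-flag protocol can be stripped down to a runnable C sketch. It is not a translation of the interpreter: the pool is single-threaded, and `Cons`, `apply2`, `prims`, `PRIM_FIRST` and `PRIM_LIST` are hypothetical stand-ins for the Modula-3 names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Cons {
    void *first;
    struct Cons *rest;
} Cons;

/* Single-threaded pool; the locking question is a separate issue. */
static Cons *free_list = NULL;

static Cons *get_cons(void) {
    Cons *c = free_list;
    if (c != NULL)
        free_list = c->rest;
    else if ((c = malloc(sizeof *c)) == NULL)
        abort();
    return c;
}

static void return_cons(Cons *c) {
    assert(c != NULL);
    c->first = NULL;
    c->rest = free_list;
    free_list = c;
}

/* Two hypothetical primitives. */
typedef enum { PRIM_FIRST, PRIM_LIST } Prim;

/* A primitive clears *may_free whenever the argument list can escape. */
static void *prims(Prim p, Cons *args, void *x, void *y, bool *may_free) {
    (void)y;
    switch (p) {
    case PRIM_FIRST:
        return x;                 /* never touches args */
    case PRIM_LIST:
        *may_free = false;        /* args itself becomes the result */
        return args;
    }
    return NULL;
}

static void *apply2(Prim p, void *a1, void *a2) {
    Cons *d1 = get_cons();
    Cons *d2 = get_cons();
    d1->first = a1; d1->rest = d2;
    d2->first = a2; d2->rest = NULL;

    bool may_free = true;
    void *res = prims(p, d1, a1, a2, &may_free);
    if (may_free) {               /* the list is provably dead: recycle it */
        return_cons(d1);
        return_cons(d2);
    }
    return res;
}
```

The safety argument rests entirely on each CASE branch clearing the flag correctly; a branch that lets the list escape without doing so would recycle cells that are still reachable.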

       Mika


From mika at async.caltech.edu  Fri Oct 17 10:03:18 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 01:03:18 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST."
	<0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> 
Message-ID: <200810170803.m9H83IIC080081@camembert.async.caltech.edu>

Ok this suggests that using thread local state to get around the
problem won't help either.

Can I ask a question... I am looking at ThreadPThread.m3...

Why do you have to lock the slotMu in Self()?

PROCEDURE Self (): T =
  (* If not the initial thread and not created by Fork, returns NIL *)
  (* LL = 0 *)
  VAR
    me := GetActivation();
    t: T;
  BEGIN
    IF me = NIL THEN RETURN NIL END;
    WITH r = Upthread.mutex_lock(slotMu) DO <*ASSERT r=0*> END;
      t := slots[me.slot];
    WITH r = Upthread.mutex_unlock(slotMu) DO <*ASSERT r=0*> END;
    IF (t.act # me) THEN Die(ThisLine(), "thread with bad slot!") END;
    RETURN t;
  END Self;

Is it just because of AssignSlots?  If so, it's actually a very rare
event that there would ever be a conflict, no?  (Only when "slots" is
extended?)

Can data be stored in an "Activation"?  Not TRACED data, obviously, hmm...

     Mika


Tony Hosking writes:
>I suspect part of the overhead of allocation in the new code is the  
>need for thread-local allocation buffers, which means we need to  
>access thread-local state.  We really need an efficient way to do  
>that, but pthreads thread-local accesses may be what is killing you.


From mika at async.caltech.edu  Fri Oct 17 10:32:28 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 01:32:28 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 07:35:03 BST."
	<0AB98AC8-EA86-4BD4-857F-CC0017E5FC32@cs.purdue.edu> 
Message-ID: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>

Ok I am sorry I am slow to pick up on this.

I take it the problem is actually the Upthread.getspecific routine,
which itself calls something get_curthread somewhere inside pthreads,
which in turn involves a context switch to the supervisor---the identity
of the current thread is just not accessible anywhere in user space.
Also explains why this program runs faster with my old PM3, which uses
longjmp threads.

The only way to avoid it (really) is to pass a pointer to the
Thread.T of the currently executing thread in the activation record
of *every* procedure, so that allocators can find it when necessary....
but that is very expensive in terms of stack memory.

Or I can just make a structure like that that I pass around where
I need it in my own program.  Thread-specific and user-managed.
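The "structure I pass around" option might look like the C sketch below: each thread (or interpreter instance) owns a context record and passes it explicitly, so reaching the free list needs neither a lock nor a thread-local lookup. `Ctx`, `ctx_get`, `ctx_put` and `ctx_destroy` are hypothetical names, not part of the interpreter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Cell {
    void *first;
    struct Cell *rest;
} Cell;

/* Hypothetical per-thread context, passed explicitly to callees. */
typedef struct {
    Cell *free;
    size_t reused;   /* how many requests were served from the pool */
} Ctx;

static Cell *ctx_get(Ctx *cx) {
    Cell *c = cx->free;
    if (c != NULL) {
        cx->free = c->rest;
        cx->reused++;
    } else {
        c = malloc(sizeof *c);
    }
    return c;
}

static void ctx_put(Ctx *cx, Cell *c) {
    assert(c != NULL);
    c->first = NULL;
    c->rest = cx->free;
    cx->free = c;
}

/* Release everything still pooled, e.g. when the owning thread exits. */
static void ctx_destroy(Ctx *cx) {
    while (cx->free != NULL) {
        Cell *next = cx->free->rest;
        free(cx->free);
        cx->free = next;
    }
}
```

The only rule is that a given `Ctx` is used by one thread at a time; a cell released through a different context simply joins that context's pool.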

I believe I have just answered all my own questions, but I hope
Tony will correct me if my answers are incorrect.

    Mika

Tony Hosking writes:
>I suspect part of the overhead of allocation in the new code is the  
>need for thread-local allocation buffers, which means we need to  
>access thread-local state.  We really need an efficient way to do  
>that, but pthreads thread-local accesses may be what is killing you.


From jay.krell at cornell.edu  Sat Oct 18 00:42:35 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 17 Oct 2008 22:42:35 +0000
Subject: [M3devel] M3 programming problem : GC efficiency /
	per-thread	storage areas?
In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
References: Your message of 
	<200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
Message-ID: <COL101-W48200DF8FB7269A7B2E229E6320@phx.gbl>


Right and wrong.

Right Tony was referring to Upthread.getspecific. Or on Windows WinBase.TlsGetValue.
Wrong that this necessarily incurs a switch to the supervisor/kernel, and perhaps wrong to call that a "context switch". It depends on the operating system.

I will explain.

On Windows/x86, the FS register points to a partly documented per-thread data structure.
C and C++ exception handling use FS:0.
Disassemble C or C++ code that uses exception handling and you'll find it used. Not by Modula-3, though.

Disassemble TlsGetValue.

 cdb /z %windir%\system32\kernel32.dll  

0:000> uf kernel32!TlsGetValue
kernel32!TlsGetValue:

 typical-looking prolog 
7dd813e0 8bff            mov     edi,edi
7dd813e2 55              push    ebp
7dd813e3 8bec            mov     ebp,esp

 fs:18 contains a "normal" "linear" pointer to fs:0 
 Get that pointer. 
7dd813e5 64a118000000    mov     eax,dword ptr fs:[00000018h]

 get the index 
7dd813eb 8b4d08          mov     ecx,dword ptr [ebp+8]

 SetLastError(0) 
7dd813ee 83603400        and     dword ptr [eax+34h],0

  There are 64 preallocated thread local slots -- compare the index to 64. 
7dd813f2 83f940          cmp     ecx,40h   

  If it is above or equal to 64, go use the non-preallocated slots. 
7dd813f5 0f8353e20200    jae     kernel32!lstrcmpi+0x4b22 (7ddaf64e)

  preallocated slots are at fs:e10; get the data and done  
kernel32!TlsGetValue+0x1b:
7dd813fb 8b8488100e0000  mov     eax,dword ptr [eax+ecx*4+0E10h]

 epilog 

kernel32!TlsGetValue+0x22:
7dd81402 5d              pop     ebp
7dd81403 c20400          ret     4

 get here for indices>= 64
 compare index to 1088 == 1024 + 64, as there are another 1024 more slowly available slots  

kernel32!lstrcmpi+0x4b22:
7ddaf64e 81f940040000    cmp     ecx,440h

 if it is below 1088, go use those slots 

7ddaf654 7211            jb      kernel32!lstrcmpi+0x4b3b (7ddaf667)

 index is above or equal to 1088, SetLastError(invalid parameter) 

kernel32!lstrcmpi+0x4b2a:
7ddaf656 680d0000c0      push    0C000000Dh
7ddaf65b e80025fdff      call    kernel32!GetProcessHeap+0x12 (7dd81b60)

 and return 0 -- 0 is not unambiguously an error -- that's why last error was cleared at the start 

kernel32!lstrcmpi+0x4b34:
7ddaf660 33c0            xor     eax,eax
7ddaf662 e99b1dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)

 This is where the slots between 64 and 1088 are used. 
 Get pointer from FS:F94 and compare to null.
  If it is null, that is ok, it means nobody has yet called TlsSetValue for this value,
  so it just retains its initial 0 value. 
kernel32!lstrcmpi+0x4b3b:
7ddaf667 8b80940f0000    mov     eax,dword ptr [eax+0F94h]
7ddaf66d 85c0            test    eax,eax
7ddaf66f 74ef            je      kernel32!lstrcmpi+0x4b34 (7ddaf660)

 Index is between 64 and 1088, and there is a non null pointer at FS:F94.
 Subtract 64 from index and index into pointer there. 
 Note it does the subtraction after the multiplication, so subtracts 64*4=0x100.

kernel32!lstrcmpi+0x4b45:
7ddaf671 8b848800ffffff  mov     eax,dword ptr [eax+ecx*4-100h]
7ddaf678 e9851dfdff      jmp     kernel32!TlsGetValue+0x22 (7dd81402)


So, it is a few instructions but there is no context switch into the kernel/supervisor.
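The control flow of that listing can be restated as a small C model. It is only a paraphrase of the disassembly, run against a simulated per-thread block rather than the real one reached through FS; the struct and function names are made up, and the offsets in the comments are the ones from the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_INLINE_SLOTS    64     /* compared against 40h */
#define TLS_EXPANSION_SLOTS 1024   /* 64 + 1024 = 1088 = 440h */
#define MODEL_ERROR_INVALID_PARAMETER 87u   /* Win32 ERROR_INVALID_PARAMETER */

/* Hypothetical stand-in for the per-thread block the listing walks. */
typedef struct {
    uint32_t last_error;                 /* cleared on entry (+34h) */
    void *inline_slots[TLS_INLINE_SLOTS];  /* preallocated slots (+E10h) */
    void **expansion;                    /* lazily allocated slots (+F94h) */
} ThreadBlock;

static void *model_tls_get_value(ThreadBlock *tb, uint32_t index) {
    assert(tb != NULL);
    tb->last_error = 0;                  /* so a NULL result is unambiguous */
    if (index < TLS_INLINE_SLOTS)
        return tb->inline_slots[index];  /* fast path: one load */
    if (index >= TLS_INLINE_SLOTS + TLS_EXPANSION_SLOTS) {
        tb->last_error = MODEL_ERROR_INVALID_PARAMETER;
        return NULL;
    }
    if (tb->expansion == NULL)
        return NULL;                     /* never set: still 0 */
    return tb->expansion[index - TLS_INLINE_SLOTS];
}
```

Every path is a handful of loads and compares in user mode, which is the point being made: no kernel transition is involved.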

Also, calls into the kernel aren't necessarily a "context switch".
Some context is saved, and a bit is twiddled in the processor to indicate a privilege level change, but no page tables are altered and I believe no TLBs (translation lookaside buffer) are invalidated, and no thread scheduling decisions are made -- though upon exit from the kernel, APCs (asynchronous procedure call) can be run -- on the calling thread. 

A more expensive context switch is when another thread or another process runs.
Switching threads requires saving more context, and switching processes requires changing the register that points to the page tables.
One detail there -- calling into the x86 NT kernel does not preserve floating point state -- that's the additional state that a thread switch has to save, at least. NT/x86 kernel drivers aren't allowed to use floating point, with some exceptions, like if they are video drivers (only certain functions?) or they explicitly save/restore the floating point registers using public functions.
I don't know about the other architectures. I think IA64 only preserves some floating point state, not all.


Now, the question then is how Upthread.getspecific is implemented on other architectures and operating systems.
We should look into that for various operating systems.
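For comparison, here is how a per-thread free list of the kind Mika asked about might be written directly against POSIX thread-specific data. A sketch only: `Node`, `pool_get` and `pool_put` are hypothetical names. The key's destructor addresses the clean-up-when-a-thread-dies concern raised earlier, and how cheap each `pthread_getspecific` call is depends on the C library and platform.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct Node {
    void *first;
    struct Node *rest;
} Node;

static pthread_key_t pool_key;
static pthread_once_t pool_once = PTHREAD_ONCE_INIT;

/* Runs at thread exit when the thread's value is non-NULL, so pooled
   nodes are not leaked by dead threads. */
static void pool_destroy(void *head) {
    Node *n = head;
    while (n != NULL) {
        Node *next = n->rest;
        free(n);
        n = next;
    }
}

static void pool_key_init(void) {
    int r = pthread_key_create(&pool_key, pool_destroy);
    assert(r == 0);
    (void)r;
}

static Node *pool_get(void) {
    pthread_once(&pool_once, pool_key_init);
    Node *n = pthread_getspecific(pool_key);   /* the per-call lookup */
    if (n != NULL) {
        pthread_setspecific(pool_key, n->rest);
        return n;
    }
    return malloc(sizeof(Node));
}

static void pool_put(Node *n) {
    pthread_once(&pool_once, pool_key_init);
    n->first = NULL;
    n->rest = pthread_getspecific(pool_key);
    pthread_setspecific(pool_key, n);
}
```

No mutex is needed because each list is only ever touched by its owning thread; a node released by another thread just joins that thread's list.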


Oh, also, let's see what __declspec(thread) does.

>type t.c


__declspec(thread) int a;

void F1(int);

void F2() { F1(a); }

cl -c t.c

link -dump -disasm t.obj


Dump of file t.obj

File Type: COFF OBJECT

_F2:
  00000000: 55                 push        ebp
  00000001: 8B EC              mov         ebp,esp
  00000003: A1 00 00 00 00     mov         eax,dword ptr [__tls_index]
  00000008: 64 8B 0D 00 00 00  mov         ecx,dword ptr fs:[__tls_array]
            00
  0000000F: 8B 14 81           mov         edx,dword ptr [ecx+eax*4]
  00000012: 8B 82 00 00 00 00  mov         eax,dword ptr _a[edx]
  00000018: 50                 push        eax
  00000019: E8 00 00 00 00     call        _F1
  0000001E: 83 C4 04           add         esp,4
  00000021: 5D                 pop         ebp
  00000022: C3                 ret

Note that the compiler-generated code references FS directly.

The optimized version is:

Dump of file t.obj

File Type: COFF OBJECT

_F2:
  00000000: A1 00 00 00 00     mov         eax,dword ptr [__tls_index]
  00000005: 64 8B 0D 00 00 00  mov         ecx,dword ptr fs:[__tls_array]
            00
  0000000C: 8B 14 81           mov         edx,dword ptr [ecx+eax*4]
  0000000F: 8B 82 00 00 00 00  mov         eax,dword ptr _a[edx]
  00000015: 50                 push        eax
  00000016: E8 00 00 00 00     call        _F1
  0000001B: 59                 pop         ecx
  0000001C: C3                 ret
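For comparison, the standard C11 spelling of `__declspec(thread)` is `_Thread_local` (GCC and Clang also accept `__thread`). The sketch below assumes a C11 compiler with POSIX threads; `bump` and `count_in_new_thread` are hypothetical helpers. In an ELF executable this typically compiles to a segment-register-relative load much like the FS-based sequence above, while code in a shared library may instead go through a `__tls_get_addr` call. That is an assumption to verify with a disassembler on each target, not something measured here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* C11 spelling of "__declspec(thread) int a;": each thread has its own
   copy of a, zero-initialized like any static. */
static _Thread_local int a;

/* Increment the calling thread's copy and return the new value. */
int bump(void) {
    return ++a;
}

/* Thread body: bump this thread's copy twice and report the result. */
static void *worker(void *out) {
    bump();
    *(int *)out = bump();
    return NULL;
}

/* Run worker in a fresh thread; returns its final count, or -1 on error. */
int count_in_new_thread(void) {
    int result = -1;
    pthread_t t;
    if (pthread_create(&t, NULL, worker, &result) != 0)
        return -1;
    pthread_join(t, NULL);
    return result;
}
```

Calling `bump()` in the main thread and then `count_in_new_thread()` shows the copies are independent: the new thread starts from zero no matter how often the main thread has bumped its own `a`.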

 - Jay


> To: hosking at cs.purdue.edu
> Date: Fri, 17 Oct 2008 01:32:28 -0700
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
>
> Ok I am sorry I am slow to pick up on this.
>
> I take it the problem is actually the Upthread.getspecific routine,
> which itself calls something get_curthread somewhere inside pthreads,
> which in turn involves a context switch to the supervisor---the identity
> of the current thread is just not accessible anywhere in user space.
> Also explains why this program runs faster with my old PM3, which uses
> longjmp threads.
>
> The only way to avoid it (really) is to pass a pointer to the
> Thread.T of the currently executing thread in the activation record
> of *every* procedure, so that allocators can find it when necessary....
> but that is very expensive in terms of stack memory.
>
> Or I can just make a structure like that that I pass around where
> I need it in my own program. Thread-specific and user-managed.
>
> I believe I have just answered all my own questions, but I hope
> Tony will correct me if my answers are incorrect.
>
> Mika
>
> Tony Hosking writes:
>>I suspect part of the overhead of allocation in the new code is the
>>need for thread-local allocation buffers, which means we need to
>>access thread-local state. We really need an efficient way to do
>>that, but pthreads thread-local accesses may be what is killing you.
>>
>>On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>>
>>> Hi Tony,
>>>
>>> I figured you would chime in!
>>>
>>> Yes, @M3noincremental seems to make things consistently a tad bit
>>> slower (but a very small difference), on both FreeBSD and Linux.
>>> @M3nogc makes a bigger difference, of course.
>>>
>>> Unfortunately I seem to have lost the code that did a lot of memory
>>> allocations. My tricks (as described in the email---and others!)
>>> have removed most of the troublesome memory allocations, but now
>>> I'm stuck with the mutex instead...
>>>
>>> Mika
>>>
>>> Tony Hosking writes:
>>>> Have you tried running @M3noincremental?
>>>>
>>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> As I mentioned in an earlier email about printing structures (thanks
>>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>>> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's
>>>>> JScheme for Java (well it was at first strongly based, but more and
>>>>> more loosely, if you know what I mean...)
>>>>>
>>>>> I expected that the performance of the interpreter would be much
>>>>> better in Modula-3 than in Java, and I have been testing on two
>>>>> different systems. One is my ancient FreeBSD-4.11 with an old PM3,
>>>>> and the other is CM3 on a recent Debian system. What I am finding
>>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>>>>> close to ten times as fast on some tasks at this point), but on
>>>>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>>>>
>>>>> When I started, with code that was essentially equivalent to JScheme,
>>>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>>>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to
>>>>> spend most of its time in (surprise, surprise!) memory allocation
>>>>> and garbage collection. The speedup I have achieved between the
>>>>> first implementation and now was due to the use of Modula-3 constructs
>>>>> that are superior to Java's, such as the use of arrays of RECORDs
>>>>> to make small stacks rather than linked lists. (I get readable
>>>>> code with much fewer memory allocations and GC work.)
>>>>>
>>>>> Now, since this is an interpreter, I as the implementer have limited
>>>>> control over how much memory is allocated and freed, and where it is
>>>>> needed. However, I can sometimes fall back on C-style memory
>>>>> management, but I would like to do it in a safe way. For instance, I
>>>>> have special-cased evaluation of Scheme primitives, as follows.
>>>>>
>>>>> Under the "normal" implementation, a list of things to evaluate is
>>>>> built up, passed to an evaluation function, and then the GC is left
>>>>> to sweep up the mess. The problem is that there are various tricky
>>>>> routes by which references can escape the evaluator, so you can't
>>>>> just assume that what you put in is going to be dead right after
>>>>> an eval and free it. Instead, I set a flag in the evaluator, which
>>>>> is TRUE if it is OK to free the list after the eval and FALSE if
>>>>> it's unclear (in which case the problem is left up to the GC).
>>>>>
>>>>> For the vast majority of Scheme primitives, one can indeed free the
>>>>> list right after the eval. Now of course I am not interested
>>>>> in unsafe code, so what I do is this:
>>>>>
>>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>>
>>>>> VAR
>>>>>   mu := NEW(MUTEX);
>>>>>   free : Pair := NIL;
>>>>>
>>>>> PROCEDURE GetPair() : Pair =
>>>>> BEGIN
>>>>>   LOCK mu DO
>>>>>     IF free # NIL THEN
>>>>>       TRY
>>>>>         RETURN free
>>>>>       FINALLY
>>>>>         free := free.rest
>>>>>       END
>>>>>     END
>>>>>   END;
>>>>>   RETURN NEW(Pair)
>>>>> END GetPair;
>>>>>
>>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>>> BEGIN
>>>>>   cons.first := NIL;
>>>>>   LOCK mu DO
>>>>>     cons.rest := free;
>>>>>     free := cons
>>>>>   END
>>>>> END ReturnPair;
>>>>>
>>>>> my eval code looks like
>>>>>
>>>>> VAR okToFree : BOOLEAN; BEGIN
>>>>>
>>>>> args := GetPair(); ...
>>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>>
>>>>> IF okToFree THEN ReturnPair(args) END;
>>>>> RETURN result
>>>>> END
>>>>>
>>>>> and this does work well. In fact it speeds up the Linux implementation
>>>>> by almost 100% to recycle the lists like this *just* for the
>>>>> evaluation of Scheme primitives.
>>>>>
>>>>> But it's still ugly, isn't it? There's a mutex, and a global
>>>>> variable. And yes, the time spent messing with the mutex is
>>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>>> (and that is coming!)
>>>>>
>>>>> So I'm thinking, what I really want is a structure that is attached
>>>>> to my current Thread.T. I want to be able to access just a single
>>>>> pointer (like the free list) but be sure it is unique to my current
>>>>> thread. No locking would be necessary if I could do this.
>>>>>
>>>>> Does anyone have an elegant solution that does something like this?
>>>>> Thread-specific "static" variables? Just one REFANY would be enough
>>>>> for a lot of uses... seems to me this should be a frequently
>>>>> occurring problem?
>>>>>
>>>>> Best regards,
>>>>> Mika


From mika at async.caltech.edu  Sat Oct 18 01:00:28 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 17 Oct 2008 16:00:28 -0700
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: Your message of "Fri, 17 Oct 2008 22:42:35 -0000."
	<COL101-W48200DF8FB7269A7B2E229E6320@phx.gbl> 
Message-ID: <200810172300.m9HN0SfN008554@camembert.async.caltech.edu>


No, I didn't mean that it *necessarily* involves a context switch.
Obviously it doesn't: user-level threading never needs to do a
"kernel" context switch (it does its own switching, of course, but I
don't see why it would need that just to get or set a variable).

I just meant that, judging from the (C) implementation of pthreads I
have on FreeBSD, it does seem to on that system, as the code in
question is marked as "kernel code".

In any case I think I have been able to solve my particular problem
by identifying a data structure that is inherently only accessed
from a single thread (in my program) and attaching my memory recycling
trickery to that particular structure.  I get very little memory
allocation/GC and no need for locks at all, which is precisely the
effect I was going for.
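The idea of hanging the recycling off a structure that only one thread ever touches can be sketched in C as follows. This is an illustration under stated assumptions, not Mika's code: the Modula-3 version would use NEW and leave the rest to the GC rather than malloc, and `Evaluator` with its `free_pairs` field is a hypothetical stand-in for "a data structure that is inherently only accessed from a single thread". Because the free list belongs to an object owned by one thread, GetPair/ReturnPair need neither a MUTEX nor a thread-local lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Cons cell, standing in for Pair = OBJECT first, rest : REFANY END. */
typedef struct Pair {
    void *first;
    struct Pair *rest;
} Pair;

/* Hypothetical evaluator state. Only the thread that owns an Evaluator
   ever touches it, so its free list needs no mutex and no TLS lookup. */
typedef struct Evaluator {
    Pair *free_pairs;
} Evaluator;

/* Reuse a recycled pair if there is one, otherwise allocate a new one.
   Returns NULL only if malloc fails. */
Pair *get_pair(Evaluator *ev) {
    Pair *p = ev->free_pairs;
    if (p != NULL) {
        ev->free_pairs = p->rest;
    } else {
        p = malloc(sizeof *p);
        if (p == NULL)
            return NULL;
    }
    p->first = NULL;
    p->rest = NULL;
    return p;
}

/* Push a pair known to be dead (the okToFree case) onto this evaluator's list. */
void return_pair(Evaluator *ev, Pair *p) {
    assert(p != NULL);
    p->first = NULL; /* as in ReturnPair; under a GC this drops a stale reference */
    p->rest = ev->free_pairs;
    ev->free_pairs = p;
}
```

The trade-off is that pairs recycled by one evaluator are never reused by another, and if an Evaluator were ever shared between threads, the lock would have to come back.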

I am still a little bit concerned about the performance of CM3-generated
code but the main culprit appears to be TYPECASE/ISTYPE now, far
from garbage collectors and thread libraries.  I'll send an update
if I can find something egregiously inefficient.

    Mika

Jay writes:
>
>Right and wrong.
>
>Right: Tony was referring to Upthread.getspecific, or on Windows WinBase.TlsGetValue.
>Wrong that this necessarily incurs a switch to the supervisor/kernel, and perhaps wrong to call that a "context switch". It depends on the operating system.
>
>I will explain.
>
>On Windows/x86, the FS register points to a partly documented per-thread data structure.
>C and C++ exception handling use FS:0.
>Disassemble any code. You'll find it is used. Not by Modula-3 though.
>
>Disassemble TlsGetValue.
>
> cdb /z %windir%\system32\kernel32.dll  
>
>0:000> uf kernel32!TlsGetValue
>kernel32!TlsGetValue:
...


From mika at async.caltech.edu  Sat Oct 18 10:41:30 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Sat, 18 Oct 2008 01:41:30 -0700
Subject: [M3devel] Fortran
Message-ID: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu>


Ok now in the realm of crazy questions---and I apologize to those
whose inboxes I clog with some of my emails...

If there is anyone out there in Modula-3-ether who has ever written
or heard of ...

  an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs

... would he please make himself known to me?  (I have a Scheme
interpreter to trade...)

    Mika


From lemming at henning-thielemann.de  Sat Oct 18 17:34:50 2008
From: lemming at henning-thielemann.de (Henning Thielemann)
Date: Sat, 18 Oct 2008 17:34:50 +0200 (MEST)
Subject: [M3devel] Fortran
In-Reply-To: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu>
References: <200810180841.m9I8fUUT020989@camembert.async.caltech.edu>
Message-ID: <Pine.SOC.4.64.0810181646120.28054@haydn.informatik.uni-halle.de>


On Sat, 18 Oct 2008, Mika Nystrom wrote:

> Ok now in the realm of crazy questions---and I apologize to those
> whose inboxes I clog with some of my emails...
>
> If there is anyone out there in Modula-3-ether who has ever written
> or heard of ...
>
>  an automatic generator of Modula-3 INTERFACEs for FORTRAN-77 programs
>
> ... would he please make himself known to me?  (I have a Scheme
> interpreter to trade...)

I have written a program for generating Modula-3 interfaces for LAPACK 
(linear algebra routines) using m3coco. But I'm afraid that my Fortran 
parser works only for LAPACK and no other library. I have just copied the 
CVS files to
    http://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/m3/pm3/language/parsing/m3coco/test/?cvsroot=PM3
Before you check it out, note that I might move it to a different location,
maybe cm3/m3-tools, if that is more appropriate. (You may also need the
revised m3coco version, which I only have on a branch and never tried to
merge back to HEAD.)


While searching my own code in the net, I found some nice interviews with 
Luca Cardelli:
   http://www.wikio.com/technology/development/modula-3


From mika at async.caltech.edu  Tue Oct 21 13:05:01 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 21 Oct 2008 04:05:01 -0700
Subject: [M3devel] CM3 on Mac OS X Tiger
Message-ID: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>

Hello everyone,

Sorry if I have asked this before---I feel I must have, and Tony
probably answered it, too, but I can't find it anywhere in my email
archives.

It looks like I finally upgraded my Mac to Tiger a half year ago,
and everything broke.  (Modula-3, emacs, make, etc etc etc etc.)
I am finally getting around to fixing it.  Now I am trying to
compile CM3 in accordance with Tony's instructions as of June 24, 2007:

(short quote here)
> cd ~/cm3-cvs
> mkdir boot
> cd boot
> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz
> ./cminstall

Now you will have some kind of cm3 installed, presumably in
/usr/local/cm3/bin/cm3.

Make sure you have a fresh CVS checkout in directory cm3 (let's
assume this is in your home directory ~/cm3).  Also, make sure you
have an up-to-date version of the CM3 backend compiler cm3cg
installed by executing the following:

STEP 0:

export CM3=/usr/local/cm3/bin/cm3
cd ~/cm3/m3-sys/m3cc
$CM3
$CM3 -ship

You can skip this last step if you know your backend compiler is up
to date.

Now, let's build the new compiler from scratch (this is the sequence
I use regularly to test changes to the run-time system whenever I
make them):

STEP 1:

cd ~/cm3/m3-libs/m3core
$CM3
$CM3 -ship
(end short quote, there's much more)

What happens is that when building m3core, my compiler is building
it against the interfaces in /usr/local/cm3, NOT the interfaces
within m3core itself:

--- building in PPC_DARWIN ---

ignoring ../src/m3overrides

new source -> compiling RTCollector.m3
"../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX)
"../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX)
"../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD)
"../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD)
4 errors encountered
stale imports -> compiling RTDebug.m3

Fatal Error: bad version stamps: RTDebug.m3

version stamp mismatch: Compiler.Platform
  <df3c2b13d1d385ee> => RTDebug.m3
  <da77490d024222ef> => Compiler.i3  
version stamp mismatch: Compiler.ThisPlatform
  <8b5a6f513e082750> => RTDebug.m3
  <8e110d4fed998051> => Compiler.i3  

I feel like I should REALLY know the answer to this, but how do I 
get the compiler to use only the local sources and not attempt
to compile things with reference to the already-installed 
interfaces?

    Mika


From hosking at cs.purdue.edu  Tue Oct 21 13:21:36 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 12:21:36 +0100
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>
References: <200810211105.m9LB51kQ007258@camembert.async.caltech.edu>
Message-ID: <27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu>

This is a phase ordering problem that arises when you use an old  
compiler to compile newer sources.  It really should be fixed  
somehow.  In any case, the problem is those lines in RTCollector at  
the bottom (I deleted them yesterday on the main trunk) that refer to  
values supposedly built in to the compiler (which are not there for  
the old binary you are using).  I think if you delete those lines then  
you should be OK.  Once you have a new compiler bootstrapped (with  
those configuration values available built in) then you should be able  
to compile that code (excepting that I just deleted those lines  
yesterday).
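If you follow that advice, a small helper keeps a backup so the edit is easy to undo. This is a hypothetical sketch, not part of the CM3 tree: it deletes an inclusive line range with plain `sed` and a redirect rather than `sed -i`, whose syntax differs between the BSD sed on Mac OS X and GNU sed.

```shell
# Hypothetical helper: delete lines FIRST..LAST of FILE in place,
# keeping the untouched original as FILE.orig.
delete_lines() {
    file=$1
    first=$2
    last=$3
    cp "$file" "$file.orig" || return 1
    sed "${first},${last}d" "$file.orig" > "$file"
}
```

For example, from `~/cm3/m3-libs/m3core`, first look at the lines with `sed -n '2910,2920p' src/runtime/common/RTCollector.m3`, then, if they stand on their own, run `delete_lines src/runtime/common/RTCollector.m3 2914 2917`. Those line numbers come from the error output above and only apply to that revision of the file; if the offending references sit inside a larger expression, the surrounding syntax may need fixing by hand.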


On 21 Oct 2008, at 12:05, Mika Nystrom wrote:

> Hello everyone,
>
> Sorry if I have asked this before---I feel I must have, and Tony
> probably answered it, too, but I can't find it anywhere in my email
> archives.
>
> It looks like I finally upgraded my Mac to Tiger a half year ago,
> and everything broke.  (Modula-3, emacs, make, etc etc etc etc.)
> I am finally getting around to fixing it.  Now I am trying to
> compile CM3 in accordance with Tony's instructions as of June 24, 2007:
>
> (short quote here)
>> cd ~/cm3-cvs
>> mkdir boot
>> cd boot
>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz
>> ./cminstall
>
> Now you will have some kind of cm3 installed, presumably in
> /usr/local/cm3/bin/cm3.
>
> Make sure you have a fresh CVS checkout in directory cm3 (let's
> assume this is in your home directory ~/cm3).  Also, make sure you
> have an up-to-date version of the CM3 backend compiler cm3cg
> installed by executing the following:
>
> STEP 0:
>
> export CM3=/usr/local/cm3/bin/cm3
> cd ~/cm3/m3-sys/m3cc
> $CM3
> $CM3 -ship
>
> You can skip this last step if you know your backend compiler is up
> to date.
>
> Now, let's build the new compiler from scratch (this is the sequence
> I use regularly to test changes to the run-time system whenever I
> make them):
>
> STEP 1:
>
> cd ~/cm3/m3-libs/m3core
> $CM3
> $CM3 -ship
> (end short quote, there's much more)
>
> What happens is that when building m3core, my compiler is building
> it against the interfaces in /usr/local/cm3, NOT the interfaces
> within m3core itself:
>
> --- building in PPC_DARWIN ---
>
> ignoring ../src/m3overrides
>
> new source -> compiling RTCollector.m3
> "../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX)
> "../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX)
> "../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD)
> "../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD)
> 4 errors encountered
> stale imports -> compiling RTDebug.m3
>
> Fatal Error: bad version stamps: RTDebug.m3
>
> version stamp mismatch: Compiler.Platform
>  <df3c2b13d1d385ee> => RTDebug.m3
>  <da77490d024222ef> => Compiler.i3
> version stamp mismatch: Compiler.ThisPlatform
>  <8b5a6f513e082750> => RTDebug.m3
>  <8e110d4fed998051> => Compiler.i3
>
> I feel like I should REALLY know the answer to this, but how do I
> get the compiler to use only the local sources and not attempt
> to compile things with reference to the already-installed
> interfaces?
>
>    Mika


From hosking at cs.purdue.edu  Tue Oct 21 16:54:58 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 15:54:58 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
Message-ID: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>

I have one more question that I forgot to ask before.  Did you  
evaluate performance with -O3 optimization in the backend?

Generally, I have the following in my m3_backend specs so that turning  
on optimization results in -O3 (and lots of lovely inlining):

proc m3_backend (source, object, optimize, debug) is
   local args =
   [
     "-m32",
     "-quiet",
     source,
     "-o",
     object,
     % fPIC really is needed here, despite man gcc saying it is the default.
     % This is because man gcc is about Apple's gcc but m3cg is
     % built from FSF source.
     "-fPIC",
     "-fno-reorder-blocks"
   ]
   if optimize  args += "-O3"  end
   if debug     args += "-gstabs"  end
   if M3_PROFILING args += "-p" end
   return try_exec (m3back, args)
end


On 17 Oct 2008, at 09:32, Mika Nystrom wrote:

> Ok I am sorry I am slow to pick up on this.
>
> I take it the problem is actually the Upthread.getspecific routine,
> which itself calls something get_curthread somewhere inside pthreads,
> which in turn involves a context switch to the supervisor---the identity
> of the current thread is just not accessible anywhere in user space.
> Also explains why this program runs faster with my old PM3, which uses
> longjmp threads.
>
> The only way to avoid it (really) is to pass a pointer to the
> Thread.T of the currently executing thread in the activation record
> of *every* procedure, so that allocators can find it when necessary....
> but that is very expensive in terms of stack memory.
>
> Or I can just make a structure like that that I pass around where
> I need it in my own program.  Thread-specific and user-managed.
>
> I believe I have just answered all my own questions, but I hope
> Tony will correct me if my answers are incorrect.
>
>    Mika
>
> Tony Hosking writes:
>> I suspect part of the overhead of allocation in the new code is the
>> need for thread-local allocation buffers, which means we need to
>> access thread-local state.  We really need an efficient way to do
>> that, but pthreads thread-local accesses may be what is killing you.
>>
>> On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>>
>>> Hi Tony,
>>>
>>> I figured you would chime in!
>>>
>>> Yes, @M3noincremental seems to make things consistently a tad bit
>>> slower (but a very small difference), on both FreeBSD and Linux.
>>> @M3nogc makes a bigger difference, of course.
>>>
>>> Unfortunately I seem to have lost the code that did a lot of memory
>>> allocations.  My tricks (as described in the email---and others!)
>>> have removed most of the troublesome memory allocations, but now
>>> I'm stuck with the mutex instead...
>>>
>>>     Mika
>>>
>>> Tony Hosking writes:
>>>> Have you tried running @M3noincremental?
>>>>
>>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>>
>>>>> Hello Modula-3 people,
>>>>>
>>>>> As I mentioned in an earlier email about printing structures (thanks
>>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>>> Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
>>>>> JScheme for Java (well it was at first strongly based, but more and
>>>>> more loosely, if you know what I mean...)
>>>>>
>>>>> I expected that the performance of the interpreter would be much
>>>>> better in Modula-3 than in Java, and I have been testing on two
>>>>> different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
>>>>> and the other is CM3 on a recent Debian system.  What I am finding
>>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>>>>> close to ten times as fast on some tasks at this point), but on
>>>>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>>>>
>>>>> When I started, with code that was essentially equivalent to JScheme,
>>>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>>>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>>>>> spend most of its time in (surprise, surprise!) memory allocation
>>>>> and garbage collection.  The speedup I have achieved between the
>>>>> first implementation and now was due to the use of Modula-3 constructs
>>>>> that are superior to Java's, such as the use of arrays of RECORDs
>>>>> to make small stacks rather than linked lists.  (I get readable
>>>>> code with much fewer memory allocations and GC work.)
>>>>>
>>>>> Now, since this is an interpreter, I as the implementer have limited
>>>>> control over how much memory is allocated and freed, and where it is
>>>>> needed.  However, I can sometimes fall back on C-style memory
>>>>> management, but I would like to do it in a safe way.  For instance, I
>>>>> have special-cased evaluation of Scheme primitives, as follows.
>>>>>
>>>>> Under the "normal" implementation, a list of things to evaluate is
>>>>> built up, passed to an evaluation function, and then the GC is left
>>>>> to sweep up the mess.  The problem is that there are various tricky
>>>>> routes by which references can escape the evaluator, so you can't
>>>>> just assume that what you put in is going to be dead right after
>>>>> an eval and free it.  Instead, I set a flag in the evaluator, which
>>>>> is TRUE if it is OK to free the list after the eval and FALSE if
>>>>> it's unclear (in which case the problem is left up to the GC).
>>>>>
>>>>> For the vast majority of Scheme primitives, one can indeed free the
>>>>> list right after the eval.  Now of course I am not interested
>>>>> in unsafe code, so what I do is this:
>>>>>
>>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>>
>>>>> VAR
>>>>> mu := NEW(MUTEX);
>>>>> free : Pair := NIL;
>>>>>
>>>>> PROCEDURE GetPair() : Pair =
>>>>> BEGIN
>>>>>  LOCK mu DO
>>>>>    IF free # NIL THEN
>>>>>      TRY
>>>>>        RETURN free
>>>>>      FINALLY
>>>>>        free := free.rest
>>>>>      END
>>>>>    END
>>>>>  END;
>>>>>  RETURN NEW(Pair)
>>>>> END GetPair;
>>>>>
>>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>>> BEGIN
>>>>>  cons.first := NIL;
>>>>>  LOCK mu DO
>>>>>    cons.rest := free;
>>>>>    free := cons
>>>>>  END
>>>>> END ReturnPair;
>>>>>
>>>>> my eval code looks like
>>>>>
>>>>> VAR okToFree : BOOLEAN; BEGIN
>>>>>
>>>>> args := GetPair(); ...
>>>>> result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>>
>>>>> IF okToFree THEN ReturnPair(args) END;
>>>>> RETURN result
>>>>> END
>>>>>
>>>>> and this does work well.  In fact it speeds up the Linux implementation
>>>>> by almost 100% to recycle the lists like this *just* for the
>>>>> evaluation of Scheme primitives.
>>>>>
>>>>> But it's still ugly, isn't it?  There's a mutex, and a global
>>>>> variable.  And yes, the time spent messing with the mutex is
>>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>>> (and that is coming!)
>>>>>
>>>>> So I'm thinking, what I really want is a structure that is attached
>>>>> to my current Thread.T.  I want to be able to access just a single
>>>>> pointer (like the free list) but be sure it is unique to my current
>>>>> thread.  No locking would be necessary if I could do this.
>>>>>
>>>>> Does anyone have an elegant solution that does something like this?
>>>>> Thread-specific "static" variables?  Just one REFANY would be enough
>>>>> for a lot of uses...  seems to me this should be a frequently
>>>>> occurring problem?
>>>>>
>>>>>   Best regards,
>>>>>     Mika


From hosking at cs.purdue.edu  Tue Oct 21 17:17:24 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 16:17:24 +0100
Subject: [M3devel] M3 programming problem : GC efficiency / per-thread
	storage areas?
In-Reply-To: <34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>
References: <200810170832.m9H8WSYH088831@camembert.async.caltech.edu>
	<34B39608-5C68-4C4C-B3DC-03F74844D434@cs.purdue.edu>
Message-ID: <1396C14A-B23D-4D19-804B-B1627B44106F@cs.purdue.edu>

Also, turn off assertions.

On 21 Oct 2008, at 15:54, Tony Hosking wrote:

> I have one more question that I forgot to ask before.  Did you  
> evaluate performance with -O3 optimization in the backend?
>
> Generally, I have the following in my m3_backend specs so that  
> turning on optimization results in -O3 (and lots of lovely inlining):
>
> proc m3_backend (source, object, optimize, debug) is
>  local args =
>  [
>    "-m32",
>    "-quiet",
>    source,
>    "-o",
>    object,
>    % fPIC really is needed here, despite man gcc saying it is the default.
>    % This is because man gcc is about Apple's gcc but m3cg is
>    % built from FSF source.
>    "-fPIC",
>    "-fno-reorder-blocks"
>  ]
>  if optimize  args += "-O3"  end
>  if debug     args += "-gstabs"  end
>  if M3_PROFILING args += "-p" end
>  return try_exec (m3back, args)
> end
>
>
> On 17 Oct 2008, at 09:32, Mika Nystrom wrote:
>
>> Ok I am sorry I am slow to pick up on this.
>>
>> I take it the problem is actually the Upthread.getspecific routine,
>> which itself calls something get_curthread somewhere inside pthreads,
>> which in turn involves a context switch to the supervisor---the identity
>> of the current thread is just not accessible anywhere in user space.
>> Also explains why this program runs faster with my old PM3, which uses
>> longjmp threads.
>>
>> The only way to avoid it (really) is to pass a pointer to the
>> Thread.T of the currently executing thread in the activation record
>> of *every* procedure, so that allocators can find it when necessary....
>> but that is very expensive in terms of stack memory.
>>
>> Or I can just make a structure like that that I pass around where
>> I need it in my own program.  Thread-specific and user-managed.
>>
>> I believe I have just answered all my own questions, but I hope
>> Tony will correct me if my answers are incorrect.
>>
>>   Mika
>>
>> Tony Hosking writes:
>>> I suspect part of the overhead of allocation in the new code is the
>>> need for thread-local allocation buffers, which means we need to
>>> access thread-local state.  We really need an efficient way to do
>>> that, but pthreads thread-local accesses may be what is killing you.
>>>
>>> On 17 Oct 2008, at 00:30, Mika Nystrom wrote:
>>>
>>>> Hi Tony,
>>>>
>>>> I figured you would chime in!
>>>>
>>>> Yes, @M3noincremental seems to make things consistently a tad bit
>>>> slower (but a very small difference), on both FreeBSD and Linux.
>>>> @M3nogc makes a bigger difference, of course.
>>>>
>>>> Unfortunately I seem to have lost the code that did a lot of memory
>>>> allocations.  My tricks (as described in the email---and others!)
>>>> have removed most of the troublesome memory allocations, but now
>>>> I'm stuck with the mutex instead...
>>>>
>>>>    Mika
>>>>
>>>> Tony Hosking writes:
>>>>> Have you tried running @M3noincremental?
>>>>>
>>>>> On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>>>>
>>>>>> Hello Modula-3 people,
>>>>>>
>>>>>> As I mentioned in an earlier email about printing structures (thanks
>>>>>> Darko), I'm in the midst of coding an interpreter embedded in
>>>>>> Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
>>>>>> JScheme for Java (well it was at first strongly based, but more and
>>>>>> more loosely, if you know what I mean...)
>>>>>>
>>>>>> I expected that the performance of the interpreter would be much
>>>>>> better in Modula-3 than in Java, and I have been testing on two
>>>>>> different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
>>>>>> and the other is CM3 on a recent Debian system.  What I am finding
>>>>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>>>>>> close to ten times as fast on some tasks at this point), but on
>>>>>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>>>>>
>>>>>> When I started, with code that was essentially equivalent to JScheme,
>>>>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>>>>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>>>>>> spend most of its time in (surprise, surprise!) memory allocation
>>>>>> and garbage collection.  The speedup I have achieved between the
>>>>>> first implementation and now was due to the use of Modula-3 constructs
>>>>>> that are superior to Java's, such as the use of arrays of RECORDs
>>>>>> to make small stacks rather than linked lists.  (I get readable
>>>>>> code with much fewer memory allocations and GC work.)
>>>>>>
>>>>>> Now, since this is an interpreter, I as the implementer have limited
>>>>>> control over how much memory is allocated and freed, and where it is
>>>>>> needed.  However, I can sometimes fall back on C-style memory
>>>>>> management, but I would like to do it in a safe way.  For instance, I
>>>>>> have special-cased evaluation of Scheme primitives, as follows.
>>>>>>
>>>>>> Under the "normal" implementation, a list of things to evaluate is
>>>>>> built up, passed to an evaluation function, and then the GC is left
>>>>>> to sweep up the mess.  The problem is that there are various tricky
>>>>>> routes by which references can escape the evaluator, so you can't
>>>>>> just assume that what you put in is going to be dead right after
>>>>>> an eval and free it.  Instead, I set a flag in the evaluator, which
>>>>>> is TRUE if it is OK to free the list after the eval and FALSE if
>>>>>> it's unclear (in which case the problem is left up to the GC).
>>>>>>
>>>>>> For the vast majority of Scheme primitives, one can indeed free  
>>>>>> the
>>>>>> list right after the eval.  Now of course I am not interested
>>>>>> in unsafe code, so what I do is this:
>>>>>>
>>>>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>>>>
>>>>>> VAR
>>>>>> mu := NEW(MUTEX);
>>>>>> free : Pair := NIL;
>>>>>>
>>>>>> PROCEDURE GetPair() : Pair =
>>>>>>   BEGIN
>>>>>>     LOCK mu DO
>>>>>>       IF free # NIL THEN
>>>>>>         TRY
>>>>>>           RETURN free
>>>>>>         FINALLY
>>>>>>           free := free.rest
>>>>>>         END
>>>>>>       END
>>>>>>     END;
>>>>>>     RETURN NEW(Pair)
>>>>>>   END GetPair;
>>>>>>
>>>>>> PROCEDURE ReturnPair(cons : Pair) =
>>>>>>   BEGIN
>>>>>>     cons.first := NIL;
>>>>>>     LOCK mu DO
>>>>>>       cons.rest := free;
>>>>>>       free := cons
>>>>>>     END
>>>>>>   END ReturnPair;
>>>>>>
>>>>>> My eval code looks like
>>>>>>
>>>>>> VAR okToFree : BOOLEAN;
>>>>>> BEGIN
>>>>>>   args := GetPair(); ...
>>>>>>   result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>>>>
>>>>>>   IF okToFree THEN ReturnPair(args) END;
>>>>>>   RETURN result
>>>>>> END
>>>>>>
>>>>>> and this does work well.  In fact, recycling the lists like this
>>>>>> *just* for the evaluation of Scheme primitives speeds up the Linux
>>>>>> implementation by almost 100%.
>>>>>>
>>>>>> But it's still ugly, isn't it?  There's a mutex, and a global
>>>>>> variable.  And yes, the time spent messing with the mutex is
>>>>>> noticeable, and I haven't even made the code multi-threaded yet
>>>>>> (and that is coming!).
>>>>>>
>>>>>> So I'm thinking, what I really want is a structure that is attached
>>>>>> to my current Thread.T.  I want to be able to access just a single
>>>>>> pointer (like the free list) but be sure it is unique to my current
>>>>>> thread.  No locking would be necessary if I could do this.
>>>>>>
>>>>>> Does anyone have an elegant solution that does something like this?
>>>>>> Thread-specific "static" variables?  Just one REFANY would be enough
>>>>>> for a lot of uses...  It seems to me this should be a frequently
>>>>>> occurring problem.
>>>>>>
>>>>>>  Best regards,
>>>>>>    Mika
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>
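One mutex-free alternative that does not depend on any thread-local storage facility (whether the Thread interface offers one is not settled in this thread) is to give each interpreter thread its own pool object and pass it down to the evaluator explicitly. This is a hypothetical sketch: Pool, Get and Put are illustrative names, and it assumes a Pool is never shared between threads, which is what makes the lock unnecessary.

```modula3
MODULE Main;
(* Hypothetical sketch: Pool, Get and Put are illustrative names, not
   from the thread.  Each interpreter thread is assumed to create its own
   Pool and pass it to the evaluator, so the free list is never shared
   and needs no MUTEX. *)
IMPORT IO, Fmt;

TYPE
  Pair = OBJECT first, rest: REFANY END;
  Pool = REF RECORD free: Pair := NIL END;  (* one free list per thread *)

PROCEDURE Get (pool: Pool): Pair =
  VAR p := pool.free;
  BEGIN
    IF p = NIL THEN RETURN NEW(Pair) END;  (* list empty: allocate *)
    pool.free := p.rest;  (* REFANY -> Pair: implicit runtime check *)
    p.rest := NIL;
    RETURN p
  END Get;

PROCEDURE Put (pool: Pool; p: Pair) =
  BEGIN
    p.first := NIL;       (* drop the reference so the GC can reclaim it *)
    p.rest := pool.free;  (* push p onto this thread's free list *)
    pool.free := p
  END Put;

VAR
  pool := NEW(Pool);
  a, b: Pair;
BEGIN
  a := Get(pool);
  Put(pool, a);
  b := Get(pool);         (* should hand back the recycled Pair *)
  IO.Put("recycled: " & Fmt.Bool(a = b) & "\n")
END Main.
```

The pop in Get still performs the implicit REFANY-to-Pair check on p.rest; giving the pool its own Pair-typed link would avoid even that.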


From mika at async.caltech.edu  Tue Oct 21 22:18:07 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 21 Oct 2008 13:18:07 -0700
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: Your message of "Tue, 21 Oct 2008 12:21:36 BST."
	<27E24B62-7D71-43D0-988D-74EAB9E88C81@cs.purdue.edu> 
Message-ID: <200810212018.m9LKI81o019865@camembert.async.caltech.edu>

Hi Tony,

Thanks for helping, as usual!

I ran into this now.  Is this also a bootstrapping problem?  (Moving
on to building libm3, cleared out existing PPC_DARWIN, have rebuilt
m3cc... only see a single version of Compiler.i3 anywhere...)

Here's the log:

[lapdog:~/cm3/m3-libs/libm3] mika% $CM3 && $CM3 -ship
--- building in PPC_DARWIN ---

ignoring ../src/m3overrides

new source -> compiling Atom.i3
new source -> compiling AtomList.i3
new source -> compiling OSError.i3
new source -> compiling File.i3
new source -> compiling RegularFile.i3
new source -> compiling Pipe.i3
new source -> compiling TextSeq.i3
new source -> compiling Pathname.i3
new source -> compiling FS.i3
new source -> compiling Process.i3
new source -> compiling Socket.i3
new source -> compiling Terminal.i3
new source -> compiling FS.m3
new source -> compiling Terminal.m3
new source -> compiling RegularFile.m3
new source -> compiling Pipe.m3
new source -> compiling Socket.m3
new source -> compiling OSConfig.i3
new source -> compiling OSErrorPosix.i3
new source -> compiling Fmt.i3
new source -> compiling OSErrorPosix.m3
new source -> compiling FilePosix.i3
new source -> compiling FilePosix.m3
new source -> compiling FSPosix.m3
new source -> compiling PipePosix.m3
new source -> compiling PathnamePosix.m3
new source -> compiling SocketPosix.m3

Fatal Error: bad version stamps: SocketPosix.m3

version stamp mismatch: Compiler.Platform
  <df3c2b13d1d385ee> => SocketPosix.m3
  <da77490d024222ef> => Compiler.i3  
version stamp mismatch: Compiler.ThisPlatform
  <8b5a6f513e082750> => SocketPosix.m3
  <8e110d4fed998051> => Compiler.i3  
[lapdog:~/cm3/m3-libs/libm3] mika% 

Tony Hosking writes:
>This is a phase ordering problem that arises when you use an old  
>compiler to compile newer sources.  It really should be fixed  
>somehow.  In any case, the problem is those lines in RTCollector at  
>the bottom (I deleted them yesterday on the main trunk) that refer to  
>values supposedly built in to the compiler (which are not there for  
>the old binary you are using).  I think if you delete those lines then  
>you should be OK.  Once you have a new compiler bootstrapped (with  
>those configuration values available built in) then you should be able  
>to compile that code (excepting that I just deleted those lines  
>yesterday).
>
>
>On 21 Oct 2008, at 12:05, Mika Nystrom wrote:
>
>> Hello everyone,
>>
>> Sorry if I have asked this before---I feel I must have, and Tony
>> probably answered it, too, but I can't find it anywhere in my email
>> archives.
>>
>> It looks like I finally upgraded my Mac to Tiger a half year ago,
>> and everything broke.  (Modula-3, emacs, make, etc etc etc etc.)
>> I am finally getting around to fixing it.  Now I am trying to
>> compile CM3 in accordance with Tony's instructions as of June 24, 2007:
>>
>> (short quote here)
>>> cd ~/cm3-cvs
>>> mkdir boot
>>> cd boot
>>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz
>>> ./cminstall
>>
>> Now you will have some kind of cm3 installed, presumably in
>> /usr/local/cm3/bin/cm3.
>>
>> Make sure you have a fresh CVS checkout in directory cm3 (let's
>> assume this is in your home directory ~/cm3).  Also, make sure you
>> have an up-to-date version of the CM3 backend compiler cm3cg
>> installed by executing the following:
>>
>> STEP 0:
>>
>> export CM3=/usr/local/cm3/bin/cm3
>> cd ~/cm3/m3-sys/m3cc
>> $CM3
>> $CM3 -ship
>>
>> You can skip this last step if you know your backend compiler is up
>> to date.
>>
>> Now, let's build the new compiler from scratch (this is the sequence
>> I use regularly to test changes to the run-time system whenever I
>> make them):
>>
>> STEP 1:
>>
>> cd ~/cm3/m3-libs/m3core
>> $CM3
>> $CM3 -ship
>> (end short quote, there's much more)
>>
>> What happens is that when building m3core, my compiler is building
>> it against the interfaces in /usr/local/cm3, NOT the interfaces
>> within m3core itself:
>>
>> --- building in PPC_DARWIN ---
>>
>> ignoring ../src/m3overrides
>>
>> new source -> compiling RTCollector.m3
>> "../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX)
>> "../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX)
>> "../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD)
>> "../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD)
>> 4 errors encountered
>> stale imports -> compiling RTDebug.m3
>>
>> Fatal Error: bad version stamps: RTDebug.m3
>>
>> version stamp mismatch: Compiler.Platform
>>  <df3c2b13d1d385ee> => RTDebug.m3
>>  <da77490d024222ef> => Compiler.i3
>> version stamp mismatch: Compiler.ThisPlatform
>>  <8b5a6f513e082750> => RTDebug.m3
>>  <8e110d4fed998051> => Compiler.i3
>>
>> I feel like I should REALLY know the answer to this, but how do I
>> get the compiler to use only the local sources and not attempt
>> to compile things with reference to the already-installed
>> interfaces?
>>
>>    Mika


From hosking at cs.purdue.edu  Tue Oct 21 23:29:07 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Tue, 21 Oct 2008 22:29:07 +0100
Subject: [M3devel] CM3 on Mac OS X Tiger
In-Reply-To: <200810212018.m9LKI81o019865@camembert.async.caltech.edu>
References: <200810212018.m9LKI81o019865@camembert.async.caltech.edu>
Message-ID: <BF077330-03E9-45CB-8F30-27066330331B@cs.purdue.edu>

Hmm.  Not sure.  Looks like it.

On 21 Oct 2008, at 21:18, Mika Nystrom wrote:

> Hi Tony,
>
> Thanks for helping, as usual!
>
> I ran into this now, is this also a bootstrapping problem?  (Moving
> on to building libm3, cleared out existing PPC_DARWIN, have rebuilt
> m3cc... only see a single version of Compiler.i3 anywhere...)
>
> Here's the log:
>
> [lapdog:~/cm3/m3-libs/libm3] mika% $CM3 && $CM3 -ship
> --- building in PPC_DARWIN ---
>
> ignoring ../src/m3overrides
>
> new source -> compiling Atom.i3
> new source -> compiling AtomList.i3
> new source -> compiling OSError.i3
> new source -> compiling File.i3
> new source -> compiling RegularFile.i3
> new source -> compiling Pipe.i3
> new source -> compiling TextSeq.i3
> new source -> compiling Pathname.i3
> new source -> compiling FS.i3
> new source -> compiling Process.i3
> new source -> compiling Socket.i3
> new source -> compiling Terminal.i3
> new source -> compiling FS.m3
> new source -> compiling Terminal.m3
> new source -> compiling RegularFile.m3
> new source -> compiling Pipe.m3
> new source -> compiling Socket.m3
> new source -> compiling OSConfig.i3
> new source -> compiling OSErrorPosix.i3
> new source -> compiling Fmt.i3
> new source -> compiling OSErrorPosix.m3
> new source -> compiling FilePosix.i3
> new source -> compiling FilePosix.m3
> new source -> compiling FSPosix.m3
> new source -> compiling PipePosix.m3
> new source -> compiling PathnamePosix.m3
> new source -> compiling SocketPosix.m3
>
> Fatal Error: bad version stamps: SocketPosix.m3
>
> version stamp mismatch: Compiler.Platform
>  <df3c2b13d1d385ee> => SocketPosix.m3
>  <da77490d024222ef> => Compiler.i3
> version stamp mismatch: Compiler.ThisPlatform
>  <8b5a6f513e082750> => SocketPosix.m3
>  <8e110d4fed998051> => Compiler.i3
> [lapdog:~/cm3/m3-libs/libm3] mika%
>
> Tony Hosking writes:
>> This is a phase ordering problem that arises when you use an old
>> compiler to compile newer sources.  It really should be fixed
>> somehow.  In any case, the problem is those lines in RTCollector at
>> the bottom (I deleted them yesterday on the main trunk) that refer to
>> values supposedly built in to the compiler (which are not there for
>> the old binary you are using).  I think if you delete those lines then
>> you should be OK.  Once you have a new compiler bootstrapped (with
>> those configuration values available built in) then you should be able
>> to compile that code (excepting that I just deleted those lines
>> yesterday).
>>
>>
>> On 21 Oct 2008, at 12:05, Mika Nystrom wrote:
>>
>>> Hello everyone,
>>>
>>> Sorry if I have asked this before---I feel I must have, and Tony
>>> probably answered it, too, but I can't find it anywhere in my email
>>> archives.
>>>
>>> It looks like I finally upgraded my Mac to Tiger a half year ago,
>>> and everything broke.  (Modula-3, emacs, make, etc etc etc etc.)
>>> I am finally getting around to fixing it.  Now I am trying to
>>> compile CM3 in accordance with Tony's instructions as of June 24, 2007:
>>>
>>> (short quote here)
>>>> cd ~/cm3-cvs
>>>> mkdir boot
>>>> cd boot
>>>> tar xzvf ../cm3-min-POSIX-FreeBSD4-d5.3.1-2005-10-05.tgz
>>>> ./cminstall
>>>
>>> Now you will have some kind of cm3 installed, presumably in
>>> /usr/local/cm3/bin/cm3.
>>>
>>> Make sure you have a fresh CVS checkout in directory cm3 (let's
>>> assume this is in your home directory ~/cm3).  Also, make sure you
>>> have an up-to-date version of the CM3 backend compiler cm3cg
>>> installed by executing the following:
>>>
>>> STEP 0:
>>>
>>> export CM3=/usr/local/cm3/bin/cm3
>>> cd ~/cm3/m3-sys/m3cc
>>> $CM3
>>> $CM3 -ship
>>>
>>> You can skip this last step if you know your backend compiler is up
>>> to date.
>>>
>>> Now, let's build the new compiler from scratch (this is the sequence
>>> I use regularly to test changes to the run-time system whenever I
>>> make them):
>>>
>>> STEP 1:
>>>
>>> cd ~/cm3/m3-libs/m3core
>>> $CM3
>>> $CM3 -ship
>>> (end short quote, there's much more)
>>>
>>> What happens is that when building m3core, my compiler is building
>>> it against the interfaces in /usr/local/cm3, NOT the interfaces
>>> within m3core itself:
>>>
>>> --- building in PPC_DARWIN ---
>>>
>>> ignoring ../src/m3overrides
>>>
>>> new source -> compiling RTCollector.m3
>>> "../src/runtime/common/RTCollector.m3", line 2914: unknown qualification '.' (AMD64_LINUX)
>>> "../src/runtime/common/RTCollector.m3", line 2915: unknown qualification '.' (SPARC32_LINUX)
>>> "../src/runtime/common/RTCollector.m3", line 2916: unknown qualification '.' (SPARC64_OPENBSD)
>>> "../src/runtime/common/RTCollector.m3", line 2917: unknown qualification '.' (PPC32_OPENBSD)
>>> 4 errors encountered
>>> stale imports -> compiling RTDebug.m3
>>>
>>> Fatal Error: bad version stamps: RTDebug.m3
>>>
>>> version stamp mismatch: Compiler.Platform
>>> <df3c2b13d1d385ee> => RTDebug.m3
>>> <da77490d024222ef> => Compiler.i3
>>> version stamp mismatch: Compiler.ThisPlatform
>>> <8b5a6f513e082750> => RTDebug.m3
>>> <8e110d4fed998051> => Compiler.i3
>>>
>>> I feel like I should REALLY know the answer to this, but how do I
>>> get the compiler to use only the local sources and not attempt
>>> to compile things with reference to the already-installed
>>> interfaces?
>>>
>>>   Mika


From mika at async.caltech.edu  Thu Oct 23 10:24:53 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 23 Oct 2008 01:24:53 -0700
Subject: [M3devel] NEW in RTType.m3
Message-ID: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>

Hello Modula-3 people,

Does anyone know whether there is anything that prevents using NEW
in RTType.m3?

I added a lot of memory recycling to the Scheme interpreter I am
working on, and now it seems it is spending a lot of time in Typecase
and IsSubtype.  I was wondering if it is possible to memoize IsSubtype
inside RTType.m3...  (specifically just replacing IsSubtype with an
array lookup).  

It is the nature of the interpreter that it spends a lot of time
checking types and narrowing things back and forth, as Scheme and
Modula-3 references share the same representation.

      Mika


From hosking at cs.purdue.edu  Thu Oct 23 12:10:01 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Thu, 23 Oct 2008 11:10:01 +0100
Subject: [M3devel] NEW in RTType.m3
In-Reply-To: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>
References: <200810230825.m9N8OrAl067794@camembert.async.caltech.edu>
Message-ID: <7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu>

Could be dangerous depending on module link orderings.  Might be  
better to cache your own lookups in your interpreter.

On 23 Oct 2008, at 09:24, Mika Nystrom wrote:

> Hello Modula-3 people,
>
> Does anyone know whether there is anything that prevents using NEW
> in RTType.m3?
>
> I added a lot of memory recycling to the Scheme interpreter I am
> working on, and now it seems it is spending a lot of time in Typecase
> and IsSubtype.  I was wondering if it is possible to memoize IsSubtype
> inside RTType.m3...  (specifically just replacing IsSubtype with an
> array lookup).
>
> It is the nature of the interpreter that it spends a lot of time
> checking types and narrowing things back and forth, as Scheme and
> Modula-3 references share the same representation.
>
>      Mika


From mika at async.caltech.edu  Thu Oct 23 19:29:50 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 23 Oct 2008 10:29:50 -0700
Subject: [M3devel] NEW in RTType.m3
In-Reply-To: Your message of "Thu, 23 Oct 2008 11:10:01 BST."
	<7E3C53E3-9863-4377-802C-D71560ACD6F0@cs.purdue.edu> 
Message-ID: <200810231729.m9NHToMC080136@camembert.async.caltech.edu>


Well I'm not calling Typecase and IsSubtype directly---the compiler
is inserting the calls.

Here's an example of my code:

  IF x # NIL AND ISTYPE(x, Symbol) THEN
    RETURN env.lookup(x)
  ELSIF x = NIL OR NOT ISTYPE(x, Pair) THEN
    RETURN x
  ELSE

this code actually winds up in here (RTType.m3):

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t: RT0.TypeDefn;
  BEGIN
    IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
    t := Get (a);
    IF (t = NIL) THEN RETURN FALSE; END;
    IF (t.typecode = b) THEN RETURN TRUE END;
    WHILE (t.kind = ORD (TK.Obj)) DO
      IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
      t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
      IF (t = NIL) THEN RETURN FALSE; END;
      IF (t.typecode = b) THEN RETURN TRUE; END;
    END;
    IF (t.traced # 0)
      THEN RETURN (b = RT0.RefanyTypecode);
      ELSE RETURN (b = RT0.AddressTypecode);
    END;
  END IsSubtype;

Again this is an example of something where the CM3 code seems to
be hurting more than PM3, but it could be that for some reason I
have more visibility into the CM3 code, or that there's an optimization
difference (I haven't been able to investigate this fully yet).  In
any case, it's clear that if IsSubtype could be replaced with a
table lookup, this kind of code could potentially be accelerated
considerably.

Note that while in the above example the code might be accelerated
by (in my opinion, less clear) use of TYPECODE (since I never subtype
Symbol or Pair---for now!), this is not so for some NARROWs.  The
NARROWs also wind up calling RTType.IsSubtype, and they arise because
I have types that depend on each other, and unless I want to introduce
extra complexity (new partial revelations) or stick everything in
the same interface, I am forced to NARROW something to avoid a
circular dependency of interfaces.  A method of A.T takes a B.T
and a method of B.T takes an A.T, so I make a supertype X.T such that
A.T <: X.T; then I can declare B.T.m to take an X.T and NARROW it
to A.T within B.T.m, triggering a call to the above code.  (For
simplicity's sake, X.T could be REFANY or ROOT.)
declare B.T.m as taking A.T would lead to a circular dependency
between A and B.  The code is really rather simple and it's a shame
if you have to make it look much more complicated to avoid issues
like these which might equally well be solved by tweaking the runtime
implementation a bit.

     Mika

Tony Hosking writes:
>Could be dangerous depending on module link orderings.  Might be  
>better to cache your own lookups in your interpreter.
>
>On 23 Oct 2008, at 09:24, Mika Nystrom wrote:
>
>> Hello Modula-3 people,
>>
>> Does anyone know whether there is anything that prevents using NEW
>> in RTType.m3?
>>
>> I added a lot of memory recycling to the Scheme interpreter I am
>> working on, and now it seems it is spending a lot of time in Typecase
>> and IsSubtype.  I was wondering if it is possible to memoize IsSubtype
>> inside RTType.m3...  (specifically just replacing IsSubtype with an
>> array lookup).
>>
>> It is the nature of the interpreter that it spends a lot of time
>> checking types and narrowing things back and forth, as Scheme and
>> Modula-3 references share the same representation.
>>
>>      Mika
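Tony's suggestion of caching lookups in the interpreter can be sketched as a per-typecode memo for one subtype test. For a non-NIL reference, ISTYPE(x, T) depends only on TYPECODE(x), so the answer can be recorded the first time a typecode is seen and read back from an array afterwards. IsSymbol, Answer and symCache are hypothetical names, and the cache as written is not thread-safe, because Grow replaces the array; a multi-threaded interpreter would want one cache per thread.

```modula3
MODULE Main;
(* Hypothetical sketch: memoize ISTYPE(x, Symbol) by TYPECODE(x) inside
   the interpreter instead of changing RTType.  Not thread-safe. *)
IMPORT IO, Fmt;

TYPE
  Symbol = OBJECT name: TEXT END;
  Pair   = OBJECT first, rest: REFANY END;
  Answer = {Unknown, No, Yes};

VAR symCache := NewCache(16);   (* indexed by TYPECODE(x) *)

PROCEDURE NewCache (n: CARDINAL): REF ARRAY OF Answer =
  VAR a := NEW(REF ARRAY OF Answer, n);
  BEGIN
    FOR i := 0 TO n - 1 DO a[i] := Answer.Unknown END;  (* explicit init *)
    RETURN a
  END NewCache;

PROCEDURE Grow (tc: CARDINAL) =
  VAR bigger := NewCache(MAX(2 * NUMBER(symCache^), tc + 1));
  BEGIN
    SUBARRAY(bigger^, 0, NUMBER(symCache^)) := symCache^;  (* keep answers *)
    symCache := bigger
  END Grow;

PROCEDURE IsSymbol (x: REFANY): BOOLEAN =
  VAR tc: CARDINAL;
  BEGIN
    IF x = NIL THEN RETURN FALSE END;
    tc := TYPECODE(x);
    IF tc > LAST(symCache^) THEN Grow(tc) END;
    IF symCache[tc] = Answer.Unknown THEN
      (* first time this typecode is seen: run the real test once *)
      IF ISTYPE(x, Symbol)
        THEN symCache[tc] := Answer.Yes
        ELSE symCache[tc] := Answer.No
      END
    END;
    RETURN symCache[tc] = Answer.Yes
  END IsSymbol;

BEGIN
  IO.Put(Fmt.Bool(IsSymbol(NEW(Symbol, name := "x"))) & " " &
         Fmt.Bool(IsSymbol(NEW(Pair))) & " " &
         Fmt.Bool(IsSymbol(NIL)) & "\n")
END Main.
```

Each distinct typecode pays for one real subtype test; every later call for that typecode is an array read.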


From mika at async.caltech.edu  Sat Oct 25 05:16:56 2008
From: mika at async.caltech.edu (Mika Nystrom)
Date: Fri, 24 Oct 2008 20:16:56 -0700
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
Message-ID: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>


Dear Modula-3 people,

I had a crash in my program from a range error that I believe
shouldn't have happened the way it did, although it's not in my
code, so I'm not sure if there's a reason for the way it's done (matching
a C declaration somewhere, maybe??).

Here it is, from ThreadPosix.m3:

PROCEDURE IOWait(fd: INTEGER; read: BOOLEAN;
                  timeoutInterval: LONGREAL := -1.0D0): WaitResult =
  <*FATAL Alerted*>
  BEGIN
    self.alertable := FALSE;
    RETURN XIOWait(fd, read, timeoutInterval);
  END IOWait;

PROCEDURE IOAlertWait(fd: INTEGER; read: BOOLEAN;
                  timeoutInterval: LONGREAL := -1.0D0): WaitResult
                  RAISES {Alerted} =
  BEGIN
    self.alertable := TRUE;
    RETURN XIOWait(fd, read, timeoutInterval);
  END IOAlertWait;

PROCEDURE XIOWait (fd: CARDINAL; read: BOOLEAN; interval: LONGREAL): WaitResult
    RAISES {Alerted} =
  VAR res: INTEGER;
      fdindex := fd DIV FDSetSize;
      fdset := FDSet{fd MOD FDSetSize};
... rest omitted ...

Note that IOWait calls XIOWait.  IOWait is declared as taking an
INTEGER, but XIOWait takes a CARDINAL.

So I really should use a CARDINAL in passing to IOWait, but since
IOWait is the interface function it's not clear that I should do
that (until my program crashes after passing -1 from some carelessly
wrapped C code).  I don't like the fact that I get a range error
*inside* the library when it appears unnecessary---it should have
happened in my code, as I make the call.

Suggested improvement: declare all the FDs in SchedulerPosix.i3
(the interface that exports these routines) to be CARDINAL instead
of INTEGER.

     Mika
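A small illustration of the suggested change, under stated assumptions: Wait is a hypothetical stand-in for an interface procedure such as IOWait, declared with fd: CARDINAL as proposed. With that signature the range check belongs to binding the actual argument to the formal, so a negative descriptor would be reported at the client's call rather than deep inside the library. SafeWait shows a caller that wraps C code validating the descriptor itself before calling.

```modula3
MODULE Main;
(* Illustrative only: Wait and SafeWait are hypothetical names. *)
IMPORT IO, Fmt;

PROCEDURE Wait (fd: CARDINAL): TEXT =
  (* CARDINAL in the signature: negative values never get inside *)
  BEGIN
    RETURN "would wait on fd " & Fmt.Int(fd)
  END Wait;

PROCEDURE SafeWait (fd: INTEGER): TEXT =
  (* a caller holding an INTEGER from C code checks it explicitly *)
  BEGIN
    IF fd < 0 THEN RETURN "invalid fd " & Fmt.Int(fd) END;
    RETURN Wait(fd)
  END SafeWait;

BEGIN
  IO.Put(SafeWait(3) & "\n");
  IO.Put(SafeWait(-1) & "\n")
END Main.
```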


From hosking at cs.purdue.edu  Mon Oct 27 15:28:52 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Mon, 27 Oct 2008 14:28:52 +0000
Subject: [M3devel] Unnecessary(?) range confusion in ThreadPosix.m3
In-Reply-To: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
References: <200810250317.m9P3GuVA025509@camembert.async.caltech.edu>
Message-ID: <5232F2E4-3B0E-49E5-B1C8-BB4D04C60C33@cs.purdue.edu>

Sounds fair to me.

On 25 Oct 2008, at 04:16, Mika Nystrom wrote:

>
> Dear Modula-3 people,
>
> I had a crash in my program from a range error that I believe
> shouldn't have happened the way it did, although it's not in my
> code, so I'm not sure if there's a reason for the way it's done
> (matching a C declaration somewhere, maybe??).
>
> Here it is, from ThreadPosix.m3:
>
> PROCEDURE IOWait(fd: INTEGER; read: BOOLEAN;
>                  timeoutInterval: LONGREAL := -1.0D0): WaitResult =
>  <*FATAL Alerted*>
>  BEGIN
>    self.alertable := FALSE;
>    RETURN XIOWait(fd, read, timeoutInterval);
>  END IOWait;
>
> PROCEDURE IOAlertWait(fd: INTEGER; read: BOOLEAN;
>                  timeoutInterval: LONGREAL := -1.0D0): WaitResult
>                  RAISES {Alerted} =
>  BEGIN
>    self.alertable := TRUE;
>    RETURN XIOWait(fd, read, timeoutInterval);
>  END IOAlertWait;
>
> PROCEDURE XIOWait (fd: CARDINAL; read: BOOLEAN; interval: LONGREAL): WaitResult
>    RAISES {Alerted} =
>  VAR res: INTEGER;
>      fdindex := fd DIV FDSetSize;
>      fdset := FDSet{fd MOD FDSetSize};
> ... rest omitted ...
>
> Note that IOWait calls XIOWait.  IOWait is declared as taking an
> INTEGER, but XIOWait takes a CARDINAL.
>
> So I really should use a CARDINAL in passing to IOWait, but since
> IOWait is the interface function it's not clear that I should do
> that (until my program crashes after passing -1 from some carelessly
> wrapped C code).  I don't like the fact that I get a range error
> *inside* the library when it appears unnecessary---it should have
> happened in my code, as I make the call.
>
> Suggested improvement: declare all the FDs in SchedulerPosix.i3
> (the interface that exports these routines) to be CARDINAL instead
> of INTEGER.
>
>     Mika


From jay.krell at cornell.edu  Thu Oct 30 22:21:09 2008
From: jay.krell at cornell.edu (Jay)
Date: Thu, 30 Oct 2008 21:21:09 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
Message-ID: <COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>


Please try this:

 http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2

std failed to build because stubgen crashed, probably due to gc.
cm3 does crash right away without @M3nogc.

Something like this:
    cd /src 
    wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2  
    cd /cm3  
    rm -rf *  
    tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2  
    cd /src/cm3/scripts/python  
    ./do-cm3-all.py realclean  
    ./upgrade.py  
    ./do-cm3-all.py realclean  
    ./do-cm3-std.py buildship  
    => it will fail at zeus, but it should get far. You'll also need some X development packages to get that far; I had a failure for lack of libXaw, for example. I did not run anything, including any of the GUI packages, but building itself with itself is a decent test.

I renamed the old AMD64_LINUX archives to "1.0.0".
 http://www.opencm3.com/uploaded-archives/

This has the bug fix I committed last night to cm3cg, and therefore a 64-bit-hosted cm3cg.

jay at amd64a:/cm3/bin$ file *
AMD64_LINUX: ASCII text
cm3:         ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
cm3.cfg:     ASCII English text
cm3cg:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
m3bundle:    ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
mklib:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
Unix.common: ASCII English text

Built on Debian 4.0r4 (r5 is out).
jay at amd64a:/cm3/bin$ uname -a
Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux
jay at amd64a:/cm3/bin$ dmesg | head
Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Aug 19 04:30:56 UTC 2008

Though really, I couldn't do it without Visual C++ on Windows: its find-in-files and editing are excellent and nothing else comes close. I edit on Windows and scp the files over. :)

 - Jay

________________________________

From: jay.krell at cornell.edu
To: dragisha at m3w.org; m3devel at elegosoft.com
Date: Tue, 9 Sep 2008 09:43:03 +0000
Subject: Re: [M3devel] AMD64_LINUX status


From hosking at cs.purdue.edu  Fri Oct 31 11:19:51 2008
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 31 Oct 2008 10:19:51 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
Message-ID: <BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>

Umm, I think I found your bug with GC:

Check out "RTMachine.PointerAlignment".  You have it set to  
BITSIZE(INTEGER).  I suspect what you want is something like  
BYTESIZE(ADDRESS).  Also, "RTMachine.StackFrameAlignment" should  
probably be 2*BYTESIZE(ADDRESS).
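The size of the mistake is easy to quantify. Assuming, as the suggestion implies, that RTMachine.PointerAlignment is a byte count giving the stride at which the collector scans thread stacks for possible pointers, BITSIZE(INTEGER) = 64 on AMD64 makes it examine only one 8-byte slot in every eight, so pointers held only in the other slots are missed and still-reachable objects can be freed. The program below just prints the relevant values; the constant names and the 64-byte window are illustrative, not RTMachine's. The 16-byte StackFrameAlignment matches the 16-byte stack alignment required by the x86-64 System V ABI.

```modula3
MODULE Main;
(* Sketch comparing the two PointerAlignment settings on a 64-bit
   target.  Names below are illustrative, not RTMachine's. *)
IMPORT IO, Fmt;

CONST
  OldAlignment   = BITSIZE(INTEGER);        (* a bit count: 64 on AMD64 *)
  NewAlignment   = BYTESIZE(ADDRESS);       (* a byte count: 8 on AMD64 *)
  FrameAlignment = 2 * BYTESIZE(ADDRESS);   (* 16 on AMD64 *)
  Window         = 64;                      (* bytes of stack, for comparison *)

BEGIN
  IO.Put("BITSIZE(INTEGER)    = " & Fmt.Int(OldAlignment) & "\n");
  IO.Put("BYTESIZE(ADDRESS)   = " & Fmt.Int(NewAlignment) & "\n");
  IO.Put("StackFrameAlignment = " & Fmt.Int(FrameAlignment) & "\n");
  (* candidate pointer slots examined in a 64-byte stretch of stack *)
  IO.Put("slots checked, old: " & Fmt.Int(Window DIV OldAlignment) &
         ", new: " & Fmt.Int(Window DIV NewAlignment) & "\n")
END Main.
```

Running the same program on each target is one way to confirm the values the compiler actually uses rather than guessing them.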


On 30 Oct 2008, at 21:21, Jay wrote:

>
> Please try this:
>
> http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
>
> std failed to build because stubgen crashed, probably due to gc.
> cm3 does crash right away without @M3nogc.
>
> Something like this:
>    cd /src
>    wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
>    cd /cm3
>    rm -rf *
>    tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
>    cd /src/cm3/scripts/python
>    ./do-cm3-all.py realclean
>    ./upgrade.py
>    ./do-cm3-all.py realclean
>    ./do-cm3-std.py buildship
>    => it will fail, at zeus, but it should get far; you'll also need  
> some X devel packages to get that far, I had a failure for lack of  
> libXaw for example. I did not run anything, any of the GUI packages,  
> but building itself with itself is a decent test.
>
> I renamed the old AMD64_LINUX archives to "1.0.0".
> http://www.opencm3.com/uploaded-archives/
>
> This has the bug fix I commited last night to cm3cg, and therefore a  
> 64 bit hosted cm3cg.
>
> jay at amd64a:/cm3/bin$ file *
> AMD64_LINUX: ASCII text
> cm3:         ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> cm3.cfg:     ASCII English text
> cm3cg:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> m3bundle:    ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> mklib:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
> Unix.common: ASCII English text
>
> Built on Debian 4.0r4 (r5 is out).
> jay at amd64a:/cm3/bin$ uname -a
> Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux
> jay at amd64a:/cm3/bin$ dmesg | head
> Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
> Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Tue Aug 19 04:30:56 UTC 2008
>
> Though really I couldn't do it without Visual C++ on Windows  
> providing excellent find-in-files and editing, nothing else comes  
> close, I edit on Windows and scp the files over. :)
>
> - Jay
>
> ________________________________
>
> From: jay.krell at cornell.edu
> To: dragisha at m3w.org; m3devel at elegosoft.com
> Date: Tue, 9 Sep 2008 09:43:03 +0000
> Subject: Re: [M3devel] AMD64_LINUX status
>
>
>
>


From jay.krell at cornell.edu  Fri Oct 31 14:52:43 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 13:52:43 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl> 
	<BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
Message-ID: <COL101-W42D125019F4B7531A26BA6E6200@phx.gbl>


Tony, Excellent, thanks, that helps.
How do you know and confirm the right values? I don't like guessing.
 
And then cause then of :) :
 
  Symbol
  Pickling font metrics...Done.
  /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
  /cm3/bin/stubgen -v1 -sno RemoteView.T   -T.M3IMPTAB
  stubgen: Processing RemoteView.T
  ***
  *** runtime error:
  ***    NEW() was unable to allocate more memory.
  ***    file "../src/runtime/common/RTAllocator.m3", line 285
  ***
  "/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T   -T.M3IMPTAB

  --procedure--  -line-  -file---
  exec               --  <builtin>
  _v_netobj          37  /cm3/pkg/netobj/src/netobj.tmpl
  netobjv1           44  /cm3/pkg/netobj/src/netobj.tmpl
  netobj             64  /cm3/pkg/netobj/src/netobj.tmpl
  include_dir        71  /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
                      8  /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args
 
 
I should debug it, and double check that I upgraded what had to be upgraded.
 
 - Jay
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://m3lists.elegosoft.com/pipermail/m3devel/attachments/20081031/dfecf655/attachment-0002.html>

From jay.krell at cornell.edu  Fri Oct 31 15:25:13 2008
From: jay.krell at cornell.edu (Jay)
Date: Fri, 31 Oct 2008 14:25:13 +0000
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <1225462205.14482.60.camel@faramir.m3w.org>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
	<BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
	<COL101-W42D125019F4B7531A26BA6E6200@phx.gbl> 
	<1225462205.14482.60.camel@faramir.m3w.org>
Message-ID: <COL101-W728C265A8AF283199F0034E6200@phx.gbl>


It seems like there's still a problem. I haven't debugged it yet.
(I'm sure glad Tony found the other problem before I debugged it.)
I updated http://www.opencm3.com/uploaded-archives with Tony's fix.
The older builds are now 0.0.0.1 and 0.0.0.2.
 
 - Jay
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://m3lists.elegosoft.com/pipermail/m3devel/attachments/20081031/8799c470/attachment-0002.html>

From dragisha at m3w.org  Fri Oct 31 15:10:05 2008
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 31 Oct 2008 15:10:05 +0100
Subject: [M3devel] AMD64_LINUX status
In-Reply-To: <COL101-W42D125019F4B7531A26BA6E6200@phx.gbl>
References: <1220941880.9421.11.camel@faramir.m3w.org>
	<COL101-W284C9BF1498474699AEEDEE6540@phx.gbl>
	<COL101-W8B44DA93BF9D7B88DED08E6210@phx.gbl>
	<BCC1A05D-2863-45E5-8596-D64CF8D01A86@cs.purdue.edu>
	<COL101-W42D125019F4B7531A26BA6E6200@phx.gbl>
Message-ID: <1225462205.14482.60.camel@faramir.m3w.org>

So, we now have fully functional AMD64_LINUX (_with_ GC)?

TIA

On Fri, 2008-10-31 at 13:52 +0000, Jay wrote:
> Tony, Excellent, thanks, that helps.
> How do you know and confirm the right values? I don't like guessing.
>  
> And then cause then of :) :
>  
>   Symbol
> Pickling font metrics...
> Done.
> /cm3/bin/m3bundle -name JunoBundle -F/tmp/qk
> /cm3/bin/stubgen -v1 -sno RemoteView.T   -T.M3IMPTAB
> stubgen: Processing RemoteView.T
> 
> ***
> *** runtime error:
> ***    NEW() was unable to allocate more memory.
> ***    file "../src/runtime/common/RTAllocator.m3", line 285
> ***
> "/cm3/pkg/netobj/src/netobj.tmpl", line 37: quake runtime error: exit 1536: /cm3/bin/stubgen -v1 -sno RemoteView.T   -T.M3IMPTAB
> --procedure--  -line-  -file---
> exec               --  <builtin>
> _v_netobj          37  /cm3/pkg/netobj/src/netobj.tmpl
> netobjv1           44  /cm3/pkg/netobj/src/netobj.tmpl
> netobj             64  /cm3/pkg/netobj/src/netobj.tmpl
> include_dir        71  /dev2/cm3/m3-ui/juno-2/juno-app/src/m3makefile
>                     8  /dev2/cm3/m3-ui/juno-2/juno-app/AMD64_LINUX/m3make.args
>  
>  
> I should debug it, and double check that I upgraded what had to be
> upgraded.
>  
>  - Jay
> 
> 
> 
> > From: hosking at cs.purdue.edu
> > To: jay.krell at cornell.edu
> > Date: Fri, 31 Oct 2008 10:19:51 +0000
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] AMD64_LINUX status
> > 
> > Umm, I think I found your bug with GC:
> > 
> > Check out "RTMachine.PointerAlignment". You have it set to 
> > BITSIZE(INTEGER). I suspect what you want is something like 
> > BYTESIZE(ADDRESS). Also, "RTMachine.StackFrameAlignment" should 
> > probably be 2*BYTESIZE(ADDRESS).
> > 
> > 
> > 
> > On 30 Oct 2008, at 21:21, Jay wrote:
> > 
> > >
> > > Please try this:
> > >
> > >
> > > http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > >
> > > std failed to build because stubgen crashed, probably due to gc.
> > > cm3 does crash right away without @M3nogc.
> > >
> > > Something like this:
> > > cd /src
> > > wget http://www.opencm3.com/uploaded-archives/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > cd /cm3
> > > rm -rf *
> > > tar --strip-components=1 -xf /src/cm3-min-POSIX-AMD64_LINUX-d5.7.0.tar.bz2
> > > cd /src/cm3/scripts/python
> > > ./do-cm3-all.py realclean
> > > ./upgrade.py
> > > ./do-cm3-all.py realclean
> > > ./do-cm3-std.py buildship
> > > => it will fail, at zeus, but it should get far; you'll also need
> > > some X devel packages to get that far, I had a failure for lack of
> > > libXaw for example. I did not run anything, any of the GUI packages,
> > > but building itself with itself is a decent test.
> > >
> > > I renamed the old AMD64_LINUX archives to "1.0.0".
> > > http://www.opencm3.com/uploaded-archives/
> > >
> > > This has the bug fix I committed last night to cm3cg, and therefore a
> > > 64 bit hosted cm3cg.
> > >
> > > jay at amd64a:/cm3/bin$ file *
> > > AMD64_LINUX: ASCII text
> > > cm3: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > for GNU/Linux 2.6.0, not stripped
> > > cm3.cfg: ASCII English text
> > > cm3cg: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > for GNU/Linux 2.6.0, not stripped
> > > m3bundle: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > for GNU/Linux 2.6.0, not stripped
> > > mklib: ELF 64-bit LSB executable, AMD x86-64, version 1
> > > (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs),
> > > for GNU/Linux 2.6.0, not stripped
> > > Unix.common: ASCII English text
> > >
> > > Built on Debian 4.0r4 (r5 is out).
> > > jay at amd64a:/cm3/bin$ uname -a
> > > Linux amd64a 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 
> > > x86_64 GNU/Linux
> > > jay at amd64a:/cm3/bin$ dmesg | head
> > > Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=805)
> > > Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-22etch2) (dannf at debian.org) (
> > > gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP 
> > > Tue Aug 19 04:30:56 UTC 2008
> > >
> > > Though really I couldn't do it without Visual C++ on Windows 
> > > providing excellent find-in-files and editing, nothing else comes 
> > > close, I edit on Windows and scp the files over. :)
> > >
> > > - Jay
> > >
> > > ________________________________
> > >
> > > From: jay.krell at cornell.edu
> > > To: dragisha at m3w.org; m3devel at elegosoft.com
> > > Date: Tue, 9 Sep 2008 09:43:03 +0000
> > > Subject: Re: [M3devel] AMD64_LINUX status
> > >
> > >
> > >
> > >
> > 
> 
-- 
Dragiša Durić <dragisha at m3w.org>