<html>
<head>
<style>
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Verdana
}
</style>
</head>
<body class='hmmessage'>
My point about optimized thread locals was about C and C++.<BR>
(And not gcc, at that.)<BR>
Modula-3 on NT works the same way as on "all" other platforms, except, I guess, Solaris/SPARC32.<BR>
There is no stack walker.<BR>
PushFrame/PopFrame are highly inefficient because they go through general thread locals (pthread_getspecific/pthread_setspecific on pthreads, TlsGetValue/TlsSetValue on Win32); pthreads and Win32 are very analogous here.<BR>
So far Modula-3 on NT exists only for x86.<BR>
I'm not sure gcc for NT/amd64 is mature enough, and I haven't seen any signs of gcc for NT/IA64 (nor Alpha/PPC/MIPS...).<BR>
gcc on NT/x86 has two other exception handling (EH) mechanisms of its own: setjmp/longjmp and, I presume, a stack-walking implementation.<BR>
Most other compilers are like Visual C++ -- e.g. OpenWatcom and DigitalMars.<BR>
CodeWarrior had two settings; I think one of them matched Visual C++.<BR>
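<BR>
To make the cost of the general-thread-local approach concrete, here is a minimal sketch on pthreads. The Frame struct and the names push_frame, pop_frame, top_frame and frame_depth are made up for illustration and are not the actual Modula-3 runtime code; on Win32 the same places would call TlsGetValue/TlsSetValue.<BR>
<pre>
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical handler frame; the real runtime's layout differs. */
typedef struct Frame {
    struct Frame *next;   /* enclosing handler frame, or NULL */
    int           tag;    /* illustrative payload only */
} Frame;

static pthread_key_t  handlers_key;
static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;

/* Create the thread-local key exactly once per process. */
static void make_key(void) {
    int rc = pthread_key_create(&handlers_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Innermost handler frame of the calling thread, or NULL. */
Frame *top_frame(void) {
    pthread_once(&handlers_once, make_key);
    return pthread_getspecific(handlers_key);
}

/* Entering a "try": one getspecific plus one setspecific. */
void push_frame(Frame *f) {
    f->next = top_frame();
    pthread_setspecific(handlers_key, f);
}

/* Leaving a "try": one setspecific. */
void pop_frame(Frame *f) {
    pthread_setspecific(handlers_key, f->next);
}

/* Number of handler frames currently pushed on the calling thread. */
int frame_depth(void) {
    int n = 0;
    for (Frame *f = top_frame(); f != NULL; f = f->next)
        n++;
    return n;
}
```
</pre>
Every "try" pays for those library calls even when nothing is ever raised, which is the overhead a stack walker avoids.<BR>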
<BR>
There should really be just one implementation of this across all languages.<BR>
NT's setjmp/longjmp do at least interoperate with exceptions, so you can use a portable slow form and still interoperate. Well, there are two versions: one that does and one that doesn't. I need to switch Modula-3 to the interoperable form.<BR>
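<BR>
For reference, the "portable slow form" looks roughly like the sketch below: a per-thread chain of jmp_buf frames, with raise implemented as longjmp to the innermost one. All names here (TryFrame, try_top, raise_code, guarded) are hypothetical, and it assumes a compiler that supports C11 _Thread_local. It shows only the plain setjmp/longjmp mechanism; whether it interoperates with NT exceptions depends on which setjmp variant the toolchain provides, and this code does not demonstrate that.<BR>
<pre>
```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handler frame for a setjmp/longjmp based "try". */
typedef struct TryFrame {
    struct TryFrame *next;   /* enclosing handler, or NULL */
    jmp_buf          env;    /* where to resume on raise */
} TryFrame;

/* Per-thread handler chain and the code of the exception in flight. */
_Thread_local TryFrame *try_top = NULL;
_Thread_local int raised_code = 0;

/* Raise: pop the innermost handler and jump to it. */
void raise_code(int code) {
    TryFrame *f = try_top;
    if (f == NULL)
        abort();             /* no handler installed */
    try_top = f->next;
    raised_code = code;      /* kept outside the frame on purpose */
    longjmp(f->env, 1);
}

/* Illustrative callee that raises 42 on negative input. */
int might_fail(int x) {
    if (x < 0)
        raise_code(42);
    return x * 2;
}

/* Returns might_fail(x), or minus the raised code if it raised. */
int guarded(int x) {
    TryFrame f;
    int result;

    f.next = try_top;        /* push: the cost paid on every "try" */
    try_top = &f;
    if (setjmp(f.env) == 0) {
        result = might_fail(x);
        try_top = f.next;    /* normal exit: pop */
    } else {
        result = -raised_code;
    }
    return result;
}
```
</pre>
With these definitions guarded(5) returns 10 and guarded(-1) returns -42. The raised code is stored in a thread-local rather than in the frame because automatic objects changed between setjmp and longjmp have indeterminate values unless they are volatile.<BR>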
<BR>
- Jay<BR> <BR>> CC: m3devel@elegosoft.com<BR>> From: hosking@cs.purdue.edu<BR>> To: jay.krell@cornell.edu<BR>> Subject: Re: [M3devel] per thread data?<BR>> Date: Tue, 31 Mar 2009 09:42:06 +1100<BR>> <BR>> Yes, this is a tricky issue. At some point I seem to recall it being <BR>> OK to have non-Modula-3 threads start running Modula-3 code, but I <BR>> don't know for sure. I've never really liked the idea of having non- <BR>> M3 threads.<BR>> <BR>> Are you using the existing handler maps and exception stack unwinding <BR>> support for non-x86 NT?<BR>> <BR>> On 31 Mar 2009, at 09:15, Jay wrote:<BR>> <BR>> ><BR>> > hm, thinking about this more...<BR>> > What about threads not created by Modula-3 Fork() (or the first <BR>> > thread)?<BR>> ><BR>> > It looks like exception handling had a chance of working on them<BR>> > before. Now they'll crash upon entering functions<BR>> > with try or raise or I presume lock.<BR>> ><BR>> ><BR>> > 1) ok?<BR>> ><BR>> ><BR>> > 2) do the heap alloc on demand?<BR>> > But is that enough? Can it be initialized without further context?<BR>> > Let's see..the circular list can be maintained without further <BR>> > context.<BR>> > handle := pthread_self, ok. 
stack can probably be figured out, though<BR>> > that is probably just for gc and could be left alone for now, <BR>> > continuing<BR>> > to not work (or fixed)...getcontext at least on some platforms can<BR>> > fill this in, or VirtualQuery/msomething (mmap family?)?<BR>> ><BR>> ><BR>> > 3) put back the second thread local?<BR>> ><BR>> ><BR>> > #2 has a chance of working better than before -- letting GC<BR>> > work on threads not created by Modula-3 runtime, something<BR>> > that has long bothered me...but I haven't done a complete analysis.<BR>> > Or at least maybe keep it working as it was<BR>> > For now there is somewhat of a regression, ie, when calling<BR>> > Modula-3 code on threads not created from Modula-3.<BR>> > Possibly the gc in this case was already dangerous?<BR>> > Failing to find references on other stacks?<BR>> > Or failing all allocations (should be easy to check but I have to <BR>> > run..)<BR>> ><BR>> ><BR>> > - Jay<BR>> ><BR>> ><BR>> ><BR>> ><BR>> ><BR>> ><BR>> ><BR>> ><BR>> ><BR>> ><BR>> ><BR>> > ----------------------------------------<BR>> >> From: jay.krell@cornell.edu<BR>> >> To: hosking@cs.purdue.edu<BR>> >> CC: m3devel@elegosoft.com<BR>> >> Subject: RE: [M3devel] per thread data?<BR>> >> Date: Mon, 30 Mar 2009 13:23:10 +0000<BR>> >><BR>> >><BR>> >> This was surprisingly difficult.<BR>> >><BR>> >><BR>> >> InitHandlers is called much earlier than InitActivations.<BR>> >> InitActivations does a heap allocation.<BR>> >> InitHandlers did not.<BR>> >> The types involved are not yet initialized at this point, or <BR>> >> somesuch.<BR>> >> You cannot NEW(Activation) in the first call to PushFrame.<BR>> >> So, maybe, use a global for the first one,<BR>> >> but then what happens is it gets reinitialized later by<BR>> >> the module initializer -- which is perhaps another indictment<BR>> >> of initializers..or maybe a special case in the depths of the <BR>> >> system --<BR>> >> this module and anything it uses are subject to be called by<BR>> >> 
compiler-generated calls -- they can be called before their <BR>> >> initializers<BR>> >> run.. seems to me the initialization could have happened "statically"<BR>> >> like in C.<BR>> >><BR>> >><BR>> >> Anyway, I should have this done shortly.<BR>> >> Trick is to use a local value and assign it to a heap block<BR>> >> allocated directly with calloc instead of RTAllocator.<BR>> >><BR>> >><BR>> >> The result is maybe faster, maybe slower.<BR>> >> Before, "try" cost pthread_getspecific and setspecific.<BR>> >> Now it will just cost getspecific.<BR>> >> But with another pointer deref and call to GetActivation<BR>> >> with its on-demand initialization.<BR>> >><BR>> >><BR>> >> Before, popframe only called setspecific.<BR>> >> Now it will only call getspecific, plus the indirect<BR>> >> and on-demand initialization.<BR>> >> The on-demand seems bogus in pop, given that push already had to <BR>> >> occur.<BR>> >> So maybe that could be optimized.<BR>> >><BR>> >><BR>> >> This stuff is highly optimized in C and C++ on NT..<BR>> >> NT/x86 has a special thread local just for exception handling,<BR>> >> faster than all other thread locals.<BR>> >> All non-x86 NT platforms have stack walkers -- no cost for "try",<BR>> >> and then "throw" maps instruction pointer to data about how to<BR>> >> to unwind the stack, using a little mini-assembly code.<BR>> >><BR>> >><BR>> >> - Jay<BR>> >><BR>> >><BR>> >> ________________________________<BR>> >>> From: jay.krell@cornell.edu<BR>> >>> To: hosking@cs.purdue.edu<BR>> >>> Date: Thu, 19 Mar 2009 01:03:57 +0000<BR>> >>> CC: m3devel@elegosoft.com<BR>> >>> Subject: Re: [M3devel] per thread data?<BR>> >>><BR>> >>><BR>> >>><BR>> >>><BR>> >>><BR>> >>><BR>> >>><BR>> >>><BR>> >>> Thanks, I should get around to that "soon" then.<BR>> >>><BR>> >>><BR>> >>><BR>> >>> - Jay<BR>> >>><BR>> >>><BR>> >>><BR>> >>> ________________________________<BR>> >>><BR>> >>> From: hosking@cs.purdue.edu<BR>> >>> To: jay.krell@cornell.edu<BR>> >>> Date: Thu, 19 Mar 2009 
10:14:59 +1100<BR>> >>> CC: m3devel@elegosoft.com<BR>> >>> Subject: Re: [M3devel] per thread data?<BR>> >>><BR>> >>> I have no problem putting the exception handler stack thread local <BR>> >>> into the activation thread local.<BR>> >>><BR>> >>><BR>> >>><BR>> >>><BR>> >>><BR>> >>><BR>> >>> On 18 Mar 2009, at 20:11, Jay wrote:<BR>> >>><BR>> >>><BR>> >>><BR>> >>> I'm not looking at it right now, but doesn't seem rather piggy to <BR>> >>> have two thread locals and data on the side?<BR>> >>><BR>> >>><BR>> >>> I'm guessing the data on the side is needed because we need to be <BR>> >>> able to enumerate our threads, to suspend them all?<BR>> >>><BR>> >>><BR>> >>> I understand that having multiple thread locals optimizes their <BR>> >>> use, but it seems greedy.<BR>> >>> vs. a small heap allocation that combines them.<BR>> >>><BR>> >>> Or in fact.. presumably there could just be one thread local that <BR>> >>> is the thread pointer, and the handler link could be put at the <BR>> >>> start, for architectures where zero offset is smaller/faster than <BR>> >>> non-zero offset.<BR>> >>><BR>> >>><BR>> >>> Another idea, of course, is to look into "__thread", <BR>> >>> "__declspec(thread)".<BR>> >>><BR>> >>> On Windows and probably all platforms they exist on, they are <BR>> >>> nicely more efficient than pthread_get/setspecific, except on <BR>> >>> Windows they don't really work acceptably prior to Vista -- they <BR>> >>> only work in .exes and their static dependencies, not any .dll you <BR>> >>> load after the process starts with LoadLibrary (dlopen).<BR>> >>><BR>> >>><BR>> >>> Does "__thread" work well on most non-Windows platforms?<BR>> >>> i.e. 
even if shared object is loaded with dlopen?<BR>> >>><BR>> >>><BR>> >>> I could have sworn I saw code out there that was "adaptive".<BR>> >>> It easily/efficiently checked if it was loaded with LoadLibrary or <BR>> >>> not.<BR>> >>> If so, it'd TlsGet/SetValue (pthread_get/setspecific).<BR>> >>> If not, it'd use __declspec(thread) (__thread).<BR>> >>> The check was based on if __tlsindex was not zero or somesuch. I <BR>> >>> couldn't track it down though.<BR>> >>><BR>> >>><BR>> >>> In either case, yes, I know, one of the thread locals at least is <BR>> >>> gone on platforms that have stack walkers, e.g. Solaris, and <BR>> >>> potentially NT, and maybe others.<BR>> >>><BR>> >>><BR>> >>> - Jay<BR>> >>><BR>> >>><BR>> <BR></body>
</html>