[M3devel] FW: per thread data?

Jay jay.krell at cornell.edu
Tue Mar 31 02:25:44 CEST 2009


[truncated again]

From: jay.krell at cornell.edu
To: hosking at cs.purdue.edu
CC: m3devel at elegosoft.com
Subject: RE: [M3devel] per thread data?
Date: Tue, 31 Mar 2009 00:12:25 +0000



> > but I don't know for sure. I've never really liked the idea of 
> > having non-M3 threads.

I understand there is no free lunch, but the scenario is that I write a "plugin" in Modula-3, or even a static dependency.
The point is to mix languages where the "primary" language is not Modula-3, so that folks can call "native" pthread_create or Win32 CreateThread and still use Modula-3.
 
On Win32, .dlls have a callback that gets called for every thread created.
Generally they can initialize their per-thread data there.
There is also a callback for thread exit.
It is a slightly thorny issue though, for a few reasons.
For example, if .dlls get dynamically loaded/unloaded, threads can be created before they load (no callback), or threads can exit after they unload (again, no callback).
You only get callbacks when you are already loaded at the time of thread create/exit.
 
You can also initialize on demand, assuming there is still enough memory.
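
As a rough illustration of the on-demand approach, here is a minimal C sketch using a single pthread key. It is not the Modula-3 runtime's code: the Activation fields and the names get_activation, foreign_thread and foreign_thread_gets_record are hypothetical, chosen only for this example. The record is allocated with calloc on first use, so a thread started with plain pthread_create still gets one, and the key's destructor frees it at thread exit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread record; the fields are illustrative only. */
typedef struct Activation {
    void *handlers;    /* head of the exception-handler frame list */
    pthread_t handle;  /* the owning thread, from pthread_self() */
} Activation;

static pthread_key_t activation_key;
static pthread_once_t activation_once = PTHREAD_ONCE_INIT;

/* Runs at thread exit for any thread that ended up with a record. */
static void free_activation(void *p) { free(p); }

static void make_key(void)
{
    int rc = pthread_key_create(&activation_key, free_activation);
    assert(rc == 0);
    (void)rc;
}

/* Returns this thread's record, creating it on first use.
   Returns NULL if memory has run out. */
Activation *get_activation(void)
{
    Activation *a;
    pthread_once(&activation_once, make_key);
    a = pthread_getspecific(activation_key);
    if (a == NULL) {
        a = calloc(1, sizeof *a);
        if (a == NULL)
            return NULL;
        a->handle = pthread_self();
        if (pthread_setspecific(activation_key, a) != 0) {
            free(a);
            return NULL;
        }
    }
    return a;
}

/* Thread body standing in for a "foreign" thread that the runtime
   did not create and never had a chance to set up in advance. */
static void *foreign_thread(void *arg)
{
    Activation *mine = get_activation();
    int *ok = arg;
    *ok = mine != NULL && mine == get_activation()
          && pthread_equal(mine->handle, pthread_self());
    return NULL;
}

/* Starts a raw pthread and reports whether it got a usable record. */
int foreign_thread_gets_record(void)
{
    pthread_t t;
    int ok = 0;
    if (pthread_create(&t, NULL, foreign_thread, &ok) != 0)
        return 0;
    pthread_join(t, NULL);
    return ok;
}
```

The sketch leaves out what a real runtime would also need, such as linking the record into the list of threads the collector enumerates and recording the stack bounds.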
 
If the primary executable is not written in Modula-3, but has a static dependency on a Modula-3 .dll, then it works ok.
 
 - Jay

 
> From: hosking at cs.purdue.edu
> To: hosking at cs.purdue.edu
> Date: Tue, 31 Mar 2009 09:45:22 +1100
> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
> Subject: Re: [M3devel] per thread data?
> 
> PS In general, I am loath to make changes that complicate the code 
> based on performance assumptions that are only hypothetical. Better 
> to profile and see where the time is going before prematurely 
> "optimizing".
> 
> On 31 Mar 2009, at 09:42, Tony Hosking wrote:
> 
> > Yes, this is a tricky issue. At some point I seem to recall it 
> > being OK to have non-Modula-3 threads start running Modula-3 code, 
> > but I don't know for sure. I've never really liked the idea of 
> > having non-M3 threads.
> >
> > Are you using the existing handler maps and exception stack 
> > unwinding support for non-x86 NT?
> >
> > On 31 Mar 2009, at 09:15, Jay wrote:
> >
> >>
> >> hm, thinking about this more...
> >> What about threads not created by Modula-3 Fork() (or the first 
> >> thread)?
> >>
> >> It looks like exception handling had a chance of working on them
> >> before. Now they'll crash upon entering functions
> >> with try or raise or I presume lock.
> >>
> >>
> >> 1) ok?
> >>
> >>
> >> 2) do the heap alloc on demand?
> >> But is that enough? Can it be initialized without further context?
> >> Let's see..the circular list can be maintained without further 
> >> context.
> >> handle := pthread_self, ok. stack can probably be figured out, though
> >> that is probably just for gc and could be left alone for now, 
> >> continuing
> >> to not work (or fixed)...getcontext at least on some platforms can
> >> fill this in, or VirtualQuery/msomething (mmap family?)?
> >>
> >>
> >> 3) put back the second thread local?
> >>
> >>
> >> #2 has a chance of working better than before -- letting GC
> >> work on threads not created by Modula-3 runtime, something
> >> that has long bothered me...but I haven't done a complete analysis.
> >> Or at least maybe keep it working as it was.
> >> For now there is somewhat of a regression, ie, when calling
> >> Modula-3 code on threads not created from Modula-3.
> >> Possibly the gc in this case was already dangerous?
> >> Failing to find references on other stacks?
> >> Or failing all allocations (should be easy to check but I have to 
> >> run..)
> >>
> >>
> >> - Jay
> >>
> >> ----------------------------------------
> >>> From: jay.krell at cornell.edu
> >>> To: hosking at cs.purdue.edu
> >>> CC: m3devel at elegosoft.com
> >>> Subject: RE: [M3devel] per thread data?
> >>> Date: Mon, 30 Mar 2009 13:23:10 +0000
> >>>
> >>>
> >>> This was surprisingly difficult.
> >>>
> >>>
> >>> InitHandlers is called much earlier than InitActivations.
> >>> InitActivations does a heap allocation.
> >>> InitHandlers did not.
> >>> The types involved are not yet initialized at this point, or 
> >>> somesuch.
> >>> You cannot NEW(Activation) in the first call to PushFrame.
> >>> So, maybe, use a global for the first one,
> >>> but then what happens is it gets reinitialized later by
> >>> the module initializer -- which is perhaps another indictment
> >>> of initializers..or maybe a special case in the depths of the 
> >>> system --
> >>> this module and anything it uses are subject to be called by
> >>> compiler-generated calls -- they can be called before their 
> >>> initializers
> >>> run.. seems to me the initialization could have happened 
> >>> "statically"
> >>> like in C.
> >>>
> >>>
> >>> Anyway, I should have this done shortly.
> >>> Trick is to use a local value and assign it to a heap block
> >>> allocated directly with calloc instead of RTAllocator.
> >>>
> >>>
> >>> The result is maybe faster, maybe slower.
> >>> Before, "try" cost pthread_getspecific and setspecific.
> >>> Now it will just cost getspecific.
> >>> But with another pointer deref and call to GetActivation
> >>> with its on-demand initialization.
> >>>
> >>>
> >>> Before, popframe only called setspecific.
> >>> Now it will only call getspecific, plus the indirect
> >>> and on-demand initialization.
> >>> The on-demand seems bogus in pop, given that push already had to 
> >>> occur.
> >>> So maybe that could be optimized.
> >>>
> >>>
> >>> This stuff is highly optimized in C and C++ on NT..
> >>> NT/x86 has a special thread local just for exception handling,
> >>> faster than all other thread locals.
> >>> All non-x86 NT platforms have stack walkers -- no cost for "try",
> >>> and then "throw" maps the instruction pointer to data about how
> >>> to unwind the stack, using a little mini-assembly code.
> >>>
> >>>
> >>> - Jay
> >>>
> >>>
> >>> ________________________________
> >>>> From: jay.krell at cornell.edu
> >>>> To: hosking at cs.purdue.edu
> >>>> Date: Thu, 19 Mar 2009 01:03:57 +0000
> >>>> CC: m3devel at elegosoft.com
> >>>> Subject: Re: [M3devel] per thread data?
> >>>>
> >>>> Thanks, I should get around to that "soon" then.
> >>>>
> >>>>
> >>>>
> >>>> - Jay
> >>>>
> >>>>
> >>>>
> >>>> ________________________________
> >>>>
> >>>> From: hosking at cs.purdue.edu
> >>>> To: jay.krell at cornell.edu
> >>>> Date: Thu, 19 Mar 2009 10:14:59 +1100
> >>>> CC: m3devel at elegosoft.com
> >>>> Subject: Re: [M3devel] per thread data?
> >>>>
> >>>> I have no problem putting the exception handler stack thread 
> >>>> local into the activation thread local.
> >>>>
> >>>> On 18 Mar 2009, at 20:11, Jay wrote:
> >>>>
> >>>>
> >>>>
> >>>> I'm not looking at it right now, but doesn't it seem rather piggy to 
> >>>> have two thread locals and data on the side?
> >>>>
> >>>>
> >>>> I'm guessing the data on the side is needed because we need to be 
> >>>> able to enumerate our threads, to suspend them all?
> >>>>
> >>>>
> >>>> I understand that having multiple thread locals optimizes their 
> >>>> use, but it seems greedy compared with a small heap allocation 
> >>>> that combines them.
> >>>>
> >>>> Or in fact.. presumably there could just be one thread local that 
> >>>> is the thread pointer, and the handler link could be put at the 
> >>>> start, for architectures where zero offset is smaller/faster than 
> >>>> non-zero offset.
> >>>>
> >>>>
> >>>> Another idea, of course, is to look into "__thread", 
> >>>> "__declspec(thread)".
> >>>>
> >>>> On Windows and probably all platforms they exist on, they are 
> >>>> nicely more efficient than pthread_get/setspecific, except on 
> >>>> Windows they don't really work acceptably prior to Vista -- they 
> >>>> only work in .exes and their static dependencies, not any .dll 
> >>>> you load after the process starts with LoadLibrary (dlopen).
> >>>>
> >>>>
> >>>> Does "__thread" work well on most non-Windows platforms?
> >>>> i.e. even if shared object is loaded with dlopen?
> >>>>
> >>>>
> >>>> I could have sworn I saw code out there that was "adaptive".
> >>>> It easily/efficiently checked if it was loaded with LoadLibrary 
> >>>> or not.
> >>>> If so, it'd TlsGet/SetValue (pthread_get/setspecific).
> >>>> If not, it'd use __declspec(thread) (__thread).
> >>>> The check was based on if __tlsindex was not zero or somesuch. I 
> >>>> couldn't track it down though.
> >>>>
> >>>>
> >>>> In either case, yes, I know, one of the thread locals at least is 
> >>>> gone on platforms that have stack walkers, e.g. Solaris, and 
> >>>> potentially NT, and maybe others.
> >>>>
> >>>>
> >>>> - Jay
> >>>>
> >>>>
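
The single-thread-local layout suggested in the quoted message (one per-thread record with the handler link at offset zero) might look roughly like the C sketch below. It is a simplification made for illustration only: the record lives directly in a "__thread" variable (a GCC/Clang extension) instead of behind a pthread key, and the names Frame, ThreadRecord, push_frame, pop_frame and top_frame are hypothetical, not the runtime's own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler frame; a real one would also carry handler data. */
typedef struct Frame {
    struct Frame *next;
} Frame;

/* One record per thread. The handler link is placed first so that
   reaching it needs no extra offset from the record's address. */
typedef struct ThreadRecord {
    Frame *handlers;   /* top of this thread's handler stack */
    void *stack_base;  /* illustrative second field, e.g. for the collector */
} ThreadRecord;

/* Compile-time check that the handler link really is at offset zero:
   the array size is -1, and compilation fails, if it is not. */
typedef char handlers_first[offsetof(ThreadRecord, handlers) == 0 ? 1 : -1];

/* The only thread local; zero-initialized for every new thread. */
static __thread ThreadRecord self;

/* "try" entry: link a frame onto this thread's handler stack. */
void push_frame(Frame *f)
{
    f->next = self.handlers;
    self.handlers = f;
}

/* "try" exit: unlink the most recently pushed frame. */
void pop_frame(void)
{
    assert(self.handlers != NULL);
    self.handlers = self.handlers->next;
}

/* What "raise" would start from when looking for a handler. */
Frame *top_frame(void)
{
    return self.handlers;
}
```

With this shape, both push and pop touch one thread local and nothing else. Whether that is actually faster than two separate thread locals would need measuring on the platform in question.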