[M3devel] per thread data?
Jay
jay.krell at cornell.edu
Tue Mar 31 00:15:29 CEST 2009
hm, thinking about this more...
What about threads not created by Modula-3 Fork() (or the first thread)?
It looks like exception handling had a chance of working on them
before. Now they'll crash upon entering any function
that uses try or raise or, I presume, lock.
1) ok?
2) do the heap alloc on demand?
But is that enough? Can it be initialized without further context?
Let's see... the circular list can be maintained without further context.
handle := pthread_self, ok. The stack bounds can probably be figured out, though
they are probably needed only for the gc and could be left alone for now, continuing
to not work (or be fixed later). getcontext, at least on some platforms, can
fill this in, or VirtualQuery/msomething (mmap family?).
3) put back the second thread local?
#2 has a chance of working better than before -- letting GC
work on threads not created by Modula-3 runtime, something
that has long bothered me...but I haven't done a complete analysis.
Or at least maybe keep it working as it was.
For now there is somewhat of a regression, i.e. when calling
Modula-3 code on threads not created from Modula-3.
Possibly the gc in this case was already dangerous,
failing to find references on other stacks?
Or failing all allocations? (Should be easy to check, but I have to run.)
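
A minimal sketch in C of what option 2 might look like, assuming a single pthread key holds the whole per-thread record. The names Activation, get_activation, count_threads and foreign_thread_gets_own_record are hypothetical and are not taken from the actual runtime. The record comes from calloc rather than the traced heap, the stack bounds are deliberately left unset, and nothing removes a record when its thread exits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime's per-thread record. */
typedef struct Activation {
    void *handlers;            /* exception handler stack, kept at offset zero */
    pthread_t handle;          /* filled in from pthread_self() */
    struct Activation *next;   /* circular list of all known threads */
    struct Activation *prev;
} Activation;

static pthread_key_t activation_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t list_mu = PTHREAD_MUTEX_INITIALIZER;
static Activation *all_threads;   /* any node of the circular list, or NULL */

static void make_key(void)
{
    if (pthread_key_create(&activation_key, NULL) != 0) abort();
}

/* Return the calling thread's record, creating it on first use. */
static Activation *get_activation(void)
{
    Activation *a;
    pthread_once(&key_once, make_key);
    a = pthread_getspecific(activation_key);
    if (a != NULL) return a;

    a = calloc(1, sizeof *a);     /* plain C heap, no runtime types needed */
    if (a == NULL) abort();
    a->handle = pthread_self();

    pthread_mutex_lock(&list_mu);
    if (all_threads == NULL) {
        a->next = a->prev = a;    /* first node points at itself */
        all_threads = a;
    } else {
        assert(all_threads->prev != NULL);
        a->next = all_threads;    /* link in just before the head */
        a->prev = all_threads->prev;
        all_threads->prev->next = a;
        all_threads->prev = a;
    }
    pthread_mutex_unlock(&list_mu);

    if (pthread_setspecific(activation_key, a) != 0) abort();
    return a;
}

/* Walk the circular list and count the threads seen so far. */
static int count_threads(void)
{
    int n = 0;
    pthread_mutex_lock(&list_mu);
    if (all_threads != NULL) {
        Activation *a = all_threads;
        do { n++; a = a->next; } while (a != all_threads);
    }
    pthread_mutex_unlock(&list_mu);
    return n;
}

/* Body of a thread that was not created through the runtime. */
static void *foreign_thread(void *arg)
{
    (void)arg;
    return get_activation();
}

/* Returns 1 if a raw pthread ends up with a record of its own. */
static int foreign_thread_gets_own_record(void)
{
    pthread_t t;
    void *theirs = NULL;
    Activation *mine = get_activation();
    if (pthread_create(&t, NULL, foreign_thread, NULL) != 0) abort();
    pthread_join(t, &theirs);
    return theirs != NULL && theirs != (void *)mine;
}
```

Calling foreign_thread_gets_own_record() from main exercises the case in question: a thread started with plain pthread_create gets a record the first time it asks for one, and it shows up in the list that a collector could later walk.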
- Jay
----------------------------------------
> From: jay.krell at cornell.edu
> To: hosking at cs.purdue.edu
> CC: m3devel at elegosoft.com
> Subject: RE: [M3devel] per thread data?
> Date: Mon, 30 Mar 2009 13:23:10 +0000
>
>
> This was surprisingly difficult.
>
>
> InitHandlers is called much earlier than InitActivations.
> InitActivations does a heap allocation.
> InitHandlers did not.
> The types involved are not yet initialized at this point, or somesuch.
> You cannot NEW(Activation) in the first call to PushFrame.
> So, maybe, use a global for the first one,
> but then it gets reinitialized later by
> the module initializer, which is perhaps another indictment
> of initializers. Or maybe this is a special case in the depths of the system:
> this module and anything it uses can be called by
> compiler-generated calls, so they can be called before their initializers
> run. It seems to me the initialization could have happened "statically",
> like in C.
>
>
> Anyway, I should have this done shortly.
> Trick is to use a local value and assign it to a heap block
> allocated directly with calloc instead of RTAllocator.
>
>
> The result is maybe faster, maybe slower.
> Before, "try" cost pthread_getspecific and setspecific.
> Now it will just cost getspecific.
> But with another pointer deref and call to GetActivation
> with its on-demand initialization.
>
>
> Before, popframe only called setspecific.
> Now it will only call getspecific, plus the indirect
> and on-demand initialization.
> The on-demand seems bogus in pop, given that push already had to occur.
> So maybe that could be optimized.
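
The cost comparison above can be sketched in C as follows, assuming one pthread key whose record holds the handler stack head. Frame, Activation, push_frame, pop_frame and handler_depth are hypothetical names, not the runtime's own. Push pays for one pthread_getspecific plus the on-demand check; pop skips the on-demand path on the assumption that a push already ran on the same thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical handler frame; real frames carry more than a link. */
typedef struct Frame {
    struct Frame *next;
} Frame;

typedef struct Activation {
    Frame *handlers;   /* top of this thread's handler stack */
} Activation;

static pthread_key_t activation_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    if (pthread_key_create(&activation_key, NULL) != 0) abort();
}

/* One getspecific, plus a calloc the first time a thread gets here. */
static Activation *get_activation(void)
{
    Activation *a;
    pthread_once(&key_once, make_key);
    a = pthread_getspecific(activation_key);
    if (a == NULL) {
        a = calloc(1, sizeof *a);
        if (a == NULL || pthread_setspecific(activation_key, a) != 0) abort();
    }
    return a;
}

/* Entering "try": getspecific, one extra deref, no setspecific. */
static void push_frame(Frame *f)
{
    Activation *a = get_activation();
    f->next = a->handlers;
    a->handlers = f;
}

/* Leaving "try": the record must exist already, so no on-demand path. */
static void pop_frame(Frame *f)
{
    Activation *a = pthread_getspecific(activation_key);
    assert(a != NULL);   /* a push must have run on this thread first */
    a->handlers = f->next;
}

/* Number of frames currently pushed on the calling thread. */
static int handler_depth(void)
{
    int n = 0;
    Frame *f;
    for (f = get_activation()->handlers; f != NULL; f = f->next) n++;
    return n;
}
```

Whether this is faster or slower than two separate thread locals depends on how expensive pthread_setspecific is on a given platform, which this sketch does not measure.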
>
>
> This stuff is highly optimized in C and C++ on NT..
> NT/x86 has a special thread local just for exception handling,
> faster than all other thread locals.
> All non-x86 NT platforms have stack walkers: no cost for "try",
> and then "throw" maps the instruction pointer to data about how
> to unwind the stack, using a little mini-assembly code.
>
>
> - Jay
>
>
> ________________________________
>> From: jay.krell at cornell.edu
>> To: hosking at cs.purdue.edu
>> Date: Thu, 19 Mar 2009 01:03:57 +0000
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] per thread data?
>>
>> Thanks, I should get around to that "soon" then.
>>
>>
>>
>> - Jay
>>
>>
>>
>> ________________________________
>>
>> From: hosking at cs.purdue.edu
>> To: jay.krell at cornell.edu
>> Date: Thu, 19 Mar 2009 10:14:59 +1100
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] per thread data?
>>
>> I have no problem putting the exception handler stack thread local into the activation thread local.
>>
>> On 18 Mar 2009, at 20:11, Jay wrote:
>>
>>
>>
>> I'm not looking at it right now, but doesn't it seem rather piggy to have two thread locals and data on the side?
>>
>>
>> I'm guessing the data on the side is needed because we need to be able to enumerate our threads, to suspend them all?
>>
>>
>> I understand that having multiple thread locals optimizes their use, but it seems greedy
>> compared with a small heap allocation that combines them.
>>
>> Or in fact.. presumably there could just be one thread local that is the thread pointer, and the handler link could be put at the start, for architectures where zero offset is smaller/faster than non-zero offset.
>>
>>
>> Another idea, of course, is to look into "__thread", "__declspec(thread)".
>>
>> On Windows and probably all platforms they exist on, they are nicely more efficient than pthread_get/setspecific, except on Windows they don't really work acceptably prior to Vista -- they only work in .exes and their static dependencies, not any .dll you load after the process starts with LoadLibrary (dlopen).
>>
>>
>> Does "__thread" work well on most non-Windows platforms?
>> i.e. even if shared object is loaded with dlopen?
>>
>>
>> I could have sworn I saw code out there that was "adaptive".
>> It easily/efficiently checked if it was loaded with LoadLibrary or not.
>> If so, it'd TlsGet/SetValue (pthread_get/setspecific).
>> If not, it'd use __declspec(thread) (__thread).
>> The check was based on if __tlsindex was not zero or somesuch. I couldn't track it down though.
>>
>>
>> In either case, yes, I know, one of the thread locals at least is gone on platforms that have stack walkers, e.g. Solaris, and potentially NT, and maybe others.
>>
>>
>> - Jay
>>
>>