[M3devel] per thread data?

Jay jay.krell at cornell.edu
Tue Mar 31 10:17:08 CEST 2009


[I don't see it below, but you mentioned not liking the use of calloc.]

It doesn't have to use calloc.
It can be statically allocated as an ARRAY [1..BYTESIZE(ActivationRecord)] OF CHAR.
I didn't do that because I was confused and thought there'd be overhead,
but that's probably just for open arrays, not fixed-size arrays.

This is just for the first thread/activation.
Subsequent ones are unchanged, using NEW.
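
A rough C sketch of that arrangement, assuming a made-up Activation record (the real one lives in the Modula-3 runtime and has different fields): the first activation comes out of fixed-size static storage, later ones come from the heap. The names new_activation, free_activation, first_storage and is_static are hypothetical, and calloc stands in for NEW.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime's activation record. */
typedef struct Activation {
    struct Activation *next;      /* link in the circular list of activations */
    void              *handlers;  /* head of the exception handler stack */
    int                is_static; /* illustration only: 1 if not heap allocated */
} Activation;

/* Fixed-size static storage, playing the role of
   ARRAY [1..BYTESIZE(Activation)] OF CHAR. The union gives the bytes the
   right size and alignment. Static storage starts out zeroed and needs no
   allocator and no type initializer. */
static union {
    Activation    rec;
    unsigned char bytes[sizeof(Activation)];
} first_storage;

static int first_taken = 0;  /* assumes startup is single threaded */

/* First call hands out the static block; later calls use the heap. */
Activation *new_activation(void) {
    Activation *a;
    if (!first_taken) {
        first_taken = 1;
        a = &first_storage.rec;
        a->is_static = 1;
    } else {
        a = calloc(1, sizeof *a);
        if (a == NULL) return NULL;
    }
    a->next = a;  /* a circular list of one */
    return a;
}

/* The static block must never be passed to free(). */
void free_activation(Activation *a) {
    assert(a != NULL);
    if (!a->is_static) free(a);
}
```

The point of the split is only that the first allocation happens before the allocator and type information are usable; nothing here needs either.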

If there was a way to suppress the initializer, then a regular static
allocation could be used.

I started removing the initializers for Activation and its constituent
types (RTHeapRep.ThreadState), but then thought of the approach I have now.

Another fix might be to reorder InitRuntime to get the relevant modules/types
initialized earlier. I didn't look into that.
As well, perhaps, altering the data structures used during startup
to be initially statically allocated, if they aren't already, and large enough
to not need heap allocation for "long enough".

That is, if you use NEW here, there are two problems.

 - the infinite recursion on pushframe
   That is fixed by pushing the RAISE into a separate function.
   Currently the transform like that in RTAllocator is only for "perf",
   but depending on the final form of ThreadPThread, it could be for correctness.
   (I need to fix Win32 too.)

 - the fact that the typedefn isn't complete yet
   That's what I didn't look into adequately and could maybe
   come up with a reasonable fix for.
   You can see I changed the code so the result of an incomplete typedefn
   is clearer in the debugger.

Though again, a static allocation of an ARRAY [1..size] OF CHAR should
suffice just fine, perhaps being better than it was and better than it is.

I still use calloc for the conditions/mutexes though.
The point there was to remove the open array overhead,
and to keep things simpler for myself.
I think I was using the address of the array instead of the address of its
first element, unaware of the difference. With calloc that layer isn't there.
But this can use NEW if you want, though again, it'll waste
space for the open array size.
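
To illustrate the "address of the array vs. address of its first element" difference, here is a C analogy, not the actual Modula-3 heap layout: an open array is modeled as a length header followed by the elements. OpenBytes and the open_bytes_* functions are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a heap-allocated open array:
   a length header followed by the element bytes. */
typedef struct OpenBytes {
    size_t        count;    /* the "open array size" that costs extra space */
    unsigned char elems[];  /* flexible array member (C99) */
} OpenBytes;

/* Allocate a zeroed open array of count bytes, header included. */
OpenBytes *open_bytes_new(size_t count) {
    OpenBytes *a = calloc(1, sizeof(OpenBytes) + count);
    if (a != NULL) a->count = count;
    return a;
}

/* Bytes spent on bookkeeping before the first element. */
size_t open_bytes_overhead(void) {
    return offsetof(OpenBytes, elems);
}

/* The pointer a C API such as pthread_mutex_init would need:
   the first element, not the array object itself. */
void *open_bytes_data(OpenBytes *a) {
    assert(a != NULL);
    return a->elems;
}
```

With a plain calloc(1, n) there is no header, so the block's address and the address of its first byte are the same thing, and that mistake can't happen.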

These are all untraced allocations anyway, so I didn't think there's
any great advantage to RTAllocator.
I do realize now that it is what knows how to run type initializers.
That is a nice feature.

 - Jay

From: jay.krell at cornell.edu
To: hosking at cs.purdue.edu
Date: Tue, 31 Mar 2009 00:27:01 +0000
CC: m3devel at elegosoft.com
Subject: Re: [M3devel] per thread data?

I understand, but pushframe/popframe seem so icky..
I like to try to keep things pretty ok as I go, but I realize your opinion is the more often stated one -- don't optimize prematurely.
 
 - Jay

 
> From: hosking at cs.purdue.edu
> To: hosking at cs.purdue.edu
> Date: Tue, 31 Mar 2009 09:45:22 +1100
> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
> Subject: Re: [M3devel] per thread data?
> 
> PS In general, I am loath to make changes that complicate the code 
> based on performance assumptions that are only hypothetical. Better 
> to profile and see where the time is going before prematurely 
> "optimizing".
> 
> On 31 Mar 2009, at 09:42, Tony Hosking wrote:
> 
> > Yes, this is a tricky issue. At some point I seem to recall it 
> > being OK to have non-Modula-3 threads start running Modula-3 code, 
> > but I don't know for sure. I've never really liked the idea of 
> > having non-M3 threads.
> >
> > Are you using the existing handler maps and exception stack 
> > unwinding support for non-x86 NT?
> >
> > On 31 Mar 2009, at 09:15, Jay wrote:
> >
> >>
> >> hm, thinking about this more...
> >> What about threads not created by Modula-3 Fork() (or the first 
> >> thread)?
> >>
> >> It looks like exception handling had a chance of working on them
> >> before. Now they'll crash upon entering functions
> >> with try or raise or I presume lock.
> >>
> >>
> >> 1) ok?
> >>
> >>
> >> 2) do the heap alloc on demand?
> >> But is that enough? Can it be initialized without further context?
> >> Let's see..the circular list can be maintained without further context.
> >> handle := pthread_self, ok. stack can probably be figured out, though
> >> that is probably just for gc and could be left alone for now, continuing
> >> to not work (or fixed)...getcontext at least on some platforms can
> >> fill this in, or VirtualQuery/msomething (mmap family?)?
> >>
> >>
> >> 3) put back the second thread local?
> >>
> >>
> >> #2 has a chance of working better than before -- letting GC
> >> work on threads not created by Modula-3 runtime, something
> >> that has long bothered me...but I haven't done a complete analysis.
> >> Or at least maybe keep it working as it was
> >> For now there is somewhat of a regression, ie, when calling
> >> Modula-3 code on threads not created from Modula-3.
> >> Possibly the gc in this case was already dangerous?
> >> Failing to find references on other stacks?
> >> Or failing all allocations (should be easy to check but I have to 
> >> run..)
> >>
> >>
> >> - Jay
> >>
> >> ----------------------------------------
> >>> From: jay.krell at cornell.edu
> >>> To: hosking at cs.purdue.edu
> >>> CC: m3devel at elegosoft.com
> >>> Subject: RE: [M3devel] per thread data?
> >>> Date: Mon, 30 Mar 2009 13:23:10 +0000
> >>>
> >>>
> >>> This was surprisingly difficult.
> >>>
> >>>
> >>> InitHandlers is called much earlier than InitActivations.
> >>> InitActivations does a heap allocation.
> >>> InitHandlers did not.
> >>> The types involved are not yet initialized at this point, or somesuch.
> >>> You cannot NEW(Activation) in the first call to PushFrame.
> >>> So, maybe, use a global for the first one,
> >>> but then what happens is it gets reinitialized later by
> >>> the module initializer -- which is perhaps another indictment
> >>> of initializers..or maybe a special case in the depths of the system --
> >>> this module and anything it uses are subject to be called by
> >>> compiler-generated calls -- they can be called before their initializers
> >>> run.. seems to me the initialization could have happened "statically"
> >>> like in C.
> >>>
> >>>
> >>> Anyway, I should have this done shortly.
> >>> Trick is to use a local value and assign it to a heap block
> >>> allocated directly with calloc instead of RTAllocator.
> >>>
> >>>
> >>> The result is maybe faster, maybe slower.
> >>> Before, "try" cost pthread_getspecific and setspecific.
> >>> Now it will just cost getspecific.
> >>> But with another pointer deref and call to GetActivation
> >>> with its on-demand initialization.
> >>>
> >>>
> >>> Before, popframe only called setspecific.
> >>> Now it will only call getspecific, plus the indirect
> >>> and on-demand initialization.
> >>> The on-demand seems bogus in pop, given that push already had to occur.
> >>> So maybe that could be optimized.
> >>>
> >>>
> >>> This stuff is highly optimized in C and C++ on NT..
> >>> NT/x86 has a special thread local just for exception handling,
> >>> faster than all other thread locals.
> >>> All non-x86 NT platforms have stack walkers -- no cost for "try",
> >>> and then "throw" maps instruction pointer to data about how
> >>> to unwind the stack, using a little mini-assembly code.
> >>>
> >>>
> >>> - Jay
> >>>
> >>>
> >>> ________________________________
> >>>> From: jay.krell at cornell.edu
> >>>> To: hosking at cs.purdue.edu
> >>>> Date: Thu, 19 Mar 2009 01:03:57 +0000
> >>>> CC: m3devel at elegosoft.com
> >>>> Subject: Re: [M3devel] per thread data?
> >>>>
> >>>> Thanks, I should get around to that "soon" then.
> >>>>
> >>>>
> >>>>
> >>>> - Jay
> >>>>
> >>>>
> >>>>
> >>>> ________________________________
> >>>>
> >>>> From: hosking at cs.purdue.edu
> >>>> To: jay.krell at cornell.edu
> >>>> Date: Thu, 19 Mar 2009 10:14:59 +1100
> >>>> CC: m3devel at elegosoft.com
> >>>> Subject: Re: [M3devel] per thread data?
> >>>>
> >>>> I have no problem putting the exception handler stack thread 
> >>>> local into the activation thread local.
> >>>>
> >>>> On 18 Mar 2009, at 20:11, Jay wrote:
> >>>>
> >>>>
> >>>>
> >>>> I'm not looking at it right now, but doesn't it seem rather piggy to 
> >>>> have two thread locals and data on the side?
> >>>>
> >>>>
> >>>> I'm guessing the data on the side is needed because we need to be 
> >>>> able to enumerate our threads, to suspend them all?
> >>>>
> >>>>
> >>>> I understand that having multiple thread locals optimizes their 
> >>>> use, but it seems greedy.
> >>>> vs. a small heap allocation that combines them.
> >>>>
> >>>> Or in fact.. presumably there could just be one thread local that 
> >>>> is the thread pointer, and the handler link could be put at the 
> >>>> start, for architectures where zero offset is smaller/faster than 
> >>>> non-zero offset.
> >>>>
> >>>>
> >>>> Another idea, of course, is to look into "__thread", 
> >>>> "__declspec(thread)".
> >>>>
> >>>> On Windows and probably all platforms they exist on, they are 
> >>>> nicely more efficient than pthread_get/setspecific, except on 
> >>>> Windows they don't really work acceptably prior to Vista -- they 
> >>>> only work in .exes and their static dependencies, not any .dll 
> >>>> you load after the process starts with LoadLibrary (dlopen).
> >>>>
> >>>>
> >>>> Does "__thread" work well on most non-Windows platforms?
> >>>> i.e. even if shared object is loaded with dlopen?
> >>>>
> >>>>
> >>>> I could have sworn I saw code out there that was "adaptive".
> >>>> It easily/efficiently checked if it was loaded with LoadLibrary 
> >>>> or not.
> >>>> If so, it'd TlsGet/SetValue (pthread_get/setspecific).
> >>>> If not, it'd use __declspec(thread) (__thread).
> >>>> The check was based on if __tlsindex was not zero or somesuch. I 
> >>>> couldn't track it down though.
> >>>>
> >>>>
> >>>> In either case, yes, I know, one of the thread locals at least is 
> >>>> gone on platforms that have stack walkers, e.g. Solaris, and 
> >>>> potentially NT, and maybe others.
> >>>>
> >>>>
> >>>> - Jay
> >>>>
> >>>>
> 
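
For reference, the single-thread-local arrangement discussed in the quoted messages above (one pthread key holding the activation, handler link at offset zero, on-demand initialization in push but not in pop) might look roughly like this in C. It is only a sketch under assumptions: Frame, Activation, get_activation, push_frame and pop_frame are hypothetical stand-ins and not the ThreadPThread code, the circular list and stack bounds are left out, and calloc again stands in for the allocator.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical exception frame: just a link to the enclosing frame. */
typedef struct Frame {
    struct Frame *next;
} Frame;

/* Hypothetical activation record, handler link first (offset zero). */
typedef struct Activation {
    Frame    *handlers;  /* top of this thread's handler stack */
    pthread_t handle;    /* filled in from pthread_self() */
} Activation;

static pthread_key_t  activation_key;
static pthread_once_t activation_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    /* free() runs at thread exit for threads that got an activation */
    if (pthread_key_create(&activation_key, free) != 0) abort();
}

/* On-demand lookup: works for any thread, including ones
   that were not created by the runtime. */
Activation *get_activation(void) {
    Activation *a;
    pthread_once(&activation_once, make_key);
    a = pthread_getspecific(activation_key);
    if (a == NULL) {
        a = calloc(1, sizeof *a);
        if (a == NULL) abort();
        a->handle = pthread_self();
        if (pthread_setspecific(activation_key, a) != 0) abort();
    }
    return a;
}

/* "try" costs one getspecific plus a pointer dereference. */
void push_frame(Frame *f) {
    Activation *a = get_activation();
    f->next = a->handlers;
    a->handlers = f;
}

/* Pop skips the on-demand check: a push must already have
   happened on this thread, so the activation exists. */
void pop_frame(Frame *f) {
    Activation *a = pthread_getspecific(activation_key);
    assert(a != NULL && a->handlers == f);
    a->handlers = f->next;
}
```

Whether this is faster or slower than two thread locals is exactly the open question in the thread; the sketch only shows where the getspecific calls and the extra dereference end up.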