[M3devel] Per thread data

Tony Hosking hosking at cs.purdue.edu
Tue Sep 14 14:09:46 CEST 2010


I really don't understand any of this motivation for a language that has heap allocation.  Where is the overhead to just holding a reference to heap-allocated state from each thread?

On 14 Sep 2010, at 01:36, Jay K wrote:

> Best results imho would be use lightly #ifdefed C and ignore OpenBSD.
>   Perhaps we should provide something for Modula-3, that is then portable to user threads.
>  
>  
> The Win32 equivalent is "thread local storage" or, if you are picky and care, "fiber local storage".
>   TlsAlloc  
>   TlsGetValue  
>   TlsSetValue  
>   TlsFree  
>  
>  
> Search the web, you'll find the documentation right away.
>  
>  
> There are also Fls functions. They aren't in XP and fibers are exceedingly rarely used.
> Fiber local storage is thread local if fibers aren't in use.
> That is, using FLS is like a "superset" of TLS. You can't go wrong by using it, except you need to handle the LoadLibrary/GetProcAddress to see if the functions are available or not.
>  
>  
> For TLS (and FLS?), there is an array of 64 pointers and an array of 1024 pointers. Each thread local is a pointer. You can point it at heap-allocated data. Note that the heap allocation can fail. And TLS slots can be exhausted.
> The maximum number of thread locals is 1088.
>     On older operating systems it is 64. 
> TlsGet/SetValue just index into the appropriate array.
>  
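A minimal sketch of the "lightly #ifdefed C" idea, putting the Win32 Tls* calls and their pthread counterparts behind one interface. The tls_slot type and the tls_slot_* function names are hypothetical, chosen only for illustration; the underlying TlsAlloc/TlsGetValue/TlsSetValue/TlsFree and pthread_key_create/pthread_getspecific/pthread_setspecific/pthread_key_delete calls are the real APIs.

```c
#include <stddef.h>

/* Hypothetical portable wrapper; the tls_slot names are illustrative only. */
#ifdef _WIN32
#include <windows.h>
typedef DWORD tls_slot;

/* Returns 0 on success, -1 if no TLS index is available. */
static int tls_slot_create(tls_slot *slot)
{
    *slot = TlsAlloc();
    return (*slot == TLS_OUT_OF_INDEXES) ? -1 : 0;
}

static void *tls_slot_get(tls_slot slot)
{
    return TlsGetValue(slot);
}

static int tls_slot_set(tls_slot slot, void *value)
{
    return TlsSetValue(slot, value) ? 0 : -1;
}

static void tls_slot_destroy(tls_slot slot)
{
    TlsFree(slot);
}
#else
#include <pthread.h>
typedef pthread_key_t tls_slot;

/* Returns 0 on success, -1 if no key could be created. */
static int tls_slot_create(tls_slot *slot)
{
    return (pthread_key_create(slot, NULL) == 0) ? 0 : -1;
}

static void *tls_slot_get(tls_slot slot)
{
    return pthread_getspecific(slot);
}

static int tls_slot_set(tls_slot slot, void *value)
{
    return (pthread_setspecific(slot, value) == 0) ? 0 : -1;
}

static void tls_slot_destroy(tls_slot slot)
{
    pthread_key_delete(slot);
}
#endif
```

Each thread sees NULL in a fresh slot until it stores its own pointer, so callers would typically check tls_slot_get for NULL and allocate on first use.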
>  
> If you are in an .exe, or a .dll that the .exe statically depends upon, or depend on Vista or newer, then __declspec(thread) is more convenient, faster, and I think has no limits. You just put that marker on your data and voila, the compiler, linker, and loader collaborate (as well as the context switcher, which must be involved in both mechanisms).
>  
>  
> That is, __declspec(thread) does not work in .dlls that are loaded via LoadLibrary, prior to Vista.
> This is a major downer, since __declspec(thread) is so preferable, though it matters less as time goes on.
>  
>  
> Non-Win32 systems often have "__thread" that is equivalent to __declspec(thread), probably without an equivalent dlopen limitation, but of varying efficiency -- really, read the spec, it is confusing, there are so many "models" for the linker/loader to worry about.
>  
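A small sketch of what "__thread" gives you, assuming GCC or Clang with pthreads (on MSVC the marker would be __declspec(thread) and the thread creation calls would differ). The names counter, bump, and run_counters are hypothetical and exist only for this illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* One independent copy of this variable exists per thread. */
static __thread int counter = 0;

/* Thread body: bump this thread's own copy and report its final value. */
static void *bump(void *arg)
{
    int *result = arg;
    int i;

    for (i = 0; i < 1000; i++)
        counter++;
    *result = counter;
    return NULL;
}

/* Hypothetical demo helper: runs two worker threads, returns 0 on success.
   out[0] and out[1] receive each worker's counter; out[2] receives the
   calling thread's copy, which the workers never touch. */
static int run_counters(int out[3])
{
    pthread_t a, b;

    counter = 7;
    if (pthread_create(&a, NULL, bump, &out[0]) != 0)
        return -1;
    if (pthread_create(&b, NULL, bump, &out[1]) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    out[2] = counter;
    return 0;
}
```

No locking is needed around counter because no two threads ever share a copy; each worker starts from the initializer value 0 regardless of what the calling thread stored.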
>  
> Current gcc emulates __thread in terms of "whatever" (tls or pthread) if there isn't anything better.
>  
>  
> I did some experiments though and __thread didn't seem all that much faster, at least once you go to the worst of the "models".
> The "models" cover cases such as whether the code is in the executable or in a shared object, and whether it is position-independent or not.
>  
>  
> m3core uses pthread_getspecific/setspecific -- like for every occurrence of "TRY" -- but those experiments were around having it use __thread if it is available.
> Ultimately we just want to dramatically reduce the use of that specific data anyway -- by having a stack walker or using whatever the underlying native exception handling mechanism is.
>  
>  
> There is surprisingly more to this subject. Imagine you have a few thread locals and you pack them together into a heap allocated struct/record.
> It is tempting to think DllMain(THREAD_ATTACH) is the place to do the allocation. It is, but threads can be created before your .dll is loaded, and you won't get notifications for them. So you are still left having to do the allocation on-demand. You are still left with "failure points" on every access. 
>  
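The on-demand pattern described above can be sketched as follows, using pthreads as a stand-in for the Win32 calls. This is an illustration under assumptions, not m3core's code: struct thread_state, its fields, and get_thread_state are hypothetical names. The point it shows is that the accessor can fail, so every caller carries a failure path.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread record packing several thread locals together. */
struct thread_state {
    int depth;
    void *handlers;
};

static pthread_key_t state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;
static int state_key_ok = 0;

/* Key destructor: runs at thread exit for threads that allocated a record. */
static void state_free(void *p)
{
    free(p);
}

/* Runs exactly once per process, whichever thread gets here first. */
static void state_key_init(void)
{
    state_key_ok = (pthread_key_create(&state_key, state_free) == 0);
}

/* Returns this thread's record, allocating it on first use.
   Returns NULL if the key or the memory could not be obtained. */
static struct thread_state *get_thread_state(void)
{
    struct thread_state *s;

    pthread_once(&state_once, state_key_init);
    if (!state_key_ok)
        return NULL;

    s = pthread_getspecific(state_key);
    if (s == NULL) {
        s = calloc(1, sizeof *s);
        if (s == NULL)
            return NULL;
        if (pthread_setspecific(state_key, s) != 0) {
            free(s);
            return NULL;
        }
    }
    return s;
}
```

Because allocation happens lazily in the accessor, threads created before this code was loaded are handled the same way as any other thread.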
>  
> Thread locals are also "fragile", nearly as much so as globals.
> That is, they get written when you don't realize it, and their value is "lost".
>  
>  
> What I mean, is, imagine this code:
>  
>  
> f = fopen(...);
> if (!f)
>   return errno;
>  
> errno is a thread local on all modern systems.
> Then imagine similar:
>  
>  
> f = fopen(...);
> if (!f)
>   return errno;
> g = fopen(...);
> if (!g)
> {
>   fclose(f); /* preferably in a destructor or finally! */
>   return errno;
> }
>  
>  
> See the bug? fclose might have clobbered errno.
>  
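One common way to avoid the clobbering is to copy errno into a local before making any further library call. A minimal sketch, where open_pair and its parameters are hypothetical names used only for illustration:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: opens two files for reading.
   Returns 0 on success, or the errno value from the fopen that failed. */
static int open_pair(const char *path_a, const char *path_b,
                     FILE **out_a, FILE **out_b)
{
    FILE *f;
    FILE *g;
    int saved;

    f = fopen(path_a, "r");
    if (!f)
        return errno;

    g = fopen(path_b, "r");
    if (!g) {
        saved = errno;  /* capture before any other library call */
        fclose(f);      /* may overwrite errno */
        return saved;
    }

    *out_a = f;
    *out_b = g;
    return 0;
}
```

The local variable saved is an ordinary function local, so nothing called in between can change it.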
>  
> As well, accessing thread locals is always going to be much slower than accessing a function local or parameter.
> You really should try to avoid them.
>  
>  
>  - Jay
> 
> 
>  
> > From: darko at darko.org
> > Date: Mon, 13 Sep 2010 22:08:19 -0700
> > To: dragisha at m3w.org
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] Per thread data
> > 
> > I'm guessing that the pthread calls wouldn't be portable to Win32? Which is ok if there are similar calls for Win threads.
> > 
> > 
> > On 13/09/2010, at 9:57 PM, Dragiša Durić wrote:
> > 
> > > > The only "better" option would be using the same storage as for MyId...
> > > > Access time would be the same as for that value, and if you use MyId,
> > > > access time is the access to TLS (thread local storage) plus your
> > > > structure manipulation. IMO, some IntRefTbl is a better idea than a list.
> > > > 
> > > > If you decide to use the same storage, then it's pthread.getspecific and
> > > > pthread.setspecific you are looking for.
> > > 
> > > On Mon, 2010-09-13 at 21:14 -0700, Darko wrote:
> > >> I need to have certain data structures allocated on a per thread basis. Right now I'm thinking of using the thread id from ThreadF.MyId() to index a list. Is there a better, more portable way of allocating on a per-thread basis?
> > >> 
> > >> Cheers,
> > >> Darko.
> > >> 
> > > 
> > > -- 
> > > Dragiša Durić <dragisha at m3w.org>
> > > 
> > 
