[M3devel] Per thread data
Jay K
jay.krell at cornell.edu
Wed Sep 15 04:23:04 CEST 2010
It's not a hash lookup. It is a direct array index.
Disassemble kernel32!TlsGetValue.
It's much cheaper than a hash lookup, but much more expensive than reading a global or local.
On Windows 7 x86:
\bin\x86\cdb cmd
0:000> u kernel32!TlsGetValue
kernel32!TlsGetValue:
750111cd ff2510080175 jmp dword ptr [kernel32!_imp__TlsGetValue (75010810)]
0:000> u poi(75010810)
KERNELBASE!TlsGetValue:
76532c95 8bff mov edi,edi
76532c97 55 push ebp
76532c98 8bec mov ebp,esp
76532c9a 64a118000000 mov eax,dword ptr fs:[00000018h] ; get TEB (per-thread data) base
76532ca0 8b4d08 mov ecx,dword ptr [ebp+8] ; ecx = index
76532ca3 83603400 and dword ptr [eax+34h],0 ; clear last error, so a 0 result can be told apart from failure
76532ca7 83f940 cmp ecx,40h ; compare index to 64
76532caa 7309 jae KERNELBASE!TlsGetValue+0x20 (76532cb5) ; if above or equal, goto 76532cb5
76532cac 8b8488100e0000 mov eax,dword ptr [eax+ecx*4+0E10h] ; get the actual value
76532cb3 eb14 jmp KERNELBASE!TlsGetValue+0x34 (76532cc9) ; goto end
76532cb5 81f940040000 cmp ecx,440h ; compare to 1088
76532cbb 7210 jb KERNELBASE!TlsGetValue+0x38 (76532ccd) ; if below, goto 76532ccd
76532cbd 680d0000c0 push 0C000000Dh ; STATUS_INVALID_PARAMETER
76532cc2 e86b390200 call KERNELBASE!BaseSetLastNTError (76556632)
76532cc7 33c0 xor eax,eax ; return 0 for failure, or if TlsSetValue was never called
76532cc9 5d pop ebp
76532cca c20400 ret 4
76532ccd 8b80940f0000 mov eax,dword ptr [eax+0F94h] ; get the base of the expansion slots for indices >= 64
76532cd3 85c0 test eax,eax ; compare to null
76532cd5 74f0 je KERNELBASE!TlsGetValue+0x32 (76532cc7) ; if null, goto 76532cc7, which returns 0; this happens if you have called TlsAlloc but not TlsSetValue
76532cd7 8b848800ffffff mov eax,dword ptr [eax+ecx*4-100h] ; get the value for index >= 64 (subtracting 64*4)
76532cde ebe9 jmp KERNELBASE!TlsGetValue+0x34 (76532cc9) ; goto end
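To make the dispatch in that disassembly concrete, here is a minimal C model of the lookup. It assumes a simplified per-thread block: the struct and `model_tls_get_value` are hypothetical and do not reproduce the real TEB layout or error codes. They only capture the 64 inline slots, the 1024 expansion slots (0x440 - 0x40), and the last-error handling read off the listing.

```c
#include <stddef.h>

/* Hypothetical model, NOT the real TEB layout. Slot counts come from the
   disassembly: 64 inline slots (cmp ecx,40h) and 1024 expansion slots
   (cmp ecx,440h, i.e. 1088 in total). */
#define TLS_INLINE_SLOTS 64
#define TLS_TOTAL_SLOTS 1088
#define MODEL_INVALID_PARAMETER 1 /* placeholder error code for the model */

typedef struct {
    unsigned long last_error;             /* like [teb+34h] */
    void *inline_slots[TLS_INLINE_SLOTS]; /* like [teb+0E10h] */
    void **expansion_slots;               /* like [teb+0F94h]; NULL until set up */
} thread_block;

void *model_tls_get_value(thread_block *t, unsigned index) {
    t->last_error = 0;                    /* a 0 result is not an error by itself */
    if (index < TLS_INLINE_SLOTS)
        return t->inline_slots[index];    /* common case: one indexed load */
    if (index >= TLS_TOTAL_SLOTS) {
        t->last_error = MODEL_INVALID_PARAMETER;
        return NULL;                      /* out of range: fail */
    }
    if (t->expansion_slots == NULL)
        return NULL;                      /* no expansion slot set yet */
    return t->expansion_slots[index - TLS_INLINE_SLOTS];
}
```

Both branches are plain array indexing. No hashing or searching happens anywhere on the path.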
But your proposal might be reasonable anyway.
Except, wouldn't Thread.T have to be revealed in an .i3 file?
- Jay
> From: darko at darko.org
> Date: Tue, 14 Sep 2010 17:38:20 -0700
> To: jay.krell at cornell.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Per thread data
>
> The issue I see is performance. A thread-local API like that requires at least a hash lookup and will perform nothing like a global variable.
>
> I'd like to change the Thread interface so that Fork takes a typecode parameter, which must be a subtype of Thread.T, and allocates an object of that type if specified. Assuming Thread.Self() is not slow, that should perform much better. Anyone see any problems with that?
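One way to picture this proposal in C, as a sketch only. `thread_obj` stands in for Thread.T, a struct size stands in for the typecode, and a `_Thread_local` pointer stands in for however Thread.Self() finds the current thread. All of these names are hypothetical. The idea is that per-thread data lives in the thread object itself, so reaching it costs one Self() call plus a downcast.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for Thread.T. */
typedef struct thread_obj {
    pthread_t handle;
    void (*apply)(void);
} thread_obj;

/* Current thread's object; stands in for how Thread.Self() finds it. */
static _Thread_local thread_obj *current_thread;

thread_obj *thread_self(void) { return current_thread; }

static void *trampoline(void *arg) {
    current_thread = arg;              /* set once when the thread starts */
    current_thread->apply();
    return NULL;
}

/* 'size' plays the role of the typecode: it must be the size of a struct
   whose first member is a thread_obj (the "subtype of Thread.T" rule). */
thread_obj *thread_fork(size_t size, void (*apply)(void)) {
    thread_obj *t = calloc(1, size);   /* zeroed, like a fresh object */
    if (t == NULL) return NULL;
    t->apply = apply;
    if (pthread_create(&t->handle, NULL, trampoline, t) != 0) {
        free(t);
        return NULL;
    }
    return t;
}

/* Example "subtype": per-thread data stored in the thread object itself. */
typedef struct {
    thread_obj base;                   /* must be the first member */
    int counter;
} counting_thread;

static void count_to_ten(void) {
    counting_thread *me = (counting_thread *)thread_self(); /* like NARROW */
    for (int i = 0; i < 10; i++) me->counter++;             /* no lock needed */
}

/* Forks two threads and returns the sum of their counters, or -1. */
int demo_fork_with_subtype(void) {
    counting_thread *t[2];
    int total = 0;
    for (int i = 0; i < 2; i++)
        t[i] = (counting_thread *)thread_fork(sizeof(counting_thread), count_to_ten);
    for (int i = 0; i < 2; i++) {
        if (t[i] == NULL) { total = -1; continue; }
        pthread_join(t[i]->base.handle, NULL);
        if (total >= 0) total += t[i]->counter;
        free(t[i]);
    }
    return total;
}
```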
>
>
> On 14/09/2010, at 7:05 AM, Jay K wrote:
>
> >
> > Eh? Just one thread local for the entire process? I think not.
> >
> > More like:
> >
> > PROCEDURE AllocateThreadLocal(): INTEGER;
> > PROCEDURE GetThreadLocal(index: INTEGER): REFANY;
> > PROCEDURE SetThreadLocal(index: INTEGER; value: REFANY);
> >
> >
> > or ThreadLocalAllocate, ThreadLocalGet, ThreadLocalSet.
> > The first set of names sounds better; the second "scales" better.
> > This seems like a constant dilemma.
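Below is a minimal C sketch of the three-procedure interface. It assumes a fixed capacity of 64 slots and C11 `_Thread_local`. The C names mirror the proposed Modula-3 procedures but are otherwise hypothetical. A lookup is a single array index, as in the TlsGetValue fast path. A collector that does not scan this per-thread array would not see references stored in it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREAD_LOCALS 64   /* arbitrary fixed capacity for this sketch */

static atomic_int next_index;                         /* shared index allocator */
static _Thread_local void *slots[MAX_THREAD_LOCALS];  /* one array per thread */

/* Returns a fresh index valid in every thread, or -1 when capacity runs out. */
int AllocateThreadLocal(void) {
    int i = atomic_fetch_add(&next_index, 1);
    return i < MAX_THREAD_LOCALS ? i : -1;
}

/* 'index' must come from AllocateThreadLocal; unset slots read as NULL. */
void *GetThreadLocal(int index) { return slots[index]; }

void SetThreadLocal(int index, void *value) { slots[index] = value; }

/* Helper for checking isolation: reads a slot from a brand-new thread. */
static void *peek(void *arg) { return GetThreadLocal(*(int *)arg); }

/* Returns 1 if a new thread sees NULL in the slot, 0 if not, -1 on error. */
int fresh_thread_sees_null(int index) {
    pthread_t t;
    void *seen = NULL;
    if (pthread_create(&t, NULL, peek, &index) != 0) return -1;
    pthread_join(t, &seen);
    return seen == NULL;
}
```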
> >
> > btw, important point I just remembered: unless you do extra work,
> > thread locals are hidden from the garbage collector.
> >
> > This is why the thread implementations seemingly store extra data.
> > The traced data is in globals, so the garbage collector can see them.
> >
> > - Jay
> >
> > ________________________________
> >> From: darko at darko.org
> >> Date: Tue, 14 Sep 2010 06:13:26 -0700
> >> To: jay.krell at cornell.edu
> >> CC: m3devel at elegosoft.com
> >> Subject: Re: [M3devel] Per thread data
> >>
> >> I think a minimalist approach where you get to store and retrieve one
> >> traced reference per thread would do the trick. If people want more
> >> they can design their own abstraction on top of that. Maybe just add
> >> the following to the Thread interface:
> >>
> >> PROCEDURE GetPrivate(): REFANY;
> >> PROCEDURE SetPrivate(ref: REFANY);
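Here is a sketch of the one-slot design in C. A `_Thread_local` pointer stands in for the proposed GetPrivate/SetPrivate, and those C names are hypothetical. The `per_thread_objects` record shows the kind of user abstraction that could sit on top: each thread lazily creates one record that holds that thread's copies.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical C stand-ins for Thread.GetPrivate / Thread.SetPrivate. */
static _Thread_local void *thread_private;

void *GetPrivate(void) { return thread_private; }
void SetPrivate(void *ref) { thread_private = ref; }

/* User-level abstraction on the single slot: one record per thread holding
   that thread's copy of each object (field names are illustrative). */
typedef struct {
    int parser_state;
    int formatter_state;
} per_thread_objects;

/* Returns the calling thread's record, creating it on first use
   (NULL only if allocation fails). */
per_thread_objects *my_objects(void) {
    per_thread_objects *p = GetPrivate();
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        SetPrivate(p);
    }
    return p;
}
```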
> >>
> >>
> >> On 14/09/2010, at 5:59 AM, Jay K wrote:
> >>
> >> Tony -- then why do pthread_getspecific/setspecific and Win32 TLS exist?
> >> Which language without heap allocation were they designed to support?
> >> They exist because code often fails to pass all the parameters through
> >> all functions.
> >>
> >> Again the best current answer is:
> >> #ifdefed C that uses pthread_get/setspecific / Win32
> >> TlsAlloc/GetValue/SetValue, ignoring user threads/OpenBSD.
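This is a minimal sketch of that #ifdefed C. Like the suggestion, it ignores user threads. The wrapper names (`tls_key`, `tls_key_create`, `tls_get`, `tls_set`) are hypothetical. The calls underneath are the standard `pthread_key_create`/`pthread_getspecific`/`pthread_setspecific` and Win32 `TlsAlloc`/`TlsGetValue`/`TlsSetValue`. Error handling is reduced to 0/-1.

```c
#ifdef _WIN32
#include <windows.h>
typedef DWORD tls_key;

static int tls_key_create(tls_key *key) {
    *key = TlsAlloc();
    return *key == TLS_OUT_OF_INDEXES ? -1 : 0;
}
static void *tls_get(tls_key key) { return TlsGetValue(key); }
static int tls_set(tls_key key, void *value) {
    return TlsSetValue(key, value) ? 0 : -1;
}
#else
#include <pthread.h>
typedef pthread_key_t tls_key;

static int tls_key_create(tls_key *key) {
    return pthread_key_create(key, NULL) == 0 ? 0 : -1;  /* no destructor */
}
static void *tls_get(tls_key key) { return pthread_getspecific(key); }
static int tls_set(tls_key key, void *value) {
    return pthread_setspecific(key, value) == 0 ? 0 : -1;
}
#endif
```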
> >>
> >> As well, you'd get very very far with merely:
> >> #ifdef _WIN32
> >> __declspec(thread)
> >> #else
> >> __thread
> >> #endif
> >>
> >> Those work adequately for many many purposes, are more efficient, much
> >> more convenient, and very portable.
> >> I believe there is even an "official" C++ proposal along these lines.
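This small demonstration shows the keyword approach. On the POSIX side it assumes GCC/Clang `__thread`. The driver uses pthreads, so as written only the non-Windows branch actually runs. `THREAD_LOCAL`, `calls`, and `demo_thread_local` are illustrative names. Each thread increments its own copy without locking, and the main thread's copy stays untouched.

```c
#include <pthread.h>

#ifdef _WIN32
#define THREAD_LOCAL __declspec(thread)
#else
#define THREAD_LOCAL __thread
#endif

/* Each thread gets its own zero-initialized copy. */
static THREAD_LOCAL int calls;

static void *bump_three_times(void *out) {
    for (int i = 0; i < 3; i++)
        calls++;                 /* no lock: this copy belongs to one thread */
    *(int *)out = calls;
    return NULL;
}

/* Runs two threads that each record their own count in results[]; returns
   the main thread's count (still 0) or -1 if a thread could not start. */
int demo_thread_local(int results[2]) {
    pthread_t t[2];
    int created = 0;
    while (created < 2 &&
           pthread_create(&t[created], NULL, bump_three_times, &results[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created == 2 ? calls : -1;
}
```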
> >>
> >> We could easily abstract this -- the first -- into Modula-3 and then
> >> support it on user threads as well.
> >> Can anyone propose something?
> >> It has to go in m3core, as that is the only code that (is supposed to)
> >> know which thread implementation is in use.
> >>
> >> - Jay
> >>
> >>
> >>> From: darko at darko.org
> >>> Date: Tue, 14 Sep 2010 05:34:59 -0700
> >>> To: hosking at cs.purdue.edu
> >>> CC: m3devel at elegosoft.com
> >>> Subject: Re: [M3devel] Per thread data
> >>>
> >>> That's the idea, but each object can only call another object
> >>> allocated for the same thread, so it needs to find the currently
> >>> running thread's copy of the desired object.
> >>>
> >>> On 14/09/2010, at 5:08 AM, Tony Hosking wrote:
> >>>
> >>>> If they are truly private to each thread, then allocating them in
> >>>> the heap while still not locking them would be adequate. Why not?
> >>>>
> >>>> On 14 Sep 2010, at 01:08, Darko wrote:
> >>>>
> >>>>> I have lots of objects that are implemented on the basis that no
> >>>>> calls on them can be re-entered, which also avoids the need to lock
> >>>>> them in a threaded environment (locking them would be impractical).
> >>>>> The result is that I need one copy of each object in each thread.
> >>>>> There is approximately one allocated object per object type, so space
> >>>>> is not a big issue. I'm looking at a small number of threads, probably
> >>>>> a maximum of two per processor core. With modern processors I'm
> >>>>> assuming that a linear search through a small array is actually
> >>>>> quicker than a hash table.
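A C sketch of the linear-search idea, with a lock around the shared table, might look like this. It rests on several assumptions. `widget`, `my_widget`, and the 16-entry cap are hypothetical. `pthread_self()` stands in for a thread id such as ThreadF.MyId(). Entries are never removed, so a recycled thread ID would inherit an old entry.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 16                 /* small, fixed number of threads */

typedef struct { int state; } widget;  /* hypothetical per-thread object */

static struct { pthread_t owner; widget *obj; } table[MAX_THREADS];
static int used;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the calling thread's widget, creating it on first use;
   NULL if the table is full or allocation fails. */
widget *my_widget(void) {
    pthread_t me = pthread_self();
    widget *result = NULL;
    pthread_mutex_lock(&table_lock);   /* the table itself is shared */
    for (int i = 0; i < used; i++)     /* linear search, no hashing */
        if (pthread_equal(table[i].owner, me)) { result = table[i].obj; break; }
    if (result == NULL && used < MAX_THREADS &&
        (result = calloc(1, sizeof *result)) != NULL) {
        table[used].owner = me;
        table[used].obj = result;
        used++;
    }
    pthread_mutex_unlock(&table_lock);
    return result;
}

static void *grab(void *out) { *(widget **)out = my_widget(); return NULL; }

/* Returns 1 if another thread gets a different widget, 0 if not, -1 on error. */
int other_thread_gets_own_copy(void) {
    widget *mine = my_widget(), *theirs = NULL;
    pthread_t t;
    if (pthread_create(&t, NULL, grab, &theirs) != 0) return -1;
    pthread_join(t, NULL);
    return theirs != NULL && theirs != mine;
}
```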
> >>>>>
> >>>>> On 13/09/2010, at 9:55 PM, Mika Nystrom wrote:
> >>>>>
> >>>>>> Darko writes:
> >>>>>>> I need to have certain data structures allocated on a per-thread
> >>>>>>> basis. Right now I'm thinking of using the thread id from
> >>>>>>> ThreadF.MyId() to index a list. Is there a better, more portable way
> >>>>>>> of allocating on a per-thread basis?
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Darko.
> >>>>>>
> >>>>>> In my experience what you suggest works just fine (remember to lock the
> >>>>>> doors, though!). But you can get disappointing performance on some thread
> >>>>>> implementations (ones that involve switching into supervisor mode more
> >>>>>> than necessary when accessing pthread structures).
> >>>>>>
> >>>>>> Generally speaking I avoid needing per-thread structures as much as possible
> >>>>>> and instead put what you need in the Closure and then pass pointers around.
> >>>>>> Of course you can mix the methods for a compromise between speed and
> >>>>>> cluttered code...
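The closure-passing style avoids any per-thread lookup at all. In this C sketch, whose names are hypothetical, each thread's state lives in a `worker_ctx` handed to `pthread_create`. Helper functions receive that pointer explicitly instead of finding it through thread-local storage.

```c
#include <pthread.h>

/* Hypothetical per-thread context: the C analogue of fields on a Closure. */
typedef struct {
    int id;
    long sum;          /* scratch state owned by exactly one thread */
} worker_ctx;

/* Helpers take the context explicitly instead of looking it up. */
static void accumulate(worker_ctx *ctx, int value) { ctx->sum += value; }

static void *worker(void *arg) {
    worker_ctx *ctx = arg;
    for (int i = 1; i <= 100; i++)
        accumulate(ctx, i * ctx->id);
    return NULL;
}

/* Runs two workers and returns the combined sum, or -1 on thread failure. */
long run_workers(void) {
    worker_ctx ctx[2] = {{1, 0}, {2, 0}};
    pthread_t t[2];
    int created = 0;
    while (created < 2 &&
           pthread_create(&t[created], NULL, worker, &ctx[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created == 2 ? ctx[0].sum + ctx[1].sum : -1;
}
```

The cost is the clutter mentioned above: every function on the path needs the extra parameter.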
> >>>>>>
> >>>>>> I think what you want is also not a list but a Table.
> >>>>>>
> >>>>>> Mika
> >>>>>
> >>>>
> >>>
> >>
> >
>