<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style>
</head>
<body class='hmmessage'>
It's not a hash lookup. It is a direct array index.<BR>
Disassemble kernel32!TlsGetValue.<BR>
It's much cheaper than a hash lookup, but much more expensive than reading a global or local.<BR>
<BR>
<BR>
on Windows 7 x86:<BR>
<BR>
\bin\x86\cdb cmd<BR>
0:000> u kernel32!TlsGetValue<BR>
kernel32!TlsGetValue:<BR>
750111cd ff2510080175 jmp dword ptr [kernel32!_imp__TlsGetValue (75010810)]<BR>
<BR>
0:000> u poi(75010810)<BR>
KERNELBASE!TlsGetValue:<BR>
76532c95 8bff mov edi,edi<BR>
76532c97 55 push ebp<BR>
76532c98 8bec mov ebp,esp<BR>
76532c9a 64a118000000 mov eax,dword ptr fs:[00000018h] ; get per-thread data base (the TEB)<BR>
76532ca0 8b4d08 mov ecx,dword ptr [ebp+8] ; ecx = the TLS index argument<BR>
76532ca3 83603400 and dword ptr [eax+34h],0 ; clear the thread's last error<BR>
76532ca7 83f940 cmp ecx,40h ; compare index to 64<BR>
76532caa 7309 jae KERNELBASE!TlsGetValue+0x20 (76532cb5) ; if above or equal, goto 76532cb5<BR>
76532cac 8b8488100e0000 mov eax,dword ptr [eax+ecx*4+0E10h] ; get the actual value<BR>
76532cb3 eb14 jmp KERNELBASE!TlsGetValue+0x34 (76532cc9) ; goto end<BR>
76532cb5 81f940040000 cmp ecx,440h ; compare to 1088<BR>
76532cbb 7210 jb KERNELBASE!TlsGetValue+0x38 (76532ccd) ; if below, goto 76532ccd<BR>
76532cbd 680d0000c0 push 0C000000Dh ; invalid parameter<BR>
76532cc2 e86b390200 call KERNELBASE!BaseSetLastNTError (76556632)<BR>
76532cc7 33c0 xor eax,eax ; return 0 for failure or for TlsSetValue not called<BR>
76532cc9 5d pop ebp<BR>
76532cca c20400 ret 4<BR>
76532ccd 8b80940f0000 mov eax,dword ptr [eax+0F94h] ; get data base for indices >= 64<BR>
76532cd3 85c0 test eax,eax ; compare to null<BR>
76532cd5 74f0 je KERNELBASE!TlsGetValue+0x32 (76532cc7) ; if null, goto 76532cc7, which returns 0; this happens if you have called TlsAlloc but not TlsSetValue<BR>
76532cd7 8b848800ffffff mov eax,dword ptr [eax+ecx*4-100h] ; get the value for index >= 64 (subtracting 64*4)<BR>
76532cde ebe9 jmp KERNELBASE!TlsGetValue+0x34 (76532cc9) ; goto end<BR><BR>
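In C terms, the listing above does roughly the following. This is only a sketch: ModelThreadBlock, its field names, and model_tls_get_value are hypothetical names made up for illustration, not the real Windows structures or API, and the model just stores the raw status code where the real code calls BaseSetLastNTError.<BR>
<pre>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread block that fs:[18h] points at.
   Only the fields the disassembly touches are modeled; the real layout
   and offsets are not reproduced here. */
enum {
    INLINE_SLOTS    = 0x40,                      /* 64 slots stored inline (cmp ecx,40h) */
    TOTAL_SLOTS     = 0x440,                     /* 1088 valid indices (cmp ecx,440h) */
    EXPANSION_SLOTS = TOTAL_SLOTS - INLINE_SLOTS /* 1024 slots in the side array */
};

/* The status value pushed before the call to BaseSetLastNTError. */
#define MODEL_INVALID_PARAMETER 0xC000000Du

typedef struct ModelThreadBlock {
    uint32_t last_error;                  /* cleared on entry: and [eax+34h],0 */
    void    *inline_slots[INLINE_SLOTS];  /* read via [eax+ecx*4+0E10h] */
    void   **expansion_slots;             /* read via [eax+0F94h]; NULL until allocated */
} ModelThreadBlock;

/* Model of the control flow in the listing: no hashing, just bounds
   checks and an indexed load. */
void *model_tls_get_value(ModelThreadBlock *tb, uint32_t index)
{
    assert(tb != NULL);
    tb->last_error = 0;

    if (index < INLINE_SLOTS)
        return tb->inline_slots[index];   /* fast path: one indexed load */

    if (index >= TOTAL_SLOTS) {
        /* Assumption: the model keeps the raw status; the real code
           hands it to BaseSetLastNTError. */
        tb->last_error = MODEL_INVALID_PARAMETER;
        return NULL;
    }

    if (tb->expansion_slots == NULL)
        return NULL;                      /* slot allocated but never set */

    /* Same as [eax+ecx*4-100h]: skip the first 64 indices. */
    return tb->expansion_slots[index - INLINE_SLOTS];
}
```
</pre>
The common case (index below 64) is two compares and one indexed load, which is why it beats a hash lookup but still costs more than reading a global or local.<BR>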
<BR>
<BR>
But your proposal might be reasonable anyway.<BR>
Except, wouldn't Thread.T have to be revealed in an .i3 file?<BR>
<BR>
<BR>
- Jay<BR><BR><BR> <BR>
> From: darko@darko.org<BR>> Date: Tue, 14 Sep 2010 17:38:20 -0700<BR>> To: jay.krell@cornell.edu<BR>> CC: m3devel@elegosoft.com<BR>> Subject: Re: [M3devel] Per thread data<BR>> <BR>> The issue I see is performance. That requires at least a hash lookup and will have performance nothing like a global variable.<BR>> <BR>> I'd like to change the Thread interface so that Fork takes a parameter of a typecode which must be a subtype of Thread.T and allocates that if specified. Assuming Thread.Self() is not slow that should perform much better. Anyone see any problems with that?<BR>> <BR>> <BR>> On 14/09/2010, at 7:05 AM, Jay K wrote:<BR>> <BR>> > <BR>> > Eh? Just one thread local for the entire process? I think not.<BR>> > <BR>> > More like:<BR>> > <BR>> > PROCEDURE AllocateThreadLocal(): INTEGER;<BR>> > PROCEDURE GetThreadLocal(INTEGER):REFANY;<BR>> > <BR>> > PROCEDURE SetThreadLocal(INTEGER;REFANY);<BR>> > <BR>> > <BR>> > or ThreadLocalAllocate, ThreadLocalGet, ThreadLocalSet.<BR>> > The first set of names sounds better, the second "scales" better.<BR>> > This seems like a constant dilemma.<BR>> > <BR>> > btw, important point I just remembered: unless you do extra work,<BR>> > thread locals are hidden from the garbage collector.<BR>> > <BR>> > This is why the thread implementations seemingly store extra data.<BR>> > The traced data is in globals, so the garbage collector can see them.<BR>> > <BR>> > - Jay<BR>> > <BR>> > ________________________________<BR>> >> From: darko@darko.org<BR>> >> Date: Tue, 14 Sep 2010 06:13:26 -0700<BR>> >> To: jay.krell@cornell.edu<BR>> >> CC: m3devel@elegosoft.com<BR>> >> Subject: Re: [M3devel] Per thread data<BR>> >> <BR>> >> I think a minimalist approach where you get to store and retrieve one<BR>> >> traced reference per thread would do the trick. If people want more<BR>> >> they can design their own abstraction on top of that. 
Maybe just add<BR>> >> the following to the Thread interface:<BR>> >> <BR>> >> PROCEDURE GetPrivate(): REFANY;<BR>> >> PROCEDURE SetPrivate(ref: REFANY);<BR>> >> <BR>> >> <BR>> >> On 14/09/2010, at 5:59 AM, Jay K wrote:<BR>> >> <BR>> >> Tony -- then why does pthread_get/setspecific and Win32 TLS exist?<BR>> >> What language doesn't support heap allocation were they designed to support?<BR>> >> It is because code often fails to pass all the parameters through all<BR>> >> functions.<BR>> >> <BR>> >> Again the best current answer is:<BR>> >> #ifdefed C that uses pthread_get/setspecific / Win32<BR>> >> TlsAlloc/GetValue/SetValue, ignoring user threads/OpenBSD.<BR>> >> <BR>> >> As well, you'd get very very far with merely:<BR>> >> #ifdef _WIN32<BR>> >> __declspec(thread)<BR>> >> #else<BR>> >> __thread<BR>> >> #endif<BR>> >> <BR>> >> Those work adequately for many many purposes, are more efficient, much<BR>> >> more convenient, and very portable.<BR>> >> I believe there is even an "official" C++ proposal along these lines.<BR>> >> <BR>> >> We could easily abstract this -- the first -- into Modula-3 and then<BR>> >> support it on user threads as well.<BR>> >> Can anyone propose something?<BR>> >> It has to go in m3core, as that is the only code that (is supposed to)<BR>> >> know which thread implementation is in use.<BR>> >> <BR>> >> - Jay<BR>> >> <BR>> >> <BR>> >>> From: darko@darko.org<BR>> >>> Date: Tue, 14 Sep 2010 05:34:59 -0700<BR>> >>> To: hosking@cs.purdue.edu<BR>> >>> CC: m3devel@elegosoft.com<BR>> >>> Subject: Re: [M3devel] Per thread data<BR>> >>> <BR>> >>> That's the idea but each object can only call another object<BR>> >> allocated for the same thread, so it needs to find the currently<BR>> >> running thread's copy of the desired object.<BR>> >>> <BR>> >>> On 14/09/2010, at 5:08 AM, Tony Hosking wrote:<BR>> >>> <BR>> >>>> If they are truly private to each thread, then allocating them in<BR>> >> the heap while still not locking them would be adequate. 
Why not?<BR>> >>>> <BR>> >>>> On 14 Sep 2010, at 01:08, Darko wrote:<BR>> >>>> <BR>> >>>>> I have lots of objects that are implemented on the basis that no<BR>> >> calls on them can be re-entered, which also avoids the need for locking<BR>> >> them in a threaded environment, which is impractical. The result is<BR>> >> that I need one copy of each object in each thread. There is<BR>> >> approximately one allocated object per object type so space is not a<BR>> >> big issue. I'm looking at a small number of threads, probably maximum<BR>> >> two per processor core. With modern processors I'm assuming that a<BR>> >> linear search through a small array is actually quicker than a hash<BR>> >> table.<BR>> >>>>> <BR>> >>>>> On 13/09/2010, at 9:55 PM, Mika Nystrom wrote:<BR>> >>>>> <BR>> >>>>>> Darko writes:<BR>> >>>>>>> I need to have certain data structures allocated on a per thread<BR>> >> basis.<BR>> >>>>>>> Right now I'm thinking of using the thread id from ThreadF.MyId() to<BR>> >>>>>>> index a list. Is there a better, more portable way of allocating<BR>> >> on a<BR>> >>>>>>> per-thread basis?<BR>> >>>>>>> <BR>> >>>>>>> Cheers,<BR>> >>>>>>> Darko.<BR>> >>>>>> <BR>> >>>>>> In my experience what you suggest works just fine (remember to lock the<BR>> >>>>>> doors, though!) But you can get disappointing performance on some<BR>> >> thread<BR>> >>>>>> implementations (ones that involve switching into supervisor mode more<BR>> >>>>>> than necessary when accessing pthread structures).<BR>> >>>>>> <BR>> >>>>>> Generally speaking I avoid needing per-thread structures as much<BR>> >> as possible<BR>> >>>>>> and instead put what you need in the Closure and then pass<BR>> >> pointers around.<BR>> >>>>>> Of course you can mix the methods for a compromise between speed and<BR>> >>>>>> cluttered code...<BR>> >>>>>> <BR>> >>>>>> I think what you want is also not a list but a Table.<BR>> >>>>>> <BR>> >>>>>> Mika<BR>> >>>>> <BR>> >>>> <BR>> >>> <BR>> >> <BR>> > <BR>> <BR> </body>
</html>