<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>No, only the subtype has to be revealed. I think both approaches have their usefulness, the Get/Set is useful for libraries.</div><br><div><div>On 14/09/2010, at 7:23 PM, Jay K wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><div class="hmmessage" style="font-size: 10pt; font-family: Tahoma; ">It's not a hash lookup. It is a direct array index.<br>Disassemble kernel32!TlsGetValue.<br>It's much cheaper than a hash lookup, much more expensive than reading a global or local.<br> <br> <br>on Windows 7 x86:<br> <br>\bin\x86\cdb cmd<br>0:000> u kernel32!TlsGetValue<br>kernel32!TlsGetValue:<br>750111cd ff2510080175    jmp     dword ptr [kernel32!_imp__TlsGetValue (75010810)]<br><br>0:000> u poi(75010810)<br>KERNELBASE!TlsGetValue:<br>76532c95 8bff            mov     edi,edi<br>76532c97 55              push    ebp<br>76532c98 8bec            mov     ebp,esp<br>76532c9a 64a118000000    mov     eax,dword ptr fs:[00000018h] ; get per thread data base<br>76532ca0 8b4d08          mov     ecx,dword ptr [ebp+8]<br>76532ca3 83603400        and     dword ptr [eax+34h],0<br>76532ca7 83f940          cmp     ecx,40h ; compare index to 64<br>76532caa 7309            jae     KERNELBASE!TlsGetValue+0x20 (76532cb5) ; if above, goto 76532cb5<br>76532cac 8b8488100e0000  mov     eax,dword ptr [eax+ecx*4+0E10h] ; get the actual value<br>76532cb3 
eb14            jmp     KERNELBASE!TlsGetValue+0x34 (76532cc9) ; goto end<br>76532cb5 81f940040000    cmp     ecx,440h ; compare to 1088<br>76532cbb 7210            jb      KERNELBASE!TlsGetValue+0x38 (76532ccd) ; if below, goto 76532ccd<br>76532cbd 680d0000c0      push    0C000000Dh  ; invalid parameter<br>76532cc2 e86b390200      call    KERNELBASE!BaseSetLastNTError (76556632)<br>76532cc7 33c0            xor     eax,eax ; return 0 for failure or for TlsSetValue not called<span class="Apple-converted-space"> </span><br>76532cc9 5d              pop     ebp<br>76532cca c20400          ret     4<br>76532ccd 8b80940f0000    mov     eax,dword ptr [eax+0F94h] ; get data base for indices >= 64<br>76532cd3 85c0            test    eax,eax ; compare to null<br>76532cd5 74f0            je      KERNELBASE!TlsGetValue+0x32 (76532cc7) ; if null, goto 76532cc7, which returns 0, this is if you have called TlsAlloc but not TlsSetValue<br>76532cd7 8b848800ffffff  mov     eax,dword ptr [eax+ecx*4-100h] ; get the value for index >= 64 (subtracting 64*4)<br>76532cde ebe9            jmp     KERNELBASE!TlsGetValue+0x34 (76532cc9) ; goto end<br><br> <br> <br>But your proposal might be reasonable anyway.<br>Except, wouldn't Thread.T have to be revealed in an .i3 file?<br> <br> <br> - Jay<br><br><br> <br>> From:<span class="Apple-converted-space"> </span><a href="mailto:darko@darko.org">darko@darko.org</a><br>> Date: Tue, 14 Sep 2010 17:38:20 -0700<br>> To:<span class="Apple-converted-space"> </span><a href="mailto:jay.krell@cornell.edu">jay.krell@cornell.edu</a><br>> CC:<span class="Apple-converted-space"> </span><a href="mailto:m3devel@elegosoft.com">m3devel@elegosoft.com</a><br>> Subject: Re: [M3devel] Per thread data<br>><span class="Apple-converted-space"> </span><br>> The issue I see is performance. 
That requires at least a hash lookup and will have performance nothing like a global variable.<br>><span class="Apple-converted-space"> </span><br>> I'd like to change the Thread interface so that Fork takes a parameter of a typecode which must be a subtype of Thread.T and allocates that if specified. Assuming Thread.Self() is not slow that should perform much better. Anyone see any problems with that?<br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> On 14/09/2010, at 7:05 AM, Jay K wrote:<br>><span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>> > Eh? Just one thread local for the entire process? I think not.<br>> ><span class="Apple-converted-space"> </span><br>> > More like:<br>> ><span class="Apple-converted-space"> </span><br>> > PROCEDURE AllocateThreadLocal(): INTEGER;<br>> > PROCEDURE GetThreadLocal(INTEGER):REFANY;<br>> ><span class="Apple-converted-space"> </span><br>> > PROCEDURE SetThreadLocal(INTEGER;REFANY);<br>> ><span class="Apple-converted-space"> </span><br>> ><span class="Apple-converted-space"> </span><br>> > or ThreadLocalAllocate, ThreadLocalGet, ThreadLocalSet.<br>> > The first set of names sounds better, the second "scales" better.<br>> > This seems like a constant dilemma.<br>> ><span class="Apple-converted-space"> </span><br>> > btw, important point I just remembered: unless you do extra work,<br>> > thread locals are hidden from the garbage collector.<br>> ><span class="Apple-converted-space"> </span><br>> > This is why the thread implementations seemingly store extra data.<br>> > The traced data is in globals, so the garbage collector can see them.<br>> ><span class="Apple-converted-space"> </span><br>> > - Jay<br>> ><span class="Apple-converted-space"> </span><br>> > ________________________________<br>> >> From:<span class="Apple-converted-space"> </span><a href="mailto:darko@darko.org">darko@darko.org</a><br>> >> Date: Tue, 14 Sep 2010 
06:13:26 -0700<br>> >> To:<span class="Apple-converted-space"> </span><a href="mailto:jay.krell@cornell.edu">jay.krell@cornell.edu</a><br>> >> CC:<span class="Apple-converted-space"> </span><a href="mailto:m3devel@elegosoft.com">m3devel@elegosoft.com</a><br>> >> Subject: Re: [M3devel] Per thread data<br>> >><span class="Apple-converted-space"> </span><br>> >> I think a minimalist approach where you get to store and retrieve one<br>> >> traced reference per thread would do the trick. If people want more<br>> >> they can design their own abstraction on top of that. Maybe just add<br>> >> the following to the Thread interface:<br>> >><span class="Apple-converted-space"> </span><br>> >> PROCEDURE GetPrivate(): REFANY;<br>> >> PROCEDURE SetPrivate(ref: REFANY);<br>> >><span class="Apple-converted-space"> </span><br>> >><span class="Apple-converted-space"> </span><br>> >> On 14/09/2010, at 5:59 AM, Jay K wrote:<br>> >><span class="Apple-converted-space"> </span><br>> >> Tony -- then why does pthread_get/setspecific and Win32 TLS exist?<br>> >> What language doesn't support heap allocation were they designed to support?<br>> >> It is because code often fails to pass all the parameters through all<br>> >> functions.<br>> >><span class="Apple-converted-space"> </span><br>> >> Again the best current answer is:<br>> >> #ifdefed C that uses pthread_get/setspecific / Win32<br>> >> TlsAlloc/GetValue/SetValue, ignoring user threads/OpenBSD.<br>> >><span class="Apple-converted-space"> </span><br>> >> As well, you'd get very very far with merely:<br>> >> #ifdef _WIN32<br>> >> __declspec(thread)<br>> >> #else<br>> >> __thread<br>> >> #endif<br>> >><span class="Apple-converted-space"> </span><br>> >> Those work adequately for many many purposes, are more efficient, much<br>> >> more convenient, and very portable.<br>> >> I believe there is even an "official" C++ proposal along these lines.<br>> >><span class="Apple-converted-space"> </span><br>> >> We could easily abstract this -- 
the first -- into Modula-3 and then<br>> >> support it on user threads as well.<br>> >> Can anyone propose something?<br>> >> It has to go in m3core, as that is the only code that (is supposed to)<br>> >> know which thread implementation is in use.<br>> >><span class="Apple-converted-space"> </span><br>> >> - Jay<br>> >><span class="Apple-converted-space"> </span><br>> >><span class="Apple-converted-space"> </span><br>> >>> From:<span class="Apple-converted-space"> </span><a href="mailto:darko@darko.org">darko@darko.org</a><br>> >>> Date: Tue, 14 Sep 2010 05:34:59 -0700<br>> >>> To:<span class="Apple-converted-space"> </span><a href="mailto:hosking@cs.purdue.edu">hosking@cs.purdue.edu</a><br>> >>> CC:<span class="Apple-converted-space"> </span><a href="mailto:m3devel@elegosoft.com">m3devel@elegosoft.com</a><br>> >>> Subject: Re: [M3devel] Per thread data<br>> >>><span class="Apple-converted-space"> </span><br>> >>> That's the idea but each object can only call another object<br>> >> allocated for the same thread, so it needs to find the currently<br>> >> running thread's copy of the desired object.<br>> >>><span class="Apple-converted-space"> </span><br>> >>> On 14/09/2010, at 5:08 AM, Tony Hosking wrote:<br>> >>><span class="Apple-converted-space"> </span><br>> >>>> If they are truly private to each thread, then allocating them in<br>> >> the heap while still not locking them would be adequate. Why not?<br>> >>>><span class="Apple-converted-space"> </span><br>> >>>> On 14 Sep 2010, at 01:08, Darko wrote:<br>> >>>><span class="Apple-converted-space"> </span><br>> >>>>> I have lots of objects that are implemented on the basis that no<br>> >> calls on them can be re-entered, which also avoids the need for locking<br>> >> them in a threaded environment, which is impractical. The result is<br>> >> that I need one copy of each object in each thread. There is<br>> >> approximately one allocated object per object type so space is not a<br>> >> big issue. 
I'm looking at a small number of threads, probably maximum<br>> >> two per processor core. With modern processors I'm assuming that a<br>> >> linear search through a small array is actually quicker than a hash<br>> >> table.<br>> >>>>><span class="Apple-converted-space"> </span><br>> >>>>> On 13/09/2010, at 9:55 PM, Mika Nystrom wrote:<br>> >>>>><span class="Apple-converted-space"> </span><br>> >>>>>> Darko writes:<br>> >>>>>>> I need to have certain data structures allocated on a per thread<br>> >> basis.<br>> >>>>>>> Right now I'm thinking of using the thread id from ThreadF.MyId() to<br>> >>>>>>> index a list. Is there a better, more portable way of allocating<br>> >> on a<br>> >>>>>>> per-thread basis?<br>> >>>>>>><span class="Apple-converted-space"> </span><br>> >>>>>>> Cheers,<br>> >>>>>>> Darko.<br>> >>>>>><span class="Apple-converted-space"> </span><br>> >>>>>> In my experience what you suggest works just fine (remember to lock the<br>> >>>>>> doors, though!) But you can get disappointing performance on some<br>> >> thread<br>> >>>>>> implementations (ones that involve switching into supervisor mode more<br>> >>>>>> than necessary when accessing pthread structures).<br>> >>>>>><span class="Apple-converted-space"> </span><br>> >>>>>> Generally speaking I avoid needing per-thread structures as much<br>> >> as possible<br>> >>>>>> and instead put what you need in the Closure and then pass<br>> >> pointers around.<br>> >>>>>> Of course you can mix the methods for a compromise between speed and<br>> >>>>>> cluttered code...<br>> >>>>>><span class="Apple-converted-space"> </span><br>> >>>>>> I think what you want is also not a list but a Table.<br>> >>>>>><span class="Apple-converted-space"> </span><br>> >>>>>> Mika<br>> >>>>><span class="Apple-converted-space"> </span><br>> >>>><span class="Apple-converted-space"> </span><br>> >>><span class="Apple-converted-space"> </span><br>> >><span class="Apple-converted-space"> </span><br>> ><span 
class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br></div></span></blockquote></div><br></body></html>