[M3devel] Per thread data
Darko
darko at darko.org
Wed Sep 15 05:00:34 CEST 2010
No, only the subtype has to be revealed. I think both approaches have their uses; the Get/Set approach is useful for libraries.
On 14/09/2010, at 7:23 PM, Jay K wrote:
> It's not a hash lookup. It is a direct array index.
> Disassemble kernel32!TlsGetValue.
> It's much cheaper than a hash lookup, much more expensive than reading a global or local.
>
>
> on Windows 7 x86:
>
> \bin\x86\cdb cmd
> 0:000> u kernel32!TlsGetValue
> kernel32!TlsGetValue:
> 750111cd ff2510080175 jmp dword ptr [kernel32!_imp__TlsGetValue (75010810)]
>
> 0:000> u poi(75010810)
> KERNELBASE!TlsGetValue:
> 76532c95 8bff mov edi,edi
> 76532c97 55 push ebp
> 76532c98 8bec mov ebp,esp
> 76532c9a 64a118000000 mov eax,dword ptr fs:[00000018h] ; get per thread data base
> 76532ca0 8b4d08 mov ecx,dword ptr [ebp+8]
> 76532ca3 83603400 and dword ptr [eax+34h],0
> 76532ca7 83f940 cmp ecx,40h ; compare index to 64
> 76532caa 7309 jae KERNELBASE!TlsGetValue+0x20 (76532cb5) ; if above or equal (index >= 64), goto 76532cb5
> 76532cac 8b8488100e0000 mov eax,dword ptr [eax+ecx*4+0E10h] ; get the actual value
> 76532cb3 eb14 jmp KERNELBASE!TlsGetValue+0x34 (76532cc9) ; goto end
> 76532cb5 81f940040000 cmp ecx,440h ; compare to 1088
> 76532cbb 7210 jb KERNELBASE!TlsGetValue+0x38 (76532ccd) ; if below, goto 76532ccd
> 76532cbd 680d0000c0 push 0C000000Dh ; invalid parameter
> 76532cc2 e86b390200 call KERNELBASE!BaseSetLastNTError (76556632)
> 76532cc7 33c0 xor eax,eax ; return 0 for failure or for TlsSetValue not called
> 76532cc9 5d pop ebp
> 76532cca c20400 ret 4
> 76532ccd 8b80940f0000 mov eax,dword ptr [eax+0F94h] ; get data base for indices >= 64
> 76532cd3 85c0 test eax,eax ; compare to null
> 76532cd5 74f0 je KERNELBASE!TlsGetValue+0x32 (76532cc7) ; if null, goto 76532cc7, which returns 0; this happens if you have called TlsAlloc but not TlsSetValue
> 76532cd7 8b848800ffffff mov eax,dword ptr [eax+ecx*4-100h] ; get the value for index >= 64 (subtracting 64*4)
> 76532cde ebe9 jmp KERNELBASE!TlsGetValue+0x34 (76532cc9) ; goto end
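
Read as C, the listing above amounts to a two-level array lookup. The sketch below is a simplified model of that control flow, not the Windows source: the struct, field and function names are invented for illustration, the slot counts (64 inline, 1024 in the expansion area) are taken from the comparisons against 40h and 440h, and the last-error bookkeeping is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-thread block; all names are invented. */
enum { INLINE_SLOTS = 64, EXPANSION_SLOTS = 1024 };

typedef struct {
    void  *inline_slots[INLINE_SLOTS]; /* plays the role of [eax+ecx*4+0E10h] */
    void **expansion_slots;            /* plays the role of [eax+0F94h]; NULL until used */
} thread_block;

/* Direct index for slots 0..63, one extra pointer load for 64..1087,
   NULL for an out-of-range index or an unallocated expansion area. */
void *model_tls_get(const thread_block *tb, uint32_t index)
{
    if (index < INLINE_SLOTS)
        return tb->inline_slots[index];
    if (index >= INLINE_SLOTS + EXPANSION_SLOTS)
        return NULL;                   /* invalid parameter path */
    if (tb->expansion_slots == NULL)
        return NULL;                   /* nothing stored above slot 63 yet */
    return tb->expansion_slots[index - INLINE_SLOTS];
}
```

In this model the common case costs one bounds check and one indexed load, which matches the claim that the lookup is cheaper than hashing but dearer than reading a plain global.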
>
>
>
> But your proposal might be reasonable anyway.
> Except, wouldn't Thread.T have to be revealed in an .i3 file?
>
>
> - Jay
>
>
>
> > From: darko at darko.org
> > Date: Tue, 14 Sep 2010 17:38:20 -0700
> > To: jay.krell at cornell.edu
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] Per thread data
> >
> > The issue I see is performance. That requires at least a hash lookup and will have performance nothing like a global variable.
> >
> > I'd like to change the Thread interface so that Fork takes a parameter of a typecode which must be a subtype of Thread.T and allocates that if specified. Assuming Thread.Self() is not slow that should perform much better. Anyone see any problems with that?
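
As a rough C analogue of this proposal (a sketch only; C has no typecodes or subtyping, so struct embedding stands in for a subtype of Thread.T, and every name below is hypothetical), per-thread data becomes a field of the thread's own descriptor, reached through a "self" call:

```c
#include <stddef.h>

/* Stand-in for Thread.T: what the runtime knows about every thread. */
typedef struct thread_base {
    int id;
} thread_base;

/* Stand-in for a user subtype of Thread.T. The base must be the first
   member so a pointer to the base can be converted back to the whole. */
typedef struct {
    thread_base base;
    int         parse_depth;   /* hypothetical per-thread field */
} parser_thread;

/* Assumed to be set by the fork routine on the new thread before user code runs. */
static _Thread_local thread_base *current_thread = NULL;

void thread_attach(thread_base *self) { current_thread = self; }
thread_base *thread_self(void) { return current_thread; }

/* Per-thread data is then one thread-local load plus a field access.
   Returns NULL if no descriptor has been attached to this thread. */
int *current_parse_depth(void)
{
    parser_thread *t = (parser_thread *)thread_self();
    return t ? &t->parse_depth : NULL;
}
```

The cast is only safe when the attached descriptor really is a `parser_thread`; in Modula-3 a NARROW on the result of Thread.Self() would check that at run time.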
> >
> >
> > On 14/09/2010, at 7:05 AM, Jay K wrote:
> >
> > >
> > > Eh? Just one thread local for the entire process? I think not.
> > >
> > > More like:
> > >
> > > PROCEDURE AllocateThreadLocal(): INTEGER;
> > > > PROCEDURE GetThreadLocal(key: INTEGER): REFANY;
> > > > PROCEDURE SetThreadLocal(key: INTEGER; value: REFANY);
> > >
> > >
> > > or ThreadLocalAllocate, ThreadLocalGet, ThreadLocalSet.
> > > The first set of names sounds better, the second "scales" better.
> > > > This seems like a constant dilemma.
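
One way the three procedures could be backed in C, assuming the pthread and Win32 TLS primitives mentioned elsewhere in this thread, is sketched below. The `tl_key` type and the function names are invented for illustration. The sketch passes the key by its native type rather than as an INTEGER, since `pthread_key_t` is opaque. Nothing here makes the stored pointers visible to a garbage collector.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
typedef DWORD tl_key;
#else
#include <pthread.h>
typedef pthread_key_t tl_key;
#endif

/* Allocates a new per-thread slot. Returns 0 on success, -1 on failure. */
int thread_local_allocate(tl_key *key)
{
#ifdef _WIN32
    *key = TlsAlloc();
    return *key == TLS_OUT_OF_INDEXES ? -1 : 0;
#else
    return pthread_key_create(key, NULL) == 0 ? 0 : -1;
#endif
}

/* Returns the calling thread's value for the slot, or NULL if never set. */
void *thread_local_get(tl_key key)
{
#ifdef _WIN32
    return TlsGetValue(key);
#else
    return pthread_getspecific(key);
#endif
}

/* Stores a value for the calling thread only. Returns 0 on success. */
int thread_local_set(tl_key key, void *value)
{
#ifdef _WIN32
    return TlsSetValue(key, value) ? 0 : -1;
#else
    return pthread_setspecific(key, value) == 0 ? 0 : -1;
#endif
}
```

On older toolchains the pthread branch may need to be built with `-pthread`.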
> > >
> > > btw, important point I just remembered: unless you do extra work,
> > > thread locals are hidden from the garbage collector.
> > >
> > > This is why the thread implementations seemingly store extra data.
> > > The traced data is in globals, so the garbage collector can see them.
> > >
> > > - Jay
> > >
> > > ________________________________
> > >> From: darko at darko.org
> > >> Date: Tue, 14 Sep 2010 06:13:26 -0700
> > >> To: jay.krell at cornell.edu
> > >> CC: m3devel at elegosoft.com
> > >> Subject: Re: [M3devel] Per thread data
> > >>
> > >> I think a minimalist approach where you get to store and retrieve one
> > >> traced reference per thread would do the trick. If people want more
> > >> they can design their own abstraction on top of that. Maybe just add
> > >> the following to the Thread interface:
> > >>
> > >> PROCEDURE GetPrivate(): REFANY;
> > >> PROCEDURE SetPrivate(ref: REFANY);
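
A minimal sketch of the "one reference per thread, build the rest on top" idea, written in C11 and assuming `_Thread_local` is available. The `per_thread_state` struct and its fields are hypothetical, chosen only to show a layered abstraction:

```c
#include <stdlib.h>

/* The single per-thread slot, standing in for GetPrivate/SetPrivate. */
static _Thread_local void *private_ref = NULL;

void *get_private(void) { return private_ref; }
void set_private(void *ref) { private_ref = ref; }

/* Hypothetical layer on top: several named fields behind the one slot. */
typedef struct {
    int   request_count;
    void *scratch_buffer;
} per_thread_state;

/* Lazily creates the calling thread's state on first use.
   Returns NULL only if the allocation fails. */
per_thread_state *my_state(void)
{
    per_thread_state *s = get_private();
    if (s == NULL) {
        s = calloc(1, sizeof *s);
        set_private(s);
    }
    return s;
}
```

The drawback of a single slot is that independent libraries have to agree on who owns it, which is where an allocate/get/set scheme with multiple keys is more convenient.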
> > >>
> > >>
> > >> On 14/09/2010, at 5:59 AM, Jay K wrote:
> > >>
> > > >> Tony -- then why do pthread_get/setspecific and Win32 TLS exist?
> > > >> What language that doesn't support heap allocation were they designed
> > > >> to support? It is because code often fails to pass all the parameters
> > > >> through all functions.
> > >>
> > >> Again the best current answer is:
> > >> #ifdefed C that uses pthread_get/setspecific / Win32
> > >> TlsAlloc/GetValue/SetValue, ignoring user threads/OpenBSD.
> > >>
> > >> As well, you'd get very very far with merely:
> > >> #ifdef _WIN32
> > >> __declspec(thread)
> > >> #else
> > >> __thread
> > >> #endif
> > >>
> > >> Those work adequately for many many purposes, are more efficient, much
> > >> more convenient, and very portable.
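
For illustration, the #ifdef above can be wrapped in a macro and applied to an ordinary variable. The macro name `THREAD_LOCAL` and the call-depth counter are invented for this sketch:

```c
#ifdef _WIN32
#define THREAD_LOCAL __declspec(thread)
#else
#define THREAD_LOCAL __thread
#endif

/* Each thread gets its own copy, initialized to 0 when the thread starts. */
static THREAD_LOCAL int call_depth = 0;

/* Increments and returns the calling thread's depth. */
int enter_call(void) { return ++call_depth; }

/* Decrements and returns the calling thread's depth. */
int leave_call(void) { return --call_depth; }
```

Access compiles to an ordinary variable reference plus whatever addressing the platform uses for thread-local storage, with no key allocation or lookup call in user code.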
> > >> I believe there is even an "official" C++ proposal along these lines.
> > >>
> > >> We could easily abstract this -- the first -- into Modula-3 and then
> > >> support it on user threads as well.
> > >> Can anyone propose something?
> > >> It has to go in m3core, as that is the only code that (is supposed to)
> > >> know which thread implementation is in use.
> > >>
> > >> - Jay
> > >>
> > >>
> > >>> From: darko at darko.org
> > >>> Date: Tue, 14 Sep 2010 05:34:59 -0700
> > >>> To: hosking at cs.purdue.edu
> > >>> CC: m3devel at elegosoft.com
> > >>> Subject: Re: [M3devel] Per thread data
> > >>>
> > > >>> That's the idea but each object can only call another object
> > > >>> allocated for the same thread, so it needs to find the currently
> > > >>> running thread's copy of the desired object.
> > >>>
> > >>> On 14/09/2010, at 5:08 AM, Tony Hosking wrote:
> > >>>
> > > >>>> If they are truly private to each thread, then allocating them in
> > > >>>> the heap while still not locking them would be adequate. Why not?
> > >>>>
> > >>>> On 14 Sep 2010, at 01:08, Darko wrote:
> > >>>>
> > > >>>>> I have lots of objects that are implemented on the basis that no
> > > >>>>> calls on them can be re-entered, which also avoids the need for
> > > >>>>> locking them in a threaded environment, which is impractical. The
> > > >>>>> result is that I need one copy of each object in each thread. There
> > > >>>>> is approximately one allocated object per object type so space is
> > > >>>>> not a big issue. I'm looking at a small number of threads, probably
> > > >>>>> maximum two per processor core. With modern processors I'm assuming
> > > >>>>> that a linear search through a small array is actually quicker than
> > > >>>>> a hash table.
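
The small-array approach described here might look like the following in C. This is a sketch under stated assumptions: the thread id is a plain integer such as ThreadF.MyId() would return, the table size of 16 is arbitrary, and registration is assumed to happen under a lock that is not shown. All names are hypothetical.

```c
#include <stddef.h>

enum { MAX_THREADS = 16 };

typedef struct {
    int   thread_id;   /* integer id of the owning thread */
    void *object;      /* that thread's private copy of the object */
} pt_slot;

typedef struct {
    pt_slot slots[MAX_THREADS];
    size_t  count;
} pt_table;

/* Linear scan over the used slots; returns NULL if the thread has no entry. */
void *pt_lookup(const pt_table *t, int thread_id)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->slots[i].thread_id == thread_id)
            return t->slots[i].object;
    return NULL;
}

/* Appends an entry. Returns 0 on success, -1 if the table is full.
   Not thread-safe by itself; callers must serialize registration. */
int pt_register(pt_table *t, int thread_id, void *object)
{
    if (t->count == MAX_THREADS)
        return -1;
    t->slots[t->count].thread_id = thread_id;
    t->slots[t->count].object = object;
    t->count++;
    return 0;
}
```

With a handful of entries the whole table fits in a few cache lines, which is the basis for expecting the scan to compete with a hash table. That expectation would need measuring on the target machine.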
> > >>>>>
> > >>>>> On 13/09/2010, at 9:55 PM, Mika Nystrom wrote:
> > >>>>>
> > >>>>>> Darko writes:
> > > >>>>>>> I need to have certain data structures allocated on a per thread
> > > >>>>>>> basis. Right now I'm thinking of using the thread id from
> > > >>>>>>> ThreadF.MyId() to index a list. Is there a better, more portable
> > > >>>>>>> way of allocating on a per-thread basis?
> > >>>>>>>
> > >>>>>>> Cheers,
> > >>>>>>> Darko.
> > >>>>>>
> > >>>>>> In my experience what you suggest works just fine (remember to lock the
> > > >>>>>> doors, though!) But you can get disappointing performance on some
> > > >>>>>> thread implementations (ones that involve switching into supervisor
> > > >>>>>> mode more than necessary when accessing pthread structures).
> > >>>>>>
> > > >>>>>> Generally speaking I avoid needing per-thread structures as much
> > > >>>>>> as possible and instead put what you need in the Closure and then
> > > >>>>>> pass pointers around.
> > >>>>>> Of course you can mix the methods for a compromise between speed and
> > >>>>>> cluttered code...
> > >>>>>>
> > >>>>>> I think what you want is also not a list but a Table.
> > >>>>>>
> > >>>>>> Mika
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >