[M3devel] userthreads vs. pthreads performance?
Jay K
jay.krell at cornell.edu
Mon Mar 29 05:15:41 CEST 2010
> Getting thread locals should not require a kernel call
Indeed, on Linux/x86 it does not; the code looks pretty OK:
00000380 <__pthread_getspecific>:
380: 55 push %ebp
381: 89 e5 mov %esp,%ebp
383: 8b 55 08 mov 0x8(%ebp),%edx
386: 81 fa ff 03 00 00 cmp $0x3ff,%edx
38c: 76 04 jbe 392 <__pthread_getspecific+0x12>
38e: 5d pop %ebp
38f: 31 c0 xor %eax,%eax
391: c3 ret
392: 89 d0 mov %edx,%eax
394: c1 e8 05 shr $0x5,%eax
397: 8d 0c 85 1c 01 00 00 lea 0x11c(,%eax,4),%ecx
39e: 65 8b 01 mov %gs:(%ecx),%eax
3a1: 85 c0 test %eax,%eax
3a3: 74 e9 je 38e <__pthread_getspecific+0xe>
3a5: 8b 04 d5 00 00 00 00 mov 0x0(,%edx,8),%eax
3ac: 85 c0 test %eax,%eax
3ae: 74 de je 38e <__pthread_getspecific+0xe>
3b0: 65 8b 01 mov %gs:(%ecx),%eax
3b3: 83 e2 1f and $0x1f,%edx
3b6: 8b 04 90 mov (%eax,%edx,4),%eax
3b9: 5d pop %ebp
3ba: c3 ret
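For anyone who does not want to read the assembly, here is my reading of it as a rough C sketch. All names here (sketch_getspecific, sketch_setspecific, keys, specific) are made up for illustration and are not the library's; the constants come from the listing: keys above 0x3ff are rejected, key >> 5 picks a per-thread first-level slot, and key & 0x1f picks the entry in the second-level block. The __thread array stands in for the %gs-relative load at offset 0x11c, and I am assuming the load scaled by 8 is an "in use" check against a process-wide key table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names; the sizes are read off the disassembly above. */
#define KEYS_MAX      1024                 /* cmp $0x3ff: keys 0..1023 are valid */
#define BLOCK_SIZE    32                   /* shr $0x5 / and $0x1f */
#define BLOCK_COUNT   (KEYS_MAX / BLOCK_SIZE)

/* Assumed layout: 8 bytes per entry on 32-bit x86, matching the scale of 8. */
struct key_info {
    int in_use;
    void (*destructor)(void *);
};

static struct key_info keys[KEYS_MAX];          /* process-wide key table */
static __thread void **specific[BLOCK_COUNT];   /* stands in for %gs:0x11c(,i,4) */

/* Mirrors the lookup in the listing: range check, first-level block,
   in-use check, then the second-level slot. No kernel call anywhere. */
void *sketch_getspecific(unsigned key)
{
    if (key >= KEYS_MAX)
        return NULL;
    void **block = specific[key / BLOCK_SIZE];
    if (block == NULL)
        return NULL;
    if (!keys[key].in_use)
        return NULL;
    return block[key % BLOCK_SIZE];
}

/* Not in the listing; added only so the sketch can be exercised.
   Marks the key in use and allocates the second-level block lazily. */
int sketch_setspecific(unsigned key, void *value)
{
    assert(key < KEYS_MAX);
    if (specific[key / BLOCK_SIZE] == NULL) {
        void **block = calloc(BLOCK_SIZE, sizeof *block);
        if (block == NULL)
            return -1;
        specific[key / BLOCK_SIZE] = block;
    }
    keys[key].in_use = 1;
    specific[key / BLOCK_SIZE][key % BLOCK_SIZE] = value;
    return 0;
}
```

So the whole lookup is a handful of loads and compares in user space.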
> Entering an uncontended pthread mutex should not be expensive
Linux/x86:
00001020 <__pthread_self>:
1020: 55 push %ebp
1021: 89 e5 mov %esp,%ebp
1023: 65 a1 50 00 00 00 mov %gs:0x50,%eax
1029: 5d pop %ebp
102a: c3 ret
102b: 90 nop
102c: 8d 74 26 00 lea 0x0(%esi),%esi
Pretty lame: five instructions where only two are needed.
000004f0 <__pthread_mutex_lock>:
... too much to read through, but I think there is no kernel call.
- Jay
From: jay.krell at cornell.edu
To: dragisha at m3w.org; mika at async.async.caltech.edu
CC: m3devel at elegosoft.com
Subject: RE: [M3devel] userthreads vs. pthreads performance?
Date: Sun, 28 Mar 2010 20:46:01 +0000
O(1) scheduling is not a new idea. Just look at NT and probably Solaris and probably all the other non-free systems (AIX, Irix, HP-UX, Tru64, VMS, etc.)
Getting thread locals should not require a kernel call. It doesn't on NT. We can optimize this somewhat on most systems with __thread. I had that in briefly.
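To make the two options concrete, here is a minimal sketch of the same "current thread pointer" stored both ways. The function and variable names are hypothetical, not what the runtime uses, and __thread is a GCC/Clang extension that not every target supports.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Option 1: a pthread key. Every read goes through a library call. */
static pthread_key_t slot_key;
static pthread_once_t slot_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    int rc = pthread_key_create(&slot_key, NULL);  /* no destructor */
    assert(rc == 0);
    (void)rc;
}

void set_current_via_key(void *p)
{
    pthread_once(&slot_once, make_key);
    pthread_setspecific(slot_key, p);
}

void *get_current_via_key(void)
{
    pthread_once(&slot_once, make_key);
    return pthread_getspecific(slot_key);
}

/* Option 2: compiler-supported thread-local storage. On targets that
   support it, a read is typically one segment-relative load. */
static __thread void *current;

void set_current_via_tls(void *p)
{
    current = p;
}

void *get_current_via_tls(void)
{
    return current;
}
```

Both getters return the same value on the calling thread; the difference is only in how much code runs per access.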
Entering an uncontended pthread mutex should not be expensive -- at least no kernel call, but granted a call and atomic op. Two calls because of the C layer.
But user threads pay for a call too of course.
Maybe I should profile some of this..
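A first cut at measuring the uncontended case could look like the sketch below. This is not the test code discussed in this thread, only an assumption of what a minimal measurement might be; ns_per_lock_unlock is a made-up name, and the result includes loop and call overhead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Average cost in nanoseconds of one lock/unlock pair on a mutex that
   no other thread ever touches. */
double ns_per_lock_unlock(long iterations)
{
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    struct timespec t0, t1;

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iterations;
}
```

Calling it with a few million iterations and comparing against the same loop around the user-threads lock would show whether the mutex itself is the problem.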
- Jay
> From: dragisha at m3w.org
> To: mika at async.async.caltech.edu
> Date: Sun, 28 Mar 2010 21:14:57 +0200
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] userthreads vs. pthreads performance?
>
> I remember reading (long time ago) about how these (FUTEXes) are
> efficient in LINUX... Can I have your test code to try?
>
> On Sun, 2010-03-28 at 12:11 -0700, Mika Nystrom wrote:
> > Well I have run programs on PPC_DARWIN and FreeBSD<X> and seen these sorts of things...
> >
> > Dragiša Durić writes:
> > >Which platform?
> > >
> > >On Sun, 2010-03-28 at 11:57 -0700, Mika Nystrom wrote:
> > >> Yep, sounds right.
> > >>
> > >> I was profiling some other thread-using code that slowed down enormously
> > >> because of pthreads and it turned out the program was spending ~95%
> > >> of its time in accessing the thread locals via one of the pthread_
> > >> functions.
> > >> (The overhead of entering the kernel.)
> > >--
> > >Dragiša Durić <dragisha at m3w.org>
> --
> Dragiša Durić <dragisha at m3w.org>
>