[M3devel] [M3commit] how to switch userthreads on/off

Wed Apr 29 23:16:24 CEST 2009

Really? I mean, partly, definitely. And I was looking at the "inflating" to see how they do condition variables.

But, as I understand:
 The first time you enter a lock will be slow, in that it will heap-allocate and call pthread_mutex_init (untraced, with implied use of a lock for the untraced heap),
plus pthread_mutex_lock(&init).

 The subsequent times you enter a lock will be "slow" in that they will always call pthread_mutex_lock, but that might not be particularly slow, right? It is a function call, granted, but the implementation might/should be very quick in the case of no contention. Like, you know, presumably the pthread implementer, except in the case of FreeBSD 4.x :), is trying to do a pretty good job for everyone.
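
To make the two cases above concrete, here is a rough C sketch of a lazily initialized mutex. It is not the m3core code; the names m3_mutex, m3_lock, m3_unlock and init_lock are made up for illustration, and malloc stands in for the untraced heap. The first entry pays for the allocation, pthread_mutex_init and the global init lock; later entries only pay for an atomic load plus pthread_mutex_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Modula-3 MUTEX: the pthread mutex behind it
   is created lazily, the first time the lock is entered. */
typedef struct {
    _Atomic(pthread_mutex_t *) impl;   /* NULL until first use */
} m3_mutex;

/* Global lock that serializes lazy initialization (the "init" lock). */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;

static pthread_mutex_t *m3_mutex_get(m3_mutex *m)
{
    /* Fast path: already initialized, no global lock needed. */
    pthread_mutex_t *p = atomic_load_explicit(&m->impl, memory_order_acquire);
    if (p != NULL)
        return p;

    /* Slow path, taken once per mutex: allocate and initialize. */
    pthread_mutex_lock(&init_lock);
    p = atomic_load_explicit(&m->impl, memory_order_relaxed);
    if (p == NULL) {
        p = malloc(sizeof *p);   /* stand-in for an untraced allocation */
        if (p == NULL || pthread_mutex_init(p, NULL) != 0)
            abort();
        atomic_store_explicit(&m->impl, p, memory_order_release);
    }
    pthread_mutex_unlock(&init_lock);
    return p;
}

static void m3_lock(m3_mutex *m)
{
    pthread_mutex_lock(m3_mutex_get(m));
}

static void m3_unlock(m3_mutex *m)
{
    /* Only valid after m3_lock, so impl is known to be non-NULL here. */
    pthread_mutex_unlock(atomic_load_explicit(&m->impl, memory_order_acquire));
}
```

Whether the steady-state pthread_mutex_lock call is cheap enough is exactly the part that needs measuring.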

"But I don't have numbers."

 - Jay

________________________________
> From: hosking at cs.purdue.edu
> To: mika at async.caltech.edu
> Date: Thu, 30 Apr 2009 04:33:01 +1000
> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
> Subject: Re: [M3devel] [M3commit] how to switch userthreads on/off
>
> Mika,
>
> With the current implementation of M3 MUTEX 1-1 as pthread mutex you are bound to have significant overhead for any locking code even in single-threaded apps. We need to move towards a thin-lock implementation for mutex (as used in modern Java implementations) to avoid overhead for uncontended locks. It's not too hard to implement. The idea is to represent a mutex as a tagged word. The word contains either NIL, the thread holding the lock, or a pointer to a full-blown (inflated) pthread mutex. We can use GC and other opportunities to deflate locks as needed. Checking the tag requires a CAS. There are other techniques that further eliminate the CAS for the uncontended case. But, generally, you should consider LOCK to be a fairly high-overhead operation for now.
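
The tagged-word idea described above can be sketched in C roughly as follows. This is only one possible reading, not a proposed m3core implementation. It assumes a word that is 0 when free, a thread marker address when thinly held, and a fat-lock pointer with the low bit set once inflated. It also assumes that a contender spins until it can take the thin lock and then inflates it itself, and it never deflates. All names (thin_lock, fat_lock, thin_acquire, thin_release, run_demo) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { pthread_mutex_t mu; } fat_lock;       /* inflated form */
typedef struct { _Atomic uintptr_t word; } thin_lock;  /* the tagged word */

/* The address of a thread-local int identifies the thread; it is nonzero
   and at least 2-byte aligned, so the low tag bit is always clear. */
static _Thread_local int self_marker;
static uintptr_t self_id(void) { return (uintptr_t)&self_marker; }

static int is_inflated(uintptr_t w) { return (w & 1u) != 0; }
static fat_lock *fat_of(uintptr_t w) { return (fat_lock *)(w & ~(uintptr_t)1); }

/* Called only by the current thin owner: switch to a locked pthread mutex. */
static void inflate(thin_lock *l)
{
    fat_lock *f = malloc(sizeof *f);
    if (f == NULL || pthread_mutex_init(&f->mu, NULL) != 0)
        abort();
    pthread_mutex_lock(&f->mu);   /* the owner keeps holding the lock */
    atomic_store_explicit(&l->word, (uintptr_t)f | 1u, memory_order_release);
}

static void thin_acquire(thin_lock *l)
{
    int contended = 0;
    for (;;) {
        uintptr_t w = atomic_load_explicit(&l->word, memory_order_acquire);
        if (is_inflated(w)) {              /* slow path: real mutex */
            pthread_mutex_lock(&fat_of(w)->mu);
            return;
        }
        if (w == 0) {                      /* fast path: one CAS, no call */
            uintptr_t expected = 0;
            if (atomic_compare_exchange_strong_explicit(
                    &l->word, &expected, self_id(),
                    memory_order_acquire, memory_order_relaxed)) {
                if (contended)
                    inflate(l);            /* contention seen: go fat */
                return;
            }
        }
        contended = 1;                     /* thinly held by someone else */
        sched_yield();
    }
}

static void thin_release(thin_lock *l)
{
    uintptr_t w = atomic_load_explicit(&l->word, memory_order_relaxed);
    if (is_inflated(w)) {
        pthread_mutex_unlock(&fat_of(w)->mu);
    } else {
        assert(w == self_id());            /* only the owner may release */
        atomic_store_explicit(&l->word, 0, memory_order_release);
    }
}

/* Small demo: several threads bump a shared counter under one lock. */
static thin_lock demo_lock;
static long demo_counter;
static int demo_iters;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < demo_iters; i++) {
        thin_acquire(&demo_lock);
        demo_counter++;
        thin_release(&demo_lock);
    }
    return NULL;
}

static long run_demo(int nthreads, int iters)
{
    pthread_t t[16];
    if (nthreads > 16)
        nthreads = 16;
    demo_counter = 0;
    demo_iters = iters;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&t[i], NULL, demo_worker, NULL) != 0)
            abort();
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

In a single-threaded program the word only ever moves between 0 and the thread marker, so no pthread call is made at all. The sketch leaks the fat lock once inflated; reclaiming it is where the GC-driven deflation mentioned above would come in.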
>
> Antony Hosking | Associate Professor | Computer Science | Purdue University
> 305 N. University Street | West Lafayette | IN 47907 | USA
> Office +1 765 494 6001 | Mobile +1 765 427 5484
>
>
>
> On 29 Apr 2009, at 16:22, Mika Nystrom wrote:
>
> Jay writes:
> ...
>
> Maybe just leave it as an option in m3core's m3makefile and people can twiddle it if they want and rebuild the entire system like it is today?
> That is a bit onerous, but maybe it's all userthreads deserve?
>
>
> Anyone who actually wanted to switch back and forth (Mika) would just have two installs and two source trees?
>
>
> - Jay
>
> I just want to clarify. I'm not really that interested in switching
> back and forth. I'm just a little disturbed by the sometimes huge
> performance loss due to the introduction of kernel threads. I knew
> that this would happen in certain highly multithreaded applications,
> but I'm surprised it happens in a more or less single-threaded
> application.
>
> I think I've just been spoiled by 10 years of using SRCM3 and PM3
> for FreeBSD w/o kernel threads in the sense that I've learned that
> using LOCK has essentially no cost. On a shared-memory multiprocessor,
> I really don't expect that to remain the case... physics won't allow
> it. So now I just have to go through my code and find all the places
> where I lock too much and remove them.
>
> But the memory allocator and garbage collector do it too, no?
>
> I also think that this idea of being able to use either is great.
> Mainly single-threaded programs should definitely not use kernel
> threads!
>
> As for reaching the "thread locals", there is one slightly crazy
> idea that one could borrow from Sussman and Steele: add another
> implicit argument to every Modula-3 routine. In that argument,
> pass a pointer to the thread locals. For EXTERNAL calls (in or
> out), make it NIL (somehow, maybe involving pragmas), and in that
> case (only), use the pthreads routines to access the thread locals.
> Ok so it sounds kind of nuts, but with this approach you could avoid
> locking or even calling into the pthreads libs almost entirely for
> a single-threaded program. You could even have a thread-local
> memory allocator that would only lock when it needs to request
> memory from the "global allocator"... in fact there are lots of
> things you can do with this sort of thing. Dynamically scoped
> variables in Scheme (a la MacLisp?) is what they originally proposed
> it for but then they suggested all kinds of tricks related to
> continuations with it.
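
A minimal C sketch of the implicit-argument idea above, under stated assumptions: every routine takes a pointer to the thread locals as an extra first argument, calls arriving from external code pass NULL, and only in that case are the pthreads routines consulted. The struct contents and all names (thread_locals, tl_lookup, leaf, caller, external_entry) are invented for illustration; a real compiler would add the argument itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state; a real runtime would keep much more here. */
typedef struct { long alloc_count; } thread_locals;

static pthread_key_t tl_key;
static pthread_once_t tl_once = PTHREAD_ONCE_INIT;

static void tl_make_key(void)
{
    if (pthread_key_create(&tl_key, free) != 0)
        abort();
}

/* Slow path: only used when the implicit argument is NULL, that is,
   when the call came in from external (non-Modula-3) code. */
static thread_locals *tl_lookup(void)
{
    pthread_once(&tl_once, tl_make_key);
    thread_locals *tl = pthread_getspecific(tl_key);
    if (tl == NULL) {
        tl = calloc(1, sizeof *tl);
        if (tl == NULL || pthread_setspecific(tl_key, tl) != 0)
            abort();
    }
    return tl;
}

/* Every routine receives the thread locals as an extra first argument. */
static long leaf(thread_locals *tl)
{
    if (tl == NULL)
        tl = tl_lookup();
    return ++tl->alloc_count;     /* no lock and no pthreads call needed */
}

static long caller(thread_locals *tl)
{
    if (tl == NULL)
        tl = tl_lookup();         /* resolve once, then pass it along */
    leaf(tl);
    return leaf(tl);
}

/* Entry point as seen from external code: the implicit argument is NULL. */
static long external_entry(void)
{
    return caller(NULL);
}
```

Internal calls such as caller to leaf never touch pthread_getspecific; only the boundary with external code does.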
>
> Mika
>