[M3devel] userthreads vs. pthreads performance?

Mon Mar 29 17:01:04 CEST 2010

Tony Hosking wrote:
> Guys,
> 
> You are looking in the wrong place.  The great advantage of having a 
> language-defined thread model is that we get to implement the thread 
> primitives in the language run-time, and we can take advantage of 
> language-specific semantics.   There is no requirement that a Modula-3 
> mutex even map to a pthread_mutex_t.  Java monitors are implemented in 
> modern JVMs so that lock/unlock in the common case doesn't require any 
> calls or any atomic operations!  We can do much the same for Modula-3. 
> My group at Purdue has done a fair amount of work on this in the 
> context of Java, and I hope to do the same for Modula-3 when I can find 
> the time.  The only requirement for M3 was that we have proper 
> support for atomic ops in the language upon which to build the 
> synchronisation primitives.  We are close to having this now.  Let me 
> sketch a design:
> 
> The common case is that a mutex is only ever manipulated by one thread 
Do you mean _almost_ only ever? -----^ Otherwise, why would it have been
coded with a mutex at all?

> (i.e., never shared), in which case it suffices to "bias" the mutex to 
> that thread.  Locking/unlocking is simply a matter of checking to see 
> that the bias belongs to the current thread.  No need for an atomic 
> operation.  If the thread already has the bias then locking/unlocking is 
> simply a matter of setting a bit in the mutex.  If another thread comes 
> along and needs to lock the mutex then it must first revoke the bias of 
> the owning thread.  Revocation can be expensive (acceptable as long as it 
> occurs infrequently) and in our case probably means stopping the thread 
> holding the bias, revoking the bias, then restarting it.
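To check my reading of the biased case, here is a minimal sketch of the fast path only. The names (biased_mutex, bias_owner, held) are hypothetical and not taken from any Modula-3 or JVM runtime, and revocation is deliberately left out. The plain, non-atomic stores are safe only on the assumption that a revoker stops the biasing thread before touching the mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the thread the mutex is biased to, plus a "held" bit. */
typedef struct {
    uintptr_t bias_owner;  /* id of the biasing thread; 0 means no bias */
    bool      held;        /* set while the biasing thread holds the mutex */
} biased_mutex;

/* Fast path for lock. Returns false when the caller does not own the bias;
   the caller would then have to revoke the bias first (not shown). */
static bool biased_lock(biased_mutex *m, uintptr_t self) {
    if (m->bias_owner != self)
        return false;
    m->held = true;        /* plain store, no atomic operation */
    return true;
}

/* Fast path for unlock, with the same ownership check. */
static bool biased_unlock(biased_mutex *m, uintptr_t self) {
    if (m->bias_owner != self)
        return false;
    assert(m->held);       /* unlocking a mutex that is not held is a bug */
    m->held = false;
    return true;
}
```

A false return is the point at which the expensive stop/revoke/restart sequence would begin.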
> 
> Another case is when a mutex is locked/unlocked by multiple threads but 
> there is never contention (i.e., no thread tries to acquire while 
-----------^ and _almost_ never?

> another thread holds).  In this case we never need a wait queue for the 
> mutex so we can simply store the lock owner in the mutex and test using 
> atomic ops.  Spinning is often useful here to avoid needing to inflate 
> if contention ever does arise: if the thread holding the lock gets out 
> of the mutex quickly then the spinner can move in quickly.  After some 
> number of spins we generally need to inflate the lock to allocate a wait 
> queue (to avoid hogging the processor).
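A rough sketch of this uncontended case, using C11 atomics. The type thin_lock, the function names, and the spin limit of 1000 are all assumptions for illustration, and inflation itself is not shown: the sketch only reports when spinning has given up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thin lock: 0 means unlocked, otherwise the owner's thread id. */
typedef struct {
    atomic_uintptr_t owner;
} thin_lock;

enum { SPIN_LIMIT = 1000 };  /* arbitrary bound, chosen only for the sketch */

static void thin_init(thin_lock *l) {
    atomic_init(&l->owner, 0);
}

/* One attempt: install ourselves as owner if the lock is free. */
static bool thin_try_lock(thin_lock *l, uintptr_t self) {
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, self,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin a bounded number of times. Returns false when the caller should
   stop spinning and inflate the lock to get a wait queue (not shown). */
static bool thin_lock_spin(thin_lock *l, uintptr_t self) {
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (thin_try_lock(l, self))
            return true;
    }
    return false;
}

static void thin_unlock(thin_lock *l, uintptr_t self) {
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == self);
    atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

If the holder leaves quickly, one of the spinner's compare-and-swap attempts succeeds and no wait queue is ever allocated.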
> 
> Finally, the case where many threads are contending on the inflated lock 
> (with wait queue).  The only question now is when to deflate.  Our 
> current heuristic is to deflate when the last thread releases the lock 
> and notices that there are no other threads waiting.  This seems to work 
> well in practice, but of course there are pathological cases.
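The deflation heuristic, as I understand it, reduces to a check at release time. The sketch below models only that decision. The struct fat_mutex and its fields are hypothetical, and it assumes the caller already holds whatever internal lock protects these fields. Blocking and waking of waiters are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of an inflated lock. */
typedef struct {
    bool inflated;  /* true while a wait queue is allocated */
    int  waiters;   /* threads currently blocked on the wait queue */
    bool held;      /* true while some thread holds the lock */
} fat_mutex;

/* Release the lock. Returns true if it was deflated because the
   releasing thread saw no other threads waiting. */
static bool fat_release(fat_mutex *m) {
    assert(m->inflated && m->held);
    m->held = false;
    if (m->waiters == 0) {
        m->inflated = false;  /* a real runtime would free the wait queue here */
        return true;
    }
    return false;             /* a real runtime would wake one waiter here */
}
```

A pathological case would be a lock that alternates between contended and idle, since it would inflate and deflate on every cycle.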

So does the implementation dynamically detect which case holds and
choose between the schemes?  What are the criteria?

> 
> Note that in no case have I mentioned the need for a pthread_mutex 
> (though pthread locks/conditions are used to manage threads that must 
> block on a Java wait queue).
> 
> We ought to be able to do much the same in Modula-3.
> 
> Antony Hosking | Associate Professor | Computer Science | Purdue University