[M3devel] userthreads vs. pthreads performance?

Mon Mar 29 19:26:28 CEST 2010

On 29 Mar 2010, at 11:01, Rodney M. Bates wrote:

> 
> 
> Tony Hosking wrote:
>> Guys,
>> You are looking in the wrong place.  The whole good thing about having a language-defined thread model is that we get to implement the thread primitives in the language run-time, and we can take advantage of language-specific semantics.   There is no requirement that a Modula-3 mutex even map to a pthread_mutex_t.  Java monitors are implemented in modern JVMs so that lock/unlock in the common case doesn't require any calls or any atomic operations!  We can do much the same for Modula-3.  My group at Purdue has done a fair amount of work on this in the context of Java, and when I can find the time I was hoping to do the same for Modula-3.  The only requirement for M3 was that we have proper support for atomic ops in the language upon which to build the synchronisation primitives.  We are close to having this now.  Let me sketch a design:
>> The common case is that a mutex is only ever manipulated by one thread 
> Do you mean _almost_ only ever? -----^ Otherwise, why would it have been
> coded with a mutex at all?

You might have a thread-safe library that is used by a single-thread application.

>> (i.e., never shared), in which case it suffices to "bias" the mutex to that thread.  Locking/unlocking then needs no atomic operation: the thread checks that the bias belongs to it and, if so, simply sets or clears a bit in the mutex.  If another thread comes along and needs to lock the mutex then it must first revoke the bias of the owning thread.  Revocation can be expensive (which is acceptable provided it occurs infrequently) and in our case probably means stopping the thread having the bias, revoking the bias, then restarting it.
>> Another case is when a mutex is locked/unlocked by multiple threads but there is never contention (i.e., no thread tries to acquire while 
> -----------^ and _almost_ never?
> 
>> another thread holds).  In this case we never need a wait queue for the mutex so we can simply store the lock owner in the mutex and test using atomic ops.  Spinning is often useful here to avoid needing to inflate if contention ever does arise: if the thread holding the lock gets out of the mutex quickly then the spinner can move in quickly.  After some number of spins we generally need to inflate the lock to allocate a wait queue (to avoid hogging the processor).
>> Finally, the case where many threads are contending on the inflated lock (with wait queue).  The only question now is when to deflate.  Our current heuristic is to deflate when the last thread releases the lock and notices that there are no other threads waiting.  This seems to work well in practice, but of course there are pathological cases.
> 
> So does the implementation dynamically detect which case holds and
> choose between the schemes?  What are the criteria?

Short answer, yes.  The criteria are much as I described.  The only tricky part is choosing when to deflate.  For more on this topic take a look also at: http://blogs.azulsystems.com/cliff/.  Azul independently developed similar strategies.
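
A minimal sketch of the uncontended and spinning cases, in C11 with pthreads rather than Modula-3. It is an illustration under stated assumptions, not the JVM or Modula-3 run-time code: the names (thin_lock, SPIN_LIMIT, run_demo) are hypothetical, the lock word holds a three-valued state instead of the owner's identity, the bias case is left out, and the wait queue is a pthread mutex/condition pair that lives in the lock from the start instead of being allocated on inflation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Lock word states (hypothetical encoding for this sketch). */
enum { FREE = 0, HELD = 1, HELD_CONTENDED = 2 };
enum { SPIN_LIMIT = 100 };          /* assumed tuning value */

typedef struct {
    atomic_int      state;          /* fast path touches only this word */
    pthread_mutex_t qlock;          /* protects the wait queue          */
    pthread_cond_t  qcond;          /* stands in for the wait queue     */
} thin_lock;

static void thin_lock_init(thin_lock *l) {
    atomic_init(&l->state, FREE);
    pthread_mutex_init(&l->qlock, NULL);
    pthread_cond_init(&l->qcond, NULL);
}

static void thin_lock_acquire(thin_lock *l) {
    /* Uncontended case, plus bounded spinning: one CAS per attempt. */
    for (int i = 0; i < SPIN_LIMIT; i++) {
        int expected = FREE;
        if (atomic_compare_exchange_weak(&l->state, &expected, HELD))
            return;
    }
    /* Spun too long: fall back to blocking on the wait queue.  Marking
       the word HELD_CONTENDED tells the releaser to wake a waiter. */
    pthread_mutex_lock(&l->qlock);
    while (atomic_exchange(&l->state, HELD_CONTENDED) != FREE)
        pthread_cond_wait(&l->qcond, &l->qlock);
    pthread_mutex_unlock(&l->qlock);
}

static void thin_lock_release(thin_lock *l) {
    /* The word goes back to FREE; the queue is touched only if a
       waiter may be blocked. */
    if (atomic_exchange(&l->state, FREE) == HELD_CONTENDED) {
        pthread_mutex_lock(&l->qlock);
        pthread_cond_signal(&l->qcond);
        pthread_mutex_unlock(&l->qlock);
    }
}

/* Small demo: several threads bump one counter under the lock. */
typedef struct { thin_lock *lock; long *counter; int iters; } job;

static void *worker(void *arg) {
    job *j = arg;
    for (int i = 0; i < j->iters; i++) {
        thin_lock_acquire(j->lock);
        ++*j->counter;
        thin_lock_release(j->lock);
    }
    return NULL;
}

long run_demo(int nthreads, int iters) {
    thin_lock lock;
    long counter = 0;
    job j = { &lock, &counter, iters };
    pthread_t tid[16];

    assert(nthreads >= 0 && nthreads <= 16);
    thin_lock_init(&lock);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &j);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

Calling run_demo(4, 50000) from a main function should return 200000 if the lock excludes correctly. In the uncontended case an acquire/release pair costs one compare-and-swap and one exchange and never enters the pthread calls, which is the point of keeping the fast path separate from the wait queue.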

> 
>> Note that in no case have I mentioned the need for a pthread_mutex (though pthread locks/conditions are used to manage threads that must block on a Java wait queue).
>> We ought to be able to do much the same in Modula-3.
>> Antony Hosking | Associate Professor | Computer Science | Purdue University
>> 305 N. University Street | West Lafayette | IN 47907 | USA
>> Office +1 765 494 6001 | Mobile +1 765 427 5484
>>