[M3devel] user threads

Tony Hosking hosking at cs.purdue.edu
Wed Apr 29 20:58:06 CEST 2009


On 30 Apr 2009, at 04:46, Mika Nystrom wrote:

> Tony, no of course it's not surprising.  -unsafe removes the
> lock-on-every update (in the global environment---pushed environments
> are always "unsafe").  In fact since there are hardly any updates
> in the global environment, it's locking on reading that's hurting.
>
> [Tony, I'm curious how you'd go about implementing the sort of
> "non-blocking lock" you mention.]
>
> But what Jay has pointed out is that what I labeled as "kernel
> threads" in the attached table aren't kernel threads at all but
> libc_r threads, which provide the same facilities (as far as I know)
> as M3's "user threads".  In this case M3's user threads are simply
> superior to what the system provides.  Much superior.

Ah, understood.
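
On the "non-blocking lock" question quoted above: since the global environment is read far more often than it is updated, one option is a copy-on-write table in which readers take no lock at all. What follows is only a minimal sketch of that idea, in Python rather than Modula-3 or Scheme. The class and method names (CowEnvironment, lookup, define) are hypothetical and are not taken from Mika's interpreter or from any Java library. It assumes that replacing a reference is atomic, which holds in CPython and would need an atomic pointer store elsewhere.

```python
import threading


class CowEnvironment:
    """Copy-on-write binding table: lock-free reads, serialised writes.

    Hypothetical illustration only; not the interpreter's actual design.
    """

    def __init__(self, bindings=None):
        # The published dict is never mutated after it becomes visible.
        self._bindings = dict(bindings or {})
        self._write_lock = threading.Lock()

    def lookup(self, name):
        # Readers take no lock: they read whichever snapshot is current.
        # A concurrent writer only ever publishes a complete new dict.
        return self._bindings[name]

    def define(self, name, value):
        # Writers serialise among themselves, copy the table, modify the
        # copy, then publish it with a single reference assignment.
        with self._write_lock:
            updated = dict(self._bindings)
            updated[name] = value
            self._bindings = updated


env = CowEnvironment({"x": 1})
env.define("y", 2)
```

The trade-off is that every update copies the whole table, so this only pays off when updates are rare, as they are said to be for the global environment here.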

> I think if possible "M3 user threads" should definitely be the
> default on FreeBSD4 and earlier, rather than "system user threads".

Fair enough.  Old thread libraries were pretty substandard.

> I think people who want to implement coroutines and occam-like
> things with threads would also be happy if user threads were very
> easy to enable.  Although I agree they should certainly not normally
> be the default and it is probably unnecessary to be able to enable
> them at runtime.  A compile/link-time option would be nice though....
> Rather than having to recompile the whole M3 distribution, that is.

For a systems programming language like Modula-3 I think the general
expectation should be that m3-thread = system thread (i.e., something
scheduled on a physical processor by the operating system).  Anyone
wanting Occam-like things and coroutines should think hard about what
they need and devise a scheme for multiplexing them on each of the
processors.  Perhaps it could be made available as a library.
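
To make the multiplexing idea concrete, here is a rough sketch of such a library's core loop: cooperative tasks that share a small pool of system threads through a ready queue. It is written in Python with generators standing in for coroutines, purely for illustration. The names run_multiplexed and count_to are hypothetical, and nothing like this exists in the M3 distribution as far as this thread says. Note that CPython's global interpreter lock means these workers do not actually run in parallel; the sketch shows the scheduling structure only.

```python
import queue
import threading


def run_multiplexed(tasks, workers=2):
    """Run generator-based coroutines on `workers` system threads.

    Each `yield` in a task is a voluntary switch point; the task's
    return value is collected once it finishes.
    """
    ready = queue.Queue()
    for task in tasks:
        ready.put(task)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = ready.get()
            if task is None:          # sentinel: no more work
                return
            try:
                next(task)            # run the task up to its next yield
            except StopIteration as stop:
                with results_lock:
                    results.append(stop.value)
            else:
                ready.put(task)       # not finished: requeue before task_done
            finally:
                ready.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    ready.join()                      # returns once every task has finished
    for _ in threads:
        ready.put(None)
    for t in threads:
        t.join()
    return results


def count_to(name, n):
    for _ in range(n):
        yield                         # cooperative switch point
    return (name, n)


results = run_multiplexed([count_to("a", 3), count_to("b", 5)], workers=2)
```

A task is only ever held by one worker at a time, because it is removed from the queue before being stepped and put back afterwards.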

> I know of two important classes of applications where user threads
> are likely to be superior to kernel threads:
>
> 1. occam-like things, as described above.  Not uncommon in hardware
>   circles!

But on modern hardware we'd need them to be scheduled across multiple  
processors.

> 2. applications that use RMI (e.g., Network Objects) to communicate
>   between separate runtimes that are mapped exactly one runtime
>   to each physical core.

Hmm, perhaps...

>
>
> For case 2, readers may be interested in the following report:
>
> http://caltechcstr.library.caltech.edu/218/
>
>     Mika
>
> Tony Hosking writes:
>> Mika, it is not surprising that your lock-on-every variable update
>> will cost a lot in any non-user-level threading scheme.  You should
>> consider using different mechanisms for this degree of locking in
>> Scheme (based on some of the non-blocking lock implementations for
>> Java perhaps).  I don't expect any implementation of locking for a
>> multi-core/-processor will ever perform as well as user-level threads.
>>
>> On 29 Apr 2009, at 16:53, Mika Nystrom wrote:
>>
>>> Ok, it works!
>>>
>>> Numbers:
>>>
>>> Timings in milliseconds, three samples, filesystem "warmed up" by
>>> doing one dummy run before launching the real ones.
>>>
>>> -unsafe means that I use non-locking Scheme environments, otherwise
>>> they lock for every variable update.
>>>                                                            ave
>>> CM3 last week, kernel threads, -unsafe   1460  1482  1437   1460
>>> CM3 last week, kernel threads,           2392  2402  2376   2390
>>> CM3 this week, kernel threads, -unsafe   1455  1458  1490   1468 (*)
>>> CM3 this week, user threads,   -unsafe    914   934   914    921
>>> CM3 this week, user threads,              967   965   986    973
>>> PM3                            -unsafe    678   657   682    672
>>> PM3                                       709   714   700    708
>>>
>>> (*) not entirely sure this got linked correctly.
>>>
>>>   Mika
>>>
>>>
>>> Jay writes:
>>>>
>>>> User threads seem to work on FreeBSD/x86 7.0.
>>>> Mika can you report back the perf cm3 vs. pm3?
>>>> Still, kernel threads have been around a long time and imho should
>>>> be strongly favored.
>>>>
>>>>
>>>> Kernel threads should be a /little/ faster than they were --
>>>> PushEFrame removed from successful heap allocations. And should be
>>>> further improvable via __thread where it is supported -- probably
>>>> not FreeBSD 4.
>>>> x, sometimes older is not better. :)
>>>>
>>>>
>>>> I've temporarily switched FreeBSD/x86 to user threads by default but
>>>> I think that's just an experiment and should be undone shortly,
>>>> maybe work out some other story for easily switching between them,
>>>> or just keep the existing story of "you get to rebuild everything".
>>>>
>>>>
>>>> Tony, can you look into GetGCRatio? I removed the call to it. The
>>>> "fatal" pragma invokes PushEFrame apparently.
>>>>
>>>>
>>>> We should now "fix" Win32 and pthreads to not have GetActivation
>>>> initialize on-demand, just leave Init to initialize always. This
>>>> should shave a few more cycles from PushEFrame.
>>>>
>>>>
>>>> - Jay



