[M3devel] pthread in cm3...

Wed May 16 17:21:12 CEST 2007

TestThreads is still locking up with 3000 threads (it works with 2000  
on my I386_DARWIN box).  I am now tracking this down.

On Apr 25, 2007, at 12:35 PM, Dragiša Durić wrote:

> After implementing the workaround for result code 11 in the SIGSUSPEND
> loop, it gets stuck every time during the first or at most the second
> pass (of 4000). Not in the same place every time, though... I think
> there are race conditions in the LockHeap/SuspendAll logic.
>
> dd
>
> On Wed, 2007-04-25 at 09:51 -0400, Tony Hosking wrote:
>> Yes, good thinking.  Tuning the threads system is a good plan all
>> round.
>>
>> On Apr 25, 2007, at 2:55 AM, Dragiša Durić wrote:
>>
>>> Just a random thought...
>>>
>>> I don't think my TestThreads is anything special, but it combines a
>>> few thread use patterns... And I had a bright :) idea yesterday -
>>> it's also a decent benchmark for the whole threading system... I
>>> think it would be nice to test it with 10000 rounds, 4000 threads
>>> each (once cm3 cvs-head is fixed), with PTHREAD and without PTHREAD.
>>> I will run the tests for mine.
>>>
>>> I think these extra data structures and global locks in PTHREAD are
>>> big efficiency killers. The benchmark will show.
>>>
>>> dd
>>>
>>> On Mon, 2007-04-23 at 08:40 -0400, Tony Hosking wrote:
>>>> I take all of your points seriously.  One option would be to offer
>>>> your threads implementation as another build option for CM3.  I'm
>>>> going to track down the bug I introduced recently and then we can
>>>> consider how to move forward.
>>>>
>>>> On Apr 23, 2007, at 4:21 AM, Dragiša Durić wrote:
>>>>
>>>>> On Sun, 2007-04-22 at 15:59 -0400, Tony Hosking wrote:
>>>>>> On Apr 22, 2007, at 10:47 AM, Dragiša Durić wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have been skimming through the source of the PTHREAD code...
>>>>>>> And I think the job can be done without relying so much on
>>>>>>> how-they-did-it-before, especially with regard to the list of
>>>>>>> waiters and similar internal and global structures. Also, I see
>>>>>>> a number of global locks, and I am sure they generate congestion
>>>>>>> every now and then, especially in heavy threading situations.
>>>>>>>
>>>>>>> Of course, there are a number of approaches to these multi-thread
>>>>>>> situations. Mine being a very nonconservative use of threads, I
>>>>>>> think it is important to remain open to a possibly very big
>>>>>>> number of threads running in a single process - meaning
>>>>>>> scalability is one of the primary objectives... As global locks
>>>>>>> don't do well with scalability, I don't like "cm" and similar
>>>>>>> global congestion points.
>>>>>>
>>>>>> Yes, there are tensions in having a thin/absent veneer between
>>>>>> language threads and system threads.  Most important are issues
>>>>>> of preserving a reasonable memory model for programmers (see Hans
>>>>>> Boehm's paper
>>>>>> http://portal.acm.org/citation.cfm?doid=1065010.1065042).
>>>>>
>>>>> I know that paper, and being in the Modula-3 camp, I am - by
>>>>> definition - in agreement with the no-library approach to
>>>>> threads :).
>>>>>
>>>>>> There are also questions of portability and debuggability.
>>>>>
>>>>>   Of course. That's why I am using only POSIX-defined features,
>>>>> and when in doubt, the ones used by Boehm in his famous GC :).
>>>>>
>>>>>> I agree that global locks are to be avoided where they cause
>>>>>> known contention, but there are tradeoffs there too.
>>>>>
>>>>>   A global lock is bad, whatever the reasons.
>>>>>
>>>>>> For large numbers of threads (as you appear to need) I think we
>>>>>> would need to adopt some other implementation approach, possibly
>>>>>> by multiplexing multiple lightweight user-level threads on some
>>>>>> smaller number of heavyweight system-level threads, but then you
>>>>>> run into scheduling and load-balancing problems.
>>>>>
>>>>>   I've argued this before... With O(1) process scheduling
>>>>> available from Linux for four years, and in that time surely from
>>>>> everyone else too... It would be a bigger problem to maintain
>>>>> scheduling for special "BIG" cases inside our support libraries
>>>>> than to rely on the operating system. It's good that mainstream OS
>>>>> people recognized threads as a Need, and it is time now for us to
>>>>> accept they did it well - AT LAST.
>>>>>
>>>>>   And, very important... I can't see what is heavyweight about a
>>>>> system which does 10,000 context switches per second for 1500
>>>>> threads with 2% CPU load. And all this in 2004 on some 1.something
>>>>> GHz CPU. Threads WERE heavyweight four years ago, and they have
>>>>> not been for a long time. Even Windows has lightweight
>>>>> kernel-space threads :).
>>>>>
>>>>>>
>>>>>>> We talked about this at least once before, and I think I
>>>>>>> remember you insisted on more compatibility than can be read
>>>>>>> from SPwM3. Do you think the best idea would be to integrate my
>>>>>>> NPTL code into CM3 for people to trash and test, and let
>>>>>>> everyone select what is best for their situation?
>>>>>>
>>>>>> What I wanted was a situation where programs would be able to run
>>>>>> with the same tools (e.g., showheap, showthread) under both
>>>>>> user-level and system threading.  This goal has been achieved
>>>>>> with the current pthread-based implementation.
>>>>>
>>>>>   It is good reasoning, and it's one of the reasons I did not
>>>>> suggest a replacement... I think my version is less bloated and I
>>>>> know it's very good for the long and stable process uptime we all
>>>>> expect from Modula-3 programs. But also, implied compatibilities
>>>>> outside of SPwM3 and direct demands from other parts of the
>>>>> runtime were not on my list. These I've respected, and it looks
>>>>> like these are good production-time criteria, as opposed to the
>>>>> excellent development-time criteria you based yours on.
>>>>>
>>>>>> Moreover, I wanted something where variations in thread support
>>>>>> from one system to another could be exploited most easily (such
>>>>>> as for systems where thread suspend/resume is provided as a
>>>>>> primitive).  Again, the current implementation achieves this, and
>>>>>> runs with minimal target-specific code on Darwin, Solaris, and
>>>>>> Linux.  Ports to other targets should be relatively
>>>>>> straightforward.
>>>>>
>>>>>   I've not ported mine outside of LINUXLIBC6, but as it's
>>>>> extremely POSIX, I see no problem.
>>>>>
>>>>>>
>>>>>>> The problems I had with my pthread implementation were all
>>>>>>> related to the VM hell of the earlier GC implementation... After
>>>>>>> you did that piece of art with the new approach to GC, I expect
>>>>>>> infinite uptimes from my servers and bots :). Big thanks for
>>>>>>> that!
>>>>>>
>>>>>> Any native threading implementation is going to have problems
>>>>>> with VM-based memory management.  I'm surprised you were able to
>>>>>> get anything going with the VM-based GC.
>>>>>
>>>>>   "Anything" is pretty much - I have heavily multithreaded servers
>>>>> running literally for years... one of them has been up since
>>>>> January of 2004, it services a few hundred connected users (and
>>>>> up to 1500 threads) at almost every moment and breaks only when
>>>>> the system reboots :). All that with heavy integration of various
>>>>> C libraries.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> dd
>>>>>>> -- 
>>>>>>> Dragiša Durić <dragisha at m3w.org>
>>>>>>
>>>>> -- 
>>>>> Dragiša Durić <dragisha at m3w.org>
>>>>
>>> -- 
>>> Dragiša Durić <dragisha at m3w.org>
>>
> -- 
> Dragiša Durić <dragisha at m3w.org>