[M3devel] Fwd: Re: Fwd: Fork bug

Wed Jul 9 01:12:20 CEST 2014

On 07/08/2014 03:27 PM, Antony Hosking wrote:
> I would hesitate to do this mainly because the C and POSIX standards don’t define behavior for re-entrant locks.
>
>  From http://en.cppreference.com/w/c/thread/mtx_lock:
>
>     Defined in header <threads.h>
>
>     int mtx_lock( mtx_t* mutex );    (since C11)
>
>     Blocks the current thread until the mutex pointed to by mutex is locked.
>     The behavior is undefined if the current thread has already locked the mutex and the mutex is not recursive.
                                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
So if we just label it recursive, this would not apply.
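
On the POSIX side, "labeling it recursive" is a mutex attribute. A minimal sketch, assuming the runtime's MUTEX ends up on a pthread_mutex_t (the helper names recursive_mutex_init and demo_relock are made up for illustration and are not from the M3 runtime):

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical helper: initialize *m as a recursive mutex.
   Returns 0 on success, otherwise an errno-style code. */
static int recursive_mutex_init(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock twice from the same thread, then unlock twice.
   Returns 0 only if every step succeeded. */
static int demo_relock(void) {
    pthread_mutex_t m;
    if (recursive_mutex_init(&m) != 0) return -1;
    int rc = pthread_mutex_lock(&m);
    rc |= pthread_mutex_lock(&m);    /* owner re-locks: lock count goes to 2 */
    rc |= pthread_mutex_unlock(&m);  /* count back to 1, still owned */
    rc |= pthread_mutex_unlock(&m);  /* count 0: mutex actually released */
    rc |= pthread_mutex_destroy(&m);
    return rc;
}
```

With the default mutex type the second pthread_mutex_lock call above would not be safe to make, which is the case the mtx_lock text warns about.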

 From http://en.cppreference.com/w/cpp/thread/recursive_mutex:

std::recursive_mutex

Defined in header <mutex>

class recursive_mutex;    (since C++11)

The recursive_mutex class is a synchronization primitive that can be used to protect shared data from being simultaneously accessed by multiple threads.

recursive_mutex offers exclusive, recursive ownership semantics:

     A calling thread owns a recursive_mutex for a period of time that starts when it successfully calls either lock or
     try_lock. During this period, the thread may make additional calls to lock or try_lock. The period of ownership
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     ends when the thread makes a matching number of calls to unlock.
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

     When a thread owns a recursive_mutex, all other threads will block (for calls to lock) or receive a false return
     value (for try_lock) if they attempt to claim ownership of the recursive_mutex.

     The maximum number of times that a recursive_mutex may be locked is unspecified, but after that number is
     reached, calls to lock will throw std::system_error and calls to try_lock will return false.

The behavior of a program is undefined if a recursive_mutex is destroyed while still owned by some thread. The
recursive_mutex class satisfies all requirements of Mutex and StandardLayoutType.

Member types

     Member type           Definition
     native_handle_type    implementation-defined

Member functions (all public member functions)

     (constructor)         constructs the mutex
     (destructor)          destroys the mutex
     operator= [deleted]   not copy-assignable

  Locking

     lock                  locks the mutex, blocks if the mutex is not available
     try_lock              tries to lock the mutex, returns if the mutex is not available
     unlock                unlocks the mutex

  Native handle

     native_handle         returns the underlying implementation-defined thread handle

>
>
> On Jul 8, 2014, at 3:20 PM, Rodney M. Bates <rodney_bates at lcwb.coop> wrote:
>
>>
>> Resent after 24 hours:
>>
>>
>> While we are working on MUTEX, I would like to propose making them
>> what I believe is meant by a recursive mutex, that is, one thread
>> can lock multiple times, the mutex being released only when the number
>> of unlocks catches up with the number of locks.
>>
>> I don't remember the details off the top of my head, but there is a
>> place in Trestle where you have to acquire a MUTEX but it is very
>> difficult or impossible to know whether different code on the same
>> thread already has done so.  The different code isn't under your
>> control either.  Some runtime scheme to figure it out dynamically
>> would be tantamount to, but messier than, just having a recursive MUTEX.
>>
>> I recall there are other places as well where similar problems arise.
>> It would greatly simplify things when needed.
>>
>> The only disadvantage I can think of is there might be a case where
>> runtime detection of a second lock attempt by the same thread would
>> help find a bug.  Maybe the RTS could have a way of setting the
>> behavior of a specific MUTEX.
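
The per-MUTEX choice of behavior has a direct pthreads analogue: an error-checking mutex reports a second lock attempt by the owning thread instead of allowing it. A small sketch, again assuming a pthread_mutex_t underneath (relock_errorcheck is a hypothetical name used only here):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Returns the result of a second lock attempt made by the thread
   that already owns an error-checking mutex, or -1 on setup failure. */
static int relock_errorcheck(void) {
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int second = -1;
    if (pthread_mutexattr_init(&attr) != 0) return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) == 0 &&
        pthread_mutex_init(&m, &attr) == 0) {
        pthread_mutex_lock(&m);
        second = pthread_mutex_lock(&m);  /* fails with EDEADLK, does not block */
        pthread_mutex_unlock(&m);         /* only the first lock was taken */
        pthread_mutex_destroy(&m);
    }
    pthread_mutexattr_destroy(&attr);
    return second;
}
```

A runtime could pick PTHREAD_MUTEX_RECURSIVE or PTHREAD_MUTEX_ERRORCHECK per mutex at creation time, so the bug-finding mode stays available where it is wanted.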
>>
>> On 07/03/2014 02:28 PM, Tony Hosking wrote:
>>> I wonder if we should not move to a surrogate parent model to make this cleaner in general?
>>> Since fork is (or should be) only used in service of creating a new process (i.e., fork + exec) then this technique would save us a lot of grief.
>>> Thoughts?
>>>
>>> In the surrogate parent model, a program forks a child process at initialization time. The sole purpose of the child is to serve as a sort of "surrogate parent" for the original process should it ever need to fork another child. After initialization, the original parent can proceed to create its additional threads. When it wants to /exec/ an image, it communicates this to its child (which has remained single-threaded). The child then performs the /fork/ and /exec/ on behalf of the original process.
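
A rough sketch of the surrogate parent model in C, under several assumptions of mine: requests are a NUL-terminated program path sent over a pipe, programs take no arguments, and only one thread talks to the helper at a time. The names surrogate_start, surrogate_run and surrogate_stop are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int req_fd = -1;        /* parent writes program paths here */
static int rep_fd = -1;        /* parent reads exit codes here */
static pid_t helper_pid = -1;

/* Runs in the single-threaded helper: read a path, fork+exec it, report the exit code. */
static void surrogate_loop(int in, int out) {
    char path[256];
    for (;;) {
        size_t n = 0;
        char c = 0;
        ssize_t r;
        while ((r = read(in, &c, 1)) == 1 && c != '\0')
            if (n < sizeof path - 1) path[n++] = c;  /* overlong paths are truncated */
        if (r != 1) _exit(0);                        /* parent closed the pipe */
        path[n] = '\0';
        int status = 0;
        pid_t pid = fork();
        if (pid == 0) {
            close(in);
            close(out);
            execl(path, path, (char *)NULL);
            _exit(127);                              /* exec failed */
        }
        if (pid > 0) waitpid(pid, &status, 0);
        int code = (pid > 0 && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
        if (write(out, &code, sizeof code) != (ssize_t)sizeof code) _exit(1);
    }
}

/* Call once at initialization, before any threads exist. Returns 0 on success. */
static int surrogate_start(void) {
    int req[2], rep[2];
    if (pipe(req) != 0) return -1;
    if (pipe(rep) != 0) { close(req[0]); close(req[1]); return -1; }
    helper_pid = fork();
    if (helper_pid < 0) {
        close(req[0]); close(req[1]); close(rep[0]); close(rep[1]);
        return -1;
    }
    if (helper_pid == 0) {
        close(req[1]);
        close(rep[0]);
        surrogate_loop(req[0], rep[1]);              /* never returns */
    }
    close(req[0]);
    close(rep[1]);
    req_fd = req[1];
    rep_fd = rep[0];
    return 0;
}

/* Ask the helper to run `path`; returns the program's exit code, or -1 on failure. */
static int surrogate_run(const char *path) {
    size_t len = strlen(path) + 1;                   /* include the terminating NUL */
    int code = -1;
    if (write(req_fd, path, len) != (ssize_t)len) return -1;
    if (read(rep_fd, &code, sizeof code) != (ssize_t)sizeof code) return -1;
    return code;
}

/* Close the pipes so the helper exits, then reap it. */
static void surrogate_stop(void) {
    close(req_fd);
    close(rep_fd);
    waitpid(helper_pid, NULL, 0);
}
```

The multi-threaded process never calls fork itself after startup, so no atfork handler ever runs with a heap and mutexes left behind by threads that do not exist in the child.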
>>>
>>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: Peter McKinna <peter.mckinna at gmail.com>
>>>> Subject: Fork bug
>>>> Date: July 2, 2014 at 10:30:24 PM EDT
>>>> To: Antony Hosking <hosking at cs.purdue.edu>
>>>>
>>>> Hi Tony,
>>>>
>>>>  That fork bug on POSIX doesn't appear to be fixed, so just to recap the problem: in the threadtest program, if a bunch of threads are creating mutexes and having them collected, and another thread then does a few forks, the following can happen. The child executes atforkchild (I think as the first thing it does), which calls initwithstackbase, which does an allocation and possibly a collection. Unfortunately the weaktable inherited from the parent may be non-empty, and this is the only thread executing in the child. The collection runs the cleanup for mutexes belonging to nonexistent threads, some of which may be locked. If one is locked, pthread_mutex_destroy returns EBUSY, and the child then exits via the abort in pthread_mutex_delete.
>>>>  Whether the abort is needed I don't know. In this case the error can be safely ignored. One could try to see if the owner of the mutex is still alive and not abort in that case. Otherwise, if one is sure the child is going to do an exec almost immediately, then disabling the collector in atforkchild could work.
>>>>  In the broader picture, anything that's got a weak ref still active could cause problems if one thread does a fork. The weak callback could do anything.
>>>>  Anyway, I don't know what the fix is.
>>>>
>>>> Peter
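
One way to read the "ignore the error in the child" option, sketched in C. This is not the runtime's actual code: finalize_mutex stands in for whatever cleanup the weak-ref callback does, and the flag set from a pthread_atfork child handler is my assumption about how the child could recognize itself. Note also that POSIX does not promise EBUSY for destroying a locked mutex. The sketch only tolerates it where the implementation reports it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile int in_forked_child = 0;  /* set only in the child, by the atfork handler */

static void mark_forked_child(void) { in_forked_child = 1; }

/* Register the child handler once at startup. Returns 0 on success. */
static int install_fork_guard(void) {
    return pthread_atfork(NULL, NULL, mark_forked_child);
}

/* Hypothetical finalizer for a collected mutex.
   Returns 0 if the mutex was destroyed or deliberately left alone. */
static int finalize_mutex(pthread_mutex_t *m) {
    if (in_forked_child) return 0;        /* owning threads do not exist here: skip */
    int rc = pthread_mutex_destroy(m);
    return rc == EBUSY ? 0 : rc;          /* tolerate a still-locked mutex, no abort */
}

/* Lock a mutex, fork, and run the finalizer in the child.
   Returns the child's exit code: 0 means the cleanup did not fail. */
static int demo_fork_cleanup(void) {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int status = 0;
    if (install_fork_guard() != 0) return -1;
    pthread_mutex_lock(&m);
    pid_t pid = fork();
    if (pid < 0) { pthread_mutex_unlock(&m); return -1; }
    if (pid == 0) _exit((in_forked_child && finalize_mutex(&m) == 0) ? 0 : 1);
    waitpid(pid, &status, 0);
    pthread_mutex_unlock(&m);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This only covers the mutex cleanup. It does nothing about the broader point that any other weak-ref callback may still run in the child.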
>>>
>>
>> --
>> Rodney Bates
>> rodney.m.bates at acm.org
>>
>>
>

-- 
Rodney Bates
rodney.m.bates at acm.org