[M3devel] purported condition variable problems on Win32?

Rodney M. Bates rodney_bates at lcwb.coop
Wed Jul 20 17:36:13 CEST 2016



On 07/18/2016 08:53 PM, Jay wrote:
> I wrote this version. Right, based on Schmidt. I also found similar code in Java. There are comments as to both of these in the code.
>
> Otherwise, the previous version I think had a very large lock.
>
> Changing the < to !=, is that really ok? I get that 32-bit overflow is possible.
>
> We can change to a 64-bit counter if < is required. That really can't roll over. I.e., even a 64-bit cycle counter can't roll over, and this would advance much more slowly.
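>
> (Back-of-the-envelope, assuming one signal per nanosecond, far faster than any real workload: 2^63 increments take about 9.2 * 10^18 ns, roughly 292 years, before a signed 64-bit counter first overflows.)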
>
> I guess you are saying it would be unfair, which is ok.

I think this is OK as coded.  The # will, overwhelmingly often, make wraparound work as
expected, whereas the > would leave threads trapped at the upper end for ~ NUMBER(INTEGER)
generations.  I am not sure about the cases that would have problems with #, but for anything
to happen at all, there would have to be a wrap, followed by a thread that got starved
for another ~ NUMBER(INTEGER) generations of other threads getting through.  Even at 32 bits,
this is extremely unlikely, and probably something else would fail before this.

If it really were a problem, I think the counter could be reset to zero any time there are no
waiters, which would usually be quite often.
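To make the wrap-around case concrete, here is a tiny standalone sketch (the module is purely
illustrative, not part of ThreadWin32.m3; Word.Plus is used only to get the silent wrap without
tripping an overflow check):

   MODULE WrapDemo EXPORTS Main;
   IMPORT IO, Fmt, Word;

   (* A waiter saved its generation just before the counter wraps. *)
   VAR
     count   : INTEGER := LAST(INTEGER);                (* saved at wait time *)
     counter : INTEGER := Word.Plus(LAST(INTEGER), 1);  (* one signal later: wraps to FIRST(INTEGER) *)

   BEGIN
     (* Schmidt's test: FALSE after the wrap, so this waiter would stay trapped
        until counter climbed all the way back above its saved count. *)
     IO.Put("counter > count : " & Fmt.Bool(counter > count) & "\n");

     (* The test as coded in ThreadWin32: TRUE, so the waiter proceeds normally. *)
     IO.Put("counter # count : " & Fmt.Bool(counter # count) & "\n");
   END WrapDemo.

The point is only that # still sees a change across the wrap, while > does not.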

>
> There is a pattern called "directed notification" or some such that might be preferable. It has a certain significant inefficiency, though, like creating an event per notify. I think it is what boost's condition variables are using.
>
> I'll check on the unused event.
>
>   - Jay
>
>> On Jul 18, 2016, at 7:25 AM, "Rodney M. Bates" <rodney_bates at lcwb.coop> wrote:
>>
>>
>>
>>
>>
>> I spent some time looking over the signal part of ThreadWin32.m3.  A few
>> comments:
>>
>> This looks like a very direct implementation of Schmidt's generation-count
>> algorithm, with a number of important little details additionally taken care
>> of.
>>
>> One small difference is at line 308 (B, below), where Schmidt does
>> c.counter > count instead of #.  Usually this would make no difference,
>> since counter only increases, but in a long-running program it could
>> overflow.  If this happened, as it often does, as a silent wrap-around to
>> FIRST(INTEGER), some threads could be trapped for an extremely long
>> time, waiting for counter to come back around greater than their count.  As
>> coded in ThreadWin32, they would be allowed to proceed normally.
>>
>> The only glitch would be that, if there were simultaneously waiting threads
>> separated by NUMBER(INTEGER) truly different generations (but the same value
>> of c.counter), the recent ones could unfairly proceed along with the
>> extremely old ones.  That seems far less likely than the
>> mere existence of an overflow.
>>
>> Both type Condition and type Activation have a field waitEvent: HANDLE.
>> Condition.waitEvent is extensively used, but Activation.waitEvent is
>> only created at :537 and deleted at :554, and never used in any real
>> way.  It could be deleted.
>>
>> As for this code:
>>
>>     EnterCriticalSection(conditionLock);
>>
>>       (* Capture the value of the counter before we start waiting.
>>        * We will not stop waiting until the counter changes.
>>        * That is, we will not stop waiting until a signal
>>        * comes in after we start waiting.
>>        *)
>>
>>       count := c.counter;
>>       INC(c.waiters);
>>
>>     LeaveCriticalSection(conditionLock);
>>     m.release(); (* can this be moved to before taking conditionLock? *)
>>
>> No, it is important to sample and save c.counter in the order threads wait.
>> Retaining m until this is done ensures this.  Otherwise, some other thread
>> that waited later could have altered c.counter before this one gets conditionLock
>> and saves its copy of c.counter.
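>>
>> One interleaving this ordering rules out, sketched with hypothetical threads T1 and T2
>> plus a signaller T3 (T1 calls Wait first):
>>
>>     T1 (in Wait):  m.release()                  (* hypothetically moved ahead of conditionLock *)
>>     T2:            m.acquire(); Wait(m, c)      (* registers count2 := c.counter = N; releases m *)
>>     T3:            m.acquire(); ...; Signal(c)  (* c.counter becomes N + 1 *)
>>     T1 (in Wait):  EnterCriticalSection(conditionLock);
>>                    count := c.counter           (* = N + 1 *)
>>
>> T1, which called Wait first, has recorded the newer generation, so c.counter # count stays
>> FALSE for it while the later waiter T2 can proceed; the generations are saved out of order.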
>>
>> I have made several comment changes to ThreadWin32.m3 that would have made
>> it easier to vet.  I will commit these, but only comments.  It is either
>> very difficult or impossible for me to test or even recompile this module,
>> and, especially as fragile as this kind of code is, I don't want to commit
>> any substantive changes untested and uncompiled.
>>
>> More below:
>>
>>> On 07/07/2016 04:24 AM, Jay K wrote:
>>> So...I do NOT understand all of this.
>>>
>>>
>>> However, comparing the three implementations
>>> and attempting to understand ours,
>>> I am struck by the following
>>>
>>>   - Our broadcast seems ok, but
>>>   - our signal seems to wake everyone,
>>>     and then...they race a bit,
>>>     one will decide it is last, and
>>>     reset the event, but every prior waiter
>>>     is still woken.
>>> Why not merge the following chunks:
>>
>> I don't think this will work.  It is important that, once a thread has decided,
>> at A, that it can fairly proceed, it is guaranteed to be the next thread to proceed
>> (i.e., to return from [Alert]Wait), before it decrements c.tickets at C, and, thus,
>> if it is also the last of its generation to proceed, resets c.waitEvent.  This
>> is ensured by its having already reacquired m first, preventing any other thread
>> from returning from its Wait yet.
>>
>> And it can't attempt to acquire m while already holding conditionLock, because this would
>> invite deadlock, since, elsewhere, a thread can try to acquire conditionLock while
>> already holding m, at line 265.
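>>
>> A minimal sketch of the inverted lock order the merge would create (thread names are
>> hypothetical; T2 stands for whatever path runs the code at line 265):
>>
>>     T1 (merged Wait):   EnterCriticalSection(conditionLock); ... m.acquire()
>>                         (* blocks: T2 already holds m *)
>>     T2 (line 265 path): holding m; ... EnterCriticalSection(conditionLock)
>>                         (* blocks: T1 already holds conditionLock *)
>>
>> Each thread holds the lock the other needs, so neither can ever proceed.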
>>
>>>
>>>
>>>      WHILE (NOT alerted) AND (NOT waitDone) DO
>>> .
>>> .
>>> .
>>>   1
>>>        EnterCriticalSection(conditionLock);
>>>          waitDone := (c.tickets # 0 AND c.counter # count);
>> A:---------^
>> B:--------------------------------------------------^
>>>        LeaveCriticalSection(conditionLock);
>>>      END; (* WHILE *)
>>>      IF waitDone THEN
>>>        alerted := FALSE;
>>>      END;
>>>      m.acquire();
>>> 2
>>>      EnterCriticalSection(conditionLock);
>>>        DEC(c.waiters);
>>>        IF waitDone THEN   esp. here.
>>>          DEC(c.tickets);
>>
>> C:---------------^
>>>          lastWaiter := (c.tickets = 0);
>>>        END;
>>>      LeaveCriticalSection(conditionLock);
>>>
>>>
>>>   That is, if we decrement tickets earlier
>>>   within the first critical section,
>>>   while we will still wake everyone,
>>>   only one will decide waitDone and the rest will keep looping.
>>>   A downside of this, perhaps, is that waking all for Broadcast
>>>   might be a little slower.
>>>   But more so, in both the "ptw32" (pthreads for win32) and Boost threads
>>>   implementations, they seem to deal with this differently than we do,
>>>   and they each do about the same thing -- they use a counted semaphore.
>>>   In the boost case, it appears they duplicate the semaphore for every
>>>   notification generation, which seems expensive.
>>>
>>>   Perhaps if they can guarantee some lifetimes, they don't need to duplicate it.
>>>   Or they can do their own reference counting?
>>>   jdk7 seems to look like jdk6.
>>>   The code is gone in jdk8 and I can't easily find the delete in history.
>>>    The lock merging the jdk does probably helps here too.
>>>    In fact they merge #1 and #2 above as a result.
>>>    But they still initially wake all threads for signal, not just broadcast.
>>>    There is also the problem that all the event waits
>>>    are followed by EnterCriticalSection (or jdk facsimile).
>>>    - Jay
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>> From: jay.krell at cornell.edu
>>> To: m3devel at elegosoft.com
>>> Subject: RE: [M3devel] purported condition variable problems on Win32?
>>> Date: Tue, 5 Jul 2016 08:26:36 +0000
>>>
>>> Here is another implementation to consider:
>>>
>>> https://github.com/boostorg/thread/blob/develop/include/boost/thread/win32/condition_variable.hpp
>>>
>>>   - Jay
>>>
>>>
>>> ------------------------------------------------------------------------
>>> From: jay.krell at cornell.edu
>>> To: m3devel at elegosoft.com
>>> Date: Tue, 5 Jul 2016 08:19:23 +0000
>>> Subject: [M3devel] purported condition variable problems on Win32?
>>>
>>>   https://sourceforge.net/p/pthreads4w/code/ci/master/tree/README.CV
>>>
>>>   vs.
>>>
>>> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>>>   vs.
>>>   https://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/cm3/m3-libs/m3core/src/thread/WIN32/ThreadWin32.m3?rev=1.210.2.1;content-type=text%2Fplain
>>>   I wrote this:
>>>   PROCEDURE XWait(m: Mutex; c: Condition; act: Activation;
>>>                  alertable: BOOLEAN) RAISES {Alerted} =
>>>    (* LL = m on entry and exit, but not for the duration
>>>     * see C:\src\jdk-6u14-ea-src-b05-jrl-23_apr_2009\hotspot\agent\src\os\win32\Monitor.cpp
>>>     * NOTE that they merge the user lock and the condition lock.
>>>     * http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>>>     *   "3.3. The Generation Count Solution"
>>>     *)
>>>
>>>   Do we have the problems described in README.CV?
>>>
>>>   I haven't looked through the ACE code to see
>>>   to what extent they resemble solution 3.3, or if they
>>>   changed as a result of this discussion -- which I admit I don't understand
>>> and haven't read closely.
>>>
>>>
>>> Spurious wakeups are ok, though they should be minimized.
>>>
>>> I'd still rather not drop pre-Vista support but I realize it becomes more interesting as time advances.
>>>
>>>   Thank you,
>>>    - Jay
>>>
>>>
>>>
>>> _______________________________________________
>>> M3devel mailing list
>>> M3devel at elegosoft.com
>>> https://m3lists.elegosoft.com/mailman/listinfo/m3devel
>>
>> --
>> Rodney Bates
>> rodney.m.bates at acm.org
>>
>>
>

-- 
Rodney Bates
rodney.m.bates at acm.org


