[M3devel] purported condition variable problems on Win32?
Rodney M. Bates
rodney_bates at lcwb.coop
Wed Jul 20 17:36:13 CEST 2016
On 07/18/2016 08:53 PM, Jay wrote:
> I wrote this version. Right, based on Schmidt. I also found similar code in Java. There are comments as to both of these in the code.
>
> Otherwise previous version I think had a very large lock.
>
> Changing the < to !=, is that really ok? I get that 32bit overflow is possible.
>
> We can change to a 64bit counter if < is required. That really can't rollover. I.e. Even a 64bit cycle counter can't rollover, and this would advance much slower.
>
> I guess you are saying it would be unfair, which is ok.
I think ths is OK as coded. The # will, overwhelmingly often, make wraparound work as
expected, whereas the > would leave threads trapped at the upper end for ~ FIRST(INTEGER)
generations. I am not sure about the cases that have problems with #, but for anything
to happen at all, there would have to be a wrap, further followed by a thread that got starved
for another ~ NUMBER(INTEGER) generations of other threads getting through. Even at 32 bits,
this is extremely unlikely, and probably something else would fail before this.
If it really were a problem, I think counter can be reset to zero any time there are no
waiters, which would usually be very often.
>
> There is pattern called "directed notification" or such that might be preferable? It has a certain significant inefficiency though, like creating an event per notify. I think it is what boost's condition variable are using.
>
> I'll check on the unused event.
>
> - Jay
>
>> On Jul 18, 2016, at 7:25 AM, "Rodney M. Bates" <rodney_bates at lcwb.coop> wrote:
>>
>>
>>
>>
>>
>> I spent some time looking over the signal part of ThreadWin32.m3. A few
>> comments:
>>
>> This looks like a very direct implementation of Schmidts' generation count
>> algorithm, with a number of important little details additionally taken care
>> of.
>>
>> On small difference is at line 308 (B, below) where Schmidt does
>> c.counter>count instead of #. Usually, this would make no difference,
>> since counter only increases, but on a long-running program, it could
>> overflow. If this happened,as it often does, as silent wrap-around to
>> FIRST(INTEGER), some threads could be trapped for an extremely long
>> time, waiting for counter come back around greater than their count. As
>> coded in ThreadWin32, they would be allowed to proceed normally.
>>
>> The only glitch would be that, if there were simultaneously waiting threads
>> separated by NUMBER(INTEGER) truly different generations (but the same value
>> of c.counter), the recent ones could unfairly proceed along with the
>> extremely old ones. That seems extremely less likely than the
>> mere existence of an overflow.
>>
>> Both type Condition and type Activation have a field waitEvent: HANDLE.
>> Condition.waitEvent is extensively used, but Activation.waitEvent is
>> only created at :537 and deleted at :554, but never used in any real
>> way. It could be deleted.
>>
>> As for this code:
>>
>> EnterCriticalSection(conditionLock);
>>
>> (* Capture the value of the counter before we start waiting.
>> * We will not stop waiting until the counter changes.
>> * That is, we will not stop waiting until a signal
>> * comes in after we start waiting.
>> *)
>>
>> count := c.counter;
>> INC(c.waiters);
>>
>> LeaveCriticalSection(conditionLock);
>> m.release(); (* can this be moved to before taking conditionLock? *)
>>
>> No, it is important to sample and save c.counter in the order threads wait.
>> Retaining m until this is done ensures this. Otherwise, some other thread
>> that waited later could have altered c.counter before this one gets conditionLock
>> and saves its copy of c.counter.
>>
>> I have made several comment changes to ThreadWin32.m3 that would have made
>> it easier to vet. I will commit these, but only comments. It is either
>> very difficult or impossible for me to test or even recompile this module,
>> and, especially as fragile as this kind of code is, I don't want to commit
>> any substantive changes untested and uncompiled.
>>
>> More below:
>>
>>> On 07/07/2016 04:24 AM, Jay K wrote:
>>> So...I do NOT understand all of this.
>>>
>>>
>>> However, comparing the three implementations
>>> and attempting to understand ours,
>>> I am struck by the following
>>>
>>> - Our broadcast seems ok, but
>>> - our signal seems to wake everyone,
>>> and then...they race a bit,
>>> one will decide it is last, and
>>> reset the event, but every prior waiter
>>> is still woken.
>>> Why not merge the following chunks:
>>
>> I don't think this will work. It is important that, once a thread has decided,
>> at A, that it can fairly proceed, it is guaranteed to be the next thread to proceed
>> (i.e, to return from [Alert]Wait), before it decrements c.tickets at C, and, thus,
>> if it is also the last of its generation to proceed, resets c.waitevent. This
>> is ensured by its having already reacquired m first, preventing any other thread
>> from returning from its Wait yet.
>>
>> And it can't attempt to acquire m while already holding conditionLock, because this would
>> invite deadlock, since, elsewhere, a thread can try to acquire conditionLock while
>> already holding m, at line 265.
>>
>>>
>>>
>>> WHILE (NOT alerted) AND (NOT waitDone) DO
>>> .
>>> .
>>> .
>>> 1
>>> EnterCriticalSection(conditionLock);
>>> waitDone := (c.tickets # 0 AND c.counter # count);
>> A:---------^
>> B:--------------------------------------------------^
>>> LeaveCriticalSection(conditionLock);
>>> END; (* WHILE *)
>>> IF waitDone THEN
>>> alerted := FALSE;
>>> END;
>>> m.acquire();
>>> 2
>>> EnterCriticalSection(conditionLock);
>>> DEC(c.waiters);
>>> IF waitDone THEN esp. here.
>>> DEC(c.tickets);
>>
>> C:---------------^
>>> lastWaiter := (c.tickets = 0);
>>> END;
>>> LeaveCriticalSection(conditionLock);
>>>
>>>
>>> That is, if we decrement tickets earlier
>>> within the first critical section,
>>> while we will still wake everyone,
>>> only one will decide waitDone and the rest will keep looping.
>>> A downside of this, perhaps, is that waking all for Broadcast
>>> might be a little slower.
>>> but more so, in both the "ptw32" (pthreads for win32) and Boost threads
>>> implementations, they seem to deal with this differently than us,
>>> and they each do about the same thing -- they use a counted semaphore.
>>> In the boost case, it appears they duplicate the semaphore for every
>>> notification generation, which seems expensive.
>>>
>>> Perhaps if they can guarantee some lifetimes, they don't need to duplicate it.
>>> Or they can do their own reference counting?
>>> jdk7 seems to looke like jdk6.
>>> The code is gone in jdk8 and I can't easily find the delete in history.
>>> The lock merging jdk does probably helps here too.
>>> In fact they merge #1 and #2 above as a result.
>>> But they still initially wake all threads for signal, not just broadcast.
>>> There is also the problem that all the event waits
>>> are followed by EnterCriticalSection (or jdk facsimile).
>>> - Jay
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------!
>> ---
>>> From: jay.krell at cornell.edu
>>> To: m3devel at elegosoft.com
>>> Subject: RE: [M3devel] purported condition variable problems on Win32?
>>> Date: Tue, 5 Jul 2016 08:26:36 +0000
>>>
>>> Here is another implementation to consider:
>>>
>>> https://github.com/boostorg/thread/blob/develop/include/boost/thread/win32/condition_variable.hpp
>>>
>>> - Jay
>>>
>>>
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------!
>> ---
>>> From: jay.krell at cornell.edu
>>> To: m3devel at elegosoft.com
>>> Date: Tue, 5 Jul 2016 08:19:23 +0000
>>> Subject: [M3devel] purported condition variable problems on Win32?
>>>
>>> https://sourceforge.net/p/pthreads4w/code/ci/master/tree/README.CV
>>>
>>> vs.
>>>
>>> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>>> vs.
>>> https://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/cm3/m3-libs/m3core/src/thread/WIN32/ThreadWin32.m3?rev=1.210.2.1;content-type=text%2Fplain
>>> I wrote this:
>>> PROCEDURE XWait(m: Mutex; c: Condition; act: Activation;
>>> alertable: BOOLEAN) RAISES {Alerted} =
>>> (* LL = m on entry and exit, but not for the duration
>>> * see C:\src\jdk-6u14-ea-src-b05-jrl-23_apr_2009\hotspot\agent\src\os\win32\Monitor.cpp
>>> * NOTE that they merge the user lock and the condition lock.
>>> * http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>>> * "3.3. The Generation Count Solution"
>>> *)
>>>
>>> Do we have the problems described in README.CV?
>>>
>>> I haven't looked through the ACE code to see
>>> to what extent they resemble solution 3.3, or if they
>>> changed as a result of this discussion -- which I admit I don't understand
>>> and haven't read closely.
>>>
>>>
>>> Spurious wakeups are ok, though should be minimized.
>>>
>>> I'd still rather not drop pre-Vista support but I realize it becomes more interesting as time advances.
>>>
>>> Thank you,
>>> - Jay
>>>
>>> _______________________________________________ M3devel mailing list M3devel at elegosoft.com https://m3lists.elegosoft.com/mailman/listinfo/m3devel
>>>
>>>
>>> _______________________________________________
>>> M3devel mailing list
>>> M3devel at elegosoft.com
>>> https://m3lists.elegosoft.com/mailman/listinfo/m3devel
>>
>> --
>> Rodney Bates
>> rodney.m.bates at acm.org
>>
>>
>
--
Rodney Bates
rodney.m.bates at acm.org
More information about the M3devel
mailing list