[M3devel] purported condition variable problems on Win32?

Rodney M. Bates rodney_bates at lcwb.coop
Sun Jul 24 18:29:34 CEST 2016


No, they don't.  But I don't think that presents a problem.

On 07/23/2016 06:37 PM, Jay K wrote:
> I don't think Signal/Broadcast require caller be holding any lock, is a problem.
>
>   - Jay
>
>
>
>  > Date: Sat, 23 Jul 2016 16:01:43 -0500
>  > From: rodney_bates at lcwb.coop
>  > To: jay.krell at cornell.edu; rodney.m.bates at acm.org
>  > CC: m3devel at elegosoft.com
>  > Subject: Re: [M3devel] purported condition variable problems on Win32?
>  >
>  >
>  >
>  > On 07/22/2016 03:20 AM, Jay K wrote:
>  > > Here is very weak circumstantial evidence that the locks can be merged:
>  > >
>  > >
>  > >
>  > > https://msdn.microsoft.com/en-us/library/windows/desktop/ms682052(v=vs.85).aspx
>  > >
>  > > "It is often convenient to use more than one condition variable with the same lock.
>  > > For example, an implementation of a reader/writer lock might use a single critical
>  > > section but separate condition variables for readers and writers.
>  > > "
>  > >
>  > >
>  > > i.e. the mapping isn't one lock to one condition, but the convenient
>  > > construct is multiple conditions to one lock, not multiple locks to one condition.
>  > >
>  > >
>  > > http://linux.die.net/man/3/pthread_cond_wait
>  > >
>  > > "
>  > > The effect of using more than one mutex for concurrent pthread_cond_timedwait() or
>  > > pthread_cond_wait() operations on the same condition variable is undefined; that is,
>  > > a condition variable becomes bound to a unique mutex when a thread waits on the condition
>  > > variable, and this (dynamic) binding shall end when the wait returns.
>  > > "
>  > >
>  >
>  > Yes, these are the usual rules about the relationship between mutexs and condition variables.
>  > There is sometimes a definite need for >1 condition variable associated with a single
>  > mutex. The other way around might be definable, but I think would create a big mess.
>  >
>  > Our Thread interface documents only that Wait(m,c) must have m locked at the time of
>  > call (and will return with it locked too, but not necessarily for the duration.
>  >
>  > Chapter 4 of SPwM3, "An Introduction to Programming with Threads" informally defines
>  > a _monitor_ as "some data, a mutex, and zero or more condition variables" and says
>  > further "A particular condition variable is always used in conjunction with the same
>  > mutex and its data". (SPwM3, 4.3.2, p93)
>  >
>  > It looks to me like the formal specification of Wait, Signal, and Broadcast in
>  > SRwM3, 5.3.3, p124 would allow a condition to simultaneously contain waiters
>  > that passed in different mutexes. Each thread remembers locally what mutex it
>  > wants to reacquire when it wakes up.
>  >
>  > I'm not sure we really want to commit to that. Client code's doing it would no doubt
>  > create even trickier code than this kind of stuff already is. Off hand, I have a
>  > hard time thinking of a reasonable use-case.
>  >
>  > If we decide to go with the informal statement, the implementations really should
>  > enforce it by storing the associated mutex in the condition variable, and verifying
>  > that additional waiters supply the same one.
>  >
>  > Probably, in most implementations, you could get away with moving the association
>  > of a condition variable from one mutex to another when no threads were waiting on
>  > the condition. Buy\t the only possible benefit I can think of would be memory savings,
>  > not needing as many condition variables, since the different mutex associations would
>  > be, well, mutually exclusive.
>  >
>  >
>  >
>  >
>  > > i.e. while the lock used with a condition variable can change through time,
>  > > there can be only one at a time.
>  > >
>  > > And then, presumably win32 condition variables and posix condition variables
>  > > and Modula-3 Thread.Condition are all kind of similar.
>  > >
>  > > Hm, ok, a problem with merging the locks, is it leaves no lock for Signal/Broadcast.
>  > > Perhaps they should use atomic/Interlocked.
>  > >
>  > > - Jay
>  > >
>  > > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------!
>  > ---
>  > > From: jay.krell at cornell.edu
>  > > To: rodney.m.bates at acm.org
>  > > CC: m3devel at elegosoft.com
>  > > Subject: RE: [M3devel] purported condition variable problems on Win32?
>  > > Date: Thu, 21 Jul 2016 06:43:16 +0000
>  > >
>  > > That question was meant for the other bug.
>  > >
>  > > Can this one be fixed by rechecking if tickets is positive before decrementing it?
>  > > This seems like a big problem in Schmidt's code.
>  > >
>  > > I'm nervous about merging the locks, but maybe.
>  > > In particular, I'm the multiple release/acquires in service of the condition variable, also releasing/acquiring the user variable.
>  > > Though I realize that is actually necessarily part of what gets done.
>  > > Schmidt clearly did not merge the locks, and the Java version does.
>  > > The Java version also has its own simple and not-terrible critical section.
>  > > The Java version also disappeared in newer releases and I didn't track down what happened to it.
>  > >
>  > > - Jay
>  > >
>  > >
>  > > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------!
>  > ---
>  > > From: jay.krell at cornell.edu
>  > > To: rodney.m.bates at acm.org
>  > > CC: m3devel at elegosoft.com
>  > > Subject: RE: [M3devel] purported condition variable problems on Win32?
>  > > Date: Thu, 21 Jul 2016 06:35:41 +0000
>  > >
>  > > Am I understanding this?
>  > > There are some waiters.
>  > > There is a broadcast ("wake all"). Some waiters are woken.
>  > > There are more waiters.
>  > > There is a signal ("wake one").
>  > > There is a chance that some of the original waiters will remain waiting while the later waiters are not.
>  > > That is, the signaled threads kind of steal the wake intended by the broadcast.
>  > >
>  > >
>  > > Is this unfairness or incorrectness?
>  > > It feels like unfairness.
>  > > It feels like condition variables are specified fairly weakly.
>  > >
>  > >
>  > > - Jay
>  > >
>  > >
>  > > > Date: Wed, 20 Jul 2016 11:43:27 -0500
>  > > > From: rodney_bates at lcwb.coop
>  > > > To: jay.krell at cornell.edu; rodney.m.bates at acm.org
>  > > > CC: m3devel at elegosoft.com
>  > > > Subject: Re: [M3devel] purported condition variable problems on Win32?
>  > > >
>  > > > In looking at ThreadWin32.m3 more, I think I see a significant bug.
>  > > >
>  > > > This gets confusing, since there are Modula-3 waits, signals,
>  > > > etc. and similar concepts in Win32, but which are not the
>  > > > same. I am going the try to put "M3-" and "Win-" prefixes on
>  > > > things to make this clearer. I cartainly need to do this
>  > > > for my own benefit, if for nobody else.
>  > > >
>  > > > Suppose two threads are both M3-waiting in Wait(m,c,...), further
>  > > > Win-waiting on c.waitEvent in WaitForMultipleObjects at :301.
>  > > > c.tickets = 0, and c.waiters = 2.
>  > > >
>  > > > Now, (M3-)Signal (c) happens. This does (Win-)SetEvent(c.waitEvent),
>  > > > which will allow both the waiters to Win-wakeup sometime soon.
>  > > > Before releasing c.lock, Signal increments c.tickets to 1 (supposedly
>  > > > to eventually allow only one waiter to M3-wakeup) and increments
>  > > > c.counter, setting things up so that the waitDone expression at :308
>  > > > will be TRUE for the waiting threads, the next time they evaluate it.
>  > > >
>  > > > Each waiter in turn Win-wakes, gets c.lock, sets waitDone TRUE,
>  > > > releases c.lock, and tries to acquire m. The problem arises
>  > > > because the second waiter can get to these steps before the first
>  > > > can acquire m, acquire c.lock, and DEC(c.tickets) at :322. The
>  > > > second to reach :308 will find c.tickets # 0, as did the first,
>  > > > and proceed.
>  > > >
>  > > > Each will eventually, do a M3-wake. If this were Posix condition
>  > > > variables, this would be wrong, because Posix says only one
>  > > > waiter proceeds. For M3, it's OK, if unnecessary, because M3
>  > > > says one or more waiters proceed. However, each will, in the
>  > > > process, get to :322, and DEC(c.tickets). Signal only provided
>  > > > one ticket, but the waiters have taken two. c.tickets goes
>  > > > negative.
>  > > >
>  > > > After this, when some thread can do another (M3-)Wait, and some
>  > > > other does another (M3-)Signal, c.tickets increments only back to
>  > > > zero, which means the new waiter will not M3-wake, even though it
>  > > > should.
>  > > >
>  > > >
>  > > > I am going to try to construct a test case that will force
>  > > > this scenario.
>  > > >
>  > > > See more below:
>  > > >
>  > > >
>  > > > On 07/18/2016 08:53 PM, Jay wrote:
>  > > > > I wrote this version. Right, based on Schmidt. I also found similar code in Java. There are comments as to both of these in the code.
>  > > > >
>  > > > > Otherwise previous version I think had a very large lock.
>  > > > >
>  > > > > Changing the < to !=, is that really ok? I get that 32bit overflow is possible.
>  > > > >
>  > > > > We can change to a 64bit counter if < is required. That really can't rollover. I.e. Even a 64bit cycle counter can't rollover, and this would advance much slower.
>  > > > >
>  > > > > I guess you are saying it would be unfair, which is ok.
>  > > > >
>  > > > > There is pattern called "directed notification" or such that might be preferable? It has a certain significant inefficiency though, like creating an event per notify. I think it is what boost's condition variable are using.
>  > > > >
>  > > > > I'll check on the unused event.
>  > > > >
>  > > > > - Jay
>  > > > >
>  > > > >> On Jul 18, 2016, at 7:25 AM, "Rodney M. Bates" <rodney_bates at lcwb.coop> wrote:
>  > > > >>
>  > > > >>
>  > > > >>
>  > > > >>
>  > > > >>
>  > > > >> I spent some time looking over the signal part of ThreadWin32.m3. A few
>  > > > >> comments:
>  > > > >>
>  > > > >> This looks like a very direct implementation of Schmidts' generation count
>  > > > >> algorithm, with a number of important little details additionally taken care
>  > > > >> of.
>  > > > >>
>  > > > >> On small difference is at line 308 (B, below) where Schmidt does
>  > > > >> c.counter>count instead of #. Usually, this would make no difference,
>  > > > >> since counter only increases, but on a long-running program, it could
>  > > > >> overflow. If this happened,as it often does, as silent wrap-around to
>  > > > >> FIRST(INTEGER), some threads could be trapped for an extremely long
>  > > > >> time, waiting for counter come back around greater than their count. As
>  > > > >> coded in ThreadWin32, they would be allowed to proceed normally.
>  > > > >>
>  > > > >> The only glitch would be that, if there were simultaneously waiting threads
>  > > > >> separated by NUMBER(INTEGER) truly different generations (but the same value
>  > > > >> of c.counter), the recent ones could unfairly proceed along with the
>  > > > >> extremely old ones. That seems extremely less likely than the
>  > > > >> mere existence of an overflow.
>  > > > >>
>  > > > >> Both type Condition and type Activation have a field waitEvent: HANDLE.
>  > > > >> Condition.waitEvent is extensively used, but Activation.waitEvent is
>  > > > >> only created at :537 and deleted at :554, but never used in any real
>  > > > >> way. It could be deleted.
>  > > > >>
>  > > > >> As for this code:
>  > > > >>
>  > > > >> EnterCriticalSection(conditionLock);
>  > > > >>
>  > > > >> (* Capture the value of the counter before we start waiting.
>  > > > >> * We will not stop waiting until the counter changes.
>  > > > >> * That is, we will not stop waiting until a signal
>  > > > >> * comes in after we start waiting.
>  > > > >> *)
>  > > > >>
>  > > > >> count := c.counter;
>  > > > >> INC(c.waiters);
>  > > > >>
>  > > > >> LeaveCriticalSection(conditionLock);
>  > > > >> m.release(); (* can this be moved to before taking conditionLock? *)
>  > > > >>
>  > > > >> No, it is important to sample and save c.counter in the order threads wait.
>  > > > >> Retaining m until this is done ensures this. Otherwise, some other thread
>  > > > >> that waited later could have altered c.counter before this one gets conditionLock
>  > > > >> and saves its copy of c.counter.
>  > > > >>
>  > > > >> I have made several comment changes to ThreadWin32.m3 that would have made
>  > > > >> it easier to vet. I will commit these, but only comments. It is either
>  > > > >> very difficult or impossible for me to test or even recompile this module,
>  > > > >> and, especially as fragile as this kind of code is, I don't want to commit
>  > > > >> any substantive changes untested and uncompiled.
>  > > > >>
>  > > > >> More below:
>  > > > >>
>  > > > >>> On 07/07/2016 04:24 AM, Jay K wrote:
>  > > > >>> So...I do NOT understand all of this.
>  > > > >>>
>  > > > >>>
>  > > > >>> However, comparing the three implementations
>  > > > >>> and attempting to understand ours,
>  > > > >>> I am struck by the following
>  > > > >>>
>  > > > >>> - Our broadcast seems ok, but
>  > > > >>> - our signal seems to wake everyone,
>  > > > >>> and then...they race a bit,
>  > > > >>> one will decide it is last, and
>  > > > >>> reset the event, but every prior waiter
>  > > > >>> is still woken.
>  > > > >>> Why not merge the following chunks:
>  > > > >>
>  > > > >> I don't think this will work. It is important that, once a thread has decided,
>  > > > >> at A, that it can fairly proceed, it is guaranteed to be the next thread to proceed
>  > > > >> (i.e, to return from [Alert]Wait), before it decrements c.tickets at C, and, thus,
>  > > > >> if it is also the last of its generation to proceed, resets c.waitevent. This
>  > > > >> is ensured by its having already reacquired m first, preventing any other thread
>  > > > >> from returning from its Wait yet.
>  > > >
>  > > > I was wrong about this. There is already no fairness being enforced here. Even with
>  > > > only one ticked, all waiters Win-wake at :301, the order they get c.lock at :307 is
>  > > > arbitrary anyway. There is no benefit in trying to control the order they reacquire m.
>  > > > c.counter has nothing to do with fairness. What it does is prevent Win-awakened waiters
>  > > > that waited after the most recent Signal/Broadcast from M3-waking, a correctness issue.
>  > > >
>  > > > Merging, as you proposed, is probably close to the fix for this bug.
>  > > >
>  > > >
>  > > > >>
>  > > > >> And it can't attempt to acquire m while already holding conditionLock, because this would
>  > > > >> invite deadlock, since, elsewhere, a thread can try to acquire conditionLock while
>  > > > >> already holding m, at line 265.
>  > > > >>
>  > > > >>>
>  > > > >>>
>  > > > >>> WHILE (NOT alerted) AND (NOT waitDone) DO
>  > > > >>> .
>  > > > >>> .
>  > > > >>> .
>  > > > >>> 1
>  > > > >>> EnterCriticalSection(conditionLock);
>  > > > >>> waitDone := (c.tickets # 0 AND c.counter # count);
>  > > > >> A:---------^
>  > > > >> B:--------------------------------------------------^
>  > > > >>> LeaveCriticalSection(conditionLock);
>  > > > >>> END; (* WHILE *)
>  > > > >>> IF waitDone THEN
>  > > > >>> alerted := FALSE;
>  > > > >>> END;
>  > > > >>> m.acquire();
>  > > > >>> 2
>  > > > >>> EnterCriticalSection(conditionLock);
>  > > > >>> DEC(c.waiters);
>  > > > >>> IF waitDone THEN esp. here.
>  > > > >>> DEC(c.tickets);
>  > > > >>
>  > > > >> C:---------------^
>  > > > >>> lastWaiter := (c.tickets = 0);
>  > > > >>> END;
>  > > > >>> LeaveCriticalSection(conditionLock);
>  > > > >>>
>  > > > >>>
>  > > > >>> That is, if we decrement tickets earlier
>  > > > >>> within the first critical section,
>  > > > >>> while we will still wake everyone,
>  > > > >>> only one will decide waitDone and the rest will keep looping.
>  > > > >>> A downside of this, perhaps, is that waking all for Broadcast
>  > > > >>> might be a little slower.
>  > > > >>> but more so, in both the "ptw32" (pthreads for win32) and Boost threads
>  > > > >>> implementations, they seem to deal with this differently than us,
>  > > > >>> and they each do about the same thing -- they use a counted semaphore.
>  > > > >>> In the boost case, it appears they duplicate the semaphore for every
>  > > > >>> notification generation, which seems expensive.
>  > > > >>>
>  > > > >>> Perhaps if they can guarantee some lifetimes, they don't need to duplicate it.
>  > > > >>> Or they can do their own reference counting?
>  > > > >>> jdk7 seems to looke like jdk6.
>  > > > >>> The code is gone in jdk8 and I can't easily find the delete in history.
>  > > > >>> The lock merging jdk does probably helps here too.
>  > > > >>> In fact they merge #1 and #2 above as a result.
>  > > > >>> But they still initially wake all threads for signal, not just broadcast.
>  > > > >>> There is also the problem that all the event waits
>  > > > >>> are followed by EnterCriticalSection (or jdk facsimile).
>  > > > >>> - Jay
>  > > > >>>
>  > > > >>>
>  > > > >>>
>  > > > >>>
>  > > > >>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------!
>  > -----!
>  > > > --!
>  > > > >> ---
>  > > > >>> From: jay.krell at cornell.edu
>  > > > >>> To: m3devel at elegosoft.com
>  > > > >>> Subject: RE: [M3devel] purported condition variable problems on Win32?
>  > > > >>> Date: Tue, 5 Jul 2016 08:26:36 +0000
>  > > > >>>
>  > > > >>> Here is another implementation to consider:
>  > > > >>>
>  > > > >>> https://github.com/boostorg/thread/blob/develop/include/boost/thread/win32/condition_variable.hpp
>  > > > >>>
>  > > > >>> - Jay
>  > > > >>>
>  > > > >>>
>  > > > >>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------!
>  > -----!
>  > > > --!
>  > > > >> ---
>  > > > >>> From: jay.krell at cornell.edu
>  > > > >>> To: m3devel at elegosoft.com
>  > > > >>> Date: Tue, 5 Jul 2016 08:19:23 +0000
>  > > > >>> Subject: [M3devel] purported condition variable problems on Win32?
>  > > > >>>
>  > > > >>> https://sourceforge.net/p/pthreads4w/code/ci/master/tree/README.CV
>  > > > >>>
>  > > > >>> vs.
>  > > > >>>
>  > > > >>> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>  > > > >>> vs.
>  > > > >>> https://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/cm3/m3-libs/m3core/src/thread/WIN32/ThreadWin32.m3?rev=1.210.2.1;content-type=text%2Fplain
>  > > > >>> I wrote this:
>  > > > >>> PROCEDURE XWait(m: Mutex; c: Condition; act: Activation;
>  > > > >>> alertable: BOOLEAN) RAISES {Alerted} =
>  > > > >>> (* LL = m on entry and exit, but not for the duration
>  > > > >>> * see C:\src\jdk-6u14-ea-src-b05-jrl-23_apr_2009\hotspot\agent\src\os\win32\Monitor.cpp
>  > > > >>> * NOTE that they merge the user lock and the condition lock.
>  > > > >>> * http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>  > > > >>> * "3.3. The Generation Count Solution"
>  > > > >>> *)
>  > > > >>>
>  > > > >>> Do we have the problems described in README.CV?
>  > > > >>>
>  > > > >>> I haven't looked through the ACE code to see
>  > > > >>> to what extent they resemble solution 3.3, or if they
>  > > > >>> changed as a result of this discussion -- which I admit I don't understand
>  > > > >>> and haven't read closely.
>  > > > >>>
>  > > > >>>
>  > > > >>> Spurious wakeups are ok, though should be minimized.
>  > > > >>>
>  > > > >>> I'd still rather not drop pre-Vista support but I realize it becomes more interesting as time advances.
>  > > > >>>
>  > > > >>> Thank you,
>  > > > >>> - Jay
>  > > > >>>
>  > > > >>> _______________________________________________ M3devel mailing list M3devel at elegosoft.com https://m3lists.elegosoft.com/mailman/listinfo/m3devel
>  > > > >>>
>  > > > >>>
>  > > > >>> _______________________________________________
>  > > > >>> M3devel mailing list
>  > > > >>> M3devel at elegosoft.com
>  > > > >>> https://m3lists.elegosoft.com/mailman/listinfo/m3devel
>  > > > >>
>  > > > >> --
>  > > > >> Rodney Bates
>  > > > >> rodney.m.bates at acm.org
>  > > > >>
>  > > > >>
>  > > > >
>  > > >
>  > > > --
>  > > > Rodney Bates
>  > > > rodney.m.bates at acm.org
>  >
>  > --
>  > Rodney Bates
>  > rodney.m.bates at acm.org

-- 
Rodney Bates
rodney.m.bates at acm.org



More information about the M3devel mailing list