[M3devel] purported condition variable problems on Win32?
Rodney M. Bates
rodney_bates at lcwb.coop
Sat Jul 23 23:02:15 CEST 2016
On 07/22/2016 03:20 AM, Jay K wrote:
> Here is very weak circumstantial evidence that the locks can be merged:
>
>
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms682052(v=vs.85).aspx
>
> "It is often convenient to use more than one condition variable with the same lock.
> For example, an implementation of a reader/writer lock might use a single critical
> section but separate condition variables for readers and writers.
> "
>
>
> i.e. the mapping isn't one lock to one condition, but the convenient
> construct is multiple conditions to one lock, not multiple locks to one condition.
>
>
> http://linux.die.net/man/3/pthread_cond_wait
>
> "
> The effect of using more than one mutex for concurrent pthread_cond_timedwait() or
> pthread_cond_wait() operations on the same condition variable is undefined; that is,
> a condition variable becomes bound to a unique mutex when a thread waits on the condition
> variable, and this (dynamic) binding shall end when the wait returns.
> "
>
Yes, these are the usual rules about the relationship between mutexs and condition variables.
There is sometimes a definite need for >1 condition variable associated with a single
mutex. The other way around might be definable, but I think would create a big mess.
Our Thread interface documents only that Wait(m,c) must have m locked at the time of
call (and will return with it locked too, but not necessarily for the duration.
Chapter 4 of SPwM3, "An Introduction to Programming with Threads" informally defines
a _monitor_ as "some data, a mutex, and zero or more condition variables" and says
further "A particular condition variable is always used in conjunction with the same
mutex and its data". (SPwM3, 4.3.2, p93)
It looks to me like the formal specification of Wait, Signal, and Broadcast in
SRwM3, 5.3.3, p124 would allow a condition to simultaneously contain waiters
that passed in different mutexes. Each thread remembers locally what mutex it
wants to reacquire when it wakes up.
I'm not sure we really want to commit to that. Client code's doing it would no doubt
create even trickier code than this kind of stuff already is. Off hand, I have a
hard time thinking of a reasonable use-case.
If we decide to go with the informal statement, the implementations really should
enforce it by storing the associated mutex in the condition variable, and verifying
that additional waiters supply the same one.
Probably, in most implementations, you could get away with moving the association
of a condition variable from one mutex to another when no threads were waiting on
the condition. Buy\t the only possible benefit I can think of would be memory savings,
not needing as many condition variables, since the different mutex associations would
be, well, mutually exclusive.
> i.e. while the lock used with a condition variable can change through time,
> there can be only one at a time.
>
> And then, presumably win32 condition variables and posix condition variables
> and Modula-3 Thread.Condition are all kind of similar.
>
> Hm, ok, a problem with merging the locks, is it leaves no lock for Signal/Broadcast.
> Perhaps they should use atomic/Interlocked.
>
> - Jay
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> From: jay.krell at cornell.edu
> To: rodney.m.bates at acm.org
> CC: m3devel at elegosoft.com
> Subject: RE: [M3devel] purported condition variable problems on Win32?
> Date: Thu, 21 Jul 2016 06:43:16 +0000
>
> That question was meant for the other bug.
>
> Can this one be fixed by rechecking if tickets is positive before decrementing it?
> This seems like a big problem in Schmidt's code.
>
> I'm nervous about merging the locks, but maybe.
> In particular, I'm the multiple release/acquires in service of the condition variable, also releasing/acquiring the user variable.
> Though I realize that is actually necessarily part of what gets done.
> Schmidt clearly did not merge the locks, and the Java version does.
> The Java version also has its own simple and not-terrible critical section.
> The Java version also disappeared in newer releases and I didn't track down what happened to it.
>
> - Jay
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> From: jay.krell at cornell.edu
> To: rodney.m.bates at acm.org
> CC: m3devel at elegosoft.com
> Subject: RE: [M3devel] purported condition variable problems on Win32?
> Date: Thu, 21 Jul 2016 06:35:41 +0000
>
> Am I understanding this?
> There are some waiters.
> There is a broadcast ("wake all"). Some waiters are woken.
> There are more waiters.
> There is a signal ("wake one").
> There is a chance that some of the original waiters will remain waiting while the later waiters are not.
> That is, the signaled threads kind of steal the wake intended by the broadcast.
>
>
> Is this unfairness or incorrectness?
> It feels like unfairness.
> It feels like condition variables are specified fairly weakly.
>
>
> - Jay
>
>
> > Date: Wed, 20 Jul 2016 11:43:27 -0500
> > From: rodney_bates at lcwb.coop
> > To: jay.krell at cornell.edu; rodney.m.bates at acm.org
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] purported condition variable problems on Win32?
> >
> > In looking at ThreadWin32.m3 more, I think I see a significant bug.
> >
> > This gets confusing, since there are Modula-3 waits, signals,
> > etc. and similar concepts in Win32, but which are not the
> > same. I am going the try to put "M3-" and "Win-" prefixes on
> > things to make this clearer. I cartainly need to do this
> > for my own benefit, if for nobody else.
> >
> > Suppose two threads are both M3-waiting in Wait(m,c,...), further
> > Win-waiting on c.waitEvent in WaitForMultipleObjects at :301.
> > c.tickets = 0, and c.waiters = 2.
> >
> > Now, (M3-)Signal (c) happens. This does (Win-)SetEvent(c.waitEvent),
> > which will allow both the waiters to Win-wakeup sometime soon.
> > Before releasing c.lock, Signal increments c.tickets to 1 (supposedly
> > to eventually allow only one waiter to M3-wakeup) and increments
> > c.counter, setting things up so that the waitDone expression at :308
> > will be TRUE for the waiting threads, the next time they evaluate it.
> >
> > Each waiter in turn Win-wakes, gets c.lock, sets waitDone TRUE,
> > releases c.lock, and tries to acquire m. The problem arises
> > because the second waiter can get to these steps before the first
> > can acquire m, acquire c.lock, and DEC(c.tickets) at :322. The
> > second to reach :308 will find c.tickets # 0, as did the first,
> > and proceed.
> >
> > Each will eventually, do a M3-wake. If this were Posix condition
> > variables, this would be wrong, because Posix says only one
> > waiter proceeds. For M3, it's OK, if unnecessary, because M3
> > says one or more waiters proceed. However, each will, in the
> > process, get to :322, and DEC(c.tickets). Signal only provided
> > one ticket, but the waiters have taken two. c.tickets goes
> > negative.
> >
> > After this, when some thread can do another (M3-)Wait, and some
> > other does another (M3-)Signal, c.tickets increments only back to
> > zero, which means the new waiter will not M3-wake, even though it
> > should.
> >
> >
> > I am going to try to construct a test case that will force
> > this scenario.
> >
> > See more below:
> >
> >
> > On 07/18/2016 08:53 PM, Jay wrote:
> > > I wrote this version. Right, based on Schmidt. I also found similar code in Java. There are comments as to both of these in the code.
> > >
> > > Otherwise previous version I think had a very large lock.
> > >
> > > Changing the < to !=, is that really ok? I get that 32bit overflow is possible.
> > >
> > > We can change to a 64bit counter if < is required. That really can't rollover. I.e. Even a 64bit cycle counter can't rollover, and this would advance much slower.
> > >
> > > I guess you are saying it would be unfair, which is ok.
> > >
> > > There is pattern called "directed notification" or such that might be preferable? It has a certain significant inefficiency though, like creating an event per notify. I think it is what boost's condition variable are using.
> > >
> > > I'll check on the unused event.
> > >
> > > - Jay
> > >
> > >> On Jul 18, 2016, at 7:25 AM, "Rodney M. Bates" <rodney_bates at lcwb.coop> wrote:
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> I spent some time looking over the signal part of ThreadWin32.m3. A few
> > >> comments:
> > >>
> > >> This looks like a very direct implementation of Schmidts' generation count
> > >> algorithm, with a number of important little details additionally taken care
> > >> of.
> > >>
> > >> On small difference is at line 308 (B, below) where Schmidt does
> > >> c.counter>count instead of #. Usually, this would make no difference,
> > >> since counter only increases, but on a long-running program, it could
> > >> overflow. If this happened,as it often does, as silent wrap-around to
> > >> FIRST(INTEGER), some threads could be trapped for an extremely long
> > >> time, waiting for counter come back around greater than their count. As
> > >> coded in ThreadWin32, they would be allowed to proceed normally.
> > >>
> > >> The only glitch would be that, if there were simultaneously waiting threads
> > >> separated by NUMBER(INTEGER) truly different generations (but the same value
> > >> of c.counter), the recent ones could unfairly proceed along with the
> > >> extremely old ones. That seems extremely less likely than the
> > >> mere existence of an overflow.
> > >>
> > >> Both type Condition and type Activation have a field waitEvent: HANDLE.
> > >> Condition.waitEvent is extensively used, but Activation.waitEvent is
> > >> only created at :537 and deleted at :554, but never used in any real
> > >> way. It could be deleted.
> > >>
> > >> As for this code:
> > >>
> > >> EnterCriticalSection(conditionLock);
> > >>
> > >> (* Capture the value of the counter before we start waiting.
> > >> * We will not stop waiting until the counter changes.
> > >> * That is, we will not stop waiting until a signal
> > >> * comes in after we start waiting.
> > >> *)
> > >>
> > >> count := c.counter;
> > >> INC(c.waiters);
> > >>
> > >> LeaveCriticalSection(conditionLock);
> > >> m.release(); (* can this be moved to before taking conditionLock? *)
> > >>
> > >> No, it is important to sample and save c.counter in the order threads wait.
> > >> Retaining m until this is done ensures this. Otherwise, some other thread
> > >> that waited later could have altered c.counter before this one gets conditionLock
> > >> and saves its copy of c.counter.
> > >>
> > >> I have made several comment changes to ThreadWin32.m3 that would have made
> > >> it easier to vet. I will commit these, but only comments. It is either
> > >> very difficult or impossible for me to test or even recompile this module,
> > >> and, especially as fragile as this kind of code is, I don't want to commit
> > >> any substantive changes untested and uncompiled.
> > >>
> > >> More below:
> > >>
> > >>> On 07/07/2016 04:24 AM, Jay K wrote:
> > >>> So...I do NOT understand all of this.
> > >>>
> > >>>
> > >>> However, comparing the three implementations
> > >>> and attempting to understand ours,
> > >>> I am struck by the following
> > >>>
> > >>> - Our broadcast seems ok, but
> > >>> - our signal seems to wake everyone,
> > >>> and then...they race a bit,
> > >>> one will decide it is last, and
> > >>> reset the event, but every prior waiter
> > >>> is still woken.
> > >>> Why not merge the following chunks:
> > >>
> > >> I don't think this will work. It is important that, once a thread has decided,
> > >> at A, that it can fairly proceed, it is guaranteed to be the next thread to proceed
> > >> (i.e, to return from [Alert]Wait), before it decrements c.tickets at C, and, thus,
> > >> if it is also the last of its generation to proceed, resets c.waitevent. This
> > >> is ensured by its having already reacquired m first, preventing any other thread
> > >> from returning from its Wait yet.
> >
> > I was wrong about this. There is already no fairness being enforced here. Even with
> > only one ticked, all waiters Win-wake at :301, the order they get c.lock at :307 is
> > arbitrary anyway. There is no benefit in trying to control the order they reacquire m.
> > c.counter has nothing to do with fairness. What it does is prevent Win-awakened waiters
> > that waited after the most recent Signal/Broadcast from M3-waking, a correctness issue.
> >
> > Merging, as you proposed, is probably close to the fix for this bug.
> >
> >
> > >>
> > >> And it can't attempt to acquire m while already holding conditionLock, because this would
> > >> invite deadlock, since, elsewhere, a thread can try to acquire conditionLock while
> > >> already holding m, at line 265.
> > >>
> > >>>
> > >>>
> > >>> WHILE (NOT alerted) AND (NOT waitDone) DO
> > >>> .
> > >>> .
> > >>> .
> > >>> 1
> > >>> EnterCriticalSection(conditionLock);
> > >>> waitDone := (c.tickets # 0 AND c.counter # count);
> > >> A:---------^
> > >> B:--------------------------------------------------^
> > >>> LeaveCriticalSection(conditionLock);
> > >>> END; (* WHILE *)
> > >>> IF waitDone THEN
> > >>> alerted := FALSE;
> > >>> END;
> > >>> m.acquire();
> > >>> 2
> > >>> EnterCriticalSection(conditionLock);
> > >>> DEC(c.waiters);
> > >>> IF waitDone THEN esp. here.
> > >>> DEC(c.tickets);
> > >>
> > >> C:---------------^
> > >>> lastWaiter := (c.tickets = 0);
> > >>> END;
> > >>> LeaveCriticalSection(conditionLock);
> > >>>
> > >>>
> > >>> That is, if we decrement tickets earlier
> > >>> within the first critical section,
> > >>> while we will still wake everyone,
> > >>> only one will decide waitDone and the rest will keep looping.
> > >>> A downside of this, perhaps, is that waking all for Broadcast
> > >>> might be a little slower.
> > >>> but more so, in both the "ptw32" (pthreads for win32) and Boost threads
> > >>> implementations, they seem to deal with this differently than us,
> > >>> and they each do about the same thing -- they use a counted semaphore.
> > >>> In the boost case, it appears they duplicate the semaphore for every
> > >>> notification generation, which seems expensive.
> > >>>
> > >>> Perhaps if they can guarantee some lifetimes, they don't need to duplicate it.
> > >>> Or they can do their own reference counting?
> > >>> jdk7 seems to looke like jdk6.
> > >>> The code is gone in jdk8 and I can't easily find the delete in history.
> > >>> The lock merging jdk does probably helps here too.
> > >>> In fact they merge #1 and #2 above as a result.
> > >>> But they still initially wake all threads for signal, not just broadcast.
> > >>> There is also the problem that all the event waits
> > >>> are followed by EnterCriticalSection (or jdk facsimile).
> > >>> - Jay
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------!
> > --!
> > >> ---
> > >>> From: jay.krell at cornell.edu
> > >>> To: m3devel at elegosoft.com
> > >>> Subject: RE: [M3devel] purported condition variable problems on Win32?
> > >>> Date: Tue, 5 Jul 2016 08:26:36 +0000
> > >>>
> > >>> Here is another implementation to consider:
> > >>>
> > >>> https://github.com/boostorg/thread/blob/develop/include/boost/thread/win32/condition_variable.hpp
> > >>>
> > >>> - Jay
> > >>>
> > >>>
> > >>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------!
> > --!
> > >> ---
> > >>> From: jay.krell at cornell.edu
> > >>> To: m3devel at elegosoft.com
> > >>> Date: Tue, 5 Jul 2016 08:19:23 +0000
> > >>> Subject: [M3devel] purported condition variable problems on Win32?
> > >>>
> > >>> https://sourceforge.net/p/pthreads4w/code/ci/master/tree/README.CV
> > >>>
> > >>> vs.
> > >>>
> > >>> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
> > >>> vs.
> > >>> https://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/cm3/m3-libs/m3core/src/thread/WIN32/ThreadWin32.m3?rev=1.210.2.1;content-type=text%2Fplain
> > >>> I wrote this:
> > >>> PROCEDURE XWait(m: Mutex; c: Condition; act: Activation;
> > >>> alertable: BOOLEAN) RAISES {Alerted} =
> > >>> (* LL = m on entry and exit, but not for the duration
> > >>> * see C:\src\jdk-6u14-ea-src-b05-jrl-23_apr_2009\hotspot\agent\src\os\win32\Monitor.cpp
> > >>> * NOTE that they merge the user lock and the condition lock.
> > >>> * http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
> > >>> * "3.3. The Generation Count Solution"
> > >>> *)
> > >>>
> > >>> Do we have the problems described in README.CV?
> > >>>
> > >>> I haven't looked through the ACE code to see
> > >>> to what extent they resemble solution 3.3, or if they
> > >>> changed as a result of this discussion -- which I admit I don't understand
> > >>> and haven't read closely.
> > >>>
> > >>>
> > >>> Spurious wakeups are ok, though should be minimized.
> > >>>
> > >>> I'd still rather not drop pre-Vista support but I realize it becomes more interesting as time advances.
> > >>>
> > >>> Thank you,
> > >>> - Jay
> > >>>
> > >>> _______________________________________________ M3devel mailing list M3devel at elegosoft.com https://m3lists.elegosoft.com/mailman/listinfo/m3devel
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> M3devel mailing list
> > >>> M3devel at elegosoft.com
> > >>> https://m3lists.elegosoft.com/mailman/listinfo/m3devel
> > >>
> > >> --
> > >> Rodney Bates
> > >> rodney.m.bates at acm.org
> > >>
> > >>
> > >
> >
> > --
> > Rodney Bates
> > rodney.m.bates at acm.org
--
Rodney Bates
rodney.m.bates at acm.org
More information about the M3devel
mailing list