[M3devel] purported condition variable problems on Win32?

Rodney M. Bates rodney_bates at lcwb.coop
Sun Jul 17 23:06:19 CEST 2016


I spent some time looking over the signal part of ThreadWin32.m3.  A few
comments:

This looks like a very direct implementation of Schmidts' generation count
algorithm, with a number of important little details additionally taken care
of.

On small difference is at line 308 (B, below) where Schmidt does
c.counter>count instead of #.  Usually, this would make no difference,
since counter only increases, but on a long-running program, it could
overflow.  If this happened,as it often does, as silent wrap-around to
FIRST(INTEGER), some threads could be trapped for an extremely long
time, waiting for counter come back around greater than their count.  As
coded in ThreadWin32, they would be allowed to proceed normally.

The only glitch would be that, if there were simultaneously waiting threads
separated by NUMBER(INTEGER) truly different generations (but the same value
of c.counter), the recent ones could unfairly proceed along with the
extremely old ones.  That seems extremely less likely than the
mere existence of an overflow.

Both type Condition and type Activation have a field waitEvent: HANDLE.
Condition.waitEvent is extensively used, but Activation.waitEvent is
only created at :537 and deleted at :554, but never used in any real
way.  It could be deleted.

As for this code:

     EnterCriticalSection(conditionLock);

       (* Capture the value of the counter before we start waiting.
        * We will not stop waiting until the counter changes.
        * That is, we will not stop waiting until a signal
        * comes in after we start waiting.
        *)

       count := c.counter;
       INC(c.waiters);

     LeaveCriticalSection(conditionLock);
     m.release(); (* can this be moved to before taking conditionLock? *)

No, it is important to sample and save c.counter in the order threads wait.
Retaining m until this is done ensures this.  Otherwise, some other thread
that waited later could have altered c.counter before this one gets conditionLock
and saves its copy of c.counter.

I have made several comment changes to ThreadWin32.m3 that would have made
it easier to vet.  I will commit these, but only comments.  It is either
very difficult or impossible for me to test or even recompile this module,
and, especially as fragile as this kind of code is, I don't want to commit
any substantive changes untested and uncompiled.

More below:

On 07/07/2016 04:24 AM, Jay K wrote:
> So...I do NOT understand all of this.
>
>
> However, comparing the three implementations
> and attempting to understand ours,
> I am struck by the following
>
>   - Our broadcast seems ok, but
>   - our signal seems to wake everyone,
>     and then...they race a bit,
>     one will decide it is last, and
>     reset the event, but every prior waiter
>     is still woken.
> Why not merge the following chunks:

I don't think this will work.  It is important that, once a thread has decided,
at A, that it can fairly proceed, it is guaranteed to be the next thread to proceed
(i.e, to return from [Alert]Wait), before it decrements c.tickets at C, and, thus,
if it is also the last of its generation to proceed, resets c.waitevent.  This
is ensured by its having already reacquired m first, preventing any other thread
from returning from its Wait yet.

And it can't attempt to acquire m while already holding conditionLock, because this would
invite deadlock, since, elsewhere, a thread can try to acquire conditionLock while
already holding m, at line 265.

>
>
>      WHILE (NOT alerted) AND (NOT waitDone) DO
> .
> .
> .
>   1
>        EnterCriticalSection(conditionLock);
>          waitDone := (c.tickets # 0 AND c.counter # count);
A:---------^
B:--------------------------------------------------^
>        LeaveCriticalSection(conditionLock);
>      END; (* WHILE *)
>      IF waitDone THEN
>        alerted := FALSE;
>      END;
>      m.acquire();
> 2
>      EnterCriticalSection(conditionLock);
>        DEC(c.waiters);
>        IF waitDone THEN   esp. here.
>          DEC(c.tickets);

C:---------------^
>          lastWaiter := (c.tickets = 0);
>        END;
>      LeaveCriticalSection(conditionLock);
>
>
>   That is, if we decrement tickets earlier
>   within the first critical section,
>   while we will still wake everyone,
>   only one will decide waitDone and the rest will keep looping.
>   A downside of this, perhaps, is that waking all for Broadcast
>   might be a little slower.
>   but more so, in both the "ptw32" (pthreads for win32) and Boost threads
>   implementations, they seem to deal with this differently than us,
>   and they each do about the same thing -- they use a counted semaphore.
>   In the boost case, it appears they duplicate the semaphore for every
>   notification generation, which seems expensive.
>
>   Perhaps if they can guarantee some lifetimes, they don't need to duplicate it.
>   Or they can do their own reference counting?
>   jdk7 seems to looke like jdk6.
>   The code is gone in jdk8 and I can't easily find the delete in history.
>    The lock merging jdk does probably helps here too.
>    In fact they merge #1 and #2 above as a result.
>    But they still initially wake all threads for signal, not just broadcast.
>    There is also the problem that all the event waits
>    are followed by EnterCriticalSection (or jdk facsimile).
>    - Jay
>
>
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> From: jay.krell at cornell.edu
> To: m3devel at elegosoft.com
> Subject: RE: [M3devel] purported condition variable problems on Win32?
> Date: Tue, 5 Jul 2016 08:26:36 +0000
>
> Here is another implementation to consider:
>
> https://github.com/boostorg/thread/blob/develop/include/boost/thread/win32/condition_variable.hpp
>
>   - Jay
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> From: jay.krell at cornell.edu
> To: m3devel at elegosoft.com
> Date: Tue, 5 Jul 2016 08:19:23 +0000
> Subject: [M3devel] purported condition variable problems on Win32?
>
>   https://sourceforge.net/p/pthreads4w/code/ci/master/tree/README.CV
>
>   vs.
>
> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>   vs.
>   https://modula3.elegosoft.com/cgi-bin/cvsweb.cgi/cm3/m3-libs/m3core/src/thread/WIN32/ThreadWin32.m3?rev=1.210.2.1;content-type=text%2Fplain
>   I wrote this:
>   PROCEDURE XWait(m: Mutex; c: Condition; act: Activation;
>                  alertable: BOOLEAN) RAISES {Alerted} =
>    (* LL = m on entry and exit, but not for the duration
>     * see C:\src\jdk-6u14-ea-src-b05-jrl-23_apr_2009\hotspot\agent\src\os\win32\Monitor.cpp
>     * NOTE that they merge the user lock and the condition lock.
>     * http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>     *   "3.3. The Generation Count Solution"
>     *)
>
>   Do we have the problems described in README.CV?
>
>   I haven't looked through the ACE code to see
>   to what extent they resemble solution 3.3, or if they
>   changed as a result of this discussion -- which I admit I don't understand
> and haven't read closely.
>
>
> Spurious wakeups are ok, though should be minimized.
>
> I'd still rather not drop pre-Vista support but I realize it becomes more interesting as time advances.
>
>   Thank you,
>    - Jay
>
> _______________________________________________ M3devel mailing list M3devel at elegosoft.com https://m3lists.elegosoft.com/mailman/listinfo/m3devel
>
>
> _______________________________________________
> M3devel mailing list
> M3devel at elegosoft.com
> https://m3lists.elegosoft.com/mailman/listinfo/m3devel
>

-- 
Rodney Bates
rodney.m.bates at acm.org



More information about the M3devel mailing list