[M3devel] size vs. speed set_member / set_singleton serialized on NT386

Sat Jun 12 17:11:42 CEST 2010

We used to have (and the release branch still has):

m3-libs/m3core/src/Csupport/Common/parse.c:

void __cdecl set_singleton
    ANSI((      long a, ulong* s))
      KR((a, s) long a; ulong* s;)
{
  long a_word = a / SET_GRAIN;
  long a_bit  = a % SET_GRAIN;
  s[a_word] |= (1UL << a_bit);
}

long __cdecl set_member
    ANSI((          long elt, ulong* set))
      KR((elt, set) long elt; ulong* set;)
{
  register long word = elt / SET_GRAIN;
  register long bit  = elt % SET_GRAIN;
  return (set[word] & (1UL << bit)) != 0;
}
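For reference, here is a self-contained sketch of the same two helpers in plain ANSI C. It drops the ANSI/KR prototype macros and the MSVC-specific __cdecl keyword. It also assumes that SET_GRAIN is the bit width of an unsigned long and that element numbers are non-negative. The real SET_GRAIN and ulong definitions live elsewhere in m3core and may differ.

```c
#include <limits.h>

/* Assumption: SET_GRAIN is the bit width of one set word (an unsigned long). */
#define SET_GRAIN ((long)(CHAR_BIT * sizeof(unsigned long)))

/* Add element a to the bit set s (a must be >= 0). */
void set_singleton(long a, unsigned long *s)
{
    long a_word = a / SET_GRAIN;   /* index of the word holding bit a */
    long a_bit  = a % SET_GRAIN;   /* position of bit a within that word */
    s[a_word] |= (1UL << a_bit);   /* or the bit in */
}

/* Return 1 if elt is in the bit set, else 0 (elt must be >= 0). */
long set_member(long elt, const unsigned long *set)
{
    long word = elt / SET_GRAIN;
    long bit  = elt % SET_GRAIN;
    return (set[word] & (1UL << bit)) != 0;   /* and with the mask */
}
```

For example, after set_singleton(3, s) on a zeroed array, set_member(3, s) returns 1 and set_member(4, s) returns 0.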

Both backends used to generate calls to these functions.

In the gcc backend we now inline the equivalent.
In the NT backend, I use the bt and bts instructions instead.
The resulting code is very small: bt and bts do the shift, the index,
and the and/or, all in one instruction.
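The following portable C sketch spells out what "all in one" means. It models the work a single 32-bit bt or bts does when it has a memory operand and a register bit offset. The names bt_model and bts_model are hypothetical. The sketch only covers non-negative offsets. The real instructions treat the register offset as signed, and they report the old bit in the carry flag rather than returning it.

```c
#include <stdint.h>

/* Hypothetical model of "bt [base], reg" with a 32-bit operand size:
   returns the old bit (what the CPU would leave in CF).
   Only non-negative offsets are modeled. */
int bt_model(const uint32_t *base, uint32_t bit)
{
    uint32_t word = bit >> 5;          /* index: bit / 32 */
    uint32_t mask = 1u << (bit & 31);  /* shift: 1 << (bit % 32) */
    return (base[word] & mask) != 0;   /* and: test the bit */
}

/* Hypothetical model of "bts [base], reg": test the bit, then set it. */
int bts_model(uint32_t *base, uint32_t bit)
{
    uint32_t word = bit >> 5;
    uint32_t mask = 1u << (bit & 31);
    int old = (base[word] & mask) != 0;
    base[word] |= mask;                /* or: set the bit */
    return old;
}
```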

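I don't think I knew it at the time, but bt/bts with a memory operand are slow.
They are only interlocked/atomic with a lock prefix. However, the memory forms
with a register bit offset can address bits outside the dword at the operand
address, and many CPUs execute them as long micro-op sequences.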

Was that the wrong choice here?
Should we instead inline the equivalent index/mask/and/or sequence, as the gcc backend does?
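Here is a minimal sketch of the explicit index/mask/and/or sequence, written as static inline C so that a compiler can expand it at each call site. It assumes 32-bit set words, as on NT386. It also treats element numbers as unsigned, which lets "/ 32" and "% 32" become a plain shift and mask. With the signed long arithmetic in parse.c, a compiler has to add fix-up code for negative values. The names set_incl_inline and set_member_inline are hypothetical.

```c
#include <stdint.h>

/* Hypothetical inline helpers over 32-bit set words; elt is unsigned. */
static inline void set_incl_inline(uint32_t *set, uint32_t elt)
{
    set[elt >> 5] |= (uint32_t)1 << (elt & 31);   /* index, shift, or */
}

static inline int set_member_inline(const uint32_t *set, uint32_t elt)
{
    return (set[elt >> 5] >> (elt & 31)) & 1;     /* index, shift, and */
}
```

This form takes more bytes than a single bt/bts, because the index, shift, and and/or become separate instructions. Whether it is faster on a particular CPU would need to be measured.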

 - Jay