[M3devel] size vs. speed set_member / set_singleton serialized on NT386
Jay K
jay.krell at cornell.edu
Sat Jun 12 17:11:42 CEST 2010
We used to have the following (it is still in the release branch), in
m3-libs/m3core/src/Csupport/Common/parse.c:
void __cdecl set_singleton
ANSI(( long a, ulong* s))
KR((a, s) long a; ulong* s;)
{
    long a_word = a / SET_GRAIN;
    long a_bit = a % SET_GRAIN;
    s[a_word] |= (1UL << a_bit);
}

long __cdecl set_member
ANSI(( long elt, ulong* set))
KR((elt, set) long elt; ulong* set;)
{
    register long word = elt / SET_GRAIN;
    register long bit = elt % SET_GRAIN;
    return (set[word] & (1UL << bit)) != 0;
}
Both backends generated calls to these functions.
In the gcc backend we now inline the equivalent.
In the NT backend, I use the bt and bts instructions instead.
The resulting code is very size-optimized: bt and bts do the shift, the indexing, and the AND/OR, all in one instruction.
I don't think I knew it at the time, but bt/bts are interlocked/serialized/atomic, and therefore slow.
Was that the wrong choice here?
Should the NT backend instead inline the equivalent mask/index/and/or, as the gcc backend does?
- Jay