[M3devel] further reducing cloned headers wrt pthread?

Wed Feb 4 23:11:52 CET 2009

>> I am very leery of this proposal -- the code will be inherently opaque
>> and unmaintainable. I don't see any advantage to it.

The entire proposal or the optimizations?

The original unoptimized proposal seems like a small change mostly.
I checked and the indirection/heap allocation is already there
 for cond and mutex, but not for pthread_t itself.
Factoring out the size I think is a small change.

On the other hand, we can also optimize it, pretty much
locking in the platform-specificity. It's a tough decision for me.
I don't mind the deoptimizations of const-to-var, or adding
some function calls, but heap allocs imho are among the things
to definitely avoid if not needed. These are untraced as well,
so the argument that Modula-3 heap alloc is efficient doesn't apply.

One caveat that bothers me though is that, as with sem_t,
I don't want to have types that are declared "incorrectly".
I'd like types that you can only have references to.
Probably in that case we "give up" and declare them as ADDRESS,
losing the type safety -- pthread_cond_foo could take a mutex_t
with no compilation error.
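
For comparison, C can express "types you can only have references to"
with incomplete struct types. A minimal sketch, assuming a thin C wrapper
layer; every m3_* name below is hypothetical and not taken from the
existing tree:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* --- what a header would expose (all names hypothetical) --- */
typedef struct m3_mutex m3_mutex; /* incomplete: pointers only */
typedef struct m3_cond m3_cond;   /* incomplete: pointers only */

/* --- what only the .c file would know --- */
struct m3_mutex { pthread_mutex_t impl; };
struct m3_cond { pthread_cond_t impl; };

m3_mutex *m3_mutex_new(void) {
    m3_mutex *m = calloc(1, sizeof *m);
    if (m != NULL && pthread_mutex_init(&m->impl, NULL) != 0) {
        free(m);
        m = NULL;
    }
    return m;
}

void m3_mutex_delete(m3_mutex *m) {
    int rc;
    if (m == NULL) return;
    rc = pthread_mutex_destroy(&m->impl);
    assert(rc == 0); /* destroying an unlocked mutex should succeed */
    (void)rc;
    free(m);
}

int m3_mutex_lock(m3_mutex *m) { return pthread_mutex_lock(&m->impl); }
int m3_mutex_unlock(m3_mutex *m) { return pthread_mutex_unlock(&m->impl); }

m3_cond *m3_cond_new(void) {
    m3_cond *c = calloc(1, sizeof *c);
    if (c != NULL && pthread_cond_init(&c->impl, NULL) != 0) {
        free(c);
        c = NULL;
    }
    return c;
}

void m3_cond_delete(m3_cond *c) {
    if (c == NULL) return;
    pthread_cond_destroy(&c->impl);
    free(c);
}

/* Takes m3_cond* only. Passing an m3_mutex* here is diagnosed by the
   compiler, whereas a void* (ADDRESS) parameter would accept it silently. */
int m3_cond_signal(m3_cond *c) { return pthread_cond_signal(&c->impl); }
```

The cost is the same heap allocation discussed above; what it buys is
that mixing up a mutex and a condition variable no longer compiles cleanly.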

The idea of making them all ADDRESS and adding C functions to alloc/cleanup
is also good imho. That allows for one of the optimized forms --
not where the space is at the end of the Thread.T, but where the
ADDRESS field is the data itself.
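
A rough sketch of that optimized form in C, assuming the Modula-3 side
declares the field as ADDRESS. The names m3_slot, m3_mutex_ptr,
m3_mutex_alloc and m3_mutex_cleanup are made up for illustration. The
heap allocation happens only where pthread_mutex_t is larger than an
address:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical: the slot is what Modula-3 would declare as ADDRESS. */
typedef void *m3_slot;

/* Nonzero when the native type fits in the slot itself (for example
   where pthread_mutex_t is a pointer), so no heap allocation is needed.
   Assumes such a type needs no more than pointer alignment. */
#define M3_MUTEX_INLINE (sizeof(pthread_mutex_t) <= sizeof(m3_slot))

/* Locate the real mutex: either the slot itself or what it points to. */
pthread_mutex_t *m3_mutex_ptr(m3_slot *slot) {
    return M3_MUTEX_INLINE ? (pthread_mutex_t *)(void *)slot
                           : (pthread_mutex_t *)*slot;
}

int m3_mutex_alloc(m3_slot *slot) {
    int rc;
    *slot = NULL;
    if (!M3_MUTEX_INLINE) {
        *slot = calloc(1, sizeof(pthread_mutex_t));
        if (*slot == NULL) return ENOMEM;
    }
    rc = pthread_mutex_init(m3_mutex_ptr(slot), NULL);
    if (rc != 0 && !M3_MUTEX_INLINE) {
        free(*slot);
        *slot = NULL;
    }
    return rc;
}

void m3_mutex_cleanup(m3_slot *slot) {
    int rc = pthread_mutex_destroy(m3_mutex_ptr(slot));
    assert(rc == 0); /* the mutex must be unlocked at this point */
    (void)rc;
    if (!M3_MUTEX_INLINE) free(*slot);
    *slot = NULL;
}
```

The Modula-3 code would only ever call the alloc/cleanup pair and pass
the slot through; which branch runs is decided entirely in C.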

I got hung up on pthread_attr_t here because it was efficiently
stack allocated and this proposal would have really deoptimized that.
The C code I showed avoids that though.
Albeit only in the face of creating a thread -- an extra heap allocation
per thread create is probably not a big deal.
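
To illustrate the general shape (this is not the code referred to above,
just a sketch under my own assumptions), a C helper can keep
pthread_attr_t on the C stack so that Modula-3 never needs its size.
m3_thread_create and m3_demo_body are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: creates a thread with an optional stack size.
   A stack_size of 0 means "use the platform default". */
int m3_thread_create(pthread_t *handle, size_t stack_size,
                     void *(*start)(void *), void *arg) {
    pthread_attr_t attr; /* stack allocated, invisible to Modula-3 */
    int rc;

    assert(handle != NULL && start != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;

    if (stack_size != 0) {
        rc = pthread_attr_setstacksize(&attr, stack_size);
        if (rc != 0) {
            pthread_attr_destroy(&attr);
            return rc;
        }
    }

    rc = pthread_create(handle, &attr, start, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Trivial thread body, for demonstration only: sets the int it is given. */
void *m3_demo_body(void *arg) {
    *(int *)arg = 1;
    return arg;
}
```

The caller still has to store the pthread_t somewhere, which is where
the per-thread allocation question above comes back in.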

Clearly I'm ambivalent.

Later,

 - Jay

----------------------------------------
> From: jay.krell at cornell.edu
> To: hosking at cs.purdue.edu
> Date: Wed, 4 Feb 2009 09:42:12 +0000
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] further reducing cloned headers wrt pthread?
>
>
> It gains something, but maybe it isn't enough
> to be worthwhile. The issue is in the subjectivity.
>
>
> It would remove e.g. the following system-dependent lines:
>
>
> Linux:
> pthread_t = ADDRESS;
> pthread_cond_t = RECORD data: ARRAY[1..6] OF LONGINT; END;
> pthread_key_t = uint32_t;
>
>
> Linux/32:
> pthread_attr_t = ARRAY[1..9] OF INTEGER;
> pthread_mutex_t = ARRAY[1..6] OF INTEGER;
>
>
> Linux/64:
> pthread_attr_t = ARRAY[1..7] OF INTEGER;
> pthread_mutex_t = ARRAY[1..5] OF INTEGER;
>
>
> FreeBSD:
> pthread_t = ADDRESS;
> pthread_attr_t = ADDRESS;
> pthread_mutex_t = ADDRESS;
> pthread_cond_t = ADDRESS;
> pthread_key_t = int;
>
>
> HP-UX:
> (* trick from darwin-generic/Upthread.i3 *)
> X32 = ORD(BITSIZE(INTEGER) = 32);
> X64 = ORD(BITSIZE(INTEGER) = 64);
> pthread_t = int32_t; (* opaque *)
> pthread_attr_t = int32_t; (* opaque *)
> pthread_mutex_t = RECORD opaque: ARRAY [1..11 * X64 + 22 * X32] OF INTEGER; END; (* 88 opaque bytes with size_t alignment *)
> pthread_cond_t = RECORD opaque: ARRAY [1..7 * X64 + 14 * X32] OF INTEGER; END; (* 56 opaque bytes with size_t alignment *)
> pthread_key_t = int32_t; (* opaque *)
>
>
> Cygwin:
> pthread_t = ADDRESS; (* opaque *)
> pthread_attr_t = ADDRESS; (* opaque *)
> pthread_mutex_t = ADDRESS; (* opaque *)
> pthread_cond_t = ADDRESS; (* opaque *)
> pthread_key_t = ADDRESS; (* opaque *)
>
>
> Solaris:
> pthread_t = int32_t; (* opaque *)
> pthread_attr_t = int32_t; (* opaque *)
> pthread_mutex_t = RECORD opaque: ARRAY [1..4] OF LONGINT; END; (* 32 bytes with 64 bit alignment *)
> pthread_cond_t = RECORD opaque: ARRAY [1..2] OF LONGINT; END; (* 16 bytes with 64 bit alignment *)
> pthread_key_t = int32_t; (* opaque *)
>
>
> Darwin: (only ppc32 currently)
> pthread_t = INTEGER; (* opaque *)
> pthread_attr_t = RECORD opaque: ARRAY [1..10] OF INTEGER; END;
> pthread_mutex_t = RECORD opaque: ARRAY [1..11] OF INTEGER; END;
> pthread_cond_t = RECORD opaque: ARRAY [1..7] OF INTEGER; END;
> pthread_key_t = INTEGER; (* opaque *)
>
>
> (plus AIX, Irix, VMS, Tru64.)
>
>
> Another approach would be make them all ADDRESS and introduce a portable
> C layer of "varying thickness", using the same logic.
> It would look just like the native pthreads, but there'd be extra allocate/cleanup
> calls -- to do the heap alloc/cleanup when the underlying types are larger than addresses.
> The two layers would be clear and simple, the cost would be the same,
> but there would be the conceptual cost of two simple layers instead of
> just one slightly complicated layer.
>
>
> Another approach is maybe to make them all addresses on new platforms and introduce
> the C layer only on new platforms. Again, about the only change in the Modula-3
> code is extra alloc/cleanup calls.
>
>
> And again, some/all of the code already has the indirection/heap allocation unconditionally.
>
>
> And again, maybe not worth it. I show all the system-dependent code, attempting
> to portray it in its worst light by showing all of it, but maybe it's really not a lot.
>
>
> For the attr type, we can do something specific to its use.
> There is just one use, and we can address it with the following function written in C..
> eh..I'll send a diff later tonight/this week I think.
>
>
> pthread_t and pthread_key_t always happen to be address-sized or smaller.
> Maybe just declare them both to be address and assert their size in some C code.
> That might waste a few bytes esp. on 64 bit platforms, or it might merely fill in the padding-for-alignment.
>
> For example, we have:
>
>
> TYPE
> Activation = UNTRACED REF RECORD
> (* global doubly-linked, circular list of all active threads *)
> next, prev: Activation := NIL; (* LL = activeMu *)
> (* thread handle *)
> handle: pthread_t; (* LL = activeMu *)
> (* base of thread stack for use by GC *)
> stackbase: ADDRESS := NIL; (* LL = activeMu *)
>
>
> so on 64 bit platforms where pthread_t is a 32bit integer, it is taking up 64 bits anyway.
> There are two static pthread_key_ts, so making them address would waste 8 bytes on some/many 64bit platforms.
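
The size assertion and the padding argument can both be checked from C.
A small sketch, assuming C11; the struct and function names are
hypothetical stand-ins and only mirror the first fields of Activation:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Compile-time checks: the build fails on any platform where the handles
   would not fit in an ADDRESS-sized field. */
_Static_assert(sizeof(pthread_t) <= sizeof(void *),
               "pthread_t is larger than an address");
_Static_assert(sizeof(pthread_key_t) <= sizeof(void *),
               "pthread_key_t is larger than an address");

/* Hypothetical stand-ins for the Activation record: one with a 32-bit
   handle, one with an address-sized handle. */
struct activation_int32 {
    void *next, *prev;
    int32_t handle;
    void *stackbase;
};

struct activation_addr {
    void *next, *prev;
    void *handle;
    void *stackbase;
};

/* Extra bytes the address-sized handle costs per record. This is 0 when
   the wider field merely occupies what was alignment padding. */
size_t m3_handle_widening_cost(void) {
    return sizeof(struct activation_addr) - sizeof(struct activation_int32);
}
```

On common ABIs, where a pointer's alignment equals its size, the cost
comes out as zero, since stackbase has to start on a pointer boundary
either way.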
>
>
> Leaving only cond and mutex.
> Some of the platforms declare more types such as rwlock, rwlockattr, but they are never used.
> rwlock is a useful type though.
>
>
> - Jay
>
>
> ----------------------------------------
>> From: hosking at cs.purdue.edu
>> To: jay.krell at cornell.edu
>> Date: Wed, 4 Feb 2009 12:53:54 +1100
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] further reducing cloned headers wrt pthread?
>>
>> I am very leery of this proposal -- the code will be inherently opaque
>> and unmaintainable. I don't see any advantage to it.
>>
>> On 4 Feb 2009, at 11:06, Jay wrote:
>>
>>>
>>> There are a few possibilities:
>>>
>>>
>>> Roughly:
>>>
>>> Where there is
>>>
>>> INTERFACE Upthread;
>>>
>>> TYPE
>>> pthread_t = ... system specific ...
>>> pthread_cond_t = ... system specific ...
>>> pthread_mutex_t = ... system specific ...
>>>
>>> PROCEDURE pthread_thread_init_or_whatever(VAR pthread_t);
>>> PROCEDURE pthread_mutex_init_or_whatever(VAR pthread_mutex_t);
>>> PROCEDURE pthread_cond_init_or_whatever(VAR pthread_cond_t);
>>>
>>> MODULE PThread;
>>> VAR
>>> a: pthread_t;
>>> b: pthread_cond_t;
>>> c: pthread_mutex_t;
>>>
>>> PROCEDURE Foo() =
>>> BEGIN
>>> Upthread.pthread_thread_init_or_whatever(a);
>>> Upthread.pthread_cond_init_or_whatever(b);
>>> Upthread.pthread_mutex_init_or_whatever(c);
>>> END Foo;
>>>
>>> change to:
>>>
>>> INTERFACE Upthread;
>>>
>>> TYPE
>>> pthread_t = RECORD END; (* or whatever is correct for an opaque,
>>> preferably unique, type *)
>>> pthread_cond_t = RECORD END; (* ditto *)
>>> pthread_mutex_t = RECORD END; (* ditto *)
>>>
>>> PROCEDURE pthread_thread_init_or_whatever(VAR pthread_t);
>>> PROCEDURE pthread_mutex_init_or_whatever(VAR pthread_mutex_t);
>>> PROCEDURE pthread_cond_init_or_whatever(VAR pthread_cond_t);
>>>
>>>
>>> INTERFACE PThreadC.i3
>>>
>>> PROCEDURE GetA(): UNTRACED REF Upthread.pthread_t;
>>> PROCEDURE GetB(): UNTRACED REF Upthread.pthread_cond_t;
>>> PROCEDURE GetC(): UNTRACED REF Upthread.pthread_mutex_t;
>>>
>>> or possibly extern VAR
>>>
>>> PThreadC.c
>>>
>>> static pthread_t a; /* pthread_t has no static initializer */
>>> static pthread_cond_t b = PTHREAD_COND_INITIALIZER;
>>> static pthread_mutex_t c = PTHREAD_MUTEX_INITIALIZER;
>>>
>>> pthread_t* GetA() { return &a; }
>>>
>>> pthread_cond_t* GetB() { return &b; }
>>>
>>> pthread_mutex_t* GetC() { return &c; }
>>>
>>> MODULE PThread;
>>> VAR
>>> a := PThreadC.GetA();
>>> b := PThreadC.GetB();
>>> c := PThreadC.GetC();
>>>
>>> PROCEDURE Foo() =
>>> BEGIN
>>> Upthread.pthread_thread_init_or_whatever(a^);
>>> Upthread.pthread_cond_init_or_whatever(b^);
>>> Upthread.pthread_mutex_init_or_whatever(c^);
>>> END Foo;
>>>
>>> or, again, possibly they are variables and it goes a little smaller/
>>> quicker:
>>>
>>> FROM PThreadC IMPORT a, b, c;
>>>
>>>
>>> PROCEDURE Foo() =
>>> BEGIN
>>> Upthread.pthread_thread_init_or_whatever(a);
>>> Upthread.pthread_cond_init_or_whatever(b);
>>> Upthread.pthread_mutex_init_or_whatever(c);
>>> END Foo;
>>>
>>> I think that is pretty cut and dried, no controversy.
>>>
>>> What is less clear is what to do with non-statically allocated
>>> variables.
>>>
>>> Let's say:
>>>
>>> MODULE PThread;
>>>
>>> TYPE T = RECORD
>>> a:int;
>>> b:pthread_t;
>>> END;
>>>
>>> PROCEDURE CreateT():T=
>>> VAR
>>> t := NEW(T);
>>> BEGIN
>>> Upthread.init_or_whatever(t.b);
>>> RETURN t;
>>> END;
>>>
>>> PROCEDURE DisposeT(t:T)=
>>> BEGIN
>>> IF t = NIL THEN RETURN END;
>>> Upthread.pthread_cleanup_or_whatever(t.b);
>>> DISPOSE(t);
>>> END;
>>>
>>> The desire is something that does not know the size of pthread_t,
>>> something like:
>>>
>>> TYPE T = RECORD
>>> a:int;
>>> b:UNTRACED REF pthread_t;
>>> END;
>>>
>>>
>>> PROCEDURE CreateT():T=
>>> VAR
>>> t := NEW(T);
>>> BEGIN
>>> t.b := LOOPHOLE(UNTRACED REF pthread_t, NEW(UNTRACED REF ARRAY OF
>>> CHAR, Upthread.pthread_t_size));
>>> (* Though I really wanted t.b :=
>>> RTAllocator.MallocZeroed(Upthread.pthread_t_size); *)
>>> Upthread.init_or_whatever(t.b^);
>>> RETURN t;
>>> END;
>>>
>>> PROCEDURE DisposeT(t:T)=
>>> BEGIN
>>> IF t = NIL THEN RETURN END;
>>> Upthread.pthread_cleanup_or_whatever(t.b^);
>>> DISPOSE(t.b);
>>> DISPOSE(t);
>>> END;
>>>
>>>
>>> However that incurs an extra heap allocation, which is not great.
>>> In at least one place, the pointer-indirection-and-heap-allocation
>>> is already there
>>> so this isn't a deoptimization. However "reoptimizing" it might be
>>> nice.
>>>
>>>
>>> What I would prefer is a pattern I often use in C -- merging
>>> allocations, something like,
>>> /assuming/ t is untraced, which I grant it might not be.
>>>
>>>
>>> And ensuring that BYTESIZE(T) is properly aligned:
>>>
>>>
>>> PROCEDURE CreateT():UNTRACED REF T=
>>> VAR
>>> p : ADDRESS;
>>> t : UNTRACED REF T;
>>> BEGIN
>>> (* Again I would prefer RTAllocator.MallocZeroed *)
>>> p := NEW(UNTRACED REF ARRAY OF CHAR, Upthread.pthread_t_size +
>>> BYTESIZE(T));
>>> t := LOOPHOLE(UNTRACED REF T, p);
>>> t.b := LOOPHOLE(UNTRACED REF Upthread.pthread_t, p + BYTESIZE(T));
>>> Upthread.init_or_whatever(t.b^);
>>> RETURN t;
>>> END;
>>>
>>>
>>> That is -- opaque types, size not known at compile-time, but size
>>> known at runtime, and
>>> do not incur an extra heap allocation for lack of knowing sizes at
>>> compile-time.
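
For reference, the merged-allocation pattern looks roughly like this in
C. This is a sketch only: m3_record, m3_pthread_mutex_size,
m3_record_create and m3_record_dispose are hypothetical names, and it
uses pthread_mutex_t as the opaque part because it has an init call to
demonstrate with. It assumes C11 for max_align_t:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record standing in for T: fixed fields plus a pointer to
   opaque storage whose size is only known at run time. */
typedef struct {
    int a;
    pthread_mutex_t *b;
} m3_record;

/* Size exported by the C layer, so no other code needs the layout. */
size_t m3_pthread_mutex_size(void) { return sizeof(pthread_mutex_t); }

/* Round n up to a multiple of the strictest fundamental alignment, so
   the trailing object is aligned whatever its real type is. */
static size_t m3_round_up(size_t n) {
    const size_t align = _Alignof(max_align_t);
    return (n + align - 1) / align * align;
}

/* One allocation holds both the record and the opaque object after it. */
m3_record *m3_record_create(void) {
    const size_t header = m3_round_up(sizeof(m3_record));
    char *p = calloc(1, header + m3_pthread_mutex_size());
    m3_record *t = (m3_record *)p;

    if (t == NULL) return NULL;
    t->b = (pthread_mutex_t *)(p + header);
    assert(((uintptr_t)t->b % _Alignof(max_align_t)) == 0);
    if (pthread_mutex_init(t->b, NULL) != 0) {
        free(p);
        return NULL;
    }
    return t;
}

/* A single free releases both parts. */
void m3_record_dispose(m3_record *t) {
    if (t == NULL) return;
    pthread_mutex_destroy(t->b);
    free(t);
}
```

The rounding step is the "properly aligned" condition from above: without
it the trailing object could start at an offset its type does not allow.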
>>>
>>>
>>> For the statically allocated variables I think there is no
>>> controversy.
>>> There might be a tiny bit of overhead in the use, but it'd be very
>>> small, and possibly
>>> even removable in the future. I'd rather avoid the variables, as
>>> all writable
>>> data is to be avoided. Read only pages are better and all that,
>>> but ok..
>>>
>>>
>>> However the value is mainly realized only if statically and
>>> dynamically allocated variables are handled.
>>>
>>> The result of this would be further reduction in platform-
>>> specificity when cloning
>>> C headers into Modula-3 interfaces. i.e. less work to bring up new
>>> platforms.
>>>
>>>
>>> - Jay
>>>
>>>
>>> ----------------------------------------
>>>> From: hosking at cs.purdue.edu
>>>> To: jay.krell at cornell.edu
>>>> Date: Wed, 4 Feb 2009 09:54:01 +1100
>>>> CC: m3devel at elegosoft.com
>>>> Subject: Re: [M3devel] further reducing cloned headers wrt pthread?
>>>>
>>>> I suggest you come up with a proposal for us to look over before you
>>>> change the code base for this.
>>>>
>>>> On 4 Feb 2009, at 09:05, Jay wrote:
>>>>
>>>>>
>>>>>> Hmm, yes, you are right that there is a possible alignment issue. I
>>>>>> am used to pthread_mutex_t being a simple reference. But surely
>>>>>> in C
>>>>>> the type of the pthread_mutex_t struct would have appropriate
>>>>>> alignment padding anyway so as to allow allocation using
>>>>>> malloc(sizeof(pthread_mutex_t))? So, it all should just work, right?
>>>>>
>>>>>
>>>>> I think "the other way around" and same conclusion.
>>>>> malloc should return something "maximally aligned" so that
>>>>>
>>>>> pthread_mutex_t* x = (pthread_mutex_t*)
>>>>> malloc(sizeof(pthread_mutex_t));
>>>>>
>>>>>
>>>>> works. pthread_mutex_t doesn't need the padding, malloc does, so to
>>>>> speak.
>>>>>
>>>>>
>>>>> Just as long as we don't have
>>>>>
>>>>>
>>>>> TYPE Foo = RECORD
>>>>> a: pthread_mutex_t;
>>>>> b: pthread_mutex_t;
>>>>> c: pthread_t;
>>>>> d: pthread_t;
>>>>> e: pthread_cond_t;
>>>>> f: pthread_cond_t;
>>>>> END;
>>>>>
>>>>>
>>>>> and such, ok.
>>>>>
>>>>>
>>>>> malloc on NT returns something with 2 * sizeof(void*) alignment.
>>>>> I think on Win9x only 4-byte alignment, thus there is _aligned_malloc for
>>>>> dealing with SSE stuff.
>>>>> Something like that.
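
The malloc alignment point can be checked directly. A minimal sketch,
assuming C11 (_Alignof, max_align_t); m3_is_mutex_aligned and
m3_malloc_mutex are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns nonzero when p satisfies the alignment of pthread_mutex_t. */
int m3_is_mutex_aligned(const void *p) {
    return ((uintptr_t)p % _Alignof(pthread_mutex_t)) == 0;
}

/* Hypothetical helper: one uninitialized mutex straight from malloc. */
pthread_mutex_t *m3_malloc_mutex(void) {
    pthread_mutex_t *x = malloc(sizeof(pthread_mutex_t));
    /* The C standard requires malloc's result to be suitably aligned for
       any object with a fundamental alignment requirement, so no padding
       is needed inside the type for this to hold. */
    assert(x == NULL || m3_is_mutex_aligned(x));
    return x;
}
```

This covers only the allocator side; it says nothing about fields placed
back to back inside a record whose types were declared by hand.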
>>>>>
>>>>>
>>>>> I didn't realize untraced allocations were basically just malloc but
>>>>> indeed they are.
>>>>>
>>>>>
>>>>> I'm still mulling over the possible deoptimizations here.
>>>>> I'm reluctant to increase heap allocations.
>>>>>
>>>>>
>>>>>
>>>>> - Jay
>>>>
>>