[M3devel] further reducing cloned headers wrt pthread?

Wed Feb 4 02:50:41 CET 2009

You need to be careful here with respect to traced heap allocation, since
objects can move.
It is not safe to return the address of any field of a traced object.
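
To make the hazard concrete, here is a minimal sketch in C that only simulates a moving collector; it is not how the Modula-3 runtime is implemented. The names Obj, relocate and interior_pointer_goes_stale are hypothetical. The "collector" copies the object elsewhere, and a field address taken before the move keeps pointing at the old copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record standing in for a traced object. */
typedef struct {
    int a;
    long handle;   /* stand-in for an embedded pthread_t-like field */
} Obj;

/* Simulate a copying collector: copy the object to a new location.
   The old block is deliberately kept alive (like from-space) so the
   stale pointer can still be used without undefined behaviour. */
static Obj *relocate(const Obj *old)
{
    Obj *moved = malloc(sizeof *moved);
    assert(moved != NULL);
    memcpy(moved, old, sizeof *moved);
    return moved;
}

/* Returns 1 if a write through a field address taken before the move
   is not visible in the moved object, i.e. the address went stale. */
int interior_pointer_goes_stale(void)
{
    Obj *obj = malloc(sizeof *obj);
    assert(obj != NULL);
    obj->a = 1;
    obj->handle = 10;

    long *field = &obj->handle;   /* address of a field, taken early */
    Obj *moved = relocate(obj);   /* the simulated collector moves it */

    *field = 99;                  /* updates the old copy only */
    int stale = (moved->handle == 10) && (field != &moved->handle);

    free(moved);
    free(obj);
    return stale;
}
```

In a real collector the old copy may also be reused, so the stale write can corrupt unrelated data rather than simply being lost.
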

On 4 Feb 2009, at 11:35, Jay wrote:

>
> Addendum:
>  if size <= BYTESIZE(ADDRESS), which is common, it is desirable to  
> avoid the extra heap allocation and just use the space for the  
> pointer.
>
> You end up with something like:
>
> TYPE T = RECORD
>   ...
>   pthread: UNTRACED REF pthread_t
> END;
>
>
> PROCEDURE GetPThread(VAR t: T): UNTRACED REF pthread_t =
> BEGIN
>  IF Upthread.pthread_t_size <= BYTESIZE(ADDRESS) THEN
>    (* small enough: the pointer field itself is the storage *)
>    RETURN LOOPHOLE(ADR(t.pthread), UNTRACED REF pthread_t);
>  ELSE
>    RETURN t.pthread;
>  END;
> END GetPThread;
>
>
> - Jay
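
For comparison, the same "use the pointer's own space when the size fits" idea can be sketched in C. This is only an illustration under assumptions: Rec, rec_init, handle_storage and rec_destroy are hypothetical names, handle_size plays the role of Upthread.pthread_t_size, and a handle no larger than a pointer is assumed to need no stricter alignment than a pointer. It is only sound when the record itself never moves, which is exactly the concern about traced objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: slot is either the handle's storage itself
   (small handles) or a pointer to separately allocated storage. */
typedef struct {
    int a;
    void *slot;
} Rec;

/* Prepare rec for a handle of handle_size bytes.
   Returns 0 on success, -1 if the extra allocation fails. */
int rec_init(Rec *rec, size_t handle_size)
{
    assert(rec != NULL);
    rec->a = 0;
    rec->slot = NULL;
    if (handle_size > sizeof rec->slot) {
        rec->slot = calloc(1, handle_size);   /* zeroed, like MallocZeroed */
        if (rec->slot == NULL)
            return -1;
    }
    return 0;
}

/* Address of the handle's storage: inline when it fits in the slot. */
void *handle_storage(Rec *rec, size_t handle_size)
{
    if (handle_size <= sizeof rec->slot)
        return &rec->slot;
    return rec->slot;
}

/* Release the extra allocation, if there was one. */
void rec_destroy(Rec *rec, size_t handle_size)
{
    if (handle_size > sizeof rec->slot)
        free(rec->slot);
    rec->slot = NULL;
}
```

The size test has to be repeated on every access and on cleanup, which is the price of saving the extra allocation.
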
>
> ----------------------------------------
>> From: jay.krell at cornell.edu
>> To: hosking at cs.purdue.edu
>> Date: Wed, 4 Feb 2009 00:06:24 +0000
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] further reducing cloned headers wrt pthread?
>>
>>
>> There are a few possibilities:
>>
>>
>> Roughly:
>>
>> Where there is
>>
>> INTERFACE Upthread;
>>
>> TYPE
>> pthread_t = ... system specific ...
>> pthread_cond_t = ... system specific ...
>> pthread_mutex_t = ... system specific ...
>>
>> PROCEDURE pthread_thread_init_or_whatever(VAR pthread_t);
>> PROCEDURE pthread_mutex_init_or_whatever(VAR pthread_mutex_t);
>> PROCEDURE pthread_cond_init_or_whatever(VAR pthread_cond_t);
>>
>> MODULE PThread;
>> VAR
>> a: pthread_t;
>> b: pthread_cond_t;
>> c: pthread_mutex_t;
>>
>> PROCEDURE Foo() =
>> BEGIN
>> Upthread.pthread_thread_init_or_whatever(a);
>> Upthread.pthread_cond_init_or_whatever(b);
>> Upthread.pthread_mutex_init_or_whatever(c);
>> END Foo;
>>
>> change to:
>>
>> INTERFACE Upthread;
>>
>> TYPE
>> pthread_t = RECORD END; (* or whatever is correct for an opaque,
>>                            preferably unique, type *)
>> pthread_cond_t = RECORD END; (* ditto *)
>> pthread_mutex_t = RECORD END; (* ditto *)
>>
>> PROCEDURE pthread_thread_init_or_whatever(VAR pthread_t);
>> PROCEDURE pthread_mutex_init_or_whatever(VAR pthread_mutex_t);
>> PROCEDURE pthread_cond_init_or_whatever(VAR pthread_cond_t);
>>
>>
>> INTERFACE PThreadC.i3
>>
>> PROCEDURE GetA(): UNTRACED REF Upthread.pthread_t;
>> PROCEDURE GetB(): UNTRACED REF Upthread.pthread_cond_t;
>> PROCEDURE GetC(): UNTRACED REF Upthread.pthread_mutex_t;
>>
>> or possibly extern VAR
>>
>> PThreadC.c
>>
>> static pthread_t a = PTHREAD_INIT;
>> static pthread_cond_t b = PTHREAD_COND_INIT;
>> static pthread_mutex_t c = PTHREAD_MUTEX_INIT;
>>
>> pthread_t* GetA() { return &a; }
>>
>> pthread_cond_t* GetB() { return &b; }
>>
>> pthread_mutex_t* GetC() { return &c; }
>>
>> MODULE PThread;
>> VAR
>> a := PThreadC.GetA();
>> b := PThreadC.GetB();
>> c := PThreadC.GetC();
>>
>> PROCEDURE Foo() =
>> BEGIN
>> Upthread.pthread_thread_init_or_whatever(a^);
>> Upthread.pthread_cond_init_or_whatever(b^);
>> Upthread.pthread_mutex_init_or_whatever(c^);
>> END Foo;
>>
>> or, again, possibly they are variables and it goes a little
>> smaller/quicker:
>>
>> FROM PThreadC IMPORT a, b, c;
>>
>>
>> PROCEDURE Foo() =
>> BEGIN
>> Upthread.pthread_thread_init_or_whatever(a);
>> Upthread.pthread_cond_init_or_whatever(b);
>> Upthread.pthread_mutex_init_or_whatever(c);
>> END Foo;
>>
>> I think that is pretty cut and dried, no controversy.
>>
>> What is less clear is what to do with non-statically allocated  
>> variables.
>>
>> Let's say:
>>
>> MODULE PThread;
>>
>> TYPE T = RECORD
>> a:int;
>> b:pthread_t;
>> END;
>>
>> PROCEDURE CreateT():T=
>> VAR
>> t := NEW(T);
>> BEGIN
>> Upthread.init_or_whatever(t.b);
>> RETURN t;
>> END;
>>
>> PROCEDURE DisposeT(t:T)=
>> BEGIN
>> IF t = NIL THEN RETURN END;
>> Upthread.pthread_cleanup_or_whatever(t.b);
>> DISPOSE(t);
>> END;
>>
>> The desire is something that does not know the size of pthread_t,  
>> something like:
>>
>> TYPE T = RECORD
>> a:int;
>> b:UNTRACED REF pthread_t;
>> END;
>>
>>
>> PROCEDURE CreateT():T=
>> VAR
>> t := NEW(T);
>> BEGIN
>> t.b := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR,
>> Upthread.pthread_t_size), UNTRACED REF pthread_t);
>> (* Though I really wanted t.b :=  
>> RTAllocator.MallocZeroed(Upthread.pthread_t_size); *)
>> Upthread.init_or_whatever(t.b^);
>> RETURN t;
>> END;
>>
>> PROCEDURE DisposeT(t:T)=
>> BEGIN
>> IF t = NIL THEN RETURN END;
>> Upthread.pthread_cleanup_or_whatever(t.b^);
>> DISPOSE(t.b);
>> DISPOSE(t);
>> END;
>>
>>
>> However that incurs an extra heap allocation, which is not great.
>> In at least one place, the pointer-indirection-and-heap-allocation  
>> is already there
>> so this isn't a deoptimization. However "reoptimizing" it might be  
>> nice.
>>
>>
>> What I would prefer is a pattern I often use in C: merging
>> allocations, something like the following,
>> /assuming/ t is untraced, which I grant it might not be.
>>
>>
>> And ensuring that BYTESIZE(T) is properly aligned:
>>
>>
>> PROCEDURE CreateT():UNTRACED REF T=
>> VAR
>> p : ADDRESS;
>> t : UNTRACED REF T;
>> BEGIN
>> (* Again I would prefer RTAllocator.MallocZeroed *)
>> p := NEW(UNTRACED REF ARRAY OF CHAR, Upthread.pthread_t_size +
>> BYTESIZE(T));
>> t := LOOPHOLE(p, UNTRACED REF T);
>> t.b := LOOPHOLE(p + BYTESIZE(T), UNTRACED REF Upthread.pthread_t);
>> Upthread.init_or_whatever(t.b^);
>> RETURN t;
>> END;
>>
>>
>> That is -- opaque types, size not known at compile-time, but size  
>> known at runtime, and
>> do not incur an extra heap allocation for lack of knowing sizes at  
>> compile-time.
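
The merged-allocation pattern described above looks roughly like this in C. It is a sketch under assumptions, not existing code from any header: T, create_t and dispose_t are hypothetical names, handle_size stands in for the runtime-known Upthread.pthread_t_size, and the record size is rounded up to the strictest fundamental alignment (C11 max_align_t) so the opaque handle placed after it is suitably aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record; b points at the opaque handle that lives in
   the same heap block, immediately after the (padded) record. */
typedef struct {
    int a;
    void *b;
} T;

/* Round n up to a multiple of the strictest fundamental alignment. */
static size_t round_up(size_t n)
{
    size_t align = _Alignof(max_align_t);
    return (n + align - 1) / align * align;
}

/* One zeroed allocation holding both the record and the handle.
   Returns NULL if the allocation fails. */
T *create_t(size_t handle_size)
{
    size_t offset = round_up(sizeof(T));
    char *p = calloc(1, offset + handle_size);
    if (p == NULL)
        return NULL;

    T *t = (T *)p;
    t->b = p + offset;   /* handle storage starts after the record */
    return t;
}

/* A single free releases the record and the handle together. */
void dispose_t(T *t)
{
    free(t);
}
```

Rounding the offset is what "ensuring that BYTESIZE(T) is properly aligned" amounts to; without it the handle could start at an address too weakly aligned for its type.
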
>>
>>
>> For the statically allocated variables I think there is no  
>> controversy.
>> There might be a tiny bit of overhead in the use, but it'd be very
>> small, and possibly
>> even removable in the future. I'd rather avoid the variables, as  
>> all writable
>> data is to be avoided. Read only pages are better and all that, but  
>> ok..
>>
>>
>> However the value is mainly realized only if both statically and
>> dynamically allocated variables are handled.
>>
>> The result of this would be a further reduction in
>> platform-specificity when cloning
>> C headers into Modula-3 interfaces, i.e. less work to bring up new
>> platforms.
>>
>>
>> - Jay
>>
>>
>> ----------------------------------------
>>> From: hosking at cs.purdue.edu
>>> To: jay.krell at cornell.edu
>>> Date: Wed, 4 Feb 2009 09:54:01 +1100
>>> CC: m3devel at elegosoft.com
>>> Subject: Re: [M3devel] further reducing cloned headers wrt pthread?
>>>
>>> I suggest you come up with a proposal for us to look over before you
>>> change the code base for this.
>>>
>>> On 4 Feb 2009, at 09:05, Jay wrote:
>>>
>>>>
>>>>> Hmm, yes, you are right that there is a possible alignment issue. I
>>>>> am used to pthread_mutex_t being a simple reference. But surely in C
>>>>> the type of the pthread_mutex_t struct would have appropriate
>>>>> alignment padding anyway so as to allow allocation using
>>>>> malloc(sizeof(pthread_mutex_t))? So, it all should just work, right?
>>>>
>>>>
>>>> I think "the other way around" and same conclusion.
>>>> malloc should return something "maximally aligned" so that
>>>>
>>>> pthread_mutex_t* x = (pthread_mutex_t*)
>>>> malloc(sizeof(pthread_mutex_t));
>>>>
>>>>
>>>> works. pthread_mutex_t doesn't need the padding, malloc does, so to
>>>> speak.
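
Both views of the alignment question can be checked with a small C sketch. It assumes a POSIX system providing <pthread.h> and a C11 compiler for _Alignof; the two function names are hypothetical. The first check reflects that sizeof always includes trailing padding, the second that malloc returns storage aligned for any object type.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* sizeof includes any trailing padding, so it is always a multiple
   of the type's alignment. Returns 1 when that holds. */
int mutex_size_is_multiple_of_alignment(void)
{
    return sizeof(pthread_mutex_t) % _Alignof(pthread_mutex_t) == 0;
}

/* malloc returns storage suitably aligned for any object type, so
   the result can hold a pthread_mutex_t. Returns 1 when the address
   satisfies the mutex's alignment, 0 otherwise or on failure. */
int malloc_is_aligned_for_mutex(void)
{
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m == NULL)
        return 0;
    int ok = ((uintptr_t)m % _Alignof(pthread_mutex_t)) == 0;
    free(m);
    return ok;
}
```
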
>>>>
>>>>
>>>> Just as long as we don't have
>>>>
>>>>
>>>> TYPE Foo = RECORD
>>>> a: pthread_mutex_t;
>>>> b: pthread_mutex_t;
>>>> c: pthread_t;
>>>> d: pthread_t;
>>>> e: pthread_cond_t;
>>>> f: pthread_cond_t;
>>>> END;
>>>>
>>>>
>>>> and such, ok.
>>>>
>>>>
>>>> malloc on NT returns something with 2 * sizeof(void*) alignment.
>>>> I think on Win9x only 4-byte alignment, thus there is _aligned_malloc
>>>> for dealing with SSE stuff.
>>>> Something like that.
>>>>
>>>>
>>>> I didn't realize untraced allocations were basically just malloc  
>>>> but
>>>> indeed they are.
>>>>
>>>>
>>>> I'm still mulling over the possible deoptimizations here.
>>>> I'm reluctant to increase heap allocations.
>>>>
>>>>
>>>>
>>>> - Jay
>>>