[M3devel] gcc 4.5.0?

Tony Hosking hosking at cs.purdue.edu
Fri May 21 19:08:02 CEST 2010


We don't want to bypass the gcc support.
It works well with register allocation, etc.


On 21 May 2010, at 03:41, Jay K wrote:

> 
> Of course. I tried to read it as a teenager and got stuck on the automata stuff.
> 
> 
> Wirth isn't 100% clear to me.
> He describes my two schemes.
> I think he throws out the second as an unimportant optimization.
> He criticizes the first as inefficient, granted.
> What I don't get is the static link in addition to the dynamic link.
> Perhaps I have not accounted for recursion.
> 
> 
> Oh..I see..he is allowing to take the address of a nested function.
> I didn't account for that.
> I still don't understand all the arrow he draws for the static link, but
> I do see that my scheme doesn't suffice.
> Taking the address of a nested function..doesn't seem like a good idea.
> Maybe I'm too C-minded.
> 
> 
> My prototypical favorite Scheme example is:
> 
> 
> (define (make-adder x)
>   (define (add y) (+ x y))
>   add)
> 
> 
> (define add2 (make-adder 2))
> (add2 3) => 5
> 
> 
> but Modula-3 doesn't allow for that anyway..
> 
> Still, we make "closures": tuple{-1, function pointer, frame pointer}
> before calling a pointer, we check if it points to -1, and if so
> we call function pointer with frame pointer as the link.
> 
> 
> And then I guess, you might have something like this?
> 
> 
> 
> int F1(int a, int b, int (*F)())
> {
>  int F2()
>  {
>    return a + b;
>  }
> 
>  if (F != NULL)
>   return F();
> 
>  return F1(a - 1, b - 1, &F2);
> 
> }
> 
> 
> F(1, 2, NULL) => 3
> 
> 
> Though that doesn't demonstrate the generality perhaps.
> 
> 
> Taking the address of a nested function seems dubious.
> 
> 
> My "favorite" bit of Schema:
> 
> (define (make-adder x)
>   (define (add y) (+ x y))
>   add)
> 
> 
> (define add2 (make-adder 2))
> 
> (add2 3) => 3
> 
> 
> But we don't allow that sort of thing -- frames are always stack allocated.
> 
> 
> 
> - Jay
> 
> 
> ----------------------------------------
>> To: jay.krell at cornell.edu
>> Date: Thu, 20 May 2010 23:45:22 -0700
>> From: mika at async.async.caltech.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] gcc 4.5.0?
>> 
>> The Dragon Book is
>> 
>> Aho, Sethi, Ullman. Compilers: Principles, Techniques, and Tools.
>> Addison-Wesley, 1988. Chapter 7, esp. Sec. 7.4.
>> 
>> Mika
>> 
>> Jay K writes:
>>> 
>>> Thanks for the reference!
>>> =20
>>> search for "Good Ideas=2C Through the Looking Glass"
>>> finds it right away: "http://www.cs.inf.ethz.ch/~wirth/Articles/GoodIdeas_o=
>>> rigFig.pdf"
>>> (I don't need to provide the link really.)
>>> =20
>>> - Jay
>>> 
>>> ----------------------------------------
>>>> To: jay.krell at cornell.edu
>>>> Date: Thu=2C 20 May 2010 22:37:56 -0700
>>>> From: mika at async.async.caltech.edu
>>>> CC: m3devel at elegosoft.com
>>>> Subject: Re: [M3devel] gcc 4.5.0?
>>>> 
>>>> 
>>>> As I recall it the "Dragon Book" has a whole section on this topic of
>>>> nested functions in Pascal. Including Dijkstra's "display". I'm not
>>>> an expert=2C just remember reading it. Wirth also talks about it in his
>>>> "Good Ideas=2C Through the Looking Glass" (where he says the display
>>>> might be a pessimization).
>>>> 
>>>> Mika
>>>> 
>>>> Jay K writes:
>>>>> 
>>>>> Maybe we can/should transform into one of the forms I show and dispense w=
>>> it=3D
>>>>> h any gcc support?
>>>>> At the cost of losing some "easy" optimization..of nested functions?
>>>>> "easy" as in=3D2C occurs even without -O2=3D2C but might occur with -O2?
>>>>> =3D20
>>>>> =3D20
>>>>> I know what you mean -- NT386 passes the static link in ecx.
>>>>> This is "ok" because in C++ the "this" pointer is passed in ecx.
>>>>> =3D20
>>>>> =3D20
>>>>> Passing it as "just" another parameter might be less efficient=3D2C but o=
>>> k?
>>>>> And really=3D2C it'd only be less efficient on one architecture -- 32bit=
>>> x8=3D
>>>>> 6.
>>>>> And maybe not all x86s -- depending on if some ABIs have a register base=
>>> d=3D
>>>>> calling convention.
>>>>> =3D20
>>>>> =3D20
>>>>> We don't need any actual ABI-compatibility with anything here=3D2C do we?
>>>>> =3D20
>>>>> =3D20
>>>>> - Jay
>>>>> 
>>>>> 
>>>>> ----------------------------------------
>>>>>> From: hosking at cs.purdue.edu
>>>>>> Date: Thu=3D2C 20 May 2010 22:22:46 -0400
>>>>>> To: jay.krell at cornell.edu
>>>>>> CC: m3devel at elegosoft.com
>>>>>> Subject: Re: [M3devel] gcc 4.5.0?
>>>>>> 
>>>>>> That is the standard formulation=3D2C but different architectures have =
>>> diff=3D
>>>>> erent optimised forms (passing static link in a register=3D2C etc.). In a=
>>> ny c=3D
>>>>> ase=3D2C gcc has its own weird way of doing things=3D2C which we mostly u=
>>> se=3D2C =3D
>>>>> but from which we need a little extra support (hence the small amount of =
>>> sp=3D
>>>>> ecial tailoring).
>>>>>> 
>>>>>> On 20 May 2010=3D2C at 21:43=3D2C Jay K wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Wierd. To me the model to follow is almost obvious. It should be obvio=
>>> us=3D
>>>>> to everyone and implemented the way I have in mind. :)
>>>>>>> 
>>>>>>> 
>>>>>>> The static link is an extra parameter.
>>>>>>> And=3D2C as such=3D2C also an extra local variable.
>>>>>>> You only need one.
>>>>>>> Each nesting level would follow another chain.
>>>>>>> Or you can pass each nesting level as an extra parameter.
>>>>>>> Or you can optimize and pass locals/parameters separately.
>>>>>>> But there is an obvious straightforward non-optimized form. ?
>>>>>>> 
>>>>>>> 
>>>>>>> Here=3D2C I worked it through:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Why doesn't it just look something like this:
>>>>>>> 
>>>>>>> 
>>>>>>> nested:
>>>>>>> 
>>>>>>> void F1(int a=3D2C int b)
>>>>>>> {
>>>>>>> int c =3D3D a + b=3D3B
>>>>>>> 
>>>>>>> int f2()
>>>>>>> {
>>>>>>> int d =3D3D a + b=3D3B
>>>>>>> 
>>>>>>> int f3()
>>>>>>> {
>>>>>>> return d + a + c=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> return a + c + f3()=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> return f2()=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> not nested:
>>>>>>> 
>>>>>>> 
>>>>>>> typedef struct {
>>>>>>> /* locals */
>>>>>>> int d=3D3B
>>>>>>> void* x86_frame_pointer=3D3B
>>>>>>> void* x86_return_address=3D3B
>>>>>>> /* parameters */
>>>>>>> f1_frame_t* f1=3D3B
>>>>>>> } f2_frame_t=3D3B
>>>>>>> 
>>>>>>> 
>>>>>>> typedef struct {
>>>>>>> /* locals */
>>>>>>> int c=3D3B
>>>>>>> void* x86_frame_pointer=3D3B
>>>>>>> void* x86_return_address=3D3B
>>>>>>> /* parameters */
>>>>>>> int a=3D3B
>>>>>>> int b=3D3B
>>>>>>> } f1_frame_t=3D3B
>>>>>>> 
>>>>>>> 
>>>>>>> int f3(f2_frame_t* f2)
>>>>>>> {
>>>>>>> return f2->d + f2->f1->a + f2->f1->c=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> int f2(f1_frame_t* f1)
>>>>>>> {
>>>>>>> int d =3D3D f1->a + f1->b=3D3B
>>>>>>> return f1->a + f1->c + f3((f2_frame_t*)&d)=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> int F1(int a=3D2C int b)
>>>>>>> {
>>>>>>> int c =3D3D a + b=3D3B
>>>>>>> return f2((f1_frame_t*)&c)=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> or=3D2C alernatively=3D2C almost the same=3D2C pass each chain as anot=
>>> her para=3D
>>>>> meter:
>>>>>>> 
>>>>>>> 
>>>>>>> typedef struct {
>>>>>>> /* locals */
>>>>>>> int d=3D3B
>>>>>>> void* x86_frame_pointer=3D3B
>>>>>>> void* x86_return_address=3D3B
>>>>>>> /* parameters */
>>>>>>> } f2_frame_t=3D3B
>>>>>>> 
>>>>>>> 
>>>>>>> typedef struct {
>>>>>>> /* locals */
>>>>>>> int c=3D3B
>>>>>>> void* x86_frame_pointer=3D3B
>>>>>>> void* x86_return_address=3D3B
>>>>>>> /* parameters */
>>>>>>> int a=3D3B
>>>>>>> int b=3D3B
>>>>>>> } f1_frame_t=3D3B
>>>>>>> 
>>>>>>> 
>>>>>>> int f3(f2_frame_t* f2=3D2C f1_frame_t* f1)
>>>>>>> {
>>>>>>> return f2->d + f1->a + f1->c=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> int f2(f1_frame_t* f1)
>>>>>>> {
>>>>>>> int d =3D3D f1->a + f1->b=3D3B
>>>>>>> return f1->a + f1->c + f3((f2_frame_t*)&d=3D2C f1)=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> int F1(int a=3D2C int b)
>>>>>>> {
>>>>>>> int c =3D3D a + b=3D3B
>>>>>>> return f2((f1_frame_t*)&c)=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> This should be easily adapted to any function call convention/sequence=
>>> .
>>>>>>> e.g. even if parameters are passed in registers.
>>>>>>> 
>>>>>>> 
>>>>>>> And for that matter=3D2C why don't we just transform it to be like thi=
>>> s?
>>>>>>> With the goal of reducing/eliminating gcc patches?
>>>>>>> 
>>>>>>> 
>>>>>>> Am I missing something?
>>>>>>> 
>>>>>>> 
>>>>>>> This form doesn't work?
>>>>>>> 
>>>>>>> 
>>>>>>> Isn't reasonably simple?
>>>>>>> 
>>>>>>> 
>>>>>>> Granted it might not be optimal. But it depends.
>>>>>>> You know=3D2C you can copy in the local/parameter values=3D2C but then=
>>> you h=3D
>>>>> ave
>>>>>>> to copy them out too. You can do analysis to show they aren't changed=
>>> =3D2C
>>>>>>> in which case no need to copy them out.
>>>>>>> Similarly you can pass the parameters as separate parameters=3D2C by a=
>>> ddre=3D
>>>>> ss
>>>>>>> if they are ever changed.
>>>>>>> 
>>>>>>> 
>>>>>>> A portable transform couldn't depend on adjacency of locals and parame=
>>> te=3D
>>>>> rs.
>>>>>>> It would have to either copy the parameters into the frame_t=3D2C or t=
>>> heir=3D
>>>>> addresses.
>>>>>>> 
>>>>>>> 
>>>>>>> A portable form couldn't get the frame by taking the address of a "ind=
>>> iv=3D
>>>>> idual" local.
>>>>>>> 
>>>>>>> 
>>>>>>> Here is portable form:
>>>>>>> 
>>>>>>> 
>>>>>>> typedef struct {
>>>>>>> int d=3D3B
>>>>>>> f1_frame_t* f1=3D3B
>>>>>>> } f2_frame_t=3D3B
>>>>>>> 
>>>>>>> typedef struct {
>>>>>>> int a=3D3B
>>>>>>> int b=3D3B
>>>>>>> int c=3D3B
>>>>>>> } f1_frame_t=3D3B
>>>>>>> 
>>>>>>> 
>>>>>>> int f3(f2_frame_t* f2)
>>>>>>> {
>>>>>>> return f2->d + f2->f1->a + f2->f1->c=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> int f2(f1_frame_t* f1)
>>>>>>> {
>>>>>>> f2_frame_t f=3D3B
>>>>>>> f.f1 =3D3D f1=3D3B
>>>>>>> f.d =3D3D f1->a + f1->b=3D3B
>>>>>>> return f1->a + f1->c + f3(&f)=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> int F1(int a=3D2C int b)
>>>>>>> {
>>>>>>> f1_frame_t f=3D3B
>>>>>>> f.a =3D3D a=3D3B
>>>>>>> f.b =3D3D b=3D3B
>>>>>>> f.c =3D3D a + b=3D3B
>>>>>>> return f2(&f)=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> or:
>>>>>>> 
>>>>>>> 
>>>>>>> typedef struct {
>>>>>>> int d=3D3B
>>>>>>> } f2_frame_t=3D3B
>>>>>>> 
>>>>>>> 
>>>>>>> typedef struct {
>>>>>>> int a=3D3B
>>>>>>> int b=3D3B
>>>>>>> int c=3D3B
>>>>>>> } f1_frame_t=3D3B
>>>>>>> 
>>>>>>> 
>>>>>>> int f3(f2_frame_t* f2=3D2C f1_frame_t* f1)
>>>>>>> {
>>>>>>> return f2->d + f1->a + f1->c=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> int f2(f1_frame_t* f1)
>>>>>>> {
>>>>>>> f2_frame_t f=3D3B
>>>>>>> f.d =3D3D f1->a + f1->b=3D3B
>>>>>>> return f1->a + f1->c + f3(&f=3D2C f1)=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> int F1(int a=3D2C int b)
>>>>>>> {
>>>>>>> f1_frame_t f=3D3B
>>>>>>> f.a =3D3D a=3D3B
>>>>>>> f.b =3D3D b=3D3B
>>>>>>> f.c =3D3D a + b=3D3B
>>>>>>> return f2(&f)=3D3B
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> - Jay
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ----------------------------------------
>>>>>>>> Date: Thu=3D2C 20 May 2010 12:12:00 -0500
>>>>>>>> From: rodney_bates at lcwb.coop
>>>>>>>> To: m3devel at elegosoft.com
>>>>>>>> Subject: Re: [M3devel] gcc 4.5.0?
>>>>>>>> 
>>>>>>>> gcc even extends C with nested functions=3D2C and the support is ther=
>>> e fo=3D
>>>>> r that. We use it too.
>>>>>>>> 
>>>>>>>> Beware: The runtime model is=3D2C IMO=3D2C bizarre. Definitely counte=
>>> rintui=3D
>>>>> tive and unlike anything
>>>>>>>> you will ever see elsewhere. There are two static links. One is used =
>>> fo=3D
>>>>> r the first step
>>>>>>>> in following a static chain and points to the expected place in the t=
>>> ar=3D
>>>>> get frame. The other
>>>>>>>> is used of subsequent steps in chain-following=3D2C and points to som=
>>> ewhe=3D
>>>>> re inside the target
>>>>>>>> frame that is dependent on the target procedure. It's the beginning o=
>>> f =3D
>>>>> a subrecord where
>>>>>>>> all the nonlocally-referenced variables are collected. The second sta=
>>> ti=3D
>>>>> c link is also
>>>>>>>> located in there. All the nonlocally-referenced variables and paramet=
>>> er=3D
>>>>> s are duplicated
>>>>>>>> in there too.
>>>>>>>> 
>>>>>>>> I'm not sure I remembered these details exactly right. Maybe both lin=
>>> ks=3D
>>>>> point to the
>>>>>>>> subrecord=3D2C but one is located at a traditional procedure-independ=
>>> ent=3D
>>>>> =3D2C fixed location
>>>>>>>> in the frame=3D2C while the second is located in the subrecord. Also=
>>> =3D2C s=3D
>>>>> ometimes the first
>>>>>>>> static link value is kept in a register and never stored.
>>>>>>>> 
>>>>>>>> What this all apparently accomplishes is simplifying an "insanely com=
>>> pl=3D
>>>>> icated" _compile-time_
>>>>>>>> scheme that=3D2C I believe came from interaction between nested proce=
>>> dure=3D
>>>>> s and inlining them.
>>>>>>>> 
>>>>>>>> But it's at the cost of what I would call an insanely complicated _ru=
>>> nt=3D
>>>>> ime_ scheme.
>>>>>>>> 
>>>>>>>> It undermined m3gdb's handling of static links=3D2C and I don't think=
>>> it'=3D
>>>>> s possible to fix
>>>>>>>> with the current design of emitting debug info very early. The locati=
>>> on=3D
>>>>> s and target
>>>>>>>> offsets of these things aren't know until later=3D2C and m3gdb really=
>>> nee=3D
>>>>> ds to know them.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jay K wrote:
>>>>>>>>> What I meant regarding "static chain" was more like "does gcc alread=
>>> y =3D
>>>>> have support that we can use".
>>>>>>>>> Don't any of the other languages need it already?
>>>>>>>>> Ada? Pascal? Maybe the nested C function support?
>>>>>>>>> 
>>>>>>>>> - Jay
>>>>>>>>> 
>>>>>> =3D = 		 	   		  




More information about the M3devel mailing list