[M3devel] many matters big and small esp. wrt C backend

Jay K jay.krell at cornell.edu
Fri Apr 12 00:19:23 CEST 2013


It is a mix of proposals and what-do-people-prefer-among-working-options.
Let me try again. Just some of it.
 1) Stronger typing on pop_param.
 Given
INTERFACE I;
PROCEDURE A(VAR a:INTEGER);
PROCEDURE B() = VAR b: INTEGER; BEGIN A(b); END B; 
I want
  void I__A(INTEGER* a);  
  void I__B() { INTEGER b; I__A(&b) or I__A((INTEGER*)&b);  }  
Unnecessary casts are ok.  This is not directly supported in M3CG:
  current:
    M3CG.pop_param(type := M3CG.Addr); (* Addr of what? *)
  but it might be indirectly supported, i.e. if I look at the function type.
  Currently, on my machine, I cast to void*, but that isn't valid C++, only C.
  I might be able to cast the function itself to "untyped" and get away with it, but
  that is kind of ugly too.
  Proposal, something like:
    M3CG.pop_param(CGType, TypeUID);
  but, furthermore, pop_struct doesn't need to be separate, so just:
    M3CG.pop_param(CGType, TypeUID, BitSize);
  and remove pop_struct.
  Much longer term, merely:
    M3CG.pop_param(TypeUID);
  suffices. CGType is generally redundant with TypeUIDs, however
  existing backends ignore TypeUIDs and get by with CGType.

 2) Preference for how to handle passing structs by value in C backend? 
There are two obvious choices.
I'm doing it "manually" because, perhaps, I didn't have good type information.
I have good type information now, so I can use the C/C++ feature of passing structs
by value, instead of passing a pointer and copying into a local. 
Works either way.
No M3CG interface change.

 3) Small dense TypeIndexes mostly-but-not-entirely in place of TypeUIDs.
 TypeUIDs imply a lot of "lookups" in the backend. Slow seeming.
 It'd be nice if we had a "linear" TypeIndex as well, that could be indices into a "full" array.  
Proposal: TypeIndex = CARDINAL; (* index into an array *) (* a separate typename here isn't all that valuable *) 
 M3CG.declare_typeid or declare_typeindex or declare_type(TypeUID, TypeIndex);
 and possibly
  M3CG.declare_type_count(CARDINAL); (* maximum value of TypeIndex + 1; backends can allocate
  arrays of this size and then index them by TypeIndex later *)
 TypeIndexes should take on the values roughly [0..N] where N is the number
of types in the "program" (or unit..)
and then replace TypeUID everywhere else with TypeIndex. 
Depending on how the frontend flows, it might not be able to compute TypeCount early enough.
That is, I don't know if m3cg calls happen "during compilation" or only "at the end". 
As well, this likely implies the same perf/lookups in the frontend.
Moving rather than eliminating cost.
However, it'd save it from multiple backends, and the frontend might already be paying this cost,
I haven't looked yet. 
It is ok and works today, but I'd really rather have "small" dense integers that can index into
an array than "random" integers that I'm forced to use something like a hash table or binary
search a sorted array for. 
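
To illustrate the difference in lookup cost, here is a minimal C sketch. It is not code from the actual backend: the `Type` record, the sample typeid values, and both lookup helpers are hypothetical. Only the idea comes from the text above: binary search over sparse TypeUIDs versus direct indexing by a dense TypeIndex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical backend-side type record; field names are assumptions. */
typedef struct {
    int typeuid;        /* sparse structural hash handed over by the frontend */
    const char* name;   /* whatever else the backend tracks per type */
} Type;

/* Comparison callback for bsearch: key is an int, element is a Type. */
static int compare_typeuid(const void* key, const void* elem)
{
    int k = *(const int*)key;
    int e = ((const Type*)elem)->typeuid;
    return (k > e) - (k < e);
}

/* Today: types kept sorted by typeuid and found by binary search. */
static const Type* lookup_by_typeuid(const Type* sorted, size_t count, int typeuid)
{
    assert(sorted != NULL);
    return (const Type*)bsearch(&typeuid, sorted, count, sizeof(Type), compare_typeuid);
}

/* Proposed: a dense TypeIndex is just an array subscript. */
static const Type* lookup_by_typeindex(const Type* types, size_t count, size_t typeindex)
{
    assert(types != NULL);
    return typeindex < count ? &types[typeindex] : NULL;
}
```

With a declare_type_count-style call up front, the array behind `lookup_by_typeindex` could be allocated once at its final size; without it, the backend would have to grow the array as indices arrive.
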
Thanks,
 - Jay



 CC: m3devel at elegosoft.com
From: hosking at cs.purdue.edu
Subject: Re: [M3devel] many matters big and small esp. wrt C backend
Date: Fri, 12 Apr 2013 07:12:42 +1000
To: jay.krell at cornell.edu

Hi Jay,
Is there any chance you could distill this stream of consciousness into an organized proposal of alternatives?  I find it difficult to extract your precise proposals and arguments from this.
--Tony

Sent from my iPad
On Apr 12, 2013, at 4:57 AM, Jay K <jay.krell at cornell.edu> wrote:




 The C/C++ backend works, for a while now, and is improving nicely (wrt debuggability). 
 

 Here are some current problems/dilemmas. 
 

  --- getting pointer parameters correctly typed esp. on passing side --- 
 
 
 Up until now, any pointer parameter to a function has been typed as char*.  
 

 (I have some preference for char* over void* because 1) it is valid pre-ANSI 2) you can do
  math on them; however given that char* is usually wrong, void* actually debugs better,
  showing nothing instead of garbage. gcc does allow math on void* but it
  is an extension -- http://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html#Pointer-Arith;
  I should probably use void* with gcc and wherever the extension is supported...autoconf...)
 Anyway..
 Every pointer passed is cast to char*. This applies to VAR, READONLY, and records-by-value (more later). 
 This works but is bad for debugging (again: void* is better than char*, but both are bad) 
 I've been working on this. 
 

 The interface to the backend is 
  pop_param(cgtype)  
  pop_struct(typeid, size)  
 

 non-struct readonly and var parameters are just: 
   pop_param(cgtype.addr)
 

 It would be nice if the frontend also passed a typeid here, 
 and either using the typeid from a "declare_indirect" or "declare_pointer" 
 or had separate booleans/flags for readonly/var. 
 

 How about: 
  TYPE ParameterMode = {Value (* or Normal or None? *), Var, ReadOnly};   
  PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode);  
 
 
 might as well merge this with pop_struct, no further change required probably.

 Or, at worst, another mode: 
  TYPE ParameterMode = {Value (* or Normal or None? *), Var, ReadOnly, StructByValue};   
  PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode; bitSize: BitSize);  
 
 
 or less abstract:
  TYPE ParameterMode = {Value, Pointer, StructByValue}; 
  PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode; bitSize: BitSize);  
 
 
 or, again, if declare_indirect/declare_pointer is used to twiddle the typeuid, this suffices,
 I like it:

  (* bitSize, cgtype and typeid all imply size and agree, are somewhat redundant
     Backends without type checking can ignore typeid. e.g. NTx86.m3.
     bitSize is definitely redundant, but helps typeid-ignoring backends
       that implement struct-by-value themselves e.g. NTx86.m3 easily adapt. 
     cgtype will be CGType.Addr for READONLY/VAR/ADDRESS/OBJECT/REF/TEXT, bitSize = sizeof(pointer)
     cgtype will be CGType.Struct for struct-by-value (size in bitSize) 
     typeid will be declare_indirect/declare_pointer for READONLY/VAR (READONLY OBJECT?) 
     typeid will NOT be to a declare_indirect/declare_pointer for struct-by-value *) 
  PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; bitSize: BitSize);  
 

 Ideally all backends would track typeids and it'd suffice to say: 
  (* typeid will be declare_indirect/declare_pointer for READONLY/VAR (READONLY OBJECT?) 
     typeid will NOT be to a declare_indirect/declare_pointer for struct-by-value *) 
  PROCEDURE pop_param(typeid: TypeUID);  

 
 but I don't see that happening soon. In reality, CGType could go away entirely. Not soon.
 (We'd need declare_integer(typeid, size, is_signed, is_word); declare_float(typeid, size),
 and maybe a few others for some pointer types..REFANY, TEXT, MUTEX, etc.)
 

 I haven't looked to see if this information (typeid/size for pop_param)
 is readily available in the frontend.  I will do that soon. 
 

 I have a few potential workarounds: 
 cast to void*
  This appears to be working, with limited testing (my new test case, p254)
  Big drawback here is the code is no longer valid C++, only C.
  This is ok temporarily if I'm making improvements otherwise, but I really want to output valid C++.
  The function is still prototyped as taking stronger types, like INTEGER* or T1234* and C++
   doesn't allow conversion from void* to other pointer types without a cast.
 

  Introspect on the function pointer type and cast appropriately, or even not at all. 
    This should provide the ideal output and is probably viable. I'll look into it later. 
 

 Cast the function to (*)(...) for C++ or (*)() for C.
    This is kind of gross. Hopefully it is not a deoptimization, but it might be.  
    I already do such casting for indirect function calls, for reasons to do with the static link.  
    I'm going to try this next.
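
 The first workaround and its C++ problem can be sketched as follows. This is a hand-written illustration, not backend output: the `INTEGER` typedef and the bodies of `I__A` and `I__B` are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef long INTEGER;   /* assumption: stand-in for the backend's INTEGER typedef */

/* Callee prototyped with the stronger type, as wanted for debugging. */
static void I__A(INTEGER* a)
{
    assert(a != NULL);
    *a = 42;
}

static INTEGER I__B(void)
{
    INTEGER b = 0;
    void* untyped = &b;        /* all that pop_param(Addr) tells the backend */

    /* I__A(untyped);             accepted by C, rejected by C++ */
    I__A((INTEGER*)untyped);   /* explicit cast: valid in both C and C++ */
    return b;
}
```

 Emitting the explicit cast requires the backend to know the pointee type at the call site, which is exactly what a typeid on pop_param (or introspection on the function type) would supply.
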
 
 
  --- to depend on C passing/returning structs/records by value, or to do the copying ourselves?  --- 

 Now a minor dilemma, not a problem.  

 
Up until recently, I didn't have much type information flowing through the C backend.
Or specifically I was only using CGType and not TypeUID.
I'm now at the point where TypeUIDs and "almost everything" about them is kept track of.
(just some loose ends maybe around opaque types and object runtime type information.)
 

 I had known record sizes, but not fields. Maybe this was a dilemma already before. 
 

 Anyway, the point is, up until now, record passing and returning by value I have handled 
 internally by passing around pointers and making copies as needed (at function start). 
 

 I forget exactly how returning works, I'll deal with that later. 
 

 Passing works as follows: 
   caller passes pointer to record  
   callee has a local variable of that type  
   callee early on copies the pointed-to record into the local variable, and references that thereafter  
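
 The passing scheme above, next to the by-value alternative, can be sketched in C roughly like this. The `Point` record and both function names are hypothetical; real backend output would use generated names and types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { int x; int y; } Point;   /* hypothetical record type */

/* Current scheme: the caller passes a pointer, the callee copies the
   record into a local up front and only touches the local afterwards. */
static int bump_and_sum_manual(const Point* p_in)
{
    Point p;
    assert(p_in != NULL);
    memcpy(&p, p_in, sizeof p);
    p.x += 1;               /* modifies the private copy only */
    return p.x + p.y;
}

/* Alternative: declare the parameter by value and let the C compiler
   make the copy. */
static int bump_and_sum_by_value(Point p)
{
    p.x += 1;               /* p is already the callee's own copy */
    return p.x + p.y;
}
```

 Both versions leave the caller's record untouched, so the choice should not be observable from Modula-3 code.
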
 

 This works and has almost no downside. 
 It is likely how the C compiler implements things anyway. 
 Except maybe for "small" records/structs. Some calling conventions 
 do allow for passing structs/records by value in registers. 

 
 Passing structs/records by value is relatively rare, so we probably don't care much.
 

 Nevertheless, my question is, if I should go ahead and use the underlying C/C++ feature of
  passing structs by value?
 

 There are multiple choices: 
   - no, leave it alone 
   - yes, change it unconditionally 
   - leave it as a const or var in M3C.m3 
   - make it #ifdefed in the output .c 
 

 I think for returning, we have similar choices, but the frontend is willing
 to do the transform and currently does -- a matter of a boolean in Target.i3.
 

 This second question has equal quality debugging either way and needs no M3CG/frontend changes
  either way. (though, you know, frontend is willing to make more transforms for
  record return than record pass, I believe; it probably should be willing to do
  more of the work.) 
 
 
 Very old compilers don't support passing structs by value?  
 Or don't do it thread-safely, passing the value through a global? ORCA/C for Apple IIGS I think.. 
 

  --- #line directives or not? --- 
 
 
 Third question that has been bugging me.
 The C backend can output #line directives.
 So you step through the Modula-3 source. What people expect.
 This was working, and probably still does.
   I turned it off subject to a constant in the backend. 
 Currently I output "//line" instead of "#line". (subject to the constant, and yes, I know // isn't portable C) 

 This is great during backend development/debugging: the C compiler gives me C line numbers.
 If the backend worked perfectly, this would be pointless.
 I debug stuff *a lot* (beyond Modula-3) and I am sensitive to anything that inhibits debugging in any way.
 There are bugs everywhere (I have seen them!) and everything needs to be debugged, both with logs and live. 
 
 
 What to do to cater to both/everyone? 
 I wish I could have multiple #line directives: 
  #line 123 foo.m3.c 456 foo.m3 
 

 but that doesn't exist. 
 I could encode information in the file name:
  #line 123 "foo.m3.c/456 foo.m3" 

 
 but that is imperfect; error messages will be good, but debugging won't work
 

 I could leave it as an #ifdef in the code. 
 I do not believe the following works: 

  #ifndef CLINE  
  #define LINE(cline, cfile, m3line, m3file) cline cfile 
  #else 
  #define LINE(cline, cfile, m3line, m3file) m3line m3file 
  #endif 
  #line LINE(123, "foo.c", 456, "foo.m3") 
 

 but I'll try it. 
 

 I think the "best" ends up being to sprinkle in a steady stream of #ifdefs: 
  #ifndef CLINE  
  #line 456 "foo.m3" (* might need to adjust by 1 to account for #endif *)   
  #endif  

  This is bloated, but might be best.  
 

  if "#define LINE" works, great, but I doubt it will. 
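
 For reference, C99 6.10.4 says that tokens after #line which do not already match the plain forms are macro-expanded first, and the GCC preprocessor manual describes the same behavior, so the LINE macro may well work. A minimal sketch to try follows. The polarity is an assumption chosen for the example (defining CLINE keeps C line numbers), and the file names, line numbers, and helper functions are placeholders.

```c
#include <assert.h>
#include <string.h>

/* Assumed polarity: define CLINE to keep C line numbers, otherwise
   report Modula-3 positions. */
#ifdef CLINE
#define LINE(cline, cfile, m3line, m3file) cline cfile
#else
#define LINE(cline, cfile, m3line, m3file) m3line m3file
#endif

#line LINE(123, "foo.m3.c", 456, "foo.m3")
static int reported_line(void) { return __LINE__; }
static const char* reported_file(void) { return __FILE__; }

/* Self-check of the mapping for the default (CLINE not defined) case. */
static void check_line_mapping(void)
{
#ifndef CLINE
    assert(reported_line() == 456);
    assert(strcmp(reported_file(), "foo.m3") == 0);
#endif
}
```

 If some compiler turns out to reject the macro form, wrapping each #line in its own #ifndef remains the fallback.
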
 

 --- typeindex besides typeid? --- 
 

I'm now doing a lot of lookups of typeids.
It'd be super nice if the frontend also maintained "small" incrementing
typeIndices that I could use to index into an array.
 

 set_type_count(typeCount:CARDINAL); (* maybe *) 
 declare_object/pointer/indirect/record/etc.(typeId: TypeUID; typeIndex: CARDINAL; ...); 
 

 and thereafter, use typeIndex instead of typeId, an index into an array. 
 

 I've been tempted to ask for just: 
 declare_object/pointer/indirect/record/etc.(typeIndex: CARDINAL; ...); 

 
 but I realize that preserving the structural hash id is likely too useful/important,
 either now or hypothetically.

 
There is then the question as to if begin_unit/end_unit reset typeIndex.
This somewhat depends on how the frontend works.

 
At some point I'd like to try outputting one C file across multiple units,
and add M3CG.begin_library, M3CG.end_library, M3CG.begin_program, M3CG.end_program,
M3CG.import_library(static | dynamic | unknown),
so the backend knows which units definitely link together,
and guide ELF visibility/__declspec(dllimport,dllexport).
 

 Given that, typeIndices would not reset upon end_unit. 
 There are challenges here, e.g. separate/incremental compilation.
 I would like to amortize C compiler startup; as well, all the type declarations
 would be shared across units, so the overall C source would be smaller.
 Computer memory is vastly larger today than when CM3 was written and compilation
 strategies have shifted significantly toward "whole program compilation".
 We could do similar in the C backend..or leave it to the C compiler to try.
 

 - Jay


 		 	   		  
 		 	   		  