[M3devel] many matters big and small esp. wrt C backend

Tony Hosking hosking at cs.purdue.edu
Thu Apr 11 23:12:42 CEST 2013


Hi Jay,

Is there any chance you could distill this stream of consciousness into an organized proposal of alternatives?  I find it difficult to extract your precise proposals and arguments from this.

--Tony


On Apr 12, 2013, at 4:57 AM, Jay K <jay.krell at cornell.edu> wrote:

>  The C/C++ backend has been working for a while now and is improving nicely (with respect to debuggability). 
>  
> 
>  Here are some current problems/dilemmas. 
>  
> 
>   --- getting pointer parameters correctly typed esp. on passing side --- 
>  
>  
>  Up until now, any pointer parameter to a function has been typed as char*.  
>  
> 
>  (I have some preference for char* over void* because 1) it is valid pre-ANSI 2) you can do
>   math on them; however given that char* is usually wrong, void* actually debugs better,
>   showing nothing instead of garbage. gcc does allow math on void* but it
>   is an extension -- http://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html#Pointer-Arith;
>   I should probably use void* with gcc and wherever the extension is supported...autoconf...)
>  Anyway..
>  Every pointer passed is cast to char*. This applies to VAR, READONLY, and records-by-value (more later). 
>  This works but is bad for debugging (again: void* is better than char*, but both are bad) 
>  I've been working on this. 
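A minimal C sketch of the pointer-arithmetic point above. The helper name `add_bytes` is hypothetical and not part of the backend; it only shows why char* is convenient for address math while void* can still be the visible type, since arithmetic directly on void* is a GCC extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: step an untyped address forward by a byte offset.
   Arithmetic on void* is a GCC extension, so the math goes through char*
   and the result converts back to void* implicitly. */
static void *add_bytes(void *base, size_t offset)
{
    assert(base != NULL);
    return (char *)base + offset;
}
```

A caller would write something like `int *p = (int *)add_bytes(array, 2 * sizeof(int));` to reach the third element of an int array.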
>  
> 
>  The interface to the backend is 
>   pop_param(cgtype)  
>   pop_struct(typeid, size)  
>  
> 
>  non-struct readonly and var parameters are just: 
>    pop_param(cgtype.addr)
>  
> 
>  It would be nice if the frontend also passed a typeid here, 
>  and either used the typeid from a "declare_indirect" or "declare_pointer" 
>  or had separate booleans/flags for readonly/var. 
>  
> 
>  How about: 
>   TYPE ParameterMode = {Value (* or Normal or None? *), Var, ReadOnly};   
>   PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode);  
>  
>  
>  might as well merge this with pop_struct, no further change required probably.
> 
>  Or, at worst, another mode: 
>   TYPE ParameterMode = {Value (* or Normal or None? *), Var, ReadOnly, StructByValue};   
>   PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode; bitSize: BitSize);  
>  
>  
>  or less abstract:
>   TYPE ParameterMode = {Value, Pointer, StructByValue}; 
>   PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode; bitSize: BitSize);  
>  
>  
>  or, again, if declare_indirect/declare_pointer is used to twiddle the typeuid, this suffices,
>  I like it:
> 
>   (* bitSize, cgtype and typeid all imply size and agree, are somewhat redundant
>      Backends without type checking can ignore typeid. e.g. NTx86.m3.
>      bitSize is definitely redundant, but helps typeid-ignoring backends
>        that implement struct-by-value themselves e.g. NTx86.m3 easily adapt. 
>      cgtype will be CGType.Addr for READONLY/VAR/ADDRESS/OBJECT/REF/TEXT, bitSize = sizeof(pointer)
>      cgtype will be CGType.Struct for struct-by-value (size in bitSize) 
>      typeid will be declare_indirect/declare_pointer for READONLY/VAR (READONLY OBJECT?) 
>      typeid will NOT be to a declare_indirect/declare_pointer for struct-by-value *) 
>   PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; bitSize: BitSize);  
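A rough C sketch of how a backend might consume the extra information proposed above when emitting a cast for an argument. Everything here is an assumption for illustration: the enum mirrors the "less abstract" ParameterMode variant, `format_param_type` is a made-up name, and `type_name` stands in for whatever C name the backend derives from the typeid.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mirror of the {Value, Pointer, StructByValue} proposal. */
typedef enum { MODE_VALUE, MODE_POINTER, MODE_STRUCT_BY_VALUE } ParameterMode;

/* Write the C type an argument would be cast to into out.
   Pointer-mode parameters (VAR/READONLY) become "T*"; the others stay "T".
   Returns the number of characters snprintf wanted to write. */
static int format_param_type(char *out, size_t out_size,
                             const char *type_name, ParameterMode mode)
{
    assert(out != NULL && type_name != NULL);
    if (mode == MODE_POINTER)
        return snprintf(out, out_size, "%s*", type_name);
    return snprintf(out, out_size, "%s", type_name);
}
```

With a typeid available, the generated call site could then read `f((T1234*)arg)` instead of `f((char*)arg)`.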
>  
> 
>  Ideally all backends would track typeids and it'd suffice to say: 
>   (* typeid will be declare_indirect/declare_pointer for READONLY/VAR (READONLY OBJECT?) 
>      typeid will NOT be to a declare_indirect/declare_pointer for struct-by-value *) 
>   PROCEDURE pop_param(typeid: TypeUID);  
> 
>  
>  but I don't see that happening soon. In reality, CGType could go away entirely. Not soon.
>  (We'd need declare_integer(typeid, size, is_signed, is_word); declare_float(typeid, size),
>  and maybe a few others for some pointer types..REFANY, TEXT, MUTEX, etc.)
>  
> 
>  I haven't looked to see if this information (typeid/size for pop_param)
>  is readily available in the frontend.  I will do that soon. 
>  
> 
>  I have a few potential workarounds: 
>  cast to void*
>   This appears to be working, with limited testing (my new test case, p254)
>   Big drawback here is the code is no longer valid C++, only C.
>   This is ok temporarily if I'm making improvements otherwise, but I really want to output valid C++.
>   The function is still prototyped as taking stronger types, like INTEGER* or T1234* and C++
>    doesn't allow conversion from void* to other pointer types without a cast.
>  
> 
>   Introspect on the function pointer type and cast appropriately, or even not at all. 
>     This should provide the ideal output and is probably viable. I'll look into it later. 
>  
> 
>  Cast the function to (*)(...) for C++ or (*)() for C.
>     This is kind of gross. Hopefully it is not a deoptimization, but it might be.  
>     I already do such casting for indirect function calls, for reasons to do with the static link.  
>     I'm going to try this next.
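For the "cast appropriately" workaround, a small sketch of what the generated call might look like. The type `T1234` and both function names are stand-ins, not actual backend output. The point is only that an explicit cast to the prototyped parameter type is accepted by both C and C++, whereas passing a bare void* is accepted by C alone.

```c
#include <assert.h>

typedef struct { int x; int y; } T1234;   /* stand-in for a generated record type */

/* Callee prototyped with the strong type. */
static int sum_fields(const T1234 *r)
{
    assert(r != 0);
    return r->x + r->y;
}

/* The caller holds only an untyped address. The explicit cast names the
   prototyped parameter type, so the call compiles as C and as C++. */
static int call_with_cast(void *untyped)
{
    return sum_fields((const T1234 *)untyped);
}
```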
>  
>  
>   --- to depend on C passing/returning structs/records by value, or to do the copying ourselves?  --- 
> 
>  Now a minor dilemma, not a problem.  
> 
>  
> Up until recently, I didn't have much type information flowing through the C backend.
> Or specifically I was only using CGType and not TypeUID.
> I'm now at the point where TypeUIDs and "almost everything" about them is kept track of.
> (just some loose ends maybe around opaque types and object runtime type information.)
>  
> 
>  I had known record sizes, but not fields. Maybe this was a dilemma already before. 
>  
> 
>  Anyway, the point is, up until now, record passing and returning by value I have handled 
>  internally by passing around pointers and making copies as needed (at function start). 
>  
> 
>  I forget exactly how returning works, I'll deal with that later. 
>  
> 
>  Passing works as follows: 
>    caller passes pointer to record  
>    callee has a local variable of that type  
>    callee early on copies the pointed-to record into the local variable, and references that thereafter  
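The passing scheme above can be sketched in C roughly as follows. `Rec` and `bump_and_sum` are hypothetical names. The generated code would differ in detail, but the shape is the same: a pointer comes in, the callee copies it once on entry, and only the copy is touched afterwards.

```c
#include <assert.h>

typedef struct { int a; int b; } Rec;   /* hypothetical record type */

/* By-value semantics implemented over a pointer parameter. */
static int bump_and_sum(const Rec *arg)
{
    Rec local;
    assert(arg != 0);
    local = *arg;        /* the early copy into the callee's local variable */
    local.a += 100;      /* later changes stay private to the callee */
    return local.a + local.b;
}
```

The caller's record is unchanged after the call, which is what passing by value requires.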
>  
> 
>  This works and has almost no downside. 
>  It is likely how the C compiler implements things anyway. 
>  Except maybe for "small" records/structs. Some calling conventions 
>  do allow for passing structs/records by value in registers. 
> 
>  
>  Passing structs/records by value is relatively rare, so we probably don't care much.
>  
> 
>  Nevertheless, my question is: should I go ahead and use the underlying C/C++ feature of
>   passing structs by value?
>  
> 
>  There are multiple choices: 
>    - no, leave it alone 
>    - yes, change it unconditionally 
>    - leave it as a const or var in M3C.m3 
>    - make it #ifdefed in the output .c 
>  
> 
>  I think for returning, we have similar choices, but the frontend is willing
>  to do the transform and currently does -- a matter of a boolean in Target.i3.
>  
> 
>  This second question has equal quality debugging either way and needs no M3CG/frontend changes
>   either way. (though, you know, frontend is willing to make more transforms for
>   record return than record pass, I believe; it probably should be willing to do
>   more of the work.) 
>  
>  
>  Very old compilers don't support passing structs by value?  
>  Or don't do it thread-safely, passing the value through a global? ORCA/C for Apple IIGS I think.. 
>  
> 
>   --- #line directives or not? --- 
>  
>  
>  Third question that has been bugging me.
>  The C backend can output #line directives,
>  so that you step through the Modula-3 source, which is what people expect.
>  This was working, and probably still does.
>    I turned it off subject to a constant in the backend. 
>  Currently I output "//line" instead of "#line". (subject to the constant, and yes, I know // isn't portable C) 
> 
>  This is great during backend development/debugging, because the C compiler gives me C line numbers.
>  If the backend worked perfectly, this would be pointless.
>  I debug stuff *a lot* (beyond Modula-3) and I am sensitive to anything that inhibits debugging in any way.
>  There are bugs everywhere (I have seen them!) and everything needs to be debugged, both with logs and live. 
>  
>  
>  What to do to cater to both/everyone? 
>  I wish I could have multiple #line directives: 
>   #line 123 foo.m3.c 456 foo.m3 
>  
> 
>  but that doesn't exist. 
>  I could encode information in the file name:
>   #line 123 "foo.m3.c/456 foo.m3" 
> 
>  
>  but that is imperfect; error messages will be good, but debugging won't work
>  
> 
>  I could leave it as an #ifdef in the code. 
>  I do not believe the following works: 
> 
>   #ifdef CLINE  
>   #define LINE(cline, cfile, m3line, m3file) cline cfile 
>   #else 
>   #define LINE(cline, cfile, m3line, m3file) m3line m3file 
>   #endif 
>   #line LINE(123, "foo.c", 456, "foo.m3") 
>  
> 
>  but I'll try it. 
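As far as the C standard goes (C99 6.10.4), tokens after `#line` that do not already match the plain form are macro-expanded first, and GCC documents the same behavior, so a macro of this kind should be usable. A minimal sketch, assuming `CLINE` means "keep the C line numbers"; the two helper functions are hypothetical and exist only to observe the result.

```c
/* Hypothetical switch: define CLINE to keep the C compiler's own numbering. */
#ifdef CLINE
#define LINE(cline, cfile, m3line, m3file) cline cfile
#else
#define LINE(cline, cfile, m3line, m3file) m3line m3file
#endif

/* The directive gives the number of the line that immediately follows it. */
#line LINE(123, "foo.m3.c", 456, "foo.m3")
static int reported_line(void) { return __LINE__; }
static const char *reported_file(void) { return __FILE__; }
```

With `CLINE` undefined, the compiler and debugger see the first function at line 456 of foo.m3. Compiling with `-DCLINE` switches the same directive to line 123 of foo.m3.c.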
>  
> 
>  I think the "best" ends up being to sprinkle in a steady stream of #ifdefs: 
>   #ifndef CLINE  
>   #line 456 "foo.m3" (* might need to adjust by 1 to account for #endif *)   
>   #endif  
> 
>   This is bloated, but might be best.  
>  
> 
>   if "#define LINE" works, great, but I doubt it will. 
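A sketch of the guarded form, showing the off-by-one adjustment. The `#line` directive sets the number of the next physical line, and that next line is the `#endif`, so the directive has to name one less than the Modula-3 line wanted. `m3_line_after_guard` is a hypothetical probe function.

```c
/* CLINE is the switch from the message: leave it undefined to get
   Modula-3 line numbers, define it to keep the C compiler's own. */
#ifndef CLINE
#line 455 "foo.m3"
#endif
int m3_line_after_guard(void) { return __LINE__; }
```

Here the `#endif` is counted as line 455, so the function body reports line 456.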
>  
> 
>  --- typeindex besides typeid? --- 
>  
> 
> I'm now doing a lot of lookups of typeids.
> It'd be super nice if the frontend also maintained "small" incrementing
> typeIndices that I could use to index into an array.
>  
> 
>  set_type_count(typeCount:CARDINAL); (* maybe *) 
>  declare_object/pointer/indirect/record/etc.(typeId: TypeUID; typeIndex: CARDINAL; ...); 
>  
> 
>  and thereafter, use typeIndex instead of typeId, an index into an array. 
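A minimal sketch of what the backend side of this could look like, assuming the frontend announces the count up front and then supplies a dense index with every declaration. All names (`TypeInfo`, `TypeTable`, the three functions) are invented for illustration. The structural typeid is kept in the entry, but lookup becomes a plain array index.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { unsigned type_uid; unsigned bit_size; } TypeInfo;
typedef struct { TypeInfo *items; size_t count; } TypeTable;

/* Corresponds to a hypothetical set_type_count: allocate one slot per type.
   Returns 0 on success, -1 if allocation fails. */
static int type_table_init(TypeTable *t, size_t type_count)
{
    t->items = (TypeInfo *)calloc(type_count, sizeof(TypeInfo));
    t->count = t->items ? type_count : 0;
    return t->items ? 0 : -1;
}

/* Corresponds to a declare_* call carrying both typeId and typeIndex. */
static void type_table_declare(TypeTable *t, size_t type_index,
                               unsigned type_uid, unsigned bit_size)
{
    assert(type_index < t->count);
    t->items[type_index].type_uid = type_uid;
    t->items[type_index].bit_size = bit_size;
}

/* Later references use the index directly instead of searching by typeid. */
static const TypeInfo *type_table_get(const TypeTable *t, size_t type_index)
{
    assert(type_index < t->count);
    return &t->items[type_index];
}
```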
>  
> 
>  I've been tempted to ask for just: 
>  declare_object/pointer/indirect/record/etc.(typeIndex: CARDINAL; ...); 
> 
>  
>  but I realize that preserving the structural hash id is likely too useful/important,
>  either now or hypothetically.
> 
>  
> There is then the question of whether begin_unit/end_unit reset typeIndex.
> This somewhat depends on how the frontend works.
> 
>  
> At some point I'd like to try outputting one C file across multiple units,
> and add M3CG.begin_library, M3CG.end_library, M3CG.begin_program, M3CG.end_program,
> M3CG.import_library(static | dynamic | unknown),
> so the backend knows which units definitely link together,
> and guide ELF visibility/__declspec(dllimport,dllexport).
>  
> 
>  Given that, typeIndices would not reset upon end_unit. 
>  There are challenges here, e.g. separate/incremental compilation.
>  I would like to amortize C compiler startup; as well, all the type declarations
>  would be shared across units, making the overall C source smaller.
>  Computer memory is vastly larger today than when CM3 was written and compilation
>  strategies have shifted significantly toward "whole program compilation".
>  We could do similar in the C backend..or leave it to the C compiler to try.
>  
> 
>  - Jay
> 
> 