[M3devel] many matters big and small esp. wrt C backend
Jay K
jay.krell at cornell.edu
Thu Apr 11 20:57:53 CEST 2013
The C/C++ backend has been working for a while now, and is improving nicely (wrt debuggability).
Here are some current problems/dilemmas.
--- getting pointer parameters correctly typed esp. on passing side ---
Up until now, any pointer parameter to a function has been typed as char*.
(I have some preference for char* over void* because 1) it is valid pre-ANSI and 2) you can do
math on it; however, given that char* is usually wrong, void* actually debugs better,
showing nothing instead of garbage. gcc does allow math on void* but it
is an extension -- http://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html#Pointer-Arith;
I should probably use void* with gcc and wherever the extension is supported...autoconf...) Anyway..
Every pointer passed is cast to char*. This applies to VAR, READONLY, and records-by-value (more later).
This works but is bad for debugging (again: void* is better than char*, but both are bad)
I've been working on this.
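To make the char* versus void* trade-off concrete, here is a minimal sketch of doing the byte arithmetic portably on char* while still exposing void* to callers. The record Rec and the helpers byte_offset and load_b are hypothetical names for illustration, not anything the backend emits.

```c
#include <stddef.h>

/* Hypothetical record standing in for a Modula-3 RECORD. */
typedef struct { int a; double b; } Rec;

/* Portable ISO C: do the byte arithmetic on char*, hand back void*.
   Arithmetic directly on void* would rely on the gcc extension. */
void *byte_offset(void *base, size_t offset)
{
    return (char *)base + offset;
}

/* Fetch field b through an untyped address, the way a backend that
   only knows "address plus offset" would. */
double load_b(void *rec)
{
    return *(double *)byte_offset(rec, offsetof(Rec, b));
}
```

With this shape, only the helper needs to know about char*; everything else can carry void*, which shows nothing misleading in a debugger.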
The interface to the backend is
pop_param(cgtype)
pop_struct(typeid, size)
non-struct readonly and var parameters are just:
pop_param(cgtype.addr)
It would be nice if the frontend also passed a typeid here,
and either used the typeid from a "declare_indirect" or "declare_pointer",
or had separate booleans/flags for readonly/var.
How about:
TYPE ParameterMode = {Value (* or Normal or None? *), Var, ReadOnly};
PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode);
Might as well merge this with pop_struct; probably no further change is required.
Or, at worst, another mode:
TYPE ParameterMode = {Value (* or Normal or None? *), Var, ReadOnly, StructByValue};
PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode; bitSize: BitSize);
or less abstract:
TYPE ParameterMode = {Value, Pointer, StructByValue};
PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode; bitSize: BitSize);
or, again, if declare_indirect/declare_pointer is used to twiddle the typeuid, this suffices,
and I like it:
(* bitSize, cgtype and typeid all imply size and agree, are somewhat redundant
Backends without type checking can ignore typeid. e.g. NTx86.m3.
bitSize is definitely redundant, but helps typeid-ignoring backends
that implement struct-by-value themselves e.g. NTx86.m3 easily adapt.
cgtype will be CGType.Addr for READONLY/VAR/ADDRESS/OBJECT/REF/TEXT, bitSize = sizeof(pointer)
cgtype will be CGType.Struct for struct-by-value (size in bitSize)
typeid will be declare_indirect/declare_pointer for READONLY/VAR (READONLY OBJECT?)
typeid will NOT be to a declare_indirect/declare_pointer for struct-by-value *)
PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; bitSize: BitSize);
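The redundancy described in that comment (cgtype, typeid and bitSize all implying a size) can be sketched as a consistency check. This is only an illustration in C under stated assumptions: the CGType subset, the Param struct and param_is_consistent are hypothetical, and the rule that a struct size is a whole number of bytes is assumed rather than taken from M3CG.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the backend's CGType. */
typedef enum { CG_ADDR, CG_STRUCT, CG_INT32, CG_INT64 } CGType;
typedef unsigned TypeUID;

/* What a three-argument pop_param would receive. */
typedef struct {
    CGType  cgtype;
    TypeUID typeuid;
    size_t  bitSize;
} Param;

/* Check that bitSize agrees with what cgtype already implies. */
bool param_is_consistent(const Param *p)
{
    switch (p->cgtype) {
    case CG_ADDR:   /* READONLY/VAR/ADDRESS/REF: always pointer sized */
        return p->bitSize == sizeof(void *) * CHAR_BIT;
    case CG_STRUCT: /* struct-by-value: assumed to be whole bytes */
        return p->bitSize > 0 && p->bitSize % CHAR_BIT == 0;
    case CG_INT32:
        return p->bitSize == 32;
    case CG_INT64:
        return p->bitSize == 64;
    }
    return false;
}
```

A backend that ignores typeids could run such a check on bitSize alone, while a type-tracking backend could additionally compare against the size recorded for the typeid.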
Ideally all backends would track typeids and it'd suffice to say:
(* typeid will be declare_indirect/declare_pointer for READONLY/VAR (READONLY OBJECT?)
typeid will NOT be to a declare_indirect/declare_pointer for struct-by-value *)
PROCEDURE pop_param(typeid: TypeUID);
but I don't see that happening soon. In reality, CGType could go away entirely. Not soon.
(We'd need declare_integer(typeid, size, is_signed, is_word); declare_float(typeid, size),
and maybe a few others for some pointer types..REFANY, TEXT, MUTEX, etc.)
I haven't looked to see if this information (typeid/size for pop_param)
is readily available in the frontend. I will do that soon.
I have a few potential workarounds:
- Cast to void*.
  This appears to be working, with limited testing (my new test case, p254).
  The big drawback here is that the code is no longer valid C++, only C.
  This is ok temporarily if I'm making improvements otherwise, but I really want to output valid C++.
  The function is still prototyped as taking stronger types, like INTEGER* or T1234*, and C++
  doesn't allow conversion from void* to other pointer types without a cast.
- Introspect on the function pointer type and cast appropriately, or even not at all.
  This should provide the ideal output and is probably viable. I'll look into it later.
- Cast the function to (*)(...) for C++ or (*)() for C.
  This is kind of gross. Hopefully it is not a deoptimization, but it might be.
  I already do such casting for indirect function calls, for reasons to do with the static link.
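The first two workarounds can be sketched side by side. This is a minimal illustration, not backend output: INTEGER is assumed to be long here, T1234 is a made-up record, and the function names are hypothetical. The function-pointer cast is deliberately left out.

```c
typedef long INTEGER;                    /* stand-in for Modula-3 INTEGER */
typedef struct { INTEGER a, b; } T1234;  /* hypothetical record type */

INTEGER sum_record(const T1234 *r) { return r->a + r->b; }
void increment(INTEGER *x) { *x += 1; }

/* Workaround 1: pass the address as void*.
   C converts void* to INTEGER* implicitly; C++ rejects this call. */
INTEGER call_via_void(char *untyped)
{
    increment((void *)untyped);
    return *(INTEGER *)untyped;
}

/* Workaround 2: cast to the declared parameter type.
   Accepted by both C and C++, but needs the parameter's type at the call. */
INTEGER call_via_typed_cast(char *untyped)
{
    return sum_record((const T1234 *)untyped);
}
```

The second form is what "introspect and cast appropriately" would generate once the parameter types of the callee are known at the call site.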
I'm going to try the function cast next.
--- to depend on C passing/returning structs/records by value, or to do the copying ourselves? ---
Now a minor dilemma, not a problem.
Up until recently, I didn't have much type information flowing through the C backend.
Or specifically I was only using CGType and not TypeUID.
I'm now at the point where TypeUIDs and "almost everything" about them is kept track of.
(just some loose ends maybe around opaque types and object runtime type information.)
I had known record sizes, but not fields. Maybe this was a dilemma already before.
Anyway, the point is, up until now, record passing and returning by value I have handled
internally by passing around pointers and making copies as needed (at function start).
I forget exactly how returning works, I'll deal with that later.
Passing works as follows:
caller passes pointer to record
callee has a local variable of that type
callee early on copies the pointed-to record into the local variable, and references that thereafter
This works and has almost no downside.
It is likely how the C compiler implements things anyway.
Except maybe for "small" records/structs. Some calling conventions
do allow for passing structs/records by value in registers.
Passing structs/records by value is relatively rare, so we probably don't care much.
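The two conventions look like this in C. It is a minimal sketch with a hypothetical Point record and function names; both versions leave the caller's record untouched, which is the property that matters for Modula-3 value parameters.

```c
typedef struct { int x, y; } Point;  /* hypothetical record */

/* Current scheme: caller passes a pointer, callee copies at entry
   and uses only the local copy afterwards. */
int scaled_sum_copy(const Point *p_in)
{
    Point p = *p_in;  /* copy made at function start */
    p.x *= 2;         /* modifies the copy, not the caller's record */
    return p.x + p.y;
}

/* Alternative: let the C compiler pass the struct by value. */
int scaled_sum_value(Point p)
{
    p.x *= 2;
    return p.x + p.y;
}
```

The difference is only in who makes the copy, and in whether a small struct may travel in registers under the platform's calling convention.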
Nevertheless, my question is whether I should go ahead and use the underlying C/C++ feature of
passing structs by value.
There are multiple choices:
- no, leave it alone
- yes, change it unconditionally
- leave it as a const or var in M3C.m3
- make it #ifdefed in the output .c
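The last choice, an #ifdef in the output .c, might look roughly like the following. Everything here is an assumption for illustration: the macro M3_STRUCT_BY_VALUE and the RECORD_PARAM/RECORD_ENTRY/RECORD_ARG helpers are invented names, not something M3C.m3 emits today.

```c
typedef struct { int x, y; } Point;  /* hypothetical record */

#ifdef M3_STRUCT_BY_VALUE            /* hypothetical switch */
#define RECORD_PARAM(T, name) T name
#define RECORD_ENTRY(T, name)        /* C already made the copy */
#define RECORD_ARG(v) (v)
#else
#define RECORD_PARAM(T, name) const T *name##_in
#define RECORD_ENTRY(T, name) T name = *name##_in;
#define RECORD_ARG(v) (&(v))
#endif

/* The same generated body works under either setting. */
int sum(RECORD_PARAM(Point, p))
{
    RECORD_ENTRY(Point, p)
    p.x += 1;            /* always touches a private copy */
    return p.x + p.y;
}
```

A call site would then be written as sum(RECORD_ARG(a)), so the choice can be flipped per compilation with -DM3_STRUCT_BY_VALUE instead of regenerating the C.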
I think for returning, we have similar choices, but the frontend is willing
to do the transform and currently does -- a matter of a boolean in Target.i3.
This second question has equal quality debugging either way and needs no M3CG/frontend changes
either way. (Though, you know, the frontend is willing to make more transforms for
record return than record pass, I believe; it probably should be willing to do
more of the work.)
Very old compilers don't support passing structs by value? Or don't do it thread-safely, passing the value through a global? ORCA/C for Apple IIGS I think..
--- #line directives or not? ---
Third question that has been bugging me.
The C backend can output #line directives.
Then you step through the Modula-3 source, which is what people expect.
This was working, and probably still does.
I turned it off subject to a constant in the backend.
Currently I output "//line" instead of "#line". (subject to the constant, and yes, I know // isn't portable C)
This is great during backend development/debugging, because the C compiler then gives me C line numbers.
If the backend worked perfectly, this would be pointless.
I debug stuff *a lot* (beyond Modula-3) and I am sensitive to anything that inhibits debugging in any way.
There are bugs everywhere (I have seen them!) and everything needs to be debugged, both with logs and live.
What to do to cater to both/everyone?
I wish I could have multiple #line directives:
#line 123 foo.m3.c 456 foo.m3
but that doesn't exist. I could encode information in the file name:
#line 123 "foo.m3.c/456 foo.m3"
but that is imperfect; error messages will be good, but debugging won't work
I could leave it as an #ifdef in the code.
I do not believe the following works:
#ifdef CLINE
#define LINE(cline, cfile, m3line, m3file) cline cfile
#else
#define LINE(cline, cfile, m3line, m3file) m3line m3file
#endif
#line LINE(123, "foo.c", 456, "foo.m3")
but I'll try it.
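For what it is worth, ISO C (C99 6.10.4) says the tokens after #line are macro-expanded when the directive does not already match one of the literal forms, so the macro approach may well work with conforming preprocessors. A minimal sketch to try it, with CLINE taken to mean "report C lines when defined" (an assumption) and reported_line/reported_file as hypothetical probe functions:

```c
#ifdef CLINE
#define LINE(cline, cfile, m3line, m3file) cline cfile
#else
#define LINE(cline, cfile, m3line, m3file) m3line m3file
#endif

/* The directive below should expand to: #line 456 "foo.m3"
   (or #line 123 "foo.c" when compiled with -DCLINE). */
#line LINE(123, "foo.c", 456, "foo.m3")
int reported_line(void) { return __LINE__; }
const char *reported_file(void) { return __FILE__; }
```

If the compiler accepts this, reported_line() gives the line number named in the directive, since #line sets the number of the line that follows it. Older or non-conforming preprocessors may still reject it, so the plain #ifdef form remains the fallback.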
I think the "best" ends up being to sprinkle in a steady stream of #ifdefs:
#ifndef CLINE
#line 456 "foo.m3" /* might need to adjust by 1 to account for #endif */
#endif
This is bloated, but might be best.
If "#define LINE" works, great, but I doubt it will.
--- typeindex besides typeid? ---
I'm now doing a lot of lookups of typeids.
It'd be super nice if the frontend also maintained "small" incrementing
typeIndices that I could use to index into an array.
set_type_count(typeCount:CARDINAL); (* maybe *)
declare_object/pointer/indirect/record/etc.(typeId: TypeUID; typeIndex: CARDINAL; ...);
and thereafter, use typeIndex instead of typeId, an index into an array.
I've been tempted to ask for just:
declare_object/pointer/indirect/record/etc.(typeIndex: CARDINAL; ...);
but I realize that preserving the structural hash id is likely too useful/important,
either now or hypothetically.
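On the backend side, the typeIndex idea amounts to replacing a hash lookup with an array index while still storing the structural typeid. A minimal sketch, assuming the frontend hands out dense indices starting at 0; TypeInfo, declare_type and type_at are hypothetical names, and only set_type_count mirrors the call proposed above.

```c
#include <stdlib.h>

typedef unsigned TypeUID;

typedef struct {
    TypeUID  uid;       /* structural hash id, kept for reference */
    unsigned bitSize;
    int      declared;
} TypeInfo;

static TypeInfo *types;
static size_t type_count;

/* Size the table once, as set_type_count would allow. */
int set_type_count(size_t count)
{
    free(types);
    types = calloc(count, sizeof *types);
    type_count = types ? count : 0;
    return types != NULL;
}

/* Record a declaration under its small index. */
int declare_type(TypeUID uid, size_t index, unsigned bitSize)
{
    if (index >= type_count)
        return 0;
    types[index].uid = uid;
    types[index].bitSize = bitSize;
    types[index].declared = 1;
    return 1;
}

/* Constant-time lookup by index; NULL if out of range or undeclared. */
const TypeInfo *type_at(size_t index)
{
    if (index < type_count && types[index].declared)
        return &types[index];
    return NULL;
}
```

Without set_type_count the table would have to grow on demand, which is still cheaper than hashing on every reference.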
There is then the question of whether begin_unit/end_unit reset typeIndex.
This somewhat depends on how the frontend works.
At some point I'd like to try outputting one C file across multiple units,
and add M3CG.begin_library, M3CG.end_library, M3CG.begin_program, M3CG.end_program,
M3CG.import_library(static | dynamic | unknown),
so the backend knows which units definitely link together,
and guide ELF visibility/__declspec(dllimport,dllexport).
Given that, typeIndices would not reset upon end_unit.
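The visibility part of this could be driven by the usual export/import macro pattern once the backend knows what links together. A minimal sketch, where M3_BUILDING_LIBRARY, M3_API, M3_LOCAL and the function names are all hypothetical; the translation unit below plays the role of the library itself.

```c
#define M3_BUILDING_LIBRARY 1  /* hypothetical: this unit is inside the library */

#if defined(_WIN32)
#  ifdef M3_BUILDING_LIBRARY
#    define M3_API __declspec(dllexport)
#  else
#    define M3_API __declspec(dllimport)
#  endif
#  define M3_LOCAL
#elif defined(__GNUC__)
#  define M3_API   __attribute__((visibility("default")))
#  define M3_LOCAL __attribute__((visibility("hidden")))
#else
#  define M3_API
#  define M3_LOCAL
#endif

/* Used only by units known to link into the same library. */
M3_LOCAL int Unit__Helper(int x) { return x * 2; }

/* Part of the library's exported interface. */
M3_API int Library__Exported(int x) { return Unit__Helper(x) + 1; }
```

Knowing from import_library whether a dependency is static or dynamic is what would let the backend pick between dllimport and no decoration on the importing side.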
There are challenges here, e.g. separate/incremental compilation.
I would like to amortize C compiler startup; as well, all the type declarations
would be shared across units, so the overall C source would be smaller.
Computer memory is vastly larger today than when CM3 was written and compilation
strategies have shifted significantly toward "whole program compilation".
We could do similar in the C backend..or leave it to the C compiler to try.
- Jay