[M3devel] many matters big and small esp. wrt C backend
Tony Hosking
hosking at cs.purdue.edu
Thu Apr 11 23:12:42 CEST 2013
Hi Jay,
Is there any chance you could distill this stream of consciousness into an organized proposal of alternatives? I find it difficult to extract your precise proposals and arguments from this.
--Tony
Sent from my iPad
On Apr 12, 2013, at 4:57 AM, Jay K <jay.krell at cornell.edu> wrote:
> The C/C++ backend has worked for a while now, and is improving nicely (wrt debuggability).
>
>
> Here are some current problems/dilemmas.
>
>
> --- getting pointer parameters correctly typed esp. on passing side ---
>
>
> Up until now, any pointer parameter to a function has been typed as char*.
>
>
> (I have some preference for char* over void* because 1) it is valid pre-ANSI 2) you can do
> math on them; however given that char* is usually wrong, void* actually debugs better,
> showing nothing instead of garbage. gcc does allow math on void* but it
> is an extension -- http://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html#Pointer-Arith;
> I should probably use void* with gcc and wherever the extension is supported...autoconf...)
> Anyway..
> Every pointer passed is cast to char*. This applies to VAR, READONLY, and records-by-value (more later).
> This works but is bad for debugging (again: void* is better than char*, but both are bad)
> I've been working on this.
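For reference, the char* versus void* trade-off above can be sketched in portable C. This is only an illustration: the Pair record and the field_at helper are hypothetical names, not anything the backend emits. Arithmetic on void* is a GCC extension, so the sketch does the byte arithmetic through char* and casts to the real type only at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record, standing in for a Modula-3 RECORD. */
typedef struct { int a; int b; } Pair;

/* Portable byte-offset addressing: the arithmetic happens on char*,
   which standard C allows, and the result is cast to the field type. */
static int *field_at(void *base, size_t byte_offset)
{
    assert(base != NULL);
    return (int *)((char *)base + byte_offset);
}
```

Calling field_at(&p, offsetof(Pair, b)) yields a pointer to p.b. With void* as the declared parameter type, a debugger shows nothing for base instead of a misleading string.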
>
>
> The interface to the backend is
> pop_param(cgtype)
> pop_struct(typeid, size)
>
>
> non-struct readonly and var parameters are just:
> pop_param(cgtype.addr)
>
>
> It would be nice if the frontend also passed a typeid here,
> either using the typeid from a "declare_indirect" or "declare_pointer",
> or having separate booleans/flags for readonly/var.
>
>
> How about:
> TYPE ParameterMode = {Value (* or Normal or None? *), Var, ReadOnly};
> PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode);
>
>
> might as well merge this with pop_struct, no further change required probably.
>
> Or, at worst, another mode:
> TYPE ParameterMode = {Value (* or Normal or None? *), Var, ReadOnly, StructByValue};
> PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode; bitSize: BitSize);
>
>
> or less abstract:
> TYPE ParameterMode = {Value, Pointer, StructByValue};
> PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; mode: ParameterMode; bitSize: BitSize);
>
>
> or, again, if declare_indirect/declare_pointer is used to twiddle the typeuid, this suffices,
> I like it:
>
> (* bitSize, cgtype and typeid all imply size and agree, are somewhat redundant
> Backends without type checking can ignore typeid. e.g. NTx86.m3.
> bitSize is definitely redundant, but helps typeid-ignoring backends
> that implement struct-by-value themselves e.g. NTx86.m3 easily adapt.
> cgtype will be CGType.Addr for READONLY/VAR/ADDRESS/OBJECT/REF/TEXT, bitSize = sizeof(pointer)
> cgtype will be CGType.Struct for struct-by-value (size in bitSize)
> typeid will be declare_indirect/declare_pointer for READONLY/VAR (READONLY OBJECT?)
> typeid will NOT be to a declare_indirect/declare_pointer for struct-by-value *)
> PROCEDURE pop_param(cgtype: CGType; typeid: TypeUID; bitSize: BitSize);
>
>
> Ideally all backends would track typeids and it'd suffice to say:
> (* typeid will be declare_indirect/declare_pointer for READONLY/VAR (READONLY OBJECT?)
> typeid will NOT be to a declare_indirect/declare_pointer for struct-by-value *)
> PROCEDURE pop_param(typeid: TypeUID);
>
>
> but I don't see that happening soon. In reality, CGType could go away entirely. Not soon.
> (We'd need declare_integer(typeid, size, is_signed, is_word); declare_float(typeid, size),
> and maybe a few others for some pointer types..REFANY, TEXT, MUTEX, etc.)
>
>
> I haven't looked to see if this information (typeid/size for pop_param)
> is readily available in the frontend. I will do that soon.
>
>
> I have a few potential workarounds:
> cast to void*
> This appears to be working, with limited testing (my new test case, p254)
> Big drawback here is the code is no longer valid C++, only C.
> This is ok temporarily if I'm making improvements otherwise, but I really want to output valid C++.
> The function is still prototyped as taking stronger types, like INTEGER* or T1234* and C++
> doesn't allow conversion from void* to other pointer types without a cast.
>
>
> Introspect on the function pointer type and cast appropriately, or even not at all.
> This should provide the ideal output and is probably viable. I'll look into it later.
>
>
> Cast the function to (*)(...) for C++ or (*)() for C.
> This is kind of gross. Hopefully it is not a deoptimization, but it might be.
> I already do such casting for indirect function calls, for reasons to do with the static link.
> I'm going to try this next.
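The drawback of the "cast to void*" workaround can be shown with a small C sketch. The INTEGER typedef and the function names here are hypothetical stand-ins, not the backend's actual output. C converts void* to any object pointer type implicitly, while C++ requires the cast, so only the explicitly cast call is valid in both languages.

```c
#include <assert.h>

typedef long INTEGER;  /* hypothetical stand-in for the backend's INTEGER */

/* Callee prototyped with the stronger type, as in the generated code. */
static void increment(INTEGER *p)
{
    assert(p != 0);
    *p += 1;
}

static INTEGER call_with_cast(void)
{
    INTEGER n = 41;
    void *erased = &n;             /* what "cast to void*" leaves at the call site */
    /* increment(erased);             accepted by C, rejected by C++ */
    increment((INTEGER *)erased);  /* explicit cast: valid in both C and C++ */
    return n;
}
```

This is why emitting the precise pointer type at the call site (the "introspect on the function pointer type" option) keeps the output compilable as C++.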
>
>
> --- to depend on C passing/returning structs/records by value, or to do the copying ourselves? ---
>
> Now a minor dilemma, not a problem.
>
>
> Up until recently, I didn't have much type information flowing through the C backend.
> Or specifically I was only using CGType and not TypeUID.
> I'm now at the point where TypeUIDs and "almost everything" about them is kept track of.
> (just some loose ends maybe around opaque types and object runtime type information.)
>
>
> I had known record sizes, but not fields. Maybe this was a dilemma already before.
>
>
> Anyway, the point is, up until now, record passing and returning by value I have handled
> internally by passing around pointers and making copies as needed (at function start).
>
>
> I forget exactly how returning works, I'll deal with that later.
>
>
> Passing works as follows:
> caller passes pointer to record
> callee has a local variable of that type
> callee early on copies the pointed-to record into the local variable, and references that thereafter
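The three steps above can be sketched in C next to the native alternative. This is a minimal sketch with a hypothetical Point record and function names, not the backend's generated code. Both functions leave the caller's record untouched, which is the by-value semantics required.

```c
#include <assert.h>

typedef struct { int x; int y; } Point;  /* hypothetical record type */

/* Current scheme: the caller passes a pointer, the callee copies the
   record into a local at entry and uses only the local afterwards. */
static int sum_scaled_via_pointer(const Point *arg)
{
    Point local;
    assert(arg != 0);
    local = *arg;          /* the copy made at function start */
    local.x *= 2;
    local.y *= 2;
    return local.x + local.y;
}

/* Alternative: let the C compiler pass the struct by value. */
static int sum_scaled_by_value(Point p)
{
    p.x *= 2;
    p.y *= 2;
    return p.x + p.y;
}
```

The second form leaves the copy, and any passing of small structs in registers, to the C compiler and its calling convention.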
>
>
> This works and has almost no downside.
> It is likely how the C compiler implements things anyway.
> Except maybe for "small" records/structs. Some calling conventions
> do allow for passing structs/records by value in registers.
>
>
> Passing structs/records by value is relatively rare, so we probably don't care much.
>
>
> Nevertheless, my question is whether I should go ahead and use the underlying C/C++ feature of
> passing structs by value?
>
>
> There are multiple choices:
> - no, leave it alone
> - yes, change it unconditionally
> - leave it as a const or var in M3C.m3
> - make it #ifdefed in the output .c
>
>
> I think for returning, we have similar choices, but the frontend is willing
> to do the transform and currently does -- a matter of a boolean in Target.i3.
>
>
> This second question has equal quality debugging either way and needs no M3CG/frontend changes
> either way. (though, you know, frontend is willing to make more transforms for
> record return than record pass, I believe; it probably should be willing to do
> more of the work.)
>
>
> Very old compilers don't support passing structs by value?
> Or don't do it thread-safely, passing the value through a global? ORCA/C for Apple IIGS I think..
>
>
> --- #line directives or not? ---
>
>
> Third question that has been bugging me.
> The C backend can output #line directives.
> So you step through the Modula-3 source. What people expect.
> This was working, and probably still does.
> I turned it off subject to a constant in the backend.
> Currently I output "//line" instead of "#line". (subject to the constant, and yes, I know // isn't portable C)
>
> This is great during backend development/debugging: the C compiler gives me C line numbers.
> If the backend worked perfectly, this would be pointless.
> I debug stuff *a lot* (beyond Modula-3) and I am sensitive to anything that inhibits debugging in any way.
> There are bugs everywhere (I have seen them!) and everything needs to be debugged, both with logs and live.
>
>
> What to do to cater to both/everyone?
> I wish I could have multiple #line directives:
> #line 123 foo.m3.c 456 foo.m3
>
>
> but that doesn't exist.
> I could encode information in the file name:
> #line 123 "foo.m3.c/456 foo.m3"
>
>
> but that is imperfect; error messages will be good, but debugging won't work
>
>
> I could leave it as an #ifdef in the code.
> I do not believe the following works:
>
> #ifdef CLINE
> #define LINE(cline, cfile, m3line, m3file) cline cfile
> #else
> #define LINE(cline, cfile, m3line, m3file) m3line m3file
> #endif
> #line LINE(123, "foo.c", 456, "foo.m3")
>
>
> but I'll try it.
>
>
> I think the "best" ends up being to sprinkle in a steady stream of #ifdefs:
> #ifndef CLINE
> #line 456 "foo.m3" (* might need to adjust by 1 to account for #endif *)
> #endif
>
> This is bloated, but might be best.
>
>
> if "#define LINE" works, great, but I doubt it will.
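On the "#define LINE" idea: the "Line control" section of the C standard says that tokens after #line which do not already form a valid directive are macro-expanded first, so a conforming preprocessor should accept it. The sketch below assumes CLINE means "keep C positions" and uses made-up file names. It has not been checked against every compiler the backend targets.

```c
#include <assert.h>
#include <string.h>

/* Define CLINE (e.g. -DCLINE) to keep the generated C file's positions;
   otherwise the Modula-3 positions are reported. */
#ifdef CLINE
#define LINE(cline, cfile, m3line, m3file) cline cfile
#else
#define LINE(cline, cfile, m3line, m3file) m3line m3file
#endif

#line LINE(123, "foo.m3.c", 456, "foo.m3")
static int reported_line(void) { return __LINE__; }
static const char *reported_file(void) { return __FILE__; }

/* Sanity check: the directive must have selected one of the two positions. */
static void check_reported_position(void)
{
    assert(reported_line() == 456 || reported_line() == 123);
    assert(strlen(reported_file()) > 0);
}
```

Built without CLINE, reported_line() gives the Modula-3 line and reported_file() the Modula-3 file name. This avoids the three-line #ifndef/#line/#endif pattern at every statement.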
>
>
> --- typeindex besides typeid? ---
>
>
> I'm now doing a lot of lookups of typeids.
> It'd be super nice if the frontend also maintained "small" incrementing
> typeIndices that I could use to index into an array.
>
>
> set_type_count(typeCount:CARDINAL); (* maybe *)
> declare_object/pointer/indirect/record/etc.(typeId: TypeUID; typeIndex: CARDINAL; ...);
>
>
> and thereafter, use typeIndex instead of typeId, an index into an array.
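What a backend could do with such indices is sketched below in C. All names (TypeInfo, TypeTable, table_init and so on) are hypothetical and only illustrate the idea: set_type_count sizes an array once, each declare_* call fills a slot, and later lookups are plain array indexing instead of a search keyed on the structural hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int type_uid;      /* structural hash id, preserved for identity */
    const char *name;  /* e.g. the C name emitted for the type */
} TypeInfo;

typedef struct { TypeInfo *items; size_t count; } TypeTable;

/* Corresponds to the proposed set_type_count: allocate all slots up front. */
static int table_init(TypeTable *t, size_t type_count)
{
    t->items = (TypeInfo *)calloc(type_count, sizeof *t->items);
    t->count = t->items ? type_count : 0;
    return t->items != NULL;
}

/* Corresponds to a declare_* call carrying both typeId and typeIndex. */
static void table_declare(TypeTable *t, int type_uid, size_t type_index,
                          const char *name)
{
    assert(type_index < t->count);
    t->items[type_index].type_uid = type_uid;
    t->items[type_index].name = name;
}

/* Constant-time lookup by index; NULL when the index is out of range. */
static const TypeInfo *table_get(const TypeTable *t, size_t type_index)
{
    return type_index < t->count ? &t->items[type_index] : NULL;
}
```

Keeping type_uid in each slot preserves the structural hash for anything that still needs it.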
>
>
> I've been tempted to ask for just:
> declare_object/pointer/indirect/record/etc.(typeIndex: CARDINAL; ...);
>
>
> but I realize that preserving the structural hash id is likely too useful/important,
> either now or hypothetically.
>
>
> There is then the question of whether begin_unit/end_unit reset typeIndex.
> This somewhat depends on how the frontend works.
>
>
> At some point I'd like to try outputting one C file across multiple units,
> and add M3CG.begin_library, M3CG.end_library, M3CG.begin_program, M3CG.end_program,
> M3CG.import_library(static | dynamic | unknown),
> so the backend knows which units definitely link together,
> and guide ELF visibility/__declspec(dllimport,dllexport).
>
>
> Given that, typeIndices would not reset upon end_unit.
> There are challenges here, e.g. separate/incremental compilation.
> I would like to amortize C compiler startup; as well, all the type declarations
> would be shared across units, so the overall C source would be smaller.
> Computer memory is vastly larger today than when CM3 was written and compilation
> strategies have shifted significantly toward "whole program compilation".
> We could do similar in the C backend..or leave it to the C compiler to try.
>
>
> - Jay
>
>