[M3devel] higher level m3cg?
Hendrik Boom
hendrik at topoi.pooq.com
Wed Aug 22 20:38:52 CEST 2012
On Tue, Aug 21, 2012 at 01:18:48PM +0200, Dirk Muysers wrote:
> *** A warning ***
> Norman Ramsey's opinion (in stackoverflow) on possible compiler backends:
>
> Code generation is my business :-)
>
> Comments on a few options:
>
> a.. CLR:
>
> a.. Pro: industrial support
> b.. Con: you have to buy into their type system pretty much completely; depending on what you want to do with types, this may not matter
> c.. Con: Only Windows platform is really prime-time quality
> b.. LLVM:
>
> a.. Pro: enthusiastic user community with charismatic leader
> b.. Pro: serious backing from Apple
> c.. Pro: many interesting performance improvements
> d.. Con: somewhat complex interface
> e.. Con: history of holes in the engineering; as LLVM matures expect the holes in the engineering to be plugged by adding to the complexity of the interface
When I was investigating using LLVM for Algol 68, I ran into one problem
that persuaded me to use C-- instead. LLVM's pointer arithmetic is C's
pointer arithmetic; that is, adding an integer to a pointer automatically
scales the integer by the size of the thing pointed to. Now arrays in C++
have the property that the stride -- the offset from each element to the
next -- is not necessarily a multiple of the size of the item pointed to,
so using LLVM's pointer arithmetic is pretty well excluded -- without
kludges like casting everything to void* and then casting it back again.
If LLVM does any kind of type-based optimization, this will effectively
disable it.
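
To make the stride problem concrete, here is a minimal C sketch of a
strided array view. The types `inner`, `outer`, and `inner_slice` and both
functions are hypothetical names invented for illustration; nothing here
is LLVM or C-- API. The elements are 2 ints wide but sit 3 ints apart, so
scaled arithmetic on a pointer to the element type cannot step through
them, and the code falls back to byte offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical element type and the record that contains it. */
struct inner { int a, b; };
struct outer { struct inner in; int c; };

/* Descriptor for a strided array of struct inner. The stride is in
   bytes and need not be a multiple of sizeof(struct inner). */
struct inner_slice {
    unsigned char *base;   /* address of element 0 */
    size_t         stride; /* byte distance between consecutive elements */
    size_t         count;  /* number of elements */
};

/* View the `in` fields of an array of struct outer as a strided array. */
static struct inner_slice inner_view(struct outer *rows, size_t n)
{
    struct inner_slice s;
    s.base   = (unsigned char *)rows + offsetof(struct outer, in);
    s.stride = sizeof(struct outer);
    s.count  = n;
    return s;
}

/* Unscaled pointer arithmetic: add a byte offset, then copy the element out. */
static struct inner slice_get(struct inner_slice s, size_t i)
{
    struct inner value;
    assert(i < s.count);
    memcpy(&value, s.base + i * s.stride, sizeof value);
    return value;
}
```

The cast to `unsigned char *` is exactly the kind of kludge described
above: it works, but it hides the element type from the optimizer.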
I prefer to use things the way they were intended to be used, not to
abuse them and hope things still work.
Of course the same objections apply to C and C++.
There was also a problem with having to completely define every
structure before you can even put any of its field names in the parse
tree, even though in principle you can generate that tree in any order
you wish. This interfered with building run-time data structures whose
full contents would not be known until compilation was complete.
> c.. C--
>
> a.. Pro: target is an actual written language, not an API; you can easily inspect, debug, and edit your C-- code
> b.. Pro: design is reasonably mature and reasonably clean
> c.. Pro: supports accurate garbage collection
> d.. Pro: most users report it is very easy to use
> e.. Con: very small development team
> f.. Con: as of early 2009, supports only three hardware platforms (x86, PPC, ARM)
> g.. Con: does not ship with a garbage collector
> h.. Con: project has no future
C-- did allow unscaled pointer arithmetic.
C-- did not have any API calling order restrictions, because it didn't
have an API. Any define-before-use restrictions could be fudged by
generating output as several files and then concatenating them
afterward. Of course this could be done with LLVM too.
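
The concatenation trick can be sketched in C as follows. This is only an
illustration under stated assumptions: the `section` type and the `emit`
and `join_sections` functions are hypothetical, and in-memory buffers
stand in for the several output files. Text is appended to each section
in whatever order the compiler discovers it, and the declarations are
placed first only when the sections are joined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { SECTION_CAP = 256 };

/* One output section; a real generator would use a temporary file. */
struct section {
    char   text[SECTION_CAP];
    size_t len;
};

/* Append one line to a section, in whatever order lines are produced. */
static void emit(struct section *s, const char *line)
{
    size_t n = strlen(line);
    assert(s->len + n + 1 < SECTION_CAP);  /* room for line, '\n', '\0' */
    memcpy(s->text + s->len, line, n);
    s->len += n;
    s->text[s->len++] = '\n';
    s->text[s->len] = '\0';
}

/* Concatenate declarations first, then code. Returns 0 on success,
   -1 if the result does not fit in `out`. */
static int join_sections(const struct section *decls,
                         const struct section *code,
                         char *out, size_t cap)
{
    int n = snprintf(out, cap, "%s%s", decls->text, code->text);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Both sections must start zero-initialized (for example
`struct section decls = {0};`) so that an empty section is an empty string.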
About the garbage collector. The world needs a robust garbage
collector that supports multicore systems and on which collection is
concurrent with execution. The necessary synchronisation will cause
performance loss, but there are applications where chunky
garbage-collection delays are unacceptable, but general slowness is OK.
Of course such a collector could have compile-time (or maybe run-time)
options controlling whether to support the expensive features, so that
applications where they are not relevant need not pay for them.
> d.. C as target language
>
> a.. Pro: looks easy
> b.. Con: nearly impossible to get decent performance
> c.. Con: will drive you nuts in the long run; ask the long line of people who have tried to compile Haskell, ML, Modula-3, Scheme and more using this technique. At some point every one of these people gave up and built their own native code generator.
> Summary: anything except C is a reasonable choice. For the best combination of flexibility, quality, and expected longevity, I'd probably recommend LLVM.
>
> Full disclosure: I am affiliated with the C-- project.
Yay!
>
>
>
>
> From: Jay K
> Sent: Thursday, August 16, 2012 4:21 PM
> To: m3devel
> Subject: [M3devel] higher level m3cg?
>
>
> Should m3cg provide enough information for a backend to generate idiomatic C?
> (What is idiomatic C? e.g. I'm ignoring loop constructs and exception handling..)
>
>
> Should we make it so?
>
>
> Or be pragmatic and see if anyone gets to that point?
>
>
> But, look at this another way.
> Let's say we are keeping the gcc backend.
>
>
> Isn't it reasonable to have a better experience with stock gdb?
>
>
> What should m3cg look like then?
>
>
> Matching up m3front to gcc turns out to be "weird".
> As does having a backend generate "C".
>
>
> In particular, "weird" because there is a "level mismatch".
>
>
> m3cg presents a fairly low level view of the program.
> It does layout. Global variables are stuffed into what you might call a "struct", with
> no assigned field names. Field references are done by adding to addresses and casting.
>
>
> Too low level to provide a "good" gcc tree representation or to generate "normal" C.
>
>
> One might be able to, by somewhat extraordinary means, make do.
> That is, specifically one could deduce field references from
> offsets/sizes. But maybe it is reasonable for load/store
> to include fields? Maybe in addition to what it provides?
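
The level mismatch described above can be illustrated with a small C
sketch. The `globals` struct, its field names, and both functions are
hypothetical and are not part of m3cg. `load_i32` is the low-level view
(add to an address and cast), while `field_at` shows what a backend has
to do to deduce a field reference from an offset and a size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical globals segment with named fields: the "normal" C view. */
struct globals {
    int32_t counter;
    int32_t limit;
};

/* Low-level view: load a 32-bit integer at a byte offset from a base. */
static int32_t load_i32(const void *base, size_t offset)
{
    int32_t v;
    assert(base != NULL);
    memcpy(&v, (const unsigned char *)base + offset, sizeof v);
    return v;
}

/* Recovering a field name means mapping (offset, size) back to a field.
   Returns NULL when the access does not line up with any field. */
static const char *field_at(size_t offset, size_t size)
{
    if (offset == offsetof(struct globals, counter) && size == sizeof(int32_t))
        return "counter";
    if (offset == offsetof(struct globals, limit) && size == sizeof(int32_t))
        return "limit";
    return NULL;
}
```

If load/store carried the field directly, the lookup in `field_at` would
be unnecessary and the backend could emit `g.limit` instead of a cast.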
>
>
> As well, it appears to me, that
>
>
> given TYPE Enum = {One, Two, Three};
>
> the m3cg is like:
>
> declare enum typeidblah
> declare enum_elt One
> declare enum_elt Two
> declare enum_elt Three
> declare_typename typeidblah Enum
>
>
> One kind of instead wants more like:
>
>
> declare enum typeidblah Enum
> declare enum_elt One => rename it Enum_One
> declare enum_elt Two ""
> declare enum_elt Three ""
>
>
> However I understand that {One, Two, Three} exists
> as anonymous type independent of the name "Enum".
>
>
> One could just as well have:
> given TYPE Enum1 = {One, Two, Three};
> given TYPE Enum2 = {One, Two, Three};
>
>
> Enum1 and Enum2 probably have the same typeid, and are just
> two typenames for the same type.
>
>
> likewise:
> given TYPE Enum1 = {One, Two, Three};
> given TYPE Enum2 = Enum1;
>
>
> but, pragmatically, in the interest of generating better C,
> can we pass a name along with declare_enum?
>
> I ask somewhat rhetorically. I realize there is the answer:
> enum Mtypeid { Mtypeid_One, Mtypeid_Two, Mtypeid_Three };
> typedef enum Mtypeid Enum1;
>
>
> Also, enum variables I believe end up as just UINT8, 16, or 32.
> Loads of enum values I believe end up as just loads of integers.
> Can we pass along optional enum names with declare_local/declare_param?
> And optional enum names with load_int?
> Or add a separate load_enum call?
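
A short C sketch of the enum situation, under assumptions: the
`enum_storage` typedef and the `store_enum` and `load_enum` helpers are
hypothetical, and only `Mtypeid`, `Enum1`, and `Enum2` follow the names
used in the message. It shows two typenames for one enum type, and an
enum variable that is stored as a bare 8-bit integer, at which point the
enum type is no longer visible in the load or store.

```c
#include <assert.h>
#include <stdint.h>

/* The enum is named after its type id; Enum1 and Enum2 are two
   typenames for the same type. */
enum Mtypeid { Mtypeid_One, Mtypeid_Two, Mtypeid_Three };
typedef enum Mtypeid Enum1;
typedef enum Mtypeid Enum2;

/* What a backend sees for an enum variable: an 8-bit unsigned integer. */
typedef uint8_t enum_storage;

/* Store: the enum value becomes a plain small integer. */
static enum_storage store_enum(Enum1 e)
{
    assert(e <= Mtypeid_Three);
    return (enum_storage)e;
}

/* Load: the integer is cast back; nothing in the storage says which
   enum it was, so the caller must already know. */
static Enum2 load_enum(enum_storage raw)
{
    assert(raw <= Mtypeid_Three);
    return (Enum2)raw;
}
```

Passing an optional enum name along with the declaration or the load
would let a backend declare the variable as `Enum1` rather than `uint8_t`.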
>
>
> Really, I understand that the current interface can be pressed to do
> pretty adequate things. I can infer field references. The way enums work
> isn't too bad.
>
>
> - Jay