[M3devel] higher level m3cg?

Thu Aug 23 02:51:46 CEST 2012

Hi all:
I was reading an article, and one good consequence of using a type-safe back end like CLR or C-- would be the ability to detect deadlocks with type inference.
http://arxiv.org/pdf/1002.0942.pdf

So that could be worth doing for safety reasons, but at the cost of type-inference time, which I think would not be welcome in JIT compilers.
Thanks in advance

--- On Wed, 22/8/12, Hendrik Boom <hendrik at topoi.pooq.com> wrote:

From: Hendrik Boom <hendrik at topoi.pooq.com>
Subject: Re: [M3devel] higher level m3cg?
To: m3devel at elegosoft.com
Date: Wednesday, August 22, 2012 13:38

On Tue, Aug 21, 2012 at 01:18:48PM +0200, Dirk Muysers wrote:
> *** A warning ***
> Norman Ramsey's opinion (in stackoverflow) on possible compiler backends:
> 
> Code generation is my business :-)
> 
> Comments on a few options:
> 
>   a.. CLR: 
> 
>     a.. Pro: industrial support 
>     b.. Con: you have to buy into their type system pretty much completely; depending on what you want to do with types, this may not matter 
>     c.. Con: Only Windows platform is really prime-time quality
>   b.. LLVM:
> 
>     a.. Pro: enthusiastic user community with charismatic leader 
>     b.. Pro: serious backing from Apple 
>     c.. Pro: many interesting performance improvements 
>     d.. Con: somewhat complex interface 
>     e.. Con: history of holes in the engineering; as LLVM matures expect the holes in the engineering to be plugged by adding to the complexity of the interface

When I was investigating using LLVM for Algol 68, I ran into one problem 
that persuaded me to use C-- instead.  LLVM's pointer arithmetic is C's 
pointer arithmetic; that is, adding an integer to a pointer automatically 
scales the integer by the size of the thing pointed to.  Now arrays in 
Algol 68 have the property that the stride -- the offset from each 
element to the next -- is not necessarily a multiple of the size of the 
item pointed to, so using LLVM's pointer arithmetic is pretty well 
excluded -- without kludges like casting everything to void* and then 
casting it back again.  If LLVM does any kind of type-based 
optimization, this will effectively disable it.
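
As a rough illustration of the scaling issue described above (a sketch 
only; the type and function names are made up for this example and do 
not come from LLVM, C--, or any Algol 68 compiler), here is the same 
distinction written in C.  A slice that picks one field out of an array 
of records has a stride equal to the record size, which need not be a 
multiple of the element size, so the access has to go through a byte 
pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element and record types for illustration. */
struct triple { int a, b, c; };
struct row    { struct triple t; int extra; };

/* Scaled arithmetic: C multiplies index by sizeof(int) implicitly. */
int load_scaled(const int *base, ptrdiff_t index)
{
    return base[index];
}

/* Unscaled arithmetic: the caller supplies the stride in bytes, so it
   can be any value, not just a multiple of sizeof(struct triple).
   Assumes every address reached this way really holds a struct triple. */
const struct triple *nth_strided(const void *base, ptrdiff_t index,
                                 ptrdiff_t stride_bytes)
{
    assert(stride_bytes != 0);
    const char *p = (const char *)base + index * stride_bytes;
    return (const struct triple *)p;
}
```

Given `struct row rows[3]`, the call 
`nth_strided(&rows[0].t, 2, sizeof(struct row))` points at `rows[2].t`, 
whereas ordinary `struct triple *` arithmetic would step by 
`sizeof(struct triple)` and land in the wrong place.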

I prefer to use things the way they were intended to be used, not to 
abuse them and hope things still work.

Of course the same objections apply to C and C++.

There was also a problem with having to completely define every 
structure before you can even put any of its field names in the parse 
tree, a tree you can otherwise, in principle, generate in any order you 
wish.  This interfered with building run-time data structures whose 
full contents would not be known until compilation was complete.

>   c.. C--
> 
>     a.. Pro: target is an actual written language, not an API; you can easily inspect, debug, and edit your C-- code 
>     b.. Pro: design is reasonably mature and reasonably clean 
>     c.. Pro: supports accurate garbage collection 
>     d.. Pro: most users report it is very easy to use 
>     e.. Con: very small development team 
>     f.. Con: as of early 2009, supports only three hardware platforms (x86, PPC, ARM) 
>     g.. Con: does not ship with a garbage collector 
>     h.. Con: project has no future

C-- did allow unscaled pointer arithmetic.

C-- did not have any API calling order restrictions, because it didn't 
have an API.  Any define-before-use restrictions could be fudged by 
generating output as several files and then concatenating them 
afterward.  Of course this could be done with LLVM too.

About the garbage collector.  The world needs a robust garbage 
collector that supports multicore systems and on which collection is 
concurrent with execution.  The necessary synchronisation will cause 
performance loss, but there are applications where chunky 
garbage-collection delays are unacceptable, but general slowness is OK.

Of course such a collector could have compile-time (or maybe run-time) 
options controlling whether to support expensive features, so that 
they can be switched off for applications where they are not relevant.

>   d.. C as target language
> 
>     a.. Pro: looks easy 
>     b.. Con: nearly impossible to get decent performance 
>     c.. Con: will drive you nuts in the long run; ask the long line of people who have tried to compile Haskell, ML, Modula-3, Scheme and more using this technique. At some point every one of these people gave up and built their own native code generator.
> Summary: anything except C is a reasonable choice. For the best combination of flexibility, quality, and expected longevity, I'd probably recommend LLVM.
> 
> Full disclosure: I am affiliated with the C-- project.

Yay!

> 
>  
> 
> 
> From: Jay K 
> Sent: Thursday, August 16, 2012 4:21 PM
> To: m3devel 
> Subject: [M3devel] higher level m3cg?
> 
> 
> Should m3cg provide enough information for a backend to generate idiomatic C?
> (What is idiomatic C? e.g. I'm ignoring loop constructs and exception handling..)
> 
> 
> Should we make it so?
> 
> 
> Or be pragmatic and see if anyone gets to that point?
> 
> 
> But, look at this another way.
> Let's say we are keeping the gcc backend.
> 
> 
> Isn't it reasonable to have a better experience with stock gdb?
> 
> 
> What should m3cg look like then?
> 
> 
> Matching up m3front to gcc turns out to be "weird".
> As does having a backend generate "C".
> 
> 
> In particular, "weird" because there is a "level mismatch".
> 
> 
> m3cg presents a fairly low level view of the program.
>   It does layout. Global variables are stuffed into what you might call a "struct", with
> no assigned field names. Field references are done by adding to addresses and casting.
> 
> 
> Too low level to provide a "good" gcc tree representation or to generate "normal" C.
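
To make the "level mismatch" concrete, here is a small sketch in C of 
the two views of the same global data.  It is only an illustration: the 
struct, its fields, and the function names are hypothetical and are not 
what m3cg or any existing backend actually emits.

```c
#include <stddef.h>

/* Hypothetical layout of a module's globals; the front end has already
   fixed the offsets, so the back end only ever sees numbers. */
struct globals {
    int    counter;
    double scale;
};

/* Low-level view: add a byte offset to the segment address and cast.
   Assumes an int really lives at that offset. */
int load_int_at(const void *segment, size_t offset)
{
    return *(const int *)((const char *)segment + offset);
}

/* The "normal" C a reader or a debugger would rather see. */
int load_counter(const struct globals *g)
{
    return g->counter;
}
```

Both functions read the same memory when `load_int_at` is called with 
`offsetof(struct globals, counter)`, but only the second one carries a 
field name that gdb or a human reader can use.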
> 
> 
> One might be able to, by somewhat extraordinary means, make do.
> That is, specifically one could deduce field references from
> offsets/sizes. But maybe it is reasonable for load/store
> to include fields? Maybe in addition to what it provides?
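
One possible shape for the "deduce field references from offsets/sizes" 
idea is sketched below in C.  This is an assumption about how a backend 
might do it, not a description of any existing code; `field_info` and 
`find_field` are names invented for the example.

```c
#include <stddef.h>

/* One entry per declared field, as a backend might record it when it
   sees the record declaration. */
struct field_info {
    const char *name;
    size_t      offset;   /* byte offset within the record */
    size_t      size;     /* size of the field in bytes    */
};

/* Return the field whose offset and size exactly match a load/store,
   or NULL if no field matches. */
const struct field_info *find_field(const struct field_info *fields,
                                    size_t count,
                                    size_t offset, size_t size)
{
    for (size_t i = 0; i < count; i++) {
        if (fields[i].offset == offset && fields[i].size == size)
            return &fields[i];
    }
    return NULL;
}
```

An exact match on offset and size can still be ambiguous, for example 
when a nested record holds a single field of the same size, which is 
one argument for passing the field along with load/store instead of 
inferring it.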
> 
> 
> As well, it appears to me, that
> 
> 
> given TYPE Enum = {One, Two, Three};
> 
> the m3cg is like:
> 
> declare enum typeidblah
> declare enum_elt One
> declare enum_elt Two
> declare enum_elt Three
> declare_typename typeidblah Enum
> 
> 
> One would instead want something more like:
> 
> 
> declare enum typeidblah Enum
> declare enum_elt One => rename it Enum_One
> declare enum_elt Two ""
> declare enum_elt Three ""
> 
> 
> However I understand that {One, Two, Three} exists
> as anonymous type independent of the name "Enum".
> 
> 
> One could just as well have:
> given TYPE Enum1 = {One, Two, Three};
> given TYPE Enum2 = {One, Two, Three};
> 
> 
> Enum1 and Enum2 probably have the same typeid, and are just
> two typenames for the same type.
> 
> 
> likewise:
> given TYPE Enum1 = {One, Two, Three};
> given TYPE Enum2 = Enum1;
> 
> 
> but, pragmatically, in the interest of generating better C,
> can we pass a name along with declare_enum?
> 
> I ask somewhat rhetorically. I realize there is the answer:
>   enum Mtypeid { Mtypeid_One, Mtypeid_Two, Mtypeid_Three };
>   typedef enum Mtypeid Enum1;
> 
> 
> Also, enum variables I believe end up as just UINT8, 16, or 32.
> Loads of enum values I believe end up as just loads of integers.
> Can we pass along optional enum names with declare_local/declare_param?
> And optional enum names with load_int?
> Or add a separate load_enum call?
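
For reference, the typeid-based naming and the integer-sized storage 
discussed above could look roughly like this in generated C.  The 
`Mtypeid` names follow the example in the message; `store_enum` and 
`load_enum` are hypothetical helpers standing in for whatever a 
load_enum-style call would produce.

```c
#include <stdint.h>

/* One enum per typeid; the element names are prefixed with the typeid
   because the anonymous type {One, Two, Three} has no name of its own. */
enum Mtypeid { Mtypeid_One, Mtypeid_Two, Mtypeid_Three };

/* Two Modula-3 typenames that share the same typeid. */
typedef enum Mtypeid Enum1;
typedef enum Mtypeid Enum2;

/* Storage is just a small unsigned integer... */
uint8_t store_enum(Enum1 e)
{
    return (uint8_t)e;
}

/* ...and without extra information a load only sees that integer, so
   the cast back to the enum type has to come from somewhere. */
Enum2 load_enum(uint8_t raw)
{
    return (Enum2)raw;
}
```

Because `Enum1` and `Enum2` are typedefs of the same `enum Mtypeid`, a 
value stored through one name can be loaded through the other without 
any conversion.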
> 
> 
> Really, I understand that the current interface can be pressed to do
> pretty adequate things. I can infer field references. The way enums work
> isn't too bad.
> 
> 
>  - Jay 