<html>

<head>

<style><!--

.hmmessage P

{

margin:0px;

padding:0px

}

body.hmmessage

{

font-size: 10pt;

font-family:Verdana

}

--></style>

</head>

<body class='hmmessage'>

Simple and fast does not entirely preclude multiple passes, and one pass does not equate to simple, though yes probably fast.<BR>

The front end already makes multiple passes I believe, and is still fast.<BR>

 <BR>

 <BR>

Sometimes decomposition into more smaller pieces (passes, layers, etc.) make things simpler.<BR>

  Albeit slower.<BR>

 <BR>

 <BR>

A more historical factor here is that multiple passes imply keeping more data in memory longer.<BR>

 Implies slower.<BR>

 However memory capacity has increased tremendously.<BR>

 <BR>

 <BR>

Whereas I'm only thinking about multiple passes, while still compiling one file at a time,<BR>

other compilation systems have capitalized on larger memories and now compile entire<BR>

programs at once, including multiple passes.<BR>

  This goes by multiple names -- "whole program optimization", "link time codegen (LTCG)".<BR>

   And enables even more aggressive but still "fairly simple" optimization, such as cross<BR>

   module inlining, custom calling conventions. I say "fairly simple", because these<BR>

   are just the same optimizations one would perform within a module, merely applied<BR>

   across module boundaries.<BR>

 <BR>

 <BR>

(Sometimes what people do is #include all .c files into one master .c file, to help<BR>

get the compiler to see everything at once. sqllite does this, for example.)<BR>

 <BR>

 <BR>

Surely we can afford multiple passes on a per-file basis.<BR>

?<BR>

 <BR>

 <BR>

The code is already not that easy to understand imho.<BR>

  It took me quite some time to get comfortable and add the 64 bit integer support, years.<BR>

  And there's a few things I still don't understand (you can tell from my comments<BR>

  in the code, like where do I need to call "newdest")?<BR>

 <BR>

 <BR>

Embrace it and make it incrementally harder to understand (by virtue of simply adding to it),<BR>

or decide it is stuck asis?<BR>

 <BR>

 <BR>

I'm not entirely sure.<BR>

 <BR>

 <BR>

While it is an interesting compromise in its current form, I'm fairly certain<BR>

there are other viable compromises that achieve better code quality, are<BR>

still plenty fast, and aren't much more difficult to understand.<BR>

 <BR>

 <BR>

Heck, I found it confusing at first that the -debug output is wrong in some places.<BR>

That is a fallout from it being single pass "plus patching".<BR>

The debug output doesn't include the "patches".<BR>

 <BR>

 <BR>

I have some very simple small optimizations in mind.<BR>

 <BR>

 <BR>

And then, in stepping through code, and writing code, inlining constantly comes to mind.<BR>

In fact, I frequently see "manual partial inlining" in the Modula-3 code.<BR>

It'd be nifty if the Modula-3 code could have less duplication, relying on the compiler more.<BR>

 <BR>

 <BR>

I don't see inlining as entirely simple, but I've given it a little thought and it might not<BR>

 be so hard. I think it can be done, roughly, by taking the body of a procedure <BR>

as expressed as cg.i3 calls, omitting the begin/procedure, and just "playing them back"<BR>

at the call site. You'd only do this for "small" functions. You'd have to make some<BR>

adjustments like merging local variables and exception handling, though really<BR>

the inlined function would just be a "block".<BR>

 <BR>

 <BR>

Probably the way to go here, if inlining is really to be considered, is first<BR>

implement it as "one pass plus some memory", and only inline when<BR>

the definition comes before the call. Once that is achieved, no small feat,<BR>

further considering can be made to allow for inlining when the call precedes the definition.<BR>

 <BR>

eh, sorry, I'm tried, rambling.<BR>

 <BR>

 <BR>

 - Jay<BR> <BR>> Date: Thu, 1 Apr 2010 09:28:53 -0500<BR>> From: rodney_bates@lcwb.coop<BR>> CC: m3devel@elegosoft.com<BR>> Subject: Re: [M3devel] changing m3middle/m3back to have multiple passes?<BR>> <BR>> <BR>> <BR>> Jay K wrote:<BR>> > I'm thinking m3back could use multiple passes.<BR>> <BR>> This would be a big reversal of the fundamental design philosophy of this<BR>> back end. Its primary design criterion was to be simple and fast. That<BR>> means single-pass. There have been several compilers (often whole compilers,<BR>> not just code generators) written to this principle. For some purposes,<BR>> it is a good thing.<BR>> <BR>> This, of course, inevitably means the generated code is not as small/fast<BR>> as it could be. It's part of the trade off. We have two points on this<BR>> trade off scale now. I think we want to keep the one-pass code generator.<BR>> If you change it, at least make a branch and keep the original design<BR>> easily accessible.<BR>> <BR>> <BR>> > <BR>> > <BR>> > There a few reasons, things that would be easier/possible if it had them.<BR>> > <BR>> > <BR>> > I know adding passes will slow it down, but I suspect it will still be<BR>> > about as pleasantly fast as it is now.<BR>> > <BR>> > <BR>> > The optimizations I have in mind:<BR>> > inlining, including where function definitions come after the call<BR>> > I think inlining where the "small" functions come first is doable in <BR>> > pass,<BR>> > but not if uses come before calls.<BR>> > <BR>> > <BR>> > not saving unused registers<BR>> > This only a small optimization and could probably be done in one<BR>> > pass if m3objfile got a "memmove" operation. Well, I already<BR>> > have the optimization in somewhat, but it involves nop-ing out<BR>> > code instead of removing it completely.<BR>> > <BR>> > <BR>> > dead store removal, and then removing the resulting dead code to <BR>> > compute the value<BR>> > <BR>> > <BR>> > possibly better register allocation -- e.g. based on noticing which<BR>> > variables are used often, or based on how they are used, for example<BR>> > if a variable (or expression/temporary) ends serving a shift count,<BR>> > we should favor putting it in ecx up front, instead of moving it to<BR>> > ecx later, but we don't generally know this till too late<BR>> > <BR>> > <BR>> > As I understand, m3middle already has the notion of "recording"<BR>> > and "playing back" calls to the M3CG interface.<BR>> > That seems like a good starting point?<BR>> > <BR>> > <BR>> > Some optimizations can be made via "CG to CG" transforms.<BR>> > <BR>> > <BR>> > However I think in general I think the required design would include:<BR>> > - ability to add in "private operations"; maybe<BR>> > - ability to add "private data" to existing operations<BR>> > A very simple one is annotate functions with required frame size.<BR>> > This isn't currently known until the end of the function.<BR>> > - clearly, ability to remove operations<BR>> > <BR>> > <BR>> > Can/should we somehow augment m3middle in support of this line of thinking?<BR>> > <BR>> > <BR>> > Or is that just not the way to go?<BR>> > Each backend should have its own internal representation and loop over it?<BR>> > <BR>> > - Jay<BR>> > <BR>                                      </body>

</html>