<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Verdana
}
--></style>
</head>
<body class='hmmessage'>
Simple and fast does not entirely preclude multiple passes, and one pass does not equate to simple, though yes probably fast.<BR>
The front end already makes multiple passes I believe, and is still fast.<BR>
<BR>
<BR>
Sometimes decomposition into more smaller pieces (passes, layers, etc.) make things simpler.<BR>
Albeit slower.<BR>
<BR>
<BR>
A more historical factor here is that multiple passes imply keeping more data in memory longer.<BR>
Implies slower.<BR>
However memory capacity has increased tremendously.<BR>
<BR>
<BR>
Whereas I'm only thinking about multiple passes, while still compiling one file at a time,<BR>
other compilation systems have capitalized on larger memories and now compile entire<BR>
programs at once, including multiple passes.<BR>
This goes by multiple names -- "whole program optimization", "link time codegen (LTCG)".<BR>
And enables even more aggressive but still "fairly simple" optimization, such as cross<BR>
module inlining, custom calling conventions. I say "fairly simple", because these<BR>
are just the same optimizations one would perform within a module, merely applied<BR>
across module boundaries.<BR>
<BR>
<BR>
(Sometimes what people do is #include all .c files into one master .c file, to help<BR>
get the compiler to see everything at once. sqllite does this, for example.)<BR>
<BR>
<BR>
Surely we can afford multiple passes on a per-file basis.<BR>
?<BR>
<BR>
<BR>
The code is already not that easy to understand imho.<BR>
It took me quite some time to get comfortable and add the 64 bit integer support, years.<BR>
And there's a few things I still don't understand (you can tell from my comments<BR>
in the code, like where do I need to call "newdest")?<BR>
<BR>
<BR>
Embrace it and make it incrementally harder to understand (by virtue of simply adding to it),<BR>
or decide it is stuck asis?<BR>
<BR>
<BR>
I'm not entirely sure.<BR>
<BR>
<BR>
While it is an interesting compromise in its current form, I'm fairly certain<BR>
there are other viable compromises that achieve better code quality, are<BR>
still plenty fast, and aren't much more difficult to understand.<BR>
<BR>
<BR>
Heck, I found it confusing at first that the -debug output is wrong in some places.<BR>
That is a fallout from it being single pass "plus patching".<BR>
The debug output doesn't include the "patches".<BR>
<BR>
<BR>
I have some very simple small optimizations in mind.<BR>
<BR>
<BR>
And then, in stepping through code, and writing code, inlining constantly comes to mind.<BR>
In fact, I frequently see "manual partial inlining" in the Modula-3 code.<BR>
It'd be nifty if the Modula-3 code could have less duplication, relying on the compiler more.<BR>
<BR>
<BR>
I don't see inlining as entirely simple, but I've given it a little thought and it might not<BR>
be so hard. I think it can be done, roughly, by taking the body of a procedure <BR>
as expressed as cg.i3 calls, omitting the begin/procedure, and just "playing them back"<BR>
at the call site. You'd only do this for "small" functions. You'd have to make some<BR>
adjustments like merging local variables and exception handling, though really<BR>
the inlined function would just be a "block".<BR>
<BR>
<BR>
Probably the way to go here, if inlining is really to be considered, is first<BR>
implement it as "one pass plus some memory", and only inline when<BR>
the definition comes before the call. Once that is achieved, no small feat,<BR>
further considering can be made to allow for inlining when the call precedes the definition.<BR>
<BR>
eh, sorry, I'm tried, rambling.<BR>
<BR>
<BR>
- Jay<BR> <BR>> Date: Thu, 1 Apr 2010 09:28:53 -0500<BR>> From: rodney_bates@lcwb.coop<BR>> CC: m3devel@elegosoft.com<BR>> Subject: Re: [M3devel] changing m3middle/m3back to have multiple passes?<BR>> <BR>> <BR>> <BR>> Jay K wrote:<BR>> > I'm thinking m3back could use multiple passes.<BR>> <BR>> This would be a big reversal of the fundamental design philosophy of this<BR>> back end. Its primary design criterion was to be simple and fast. That<BR>> means single-pass. There have been several compilers (often whole compilers,<BR>> not just code generators) written to this principle. For some purposes,<BR>> it is a good thing.<BR>> <BR>> This, of course, inevitably means the generated code is not as small/fast<BR>> as it could be. It's part of the trade off. We have two points on this<BR>> trade off scale now. I think we want to keep the one-pass code generator.<BR>> If you change it, at least make a branch and keep the original design<BR>> easily accessible.<BR>> <BR>> <BR>> > <BR>> > <BR>> > There a few reasons, things that would be easier/possible if it had them.<BR>> > <BR>> > <BR>> > I know adding passes will slow it down, but I suspect it will still be<BR>> > about as pleasantly fast as it is now.<BR>> > <BR>> > <BR>> > The optimizations I have in mind:<BR>> > inlining, including where function definitions come after the call<BR>> > I think inlining where the "small" functions come first is doable in <BR>> > pass,<BR>> > but not if uses come before calls.<BR>> > <BR>> > <BR>> > not saving unused registers<BR>> > This only a small optimization and could probably be done in one<BR>> > pass if m3objfile got a "memmove" operation. Well, I already<BR>> > have the optimization in somewhat, but it involves nop-ing out<BR>> > code instead of removing it completely.<BR>> > <BR>> > <BR>> > dead store removal, and then removing the resulting dead code to <BR>> > compute the value<BR>> > <BR>> > <BR>> > possibly better register allocation -- e.g. based on noticing which<BR>> > variables are used often, or based on how they are used, for example<BR>> > if a variable (or expression/temporary) ends serving a shift count,<BR>> > we should favor putting it in ecx up front, instead of moving it to<BR>> > ecx later, but we don't generally know this till too late<BR>> > <BR>> > <BR>> > As I understand, m3middle already has the notion of "recording"<BR>> > and "playing back" calls to the M3CG interface.<BR>> > That seems like a good starting point?<BR>> > <BR>> > <BR>> > Some optimizations can be made via "CG to CG" transforms.<BR>> > <BR>> > <BR>> > However I think in general I think the required design would include:<BR>> > - ability to add in "private operations"; maybe<BR>> > - ability to add "private data" to existing operations<BR>> > A very simple one is annotate functions with required frame size.<BR>> > This isn't currently known until the end of the function.<BR>> > - clearly, ability to remove operations<BR>> > <BR>> > <BR>> > Can/should we somehow augment m3middle in support of this line of thinking?<BR>> > <BR>> > <BR>> > Or is that just not the way to go?<BR>> > Each backend should have its own internal representation and loop over it?<BR>> > <BR>> > - Jay<BR>> > <BR> </body>
</html>