[M3devel] backend interface vs. types vs. forward references?

Mon Oct 4 19:23:16 CEST 2010

ps: gcc has a very large number of passes over its trees, at least when optimizing.
Like tens or 100+.
The Modula-3 frontend also makes a few passes over everything, just a few.
I don't know where the cost is, but I don't expect to add much. We'll see.
I can try to limit it to not even walk the non-type data.
I should see if the frontend reliably front-loads the type data. It seems to.
We could also put in an end-types opcode to make it easier to notice.
I think we could also address it in the frontend, by introducing
a type forward declaration call.
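
To make the early-pass idea concrete, here is a rough sketch in Python
(for brevity, not the language the backend is written in). The opcode
names and the tuple layout are made up for illustration and are not the
real m3cg opcodes. It assumes the frontend front-loads all type data
ahead of a hypothetical end-types marker.

```python
# Hypothetical opcode stream: (name, payload) tuples. The names are
# invented for illustration and are not the real m3cg opcodes.
DECLARE_TYPE = "declare_type"
END_TYPES = "end_types"


def collect_types(ops):
    """Early pass: look only at type declarations.

    Returns (types, start), where start is the index at which the later
    pass can begin. Assumes everything before END_TYPES is type data.
    """
    types = {}
    for index, (name, payload) in enumerate(ops):
        if name == END_TYPES:
            # Types are all done; the later pass can skip this prefix.
            return types, index + 1
        if name == DECLARE_TYPE:
            type_id, definition = payload
            types[type_id] = definition
        # Every other opcode is ignored by this pass.
    # No marker seen: the later pass has to start from the beginning.
    return types, 0


ops = [
    (DECLARE_TYPE, ("T2", ("pointer", "T1"))),  # forward reference to T1
    (DECLARE_TYPE, ("T1", ("integer", None))),
    (END_TYPES, None),
    ("begin_procedure", "main"),
    ("end_procedure", "main"),
]

types, start = collect_types(ops)
# By the time the later pass runs, T1 is known even though T2 used it first.
remaining = ops[start:]
```

With the marker, the early pass stops as soon as the types are done
instead of walking every opcode to the end of the file.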

> > How big are the intermediate files for all our own sources?

A few months ago I took a quick survey.
This is when I grew the buffer from whatever it was to 64K.
I couldn't justify larger because so many would fit in 64K.

I lied somewhat about working set.
If you use a small buffer and iterate in place, your working
set can only grow by the size of the buffer.
If you read the entire thing into memory and walk it linearly,
well, the operating system doesn't necessarily know you won't
walk backwards so it'll let your working set grow, only to throw
out the memory later as needed. This is my rough understanding
based on OS principles. As well, using more address space
has a little extra cost, vs. looping over a small buffer multiple times.
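
As an illustration of the small-buffer side of that tradeoff, a minimal
Python sketch follows. The function name is hypothetical, and the
in-memory stream only stands in for an intermediate file. The one
buffer is reused for every read, so the memory touched stays bounded by
the buffer size no matter how big the input is.

```python
import io

BUFFER_SIZE = 64 * 1024  # the 64K buffer size mentioned above


def walk_in_chunks(stream, buffer_size=BUFFER_SIZE):
    """Walk a binary stream through one reusable buffer.

    Returns (total_bytes, chunk_count).
    """
    buffer = bytearray(buffer_size)  # allocated once, reused every read
    total = 0
    chunks = 0
    while True:
        n = stream.readinto(buffer)  # overwrites the same memory each time
        if not n:
            break
        total += n
        chunks += 1
    return total, chunks


# Stand-in for an intermediate file of 150K.
data = io.BytesIO(b"x" * (150 * 1024))
total, chunks = walk_in_chunks(data)
```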

We might might might be able to make some optimizations though,
such as having strings be direct pointers into the buffer instead
of copying them out. There is the matter of the terminal nuls though.
And checking that they fit in the buffer.
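
A sketch of that string idea, again in Python and with a hypothetical
helper name: instead of copying a string out, hand back a view into the
buffer, after checking that the terminating nul really is inside the
buffer. The buffer layout here is invented for the example.

```python
def string_view(buffer, offset):
    """Return a zero-copy view of the nul-terminated string at offset.

    Raises ValueError if no terminating nul is found inside the buffer.
    """
    end = buffer.find(b"\0", offset)
    if end < 0:
        raise ValueError("string runs past the end of the buffer")
    # Slicing a memoryview does not copy the underlying bytes.
    return memoryview(buffer)[offset:end]


# Two complete strings, then one cut off by the end of the buffer.
buffer = b"Main\0Text\0trunc"

view = string_view(buffer, 5)
text = bytes(view)  # copy only when a real bytes object is needed
```

The truncated case is the awkward one: a string that straddles the end
of the buffer cannot be pointed at directly and would still need a copy
or a refill.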

> > Can you write out some statistics?

Yeah..
I routinely use cm3 -keep, then just ls -l target/*c
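
For something more than eyeballing a listing, a small Python sketch
along these lines could summarize the sizes. The function name and the
returned fields are hypothetical. The glob pattern is whatever matches
the kept intermediate files, and the 64K limit is the buffer size
discussed above.

```python
import glob
import os


def survey(pattern, limit=64 * 1024):
    """Summarize the sizes of the files matching a glob pattern."""
    sizes = sorted(os.path.getsize(path) for path in glob.glob(pattern))
    return {
        "files": len(sizes),
        "fit_in_limit": sum(1 for size in sizes if size <= limit),
        "largest": sizes[-1] if sizes else 0,
        "total": sum(sizes),
    }
```

Calling survey("target/*c") from a package directory would mirror the
listing above, with "target" standing for the actual target directory
name.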

 - Jay

----------------------------------------
> From: jay.krell at cornell.edu
> To: wagner at elegosoft.com; m3devel at elegosoft.com
> Date: Mon, 4 Oct 2010 17:03:08 +0000
> Subject: Re: [M3devel] backend interface vs. types vs. forward references?
>
>
> The passes I'm talking about I think will be fast.
> True the backend is very slow but I don't think this will matter.
> The earlier passes will ignore most of the data.
> The cost will only be in the extra but ignored serialization.
> And even then, it might be better -- if the ordering is a certain way
> and guaranteed, once it hits certain opcodes, it will know the types
> are all done and start over, without walking each opcode one at a time.
>
> I tried building m3cc on virtual machines with only 256MB and it failed.
> I had to go up to 384MB, if I recall correctly.
>
> Granted, we don't always build m3cc.
>
> Remember that optimized builds would often use "unit at once"
>  compilation, so the entire gcc tree would be in memory.
> Now, currently, we never do that for Modula-3, because of a bug
> where it throws out functions that are needed to be kept.
> But for C/C++ it is not unusual (again, including compiling m3cc).
> The tree representation is presumably not much different/smaller than
> the m3cg representation. For actual C/C++ there might be a bigger difference,
> what with comments/whitespace removed.
> But from the gcc point of view, Modula-3 source is already in an encoded binary form.
> Granted, the strings are duplicated.
>
> Still, the access pattern remains linear.
> So it doesn't increase working set. Just virtual address space requirements.
>
> This is something I learned recently working with large data -- linear access
> patterns are what is good and keep working set down, vs. random access.
>
> Plus, the file gets closed, which does free a few resources, though
> probably fewer than are being additionally consumed.
>
>  - Jay
>
> ----------------------------------------
> > Date: Mon, 4 Oct 2010 16:45:46 +0200
> > From: wagner at elegosoft.com
> > To: m3devel at elegosoft.com
> > Subject: Re: [M3devel] backend interface vs. types vs. forward references?
> >
> > Quoting Jay K :
> >
> > > I think I'll just solve this in the backend by making a few passes.
> > > Maybe something with specific passes where early passes only pay
> > > attention to certain opcodes, that declare types.
> >
> > I'm not really happy with multiple passes within the backend just to
> > make gcc happy. The performance of the gcc backend is already poor
> > compared to an integrated backend and to what M3 should be able to
> > achieve. How much will it cost wrt. performance?
> >
> > > The new/current "replay" stuff will maybe go away.
> >
> > Hm, I must have missed that.
> >
> > > The new/current keeping of the entire file in memory will stay
> > > unless someone has strong evidence/argument that it shouldn't.
> >
> > Keeping the whole (intermediate code) file in memory should be fine,
> > unless we get problems for large generated files on small machines
> > somewhere.
> >
> > How big are the intermediate files for all our own sources?
> > Can you write out some statistics?
> >
> > Olaf
> > --
> > Olaf Wagner -- elego Software Solutions GmbH
> > Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> > phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
> > http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
> > Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
> >
>