[M3devel] cm3 llvm backend?

Rodney M. Bates rodney_bates at lcwb.coop
Tue Jun 28 02:29:47 CEST 2016

On 06/27/2016 03:28 PM, Jay K wrote:
> Regarding debug info.
> I have thought about this.
> Various optimizing compilers make some effort to produce
> some debug info.
> And to the extent that they don't optimize, they should do that.
> However, I believe that in general, reconciling optimization with
> describability in any debug format is not a solvable problem.
> There are many points in a program where things aren't particularly
> transformed/lost by the optimizer.
> Things tend to be clearly defined at the start of a function.
> Variable locations can be described over ranges of code as either
> being in a particular register, or register + offset (frame), or nowhere.
> But line numbers, as critical as they are, I think are the most
> untenable aspect. Compilers can move code around arbitrarily, including removing it.
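
To make the point about variable locations concrete, here is a minimal
sketch in Python of a per-variable table of code ranges, each mapped to a
register or a register plus offset, with "nowhere" meaning no range covers
the address. The names (Location, RangeEntry, lookup), the register names,
and the addresses are all hypothetical and chosen only for illustration; this
is not how m3llvm, llvm, or any particular debug format encodes the data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Location:
    """Where a variable lives: a register, or a register plus a byte offset."""
    register: str
    offset: Optional[int] = None  # None means the value is in the register itself


@dataclass(frozen=True)
class RangeEntry:
    """Half-open code address range [low_pc, high_pc) with one location."""
    low_pc: int
    high_pc: int
    location: Location


def lookup(entries: List[RangeEntry], pc: int) -> Optional[Location]:
    """Return the variable's location at pc, or None if it is nowhere."""
    for entry in entries:
        if entry.low_pc <= pc < entry.high_pc:
            return entry.location
    return None


# Hypothetical variable: on the frame at function entry, then in a register,
# then optimized away (no entry covers addresses from 0x1040 on).
x_locations = [
    RangeEntry(0x1000, 0x1010, Location("rbp", -8)),
    RangeEntry(0x1010, 0x1040, Location("rax")),
]
```

A debugger stopped at some address would call lookup(x_locations, pc) and
report the variable as unavailable when the result is None.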

Yes, this is a bundle of hard problems.  llvm's official goal is that, in the
face of optimizations, it should still be possible for a debugger to observe
all program state of the original program, but not necessarily alter any state.
Without optimizations, it should also be possible to alter any state.
How well they are achieving this, I don't know, but I would guess it comes closer
than gcc, especially an old gcc.

> The C backend also has good debugging as a goal.
> We should be able to have structs with fields, instead of just byte arrays.
> And all the parameters/locals should have good names.
>    This I messed up slightly and will try to fix. Every identifier
>    gets an appended number for uniqueness, which is sometimes but rarely needed.
> Does the current LLVM backend produce any debug info or none?

It has lots of code for producing debug info, probably as much code as
for producing compilable output, maybe slightly more.  I think it is
pretty complete.  But it has not had a lot of testing, and there are
known cases where it crashes.  The tests expose several of these.
I was working on this when I hit the problem I have complained about.
If you don't ask for debug output, these don't happen.  If I could
decide what to do when llvm's debug info is inadequate, I would work
on fixing them--probably not too big a job.

> I would probably avoid any approach that requires writing bindings.
> Unless it is to our own code, like I did for Posix.
> Otherwise keeping things up to date is tedious and error-prone, and errors are fatal.
> Bindings to our own code are just slightly better.
> Maybe I'll write bitcode. Or go through the persisted cm3cg IR that m3cc uses, but it sounds
> like you are super far along and we should build on that.
> I have other things to do first though and I can't promise that I won't reinvent eventually.

At this point, I think this might be nice to have.  The existing m3llvm would be a
good starting point.  I think mostly, it would just need the calls on llvm APIs
replaced by bitcode-emitting calls, plus the low-level things for them to call.
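
As a rough idea of what the "low-level things" might look like, here is a
minimal sketch in Python (not Modula-3, and not taken from m3llvm) of a bit
writer with the two primitive field kinds the llvm bitstream format uses:
fixed-width fields and variable-bit-rate (VBR) fields, both written least
significant bit first. The class and method names (BitWriter, emit_fixed,
emit_vbr) are hypothetical. It covers only the primitive encoding; blocks,
abbreviations, and records, which a real bitcode emitter would need, are
left out.

```python
class BitWriter:
    """Accumulates bits least-significant-bit first into a byte buffer."""

    def __init__(self) -> None:
        self._acc = 0        # pending bits; the lowest bit is the oldest
        self._nbits = 0      # number of pending bits held in _acc
        self._out = bytearray()

    def emit_fixed(self, value: int, width: int) -> None:
        """Append value as an unsigned field of exactly width bits."""
        if value < 0 or value >> width:
            raise ValueError("value does not fit in the given width")
        self._acc |= value << self._nbits
        self._nbits += width
        while self._nbits >= 8:          # flush every completed byte
            self._out.append(self._acc & 0xFF)
            self._acc >>= 8
            self._nbits -= 8

    def emit_vbr(self, value: int, width: int) -> None:
        """Append value in chunks of width bits; a set top bit means more follow."""
        if width < 2:
            raise ValueError("VBR chunks need at least 2 bits")
        payload_bits = width - 1
        continue_bit = 1 << payload_bits
        while value >= continue_bit:
            self.emit_fixed((value & (continue_bit - 1)) | continue_bit, width)
            value >>= payload_bits
        self.emit_fixed(value, width)

    def to_bytes(self) -> bytes:
        """Return the bytes written so far, zero-padding a partial last byte."""
        tail = bytes([self._acc & 0xFF]) if self._nbits else b""
        return bytes(self._out) + tail


# The bitcode magic number is the four bytes 'B', 'C', 0xC0, 0xDE.
writer = BitWriter()
for byte in (0x42, 0x43, 0xC0, 0xDE):
    writer.emit_fixed(byte, 8)
```

For example, emitting 27 as a 4-bit VBR value produces the chunks 1011 and
0011, which pack into the single byte 0x3B.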

>   - Jay
> ----------------------------------------
>> Date: Mon, 27 Jun 2016 15:19:30 -0500
>> From: rodney_bates at lcwb.coop
>> To: m3devel at elegosoft.com
>> Subject: Re: [M3devel] cm3 llvm backend?
>> On 06/27/2016 01:31 AM, Jay K wrote:
>>> redirecting...
>>>> Olaf
>>>> LLVM didn't seem to satisfy M3's needs, however, if I understood Rodney's laments correctly
>> My big lament about llvm regards producing Dwarf debug info. Having
>> debug info in this vastly superior format is one of my main reasons
>> for wanting an llvm back end. From the llvm documentation, it sounded
>> like it would do this, including altering debug info as needed to
>> match any optimizations done. Then I hit a brick wall, finding out
>> llvm only handles a severe subset of Dwarf, apparently what they felt
>> was needed for C & C++. More below.
>>> I would like to take this up, maybe soon.
>>> I do have a bit of an agenda.
>>> Maybe my priorities are mixed up.
>>> 1 Provide a very portable system.
>>> 2 Provide an easy to install and use system.
>>> 3 Switch from gcc backend to LLVM backend, at least optionally (i.e. at least
>>> for their supported backends).
>>> 4 Maybe write our own backends.
>>> Where is the LLVM support at?
>>> Mostly working? Barely working?
>> I could swear I remember getting the llvm back end to recompile
>> everything in the m3front group twice and converge to identical-sized
>> compiled code, and that I said so in either a commit message or
>> an m3devel list post. But I couldn't find any such statement.
>> This is on AMD64_LINUX, and if you are careful not to ask m3llvm
>> for debug info at all. Otherwise, there are crashes in m3llvm
>> and/or llc, trying to provide Dwarf. This is what I was working
>> on when I made the disappointing discoveries. If my memory is
>> correct here, this means the llvm back end is close to usable
>> for building.
>> Also, again from memory, I think the llvm back end has no failures
>> of test cases that do not also fail using m3cg.
>> There are still other questions, though. So far, there are no
>> changes required to the stock llvm version being used (3.6.1).
>> But it still requires a build of that on the machine where the
>> M3 compilation is done, and it is big and slow to build. Lots
>> of llvm libraries have to be linked into m3llvm, and the separate
>> executable llc needs to be available.
>> The llvm folk are constantly making major API changes, with no
>> explanation other than diffing successive versions of header
>> files, so there is no practical chance of using whatever llvm
>> version may be already installed on a particular host machine.
>> In light of that, I suppose we could fork and modify a specific
>> version of llvm and put its source code in our own repository,
>> similar to m3cg. But maintenance work on llvm is, to my
>> standards, a real nightmare. Just about every single identifier
>> and operator you see involves a needle-in-haystack search to
>> locate its declaration, which could be just about anywhere,
>> in order to know what it is.
>> And no, the names and operator spellings are not close to adequate
>> to clue you in. They have gone to every length possible to use
>> every clever new C++ "feature" that comes out in the latest
>> C++-<n> standard, which always just increases the complexity
>> of the search to a declaration. So I don't fancy doing any of
>> this. (BTW, <n>=17 in recent discussions.)
>> As for the debug info problem, there has been talk in the llvm
>> list of a rework that allows a front end to directly produce
>> true Dwarf (presumably not a subset) for type info only, which
>> would not be changed by optimizations and so llvm would not
>> need to pass it through in its own, different, representation.
>> I don't know how far this has gone.
>> If/when it happens, utilizing it would entail constructing new
>> bindings to altered llvm APIs. This in itself is an extremely
>> tedious and error-prone task that I hoped, after doing it for
>> llvm 3.6.1, never to repeat. Moreover, at the C/M3 interface,
>> there is no type checking possible, so nitty little errors will
>> only show up as hard-to-diagnose runtime errors. That is made
>> even worse by the lack of a debugger that handles both languages.
>> Or, we could use Dragisha's rtinfo library to get type info for
>> a debugger from the .M3WEB files, and Dwarf for the rest. Off
>> hand, this sounds like the best approach to me. Stick with
>> llvm 3.6.1 as long as possible, and avoid more pain.
>>> I know LLVM is big and changing, and maybe they don't value
>>> compatibility of bitcode.
>> Actually, keeping their bitcode stable across llvm releases is
>> one place they do talk about compatibility. But m3llvm uses calls
>> to llvm APIs to construct llvm IR as in-memory data, then another
>> call to get llvm to convert it to bitcode. So bitcode's stability
>> is irrelevant to us. I once thought about producing llvm bitcode
>> directly, but that seems like a pretty big job. It would, however,
>> obviate creating most of those wretched bindings.
>>> But look at what we have with the gcc backend.
>>> Even if we didn't have to patch it at all, I expect
>>> we'd still have to keep and build a local copy.
>>> Perhaps we should just do that?
>>> With LLVM, with its different licensing, perhaps we could
>>> get our "frontend" merged upstream, but this would
>>> then give us a compatibility burden in the persisted m3cg.
>>> Is that ok?
>>> It is hypothetical at this point.
>>> I know everyone here doesn't really like C/C++ (except me).
>>> And, more significantly, I know the system written in itself
>>> is a great test case, but I wonder if we shouldn't write a new
>>> "real" frontend in C or C++, and see if we can't merge that
>>> upstream with gcc and/or clang.
>>> It is also worth mentioning that I believe gcc's Ada front end
>>> is written in Ada -- you don't actually have to write
>>> the frontend in C/C++ to merge upstream.
>> Yes, I once worked on this front end. As I understood, SRC M3
>> was not merged into gcc only because RMS was annoyed that SRC made
>> it a separate executable, so its license was not poisoned by GPL.
>>> But there might remain licensing concern.
>>> - Jay
>>> _______________________________________________
>>> M3devel mailing list
>>> M3devel at elegosoft.com
>>> https://m3lists.elegosoft.com/mailman/listinfo/m3devel
>> --
>> Rodney Bates
>> rodney.m.bates at acm.org

Rodney Bates
rodney.m.bates at acm.org
