[M3devel] A rant about llvm and debug information

Fri Mar 11 21:57:33 CET 2016

I have grown very disillusioned and discouraged about llvm.  It does
not seem to have lived up to a couple of its claims that were very
important to what I am trying to do.

The latest frustration is a recent discovery about its treatment of
debug information and Dwarf.  They say its internal information is
loosely based on Dwarf, and the anecdotal things I had looked at in
the past suggested it was isomorphic to Dwarf, with different
low-level data structures.  But I now find out that llvm only handles
a severely restricted subset of Dwarf.

The decisive example is the subrange node.  Dwarf3 defines 18
attributes for its DIE for a subrange type.  Llvm will only handle
two, the lower and upper bounds.  In particular, it will not even
handle a base type.  This will make a Modula-3 debugger completely
useless for anything having a subrange type.
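
To make the loss concrete, here is a toy sketch in Python.  It is not
llvm or Dwarf code: SubrangeDIE, keep_bounds_only, and show are
hypothetical names invented for illustration, and base_type merely
stands in for the DW_AT_type attribute of a DW_TAG_subrange_type DIE.
It assumes a pass that keeps only the two bounds, as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubrangeDIE:
    """Toy model of a subrange DIE, reduced to three of its attributes."""
    lower_bound: int
    upper_bound: int
    base_type: Optional[str] = None  # stands in for DW_AT_type


def keep_bounds_only(die: SubrangeDIE) -> SubrangeDIE:
    """Hypothetical lossy pass: keep the two bounds, drop the base type."""
    return SubrangeDIE(die.lower_bound, die.upper_bound)


def show(value: int, die: SubrangeDIE) -> str:
    """Format a raw value using whatever type information the DIE carries."""
    if die.base_type == "CHAR":
        return repr(chr(value))
    if die.base_type == "BOOLEAN":
        return "TRUE" if value else "FALSE"
    return str(value)  # no usable base type: only the bare number is left


# Two Modula-3 subranges with identical bounds but different base types.
letters = SubrangeDIE(ord("a"), ord("z"), base_type="CHAR")  # ['a'..'z']
codes = SubrangeDIE(97, 122, base_type="INTEGER")            # [97..122]

print(show(104, letters))                    # with the base type
print(show(104, keep_bounds_only(letters)))  # bounds only
print(keep_bounds_only(letters) == keep_bounds_only(codes))
```

With the base type present, the value 104 in the CHAR subrange can be
shown as a character.  Once only the bounds survive, the two subranges
compare equal and a debugger has nothing left to print but the number.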

I can't imagine what would have led to such a decision.  Is there any
language that has subranges, but they all implicitly have the same
base type?  It certainly won't support any language's subranges that I
know of.  Llvm evidently doesn't have the commitment to multiple
languages that Dwarf has.

My main motive for wanting an llvm back end for Modula-3 has always
been a better Modula-3 debugger.  Dwarf is a vastly superior debug
information format to the highly non-standard stabs we are now
using.  I had imagined using llvm would be the way to get it.

At this point, it seems predictable that there is a lot more of
Dwarf's very extensive multi-language support that we need, but that
llvm will not pipe through, yet to be discovered.  There is no
question that we would have to modify llvm to get any decent
debugging, even parity with current m3cc/m3gdb.  I had believed we
could avoid forking and modifying llvm, but apparently not so.

Which leads to my second disillusionment.  Llvm is constantly
undergoing very rapid change, and if you need or want to track the
changes, the claims about well-documented formats and interfaces are,
well, at least exaggerated.

Recently, in the llvm mailing list, a couple of others who maintain
things outside the official llvm tree have been seconding this, saying
that important APIs constantly undergo extensive revisions, with no
explanation other than just revised header files one must diff.  E.g.,
no suggestions as to what should replace removed or altered functions
and parameters.  At least these posts give some confirmation that I am
not just paranoid.

Even if we could avoid actual forking by persuading llvm to
incorporate changes for us, we would then have to constantly track the
development head to get them.  This in itself would entail so much
time spent adapting, that I, for one, could hardly find time for any
functional progress.

I have been through one round of updating bindings for DIBuilder, and
that was a nightmare.  Whether looking at diffs in llvm headers and
adapting our existing bindings, or starting over from scratch with the
revised llvm headers, it is extremely tedious and error-prone.
Moreover, since there is no inter-language type checking, many picky
little errors will only show up as runtime assertion failures,
segfaults, hard-to-explain behavior, etc.  And this all has to happen
many times over before we could get a single debugger that would
handle both languages, making diagnosis all the harder.

I have put a lot of work into this, and Peter obviously has put in a
lot more.  But at this point, it looks to be far more productive to
abandon llvm/Dwarf debugging and put the energy into improving m3gdb,
using/further extending the existing stabs.

Or possibly modifying m3cc to produce Dwarf, but that raises a
different issue.

-- 
Rodney Bates
rodney.m.bates at acm.org