[M3devel] A rant about llvm and debug information

Rodney M. Bates rodney_bates at lcwb.coop
Mon Mar 14 17:45:12 CET 2016



On 03/13/2016 12:00 PM, Darko Volaric wrote:
> I agree that nasty bugs are still possible in M3, especially in large code bases, but I was thinking that safety avoids bugs that involve nasty things like memory overwrites and that safety facilitates an internal debugger. I have a bit of an "all M3" fetish, but wouldn't a compiled-in, built-in debugger be a better solution given the difficulties you described with the backend changes? I'd think it could be relatively easy to write if parts of the compiler can be re-used (e.g. parsing and evaluating expressions) and leveraged (e.g. compiling in breakpoints and traces).
>

I had never thought about a compiled-in debugger.  That distinction
(vs. a separate executable in a separate process) seems somewhat orthogonal
to the list of functions I want.  Yes, M3's safety greatly reduces the need
for the protection of a separate process.  And I *really* prefer to stay in
Modula-3 as much as possible, but that sometimes means either massive
reimplementation or abandonment of functions already available.

For a linked-in debugger, the debug information that the compiler itself would
need to provide would be pretty much the same info, tho' obviously not at all
the same representation of it.

I did not mean I like the gdb-derived code of m3gdb as an implementation.  I just
want better features and functions.  While I am far from satisfied with what we
already have in m3gdb, it does do quite a lot, and it would be a big undertaking
to duplicate even that in Modula-3.

There is another issue here.  Some of the debugging features require that the
back end provide some of the debug info, or at least alter it.  That is the
barrier to fixing one of m3gdb's worst bugs, using m3cc.  Here, code and debug
info split paths at the front of the back end, (in fact, in our own code that
is not part of gcc at all) and the code generation makes changes that require
corresponding changes to debug info, which would be very hard to do.

That is (was?) a big part of the attraction of an llvm-derived backend.  In
their introductory documents, the llvm developers make it sound as though llvm
does such alterations, and does so easily.  But it apparently is not that easy
at all.  That is why I am now so disillusioned with llvm.

We do have one backend written in M3, but it supports only one target, which is
becoming less common, and it does not do much optimization.  The C backend,
by definition, has to trick a C compiler into generating its debug info, and
given the differences between the languages, it is just about hopeless for it
even to match current m3gdb.

So that leaves two backends, written in C and C++.  I hate trying to maintain
hundreds of thousands of lines of other people's code in these languages,
although I have to admit, C++ has somewhat softened my feelings about C.

To provide the necessary debug info for a good debugger, compiled-in or
stand-alone, with all coding in M3, we would have to also write a multi-language,
optimizing back end in M3, in addition to the debugger itself.  These things
run into the million-line-of-code range.

And here is one reason I also want a multi-language debugger.  Passing
parameters/results between languages is the place where there is no safety at
all, and so an interactive debugger is most needed to diagnose garbled values.
With a debugger derived from gcc, we at least got its languages more or less
for free.

BTW, the current m3gdb is sufficiently old that it will not read the Dwarf
debug info produced by either modern gcc or llvm, so it doesn't function as
a combined M3/C++ debugger.

So it looks to me like better debug info from at least one back end is by
far the easiest way to get good functionality.  We would really need several
full-time-equivalents' worth of developers working on the backend/debugger
combination to get a really nice implementation.



> Of course you most likely are happy with gdb and want to continue along that path. I've wanted to add some more debugging features to the compiler for some time.
>
> On Sun, Mar 13, 2016 at 5:29 PM, Rodney M. Bates <rodney_bates at lcwb.coop> wrote:
>
>     I want a better m3gdb, i.e., an interactive debugger for an executing program.
>     Even with excellent type safety, there are plenty of algorithmic bugs that
>     are not type violations.  Then with large sets of source code and large sets
>     of data, a debugger is just so much faster than anything else.
>
>     As an example, I very recently fixed a long-standing compiler bug that Peter
>     reported.  The compiler crashed after first correctly reporting a static
>     error on the code being compiled.  Figuring out what was wrong and how
>     to fix all the affected cases without breaking any others required looking
>     at several spots in the compiler, all unfamiliar to me.  Such as it is,
>     the current m3gdb helped immensely.  For example, in many places where
>     the data structure showed only the compiler's internal integer-mapped
>     representation of an identifier, I could quickly see the actual identifier with:
>
>     m3gdb> print M3ID.ToText(436)
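
For readers unfamiliar with the idea, here is a minimal sketch of an integer-mapped identifier table of the kind that command decodes. It is an illustration only, assuming a simple two-way dictionary; the names IdTable, add and to_text are hypothetical and are not the compiler's actual M3ID interface or implementation.

```python
class IdTable:
    """Hypothetical identifier table: each distinct name is stored once and
    referred to everywhere else by a small integer id."""

    def __init__(self) -> None:
        self._by_text: dict[str, int] = {}  # name -> id
        self._by_id: list[str] = []         # id -> name

    def add(self, name: str) -> int:
        """Return the id for name, allocating the next id on first sight."""
        if name not in self._by_text:
            self._by_text[name] = len(self._by_id)
            self._by_id.append(name)
        return self._by_text[name]

    def to_text(self, ident: int) -> str:
        """Reverse mapping: recover the name from a bare id."""
        return self._by_id[ident]


ids = IdTable()
foo = ids.add("Foo")
bar = ids.add("Bar")
assert ids.add("Foo") == foo  # adding an existing name returns its old id
print(bar, ids.to_text(bar))  # prints: 1 Bar
```

Data structures then hold only the integer, which is compact and cheap to compare, but it is opaque in a debugger unless the reverse mapping can be called interactively.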
>
>     But I have many pages of todo lists of fixes and improvements to make to
>     m3gdb.  I had been thinking this would all be so much easier with Dwarf
>     debug info.
>
>     On 03/12/2016 03:02 PM, Darko Volaric wrote:
>
>         Rodney can you tell me what your motivation for this sort of debugging support is? Is it for post mortem debugging, multi-language, external tool support or something else? M3 is so safe I've always envisaged an integrated (compiled-in) debugging tool which is in effect a call logger with some extras, I guess because that's my style of debugging. I'm wondering if there's another angle on what you want to achieve.
>
>         - Darko
>
>
>         On Fri, Mar 11, 2016 at 9:57 PM, Rodney M. Bates <rodney_bates at lcwb.coop> wrote:
>
>              I have grown very disillusioned and discouraged about llvm.  It does
>              not seem to have lived up to a couple of its claims that were very
>              important to what I am trying to do.
>
>              The latest frustration is a recent discovery about its treatment of
>              debug information and Dwarf.  They say its internal information is
>              loosely based on Dwarf, and the anecdotal things I had looked at in
>              the past suggested it was isomorphic to Dwarf, with different
>              low-level data structure.  But I now find that llvm handles only a
>              severely restricted subset of Dwarf.
>
>              The decisive example is the subrange node.  Dwarf3 defines 18
>              attributes for its DIE for a subrange type.  Llvm will only handle
>              two, the lower and upper bounds.  In particular, it will not even handle
>              a base type.  This will make a Modula-3 debugger completely useless
>              for anything having subrange type.
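
A minimal model of the consequence, written in Python rather than against any real DWARF library: a subrange entry that keeps a reference to its base type can print a value of a Modula-3 enumeration subrange such as [Color.Red .. Color.Green] by name, while an entry carrying only the two bounds can show nothing but a number. The class and function names are hypothetical; only the DWARF attribute names in the comments (DW_AT_type, DW_AT_lower_bound, DW_AT_upper_bound) are real.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EnumType:
    """Stand-in for an enumeration type entry: its name and literal names."""
    name: str
    literals: Sequence[str]


@dataclass
class SubrangeType:
    """Stand-in for a subrange type entry.  `base` plays the role of
    DWARF's DW_AT_type; `lower`/`upper` play DW_AT_lower_bound and
    DW_AT_upper_bound."""
    lower: int
    upper: int
    base: Optional[EnumType] = None


def format_value(t: SubrangeType, raw: int) -> str:
    """Render a stored ordinal the way a debugger might print it."""
    if not t.lower <= raw <= t.upper:
        return f"<out of range: {raw}>"
    if t.base is not None:
        # With a base type, the ordinal can be mapped back to a literal name.
        return f"{t.base.name}.{t.base.literals[raw]}"
    # With bounds only, a bare number is all there is to show.
    return str(raw)


color = EnumType("Color", ["Red", "Green", "Blue"])
warm = SubrangeType(0, 1, base=color)  # like TYPE Warm = [Color.Red .. Color.Green]
bounds_only = SubrangeType(0, 1)       # same bounds, base type dropped
print(format_value(warm, 1))           # prints: Color.Green
print(format_value(bounds_only, 1))    # prints: 1
```

The same loss applies to subranges of CHAR, whose values a debugger would want to show as characters rather than integers.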
>
>              I can't imagine what would have led to such a decision.  Is there any
>              language that has subranges, but they all implicitly have the same
>              base type?  It certainly won't support any language's subranges that I
>              know of.  Llvm evidently doesn't have the commitment to multiple
>              languages that Dwarf has.
>
>              My main motive for wanting an llvm back end for Modula-3 has always
>              been a better Modula-3 debugger.  Dwarf is a vastly superior debug
>              information format to the highly non-standard stabs we are now
>              using.  I had imagined using llvm would be the way to get it.
>
>              At this point, it seems predictable that there is a lot more of
>              Dwarf's very extensive multi-language support that we need but that
>              llvm will not pipe through, still waiting to be discovered.  There is no
>              question that we would have to modify llvm to get any decent
>              debugging, even parity with current m3cc/m3gdb.  I had believed we
>              could avoid forking and modifying llvm, but apparently not so.
>
>              Which leads to my second disillusionment.  Llvm is constantly
>              undergoing very rapid change, and if you need or want to track the
>              changes, the claims about well-documented formats and interfaces are,
>              well, at least exaggerated.
>
>              Recently, in the llvm mailing list, a couple of others who maintain
>              things outside the official llvm tree have been seconding this, saying
>              that important APIs constantly undergo extensive revisions, with no
>              explanation other than just revised header files one must diff.  E.g.,
>              no suggestions what removed/altered functions/parameters should be
>              replaced by.  At least these posts give some confirmation that I am
>              not just paranoid.
>
>              Even if we could avoid actual forking by persuading llvm to
>              incorporate changes for us, we would then have to constantly track the
>              development head to get them.  This in itself would entail so much
>              time spent adapting, that I, for one, could hardly find time for any
>              functional progress.
>
>              I have been through one round of updating bindings for DIBuilder, and
>              that was a nightmare.  Whether looking at diffs in llvm headers and
>              adapting our existing bindings, or starting over from scratch with the
>              revised llvm headers, it is extremely tedious and error-prone.
>              Moreover, since there is no inter-language type checking, many picky
>              little errors will only show up as runtime assertion failures,
>              segfaults, hard to explain behavior, etc.  And this all has to happen
>              many times over before we could get a single debugger that would
>              handle both languages, making diagnosis all the harder.
>
>              I have put a lot of work into this, and Peter obviously has put in a
>              lot more.  But at this point, it looks to be far more productive to
>              abandon llvm/Dwarf debugging and put the energy into improving m3gdb,
>              using/further extending the existing stabs.
>
>              Or possibly modifying m3cc to produce Dwarf, but that raises a
>              different issue.
>
>              --
>              Rodney Bates
>              rodney.m.bates at acm.org
>              _______________________________________________
>              M3devel mailing list
>              M3devel at elegosoft.com
>              https://mail.elegosoft.com/cgi-bin/mailman/listinfo/m3devel
>
>
>
>     --
>     Rodney Bates
>     rodney.m.bates at acm.org
>
>
>
>
>

-- 
Rodney Bates
rodney.m.bates at acm.org


