[M3devel] A rant about llvm and debug information

Mon Mar 14 19:01:19 CET 2016

My theory about an internal debugger is that it could be (though it need not
be) implemented as a source-to-source translation. The idea is to stay
within the front end and avoid these back end issues and use the compiler
infrastructure rather than export debug info. Obviously some runtime
extensions would be needed like storing declaration names and being able to
index the elements of call frames (I guess that is a large part of debug
info). This implies some (possibly serious) limitations, like having to
recompile if you want to set a breakpoint and the like, but with the
compiler's relative speed and efficiency I do not see that as a huge
problem. Obviously it's not the same as a proper interactive debugger.
There would also be some performance penalties but possibly some gains too.
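
To make the idea a little more concrete, here is a rough sketch of what the
translated output and its runtime support might look like. It is written in
Python purely for illustration rather than in Modula-3, and every name in it
(DebugRuntime, Frame, enter, leave, at, sum_to) is hypothetical; nothing here
is taken from the compiler or from m3gdb. The translator is assumed to insert
a hook before each statement of interest and to store locals by declaration
name so that call frames can be inspected.

```python
# Hypothetical runtime support that a source-to-source translation might target.
# All names are invented for this sketch.

class Frame:
    """One activation record: procedure name plus locals keyed by declaration name."""
    def __init__(self, proc):
        self.proc = proc
        self.locals = {}


class DebugRuntime:
    def __init__(self):
        self.stack = []           # live call frames, innermost last
        self.breakpoints = set()  # enabled (procedure, line) pairs
        self.log = []             # snapshots taken at enabled breakpoints

    def enter(self, proc):
        frame = Frame(proc)
        self.stack.append(frame)
        return frame

    def leave(self):
        self.stack.pop()

    def at(self, line):
        """Hook the translator would insert before a statement."""
        frame = self.stack[-1]
        if (frame.proc, line) in self.breakpoints:
            # A real tool would stop and prompt here; the sketch just records.
            self.log.append((frame.proc, line, dict(frame.locals)))


rt = DebugRuntime()


def sum_to(n):
    """What the translator might emit for a small summing procedure 'Sum'."""
    f = rt.enter("Sum")
    try:
        f.locals["n"] = n
        total = 0
        f.locals["total"] = total
        for i in range(1, n + 1):
            f.locals["i"] = i
            rt.at(3)              # hook before the loop body statement
            total += i
            f.locals["total"] = total
        rt.at(5)                  # hook before the return
        return total
    finally:
        rt.leave()
```

With ("Sum", 5) added to rt.breakpoints before the call, sum_to(4) returns 10
and rt.log holds a single snapshot with n = 4, i = 4 and total = 10. In this
sketch the hooks are compiled in everywhere and only the breakpoint set changes
at run time; a cheaper variant would emit hooks only where requested, which is
where the recompile-to-set-a-breakpoint limitation comes from.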

As you say, this would probably not be what you want, but I'm wondering how
much GDB functionality could be implemented this way and how useful it
would be. I think it would be an enduring benefit to the compiler to have
something that even approximates it, available straight out of the box.

Obviously implementation effort is a big deal as usual, but I might try
pinning down how it would be implemented in detail and if it could be done
relatively simply. One important question is: what is the most important
functionality required? Please don't say "reversing through code".

On Mon, Mar 14, 2016 at 5:45 PM, Rodney M. Bates <rodney_bates at lcwb.coop>
wrote:

>
>
> On 03/13/2016 12:00 PM, Darko Volaric wrote:
>
>> I agree that nasty bugs are still possible in M3, especially in large
>> code bases, but I was thinking that safety avoids bugs that involve nasty
>> things like memory overwrites and that safety facilitates an internal
>> debugger. I have a bit of an "all M3" fetish, but wouldn't a compiled in,
>> built in debugger be a better solution given the difficulties you described
>> as the backend changes? I'd think it could be relatively easy to write if
>> parts of the compiler can be re-used (eg parsing and evaluating
>> expressions) and leveraged (eg compiling in breakpoints and traces).
>>
>>
> I had never thought about a compiled-in debugger.  I think that distinction
> (vs. a separate executable in a separate process) sounds somewhat orthogonal
> to the list of functions I want.  Yes, M3's safety greatly reduces the need
> for the protection of a separate process.  And I *really* prefer to stay in
> Modula-3 as much as possible, but that sometimes means either massive
> reimplementation or abandonment of functions already available.
>
> For a linked-in debugger, the debug information that the compiler itself
> would need to provide would be pretty much the same info, tho' obviously
> not at all the same representation of it.
>
> I did not mean I like the gdb-derived code of m3gdb as an implementation.
> I just want better features and functions.  While I am far from satisfied
> with what we already have in m3gdb, it does do quite a lot, and it would be
> a big undertaking to duplicate even that in Modula-3.
>
> There is another issue here.  Some of the debugging features require that
> the back end provide some of the debug info, or at least alter it.  That is
> the barrier to fixing one of m3gdb's worst bugs, using m3cc.  Here, code and
> debug info split paths at the front of the back end (in fact, in our own
> code that is not part of gcc at all), and the code generation makes changes
> that require corresponding changes to debug info, which would be very hard
> to do.
>
> That is (was?) a big part of the attraction of an llvm-derived backend.  In
> their introductory documents, llvm make it sound like llvm does such
> alterations and does so easily.  But it apparently is not that easy at all.
> That is what I am now so disillusioned about with llvm.
>
> We do have one backend written in M3, but it is for only one target, now
> becoming less common, and does not do much optimization.  The C backend,
> by definition, has to trick a C compiler into generating its debug info,
> and given the differences in the languages, this is just about hopeless to
> even match current m3gdb.
>
> So that leaves two backends, written in C and C++. I hate trying to
> maintain hundreds of thousands of lines of other people's code in these
> languages, although I have to admit, C++ has somewhat softened my feelings
> about C.
>
> To provide the necessary debug info for a good debugger, compiled-in or
> stand-alone, with all coding in M3, we would have to also write a
> multi-language, optimizing back end in M3, in addition to the debugger
> itself.  These things run into the million-lines-of-code range.
>
> And this is the one reason why I also want a multi-language debugger.
> Passing parameters/results between languages is the place where there is no
> safety at all, and so an interactive debugger is most needed to diagnose
> garbled values.  With a debugger derived from gcc, we at least got its
> languages more or less for free.
>
> BTW, the current m3gdb is sufficiently old that it will not read the Dwarf
> debug info produced by either modern gcc or llvm, so it doesn't function as
> a combined M3/C++ debugger.
>
> So it looks to me like better debug info from at least one back end is by
> far the easiest way to get good function.  We would really need several
> full-time-equivalents' worth of developers working on the backend/debugger
> combination to get a really nice implementation.
>
>
>
>> Of course you most likely are happy with gdb and want to continue along
>> that path. I've wanted to add some more debugging features to the compiler
>> for some time.
>>
>> On Sun, Mar 13, 2016 at 5:29 PM, Rodney M. Bates <rodney_bates at lcwb.coop>
>> wrote:
>>
>>     I want a better m3gdb, i.e., an interactive debugger for an executing
>>     program.  Even with excellent type safety, there are plenty of
>>     algorithmic bugs that are not type violations.  Then with large sets of
>>     source code and large sets of data, a debugger is just so much faster
>>     than anything else.
>>
>>     As an example, I very recently fixed a long-standing compiler bug that
>>     Peter reported.  The compiler crashed after first correctly reporting a
>>     static error on the code being compiled.  Figuring out what was wrong
>>     and how to fix all the affected cases without breaking any others
>>     required looking at several spots in the compiler, all unfamiliar to
>>     me.  Such as it is, the current m3gdb helped immensely.  For example,
>>     in many places, I saw in the data structure only the compiler's
>>     internal integer-mapped representation of an identifier, but I could
>>     quickly see the actual identifier with:
>>
>>     m3gdb> print M3ID.ToText(436)
>>
>>     But I have many pages of todo lists of fixes and improvements to make
>>     to m3gdb.  I had been thinking this would all be so much easier with
>>     Dwarf debug info.
>>
>>     On 03/12/2016 03:02 PM, Darko Volaric wrote:
>>
>>         Rodney, can you tell me what your motivation for this sort of
>>         debugging support is? Is it for post mortem debugging,
>>         multi-language, external tool support or something else? M3 is so
>>         safe I've always envisaged an integrated (compiled-in) debugging
>>         tool which is in effect a call logger with some extras, I guess
>>         because that's my style of debugging. I'm wondering if there's
>>         another angle on what you want to achieve.
>>
>>         - Darko
>>
>>
>>         On Fri, Mar 11, 2016 at 9:57 PM, Rodney M. Bates
>>         <rodney_bates at lcwb.coop> wrote:
>>
>>              I have grown very disillusioned and discouraged about llvm.  It
>>              does not seem to have lived up to a couple of its claims that
>>              were very important to what I am trying to do.
>>
>>              The latest frustration is a recent discovery about its
>>              treatment of debug information and Dwarf.  They say its
>>              internal information is loosely based on Dwarf, and the
>>              anecdotal things I had looked at in the past suggested it was
>>              isomorphic to Dwarf, with different low-level data structure.
>>              But I now find out, llvm only handles a very severe subset of
>>              Dwarf.
>>
>>              The decisive example is the subrange node.  Dwarf3 defines 18
>>              attributes for its DIE for a subrange type.  Llvm will only
>>              handle two, the lower and upper bounds.  In particular, it will
>>              not even handle a base type.  This will make a Modula-3
>>              debugger completely useless for anything having subrange type.
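
As an illustration of why the missing base type matters, here is a small
sketch in Python (not Modula-3, and not llvm's or gdb's actual data model).
The names EnumType, SubrangeType and show are invented for this example. It
assumes a debugger that sees only the raw stored ordinal and must use the
type description to print it; in Dwarf the base type would be carried by the
subrange DIE's DW_AT_type attribute.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EnumType:
    name: str
    literals: Tuple[str, ...]


@dataclass
class SubrangeType:
    lower: int
    upper: int
    base: Optional[EnumType] = None  # None models debug info with bounds only


def show(value, ty):
    """Format a raw stored ordinal the way a debugger might print it."""
    if not ty.lower <= value <= ty.upper:
        return f"<out of range: {value}>"
    if ty.base is not None:
        return ty.base.literals[value]
    return str(value)  # without a base type, only the ordinal can be shown


color = EnumType("Color", ("Red", "Green", "Blue", "Violet"))

# Roughly the Modula-3 type [Color.Red .. Color.Green], described two ways.
warm_full = SubrangeType(0, 1, base=color)
warm_bounds_only = SubrangeType(0, 1)
```

Here show(1, warm_full) gives "Green", while show(1, warm_bounds_only) can
only give "1". The same loss would apply to subranges of characters or of any
other ordinal type, since the bounds alone do not say what the ordinals mean.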
>>
>>              I can't imagine what would have led to such a decision.  Is
>>              there any language that has subranges, but they all implicitly
>>              have the same base type?  It certainly won't support any
>>              language's subranges that I know of.  Llvm evidently doesn't
>>              have the commitment to multiple languages that Dwarf has.
>>
>>              My main motive for wanting an llvm back end for Modula-3 has
>>              always been a better Modula-3 debugger.  Dwarf is a vastly
>>              superior debug information format to the highly non-standard
>>              stabs we are now using.  I had imagined using llvm would be
>>              the way to get it.
>>
>>              At this point, it seems predictable that there is a lot more
>>              of Dwarf's very extensive multi-language support that we need,
>>              but that llvm will not pipe through, yet to be discovered.
>>              There is no question that we would have to modify llvm to get
>>              any decent debugging, even parity with current m3cc/m3gdb.  I
>>              had believed we could avoid forking and modifying llvm, but
>>              apparently not so.
>>
>>              Which leads to my second disillusionment.  Llvm is constantly
>>              undergoing very rapid change, and if you need or want to track
>>              the changes, the claims about well-documented formats and
>>              interfaces are, well, at least exaggerated.
>>
>>              Recently, in the llvm mailing list, a couple of others who
>>              maintain things outside the official llvm tree have been
>>              seconding this, saying that important APIs constantly undergo
>>              extensive revisions, with no explanation other than just
>>              revised header files one must diff.  E.g., no suggestions what
>>              removed/altered functions/parameters should be replaced by.
>>              At least these posts give some confirmation that I am not just
>>              paranoid.
>>
>>              Even if we could avoid actual forking by persuading llvm to
>>              incorporate changes for us, we would then have to constantly
>>              track the development head to get them.  This in itself would
>>              entail so much time spent adapting, that I, for one, could
>>              hardly find time for any functional progress.
>>
>>              I have been through one round of updating bindings for
>>              DIBuilder, and that was a nightmare.  Whether looking at diffs
>>              in llvm headers and adapting our existing bindings, or
>>              starting over from scratch with the revised llvm headers, it
>>              is extremely tedious and error-prone.  Moreover, since there
>>              is no inter-language type checking, many picky little errors
>>              will only show up as runtime assertion failures, segfaults,
>>              hard-to-explain behavior, etc.  And this all has to happen
>>              many times over before we could get a single debugger that
>>              would handle both languages, making diagnosis all the harder.
>>
>>              I have put a lot of work into this, and Peter obviously has
>>              put in a lot more.  But at this point, it looks to be far more
>>              productive to abandon llvm/Dwarf debugging and put the energy
>>              into improving m3gdb, using/further extending the existing
>>              stabs.
>>
>>              Or possibly modifying m3cc to produce Dwarf, but that raises a
>>              different issue.
>>
>>              --
>>              Rodney Bates
>>              rodney.m.bates at acm.org
>>              _______________________________________________
>>              M3devel mailing list
>>              M3devel at elegosoft.com
>>              https://mail.elegosoft.com/cgi-bin/mailman/listinfo/m3devel
>>
>>
>>
>>     --
>>     Rodney Bates
>>     rodney.m.bates at acm.org
>>
> --
> Rodney Bates
> rodney.m.bates at acm.org