[M3devel] additional CVS repositories for additional gcc forks?

Jay K jay.krell at cornell.edu
Sun Aug 29 02:21:31 CEST 2010


P.S. We don't even have m3gdb for all systems, e.g. Darwin, and the Windows debuggers are much better than anything I've seen on Unix. On these systems intermediate C would improve debugging a lot, though I've also been improving Darwin gdb.

Also you seem to confuse C++ name mangling with what Modula-3 does. They are quite different. C++ only mangles things with linkage, for linking reasons, not for debugging information. Locals, parameters, record fields: no mangling. C++ code analogous to what Modula-3 allows would survive with everything being extern "C", no name mangling.

In both cases, as I understand it, mangling is an effective hack to tunnel information through systems not quite designed or extended to suit.

What we have is flawed. What I favor is flawed. But differently.

 - Jay/phone

> Date: Sat, 28 Aug 2010 14:15:02 -0500
> From: rodney_bates at lcwb.coop
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] additional CVS repositories for additional gcc forks?
> 
> 
> 
> Jay K wrote:
> > 
> >> There is no way a debugger that has no Modula-3 awareness is going to provide
> >> a Modula-3-like view. The operators will have C spellings and C semantics,
> > 
> > 
> > How many operators do people use in a debugger?
> > 
> > I use very few. Partly because for a long time I used a debugger
> > with a great gui and an awful expression evaluator.
> > 
> > 
> > Still, I use basically only "+" "->" "*" (dereference) and "=" for assignment.
> >  Sometimes multiplication and subtraction.
> > I agree it would be nice if all the C debuggers would be lenient about "->" vs. ".".
> >   That would unify Modula-3, Java, C#, C, C++.
> >    Except where C++ has an operator-> overload. But operator overload
> >    is an area where.. tangent... C++ is a great language..my compiler implements
> >    it well..but my debugger, my editor, plain text search.. can't cope with it.
> >    Modula-3, C#, Java run afoul of plain text search too -- anything with prevalent "scoped names".
> >    In C you get Window_Init, File_Open, etc. never just Init or Open.
> >    How do you search for calls to operator+ in C++? For a certain type?
> >    In C, except for the builtin types, they'd be unique function names.
> >   Anyway, tangent over.
> > 
> > 
> > + is the same in the various languages.
> > 
> > I think "=", ":=", "==" are the main problem.
> > You might try a compare and accidentally do an assignment.
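
To make the "=" hazard concrete, here is a minimal C sketch. The function names are made up for illustration. A user thinking in Modula-3 types "x = 5" meaning a comparison, and a C expression evaluator treats it as an assignment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What a Modula-3 user means by "x = 5": a pure comparison. */
static bool real_compare(const int *x)
{
    assert(x != NULL);
    return *x == 5;
}

/* What a C evaluator does with "x = 5": it overwrites the variable,
   then yields the new value, which is nonzero and so reads as "true". */
static bool typo_compare(int *x)
{
    assert(x != NULL);
    return (*x = 5) != 0;
}
```

After the second form the debuggee's state has changed, which is the accident described above.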
> > 
> > 
> >> The syntax will be strictly C.
> > 
> > Almost the same.
> > 
> > 
> >  > The display of values will be C.
> > 
> > Almost the same.
> 
> On the strength of your comments, I rest my case.
> 
> > 
> > Also if you have a particularly good C compiler/debugger, we could do
> >   #define AND &&  
> >   #define OR ||  
> > 
> > 
> > getting you back those two operators, which I rarely use in a debugger.
> > 
> > 
> >  > TEXT won't work in any reasonable way at all.
> > 
> > Sure it might.
> > In Visual Studio you can write little addins to help the debugger display stuff.
> > I believe there is a small builtin "language" or I believe you can write actual code.
> > In Windbg you can write little plugins. You could provide like !m3.text.
> > I don't know if you can tell the debugger ahead of time how to custom display types.
> > I don't know if gdb has a story here.
> > Still, one might imagine a *small* patch to gdb.
> 
> All of which is just different ways of providing a debugger with proper Modula-3 support.
> 
> > 
> > 
> >  > Demangling names in the compiler's debug output would make them look nice, but then the Modula-3
> >  > type info would be lost, and output formats would lose.
> > 
> > 
> > Um, you think maybe this stuff was done the wrong way in the first place?
> >  That the names shouldn't be mangled in the first place?
> >   I strongly suspect so. Other systems don't depend on this.
> >    (Yes, I know about C++ name mangling, and even though it does something similar,
> >    that's a trick for the linker and not how debug information works. It is there for
> >    use in the absence of debug information, among other reasons.)
> > 
> 
> I think it could be done a lot better if switching to a better debug info format.
> stabs may have been the best option around at the time it was done.  And I don't
> know if Modula-3's structural type equivalence rules could be supported any better
> without the uid's.
> 
> But there will still have to be name mangling to get a standard linker to work,
> with just about any language.  Stock gdb does what it does in part because it
> has builtin demanglers for the various languages it supports.  It chooses the
> appropriate demangler dynamically.  It would be a pretty ugly user interface if
> it didn't.
> 
> > There is "naturally" type information you get just by building up decent gcc trees.
> > Ditto for intermediate C code.
> > For a while you know, every record is a void* or just has a size, and all the type information
> > is buried in the names. This is questionable. I'm sure it has some advantages.
> > You can describe things maybe not easily described in C.
> >   e.g. Subranges?
> > And then our code in m3gdb is probably very portable, in that, I think, we just ferry along
> > some strings, from our code to our code, and we can decipher them the same in all systems.
> > I think, I'm not sure, there is like no dependence on the vagaries of coff, dwarf, etc., and
> > what they can or cannot represent. However there is a dependency on stabs being available.
> > It is not for example available on HP-UX.
> > 
> > 
> > Furthermore the lack of correct type information, apart from stabs, causes problems.
> > For some targets the backend wants accurate type information to pass records by value.
> > I again/still think we should probably not rely on the backend for this anyway.
> > We should probably make a copy and pass a pointer to it, kind of like m3x86 does.
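
A rough C sketch of the "make a copy and pass a pointer" idea follows. The record type and function names are hypothetical, and this is not claimed to be what m3x86 actually emits. It only shows that the callee can be handed a plain pointer while the caller keeps by-value semantics:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type standing in for a Modula-3 RECORD. */
typedef struct { int x, y; double weight; } Rec;

/* Callee as the back end would see it: just a pointer parameter.
   It owns the copy, so it may modify it freely. */
static int callee(Rec *r)
{
    r->x += 100;
    return r->x + r->y;
}

/* Call site for a VALUE-mode record parameter: copy first,
   then pass the address of the copy. */
static int call_by_value(const Rec *actual)
{
    assert(actual != NULL);
    Rec tmp = *actual;      /* the copy made by the front end */
    return callee(&tmp);    /* back end never passes a record by value */
}
```

The back end then needs only the size of the record for the copy, not an accurate description of its fields.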
> > 
> 
> Certainly, a back end needs type information for code generation.  That won't do much to
> help a debugger that is oblivious to the source language.
> 
> > 
> >  > Things that use pointers at the machine level can never know whether the pointers
> >  > point to a single value or an array, and if the latter, with what bounds.
> > 
> > 
> > C programmers can cope with that. Can't we?
> 
> Look at the security advisories.  Buffer overflow, buffer overrun, buffer overflow, ...
> over and over.  Almost all of them are buffer overruns.  But that's a tangent too.
> 
> > And..I admit.. I don't know what our machine level mapping looks like.
> > Do we pass a pointer and a size as two parameters? Or a small record with pointer/size by value?
> > 
> 
> If it's a fixed array, it's all in the static type, which the stabs (as extended for Modula-3)
> info conveys.  If it's an open array, there is runtime dope: a pointer to the zero-th array
> element, followed by a shape, which is just a list of words giving the NUMBER of each dimension.
> The dimension count is statically known.  The dope is generally passed by reference, although it is never
> altered.  For heap-allocated arrays, it's located right in the heap object, at the beginning,
> making the pointer redundant.  For formal parameters, if the actual doesn't already have the
> needed dope, it is constructed at runtime by code at the call site.  This works for, e.g.,
> passing a fixed array or a SUBARRAY to a formal that is open.
> 
> BTW, this is another thing a debugger has to know about to either pass, display, or alter
> open array values.  Perhaps Dwarf is sophisticated enough that it could just be encoded in
> Dwarf, but certainly not in stabs.  (Well, we could probably cobble up yet another stabs
> extension, but that would still require specialized debugger support.)
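
The dope layout described above might look roughly like this in C for a one-dimensional open array of integers. The type and function names are invented for illustration, and the actual field layout in the compiler may differ:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dope for a 1-D open array of int: a pointer to element
   zero, followed by the shape (one NUMBER per dimension). The dimension
   count is static, so it is not stored at runtime. */
typedef struct {
    int    *elts;      /* address of element 0 */
    size_t  shape[1];  /* NUMBER(a) for each dimension */
} OpenArrayDope1;

/* Callee: receives the dope by reference and never alters it. */
static long sum_open(const OpenArrayDope1 *a)
{
    long total = 0;
    for (size_t i = 0; i < a->shape[0]; i++)
        total += a->elts[i];
    return total;
}

/* A bounds-checked read, which the shape word makes possible. */
static int get_open(const OpenArrayDope1 *a, size_t i)
{
    assert(i < a->shape[0]);
    return a->elts[i];
}

/* Call site: construct dope for SUBARRAY(fixed, from, count), where
   the fixed array itself carries no runtime dope. */
static OpenArrayDope1 make_subarray(int *fixed, size_t fixed_len,
                                    size_t from, size_t count)
{
    assert(from <= fixed_len && count <= fixed_len - from);
    OpenArrayDope1 d = { fixed + from, { count } };
    return d;
}
```

A debugger that sees only the `elts` pointer has no way to know the bounds. It has to know that the shape word exists and where to find it.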
> 
> > 
> > The debugger need not be a full blown Modula-3 interpreter.
> > 
> > 
> >> Probably the worst thing will be calls. They just don't work without the debugger
> >> having knowledge of a lot of stuff. There are extra hidden parameters, method
> >> calls, passing procedure-typed parameters with environments, calling the same,
> >> the three modes of Modula-3, etc. I consider calls in debugger commands very
> >> valuable.
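
As an illustration of the "hidden parameters" and "procedure-typed parameters with environments" mentioned above, here is a generic C sketch of a nested procedure lowered with an explicit static link. All names are hypothetical, and this is not the representation the Modula-3 compiler actually uses:

```c
#include <assert.h>
#include <stddef.h>

/* Frame of a hypothetical outer procedure: the variables that the
   nested procedure refers to. */
typedef struct { int base; } OuterFrame;

/* Nested procedure after lowering: the static link arrives as an
   extra first parameter that the source code never shows. */
static int inner(OuterFrame *static_link, int n)
{
    return static_link->base + n;
}

/* A procedure value with its environment: code pointer plus static link. */
typedef struct {
    int (*code)(OuterFrame *, int);
    OuterFrame *env;
} Closure;

/* Takes a procedure-typed parameter and calls it, supplying the
   environment behind the scenes. */
static int apply(Closure p, int n)
{
    assert(p.code != NULL);
    return p.code(p.env, n);
}

/* Outer procedure passing its nested procedure as a parameter. */
static int outer(int base, int n)
{
    OuterFrame frame = { base };
    Closure c = { inner, &frame };
    return apply(c, n);
}
```

To call `inner` by hand, a debugger user would have to locate the right frame and supply it as the first argument.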
> > 
> > 
> > I use calls very rarely.
> > I'm not super keen on running some of my code when otherwise my code is all frozen
> > and some of it is misbehaving. I know this is partly me.
> > 
> > 
> > Even so, generally you only call certain functions that are put there for use from a debugger, right?
> > Like gcc's debug_node or such?
> > And they tend to not be fancy?
> 
> I regularly type a debugger call to re-execute something that I just stepped over, not knowing whether
> the problem I am looking for would occur inside the procedure or not.  Reverse debugging in the newest
> gdb could provide an alternative, but I am hearing that the necessary recording costs ~ n*10 slowdown.
> 
> I also do it as an easy way to test some parameter combination.  Kind of like having an interpreter for
> the language.  And I do it with fancy procedures that format some elaborate data structure in a readable,
> high-level way, which is, I think, what you meant.  Sometimes a *lot* of effort.
> 
> 
> > 
> > And the extra parameters..debugger would complain about missing them, programmer would figure it out?
> 
> Before I got call support working, I found there were many calls I could not make work at all.  Either
> I couldn't figure out what was needed (This amounts, in part, to manually reading stabs), or there
> was no way to supply what was needed in a debugger command.  Also, on method calls, there was no way
> to figure out where it would dispatch to, even when you could locate all the possible overrides in all
> the subtypes, in all the source files of the closure.  Language-specific support can take care of this,
> when done completely.  m3gdb is a long way from there now, but it helps a lot.
> 
> > 
> > 
> > I'm not saying there aren't drawbacks here.
> > But there are also major advantages.
> > There are major costs and drawbacks to our current approach.
> >   We have a ton of extra code.
> >   Which I don't think we are well equipped for.
> >   Maybe Tony is. Maybe someone else is. I'm not.
> >   
> > 
> > Partly, I'll admit, anything I write, I am much more able to maintain.
> > Or, another lazy angle, anything smaller is easier to maintain.
> > 
> > 
> > In gcc we have a large code base. It takes me a long time to get just slightly up to speed on it.
> > We have several nagging problems with it. Maybe I just need to look at the C front end more.
> > Or read tree.h. I don't know.
> > 
> > 
> >  4.5.1 doesn't work with SPARC32_SOLARIS/SOLgnu/SOLsun. 
> >  4.3.5 maybe not either. 
> >  A few optimizations I have turned off for 4.5.1 because they cause problems. Including inlining. 
> >  Maybe I just need to debug more. 
> >  Apple and OpenBSD each maintain their own forks. So that, *sort of but not really*, triples things.
> >    (now, they are all highly related, so it doesn't) So far we don't have the OpenBSD fork.
> >   But for example 4.5.1 doesn't have the OpenBSD/powerpc stuff quite. And there is a small OpenBSD/mips64
> >    problem I worked around. Minor, I guess. We could just drop these platforms, or OpenBSD entirely.
> >   Not a huge deal. But it is yet more stuff that C as an intermediate platform solves.
> >   Exception handling stinks on all platforms but SOLsun/SOLgnu.
> >    The ALPHA_OSF code no longer works. I tried.
> >   Generating C/C++ would significantly improve exception handling nearly across the board.
> >    It is possible otherwise, but much more difficult.
> > 
> > 
> > So, again, C as intermediate code isn't perfect or without drawbacks, but it promises:
> >   greatly increased portability 
> >   more efficient exception handling 
> >   better codegen by letting the optimizer be full on, even across modules with some C compilers (gcc 4.5, Visual C++, at least) 
> >   better debugging with stock debuggers (including Visual C++, windbg) 
> >   a portable distribution format -- no more having to distribute binaries, though they still have advantages 
> >      easier to get into the various "ports" systems I think as a result 
> >   a much smaller system overall (no GPL, if it matters) 
> > 
> > 
> > Again, there are drawbacks, but it just seems so very tempting.
> > 
> > 
> > I'm sure I'll plug away at m3cc a while longer, but I think more and more it is questionable.
> > 
> > 
> > I can try again to read the LLVM stuff.
> > 
> > 
> > A new backend I think is unavoidably a lot of work, be it C or LLVM.
> > That's my hangup on both of these. It requires knowing a lot about two big things -- M3CG and the underlying generator.
> > parse.c is "only" 6,000 lines but pretty dense in terms of information gone into it. Maybe I'm just feeling dumb.
> > 
> > 
> > The thing about C though, is it is a very well understood next layer down. Certainly compared to
> > the gcc trees or LLVM. I don't think it is just me, that I'm some C expert.
> > 
> > 
> > 
> >> This could probably be improved a lot by switching to a better debug info format,
> >> probably the latest Dwarf variant. But that is a big job.
> > 
> > 
> > I don't believe we have to do much of *anything* to switch debug formats.
> > We just have to provide gcc with decently formed typeful trees.
> > It should do the rest.
> > Currently I guess it is all custom.
> 
> It doesn't provide nearly enough information in stock form.  The "stabs" it now produces has a lot
> of Modula-3-specific stuff crammed inside the fields of stabs entries.  This had to be added by the
> original implementors of the gcc backend.  A different debug format will need changes to gcc to
> emit what is needed.  However, it might well be entirely within the "language" of, say, Dwarf, which
> is very general.  It certainly would be a lot cleaner, and it could easily be completed in places
> that are now hard.  So there would still be a lot of work.
> 
> I don't completely understand where the current back end emits all the stabs stuff, but I believe
> all or almost all of it comes through code in parse.c calling utility code (dbxout.c, e.g.), and
> is not much taken from the trees gcc uses.  This is why I have yet to figure out how to write
> correct debug info describing the locations of static links, since gcc develops this information
> by transforming trees, after parse.c has done its thing.
> 
> > 
> > I tried -g again, thinking maybe things were better now. It still crashes.
> > It seems related to the fact that _m3_fault is in an unknown location.
> > But that seems actually deliberate and reasonable, and I tried fiddling with it anyway.
> > No luck yet.
> > 
> > 
> > This debug format problem is also solved by using C intermediate code.
> > You just use -g or -ggdb or -Zi or whatever, whatever is normal for C, and it'd just work.
> > 
> 
> Lots of things, TEXT being probably the worst, won't display in a Modula-3 form this way.
> And things in expressions/statements won't work either.  A debugger user will have to understand
> a lot of low-level stuff about how the C back end translates Modula-3 code to C, to use it at all,
> and it will still be far less convenient.
> 
> > 
> > Anyway..
> >  - Jay