[M3devel] additional CVS repositories for additional gcc forks?

Sat Aug 28 21:15:02 CEST 2010

Jay K wrote:
> 
>> There is no way a debugger that has no Modula-3 awareness is going to provide
>> a Modula-3-like view. The operators will have C spellings and C semantics,
> 
> 
> How many operators do people use in a debugger?
> 
> I use very few. Partly because for a long time I used a debugger
> with a great gui and an awful expression evaluator.
> 
> 
> Still, I use basically only "+" "->" "*" (dereference) and "=" for assignment.
>  Sometimes multiplication and subtraction.
> I agree it would be nice if all the C debuggers would be lenient about "->" vs. ".".
>   That would unify Modula-3, Java, C#, C, C++.
>    Except where C++ has an operator-> overload. But operator overload
>    is an area where.. tangent... C++ is a great language..my compiler implements
>    it well..but my debugger, my editor, plain text search.. can't cope with it.
>    Modula-3, C#, Java run afoul of plain text search too -- anything with prevalent "scoped names".
>    In C you get Window_Init, File_Open, etc. never just Init or Open.
>    How do you search for calls to operator+ in C++? For a certain type?
>    In C, except for the builtin types, they'd be unique function names.
>   Anyway, tangent over.
> 
> 
> + is the same in the various languages.
> 
> I think "=", ":=", "==" are the main problem.
> You might try a compare and accidentally do an assignment.
> 
> 
>> The syntax will be strictly C.
> 
> Almost the same.
> 
> 
>> The display of values will be C.
> 
> Almost the same.

On the strength of your comments, I rest my case.

> 
> Also if you have a particularly good C compiler/debugger, we could do
>   #define AND &&  
>   #define OR ||  
> 
> 
> getting you back those two operators, which I rarely use in a debugger.
> 
> 
>> TEXT won't work in any reasonable way at all.
> 
> Sure it might.
> In Visual Studio you can write little addins to help the debugger display stuff.
> I believe there is a small builtin "language" or I believe you can write actual code.
> In Windbg you can write little plugins. You could provide like !m3.text.
> I don't know if you can tell the debugger ahead of time how to custom display types.
> I don't know if gdb has a story here.
> Still, one might imagine a *small* patch to gdb.

All of which is just different ways of providing a debugger with proper Modula-3 support.

> 
> 
>> Demangling names in the compiler's debug output would make them look nice, but then the Modula-3
>> type info would be lost, and output formats would lose.
> 
> 
> Um, you think maybe this stuff was done the wrong way in the first place?
>  That the names shouldn't be mangled in the first place?
>   I strongly suspect so. Other systems don't depend on this.
>    (Yes, I know about C++ name mangling, and even though it does something similar,
>    that's a trick for the linker and not how debug information works. It is for use
>    in the absence of debug information, among other reasons.)
> 

I think it could be done a lot better by switching to a better debug info format.
stabs may have been the best option around at the time it was done.  And I don't
know if Modula-3's structural type equivalence rules could be supported any better
without the uid's.

But there will still have to be name mangling to get a standard linker to work,
with just about any language.  Stock gdb does what it does in part because it
has builtin demanglers for the various languages it supports.  It chooses the
appropriate demangler dynamically.  It would be a pretty ugly user interface if
it didn't.
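
A rough sketch in C of what "chooses the demangler dynamically" amounts to.
This is not gdb's actual interface.  The names (Demangler, pick_demangler,
LANG_M3) are made up for illustration, and the "Module__Proc" to "Module.Proc"
rule is an assumed, simplified mangling scheme:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A demangler turns a linker-level name into a source-level one.
   It returns 1 on success and 0 if the name does not fit its scheme. */
typedef int (*Demangler)(const char *in, char *out, size_t cap);

/* Assumed scheme, for illustration only: "Module__Proc" -> "Module.Proc". */
static int demangle_m3(const char *in, char *out, size_t cap) {
    assert(in != NULL && out != NULL);
    const char *sep = strstr(in, "__");
    if (sep == NULL || sep == in || sep[2] == '\0') return 0;
    int n = snprintf(out, cap, "%.*s.%s", (int)(sep - in), in, sep + 2);
    return n > 0 && (size_t)n < cap;
}

/* C names are not mangled, so this one just copies the name. */
static int demangle_none(const char *in, char *out, size_t cap) {
    assert(in != NULL && out != NULL);
    int n = snprintf(out, cap, "%s", in);
    return n >= 0 && (size_t)n < cap;
}

typedef enum { LANG_C, LANG_M3 } Lang;

/* Pick the demangler from the language of the symbol being displayed. */
static Demangler pick_demangler(Lang lang) {
    return lang == LANG_M3 ? demangle_m3 : demangle_none;
}
```

The point is only that the language has to be known per symbol before the
name can be shown in source form.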

> There is "naturally" type information you get just by building up decent gcc trees.
> Ditto for intermediate C code.
> For a while you know, every record is a void* or just has a size, and all the type information
> is buried in the names. This is questionable. I'm sure it has some advantages.
> You can describe things maybe not easily described in C.
>   e.g. Subranges?
> And then our code in m3gdb is probably very portable, in that, I think, we just ferry along
> some strings, from our code to our code, and we can decipher them the same in all systems.
> I think, I'm not sure, there is like no dependence on the vagaries of coff, dwarf, etc., and
> what they can or cannot represent. However there is a dependency on stabs being available.
> It is not for example available on HP-UX.
> 
> 
> Furthermore the lack of correct type information, apart from stabs, causes problems.
> For some targets the backend wants accurate type information to pass records by value.
> I again/still think we should probably not rely on the backend for this anyway.
> We should probably make a copy and pass a pointer to it, kind of like m3x86 does.
> 

Certainly, a back end needs type information for code generation.  That won't do much to
help a debugger that is oblivious to the source language.
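
For reference, the "make a copy and pass a pointer to it" idea from the quoted
text might look like this in C.  It is only a sketch.  Point, callee and
call_by_value are hypothetical names, and this is not claimed to be what m3x86
actually emits:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y; } Point;

/* The callee only ever sees a pointer, so the back end never has to know
   how the target ABI passes records by value. */
static int callee(Point *p) {
    p->x += 100;            /* modifies the private copy only */
    return p->x + p->y;
}

/* The caller makes the copy, which preserves by-value semantics. */
static int call_by_value(const Point *actual) {
    assert(actual != NULL);
    Point copy = *actual;
    return callee(&copy);
}
```

The actual parameter is unchanged after the call, which is all that by-value
passing promises.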

> 
>> Things that use pointers at the machine level can never know whether the pointers
>> point to a single value or an array, and if the latter, with what bounds.
> 
> 
> C programmers can cope with that. Can't we?

Look at the security advisories.  Buffer overflow, buffer overrun, buffer overflow, ...
over and over.  Almost all of them are buffer overruns.  But that's a tangent too.

> And..I admit.. I don't know what our machine level mapping looks like.
> Do we pass a pointer and a size as two parameters? Or a small record with pointer/size by value?
> 

If it's a fixed array, it's all in the static type, which the stabs (as extended for Modula-3)
info conveys.  If it's an open array, there is runtime dope: a pointer to the zero-th array
element, followed by a shape, which is just a list of words giving the NUMBER of each dimension.
The dimension count is statically known.  The dope is generally passed by reference, although it is
never altered.  For heap-allocated arrays, it's located right in the heap object, at the beginning,
making the pointer redundant.  For formal parameters, if the actual doesn't already have the
needed dope, it is constructed at runtime by code at the call site.  This works for, e.g.,
passing a fixed array or a SUBARRAY to a formal that is open.
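
In C terms, the dope for a two-dimensional open array of integers would look
roughly like the following.  This is a sketch of the layout described above.
The type and function names are made up, and size_t stands in for a machine
word:

```c
#include <assert.h>
#include <stddef.h>

/* Dope for a 2-dimensional open array of int.  The dimension count (2)
   is static, so only the element pointer and the shape exist at runtime. */
typedef struct {
    int   *elts;      /* pointer to the zero-th array element */
    size_t shape[2];  /* NUMBER of each dimension */
} OpenArray2D;

/* What the call site does when the actual is a fixed array:
   build the dope at runtime. */
static OpenArray2D make_dope(int *first, size_t rows, size_t cols) {
    assert(first != NULL);
    OpenArray2D d = { first, { rows, cols } };
    return d;
}

/* Bounds-checked fetch.  The dope is passed by reference and not altered.
   Sets *ok to 0 and returns 0 if a subscript is out of range. */
static int fetch(const OpenArray2D *a, size_t i, size_t j, int *ok) {
    if (i >= a->shape[0] || j >= a->shape[1]) { *ok = 0; return 0; }
    *ok = 1;
    return a->elts[i * a->shape[1] + j];
}
```

A debugger that sees only `int *` has the elts field and nothing else, which
is exactly the bounds problem from the quoted text.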

BTW, this is another thing a debugger has to know about to either pass, display, or alter
open array values.  Perhaps Dwarf is sophisticated enough that it could just be encoded in
Dwarf, but certainly not in stabs.  (Well, we could probably cobble up yet another stabs
extension, but that would still require specialized debugger support.)

> 
> The debugger need not be a full blown Modula-3 interpreter.
> 
> 
>> Probably the worst thing will be calls. They just don't work without the debugger
>> having knowledge of a lot of stuff. There are extra hidden parameters, method
>> calls, passing procedure-typed parameters with environments, calling the same,
>> the three modes of Modula-3, etc. I consider calls in debugger commands very
>> valuable.
> 
> 
> I use calls very rarely.
> I'm not super keen on running some of my code when otherwise my code is all frozen
> and some of it is misbehaving. I know this is partly me.
> 
> 
> Even so, generally you only call certain functions that are put there for use from a debugger, right?
> Like gcc's debug_node or such?
> And they tend to not be fancy?

I regularly type a debugger call to re-execute something that I just stepped over, not knowing whether
the problem I am looking for would occur inside the procedure or not.  Reverse debugging in the newest
gdb could provide an alternative, but I am hearing that the necessary recording costs ~ n*10 slowdown.

I also do it as an easy way to test some parameter combination.  Kind of like having an interpreter for
the language.  And I do it with fancy procedures that format some elaborate data structure in a readable,
high-level way, which is, I think, what you meant.  Sometimes a *lot* of effort.

> 
> And the extra parameters..debugger would complain about missing them, programmer would figure it out?

Before I got call support working, I found there were many calls I could not make work at all.  Either
I couldn't figure out what was needed (This amounts, in part, to manually reading stabs), or there
was no way to supply what was needed in a debugger command.  Also, on method calls, there was no way
to figure out where it would dispatch to, even when you could locate all the possible overrides in all
the subtypes, in all the source files of the closure.  Language-specific support can take care of this,
when done completely.  m3gdb is a long way from there now, but it helps a lot.
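
To illustrate the dispatch problem, here is a generic method-table sketch in
C.  It is not the real Modula-3 object layout, and all the names are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Shape Shape;

/* One slot per method; each subtype supplies its own table. */
typedef struct {
    int (*area)(const Shape *self);
} ShapeMethods;

struct Shape {
    const ShapeMethods *methods;  /* set when the object is allocated */
    int w, h;
};

static int rect_area(const Shape *s) { return s->w * s->h; }
static int tri_area(const Shape *s)  { return s->w * s->h / 2; }

static const ShapeMethods rect_methods = { rect_area };
static const ShapeMethods tri_methods  = { tri_area };

/* The call goes through the table, so the target depends on the
   runtime value of s->methods, not on anything in the source text. */
static int call_area(const Shape *s) {
    assert(s != NULL && s->methods != NULL);
    return s->methods->area(s);
}
```

Reading the source tells you the set of possible overrides, but only the
object in memory tells you which one a given call reaches.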

> 
> 
> I'm not saying there aren't drawbacks here.
> But there are also major advantages.
> There are major costs and drawbacks to our current approach.
>   We have a ton of extra code.
>   Which I don't think we are well equipped for.
>   Maybe Tony is. Maybe someone else is. I'm not.
>   
> 
> Partly, I'll admit, anything I write, I am much more able to maintain.
> Or, another lazy angle, anything smaller is easier to maintain.
> 
> 
> In gcc we have a large code base. It takes me a long time to get just slightly up to speed on it.
> We have several nagging problems with it. Maybe I just need to look at the C front end more.
> Or read tree.h. I don't know.
> 
> 
>  4.5.1 doesn't work with SPARC32_SOLARIS/SOLgnu/SOLsun. 
>  4.3.5 maybe not either. 
>  A few optimizations I have turned off for 4.5.1 because they cause problems. Including inlining. 
>  Maybe I just need to debug more. 
>  Apple and OpenBSD each maintain their own forks. So that, *sort of but not really*, triples things.
>    (now, they are all highly related, so it doesn't) So far we don't have the OpenBSD fork.
>   But for example 4.5.1 doesn't have the OpenBSD/powerpc stuff quite. And there is a small OpenBSD/mips64
>    problem I worked around. Minor, I guess. We could just drop these platforms, or OpenBSD entirely.
>   Not a huge deal. But it is yet more stuff that C as an intermediate platform solves.
>   Exception handling stinks on all platforms but SOLsun/SOLgnu.
>    The ALPHA_OSF code no longer works. I tried.
>   Generating C/C++ would significantly improve exception handling nearly across the board.
>    It is possible otherwise, but much more difficult.
> 
> 
> So, again, C as intermediate code isn't perfect or without drawbacks, but it promises:
>   greatly increased portability 
>   more efficient exception handling 
>   better codegen by letting the optimizer be full on, even across modules with some C compilers (gcc 4.5, Visual C++, at least) 
>   better debugging with stock debuggers (including Visual C++, windbg) 
>   a portable distribution format -- no more having to distribute binaries, though they still have advantages 
>      easier to get into the various "ports" systems I think as a result 
>   a much smaller system overall (no GPL, if it matters) 
> 
> 
> Again, there are drawbacks, but it just seems so very tempting.
> 
> 
> I'm sure I'll plug away at m3cc a while longer, but I think more and more it is questionable.
> 
> 
> I can try again to read the LLVM stuff.
> 
> 
> A new backend I think is unavoidably a lot of work, be it C or LLVM.
> That's my hangup on both of these. It requires knowing a lot about two big things -- M3CG and the underlying generator.
> parse.c is "only" 6,000 lines but pretty dense in terms of information gone into it. Maybe I'm just feeling dumb.
> 
> 
> The thing about C though, is it is a very well understood next layer down. Certainly compared to
> the gcc trees or LLVM. I don't think it is just me, that I'm some C expert.
> 
> 
> 
>> This could probably be improved a lot by switching to a better debug info format,
>> probably the latest Dwarf variant. But that is a big job.
> 
> 
> I don't believe we have to do *anything* sort of to switch debug formats.
> We just have to provide gcc with decently formed typeful trees.
> It should do the rest.
> Currently I guess it is all custom.

It doesn't provide nearly enough information in stock form.  The "stabs" it now produces has a lot
of Modula-3-specific stuff crammed inside the fields of stabs entries.  This had to be added by the
original implementors of the gcc backend.  A different debug format will need changes to gcc to
emit what is needed.  However, it might well be entirely within the "language" of, say, Dwarf, which
is very general.  It certainly would be a lot cleaner, and it could easily be completed in places
that are now hard.  So there would still be a lot of work.

I don't completely understand where the current back end emits all the stabs stuff, but I believe
all or almost all of it comes through code in parse.c calling utility code (dbxout.c, e.g.), and
is not much taken from the trees gcc uses.  This is why I have yet to figure out how to write
correct debug info describing the locations of static links, since gcc develops this information
by transforming trees, after parse.c has done its thing.
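
For anyone unfamiliar with static links, this is the general idea written out
by hand in C.  It is a sketch with hypothetical names.  The real gcc
transformation decides the frame layout itself, which is why the location is
hard to describe from parse.c:

```c
#include <assert.h>
#include <stddef.h>

/* Locals of the outer procedure that the nested procedure refers to. */
typedef struct { int total; } OuterFrame;

/* A nested procedure gets a hidden extra parameter: the static link,
   pointing at the frame of the enclosing procedure. */
static void inner(OuterFrame *static_link, int x) {
    assert(static_link != NULL);
    static_link->total += x;
}

static int outer(int n) {
    OuterFrame frame = { 0 };
    for (int i = 1; i <= n; i++)
        inner(&frame, i);       /* the caller supplies the static link */
    return frame.total;
}

/* A procedure-typed parameter with an environment: the code address
   and the static link travel together. */
typedef struct {
    void (*code)(OuterFrame *static_link, int x);
    OuterFrame *env;
} Closure;

static void apply(Closure c, int x) { c.code(c.env, x); }
```

A debugger that wants to call inner, or to show outer's variables from inside
inner, has to know where that hidden parameter lives.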

> 
> I tried -g again, thinking maybe things were better now. It still crashes.
> It seems related to the fact that _m3_fault is in an unknown location.
> But that seems actually deliberate and reasonable, and I tried fiddling with it anyway.
> No luck yet.
> 
> 
> This debug format problem is also solved by using C intermediate code.
> You just use -g or -ggdb or -Zi or whatever, whatever is normal for C, and it'd just work.
> 

Lots of things, TEXT being probably the worst, won't display in a Modula-3 form this way.
And things in expressions/statements won't work either.  A debugger user will have to understand
a lot of low-level stuff about how the C back end translates Modula-3 code to C, to use it at all,
and it will still be far less convenient.

> 
> Anyway..
>  - Jay
>