[M3devel] How to integrate llvm into cm3

Fri May 22 19:55:27 CEST 2015

This is a worthwhile discussion, but it has very little to do with using
LLVM as a back end.  In LLVM IR, the sizes of integers, pointers, etc.
are constant numbers.  The front end, whether Clang, CM3, or any other, makes
the decisions about mapping language types like long, INTEGER, pointers,
etc. to a size, target-dependently or otherwise, according to the language's
rules.  LLVM does not make these decisions.  Its target dependencies lie
mostly in having different code generators for different instruction sets.

On 05/22/2015 05:53 AM, Elmar Stellnberger wrote:
>
> On 22.05.2015 at 12:16, dirk muysers wrote:
>
>> >> What about the said platform dependencies you have discovered?
>> Not me (I never seriously considered using it), but many people on the llvm
>> forums pointed to the fact. One example among
>> many:
>>
>> Does your C code ever use the 'long' type? If so, the LLVM IR will be
>> different depending on whether it's targeting linux-32 or linux-64. Do
>> you ever use size_t? Same problem. Do you ever use a union containing
>> both pointers and integers? See above. In principle, it's possible to
>> write platform-independent IR, or even C code that compiles to
>> platform-independent IR. In practice, especially if you include any
>> system headers, it's remarkably hard.
>> (Jeffrey Yasskin jyasskin at google.com)
>
> As for me, I am very conscientious about distinguishing between long,
> long long and int. I only use long if my code requires a data item to be
> exactly as large as a pointer (and, in special cases, to tap the power of
> 64-bit machines, e.g. as a base type for arbitrary-length integers that
> may be either 32 or 64 bits wide, though not without special provisions
> to handle the difference in data size). Usually, aligning the pointers
> for the next structure at the beginning would also solve such an issue
> when reusing existing code where data sizes may not be changed from long
> to either int or long long without special consideration. Those who use
> glib, for instance, additionally have g[u]int32/64, which they can use
> instead of int / long long, though that should never make a difference on
> Intel x86 based systems. So when I use int or long long, I mostly rely
> on them being 32 and 64 bits wide, respectively.
> I know that most programmers do not care and just always use long, which
> I consider a particularly bad practice. Even in the Linux kernel they
> have declared "typedef long time_t" instead of "typedef long long time_t",
> which will create a Y2K-style mess in 2038 on all 32-bit machines still
> in use then. A rather bad decision that needs to be changed sooner
> or later, even without llvm.
>
> Now let us think of Modula-3. I believe cm3 had a long type the last
> time I looked. However, an equivalent to long long that also exists
> on 32-bit platforms would be an absolute requirement in order not to break
> things for llvm! Many thanks for notifying us about this issue, Dirk.
>

Whether types like integer have a language-specified or target-dependent
range is a tough language design question.  I have tended to favor a
fixed, language-specified range, but there are pros and cons.  I do
think all the time about end-of-range cases and native word size dependencies.
It takes a great deal of care, and I know of no way to design a language
that doesn't, to some extent, trade one set of problems for another.
Signed/unsigned creates similar language dilemmas.

> As far as I can see a Modula-3 programmer will need a good core for
> portable programming anyway as we did not even uphold a guarantee for
> WIDECHAR to be either 16 or 32bit.
>

The evolving nature of first UCS and then Unicode standards has left
many language designers knocked off balance.  Critical Mass first
introduced WIDECHAR as 16-bit when that was what everybody thought
was enough.  Then things changed, and it wasn't anymore.  Right now,
it's a configuration parameter (must be the same for the entire link
closure) in Modula-3.  I personally favor making it full Unicode
by default, in the next release, as this is where the world is now.
This is hopefully a simpler problem than INTEGER, etc., because, as of
now, the Unicode committee has emphatically assured us that the range will
*never* increase.  We can hope.

>
>
>
>
>> And then, besides the IR proper, there is that steadily increasing
>> legion of intrinsics.
>> Unless you translate C-like code and build upon the existing technical
>> LLVM heritage, /je vous souhaite bien du plaisir/ as the French say...
>> *From:* Elmar Stellnberger <mailto:estellnb at elstel.org>
>> *Sent:* Friday, May 22, 2015 11:49 AM
>> *To:* dirk muysers <mailto:dmuysers at hotmail.com>
>> *Subject:* Re: [M3devel] How to integrate llvm into cm3
>> On 22.05.2015 at 10:48, dirk muysers wrote:
>>
>>> Personally I have a strong dislike towards LLVM.
>>> 1. You first have to compile the whole tool chain.
>>> 2. It is a monstrous blob of code, mainly on Windows.
>>> 3. Contrary to a widespread belief, it is definitely NOT platform independent.
>>> 4. It changes at every release.
>>> 5. Having built your objects, you still have to run them through a platform assembler-linker.
>> Is it really that bad? What about the said platform dependencies you have discovered?
>> I believe llvm could indeed be beneficial when it comes to debugging and/or analyzing Modula-3 programs,
>> as there are tools like SAFECode and to my knowledge we never had a fully featured m3gdb.
>> Besides this I would hardly like to believe that llvm is still that volatile when it comes to changes.
>> I know it had some issues in its first days but I can hardly believe that qt5 on MacOS would rely on clang/llvm
>> if that were not a ready to use technology nowadays. I would hope the main changes to llvm had already
>> been done when Apple started to adopt llvm for its own needs.
>> Concerning the code size of llvm that should not be a problem as long as it remains a separate module
>> compiling into an own executable or a shared library loaded in addition to other backends at runtime.
>>
>>> If I still had the energy of my younger years I would try to pack the platform
>>> dependent part of the libraries into a dynamic load library together with a JIT
>>> translator (e.g. libjit) for the portable application code and have a single byte-code
>>> producing compiler backend.
>>> *From:* Jay K <mailto:jay.krell at cornell.edu>
>>> *Sent:* Friday, May 22, 2015 2:57 AM
>>> *To:* Elmar Stellnberger <mailto:estellnb at elstel.org>; rodney.m.bates at acm.org <mailto:rodney.m.bates at acm.org>; m3devel <mailto:m3devel at elegosoft.com>
>>> *Subject:* Re: [M3devel] How to integrate llvm into cm3
>>> Imho "all" options should be implemented, for purposes of convenient debugging/development of the backends.
>>>
>>> "external" is good for developing backends. You can "snapshot" the state of things
>>> slightly into the pipeline and then just iterate on later parts.
>>>
>>> At the cost of having all the serialization code.
>>>
>>> "integrated" is usually preferable for performance, for users.
>>>
>>> E.g. NTx86 backend has been sitting in there for decades unused by half the users.
>>>
>>> Having extra backends sitting in there unused is ok.
>>> Ideally, agreed, they'd be .dll/.sos if we can construct it that way, but ok either way imho.
>>> Ideally also cm3 would dynamically link to libm3/m3core, but it doesn't.
>>>
>>> Everything is demand paged so there is cost to distribute over the network
>>> and copy around, but at runtime, the pages just sit mostly cold on disk.
>>>
>>> One difficulty though is the need to have and build the LLVM code.
>>> For that reason, delayload-dynamically-linked might be preferable.
>>> It depends on how small/easy-to-build LLVM is.
>>>
>>> I guess LLVM provides more choices than before.
>>> In order of efficiency and inverse order of debuggability:
>>>   1 We could construct LLVM IR in memory and run LLVM in-proc and write .o.
>>>   2 We could write out LLVM bytes and run an executable.
>>>   3 We could write out LLVM text and run an executable.
>>>
>>> > My personal preference would be to only have one default target statically compiled in
>>> It has never worked that way. Granted, we didn't really have backends before, just writing mainly IR.
>>> And I don't think LLVM works that way, does it?
>>> I like one compiler to have all targets and just select with a command line switch.
>>> I don't like how hard it is to acquire various cross-toolchains.
>>> Granted, we cheat and are incomplete -- you still need the next piece of the pipeline,
>>> be it LLVM or m3cc (which has one target), or a C compiler or assembler or linker or "libc.a".
>>>
>>> binutils at least has this "all" notion reasonably well working now I believe.
>>> There are tradeoffs though. If only one backend has a bug, and they are all statically linked together, you have to update them all.
>>> And the largely wasted bloat.
>>> Ultimately really, I'd like the C backend to output portable C and then just one C backend, one distribution .tar.gz for all targets.
>>> There is work to do there..not easy..and no progress lately.
>>> Things like INTEGER preserving flexibility in the output, and using sizeof(INTEGER) in expressions instead of using 4 or 8 and folding...
>>>
>>> - Jay
>>>
>>>
>>>
>>>
>>> > Date: Thu, 21 May 2015 20:13:18 +0200
>>> > From: estellnb at elstel.org <mailto:estellnb at elstel.org>
>>> > To: rodney.m.bates at acm.org <mailto:rodney.m.bates at acm.org>; m3devel at elegosoft.com <mailto:m3devel at elegosoft.com>
>>> > Subject: Re: [M3devel] How to integrate llvm into cm3
>>> >
>>> > On 21.05.15 at 19:24, Rodney M. Bates wrote:
>>> > >
>>> > > There are pros and cons. Integrating Peter's cm3-to-llvm conversion into
>>> > > the cm3 executable would be faster compiling--one fewer time per
>>> > > interface or module for the OS to create a process and run an executable.
>>> > > But it would also entail linking in this code, along with some of llvm's
>>> > > infrastructure, into cm3, making its executable bigger, with code that
>>> > > might not be executed at all, when a different backend is used. We already
>>> > > have the x86 integrated backend and the C backend linked in to cm3,
>>> > > whether used or not.
>>> > >
>>> > > Anybody have thoughts on this? I suppose it could be set up to be fairly
>>> > > easily changed either way too.
>>> > >
>>> >
>>> > Why not put each backend into a shared library and load it dynamically?
>>> > Are there still problems with shared libraries for some build targets?
>>> > On the other hand, having cm3-IR handy and being able to translate
>>> > cm3-IR by an executable like m3cc into any desired target has proven
>>> > to be very handy for debugging as well as chocking the Modula-3
>>> > compiler on a new platform.
>>> > My personal preference would be to only have one default target
>>> > statically compiled in, namely the one for cm3-IR, and load all other
>>> > targets dynamically from a shared library. If that should fail for some
>>> > reason one can still use m3cc or one of its counterparts to
>>> > accomplish the translation process.
>>> >
>>> > Elmar
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>

-- 
Rodney Bates
rodney.m.bates at acm.org