[M3devel] multi-threaded m3front?

Tue Aug 11 20:56:03 CEST 2015

 The problem with cross compiling is you need: 
  a working cross-compiler for C/C++  
  a working cross-assembler  
  a working cross-linker  

   If you have those, it is fairly easy.  * 
   Getting those is often not easy.  
   Especially the C/C++ compiler, as you need headers/libraries/startup code; this is often
   called a "sysroot" or "buildroot".

 * fairly easy: 
  for NT386 -- nothing extra to do, the backend is in cm3 
  There are endianness bugs in mklib, but it works on the majority of systems -- little endian hosts. I can fix it. 
  For the C backend -- nothing extra to do, the backend is in cm3.  
  For the gcc-based backend -- fairly easy -- you need to rebuild a different cm3cg 
  and point the config at it. It is more or less already set up to work. 

  For the LLVM backend, probably also easy. 

 I think on Debian and Gentoo, cross C compilers at least to every Linux architecture are readily available. 
 But I don't know about otherwise. 

 In as much as the compiler is gcc, the assembler GNU as, and the linker GNU ld, that gets you far, 
 but you still need the headers/libraries/startup code. 

  If your C compiler is LLVM, that might also help for compiler/assembler/linker.
  But again, the headers/libraries/startup code.

  There are various systems out there that try to automate this but I don't have much experience
  with them, and they often are slightly different than what we want.

 Otherwise, what we have set up is that we cross compile to assembly files, copy them to the target, 
 which typically does have the compiler/assembler/linker, and finish there.

 What remains there is that I only have this automated to build cm3. 
 It should at least also build m3core and libm3 statically, and therefore reach
 approximate parity with the 3.6 "boot" archives and install process. 

 Has anyone here installed 3.6? And do you consider their method a decent goal and stopping point?
 Today is slightly different since quake has been rewritten from C to Modula-3 in the 4.0 timeframe.

 And *possibly* boot archives should contain everything -- going beyond what 3.6 did.

 This is just a matter of work in scripts/python/pylib.py. 
 Copying stuff into subdirectories and generating a make system in there, hierarchical/recursive or not. 
 My make skills aren't great, and I've been hung up on details like depending on GNU make or trying to
 be more portable. 

In the threading area, I believe there is a simple, almost ideal design.
 - parse and mostly "execute" all the quake code almost unchanged, single thread 
 - difference starts at about the last line of quake where it says library or program 
 - possibly, though, queue every file to a "prefetcher" thread; it just reads
   every file into a reused buffer and throws out the data, populating the operating
   system's file system cache; or possibly mmaping every file and keeping it mapped,
   and touching every page; doing this in a somewhat thread aware/safe way for the
   upcoming actual accesses
 - once quake is done, one of two choices:
    - simple but not optimal: do a single threaded parse of every interface
    - less simple: parsing here can be in parallel, depending on the dependency graph;
      you'd just start up a thread per interface, but block on parsing its dependencies;
      as you get away from the root (RT0.i3), the tree should get ever wider

 You would either manually throttle the number of threads, or rely on an underlying threadpool.

 NOTE: Modula-3 used to have a thread pool on Win32 but I removed it to be simpler.
 Maybe that wasn't good. Simpler as in, among other things, the thread id is now just
 the underlying Win32 thread id. It wasn't before. 
 Win32 has had a thread pool since circa Windows 2000. That might be profitable to use.
 - once parsing interfaces is done, I believe codegen of every interface, and parsing and codegen
  of every module can proceed with arbitrary parallelism
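
 To make the interface step concrete, here is a rough sketch in Python (not Modula-3,
 and not m3front code). It is a variation on the idea above: rather than starting a
 thread per interface and blocking it on its dependencies, it hands an interface to a
 bounded thread pool only once everything it imports has been parsed, so the
 throttling comes from the pool. The function name, the deps table, and the parse
 callback are all made up for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def parse_in_dependency_order(deps, parse, max_workers=4):
    """Call parse(name) for every interface, in parallel where possible.

    deps maps each interface name to the set of interfaces it imports
    (every imported name must itself be a key of deps). An interface is
    only submitted once all of its imports have finished parsing.
    Returns the names in the order in which parsing completed.
    """
    remaining = {name: set(imports) for name, imports in deps.items()}
    dependents = {name: [] for name in deps}
    for name, imports in deps.items():
        for imported in imports:
            dependents[imported].append(name)

    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start with the roots: interfaces that import nothing.
        running = {pool.submit(parse, name): name
                   for name, imports in remaining.items() if not imports}
        while running:
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                future.result()  # re-raise any error from parse()
                completed.append(name)
                for child in dependents[name]:
                    remaining[child].discard(name)
                    if not remaining[child]:
                        running[pool.submit(parse, child)] = child

    if len(completed) != len(deps):
        raise ValueError("import cycle among interfaces")
    return completed
```

 With a table like {"RT0": set(), "A": {"RT0"}, "B": {"RT0"}, "C": {"A", "B"}}
 (A, B, C being placeholder names), RT0 is parsed first, A and B may run at the same
 time, and C runs last. Once this returns, the remaining work (codegen of interfaces,
 parse and codegen of modules) could be submitted to the same kind of pool without
 any ordering.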

 Fetching interface/module contents might synchronize with the prefetcher depending
 on which of the two approaches -- if the prefetcher is just prefetch and discard,
 populating the file system cache, then later compilation just refetches obliviously/unchanged.
 If the prefetcher thread mmaps and keeps everything, then serialize with it.
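
 The first of those two approaches (prefetch and discard) is small enough to sketch.
 Again this is Python for illustration only, with invented names (start_prefetcher,
 the stats dictionary); it assumes the only goal is to warm the operating system's
 file cache, so errors are ignored and left for the real compile to report.

```python
import queue
import threading


def start_prefetcher(buffer_size=1 << 20):
    """Start a daemon thread that reads queued files and throws the data away.

    Returns (paths, thread, stats). Put file names on paths; put None to
    ask the thread to exit. stats counts what was actually read.
    """
    paths = queue.Queue()
    stats = {"files": 0, "bytes": 0}

    def run():
        buf = bytearray(buffer_size)  # one buffer, reused for every file
        while True:
            path = paths.get()
            try:
                if path is None:
                    return
                try:
                    # buffering=0 gives a raw file, so readinto() fills buf directly.
                    with open(path, "rb", buffering=0) as f:
                        n = f.readinto(buf)
                        while n:
                            stats["bytes"] += n
                            n = f.readinto(buf)
                    stats["files"] += 1
                except OSError:
                    pass  # missing or unreadable: the compiler proper will complain
            finally:
                paths.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return paths, thread, stats
```

 The compile threads never talk to this thread; they just open the files as usual
 and, with luck, find them already cached. The mmap-and-keep variant would instead
 need a shared table of mappings and some locking around it.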

 Also, you might have multiple prefetcher threads. What you really want is... difficult.
 You want a prefetcher thread per spindle. How to count them up?
 If you have an SSD, then prefetching might not buy much and just forget about it.
 The thing is, if you have spindles, this might be the biggest gain, and if you have an SSD,
 prefetching might cost next to nothing or even help slightly.

 I/O is so often the bottleneck, at least in the days of rotating storage. 

 In the event that the file system cache is small, or the source for a package
 really large, prefetching can be counterproductive -- moving stuff through the cache
 only to flush it before it is used and fetching it a second time.

 I expect on most systems this is not a concern though.

 One of the pieces of work here, I believe, is to move most globals in m3front
 to be in Module.T. This would actually make things cleaner imho.
 Yes, it consolidates knowledge that you might want distributed.
 And accessing data through a context is slightly slower than accessing globals.
 But imho globals are bad and everything should be placed in *some* context
 other than the process or thread.
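
 As a toy illustration of what "everything in a context" buys you (Python, invented
 names; ModuleContext is only a stand-in for Module.T, and the parenthesis check is a
 stand-in for real compilation): every piece of state lives on the context object
 passed down the call chain, so two modules can be processed on two threads without
 touching each other's state.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ModuleContext:
    """Per-module state that would otherwise sit in globals."""
    name: str
    depth: int = 0                      # current parenthesis nesting
    errors: list = field(default_factory=list)


def check_line(ctx, lineno, line):
    # Reads and writes only ctx, never module-level variables.
    for ch in line:
        if ch == "(":
            ctx.depth += 1
        elif ch == ")":
            if ctx.depth == 0:
                ctx.errors.append(f"{ctx.name}:{lineno}: unmatched ')'")
            else:
                ctx.depth -= 1


def compile_module(name, source):
    ctx = ModuleContext(name)
    for lineno, line in enumerate(source.splitlines(), 1):
        check_line(ctx, lineno, line)
    return ctx


def compile_all(sources, max_workers=4):
    """sources maps module name to source text; returns name -> ModuleContext."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        contexts = pool.map(lambda item: compile_module(*item), sources.items())
        return {ctx.name: ctx for ctx in contexts}
```

 Had depth and errors been globals, compile_all would need a lock around every
 update, or would silently mix up the modules.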

 I would not advocate compiling functions within a module in parallel, only modules.

 Similarly, code generators must have no globals, and put all the state in the cg object.

 - Jay

Date: Tue, 11 Aug 2015 10:20:26 -0700
From: lists at darko.org
To: wagner at elegosoft.com
CC: m3devel at elegosoft.com; jay.krell at cornell.edu
Subject: Re: [M3devel] multi-threaded m3front?

I couldn't agree with you more. I think being able to compile or cross compile the system without spending hours (or days) hacking scripts/environments would be a huge step forward for the project. Or having compiler binaries for more than two platforms (four if you count different word sizes). Meanwhile I've never heard anyone complain that the compiler is too slow.
But of course everyone has different priorities, different interests and different opinions in a volunteer project so any discussion on this subject will inevitably boil down to "he who writes the code determines the priorities."

On Tue, Aug 11, 2015 at 12:13 AM, Olaf Wagner <wagner at elegosoft.com> wrote:
On Mon, 10 Aug 2015 22:00:17 -0700
Darko Volaric <lists at darko.org> wrote:

> A more fruitful approach might be pipelining the compiler:
>
> - files enter the pipeline in reverse dependency order
> - have a thread (stage) to read those files into memory
> - have a thread to tokenize and parse the files and form the data structure
> - have a thread to do intermediary processing
> - have a thread to generate the output representation for the back end
>
> You'll only get a maximum 4x speedup and you'll only be as fast as the
> slowest thread, but you can reliably tune the divisions based on simple
> benchmarking of each thread. You're very likely to get somewhat close to
> the 4x speedup. This works best when there is a 1:1 between pipeline stages
> and cores. If you were ambitious you could attempt an 8 stage pipeline for
> 8 core processors.
>
>
> On Sat, Aug 8, 2015 at 8:05 PM, Jay K <jay.krell at cornell.edu> wrote:
>
> > Is anyone interested in updating m3front to be multi-threaded?
> >
> > I haven't seen a single core system in a while.
> >
> > Surely each module can be compiled separately, possibly with some
> > serialization around compiling interfaces?

While this is all correct, I'd like to remark that in my experience
optimizing for performance has always got in the way of clear,
understandable and maintainable code.

So while there are semantic refactorings and feature additions in the
pipeline I would not like to see anybody rework the whole compiler
with the intention of making it faster. I would rather see the few
active programmers interested in M3 concentrate on more backends, better
intermediate code, better code optimization, better maintainability
etc.

Olaf

--
Olaf Wagner -- elego Software Solutions GmbH -- http://www.elegosoft.com
               Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96  mobile: +49 177 2345 869  fax: +49 30 23 45 86 95
Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
