[M3devel] M3 programming problem : GC efficiency / per-thread storage areas?

Jay jay.krell at cornell.edu
Fri Oct 17 06:40:28 CEST 2008


Making this per-thread is a classic and usually worthwhile improvement.

You need to worry about what happens with many threads, be sure to clean up when a thread dies, and allow a free to come in from any thread.

A good way to mitigate all of those problems is to use a small fixed-size cache instead of strictly per-thread storage, including an array of mutexes, one per cache slot.

If "thread ids" have an adequate distribution, just use their low bits as the array index. If not, keep a global counter and assign a value from it to each thread on first use.

The cache could also be more than one element.
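The scheme above might look roughly like this in C (a sketch only, assuming POSIX threads; all names here are illustrative, not from any existing code). Each slot has its own free list and its own mutex, so contention is spread across the slots instead of serializing on one global lock:

```c
#include <pthread.h>
#include <stdlib.h>

#define NSLOTS 16  /* small, fixed number of cache slots */

typedef struct Pair {
    void *first;
    struct Pair *rest;
} Pair;

static pthread_mutex_t slot_mu[NSLOTS];
static Pair *slot_free[NSLOTS];

/* Call once before use: one mutex per slot. */
void init_cache(void) {
    for (int i = 0; i < NSLOTS; i++)
        pthread_mutex_init(&slot_mu[i], NULL);
}

/* Map the calling thread to a slot.  pthread_t is opaque, so hash
   its bytes; if thread ids were poorly distributed, a global counter
   assigned on first use per thread would serve instead. */
static unsigned slot_for_thread(void) {
    pthread_t self = pthread_self();
    const unsigned char *p = (const unsigned char *)&self;
    unsigned h = 0;
    for (size_t i = 0; i < sizeof self; i++)
        h = h * 31u + p[i];
    return h % NSLOTS;
}

Pair *get_pair(void) {
    unsigned s = slot_for_thread();
    pthread_mutex_lock(&slot_mu[s]);
    Pair *p = slot_free[s];
    if (p != NULL)
        slot_free[s] = p->rest;
    pthread_mutex_unlock(&slot_mu[s]);
    return p != NULL ? p : calloc(1, sizeof(Pair));
}

/* A free may arrive from any thread: it simply lands on the freeing
   thread's slot, so nothing special is needed when a thread dies. */
void return_pair(Pair *p) {
    unsigned s = slot_for_thread();
    p->first = NULL;
    pthread_mutex_lock(&slot_mu[s]);
    p->rest = slot_free[s];
    slot_free[s] = p;
    pthread_mutex_unlock(&slot_mu[s]);
}
```

Because a slot is chosen by hashing the caller's thread id, two threads only contend when they happen to hash to the same slot.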

How do you manage okToFree?

Windows has __declspec(thread), an optimized form of TlsGetValue/TlsSetValue, but it doesn't work in dynamically loaded .dlls before Vista, and there is no __declspec(fiber) analog, though arguably there should be.
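In portable terms, a genuinely per-thread free list needs no lock at all. A minimal C11 sketch using _Thread_local (the portable cousin of __declspec(thread); on pre-Vista Windows DLLs one would fall back to TlsAlloc/TlsGetValue instead; names again illustrative):

```c
#include <stdlib.h>

typedef struct Pair {
    void *first;
    struct Pair *rest;
} Pair;

/* Each thread sees its own copy of tls_free, so no mutex is needed.
   Trade-offs: a pair freed on a different thread than it was
   allocated on lands on the freeing thread's list, and a dying
   thread's list is abandoned rather than recycled. */
static _Thread_local Pair *tls_free = NULL;

Pair *get_pair(void) {
    Pair *p = tls_free;
    if (p != NULL) {
        tls_free = p->rest;
        return p;
    }
    return calloc(1, sizeof(Pair));
}

void return_pair(Pair *p) {
    p->first = NULL;
    p->rest = tls_free;
    tls_free = p;
}
```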
 
 - Jay

----------------------------------------
> To: hosking at cs.purdue.edu
> Date: Thu, 16 Oct 2008 16:30:01 -0700
> From: mika at async.caltech.edu
> CC: m3devel at elegosoft.com; mika at camembert.async.caltech.edu
> Subject: Re: [M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
> 
> Hi Tony,
> 
> I figured you would chime in!
> 
> Yes, @M3noincremental seems to make things consistently a tad
> slower (a very small difference), on both FreeBSD and Linux.
> @M3nogc makes a bigger difference, of course.
> 
> Unfortunately I seem to have lost the code that did a lot of memory
> allocations.  My tricks (as described in the email---and others!)
> have removed most of the troublesome memory allocations, but now
> I'm stuck with the mutex instead...
> 
>       Mika
> 
> Tony Hosking writes:
>>Have you tried running @M3noincremental?
>>
>>On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>>
>>> Hello Modula-3 people,
>>>
>>> As I mentioned in an earlier email about printing structures (thanks
>>> Darko), I'm in the midst of coding an interpreter embedded in
>>> Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
>>> JScheme for Java (well it was at first strongly based, but more and
>>> more loosely, if you know what I mean...)
>>>
>>> I expected that the performance of the interpreter would be much
>>> better in Modula-3 than in Java, and I have been testing on two
>>> different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
>>> and the other is CM3 on a recent Debian system.  What I am finding
>>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>>> close to ten times as fast on some tasks at this point), but on
>>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>>
>>> When I started, with code that was essentially equivalent to JScheme,
>>> I found that it was a bit slower than JScheme on Linux/CM3 and
>>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>>> spend most of its time in (surprise, surprise!) memory allocation
>>> and garbage collection.  The speedup I have achieved between the
>>> first implementation and now was due to the use of Modula-3 constructs
>>> that are superior to Java's, such as the use of arrays of RECORDs
>>> to make small stacks rather than linked lists.  (I get readable
>>> code with far fewer memory allocations and less GC work.)
>>>
>>> Now, since this is an interpreter, I as the implementer have limited
>>> control over how much memory is allocated and freed, and where it is
>>> needed.  However, I can sometimes fall back on C-style memory  
>>> management,
>>> but I would like to do it in a safe way.  For instance, I have  
>>> special-cased
>>> evaluation of Scheme primitives, as follows.
>>>
>>> Under the "normal" implementation, a list of things to evaluate is
>>> built up, passed to an evaluation function, and then the GC is left
>>> to sweep up the mess.  The problem is that there are various tricky
>>> routes by which references can escape the evaluator, so you can't
>>> just assume that what you put in is going to be dead right after
>>> an eval and free it.  Instead, I set a flag in the evaluator, which
>>> is TRUE if it is OK to free the list after the eval and FALSE if
>>> it's unclear (in which case the problem is left up to the GC).
>>>
>>> For the vast majority of Scheme primitives, one can indeed free the
>>> list right after the eval.  Now of course I am not interested
>>> in unsafe code, so what I do is this:
>>>
>>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>>
>>> VAR
>>>  mu := NEW(MUTEX);
>>>  free : Pair := NIL;
>>>
>>> PROCEDURE GetPair() : Pair =
>>>  BEGIN
>>>    LOCK mu DO
>>>      IF free # NIL THEN
>>>        TRY
>>>          RETURN free
>>>        FINALLY
>>>          free := free.rest
>>>        END
>>>      END
>>>    END;
>>>    RETURN NEW(Pair)
>>>  END GetPair;
>>>
>>> PROCEDURE ReturnPair(cons : Pair) =
>>>  BEGIN
>>>    cons.first := NIL;
>>>    LOCK mu DO
>>>      cons.rest := free;
>>>      free := cons
>>>    END
>>>  END ReturnPair;
>>>
>>> my eval code looks like
>>>
>>> VAR okToFree : BOOLEAN; BEGIN
>>>
>>>   args := GetPair(); ...
>>>   result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>>
>>>   IF okToFree THEN ReturnPair(args) END;
>>>   RETURN result
>>> END
>>>
>>> and this does work well.  In fact it speeds up the Linux  
>>> implementation
>>> by almost 100% to recycle the lists like this *just* for the
>>> evaluation of Scheme primitives.
>>>
>>> But it's still ugly, isn't it?  There's a mutex, and a global
>>> variable.  And yes, the time spent messing with the mutex is
>>> noticeable, and I haven't even made the code multi-threaded yet
>>> (and that is coming!)
>>>
>>> So I'm thinking, what I really want is a structure that is attached
>>> to my current Thread.T.  I want to be able to access just a single
>>> pointer (like the free list) but be sure it is unique to my current
>>> thread.  No locking would be necessary if I could do this.
>>>
>>> Does anyone have an elegant solution that does something like this?
>>> Thread-specific "static" variables?  Just one REFANY would be enough
>>> for a lot of uses...  seems to me this should be a frequently
>>> occurring problem?
>>>
>>>     Best regards,
>>>       Mika
>>>



More information about the M3devel mailing list