[M3devel] M3 programming problem : GC efficiency / per-thread storage areas?

Fri Oct 17 01:30:01 CEST 2008

Hi Tony,

I figured you would chime in!

Yes, @M3noincremental seems to make things consistently a tad bit
slower (but a very small difference), on both FreeBSD and Linux.
@M3nogc makes a bigger difference, of course.

Unfortunately I seem to have lost the code that did a lot of memory
allocations.  My tricks (as described in the email---and others!)
have removed most of the troublesome memory allocations, but now
I'm stuck with the mutex instead...

      Mika

Tony Hosking writes:
>Have you tried running @M3noincremental?
>
>On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>
>> Hello Modula-3 people,
>>
>> As I mentioned in an earlier email about printing structures (thanks
>> Darko), I'm in the midst of coding an interpreter embedded in
>> Modula-3.  It's a Scheme interpreter, loosely based on Peter Norvig's
>> JScheme for Java (well it was at first strongly based, but more and
>> more loosely, if you know what I mean...)
>>
>> I expected that the performance of the interpreter would be much
>> better in Modula-3 than in Java, and I have been testing on two
>> different systems.  One is my ancient FreeBSD-4.11 with an old PM3,
>> and the other is CM3 on a recent Debian system.  What I am finding
>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>> close to ten times as fast on some tasks at this point), but on
>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>
>> When I started, with code that was essentially equivalent to JScheme,
>> I found that it was a bit slower than JScheme on Linux/CM3 and
>> possibly 2x as fast on FreeBSD/PM3.  On Linux/CM3, it appears to
>> spend most of its time in (surprise, surprise!) memory allocation
>> and garbage collection.  The speedup I have achieved between the
>> first implementation and now was due to the use of Modula-3 constructs
>> that are superior to Java's, such as the use of arrays of RECORDs
>> to make small stacks rather than linked lists.  (I get readable
>> code with much fewer memory allocations and GC work.)
>>
>> Now, since this is an interpreter, I as the implementer have limited
>> control over how much memory is allocated and freed, and where it is
>> needed.  However, I can sometimes fall back on C-style memory  
>> management,
>> but I would like to do it in a safe way.  For instance, I have  
>> special-cased
>> evaluation of Scheme primitives, as follows.
>>
>> Under the "normal" implementation, a list of things to evaluate is
>> built up, passed to an evaluation function, and then the GC is left
>> to sweep up the mess.  The problem is that there are various tricky
>> routes by which references can escape the evaluator, so you can't
>> just assume that what you put in is going to be dead right after
>> an eval and free it.  Instead, I set a flag in the evaluator, which
>> is TRUE if it is OK to free the list after the eval and FALSE if
>> it's unclear (in which case the problem is left up to the GC).
>>
>> For the vast majority of Scheme primitives, one can indeed free the
>> list right after the eval.  Now of course I am not interested
>> in unsafe code, so what I do is this:
>>
>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>
>> VAR
>>  mu := NEW(MUTEX);
>>  free : Pair := NIL;
>>
>> PROCEDURE GetPair() : Pair =
>>  BEGIN
>>    LOCK mu DO
>>      IF free # NIL THEN
>>        TRY
>>          RETURN free
>>        FINALLY
>>          free := free.rest
>>        END
>>      END
>>    END;
>>    RETURN NEW(Pair)
>>  END GetPair;
>>
>> PROCEDURE ReturnPair(cons : Pair) =
>>  BEGIN
>>    cons.first := NIL;
>>    LOCK mu DO
>>      cons.rest := free;
>>      free := cons
>>    END
>>  END ReturnPair;
>>
>> my eval code looks like
>>
>> VAR okToFree : BOOLEAN; BEGIN
>>
>>   args := GetPair(); ...
>>   result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>
>>   IF okToFree THEN ReturnPair(args) END;
>>   RETURN result
>> END
>>
>> and this does work well.  In fact it speeds up the Linux  
>> implementation
>> by almost 100% to recycle the lists like this *just* for the
>> evaluation of Scheme primitives.
>>
>> But it's still ugly, isn't it?  There's a mutex, and a global
>> variable.  And yes, the time spent messing with the mutex is
>> noticeable, and I haven't even made the code multi-threaded yet
>> (and that is coming!)
>>
>> So I'm thinking, what I really want is a structure that is attached
>> to my current Thread.T.  I want to be able to access just a single
>> pointer (like the free list) but be sure it is unique to my current
>> thread.  No locking would be necessary if I could do this.
>>
>> Does anyone have an elegant solution that does something like this?
>> Thread-specific "static" variables?  Just one REFANY would be enough
>> for a lot of uses...  seems to me this should be a frequently
>> occurring problem?
>>
>>     Best regards,
>>       Mika
>>
>>
>>
>>
>>
>>