[M3devel] M3 programming problem : GC efficiency / per-thread storage areas?
Mika Nystrom
mika at async.caltech.edu
Fri Oct 17 01:30:01 CEST 2008
Hi Tony,
I figured you would chime in!
Yes, @M3noincremental seems to make things consistently a tad bit
slower (but a very small difference), on both FreeBSD and Linux.
@M3nogc makes a bigger difference, of course.
Unfortunately I seem to have lost the code that did a lot of memory
allocations. My tricks (as described in the email---and others!)
have removed most of the troublesome memory allocations, but now
I'm stuck with the mutex instead...
Mika
Tony Hosking writes:
>Have you tried running @M3noincremental?
>
>On 16 Oct 2008, at 23:32, Mika Nystrom wrote:
>
>> Hello Modula-3 people,
>>
>> As I mentioned in an earlier email about printing structures (thanks
>> Darko), I'm in the midst of coding an interpreter embedded in
>> Modula-3. It's a Scheme interpreter, loosely based on Peter Norvig's
>> JScheme for Java (well it was at first strongly based, but more and
>> more loosely, if you know what I mean...)
>>
>> I expected that the performance of the interpreter would be much
>> better in Modula-3 than in Java, and I have been testing on two
>> different systems. One is my ancient FreeBSD-4.11 with an old PM3,
>> and the other is CM3 on a recent Debian system. What I am finding
>> is that it is indeed much faster than JScheme on FreeBSD/PM3 (getting
>> close to ten times as fast on some tasks at this point), but on
>> Linux/CM3 it is much closer in speed to JScheme than I would like.
>>
>> When I started, with code that was essentially equivalent to JScheme,
>> I found that it was a bit slower than JScheme on Linux/CM3 and
>> possibly 2x as fast on FreeBSD/PM3. On Linux/CM3, it appears to
>> spend most of its time in (surprise, surprise!) memory allocation
>> and garbage collection. The speedup I have achieved between the
>> first implementation and now was due to the use of Modula-3 constructs
>> that are superior to Java's, such as the use of arrays of RECORDs
>> to make small stacks rather than linked lists. (I get readable
>> code with much fewer memory allocations and GC work.)
>>
>> Now, since this is an interpreter, I as the implementer have limited
>> control over how much memory is allocated and freed, and where it is
>> needed. However, I can sometimes fall back on C-style memory
>> management,
>> but I would like to do it in a safe way. For instance, I have
>> special-cased
>> evaluation of Scheme primitives, as follows.
>>
>> Under the "normal" implementation, a list of things to evaluate is
>> built up, passed to an evaluation function, and then the GC is left
>> to sweep up the mess. The problem is that there are various tricky
>> routes by which references can escape the evaluator, so you can't
>> just assume that what you put in is going to be dead right after
>> an eval and free it. Instead, I set a flag in the evaluator, which
>> is TRUE if it is OK to free the list after the eval and FALSE if
>> it's unclear (in which case the problem is left up to the GC).
>>
>> For the vast majority of Scheme primitives, one can indeed free the
>> list right after the eval. Now of course I am not interested
>> in unsafe code, so what I do is this:
>>
>> TYPE Pair = OBJECT first, rest : REFANY; END;
>>
>> VAR
>> mu := NEW(MUTEX);
>> free : Pair := NIL;
>>
>> PROCEDURE GetPair() : Pair =
>> BEGIN
>> LOCK mu DO
>> IF free # NIL THEN
>> TRY
>> RETURN free
>> FINALLY
>> free := free.rest
>> END
>> END
>> END;
>> RETURN NEW(Pair)
>> END GetPair;
>>
>> PROCEDURE ReturnPair(cons : Pair) =
>> BEGIN
>> cons.first := NIL;
>> LOCK mu DO
>> cons.rest := free;
>> free := cons
>> END
>> END ReturnPair;
>>
>> my eval code looks like
>>
>> VAR okToFree : BOOLEAN; BEGIN
>>
>> args := GetPair(); ...
>> result := EvalPrimitive(args, (*VAR OUT*) okToFree);
>>
>> IF okToFree THEN ReturnPair(args) END;
>> RETURN result
>> END
>>
>> and this does work well. In fact it speeds up the Linux
>> implementation
>> by almost 100% to recycle the lists like this *just* for the
>> evaluation of Scheme primitives.
>>
>> But it's still ugly, isn't it? There's a mutex, and a global
>> variable. And yes, the time spent messing with the mutex is
>> noticeable, and I haven't even made the code multi-threaded yet
>> (and that is coming!)
>>
>> So I'm thinking, what I really want is a structure that is attached
>> to my current Thread.T. I want to be able to access just a single
>> pointer (like the free list) but be sure it is unique to my current
>> thread. No locking would be necessary if I could do this.
>>
>> Does anyone have an elegant solution that does something like this?
>> Thread-specific "static" variables? Just one REFANY would be enough
>> for a lot of uses... seems to me this should be a frequently
>> occurring problem?
>>
>> Best regards,
>> Mika
>>
>>
>>
>>
>>
>>
More information about the M3devel
mailing list