[M3devel] This disgusting TEXT business
Darko
darko at darko.org
Wed Dec 24 01:20:27 CET 2008
I agree with your list of problems, but you don't need a cartesian
product of operations for string operations between multiple
representations. All you need is iteration through the characters of a
string and concatenation with another string for each representation.
The object interface I proposed is little more than a wrapper for the
existing Text interface plus the iteration method. The getData and
setData methods in that object provide an interface to external APIs.
On 24/12/2008, at 8:22 AM, Rodney M. Bates wrote:
> I hear three problems with CM3 TEXT:
>
> 1) WIDECHAR and the TEXT implementation won't handle Unicode values
> that
> exceed 216-1.
>
> 2) The CM3 TEXT implementation has serious inefficiencies in at
> least some
> realistic cases.
>
> 3) We want some kind of compatibility with other software.
>
> As for 2), the implementation could be greatly improved without
> changing
> the existing TEXT abstraction. It could even be improved a lot
> without
> changing the existing data structures, only the algorithms.
>
> Back in March, I conjectured about improving get_chars for a
> concatenation
> by recursing on only one side and iterating on the other, but I
> never did
> anything about it. Text.Cat could flatten concatenations that are
> short
> into one of the atomic representations. It could perhaps do a two-
> level
> rebalance of concatenation trees. All these would help at least
> somewhat
> with the performance consequences of building a string one character
> at
> a time by concatenation.
>
> And of course, ignoring the needs of those who have violated the TEXT
> abstraction, we could come up with entirely new internal
> representations.
> We could even just add some new ones and support mixtures of both
> the old
> and new ones. This would really just be an expansion of what we
> already
> have, which has several representations, with all the operations
> dynamically
> checking and adapting to them.
>
> On the other hand, I do know from hard experience, that the
> implementation
> size and complexity go up surprisingly as the number of alternative
> representations goes up. It's really a sort of cartesion product of
> operations, operands for each operation, and representations for each
> operand.
>
> Something that has not been mentioned is the _space_ overhead of the
> existing concatenation scheme. Proponents of the pure or nearly-pure
> OO language design philosophy (e.g. Smalltalk, Java) seem pretty
> oblivious to how much you lose with lots of separately-allocated
> heap objects connected by pointers.
>
> Aside from the extra pointers, there is allocator/GC overhead
> per object and fragmentation loss. Then there's loss of
> reference locality, leading to bigger working sets, which spills
> back into time overhead. And with heap-allocated open arrays,
> there are additional 1+NoOfDimensions words as well. So I am
> strongly in favor of cutting down on the number of separate objects,
> wherever reasonable.
>
> Rodney Bates
More information about the M3devel
mailing list