[M3devel] This disgusting TEXT business

Wed Dec 24 01:20:27 CET 2008

I agree with your list of problems, but you don't need a Cartesian
product of operations to handle strings in multiple representations.
All you need, for each representation, is iteration through the
characters of a string and concatenation with another string.
The object interface I proposed is little more than a wrapper for the  
existing Text interface plus the iteration method. The getData and  
setData methods in that object provide an interface to external APIs.
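
A minimal sketch of that idea, in Python rather than Modula-3 and with
every name here (TextRep, Latin1Text, WideText, chars, cat, equal,
length) hypothetical rather than taken from the CM3 libraries. Each
representation supplies only iteration and concatenation; operations
such as comparison are written once against the iterator.

```python
from abc import ABC, abstractmethod
from itertools import zip_longest
from typing import Iterator


class TextRep(ABC):
    """Hypothetical base: a representation supplies only these two methods."""

    @abstractmethod
    def chars(self) -> Iterator[str]:
        """Yield the characters of the string in order."""

    @abstractmethod
    def cat(self, other: "TextRep") -> "TextRep":
        """Return this string followed by `other`, whatever its representation."""


class Latin1Text(TextRep):
    """Characters stored one byte each."""

    def __init__(self, data: bytes) -> None:
        self.data = data

    def chars(self) -> Iterator[str]:
        return (chr(b) for b in self.data)

    def cat(self, other: TextRep) -> TextRep:
        tail = "".join(other.chars())
        try:
            # Stay in the narrow representation when every character fits.
            return Latin1Text(self.data + tail.encode("latin-1"))
        except UnicodeEncodeError:
            # Otherwise widen the result.
            return WideText(self.data.decode("latin-1") + tail)


class WideText(TextRep):
    """Characters stored as a wide string."""

    def __init__(self, data: str) -> None:
        self.data = data

    def chars(self) -> Iterator[str]:
        return iter(self.data)

    def cat(self, other: TextRep) -> TextRep:
        return WideText(self.data + "".join(other.chars()))


def equal(a: TextRep, b: TextRep) -> bool:
    """Written once; works for any pair of representations."""
    missing = object()  # marks the end of the shorter string
    pairs = zip_longest(a.chars(), b.chars(), fillvalue=missing)
    return all(x == y for x, y in pairs)


def length(t: TextRep) -> int:
    """Also written once, purely in terms of iteration."""
    return sum(1 for _ in t.chars())
```

With N representations this costs N implementations of chars and cat,
and one implementation of each generic operation, instead of one per
pair of representations. The price is that generic operations go
through the iterator rather than a representation-specific fast path.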

On 24/12/2008, at 8:22 AM, Rodney M. Bates wrote:

> I hear three problems with CM3 TEXT:
>
> 1) WIDECHAR and the TEXT implementation won't handle Unicode values that
>   exceed 2^16-1.
>
> 2) The CM3 TEXT implementation has serious inefficiencies in at least
>   some realistic cases.
>
> 3) We want some kind of compatibility with other software.
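
To make problem 1 concrete, here is a small Python check, assuming a
16-bit character unit as the 2^16-1 limit implies. The helper name
utf16_units is hypothetical. Code points up to U+FFFF fit in one
16-bit unit; anything above needs two (a UTF-16 surrogate pair), so
it cannot be held in a single 16-bit character.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit units needed to encode one code point in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


# U+FFFF is 2^16-1, the largest value one 16-bit unit can hold.
assert utf16_units("A") == 1
assert utf16_units("\uffff") == 1

# The first code point past that limit needs a surrogate pair.
assert utf16_units("\U00010000") == 2
```
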
>
> As for 2), the implementation could be greatly improved without changing
> the existing TEXT abstraction.  It could even be improved a lot without
> changing the existing data structures, only the algorithms.
>
> Back in March, I conjectured about improving get_chars for a concatenation
> by recursing on only one side and iterating on the other, but I never did
> anything about it.  Text.Cat could flatten concatenations that are short
> into one of the atomic representations.  It could perhaps do a two-level
> rebalance of concatenation trees.  All these would help at least somewhat
> with the performance consequences of building a string one character at
> a time by concatenation.
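
A rough sketch of the first two suggestions, in Python and not based
on the actual CM3 sources. The Flat and Cat classes, the FLATTEN_LIMIT
threshold, and the function names are all assumptions made for
illustration. get_chars recurses only into right children and loops
down the left spine, so a string built by appending one character at
a time (a deep left-leaning tree) needs constant recursion depth. cat
flattens short results into a single leaf.

```python
FLATTEN_LIMIT = 32  # assumed threshold, chosen arbitrarily for the sketch


class Flat:
    """Atomic representation: the characters stored contiguously."""

    def __init__(self, s: str) -> None:
        self.s = s
        self.length = len(s)


class Cat:
    """Concatenation node holding references to two subtrees."""

    def __init__(self, left, right) -> None:
        self.left = left
        self.right = right
        self.length = left.length + right.length


def _fill(t, out: list, start: int) -> None:
    """Write the characters of t into out[start : start + t.length]."""
    while isinstance(t, Cat):
        # Recurse on the right side only; its offset follows from the
        # cached length of the left side.
        _fill(t.right, out, start + t.left.length)
        # Iterate, rather than recurse, into the left side.
        t = t.left
    out[start:start + t.length] = t.s


def get_chars(t) -> str:
    """Return all characters of t as one flat string."""
    out = [""] * t.length
    _fill(t, out, 0)
    return "".join(out)


def cat(a, b):
    """Concatenate, flattening short results into one atomic leaf."""
    if a.length == 0:
        return b
    if b.length == 0:
        return a
    if a.length + b.length <= FLATTEN_LIMIT:
        return Flat(get_chars(a) + get_chars(b))
    return Cat(a, b)
```

A right-leaning tree would still recurse deeply here; choosing which
side to recurse on by subtree size, or the two-level rebalance, would
be the natural next step, and neither is shown.
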
>
> And of course, ignoring the needs of those who have violated the TEXT
> abstraction, we could come up with entirely new internal representations.
> We could even just add some new ones and support mixtures of both the old
> and new ones.  This would really just be an expansion of what we already
> have, which has several representations, with all the operations dynamically
> checking and adapting to them.
>
> On the other hand, I do know from hard experience that the implementation
> size and complexity go up surprisingly as the number of alternative
> representations goes up.  It's really a sort of Cartesian product of
> operations, operands for each operation, and representations for each
> operand.
>
> Something that has not been mentioned is the _space_ overhead of the
> existing concatenation scheme.  Proponents of the pure or nearly-pure
> OO language design philosophy (e.g. Smalltalk, Java) seem pretty
> oblivious to how much you lose with lots of separately-allocated
> heap objects connected by pointers.
>
> Aside from the extra pointers, there is allocator/GC overhead
> per object and fragmentation loss.  Then there's loss of
> reference locality, leading to bigger working sets, which spills
> back into time overhead.  And with heap-allocated open arrays,
> there are additional 1+NoOfDimensions words as well.  So I am
> strongly in favor of cutting down on the number of separate objects,
> wherever reasonable.
>
> Rodney Bates