[M3devel] a text proposal??

Jay jay.krell at cornell.edu
Wed Dec 24 09:00:39 CET 2008


NOTE: I am not a vehement advocate of what follows.
I'm just floating a trial balloon.
Doesn't this sound fairly OK and reasonable?
 

I propose the following single, parameterized representation:
 

  a block of memory
  a stored allocated size, probably in bytes, though it could instead be counted in characters (i.e. "length in chars" vs. "length in bytes")
  a stored character size, in bytes, which shall be 1, 2, or 4
  a stored length, probably in bytes, though it could likewise be counted in characters
  possibly a stored maximum character value
    This is too expensive to maintain in general.
    It could be "advisory".
    However, it is also somewhat redundant with the character size.
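
A minimal sketch of the fields listed above, assuming little-endian storage and a length counted in characters. Python stands in for Modula-3 only to keep the example short and runnable; the names Text, char_at and text_from_code_points are hypothetical, not taken from the Modula-3 sources.

```python
from dataclasses import dataclass


@dataclass
class Text:
    """Hypothetical flat text: one buffer plus its size fields."""
    char_size: int       # bytes per character: 1, 2, or 4
    length: int          # length in characters (could be stored in bytes instead)
    data: bytearray      # the block; len(data) is the allocated size in bytes
    max_char: int = -1   # advisory maximum character value, -1 if unknown

    @property
    def allocated_chars(self) -> int:
        # Allocated size expressed in characters rather than bytes.
        return len(self.data) // self.char_size

    def char_at(self, i: int) -> int:
        if not 0 <= i < self.length:
            raise IndexError(i)
        off = i * self.char_size
        return int.from_bytes(self.data[off:off + self.char_size], "little")


def text_from_code_points(cps) -> Text:
    """Build a Text using the narrowest character size that holds every code point."""
    cps = list(cps)
    hi = max(cps, default=0)
    size = 1 if hi <= 0xFF else 2 if hi <= 0xFFFF else 4
    data = bytearray()
    for cp in cps:
        data += cp.to_bytes(size, "little")  # unused high bytes are zero
    return Text(size, len(cps), data, hi)
```

Picking the narrowest size that fits is one possible policy; the proposal itself only requires the size to be 1, 2, or 4.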
  

There shall be one implementation, though part of it shall be
in a generic module, parameterized by character size.
(Yeah, yeah: parameterized by whatever Modula-3 allows.)
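
One way to picture "one implementation, parameterized by character size" is a single class whose storage type is chosen from the width, roughly as a generic module would be instantiated three times. This Python sketch uses the standard array module; CharBuffer is a hypothetical name, and since array item sizes are platform-dependent the helper searches for the unsigned typecode of the right size.

```python
import array


def _unsigned_typecode(char_size: int) -> str:
    # array item sizes vary by platform, so look for the unsigned code that matches.
    for code in ("B", "H", "I", "L"):
        if array.array(code).itemsize == char_size:
            return code
    raise ValueError(f"no unsigned array type of size {char_size}")


class CharBuffer:
    """One implementation; only the storage width depends on char_size."""

    def __init__(self, char_size: int, code_points=()):
        if char_size not in (1, 2, 4):
            raise ValueError("character size must be 1, 2, or 4")
        self.char_size = char_size
        # Raises OverflowError if a code point does not fit the chosen width.
        self.chars = array.array(_unsigned_typecode(char_size), code_points)

    def __len__(self) -> int:
        return len(self.chars)

    def __getitem__(self, i: int) -> int:
        return self.chars[i]


# Three "instantiations" sharing all of the code above.
narrow = CharBuffer(1, [72, 105])
wide = CharBuffer(2, [0x3042])
widest = CharBuffer(4, [0x1F600])
```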

Appending a string of character size n to a string of character size m
shall produce a string of character size max(n, m).

String append shall always produce a flat array,
never a linked list.

Strings are never stored "encoded" (no variable-width encoding internally).
Characters shall only ever be zero-extended.
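
Zero extension matters as soon as a 1-byte text is widened. In this hypothetical widen helper (little-endian, 1-byte source assumed), the Latin-1 byte 0xE9 ("é") becomes 0x000000E9; sign extension would instead give 0xFFFFFFE9, which is not a valid code point.

```python
def widen(raw: bytes, new_size: int) -> bytes:
    """Zero-extend each 1-byte character in raw to new_size bytes (little-endian)."""
    out = bytearray()
    for b in raw:  # iterating bytes yields ints 0..255, never negative
        out += b.to_bytes(new_size, "little")
    return bytes(out)


latin1 = b"\xe9"          # "é" in Latin-1
wide = widen(latin1, 4)    # b"\xe9\x00\x00\x00", i.e. U+00E9
# For contrast, sign extension would produce 0xFFFFFFE9:
sign_extended = int.from_bytes(latin1, "little", signed=True) & 0xFFFFFFFF
```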

Maybe there shall be functions to create strings from encoded strings,
e.g. from UTF-8 and from UTF-16, whose surrogate pairs encode
code points above U+FFFF (20 bits of payload per pair),
and functions to encode strings.
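
For UTF-8, Python's built-in bytes.decode("utf-8") already yields code points. For UTF-16 the surrogate-pair arithmetic is shown explicitly below; decode_utf16le and char_size_for are hypothetical helpers, and unpaired surrogates are simply passed through, which a real implementation might instead reject.

```python
def decode_utf16le(data: bytes) -> list:
    """Decode UTF-16LE into code points, combining surrogate pairs."""
    units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
    cps, i = [], 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # High surrogate supplies 10 bits, low surrogate 10 bits, offset by 0x10000.
            cps.append(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00))
            i += 2
        else:
            cps.append(u)  # BMP character, or an unpaired surrogate passed through
            i += 1
    return cps


def char_size_for(cps) -> int:
    """Narrowest of 1, 2, 4 bytes that holds every code point."""
    hi = max(cps, default=0)
    return 1 if hi <= 0xFF else 2 if hi <= 0xFFFF else 4
```

For example, decoding b"h\xc3\xa9" as UTF-8 gives the code points 0x68 and 0xE9, which fit in a 1-byte text.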
 

Concatenation shall grow the allocated size by at least a constant
factor greater than one: possibly 2, possibly smaller.
If that allocation fails, fall back to allocating only what is needed.
[This probably makes no sense, given text immutability and the fact that
concat produces new texts. But it might make sense if "text"
is a value type.]
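
A sketch of geometric growth with the fallback described above, assuming a Python bytearray buffer. The grow function is hypothetical, and its alloc parameter exists only so an allocation failure (MemoryError) can be simulated.

```python
def grow(buf: bytearray, needed: int, factor: float = 2.0, alloc=bytearray) -> bytearray:
    """Return a buffer of at least `needed` bytes that starts with buf's contents.

    Tries to grow geometrically by `factor`; if that allocation fails,
    falls back to allocating exactly `needed` bytes.
    """
    if needed <= len(buf):
        return buf
    try:
        new = alloc(max(needed, int(len(buf) * factor)))
    except MemoryError:
        new = alloc(needed)  # fallback: only what is needed
    new[:len(buf)] = buf
    return new
```

With a factor of 2, appending 1000 single characters to a 1-byte buffer reallocates only 10 times (capacities 2, 4, ..., 1024), which is what makes repeated appends amortized O(1) each.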

 
Concatenation of n texts shall be a primitive, not just 2 texts.
[It looks like this is already the case.]
 
 
I think TEXTs are considered immutable.
Over-allocating for later in-place concatenation doesn't, I believe, contradict that.
However, perhaps I misspoke: maybe the allocated size is stored with the buffer pointer,
and the text contains that pointer, a length, and maybe an offset.
But "internal pointers" are OK, and therefore an offset isn't needed, right?
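
A sketch of the "size stored with the pointer" variant, assuming 1-byte characters, a single thread, and hypothetical Buffer, Text and make_text names. Texts share a buffer that records how many bytes have been claimed; a text may extend in place only when it ends exactly at that mark, so no existing text ever sees its contents change. Python has no interior pointers, so an explicit offset is used here.

```python
class Buffer:
    """Shared storage: the allocated size lives here, not in each text."""

    def __init__(self, capacity: int):
        self.data = bytearray(capacity)
        self.used = 0  # bytes already claimed by some text


class Text:
    """Immutable view of 1-byte characters: buffer, offset, length."""

    def __init__(self, buf: Buffer, offset: int, length: int):
        self.buf, self.offset, self.length = buf, offset, length

    def value(self) -> bytes:
        return bytes(self.buf.data[self.offset:self.offset + self.length])

    def append(self, suffix: bytes) -> "Text":
        b, end = self.buf, self.offset + self.length
        if end == b.used and end + len(suffix) <= len(b.data):
            # No text has claimed the bytes past our end: extend in place.
            b.data[end:end + len(suffix)] = suffix
            b.used += len(suffix)
            return Text(b, self.offset, self.length + len(suffix))
        # Otherwise copy into a new buffer with room to grow.
        n = self.length + len(suffix)
        nb = Buffer(2 * n)
        nb.data[:n] = self.value() + suffix
        nb.used = n
        return Text(nb, 0, n)


def make_text(s: bytes, capacity: int) -> Text:
    b = Buffer(max(capacity, len(s)))
    b.data[:len(s)] = s
    b.used = len(s)
    return Text(b, 0, len(s))
```

Appending to the same text twice exercises both paths: the first append extends the shared buffer in place, the second must copy because the original text no longer ends at the claimed mark. Concurrent appends would need the claim on "used" to be atomic.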

 
Cop-out rationale:
  Anyone who needs O(1) concatenation of very long strings is probably using the wrong type.
 

Since texts are immutable, it may be viable, or even cheap, to let a text keep
several representations around at once, possibly held through "weak refs".
 

That is, a string shall probably have one or more functions to get at the raw chars.
Either the user has to specify the char size he wants, and gets back null if it is the wrong size,
or perhaps, if the user asks for a larger size than is needed, the text realizes that representation
on demand? And then it sticks around, in case another client soon thereafter asks
for the same wider-than-necessary representation?
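
A sketch of the on-demand variant: get_chars returns null (None) when asked for a size narrower than the text's own, otherwise it zero-extends the native representation and caches the result. The names and the little-endian byte order are assumptions. The cache holds strong references because Python bytes objects cannot be weakly referenced; a weak reference, as floated earlier, would let the collector reclaim unused representations.

```python
def _widen(raw: bytes, src: int, dst: int) -> bytes:
    # Zero-extend each src-byte character to dst bytes (little-endian assumed).
    out = bytearray()
    for i in range(0, len(raw), src):
        out += int.from_bytes(raw[i:i + src], "little").to_bytes(dst, "little")
    return bytes(out)


class Text:
    def __init__(self, raw: bytes, char_size: int):
        self.char_size = char_size
        self._reps = {char_size: raw}  # native form plus any realized wider forms

    def get_chars(self, want: int):
        """Raw characters at width `want`, or None if `want` is too narrow."""
        if want not in (1, 2, 4):
            raise ValueError("character size must be 1, 2, or 4")
        if want < self.char_size:
            return None
        rep = self._reps.get(want)
        if rep is None:
            rep = _widen(self._reps[self.char_size], self.char_size, want)
            self._reps[want] = rep  # keep it for the next caller that asks
        return rep
```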

 
The first-order implementation can be simpler, though.
It can either return null for all but the current representation, or realize and
return, but not hold onto, whatever representation is requested.
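
The two first-order alternatives, sketched with hypothetical helper names and the same assumed conventions (raw little-endian bytes plus a character size): one returns only the current representation, the other builds a wider copy on request and keeps nothing.

```python
def get_chars_current_only(raw: bytes, char_size: int, want: int):
    """Option 1: only the current representation is ever returned."""
    return raw if want == char_size else None


def get_chars_uncached(raw: bytes, char_size: int, want: int):
    """Option 2: realize a wider copy on request, without keeping it."""
    if want < char_size:
        return None
    if want == char_size:
        return raw
    out = bytearray()
    for i in range(0, len(raw), char_size):
        # Zero-extend each character (little-endian assumed).
        out += int.from_bytes(raw[i:i + char_size], "little").to_bytes(want, "little")
    return bytes(out)
```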

 
Note that I'm not familiar with the code (yet), so this might be partly or totally nonsense.
I'm particularly uncertain about the ability to double the allocation size as individual chars are appended to a text.
 
Normalization issues remain punted.
They already exist.
Functions could be provided to convert to composed form, decomposed form, etc.
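
One way normalization interacts with this proposal: it can change a text's character size. Using Python's standard unicodedata.normalize, composed "é" (U+00E9) fits in 1 byte per character, while its decomposed form, "e" plus U+0301 COMBINING ACUTE ACCENT, needs 2. The char_size helper is hypothetical.

```python
import unicodedata


def char_size(s: str) -> int:
    """Narrowest of 1, 2, 4 bytes per character that holds every code point of s."""
    hi = max(map(ord, s), default=0)
    return 1 if hi <= 0xFF else 2 if hi <= 0xFFFF else 4


composed = "\u00e9"                                    # é as one code point
decomposed = unicodedata.normalize("NFD", composed)   # "e" + U+0301
recomposed = unicodedata.normalize("NFC", decomposed)
```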
 

 - Jay

