[M3devel] text inefficiency? good mutable string type? arrays?

Olaf Wagner wagner at elegosoft.com
Tue Feb 26 11:45:51 CET 2008


Quoting Jay <jayk123 at hotmail.com>:

> I know this area has been brewing under the surface a while.
> I'll bring it up. :)
>
> I assume texts are read only?

Yes.

> I know lots of systems have read only strings.
> There are pluses and minuses to them. They can be single-instanced.
> Some systems with read only strings have another type, such as
> "StringBuilder" or "StringBuffer".
> So -- I don't have a specific question, but maybe a mutable string
> "class" aka "type" is needed? Not necessarily in the language but in
> m3core or libm3?
> Maybe it's already there?

For string construction you can use TextWr.T.
I wouldn't object to a mutable string type as an extension, though
it won't be easy to design. Unless you are willing to
suggest an interface, we should just add this issue to the
long-term TODO list.

I'd prefer to get everything more stable again first.
The more tests I try to add to the regression tests, the more problems
show up :-/

Olaf

> I just spent a few hours diddling with arrays of chars.
> Unfortunately they are not resizable.
> It was not entirely satisfactory. Besides the array I had to pass
> around a length and a start.
> Wrapping this up in one record TYPE MutableString = ... might be a good idea.
>
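A minimal sketch of the kind of record described above, written in C
only for illustration. The names MutableString, ms_append,
ms_drop_front and ms_free are hypothetical and do not exist in m3core
or libm3. It bundles the buffer with its start and length and grows
the buffer on append, which a fixed array of chars cannot do.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one buffer plus the start and length that
   otherwise have to be passed around separately. */
typedef struct {
    char  *data;   /* heap buffer; NULL while cap == 0 */
    size_t start;  /* index of the first live character */
    size_t len;    /* number of live characters */
    size_t cap;    /* allocated size of data */
} MutableString;

/* Append n chars, growing the buffer when needed. Returns 0 on success,
   -1 if allocation fails (the string is left unchanged). */
int ms_append(MutableString *s, const char *src, size_t n)
{
    size_t need = s->start + s->len + n;
    if (need > s->cap) {
        size_t cap = s->cap ? s->cap : 16;
        while (cap < need) cap *= 2;
        char *p = realloc(s->data, cap);
        if (p == NULL) return -1;
        s->data = p;
        s->cap = cap;
    }
    memcpy(s->data + s->start + s->len, src, n);
    s->len += n;
    return 0;
}

/* Drop n chars from the front by advancing start; nothing is copied. */
void ms_drop_front(MutableString *s, size_t n)
{
    if (n > s->len) n = s->len;
    s->start += n;
    s->len -= n;
}

/* Release the buffer and reset the record to the empty state. */
void ms_free(MutableString *s)
{
    free(s->data);
    s->data = NULL;
    s->start = s->len = s->cap = 0;
}
```

A zero-initialized record (MutableString s = {0};) is a valid empty
string here. The live characters are always data[start .. start+len-1].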
> For more efficient read only access, would it be reasonable for the
> runtime to materialize on-demand 8 bit and 16 bit representations of
> a string if a user calls some new thing like
> Text.GetDirectA (t: TEXT) : REF ARRAY OF CHAR,
> Text.GetDirectW (t: TEXT) : REF ARRAY OF WIDECHAR?
> Throw an exception if the string cannot be represented
> with 8 bit characters? Or use utf8?
>
> Besides, I know I'm a big predictable whiner but I like how this
> works in Windows.
> It may not have been as seamless, but it works and it really doesn't
> tend to break or slow down existing code.
> roughly:
>   "foo" is an 8 bit string of type char* (or const char* or possibly
>   const char[4])
>   L"foo" is a 16 bit string of type wchar_t* (ditto, and aka WCHAR)
>   "L" for "long" aka "wide"
>
> Functions must be written to specifically use one or the other.
> In C++ you can templatize. std::string and std::wstring
> are instantiations of the same template.
> Lots of C functions are duplicated. strcpy => wcscpy, strcat => wcscat, etc.
> And really there's no point in 8 bit strings.
> If you have a blob, that's an array of bytes or something, not characters.
>
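The duplication described above can be sketched in standard C. The
helper names join_narrow and join_wide are made up for this example.
The library calls are the real str*/wcs* pairs. Note that wchar_t is
16 bits on Windows but is commonly 32 bits on other platforms.

```c
#include <string.h>
#include <wchar.h>

/* Narrow version: operates on char and uses the str* family.
   dst must have room for both strings plus the terminator. */
size_t join_narrow(char *dst, const char *a, const char *b)
{
    strcpy(dst, a);
    strcat(dst, b);
    return strlen(dst);
}

/* Wide version: identical logic, but every type and every call
   is replaced by its wide counterpart. */
size_t join_wide(wchar_t *dst, const wchar_t *a, const wchar_t *b)
{
    wcscpy(dst, a);
    wcscat(dst, b);
    return wcslen(dst);
}
```

The two functions differ only in the character type, which is the
part a C++ template abstracts over.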
> It works.
>
> Utf8 is another seemingly popular route but I think it's a hack.
> I think mostly people don't touch their code and say, voila, it's
> utf8, and they only really support the same old English or possibly
> 8 bit characters (some European characters).
> Granted, to some extent, this does work, as long as you don't do
> anything with the string but strlen, strcpy, and some others and
> pass it to code that does treat it correctly. Still, variably sized
> encodings seem like a bad idea here, and 16 bits per character seem
> affordable enough. And yes, I know that Unicode is actually now 21
> bits per character and some characters take two wchar_ts but I try
> to ignore that...
> And I know there is a lot of existing code, but sometimes there is a
> need for progress too...
>
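To illustrate the variable-size point: on UTF-8 data, strlen returns
the number of bytes, not the number of characters. A small C sketch
follows. The function utf8_codepoints is a hypothetical helper, and it
assumes the input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a valid UTF-8 string: every byte that is not
   a continuation byte (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}
```

For the string "caf\xC3\xA9" ("café" encoded as UTF-8), strlen gives 5
while utf8_codepoints gives 4. Code that only copies or concatenates
the bytes is unaffected. Code that indexes by character is not.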
> Ok, maybe NOW I'll look at the cygwin/shobjgen problem. :)
>
>
>  - Jay
>



-- 
Olaf Wagner -- elego Software Solutions GmbH
                Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96  mobile: +49 177 2345 869  fax: +49 30 23 45 86 95
    http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194



