<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style></head>
<body class='hmmessage'><div dir='ltr'>
> It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().<br><BR>I don't quite agree.<BR>There are two ideal approaches.<BR>1)<BR> TEXT is like ARRAY OF CHAR, with no values over 0xFF (or maybe even 0x7F) <BR> "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR <BR> <BR> <BR>2) something that can change between them, or possibly store both, but is still mainly flat arrays<BR>That is, once you store a value over 0xFF, the internal representation changes to a flat array of WIDECHAR.<BR>Probably it stays that way -- you don't want to thrash back and forth in the worst case.<BR>The lesser evil is probably to stick with the wide representation.<BR>Setting the string to empty might bounce it back to narrow.<BR>Ditto assigning it from another narrow text, maybe.<BR> <BR> <BR> <BR>What I don't yet understand in all this is how to efficiently combine thread safety, immutability, and quadratic growth.<BR> <BR>The following should be as efficient as in typical C++ libraries:<BR> <BR> <BR>VAR a: TEXT := "";<BR>WHILE TRUE DO<BR> a := a & " ";<BR>END;<BR> <BR> <BR>I kind of think that immutability and quadratic growth are in conflict.<BR>But not because that sounds obvious.<BR>Note that typical C++ libraries do have value semantics for std::string and std::vector.<BR> <BR> <BR> - Jay<BR> <BR> <BR><div><div id="SkyDrivePlaceholder"></div>> From: dragisha@m3w.org<br>> Date: Wed, 27 Jun 2012 11:52:53 +0200<br>> To: mika@async.caltech.edu<br>> CC: m3devel@elegosoft.com<br>> Subject: Re: [M3devel] Windows, Unicode file names<br>> <br>> It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().<br>> <br>> What we need is to make the compiler map from the input encoding (whatever the user chooses or is chosen for him) to internal UTF8.<br>> <br>> On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote:<br>> <br>> > <br>> > As far as I know, SRC M3 and PM3 come with a TEXT implementation that<br>> > works exactly as 
described below. An extra byte is used at the end with<br>> > a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters.<br>> > <br>> > One of the big advantages of the old version is that Text.Hash is really,<br>> > really fast. Especially on Alphas... it's hugely more expensive to<br>> > have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under<br>> > CM3 than under the old compilers and runtimes. We're talking a factor<br>> > of five or so in speed since the Table routines are generally entirely<br>> > dominated by Text.Hash.<br>> > <br>> > Mika<br>> > <br>> > Hendrik Boom writes:<br>> >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:<br>> >>> <br>> >>> Somewhat but not fully. Text.Length should fetch a stored length. As <br>> >>> I'm sure it already does. That length should always be correctly <br>> >>> maintained. Same as today. Adding one extra nul at the end doesn't <br>> >>> invalidate the data. std::string has the same properties -- c_str() can <br>> >>> on-demand append a terminal nul, but there could also be one in the <br>> >>> string itself. I understand it is a bit weird. Maintaining a terminal <br>> >>> nul does add cost that might be wasted. And reduces the capacity by <br>> >>> one. It could be on-demand, I guess. - Jay<br>> >> <br>> >> Don't need the 'on demand'. For the benefit of C interoperability, the <br>> >> extra byte is well worth the price. What I'm worrying about is someone <br>> >> using an embedded NUL as an end-of-string marker. I smell more bugs <br>> >> creeping in. But I guess bugs are inherent in C use, so I'm not <br>> >> surprised seeing them in C interoperation.<br>> >> <br>> >> -- hendrik<br>> <br></div> </div></body>
</html>