[M3devel] Windows, Unicode file names

Jay K jay.krell at cornell.edu
Wed Jun 27 12:19:08 CEST 2012


 > It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF-8 encoded, using SRC M3 Text.Hash().
I don't quite agree. There are two ideal approaches.

1) TEXT is like ARRAY OF CHAR, with no values over 0xFF (or maybe even
   0x7F), and "WIDETEXT" is like ARRAY OF WIDECHAR, for 16-bit or 32-bit
   WIDECHAR.

2) Something that can change between them, or possibly store both, but
   is still mainly flat arrays. That is, once you store a value over 0xFF,
   the internal representation changes to a flat array of WIDECHAR.
   Probably it stays that way; you don't want to thrash back and forth in
   the worst case. The lesser evil is probably to stick with the wide
   representation. Setting the string to empty might bounce it back to
   narrow; ditto, maybe, assigning it from another narrow text.

What I don't yet understand in all this is how to efficiently combine
thread safety, immutability, and geometric (e.g. doubling) capacity
growth. The following should be as efficient as in typical C++ libraries:

  VAR a: TEXT := "";
  WHILE TRUE DO
    a := a & " ";
  END;

I kind of think that immutability and geometric growth are in conflict,
but not just because that sounds obvious. Note that typical C++ libraries
do give std::string and std::vector value semantics.

 - Jay

> From: dragisha at m3w.org
> Date: Wed, 27 Jun 2012 11:52:53 +0200
> To: mika at async.caltech.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
> 
> It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF-8 encoded, using SRC M3 Text.Hash().
> 
> What we need is to make the compiler map from the input encoding (whatever the user chooses, or is chosen for him) to internal UTF-8.
> 
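Dragisha's proposal above has the compiler translate whatever encoding the source uses into UTF-8 held in a plain ARRAY OF CHAR. A minimal sketch of that translation step, written in C++ rather than Modula-3, assuming Latin-1 (ISO 8859-1) as the input encoding (the thread does not name one). The function names `AppendUtf8` and `Latin1ToUtf8` are hypothetical and not part of any M3 library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one Unicode code point
// to out. Returns false (and appends nothing) for UTF-16 surrogates and for
// values above U+10FFFF, which have no UTF-8 encoding.
bool AppendUtf8(std::string& out, char32_t cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);                          // 0xxxxxxx
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
    out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
  } else if (cp >= 0xD800 && cp <= 0xDFFF) {
    return false;                                          // surrogates
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp <= 0x10FFFF) {
    out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    return false;                                          // out of range
  }
  return true;
}

// Hypothetical converter for one possible input encoding: Latin-1.
// Every Latin-1 byte value is already its own Unicode code point.
std::string Latin1ToUtf8(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in) AppendUtf8(out, c);  // never fails for Latin-1
  return out;
}
```

Latin-1 is the easy case because no decoding table is needed; other input encodings would first have to be decoded to code points before this step.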
> On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote:
> 
> > 
> > As far as I know, SRC M3 and PM3 come with a TEXT implementation that
> > works exactly as described below.  An extra byte is used at the end with
> > a character VAL(0,CHAR).  The Texts are simply arrays of 8-bit characters.
> > 
> > One of the big advantages of the old version is that Text.Hash is really,
> > really fast.  Especially on Alphas...  it's hugely more expensive to
> > have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under
> > CM3 than under the old compilers and runtimes.  We're talking a factor
> > of five or so in speed since the Table routines are generally entirely
> > dominated by Text.Hash.
> > 
> >    Mika
> > 
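One plausible reading of Mika's point is that a text stored as one flat array of 8-bit characters lets a hash function run as a single tight loop over contiguous memory. As an illustration only, the sketch below uses 32-bit FNV-1a; it is a stand-in chosen for brevity and is not the algorithm SRC M3's Text.Hash uses. `FlatHash` and `FlatTextHash` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// 32-bit FNV-1a over a flat, contiguous array of 8-bit chars. A stand-in
// for illustration only; it is NOT the algorithm SRC M3's Text.Hash uses.
std::uint32_t FlatHash(const char* bytes, std::size_t len) {
  std::uint32_t h = 2166136261u;                // FNV-1a 32-bit offset basis
  for (std::size_t i = 0; i < len; ++i) {
    h ^= static_cast<unsigned char>(bytes[i]);  // mix in one byte
    h *= 16777619u;                             // FNV 32-bit prime
  }
  return h;
}

// Hypothetical hasher so a std::unordered_map can be keyed on flat texts.
struct FlatTextHash {
  std::size_t operator()(const std::string& s) const {
    return FlatHash(s.data(), s.size());
  }
};
```

For example, `std::unordered_map<std::string, int, FlatTextHash>` keys a table on such texts. The loop reads exactly `len` bytes, so it relies on a stored length rather than on the trailing NUL.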
> > Hendrik Boom writes:
> >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
> >>> 
> >>> Somewhat, but not fully. Text.Length should fetch a stored length,
> >>> as I'm sure it already does. That length should always be correctly
> >>> maintained, same as today. Adding one extra NUL at the end doesn't
> >>> invalidate the data. std::string has the same properties: c_str() can
> >>> append a terminal NUL on demand, but there could also be one in the
> >>> string itself. I understand it is a bit weird. Maintaining a terminal
> >>> NUL does add cost that might be wasted, and reduces the capacity by
> >>> one. It could be on demand, I guess.
> >>>
> >>>    - Jay
> >> 
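Jay's description above (a stored length that is always maintained, plus one extra NUL after the data) can be sketched as follows. This is a hypothetical C++ class, `FlatText`, under the assumption of 8-bit characters; it is not CM3's actual TEXT representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical flat 8-bit text: an explicitly stored length plus one extra
// NUL byte after the data, so the buffer can be passed to C unchanged.
class FlatText {
 public:
  FlatText(const char* data, std::size_t len)
      : len_(len), buf_(new char[len + 1]) {
    if (len > 0) std::memcpy(buf_.get(), data, len);
    buf_[len] = '\0';  // terminator; not counted in len_
  }

  // O(1): returns the stored length, never scans for a NUL.
  std::size_t Length() const { return len_; }

  // Always NUL-terminated, but C code stops at the FIRST NUL, which may be
  // an embedded one inside the data.
  const char* CStr() const { return buf_.get(); }

 private:
  std::size_t len_;
  std::unique_ptr<char[]> buf_;
};
```

For example, `FlatText t("a\0b", 3)` reports `Length() == 3`, while `std::strlen(t.CStr())` returns 1: the embedded-NUL hazard Hendrik raises in his reply.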
> >> Don't need the 'on demand'.  For the benefit of C interoperability, the 
> >> extra byte is well worth the price.  What I'm worrying about is someone 
> >> using an embedded NUL as an end-of-string marker.  I smell more bugs 
> >> creeping in.  But I guess bugs are inherent in C use, so I'm not 
> >> surprised seeing them in C interoperation.
> >> 
> >> -- hendrik
> 
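Jay's second approach at the top of this message (a flat array that switches to wide characters once a value above 0xFF is stored), together with his append loop, can be sketched as a mutable buffer. This is a hypothetical C++ class, `AdaptiveText`, assuming 32-bit wide characters; it is not an existing M3 or CM3 interface. Because it is mutable and unsynchronized, it sidesteps rather than answers the immutability and thread-safety question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of approach 2: stores 8-bit chars until a value above
// 0xFF is appended, then stays wide (no thrashing) until Clear().
class AdaptiveText {
 public:
  void Append(char32_t c) {
    if (!wide_ && c > 0xFF) Widen();
    if (wide_) {
      wide_chars_.push_back(c);  // amortized O(1): geometric capacity growth
    } else {
      narrow_chars_.push_back(static_cast<unsigned char>(c));
    }
  }

  // Setting the text to empty bounces it back to the narrow form.
  void Clear() {
    narrow_chars_.clear();
    wide_chars_.clear();
    wide_ = false;
  }

  bool IsWide() const { return wide_; }

  std::size_t Length() const {
    return wide_ ? wide_chars_.size() : narrow_chars_.size();
  }

  char32_t At(std::size_t i) const {
    assert(i < Length());
    return wide_ ? wide_chars_[i] : narrow_chars_[i];
  }

 private:
  // One-time O(n) copy of the narrow chars into the wide array.
  void Widen() {
    wide_chars_.assign(narrow_chars_.begin(), narrow_chars_.end());
    narrow_chars_.clear();
    wide_ = true;
  }

  bool wide_ = false;
  std::vector<unsigned char> narrow_chars_;
  std::vector<char32_t> wide_chars_;
};
```

Appending through `std::vector::push_back` is amortized constant time because the vector grows its capacity geometrically, so n appends cost O(n) in total. By contrast, if each `a := a & " "` on an immutable TEXT copies the whole string, the loop costs O(n²) in total.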
 		 	   		  