[M3devel] UTF-8 TEXT

Hendrik Boom hendrik at topoi.pooq.com
Wed Jun 27 05:30:01 CEST 2012


On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
> I seem to recall that Rodney did some work a while back relating to TEXT.
> Rodney, can you weigh in on some of this?
> --Randy Coleburn
> 
> From: Dragiša Durić [mailto:dragisha at m3w.org]
> Sent: Tuesday, June 26, 2012 12:46 PM
> To: Jay
> Cc: m3devel
> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
> 
> You had the idea in another message: store the length!
> 
> Another idea: store a partial list of indices of character locations, so that whatever one does, that list can be used or extended. Whatever storage cost this adds is probably minor compared to the idea of a 32-bit WIDECHAR for everything.

Most of the time, you don't need explicit integer indexes to character 
locations.  What you do need is an operation that fetches a character 
given the string and its index (whatever data structure that index is), 
and one that advances the index past that character.  As long as you
can save an index and use it later on the same string, that's probably
all you ever need.  With a simple TEXT representation (such as the
obvious array of bytes holding characters of various widths), a byte
index is all you need (note: NOT a character index).  It's even easy to
use TEXT and its integer indices as the data representation, as long as
you use the proper functions to parse the characters and to advance the
indices by amounts that may differ from 1.
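A minimal sketch of the two operations described above, assuming TEXT is modelled as a plain array of UTF-8 bytes. It is written in Python purely for illustration (it is not Modula-3), and the names char_at and chars_with_indices are hypothetical, not part of any TEXT interface. The index is an ordinary byte offset: char_at decodes the character that starts there and returns the offset just past it, so an index saved earlier can be reused later on the same string.

```python
from typing import Iterator, Tuple

# Hypothetical helpers: TEXT is modelled as Python bytes holding UTF-8.
# Indices are byte offsets, never character counts.

def char_at(text: bytes, i: int) -> Tuple[int, int]:
    """Decode the character starting at byte offset i.

    Returns (code_point, next_offset). Checks lead and continuation
    bytes but, for brevity, not overlong forms or surrogates.
    """
    lead = text[i]
    if lead < 0x80:              # 0xxxxxxx: 1 byte (ASCII)
        return lead, i + 1
    if lead >> 5 == 0b110:       # 110xxxxx: 2-byte sequence
        n, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        n, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        n, cp = 4, lead & 0x07
    else:
        raise ValueError(f"no character starts at byte {i}")
    if i + n > len(text):
        raise ValueError("truncated UTF-8 sequence")
    for b in text[i + 1:i + n]:
        if b >> 6 != 0b10:       # continuation bytes are 10xxxxxx
            raise ValueError("bad continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return cp, i + n

def chars_with_indices(text: bytes) -> Iterator[Tuple[int, str]]:
    """Walk the text, yielding (byte_offset, character) pairs."""
    i = 0
    while i < len(text):
        cp, nxt = char_at(text, i)
        yield i, chr(cp)
        i = nxt                  # advance by 1 to 4 bytes, not always 1

s = "aé€😀".encode("utf-8")
print(list(chars_with_indices(s)))
saved = 3                        # a byte index saved earlier...
print(char_at(s, saved))         # ...still works: prints (8364, 6)
```

For "aé€😀" the characters start at byte offsets 0, 1, 3 and 6, so the index advances by 1, 2, 3 and 4 bytes respectively. Nothing here ever needs a character count.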

And if your source code is encoded in UTF-8, the representation that
requires little extra compiler effort to parse, your TEXT literals will
automagically appear in UTF-8.

I can see a use for various wide character types (the things you extract
from a TEXT by parsing bits of it), but none for anything new and
complicated for wide TEXT.

The only confusing thing is that the existing operations for extracting 
bytes from TEXT have names that suggest they are extracting characters.
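To illustrate the naming confusion, here is another hypothetical Python sketch, again modelling TEXT as UTF-8 bytes (char_count and byte_offset are made-up names, not existing TEXT operations). Byte-level length and byte-level indexing are cheap, but they count and return bytes; getting a character count or locating the k-th character needs a linear scan, which is one reason a byte index is the handier thing to keep.

```python
# Hypothetical helpers; TEXT is modelled as Python bytes holding UTF-8.

def is_char_start(b: int) -> bool:
    # UTF-8 continuation bytes have the form 10xxxxxx; every other byte
    # begins a new character.
    return (b & 0xC0) != 0x80

def char_count(text: bytes) -> int:
    """Number of characters (code points), as opposed to len(text) bytes."""
    return sum(1 for b in text if is_char_start(b))

def byte_offset(text: bytes, k: int) -> int:
    """Byte offset where the k-th character (0-based) starts: an O(n) scan."""
    seen = -1
    for i, b in enumerate(text):
        if is_char_start(b):
            seen += 1
            if seen == k:
                return i
    raise IndexError(k)

s = "naïve".encode("utf-8")
print(len(s), char_count(s))   # prints: 6 5 (bytes vs characters)
print(hex(s[2]))               # prints: 0xc3 (a byte, not 'ï')
print(byte_offset(s, 3))       # prints: 4 (where 'v' starts)
```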

-- Hendrik
