[M3devel] Semantics of HasWideChars

Rodney M. Bates rodney.bates at wichita.edu
Thu Jan 29 22:43:37 CET 2009


CM3's Text.HasWideChars, as implemented, has some strange behavior.
From Text.i3:

PROCEDURE HasWideChars (t: T): BOOLEAN;
(* Returns "TRUE" if "t" contains any "WIDECHAR" characters. *)

This reads to me like it refers to the actual characters comprising t,
which also makes sense to a client of this interface.

In fact, the implementation returns TRUE if the internal representation
used is capable of representing any wide characters, even though they
could all be narrow.

Some examples:

Text.HasWideChars(W"ABC") = TRUE

Text.HasWideChars(W"ABC"&"def") = TRUE

Text.HasWideChars(Text.Sub(W"ABC"&"def",3,3)) = FALSE

Text.HasWideChars(Text.Sub(W"ABC"&"def",2,3)) = TRUE
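
To reproduce the results above, a small test program along these
lines should do.  It is only a sketch: the Show helper and the
labels are made up for illustration, and it assumes the usual IO
and Fmt interfaces from libm3.

  MODULE Main;
  IMPORT Text, IO, Fmt;

  PROCEDURE Show (label: TEXT; t: TEXT) =
    (* Hypothetical helper: print what HasWideChars reports for t. *)
    BEGIN
      IO.Put (label & ": " & Fmt.Bool (Text.HasWideChars (t)) & "\n");
    END Show;

  BEGIN
    Show ("wide literal        ", W"ABC");
    Show ("wide & narrow       ", W"ABC" & "def");
    Show ("Sub (.., 3, 3)      ", Text.Sub (W"ABC" & "def", 3, 3));
    Show ("Sub (.., 2, 3)      ", Text.Sub (W"ABC" & "def", 2, 3));
  END Main.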

These results have nothing to do with the abstract view of TEXT as
a string of characters.  Moreover, a principal reason a client of
Text would call HasWideChars would be to know whether it could
subsequently call, e.g., Text.GetChar without having the value
silently truncated to 8 bits.

So I propose to fix HasWideChars.  This will entail some performance
penalties, as in many cases, it will have to go through the character
values one at a time in a loop and range-check each one.
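
As a rough sketch of what the exact check amounts to, here it is
written as client code on top of the existing interface.  The name
ReallyHasWideChars is hypothetical; a real fix would live inside
the Text implementation and could skip the scan for representations
that cannot hold wide characters at all.

  MODULE Main;
  IMPORT Text, IO, Fmt;

  PROCEDURE ReallyHasWideChars (t: TEXT): BOOLEAN =
    (* Hypothetical helper, not part of Text.i3.  Returns TRUE iff
       some character of t lies outside the range of CHAR. *)
    BEGIN
      FOR i := 0 TO Text.Length (t) - 1 DO
        IF ORD (Text.GetWideChar (t, i)) > ORD (LAST (CHAR)) THEN
          RETURN TRUE;
        END;
      END;
      RETURN FALSE;
    END ReallyHasWideChars;

  BEGIN
    (* Every character of W"ABC" fits in CHAR, so the scan finds
       nothing wide, whatever representation is used internally. *)
    IO.Put (Fmt.Bool (ReallyHasWideChars (W"ABC")) & "\n");
  END Main.

The cost is one pass over the text in the worst case, which is the
performance penalty mentioned above.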

The alternative would be to redefine HasWideChars as a one-sided
approximation (which it really is now), i.e., such that a result
of FALSE means the string is guaranteed to contain no wide characters,
but TRUE only means the real answer is unknown.  Not very useful, and in
that case, it really should be renamed MightHaveWideChars.
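
To illustrate what a client could still do with the one-sided
reading, here is a sketch.  FirstCharCode is a made-up example
procedure; the point is only that FALSE permits the narrow
accessor, while TRUE forces the conservative wide path even when
every character happens to be narrow.

  MODULE Main;
  IMPORT Text, IO, Fmt;

  PROCEDURE FirstCharCode (t: TEXT): INTEGER =
    (* Hypothetical example.  Requires Text.Length (t) > 0. *)
    BEGIN
      IF Text.HasWideChars (t) THEN
        (* Answer unknown: t might hold wide characters. *)
        RETURN ORD (Text.GetWideChar (t, 0));
      ELSE
        (* Guaranteed narrow: GetChar cannot truncate anything. *)
        RETURN ORD (Text.GetChar (t, 0));
      END;
    END FirstCharCode;

  BEGIN
    IO.Put (Fmt.Int (FirstCharCode (W"ABC" & "def")) & "\n");
  END Main.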

What do others think?  Anybody using HasWideChars who would be
affected?

Rodney Bates




