[M3devel] Windows, Unicode file names

Dragiša Durić dragisha at m3w.org
Tue Jun 26 00:55:45 CEST 2012


On Jun 25, 2012, at 11:30 PM, Jay K wrote:

>  >  Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls?
>  
> Yes.
>  
>  > WinNLS does that.
>  
>  
> I doubt that. There is a 32bit to 16bit conversion?

http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx

whatever this means:
12000  utf-32    Unicode UTF-32, little endian byte order; available only to managed applications
12001  utf-32BE  Unicode UTF-32, big endian byte order; available only to managed applications

> Ok, I guess there is. "Surrogate pairs" and all that?
> Maybe not in WinNLS, but easy enough for us to write, in portable C or Modula-3. :)

That too :)

> Part of Text.i3 perhaps.

UTF-32 -> UTF-16? Maybe.
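
For what it's worth, the recoding itself is small. A minimal sketch in Python (chosen only for brevity; the name utf32_to_utf16 is made up for illustration and is not part of Text.i3 or WinNLS): code points up to 0xFFFF map to a single 16-bit unit, anything above becomes a surrogate pair, and values above 0x10FFFF or inside the surrogate range are rejected.

```python
def utf32_to_utf16(code_points):
    """Encode an iterable of Unicode code points as a list of UTF-16 code units.

    Hypothetical helper for illustration only.
    """
    units = []
    for cp in code_points:
        # Reject anything that is not a Unicode scalar value.
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {cp:#x}")
        if cp <= 0xFFFF:
            # Basic Multilingual Plane: one code unit, value unchanged.
            units.append(cp)
        else:
            # Supplementary plane: split the 20-bit offset into two halves.
            v = cp - 0x10000
            units.append(0xD800 | (v >> 10))    # high (lead) surrogate
            units.append(0xDC00 | (v & 0x3FF))  # low (trail) surrogate
    return units
```

Plain narrowing would instead drop the high bits of anything above 0xFFFF, which is why the two operations are not interchangeable.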

>  
>  
> So then, I guess I can sign up for WIDECHAR being 32bits across the board.
>  
>  - Jay
> 
> Subject: Re: [M3devel] Windows, Unicode file names
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 23:09:37 +0200
> CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com
> To: jay.krell at cornell.edu
> 
> 
> On Jun 25, 2012, at 10:17 PM, Jay K wrote:
> 
> I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from
> TEXT to a flat array of either, and if 32bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16bits, call some "W" function, free the flat array.
> The size can, I guess, vary between Win32 and non-Win32 platforms.
> 
> a) If you like to make it as unportable as possible then yes - 16 or 32 is not important.
> b) an invalid value would be one over 0x10FFFF, not 0xFFFF
> c) Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? WinNLS does that. Simple narrowing to 16 bits (similar to what is commented in Text.i3) and recoding from UTF-32 to UTF-16 are very different things.
> d) Size varies, yes.
> 
> Its size should be stored in a global to communicate between Modula-3 and C.
>  
>  
> I'd also quite like it if TEXT were internally represented as a nul-terminated flat array of 8 and/or 16 and/or 32bit quantities, materializing some of them on demand. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize-on-demand other widths.
>  
>  - Jay 
> Subject: Re: [M3devel] Windows, Unicode file names
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 21:48:09 +0200
> CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com
> To: jay.krell at cornell.edu
> 
> It can be what cm3 people had in mind when they created WIDECHAR as a catchall for Unicode.
> 
> At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have a UTF-8 layer and use it when and where needed, to recode our strings to catchall WIDECHAR/WIDETEXT.
> 
> As long as we agree on what exactly WIDECHAR is :)
> ===From wikipedia
> The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed].
> ===
> 
> 
> On Jun 25, 2012, at 9:39 PM, Jay K wrote:
> 
> I think I know what to do here and will look into it..later..
>  
> We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW.
> Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution.
> A layer above this needs to decode UTF8, if that is the encoding.
>  
> Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt.
>  
>  - Jay 
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 21:05:59 +0200
> To: dabenavidesd at yahoo.es
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
> 
> If you cared enough to check FSWin32.m3, the answer would be obvious :).
> 
> Whatever I do with the pathname before I call FS.OpenFile(Readonly), FSWin32.m3 will call CreateFileA. My solution is:
> 
> PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}=
>   VAR
>     handle: WinNT.HANDLE;
>     fname := M3toC.SharedTtoS(p);
>     dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0);
>     pwText: WinBaseTypes.PCWSTR; 
>   BEGIN
>     IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN
>       (* dwNum includes the terminating null character, hence the +1 above. *)
>       handle := WinBase.CreateFile(
>                     lpFileName := fname,
>                     dwDesiredAccess := WinNT.GENERIC_READ,
>                     dwShareMode :=  WinNT.FILE_SHARE_READ,
>                     lpSecurityAttributes := NIL,
>                     dwCreationDisposition := WinBase.OPEN_EXISTING,
>                     dwFlagsAndAttributes := 0,
>                     hTemplateFile := NIL);
>     ELSE
>       pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR);
>       EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum);
>       handle := WinBase.CreateFileW(
>                     lpFileName := pwText,
>                     dwDesiredAccess := WinNT.GENERIC_READ,
>                     dwShareMode := WinNT.FILE_SHARE_READ,
>                     lpSecurityAttributes := NIL,
>                     dwCreationDisposition := WinBase.OPEN_EXISTING,
>                     dwFlagsAndAttributes := 0,
>                     hTemplateFile := NIL);
>       DISPOSE(pwText);
>     END;
> 
>     IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN
>       Fail(p, fname);
>     END;
>     M3toC.FreeSharedS(p, fname);
>     RETURN FileWin32.New(handle, FileWin32.Read)
>   END OpenFileReadonly;
> 
> And similar in OpenFile. Not nice :).
> 
> Also, I've added CP_UTF8 constant to WinNLS.i3.
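
The dwNum test above relies on a counting property of UTF-8: every non-ASCII character takes more bytes in UTF-8 than it takes 16-bit units in UTF-16, so the two counts are equal only when the name is pure ASCII. A small sketch of that check in Python, with a hypothetical helper needs_wide_call standing in for the IF condition. Treating undecodable input like the dwNum = 0 case is an assumption here, since what MultiByteToWideChar does with malformed input depends on the flags and the Windows version.

```python
def needs_wide_call(path_bytes: bytes) -> bool:
    """Return True when a UTF-8 path should go to the "W" entry point.

    Hypothetical helper for illustration; it mirrors the comparison of
    the UTF-16 unit count against the byte length of the name.
    """
    try:
        text = path_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: a failed conversion falls back to the narrow call.
        return False
    # Number of 16-bit units, not counting a terminating null.
    utf16_units = len(text.encode("utf-16-le")) // 2
    # Equal counts mean every character was one byte, i.e. plain ASCII.
    return utf16_units != len(path_bytes)
```

With this, "plain.txt" stays on the narrow path while "Dragiša.txt" is routed to the wide one.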
> 
> On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote:
> 
> Hi all:
> So do you need a Double-Byte Character String module like the one currently in the TEXT types? But you can do that already, can't you?
> Thanks in advance
> 
> --- On Mon, 25/6/12, Dragiša Durić <dragisha at m3w.org> wrote:
> 
> From: Dragiša Durić <dragisha at m3w.org>
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D." <dabenavidesd at yahoo.es>
> CC: "m3devel" <m3devel at elegosoft.com>
> Date: Monday, June 25, 2012 13:20
> 
> Yes, they exposed parts of NLS. That's how the problem can be solved, albeit partially: by using the methods exposed there.
> 
> What we don't have is a way to communicate the actual encoding of a string to the FS module, so that FS methods can handle filenames accordingly.
> 
> On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote:
> 
> Hi all:
> OK, good, Win32 API dealt with inter-NLS (National Language Support) at ASCII and other formats level with NLS API.
> But it appears not to have been used for the DEC-SRC WinNT port of Modula-3 (only for CM3, though it isn't compiled on the elego servers, but here):
> http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html
> 
> Thanks in advance
> 
> --- On Mon, 25/6/12, Dragiša Durić <dragisha at m3w.org> wrote:
> 
> From: Dragiša Durić <dragisha at m3w.org>
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D." <dabenavidesd at yahoo.es>
> CC: "m3devel" <m3devel at elegosoft.com>
> Date: Monday, June 25, 2012 12:36
> 
> Daniel,
> 
> I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what it is about, and it meets my interests - I'll be there.
> 
> Problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround.
> 
> I've written here in the hope that I can get into some fruitful discussion with people who understand this problem. My solution is a workaround and assumes the filename is UTF-8 or ASCII. I would like to start a discussion on this and work from there to a more general solution.
> 
> dd
> 
> On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote:
> 
> Hi all:
> As I understood it, you didn't want to talk about a W 95 / NT compatible distro of Modula-3.
> But in turn you want to keep compatibility with older file name encodings.
> I don't care about that, but if it's useful anyway (because newer Windows versions don't care at all either), I don't know what your problem was, because it won't be able to be solved!
> Thanks in advance
