[M3devel] Windows, Unicode file names
Dragiša Durić
dragisha at m3w.org
Tue Jun 26 00:55:45 CEST 2012
On Jun 25, 2012, at 11:30 PM, Jay K wrote:
> > Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls?
>
> Yes.
>
> > WinNLS does that.
>
>
> I doubt that. There is a 32bit to 16bit conversion?
http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx
whatever this means:
12000  utf-32    Unicode UTF-32, little endian byte order; available only to managed applications
12001  utf-32BE  Unicode UTF-32, big endian byte order; available only to managed applications
> Ok, I guess there is. "Surrogate pairs" and all that?
> Maybe not in WinNLS, but easy enough for us to write, in portable C or Modula-3. :)
That too :)
> Part of Text.i3 perhaps.
UTF-32 -> UTF-16? Maybe.
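For reference, the UTF-32 to UTF-16 recoding is small enough to sketch in portable C. This is only a sketch: the name `utf32_to_utf16` and its measure-then-convert calling convention are assumptions made for illustration, not anything that exists in WinNLS or m3core.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of m3core or WinNLS): recode n UTF-32 code
   points into UTF-16. Returns the number of 16-bit units required, or
   (size_t)-1 if the input holds a surrogate value or a value above 0x10FFFF.
   Units are written only while they fit in dst_cap; pass dst = NULL and
   dst_cap = 0 to measure the required size. */
size_t utf32_to_utf16(const uint32_t *src, size_t n,
                      uint16_t *dst, size_t dst_cap)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = src[i];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return (size_t)-1;              /* not a Unicode scalar value */
        if (c < 0x10000) {
            /* Basic Multilingual Plane: one unit, value unchanged. */
            if (dst != NULL && out < dst_cap)
                dst[out] = (uint16_t)c;
            out += 1;
        } else {
            /* Supplementary plane: split the remaining 20 bits in two. */
            c -= 0x10000;
            if (dst != NULL && out + 1 < dst_cap) {
                dst[out]     = (uint16_t)(0xD800 | (c >> 10));   /* high surrogate */
                dst[out + 1] = (uint16_t)(0xDC00 | (c & 0x3FF)); /* low surrogate */
            }
            out += 2;
        }
    }
    return out;
}
```

A caller would typically call it once with a NULL buffer to get the size, allocate that many units plus one for the terminating null, and call it again.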
>
>
> So then, I guess I can sign up for WIDECHAR being 32bits across the board.
>
> - Jay
>
> Subject: Re: [M3devel] Windows, Unicode file names
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 23:09:37 +0200
> CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
>
> On Jun 25, 2012, at 10:17 PM, Jay K wrote:
>
> I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from
> TEXT to a flat array of either, and if 32bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16bits, call some "W" function, free the flat array.
> The size can, I guess, vary between Win32 and non-Win32 platforms.
>
> a) If you like to make it as unportable as possible then yes - 16 or 32 is not important.
> b) an invalid value would be over 0x10FFFF, not 0xFFFF
> c) Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? WinNLS does that. Simple narrowing to 16bit (similar to what is commented in Text.i3) and recoding from UTF-32 to UTF-16 are very different things.
> d) Size varies, yes.
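To make the narrowing side of point (c) concrete, here is a minimal C sketch of the "walk the array, fail on anything over 0xFFFF, narrow" approach described in the quoted message. The name `narrow_to_16` and the error convention are hypothetical. Note what it does not do: a character outside the Basic Multilingual Plane is rejected, whereas a real UTF-16 recoding would emit it as two 16-bit units.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrow 32-bit characters to 16 bits without recoding.
   Returns 0 on success, or -1 if any value does not fit in 16 bits, in
   which case the contents of dst are unspecified.
   dst must have room for n units. */
int narrow_to_16(const uint32_t *src, size_t n, uint16_t *dst)
{
    for (size_t i = 0; i < n; i++) {
        if (src[i] > 0xFFFF)
            return -1;          /* would need a surrogate pair in UTF-16 */
        dst[i] = (uint16_t)src[i];
    }
    return 0;
}
```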
>
> Its size should be stored in a global to communicate between Modula-3 and C.
>
>
> I'd also quite like it if TEXT was internally represented as a nul terminated flat array of 8 and/or 16 and/or 32bit quantities, materializing on demand some of them. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize-on-demand other widths.
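A rough C sketch of what a flat, read-only, reference-counted, nul-terminated representation could look like. Everything here (`FlatText`, `flat_new`, `flat_concat`, and so on) is hypothetical and is not how MFC or the cm3 runtime is actually implemented. It does show where the suspected tension with concat comes from: with flat storage, concatenation cannot share its operands and has to copy both.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat, immutable, reference-counted string (8-bit only). */
typedef struct {
    size_t refs;    /* number of owners; not thread-safe in this sketch */
    size_t len;     /* length in bytes, excluding the terminating nul */
    char   data[];  /* nul-terminated contents */
} FlatText;

FlatText *flat_new(const char *s, size_t len)
{
    FlatText *t = malloc(sizeof *t + len + 1);
    if (t == NULL) return NULL;
    t->refs = 1;
    t->len = len;
    memcpy(t->data, s, len);
    t->data[len] = '\0';
    return t;
}

FlatText *flat_retain(FlatText *t) { t->refs++; return t; }

void flat_release(FlatText *t)
{
    if (t != NULL && --t->refs == 0) free(t);
}

/* Concatenation allocates a new flat block and copies both operands. */
FlatText *flat_concat(const FlatText *a, const FlatText *b)
{
    FlatText *t = malloc(sizeof *t + a->len + b->len + 1);
    if (t == NULL) return NULL;
    t->refs = 1;
    t->len = a->len + b->len;
    memcpy(t->data, a->data, a->len);
    memcpy(t->data + a->len, b->data, b->len + 1);  /* includes b's nul */
    return t;
}
```

The `data` pointer can be handed straight to a C API that expects a nul-terminated string, which is the attraction of the flat form; the cost is that every concat is linear in the combined length.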
>
> - Jay
> Subject: Re: [M3devel] Windows, Unicode file names
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 21:48:09 +0200
> CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
> It can be what cm3 people had in mind when they created WIDECHAR as a catchall for Unicode.
>
> At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have a UTF-8 layer and use it when and where needed, to recode our strings to catchall WIDECHAR/WIDETEXT.
>
> As long as we agree on what exactly WIDECHAR is :)
> ===From wikipedia
> The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed].
> ===
>
>
> On Jun 25, 2012, at 9:39 PM, Jay K wrote:
>
> I think I know what to do here and will look into it..later..
>
> We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW.
> Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution.
> A layer above this needs to decode UTF8, if that is the encoding.
>
> Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt.
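As an illustration of what such a decoding layer has to do, here is a minimal sketch of a UTF-8 decoder in portable C. The name `utf8_decode` and its return convention are assumptions for this example only; nothing like it is claimed to exist in m3core.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, with len
   bytes available. Stores the code point in *cp and returns the number of
   bytes consumed, or 0 if the sequence is truncated, overlong, or
   otherwise malformed. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 2, 3 or 4 bytes. */
    static const uint32_t min_value[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t need;
    uint32_t c;

    if (len == 0) return 0;
    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; c = s[0] & 0x07; }
    else return 0;          /* stray continuation or invalid lead byte */

    if (len < need) return 0;                   /* truncated sequence */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;    /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_value[need]) return 0;          /* overlong encoding */
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return need;
}
```

Calling it in a loop over a byte string yields the code points that a WIDECHAR-based layer below would then consume.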
>
> - Jay
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 21:05:59 +0200
> To: dabenavidesd at yahoo.es
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
> If you cared enough to check FSWin32.m3, the answer would be obvious :).
>
> Whatever I do with the pathname before I call FS.OpenFile(Readonly), FSWin32.m3 will call CreateFileA. My solution is:
>
> PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E} =
>   VAR
>     handle: WinNT.HANDLE;
>     fname := M3toC.SharedTtoS(p);
>     (* With a NIL output buffer, this call only returns the number of
>        UTF-16 units needed. *)
>     dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0);
>     pwText: WinBaseTypes.PCWSTR;
>   BEGIN
>     IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN
>       (* dwNum includes the terminating null character; that's the +1 above.
>          Either the conversion failed or every byte mapped to one UTF-16
>          unit (as for plain ASCII), so the narrow call is used. *)
>       handle := WinBase.CreateFile(
>         lpFileName := fname,
>         dwDesiredAccess := WinNT.GENERIC_READ,
>         dwShareMode := WinNT.FILE_SHARE_READ,
>         lpSecurityAttributes := NIL,
>         dwCreationDisposition := WinBase.OPEN_EXISTING,
>         dwFlagsAndAttributes := 0,
>         hTemplateFile := NIL);
>     ELSE
>       (* Convert the name to UTF-16; the buffer takes two bytes per unit. *)
>       pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR);
>       EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum);
>       handle := WinBase.CreateFileW(
>         lpFileName := pwText,
>         dwDesiredAccess := WinNT.GENERIC_READ,
>         dwShareMode := WinNT.FILE_SHARE_READ,
>         lpSecurityAttributes := NIL,
>         dwCreationDisposition := WinBase.OPEN_EXISTING,
>         dwFlagsAndAttributes := 0,
>         hTemplateFile := NIL);
>       DISPOSE(pwText);
>     END;
>
>     IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN
>       Fail(p, fname);
>     END;
>     M3toC.FreeSharedS(p, fname);
>     RETURN FileWin32.New(handle, FileWin32.Read)
>   END OpenFileReadonly;
>
> And similar in OpenFile. Not nice :).
>
> Also, I've added CP_UTF8 constant to WinNLS.i3.
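The `dwNum = Text.Length(p) + 1` comparison in OpenFileReadonly is, in effect, a test for "one UTF-16 unit per byte", which for valid UTF-8 means the name is pure ASCII. A minimal C sketch of the same decision without a call into NLS, assuming the bytes are valid UTF-8; `needs_wide_api` is a hypothetical name, not an existing function:

```c
#include <stdbool.h>

/* Hypothetical helper: true if the nul-terminated path contains any byte
   outside 7-bit ASCII, i.e. the case where the name has to be converted
   to UTF-16 and passed to a "W" function. Assumes valid UTF-8 input. */
bool needs_wide_api(const char *path)
{
    for (const unsigned char *p = (const unsigned char *)path; *p != 0; p++) {
        if (*p >= 0x80)
            return true;    /* lead or continuation byte of a multi-byte sequence */
    }
    return false;
}
```

This does not replace the conversion itself; it only picks the branch.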
>
> On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> So do you need a Double-Byte Character String module, as currently in the TEXT types? But you can do that already, can't you?
> Thanks in advance
>
> --- On Mon, 25/6/12, Dragiša Durić <dragisha at m3w.org> wrote:
>
> From: Dragiša Durić <dragisha at m3w.org>
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D." <dabenavidesd at yahoo.es>
> CC: "m3devel" <m3devel at elegosoft.com>
> Date: Monday, June 25, 2012 13:20
>
> Yes, they exposed parts of NLS. That's how the problem can be solved, albeit partially: by using the methods exposed there.
>
> What we don't have is a way to communicate the actual encoding of a string to the FS module, so that FS methods can handle filenames accordingly.
>
> On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> OK, good: the Win32 API handles conversion between ASCII and other encodings through its NLS (National Language Support) API.
> But it appears not to have been used in the DEC-SRC WinNT port of Modula-3 (it is in CM3, though it isn't compiled on the elego servers, only here):
> http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html
>
> Thanks in advance
>
> --- On Mon, 25/6/12, Dragiša Durić <dragisha at m3w.org> wrote:
>
> From: Dragiša Durić <dragisha at m3w.org>
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D." <dabenavidesd at yahoo.es>
> CC: "m3devel" <m3devel at elegosoft.com>
> Date: Monday, June 25, 2012 12:36
>
> Daniel,
>
> I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what it is about, and it meets my interests - I'll be there.
>
> The problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround.
>
> I've written here in the hope that I can get into some fruitful discussion with people who understand this problem. My solution is a workaround and assumes the filename is UTF-8 or ASCII. I would like to start a discussion on this and work from there to a more general solution.
>
> dd
>
> On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> As I understood it, you don't want to talk about a W 95 / NT compatible distro of Modula-3.
> But in turn you want to keep compatibility with older file name encodings.
> I don't care about that, but if it's useful anyway (because newer Windows versions don't care at all either), I don't know what your problem was, because it won't be able to be solved!
> Thanks in advance