[M3devel] Windows, Unicode file names
Jay K
jay.krell at cornell.edu
Tue Jun 26 02:58:05 CEST 2012
> http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx
> 12000  utf-32    Unicode UTF-32, little endian byte order; available only to managed applications
> 12001  utf-32BE  Unicode UTF-32, big endian byte order; available only to managed applications
That is not useful to us...unless we target .NET instead of native code...
Portable Modula-3 or C it should be.
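
A minimal sketch of what such a UTF-32 to UTF-16 recoding could look like in portable C. The function name utf32_to_utf16 and its measure-then-fill calling convention are assumptions made for illustration; nothing like it exists in m3core. Values below 0x10000 are copied as one 16-bit unit; larger values are split into a high and a low surrogate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of m3core: recode n UTF-32 code points
 * into UTF-16. Returns the number of 16-bit units required, or
 * (size_t)-1 if src holds a surrogate value or a value above 0x10FFFF.
 * Units are stored only while they fit in dst (cap units), so calling
 * with dst == NULL and cap == 0 just measures. */
size_t utf32_to_utf16(const uint32_t *src, size_t n, uint16_t *dst, size_t cap)
{
    size_t out = 0;
    assert(src != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t c = src[i];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return (size_t)-1;          /* not a Unicode scalar value */
        if (c < 0x10000) {
            if (out < cap)
                dst[out] = (uint16_t)c; /* fits in one unit */
            out += 1;
        } else {
            c -= 0x10000;               /* 20 significant bits remain */
            if (out + 1 < cap) {
                dst[out]     = (uint16_t)(0xD800 | (c >> 10));   /* high surrogate */
                dst[out + 1] = (uint16_t)(0xDC00 | (c & 0x3FF)); /* low surrogate */
            }
            out += 2;
        }
    }
    return out;
}
```

A caller would typically measure first, allocate that many units plus one for the terminating nul, then call again to fill the buffer before handing it to a "W" function.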
- Jay
________________________________
> From: dragisha at m3w.org
> Date: Tue, 26 Jun 2012 00:55:45 +0200
> To: jay.krell at cornell.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
>
> On Jun 25, 2012, at 11:30 PM, Jay K wrote:
>
> > Why would you narrow it to 16bit? You need to convert to UTF-16 and
> make it ready for Windows API calls?
>
> Yes.
>
> > WinNLS does that.
>
>
> I doubt that. There is a 32bit to 16bit conversion?
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx
>
> whatever this means:
> 12000  utf-32    Unicode UTF-32, little endian byte order; available only to
> managed applications
> 12001  utf-32BE  Unicode UTF-32, big endian byte order; available only to
> managed applications
>
> Ok, I guess there is. "Surrogate pairs" and all that?
> Maybe not in WinNLS, but easy enough for us to write, in portable C or
> Modula-3. :)
>
> That too :)
>
> Part of Text.i3 perhaps.
>
> UTF-32 -> UTF-16? Maybe.
>
>
>
> So then, I guess I can sign up for WIDECHAR being 32bits across the board.
>
> - Jay
>
> ________________________________
> Subject: Re: [M3devel] Windows, Unicode file names
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 23:09:37 +0200
> CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
>
> On Jun 25, 2012, at 10:17 PM, Jay K wrote:
>
> I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from
> TEXT to a flat array of either, and if 32bits, walk the array, checking
> for > 0xFFFF, throw an exception or return some error if any found,
> narrow to 16bits, call some "W" function, free the flat array.
> The size can, I guess, vary between Win32 and non-Win32 platforms.
>
> a) If you like to make it as unportable as possible then yes - 16 or 32
> is not important.
> b) invalid value would be over 0x10FFFF, not 0xFFFF
> c) Why would you narrow it to 16bit? You need to convert to UTF-16 and
> make it ready for Windows API calls? WinNLS does that. Simple narrowing
> (similar to commented in Text.i3) to 16bit and recoding from UTF-32 to
> UTF-16 is very different thing.
> d) Size varies, yes.
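
For contrast with the recoding discussed in (c), here is a hedged sketch in C of the plain narrowing step described above: walk the 32-bit array, report an error for any value above 0xFFFF, and otherwise copy each value into a 16-bit unit. The name narrow_to_16 and the 0 / -1 return convention are hypothetical. Note that this sketch does not reject stray surrogate values (0xD800 to 0xDFFF), which a stricter version would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n 32-bit characters into 16-bit units.
 * Returns 0 on success, or -1 at the first value that does not fit in
 * 16 bits (the contents of dst are then unspecified). */
int narrow_to_16(const uint32_t *src, size_t n, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i] > 0xFFFF)
            return -1;              /* would need a surrogate pair */
        dst[i] = (uint16_t)src[i];  /* plain truncation-free copy */
    }
    return 0;
}
```

Narrowing rejects a character such as U+1F600 outright, whereas a real UTF-32 to UTF-16 recoding would emit two 16-bit units for it. That is the difference point (c) is drawing.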
>
> Its size should be stored in a global to communicate between Modula-3 and C.
>
>
> I'd also quite like if TEXT was internally represented as a nul
> terminated flat array of 8 and/or 16 and/or 32bit quantities,
> materializing on demand some of them. But I suspect that flat and
> readonly and exposing a concat operation are in conflict. I'm not sure.
> MFC uses a flat reference counted nul terminated representation and it
> works pretty well. It doesn't materialize-on-demand other widths.
>
> - Jay
> ________________________________
> Subject: Re: [M3devel] Windows, Unicode file names
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 21:48:09 +0200
> CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
> It can be what cm3 people had in mind when they created WIDECHAR as a
> catchall for Unicode.
>
> At first glance it looked like no solution to me, but after counting to
> ten - I think it is. We can have an UTF-8 layer and use it when and
> where needed, to recode our strings to catchall WIDECHAR/WIDETEXT.
>
> As long as we agree on what exactly WIDECHAR is :)
> === From Wikipedia
> The Microsoft Windows application programming interfaces Win32 and
> Win64, as well as the Java and .Net Framework platforms, require that
> wide character variables be defined as 16-bit values, and that
> characters be encoded using UTF-16 (due to former use of UCS-2), while
> modern Unix-like systems generally require 32-bit values encoded using
> UTF-32 [citation needed].
> ===
>
>
> On Jun 25, 2012, at 9:39 PM, Jay K wrote:
>
> I think I know what to do here and will look into it..later..
>
> We have TEXT. We should just always get WIDECHARs out of it and call
> CreateFileW.
> Assuming UTF8 is the wrong solution at this level, and passing in UTF8
> won't work with the correct solution.
> A layer above this needs to decode UTF8, if that is the encoding.
>
> Unless someone has declared and implemented that TEXT is in fact always
> UTF8-encoded, which I doubt.
>
> - Jay
> ________________________________
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 21:05:59 +0200
> To: dabenavidesd at yahoo.es
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
> If you cared enough to check FSWin32.m3, answer would be obvious :).
>
> Whatever I do with pathname before I call FS.OpenFile(Readonly)? -
> FSWin32.m3 will call CreateFileA. My solution is:
>
> PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E} =
>   VAR
>     handle: WinNT.HANDLE;
>     fname := M3toC.SharedTtoS(p);
>     (* With a NIL buffer this call only returns the required length. *)
>     dwNum := WinNLS.MultiByteToWideChar(WinNLS.CP_UTF8, 0, fname, -1,
>                                         NIL, 0);
>     pwText: WinBaseTypes.PCWSTR;
>   BEGIN
>     IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN
>       (* dwNum includes the terminating null character; that's the +1
>          above. Nothing needs recoding here, so use the ANSI call. *)
>       handle := WinBase.CreateFile(
>                   lpFileName := fname,
>                   dwDesiredAccess := WinNT.GENERIC_READ,
>                   dwShareMode := WinNT.FILE_SHARE_READ,
>                   lpSecurityAttributes := NIL,
>                   dwCreationDisposition := WinBase.OPEN_EXISTING,
>                   dwFlagsAndAttributes := 0,
>                   hTemplateFile := NIL);
>     ELSE
>       (* Two bytes per wide character for the UTF-16 copy of the name. *)
>       pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2),
>                          WinBaseTypes.PCWSTR);
>       EVAL WinNLS.MultiByteToWideChar(WinNLS.CP_UTF8, 0, fname, -1,
>                                       pwText, dwNum);
>       handle := WinBase.CreateFileW(
>                   lpFileName := pwText,
>                   dwDesiredAccess := WinNT.GENERIC_READ,
>                   dwShareMode := WinNT.FILE_SHARE_READ,
>                   lpSecurityAttributes := NIL,
>                   dwCreationDisposition := WinBase.OPEN_EXISTING,
>                   dwFlagsAndAttributes := 0,
>                   hTemplateFile := NIL);
>       DISPOSE(pwText);
>     END;
>
>     IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN
>       Fail(p, fname);
>     END;
>     M3toC.FreeSharedS(p, fname);
>     RETURN FileWin32.New(handle, FileWin32.Read)
>   END OpenFileReadonly;
>
> And similar in OpenFile. Not nice :).
>
> Also, I've added CP_UTF8 constant to WinNLS.i3.
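
The test dwNum = Text.Length(p) + 1 above works because the measuring call to MultiByteToWideChar returns the number of wide characters including the terminating null, and that count equals the byte count plus one only when the name contains no multi-byte UTF-8 sequences. A portable C sketch of the same counting, assuming the input is valid UTF-8 (the helper names utf16_units_needed and is_plain_ascii are hypothetical, and this is not how Windows implements it):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of UTF-16 units, including the terminating
 * nul, needed for a nul-terminated string assumed to be valid UTF-8. */
size_t utf16_units_needed(const char *s)
{
    size_t units = 1;                      /* terminating nul */
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                      /* continuation byte, already counted */
        units += (*p >= 0xF0) ? 2 : 1;     /* 4-byte sequences need a surrogate pair */
    }
    return units;
}

/* Nonzero when every byte is ASCII, i.e. the unit count matches the
 * byte count and the string could go to an "A" function unchanged. */
int is_plain_ascii(const char *s)
{
    return utf16_units_needed(s) == strlen(s) + 1;
}
```

Any non-ASCII character takes at least two bytes in UTF-8 but at most two 16-bit units for four bytes, so the unit count always drops below the byte count as soon as one appears.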
>
> On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> So do you need a Double-Byte Character String module, like what the TEXT
> types currently provide? But you can do that already, can't you?
> Thanks in advance
>
> --- On Mon, 25/6/12, Dragiša Durić <dragisha at m3w.org> wrote:
>
> From: Dragiša Durić <dragisha at m3w.org>
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D." <dabenavidesd at yahoo.es>
> CC: "m3devel" <m3devel at elegosoft.com>
> Date: Monday, 25 June 2012 13:20
>
> Yes, they exposed parts of NLS. That's how problem can be, albeit
> partially, solved. By using methods exposed there.
>
> What we don't have is how to communicate actual encoding of string to
> FS module so FS methods can handle filenames accordingly.
>
> On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> OK, good: the Win32 API handles ASCII and other encodings across locales
> through its NLS (National Language Support) API.
> But it appears not to have been used in the DEC-SRC WinNT port of
> Modula-3 (only in CM3, though it isn't compiled on the elego servers,
> only here):
> http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html
>
> Thanks in advance
>
> --- On Mon, 25/6/12, Dragiša Durić <dragisha at m3w.org> wrote:
>
> From: Dragiša Durić <dragisha at m3w.org>
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D." <dabenavidesd at yahoo.es>
> CC: "m3devel" <m3devel at elegosoft.com>
> Date: Monday, 25 June 2012 12:36
>
> Daniel,
>
> I can talk about many things, and most things Modula-3 are of interest
> to me. Once you start a topic, and I can understand what is it about,
> and it meets my interests - I'll be there.
>
> Problem I met with filenames is nothing old. Windows can open files
> with filenames in ASCII and UTF-16. Everything else - you must check
> twice and do a workaround.
>
> I've written here in hope I can get into some fruitful discussion with
> people who understand this problem. My solution is a workaround and
> assumes filename is UTF-8 or ASCII. I would like to start discussion on
> this and work from there to more general solution.
>
> dd
>
> On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> As I understood it, you didn't want to talk about a Windows 95 / NT
> compatible distro of Modula-3.
> But in turn you want to keep compatibility with older file name encodings.
> I don't care about that, but it may be useful anyway (newer Windows
> versions don't care about it either). I don't know what your problem
> was, because it won't be possible to solve it!
> Thanks in advance
>
>
>