[M3devel] Windows, Unicode file names

Jay K jay.krell at cornell.edu
Tue Jun 26 02:58:05 CEST 2012


  > http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx   
  > 12000  utf-32    Unicode UTF-32, little endian byte order; available only to managed applications
  > 12001  utf-32BE  Unicode UTF-32, big endian byte order; available only to managed applications

These UTF-32 code pages are not useful to us, unless we target .NET instead of native code.

Portable Modula-3 or C it should be.
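
For what it's worth, the UTF-32 to UTF-16 recoding discussed below (surrogate
pairs and all) could be sketched in portable C roughly like this. It is only
a minimal sketch, not code from m3core or WinNLS: the name utf32_to_utf16 and
its buffer and error conventions are assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of m3core or the Win32 API).
 * Encodes n UTF-32 code points from src as UTF-16 into dst, which has
 * room for cap 16-bit units. Returns the number of units written, or
 * (size_t)-1 if a code point is invalid (a surrogate, or above 0x10FFFF)
 * or if dst is too small. No terminating 0 is written. */
size_t utf32_to_utf16(const uint32_t *src, size_t n,
                      uint16_t *dst, size_t cap)
{
    size_t out = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t c = src[i];

        /* Reject values that are not Unicode scalar values. */
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return (size_t)-1;

        if (c < 0x10000) {
            /* Fits in one 16-bit unit: this is the plain narrowing case. */
            if (cap - out < 1)
                return (size_t)-1;
            dst[out++] = (uint16_t)c;
        } else {
            /* Needs a surrogate pair: 20 bits split into 10 + 10. */
            if (cap - out < 2)
                return (size_t)-1;
            c -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (c >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (c & 0x3FF));
        }
    }
    return out;
}
```

A caller on Windows would append a terminating 0 and hand the buffer to a "W"
function such as CreateFileW. The point of the sketch is that a code point
above 0xFFFF turns into two 16-bit units instead of an error, which is where
recoding differs from simple narrowing.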

 - Jay

________________________________
> From: dragisha at m3w.org 
> Date: Tue, 26 Jun 2012 00:55:45 +0200 
> To: jay.krell at cornell.edu 
> CC: m3devel at elegosoft.com 
> Subject: Re: [M3devel] Windows, Unicode file names 
>  
>  
> On Jun 25, 2012, at 11:30 PM, Jay K wrote: 
>  
>   >  Why would you narrow it to 16bit? You need to convert to UTF-16 and  
> make it ready for Windows API calls? 
>  
> Yes. 
>  
>   > WinNLS does that. 
>  
>  
> I doubt that. There is a 32bit to 16bit conversion? 
>  
> http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx 
>  
> whatever this means: 
> 12000  utf-32    Unicode UTF-32, little endian byte order; available only to 
> managed applications 
> 12001  utf-32BE  Unicode UTF-32, big endian byte order; available only to 
> managed applications 
>  
> Ok, I guess there is. "Surrogate pairs" and all that? 
> Maybe not in WinNLS, but easy enough for us to write, in portable C or  
> Modula-3. :) 
>  
> That too :) 
>  
> Part of Text.i3 perhaps. 
>  
> UTF-32 -> UTF-16? Maybe. 
>  
>  
>  
> So then, I guess I can sign up for WIDECHAR being 32bits across the board. 
>  
>   - Jay 
>  
> ________________________________ 
> Subject: Re: [M3devel] Windows, Unicode file names 
> From: dragisha at m3w.org 
> Date: Mon, 25 Jun 2012 23:09:37 +0200 
> CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com 
> To: jay.krell at cornell.edu 
>  
>  
> On Jun 25, 2012, at 10:17 PM, Jay K wrote: 
>  
> I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from 
> TEXT to a flat array of either, and if 32bits, walk the array, checking  
> for > 0xFFFF, throw an exception or return some error if any found,  
> narrow to 16bits, call some "W" function, free the flat array. 
> The size can, I guess, vary between Win32 and non-Win32 platforms. 
>  
> a) If you like to make it as unportable as possible then yes - 16 or 32  
> is not important. 
> b) An invalid value would be over 0x10FFFF, not 0xFFFF. 
> c) Why would you narrow it to 16bit? You need to convert to UTF-16 and  
> make it ready for Windows API calls? WinNLS does that. Simple narrowing 
> to 16bit (similar to what is commented in Text.i3) and recoding from UTF-32 
> to UTF-16 are very different things. 
> d) Size varies, yes. 
>  
> Its size should be stored in a global to communicate between Modula-3 and C. 
>  
>  
> I'd also quite like if TEXT was internally represented as a nul  
> terminated flat array of 8 and/or 16 and/or 32bit quantities,  
> materializing on demand some of them. But I suspect that flat and  
> readonly and exposing a concat operation are in conflict. I'm not sure.  
> MFC uses a flat reference counted nul terminated representation and it  
> works pretty well. It doesn't materialize-on-demand other widths. 
>  
>   - Jay 
> ________________________________ 
> Subject: Re: [M3devel] Windows, Unicode file names 
> From: dragisha at m3w.org 
> Date: Mon, 25 Jun 2012 21:48:09 +0200 
> CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com 
> To: jay.krell at cornell.edu 
>  
> It can be what cm3 people had in mind when they created WIDECHAR as a  
> catchall for Unicode. 
>  
> At first glance it looked like no solution to me, but after counting to 
> ten, I think it is. We can have a UTF-8 layer and use it when and 
> where needed, to recode our strings to the catchall WIDECHAR/WIDETEXT. 
>  
> As long as we agree on what exactly WIDECHAR is :) 
> === From Wikipedia 
> The Microsoft Windows application programming interfaces Win32 and Win64, 
> as well as the Java and .Net Framework platforms, require that wide 
> character variables be defined as 16-bit values, and that characters be 
> encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like 
> systems generally require 32-bit values encoded using UTF-32 [citation 
> needed]. 
> === 
>  
>  
> On Jun 25, 2012, at 9:39 PM, Jay K wrote: 
>  
> I think I know what to do here and will look into it..later.. 
>  
> We have TEXT. We should just always get WIDECHARs out of it and call  
> CreateFileW. 
> Assuming UTF8 is the wrong solution at this level, and passing in UTF8  
> won't work with the correct solution. 
> A layer above this needs to decode UTF8, if that is the encoding. 
>  
> Unless someone has declared and implemented that TEXT is in fact always  
> UTF8-encoded, which I doubt. 
>  
>   - Jay 
> ________________________________ 
> From: dragisha at m3w.org 
> Date: Mon, 25 Jun 2012 21:05:59 +0200 
> To: dabenavidesd at yahoo.es 
> CC: m3devel at elegosoft.com 
> Subject: Re: [M3devel] Windows, Unicode file names 
>  
> If you cared enough to check FSWin32.m3, the answer would be obvious :). 
>  
> Whatever I do with the pathname before I call FS.OpenFile(Readonly), 
> FSWin32.m3 will call CreateFileA. My solution is: 
>  
> PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}= 
>    VAR 
>      handle: WinNT.HANDLE; 
>      fname := M3toC.SharedTtoS(p); 
>      dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0); 
>      pwText: WinBaseTypes.PCWSTR; 
>    BEGIN 
>      IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN 
>        (* dwNum includes terminating null character. that's +1 above. 
>        *) 
>        handle := WinBase.CreateFile( 
>                      lpFileName := fname, 
>                      dwDesiredAccess := WinNT.GENERIC_READ, 
>                      dwShareMode :=  WinNT.FILE_SHARE_READ, 
>                      lpSecurityAttributes := NIL, 
>                      dwCreationDisposition := WinBase.OPEN_EXISTING, 
>                      dwFlagsAndAttributes := 0, 
>                      hTemplateFile := NIL); 
>      ELSE 
>        pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR); 
>        EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum); 
>        handle := WinBase.CreateFileW( 
>                      lpFileName := pwText, 
>                      dwDesiredAccess := WinNT.GENERIC_READ, 
>                      dwShareMode := WinNT.FILE_SHARE_READ, 
>                      lpSecurityAttributes := NIL, 
>                      dwCreationDisposition := WinBase.OPEN_EXISTING, 
>                      dwFlagsAndAttributes := 0, 
>                      hTemplateFile := NIL); 
>        DISPOSE(pwText); 
>      END; 
>  
>      IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN 
>        Fail(p, fname); 
>      END; 
>      M3toC.FreeSharedS(p, fname); 
>      RETURN FileWin32.New(handle, FileWin32.Read) 
>    END OpenFileReadonly; 
>  
> And similar in OpenFile. Not nice :). 
>  
> Also, I've added CP_UTF8 constant to WinNLS.i3. 
>  
> On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote: 
>  
> Hi all: 
> So do you need Double-Byte Character String module as currently in TEXT  
> types? but you can do that already. Couldn't you? 
> Thanks in advance 
>  
> --- On Mon, 25/6/12, Dragiša Durić <dragisha at m3w.org> wrote: 
>  
> From: Dragiša Durić <dragisha at m3w.org> 
> Subject: Re: [M3devel] Windows, Unicode file names 
> To: "Daniel Alejandro Benavides D." <dabenavidesd at yahoo.es> 
> CC: "m3devel" <m3devel at elegosoft.com> 
> Date: Monday, June 25, 2012 13:20 
>  
> Yes, they exposed parts of NLS. That's how the problem can be solved, 
> albeit partially: by using the methods exposed there. 
>  
> What we don't have is a way to communicate the actual encoding of a string 
> to the FS module, so that FS methods can handle filenames accordingly. 
>  
> On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote: 
>  
> Hi all: 
> OK, good: the Win32 API deals with conversion between ASCII and other 
> encodings through its NLS (National Language Support) API. 
> But it appears not to have been used in the DEC-SRC WinNT port of 
> Modula-3 (only in CM3, though it isn't compiled on the elego servers, but 
> here): 
> http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html 
>  
> Thanks in advance 
>  
> --- On Mon, 25/6/12, Dragiša Durić <dragisha at m3w.org> wrote: 
>  
> From: Dragiša Durić <dragisha at m3w.org> 
> Subject: Re: [M3devel] Windows, Unicode file names 
> To: "Daniel Alejandro Benavides D." <dabenavidesd at yahoo.es> 
> CC: "m3devel" <m3devel at elegosoft.com> 
> Date: Monday, June 25, 2012 12:36 
>  
> Daniel, 
>  
> I can talk about many things, and most things Modula-3 are of interest  
> to me. Once you start a topic, and I can understand what is it about,  
> and it meets my interests - I'll be there. 
>  
> The problem I met with filenames is nothing old. Windows can open files 
> with filenames in ASCII and UTF-16. For everything else, you must check 
> twice and do a workaround. 
>  
> I've written here in the hope that I can get into some fruitful discussion 
> with people who understand this problem. My solution is a workaround and 
> assumes the filename is UTF-8 or ASCII. I would like to start a discussion 
> on this and work from there to a more general solution. 
>  
> dd 
>  
> On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote: 
>  
> Hi all: 
> As I understood it, you don't want to talk about a W 95 / NT compatible 
> distro of Modula-3. 
> But in turn you want to keep compatibility with older file name encodings. 
> I don't care about that, but if it's useful anyway (because newer Windows 
> versions don't care at all either) I don't know what your problem was, 
> because it won't be able to be solved! 
> Thanks in advance 
>  
>  
>  
 		 	   		  

