[M3devel] Pathname.Legal

Jay jay.krell at cornell.edu
Sun Oct 14 16:11:22 CEST 2007


My goodness, varying file systems, OS file system support, network file system protocol feature sets, character encodings, and case sensitivity rules are a rat's nest of subtle but significant problems, even if you are developing on one OS and/or one file system.
 
On my Mac I have file names with forward slashes and question marks (I didn't create them, they were downloaded that way, such as the MPW SC/SCpp reference and "Where is JBindery?"). I can't copy them to my low-end Linux NAS.
 
The source to the low-end Linux NAS has dots at the start of some file names. It cannot be copied over itself from Windows, because such files are hidden and cannot be unhidden.
 
Here's a tidbit -- Windows has "long" file names and "short" file names. Guess which is longer?
Try this:
 cd \
 mkdir "1.1.1" 
 dir /x 1* 
 
Short names can be around twice as long as long names, at least. Anything with two dots, or Unicode I believe, needs a generated short name, even if it isn't particularly long. Short names these days tend to get more randomness in them, I think, for security.
 
Also the wildcard matching is unpredictable due to generated short names.
 mkdir foo.1234 
 dir *.123 
 
That probably matches, but I can't say for sure.
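
A rough Python sketch of why that match can happen. It assumes the wildcard is tested against both the long name and the generated short name, and that the short name generated for foo.1234 is FOO~1.123. That value is a guess, since the real one depends on the volume and its settings. dir_matches is a made-up helper, not a Windows API.

```python
import fnmatch

def dir_matches(pattern, long_name, short_name):
    """Return True if the pattern matches the long name or the short name.

    Names are upper-cased first to imitate case-insensitive matching.
    """
    pattern = pattern.upper()
    return (fnmatch.fnmatchcase(long_name.upper(), pattern)
            or fnmatch.fnmatchcase(short_name.upper(), pattern))

LONG_NAME = "foo.1234"
SHORT_NAME = "FOO~1.123"  # assumed generated 8.3 name, not read from a real volume

# Matching the long name alone: the extension is "1234", not "123".
print(dir_matches("*.123", LONG_NAME, LONG_NAME))

# Matching with the assumed short name as well: the truncated extension matches.
print(dir_matches("*.123", LONG_NAME, SHORT_NAME))
```

On a volume with short name generation turned off, only the first case would apply, which is why the result is hard to predict.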
 
And still I trust NTFS more than anything else. :)
 
 - Jay
> Date: Sun, 14 Oct 2007 14:56:54 +0200
> From: stsp at elego.de
> To: rodney.bates at wichita.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Pathname.Legal
>
> On Sun, Oct 14, 2007 at 06:11:49AM -0500, Rodney M. Bates wrote:
> > Since the language itself specifies that program variables of type
> > CHAR are in ISO Latin-1, not just ASCII, I think extending compilers,
> > etc., to handle those characters makes complete sense, without even
> > needing to view it as support for unicode or differing locales.
> >
> > Do I understand correctly that Neels' patch extends just to ISO Latin-1?
>
> More than that. The patch allows any byte-sized character
> except the DirSepChar, which effectivly makes any character
> encoding that uses single byte encoding legal.
>
> So Latin-2 etc. are also included, which is a feature,
> not a bug. As long as only single byte encodings are involved
> this is totally fine.
>
> So since CM3 assumes Latin-1 anyway, not handling unicode correctly
> is not a problem. But users should be made aware that if they
> use CM3 programs with filenames in multi-byte encodings such
> as UTF-8, really strange things may happen...
>
> CM3 should get unicode support some day... unicode is quite hairy,
> I've seen quite a few UTF-8 related problems in the subversion bug
> tracker. Subversion tries to use UTF-8 all the way.
>
> The problems were along the lines of using either
> 'this an a with umlaut;',
> or 'the next char has an umlaut; a;',
> or 'a; the previous char had an umlaut;'
> for encoding the ä character. These are all legal UTF-8.
>
> But: The encoding method used on a given system is up to the
> filesystem implementation in the OS, i.e. hard to detect.
> So in case of subversion, which does not heed all these cases (yet),
> filenames with umlauts work on UNIX and Windows, but not on MacOSX.
> Wheeee! :)
>
> --
> Stefan Sperling <stsp at elego.de> Software Developer
> elego Software Solutions GmbH HRB 77719
> Gustav-Meyer-Allee 25, Gebaeude 12 Tel: +49 30 23 45 86 96
> 13355 Berlin Fax: +49 30 23 45 86 95
> http://www.elego.de Geschaeftsfuehrer: Olaf Wagner
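
The umlaut problem described in the quoted message can be reproduced with Python's standard unicodedata module. This is only an illustrative sketch: it compares the single-code-point form of ä with the "a followed by a combining mark" form, and same_name is a made-up helper showing one possible way to compare names (normalizing both to NFC), not what Subversion or CM3 actually does.

```python
import unicodedata

composed = "\u00e4"     # one code point: LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"  # 'a' followed by COMBINING DIAERESIS

def same_name(a, b):
    """Compare two names after normalizing both to the composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Both strings display as the same character but encode to different bytes.
print(composed.encode("utf-8"))    # 2 bytes
print(decomposed.encode("utf-8"))  # 3 bytes

# A plain comparison treats them as different names.
print(composed == decomposed)

# Comparing after normalization treats them as the same name.
print(same_name(composed, decomposed))
```

A file system that stores the decomposed form will hand back bytes that differ from what a program passed in using the composed form, so a byte-for-byte comparison of file names fails even though both are valid UTF-8.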