[M3devel] some vagaries of Win32 path semantics (and some Mac)
Jay
jayk123 at hotmail.com
Fri Feb 22 11:09:41 CET 2008
I remembered more complications.
Remember that many of these are exposed to Unix via Windows file servers.
And of course Mac -- notice that PPC_DARWIN is case sensitive about names, even though that is commonly "wrong".
But again, it varies per file system, not "POSIX" vs. "WIN32" ("POSIX" as in Modula-3 source directory/OS_TYPE. I don't know about the standard.)
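Because the answer depends on the file system, and volumes can be mounted in directories, a program can probe at run time instead of keying off OS_TYPE. Below is a minimal Python sketch. It assumes the directory is writable, and `is_case_sensitive` is a hypothetical helper, not part of M3Path or any library.

```python
import os
import tempfile


def is_case_sensitive(directory: str) -> bool:
    """Hypothetical probe: does the file system under `directory` distinguish case?

    Creates a scratch file whose name contains letters, then asks whether the
    same name with every letter's case swapped also finds a file.
    """
    fd, path = tempfile.mkstemp(prefix="CaseProbe", dir=directory)
    os.close(fd)
    try:
        head, name = os.path.split(path)
        swapped = os.path.join(head, name.swapcase())
        # On a case-insensitive volume the swapped spelling opens our file.
        return not os.path.exists(swapped)
    finally:
        os.remove(path)


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    print(tmp, "is case sensitive:", is_case_sensitive(tmp))
```

The result describes one directory, not the OS. Two directories on the same drive can give different answers.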
When you are case insensitive, just what are the rules?
M3Path folds only a-z/A-Z.
And I don't think it handles Unicode at all, though I'm not sure.
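On NTFS, case mapping uses an upper-case table kept in the hidden system file $UpCase. Note the two different errors:
C:\>more < $Upcase
Access is denied.

C:\>more < $Upcasex
The system cannot find the file specified.

C:\>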
http://www.ntfs.com/ntfs-system-files.htm
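The following minimal Python sketch makes the difference concrete. It contrasts a comparison that folds only ASCII letters, as described for M3Path, with one that upper-cases each character one-for-one. The second approach is roughly the shape of a lookup table like $UpCase. Python's Unicode data stands in for the real table, which is stored per volume and may differ. Both helpers are hypothetical.

```python
def ascii_fold(name: str) -> str:
    # Fold only A-Z to a-z; every other character is compared exactly.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def upcase_one_to_one(name: str) -> str:
    # Upper-case each character, but keep it unchanged if its upper-case form
    # is more than one character (e.g. "ß" -> "SS"), since a lookup table
    # like $UpCase maps one code unit to exactly one code unit.
    out = []
    for c in name:
        u = c.upper()
        out.append(u if len(u) == 1 else c)
    return "".join(out)


def ascii_equal(a: str, b: str) -> bool:
    return ascii_fold(a) == ascii_fold(b)


def upcase_equal(a: str, b: str) -> bool:
    return upcase_one_to_one(a) == upcase_one_to_one(b)


print(ascii_equal("Readme.TXT", "README.txt"))          # True
print(ascii_equal("\u00c9t\u00e9", "\u00e9t\u00e9"))    # False: "É" vs "é"
print(upcase_equal("\u00c9t\u00e9", "\u00e9t\u00e9"))   # True
```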
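There's also a technote on www.apple.com about precomposed vs. decomposed (non-precomposed) Unicode and which form is canonical.
This matters for accented characters: é can be stored either as the single code point U+00E9 or as e followed by the combining acute accent U+0301.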
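The accent issue is a Unicode normalization problem. A byte-for-byte comparison treats the two spellings as different names. The small Python illustration below uses the standard unicodedata module, and `same_name` is a hypothetical helper. As far as I know, HFS+ stores a decomposed form that is close to, but not exactly, NFD, so treat NFD here as an approximation.

```python
import unicodedata

precomposed = "caf\u00e9"   # "é" as the single code point U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def same_name(a: str, b: str) -> bool:
    # Compare after canonical decomposition so both spellings agree.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(same_name(precomposed, decomposed))                       # True
```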
And short (8.3) names are funky.
You can turn off their generation (for example with fsutil behavior set disable8dot3 1).
I think I sent this here already, but watch this:
First, they aren't necessarily shorter than long names. In this case the short name is twice the length of the long name.
Short names are always limited to 8.3, though: at most one dot, and only certain characters (i.e. not Unicode, I think).
C:\>mkdir 1.1.1
C:\>dir /x 1*
02/22/2008  02:03 AM    <DIR>          112E5D~1.1   1.1.1
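Below is a rough Python check of the 8.3 shape. It requires a base name of one to eight characters, at most one dot, and an extension of at most three characters. Only upper-case ASCII letters, digits, and a few punctuation characters are allowed. The punctuation set is an approximation, and `fits_8_3` is a hypothetical helper. It does not reproduce how Windows actually generates names like 112E5D~1.1.

```python
# Approximate set of punctuation allowed in a stored short name.
SHORT_NAME_PUNCT = set("$%'-_@~`!(){}^#&")


def fits_8_3(name: str) -> bool:
    """True if `name` already has the form of a stored 8.3 short name."""
    base, dot, ext = name.partition(".")
    if "." in ext:                       # more than one dot
        return False
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        return False
    if dot and not ext:                  # a trailing dot is not kept
        return False
    for c in base + ext:
        if not (c.isascii() and (c.isupper() or c.isdigit() or c in SHORT_NAME_PUNCT)):
            return False
    return True


for n in ["112E5D~1.1", "1.1.1", "FOO~1.BAR", "foo.barf"]:
    print(n, len(n), fits_8_3(n))
```

The long name 1.1.1 fails only because of its extra dots, yet its short form is ten characters long.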
Second, wildcard matching (FindFirstFile/FindNextFile) always includes them:
C:\>mkdir foo.bar
C:\>mkdir foo.barf
C:\>dir *.bar
02/22/2008  02:04 AM    <DIR>          foo.bar
02/22/2008  02:04 AM    <DIR>          foo.barf
Huh?
C:\>dir /x *bar
02/22/2008  02:04 AM    <DIR>                       foo.bar
02/22/2008  02:04 AM    <DIR>          FOO~1.BAR    foo.barf
Oh..
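The "Oh.." is that foo.barf matches through its short name FOO~1.BAR. Below is a minimal Python model of that rule. It assumes that a match on either name counts and that comparison ignores case. It uses the standard fnmatch module, whose pattern syntax is not Win32's. Win32 has only * and ?, with its own quirks such as *.* matching names that have no dot. So this models only the long-or-short behavior, and `find_matches` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# (long name, short name or None), as listed by "dir /x" above
ENTRIES: List[Tuple[str, Optional[str]]] = [
    ("foo.bar", None),
    ("foo.barf", "FOO~1.BAR"),
]


def find_matches(pattern: str, entries) -> List[str]:
    """Return long names whose long OR short name matches, ignoring case."""
    pat = pattern.lower()
    result = []
    for long_name, short_name in entries:
        names = [long_name] + ([short_name] if short_name else [])
        if any(fnmatchcase(n.lower(), pat) for n in names):
            result.append(long_name)
    return result


print(find_matches("*.bar", ENTRIES))   # ['foo.bar', 'foo.barf']
```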
- Jay
From: jayk123 at hotmail.com
To: m3devel at elegosoft.com
Subject: some vagaries of Win32 path semantics (and some Mac)
Date: Fri, 22 Feb 2008 07:37:37 +0000
You can learn about this stuff by looking at the NT namespace with winobj.
And/or watching calls to NtCreateFile in a debugger.
And/or with filemon.
And/or reading the documentation for driver writers.
And/or reading various documentation about the NT kernel interface.
And/or experimenting with cmd (assuming cmd itself isn't doing the weird stuff).

This is about kernel32.dll for now.

When using 8-bit characters, the length limit is 260 characters (MAX_PATH). Whether that includes the terminating zero is not obvious. The documentation's own example counts the terminating NUL within the 260, leaving 259 usable characters.
When using 16-bit characters, the "default" limit is also 260 characters.
The MS-DOS limit is 64 or maybe 128 characters, so this is progress (!).
The Unix limit, I think, is usually around 1024. PATH_MAX is 1024 on the BSDs and Mac OS X, and 4096 on Linux. That's still pretty lame imho, just a little less lame.
The actual Windows limit is about 32K, which seems pretty OK to me, though 16-bit limits are surprising.

The limit on INDIVIDUAL PATH ELEMENTS, as dictated by FindFirstFile/FindNextFile, is also 260 characters. The cFileName field of WIN32_FIND_DATA holds MAX_PATH characters. But again, that is per path element.
I haven't tried exceeding that. The creation paths don't clearly have this limit. It is worth experimenting with.

Ignoring Windows 9x, everything is built on top of the NT kernel.
Most interesting here is NtCreateFile.
At the kernel level, "relative opens" are allowed.
All the various name- or path-based functions don't just take a string. They take an OBJECT_ATTRIBUTES.
This is mainly flags, an optional parent (root directory) handle, and a UNICODE_STRING.
The length of the UNICODE_STRING is stored in an unsigned short representing a number of bytes.
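Here is the arithmetic behind that limit as a small Python sketch. The length field is an unsigned 16-bit count of bytes, and each UTF-16 code unit takes two bytes. The ctypes structure mirrors the UNICODE_STRING layout only for illustration, with c_void_p standing in for the PWSTR buffer. The helper names are hypothetical. Characters outside the Basic Multilingual Plane take two code units.

```python
import ctypes

USHORT_MAX = 0xFFFF   # largest value a 16-bit unsigned field can hold
WCHAR_BYTES = 2       # one UTF-16 code unit


class UNICODE_STRING(ctypes.Structure):
    # Illustrative mirror of the NT layout; both lengths count BYTES.
    _fields_ = [
        ("Length", ctypes.c_ushort),         # bytes used (no terminator required)
        ("MaximumLength", ctypes.c_ushort),  # bytes allocated
        ("Buffer", ctypes.c_void_p),         # stands in for PWSTR
    ]


def utf16_bytes(path: str) -> int:
    return len(path.encode("utf-16-le"))


def fits_length_field(path: str) -> bool:
    return utf16_bytes(path) <= USHORT_MAX


MAX_CODE_UNITS = USHORT_MAX // WCHAR_BYTES
print(MAX_CODE_UNITS)                   # 32767
print(fits_length_field("a" * 32767))   # True  (65534 bytes)
print(fits_length_field("a" * 32768))   # False (65536 bytes)
```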
With two bytes per UTF-16 character, the limit is therefore around 32,767 characters.
One of the flags (OBJ_CASE_INSENSITIVE) controls case sensitivity, at least somewhat.
I don't know what happens if you try to be case sensitive on FAT, for example.
NTFS allows volumes to be mounted in empty directories.
I assume such a volume could be FAT, so even if c:\ is NTFS and capable of case sensitivity, c:\foo might not be.

If you are doing a non-relative open at the NT level, there is no working directory, no relative paths, or such. There are basically just full paths.
They don't look quite exactly like anything else. They look like:
  \??\c:\windows\system32\kernel32.dll
  \??\unc\machine\share\foo\bar.txt
Before around NT4, \?? was named \DosDevices.
I suspect \?? was an optimization, making the common case a shorter string. Seems lame, but oh well.
You can see material in the driver documentation about setting up symbolic links related to \DosDevices.
Yes, NT has symbolic links, in the kernel namespace.

At the kernel32.dll level, the documentation clearly exposes something very related.
To open paths longer than 260 characters, they say to use the prefixes \\?\ or \\?\UNC\.
Last I checked, the documentation was a little unclear. What they mean is, to form paths like:
  \\?\c:\windows\system32\kernel32.dll
  \\?\unc\machine\share\foo\bar.txt
They don't say so, but quick attempts otherwise clearly show that the paths must be full paths, not relative to any "current working directory".
An implementation trick should be evident: just change the second character from \ to ? and you get an NT path.

The documentation says \\?\ "turns off path parsing".
Usually all file paths undergo some amount of canonicalization.
Forward slashes are changed to backslashes.
Runs of backslashes are changed to a single backslash.
Spaces might be removed in some places?
Trailing dots also?
That is what \\?\ "turns off".
You can see this if you make some CreateFile calls and watch the resulting NtCreateFile calls.
I should put together a demo.
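Below is a Python approximation of the Win32-to-NT conversion, built only from the behavior described in this message and the experiments that follow. It treats \\?\ paths with the "second character" trick and nothing else. For other full paths, it turns / into \, collapses runs of \, and resolves . and ... It also drops trailing dots and spaces from the final component. It is a sketch under those assumptions, not what kernel32.dll actually does. Relative paths, device names, and many other cases are ignored, and `to_nt_path` is a hypothetical name.

```python
def to_nt_path(path: str) -> str:
    # \\?\ paths: "just change the second character from \ to ?"; no parsing.
    if path.startswith("\\\\?\\"):
        return path[0] + "?" + path[2:]

    path = path.replace("/", "\\")           # forward slashes become backslashes
    if path.startswith("\\\\"):               # \\machine\share\...
        prefix, body, floor = "\\??\\UNC\\", path.lstrip("\\"), 2
    elif len(path) >= 3 and path[1] == ":" and path[2] == "\\":   # c:\...
        prefix, body, floor = "\\??\\" + path[:2] + "\\", path[2:], 0
    else:
        raise ValueError("sketch only handles full drive or UNC paths: " + path)

    parts = []
    for part in body.split("\\"):
        if part in ("", "."):                 # collapses runs of \ and drops "."
            continue
        if part == "..":                      # never climbs above drive or share
            if len(parts) > floor:
                parts.pop()
            continue
        parts.append(part)

    # Trailing dots and spaces vanish, but (judging by the "foo \bar"
    # experiment) apparently only from the last component.
    if len(parts) > floor:
        parts[-1] = parts[-1].rstrip(". ")
        if not parts[-1]:
            parts.pop()
    return prefix + "\\".join(parts)


print(to_nt_path("c:/foo//bar"))    # \??\c:\foo\bar
print(to_nt_path("c:\\foo."))       # \??\c:\foo
```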
A real demo would need some hack to intercept the NtCreateFile call. Nearly everything else is demoable from the command line. Try this:

C:\>mkdir "foo"
C:\>mkdir "foo "
A subdirectory or file foo already exists.
Huh?

C:\>mkdir foo.
A subdirectory or file foo. already exists.
Huh?

C:\>mkdir " foo"
ok.

C:\>mkdir "\\?\c:\foo"
C:\>mkdir "\\?\c:\foo "
=> works
rmdir "\\?\c:\foo "

C:\>mkdir "\\?\c:\foo."
C:\>dir fo*
 Volume in drive C has no label.
 Volume Serial Number is A803-BC73

 Directory of C:\

02/21/2008  11:07 PM    <DIR>          foo
02/21/2008  11:07 PM    <DIR>          foo.
               0 File(s)              0 bytes
               2 Dir(s)  26,666,397,696 bytes free

C:\>rmdir foo.
The system cannot find the file specified.
Huh?

C:\>dir fo*
 Volume in drive C has no label.
 Volume Serial Number is A803-BC73

 Directory of C:\

02/21/2008  11:07 PM    <DIR>          foo
02/21/2008  11:07 PM    <DIR>          foo.
               0 File(s)              0 bytes
               2 Dir(s)  26,666,397,696 bytes free

C:\>mkdir bar.
C:\>rmdir bar.
C:\>mkdir bar.
C:\>rmdir bar
huh?

mkdir "foo \bar"
dir foo<tab> expands to "foo " because tab found it, but then Enter gives File Not Found.
Huh?

C:\>mkdir "\\?\c:\foo/"
The filename, directory name, or volume label syntax is incorrect.
=> forward slash not liked

C:\>mkdir "c:\foo/"
=> no error

C:\>mkdir "\\?\c:\foo\..\bar"
The filename, directory name, or volume label syntax is incorrect.
=> .. apparently not liked. I tried . and that did work.

C:\>mkdir c:\foo\..\bar
=> ok

So now I ask: what is the portable interface and implementation?
Most code uses CreateFile and doesn't use \\?\. So most code is limited to MAX_PATH and has problems with spaces and dots in some places.
These features are all laudable: allowing paths longer than 260 characters, and spaces and dots in more places, if the programmer or user really wants. But the difference in behavior between \\?\ paths and ordinary paths is strange.
Trailing spaces in paths tend to be "invisible" in any user interface. (I wonder about tabs too, vertical space, beep, etc.)
Note that not everything goes through Win32 user-mode CreateFile/kernel32.dll.
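Returning to the trailing-dot experiments, these names are trouble for a portable interface. The minimal Python sketch below groups directory entries by the name that an ordinary, non-\\?\ path would actually reach. It assumes, from the experiments above, that ordinary paths drop trailing dots and spaces from the final component and compare case-insensitively. Leading spaces are kept, as " foo" showed. str.lower only approximates the volume's case table, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def reachable_as(name: str) -> str:
    # What an ordinary path ending in `name` looks up: trailing dots and
    # spaces removed, case ignored (str.lower approximates the case table).
    return name.rstrip(". ").lower()


def ambiguous_names(names: Iterable[str]) -> Dict[str, List[str]]:
    # On-disk names that ordinary paths cannot tell apart.
    groups: Dict[str, List[str]] = defaultdict(list)
    for n in names:
        groups[reachable_as(n)].append(n)
    return {key: group for key, group in groups.items() if len(group) > 1}


def needs_long_path_prefix(name: str) -> bool:
    # Names that only a \\?\ path can spell exactly.
    return name != name.rstrip(". ")


listing = ["foo", "foo.", " foo", "bar"]
print(ambiguous_names(listing))                           # {'foo': ['foo', 'foo.']}
print([n for n in listing if needs_long_path_prefix(n)])  # ['foo.']
```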
CreateFile is not the one and only path to the file system, but it is overwhelmingly common.
Imagine going over the network from a non-Windows client (e.g. Samba), and maybe Services for Unix.
shlwapi.dll and even shell32.dll also have a bunch of path/file utility functions. I've hardly ever used them. I think I tried forward slashes with shlwapi.dll once and no go.

In ntsd, cdb, or windbg, try a breakpoint like:
  bp ntdll!NtCreateFile "!obja poi(@esp+c);g"
That will trace the paths passed to NtCreateFile. It is a low-tech filemon.
esp is the stack pointer, and the object attributes are the third parameter. With the return address at esp, the third parameter is at esp+4*3 = esp+0xc. !obja prints object attributes. This assumes 32-bit x86. On x64 the third parameter is passed in r8.

I got bored writing this and maybe didn't finish covering everything, sorry.
The thing to do is experiment and see what all changes between CreateFile and NtCreateFile, e.g. paths relative to the current working directory.

Oh, also, there is a working directory per drive letter on Windows.
Special environment variables store them. The variable =c: holds c:'s working directory:
C:\>echo %=c:%
C:\
C:\>cd foo
C:\foo>echo %=c:%
C:\foo

I only have one volume, so let's make another:
C:\foo>subst d: c:\
d:
cd bar
D:\bar>echo %=d:%
D:\bar
c:
brings me back to c:\foo.
cd d:\windows
changes the working directory of D:, but doesn't bring me there.
d:
now I am on the D: drive.
cd /d changes drive and directory at the same time.

Cygwin also uses NtCreateFile sometimes. I haven't yet looked at why.
NTFS has hard links for files, not directories (to avoid cycles, and I guess that is adequate for strict Posix?).
NTFS on Vista has symbolic links for both files and directories.
Windows 2000 added a CreateHardLink function.
XP has "fsutil hardlink create newlink existingfile".
Vista has that and "mklink".
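Back to the per-drive working directories. The Python sketch below shows how a drive-relative path resolves against them. The dictionary stands in for the hidden =C:, =D:, ... variables. Drives without an entry default to their root, and UNC paths, . and .. are left out. `resolve` is a hypothetical helper, not a Windows API.

```python
from typing import Dict


def resolve(path: str, current_drive: str, drive_cwds: Dict[str, str]) -> str:
    # drive_cwds models the =C:-style variables, e.g. {"C": "C:\\foo"}.
    if len(path) >= 2 and path[1] == ":":
        drive, rest = path[0].upper(), path[2:]       # "d:bar" or "d:\bar"
    else:
        drive, rest = current_drive.upper(), path     # "bar" or "\bar"

    if rest.startswith("\\"):                         # rooted on that drive
        return drive + ":" + rest
    cwd = drive_cwds.get(drive, drive + ":\\")        # that drive's own cwd
    if not rest:
        return cwd
    return cwd.rstrip("\\") + "\\" + rest


cwds = {"C": "C:\\foo", "D": "D:\\bar"}
print(resolve("baz", "C", cwds))         # C:\foo\baz
print(resolve("d:baz", "C", cwds))       # D:\bar\baz
print(resolve("\\windows", "C", cwds))   # C:\windows
```

This is why "d:baz" typed on C: can land somewhere other than D:\baz.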
Also, on my Mac I can create filenames with forward slashes in them.
I have some old files from www.apple.com named like "C/C++ compiler reference".
At the command line I think the forward slashes show as colons.
Historically on the Mac the colon is the path separator, but the "syntax" is different from Posix and Windows. Pathname.i3 documents it.

If you read the Mac OS X overviews, you can see Mac OS X is a tremendous mish-mash of similar, redundant interfaces and implementations.
There are many sets of functions for doing the same thing.
There are older functions, for example, that take 8-bit characters with limits of 255, though usually I believe you do directory-relative opens.
There are newer APIs that use Unicode and are more opaque.

You can have two volumes with the same name, so hard disk:foo:bar (at least on Mac OS Classic, not sure about X) can actually name any number of files, depending on how many disks are named "hard disk".
I think what you have to do is enumerate volumes, get their "reference numbers", and do relative opens from there.
So much for string equality meaning two paths reference the same file.
Now, granted, if you do two open calls with the same path, I assume it always gets the same one.
However, what if in the meantime more volumes come online?
Maybe earlier mounted is earlier opened, and consistent? Big mess.

(GS/OS on the long-dead Apple IIGS also has this forward slash or colon behavior, and I think instead of one current working directory, you could assign a bunch of them to numbers. I think file names couldn't start with numbers, or maybe that was ambiguous.)
- Jay
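The note on Mac names above suggests that the colon and slash are swapped between the Finder-level name and the command-line name. If so, converting a single name between the two is a character swap that is its own inverse. Below is a minimal Python sketch under that assumption. `swap_colon_slash` is hypothetical, and it applies to one path element, not to whole paths.

```python
_SWAP = str.maketrans({":": "/", "/": ":"})


def swap_colon_slash(name: str) -> str:
    # Converts one file name between the Finder-level spelling and the
    # POSIX spelling (assumed to differ only by ":" <-> "/").
    return name.translate(_SWAP)


finder_name = "C/C++ compiler reference"
posix_name = swap_colon_slash(finder_name)
print(posix_name)                                    # C:C++ compiler reference
print(swap_colon_slash(posix_name) == finder_name)   # True
```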