[M3devel] On magic numbers

Hendrik Boom hendrik at topoi.pooq.com
Sat Jun 13 13:37:43 CEST 2015


On Fri, Jun 12, 2015 at 08:51:31PM +0200, Elmar Stellnberger wrote:
 
> Basically any random
> number should suffice as with 1.000.000 already registered file formats the
> probability for a clash would just be 1/4000. Nonetheless we could double-
> check against the database of the "file" program.

To stay collision-free for the foreseeable future, I'd suggest a 
64-bit random number.  Even if it collided with someone else's 
32-bit number, the next 32 bits would very likely resolve the issue.
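
A quick check of the arithmetic, as a minimal Python sketch. It assumes every registered magic value is uniformly random and distinct, which real formats are not. With one million existing formats, a fresh 32-bit value hits one of them with probability 10^6 / 2^32, about 1 in 4295, which is consistent with the quoted 1/4000. A 64-bit value brings that down to roughly 5e-14. The function name is illustrative only.

```python
def clash_probability(n_existing: int, bits: int) -> float:
    """Probability that one new uniformly random `bits`-bit value equals
    any of `n_existing` distinct, already-registered values."""
    return n_existing / 2**bits

n = 1_000_000  # number of registered formats, taken from the quoted message
for bits in (32, 64):
    p = clash_probability(n, bits)
    print(f"{bits}-bit magic: P(clash) = {p:.3g}, about 1 in {1 / p:.3g}")
```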

It's not too far-fetched to assume that the number of different file 
formats will continue increasing exponentially even as our world-wide 
data storage increases.

And maybe it's time that the hash codes we use for data types also 
increased in length.  I've always considered 32 bits a bit too small for 
this, especially in the days of *huge* program libraries.  Perhaps it was 
a necessary evil, a concession to antiquated linkers, but the width could 
legitimately be made platform-dependent.
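
To see why 32 bits gets tight for large libraries, here is a minimal sketch of the standard birthday approximation, P(any collision) ≈ 1 - exp(-n(n-1) / 2^(b+1)), assuming type hashes behave like uniformly random b-bit values. The library size of 100,000 types is a hypothetical figure chosen for illustration, not a measurement of any real program.

```python
import math

def birthday_collision_probability(n_items: int, bits: int) -> float:
    """Approximate probability that at least two of n_items uniformly
    random bits-bit hashes are equal (birthday approximation)."""
    pairs = n_items * (n_items - 1) / 2
    # -expm1(-x) == 1 - exp(-x), computed accurately for tiny x
    return -math.expm1(-pairs / 2**bits)

n_types = 100_000  # hypothetical number of distinct types in a large library
for bits in (32, 64):
    p = birthday_collision_probability(n_types, bits)
    print(f"{bits}-bit type hashes, {n_types} types: P(collision) ~ {p:.3g}")
```

Under these assumptions a 32-bit hash is more likely than not to produce at least one collision, while a 64-bit hash makes one vanishingly unlikely.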

For backward compatibility, the compiler could just start checking for 
the magic number.  If it's present, skip it.  If it's absent, proceed as 
it does now.
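
The backward-compatible check described above might look like this minimal Python sketch. The 8-byte `MAGIC` value and the `strip_magic` helper are hypothetical; a real value would be drawn at random once and then fixed forever.

```python
# Hypothetical 64-bit magic number, chosen at random once and then frozen.
MAGIC = bytes.fromhex("9e3c51a07b24d6f8")

def strip_magic(data: bytes) -> tuple[bool, bytes]:
    """Return (had_magic, payload).

    New-style files start with MAGIC, which is skipped; old-style files
    without it are returned unchanged and read exactly as before."""
    if data.startswith(MAGIC):
        return True, data[len(MAGIC):]
    return False, data

print(strip_magic(MAGIC + b"new-style payload"))
print(strip_magic(b"legacy payload"))
```

The one risk is an old file that happens to begin with the same eight bytes; with a random 64-bit value that chance is negligible, which is part of the argument for the longer number.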

> Not all files have a completely random magic; f.i. pyc (compiled
> python files)
> have xx\r\ndddd as a header where xx is a 2-byte number and dddd must be
> a valid date. However if we can choose things from scratch I would speak for
> a fixed header f.i. FD,10,01,XX and add things like gcc, cm3 version numbers
> and timestamps in the following (*).
> It would be beneficial to have at least a cm3-middleend version number
> encoded since not every backend can be combined with any middle/front-end.

Of course, this should still come after the 128 (or however many) 
random bits.
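
One way to lay out such a header, as a minimal Python sketch: the random magic comes first and the version and timestamp fields follow it. The field set, widths, byte order, and the names `pack_header`/`unpack_header` are all hypothetical; the 64-bit magic follows the suggestion above, and a 128-bit magic would only change `8s` to `16s`.

```python
import struct
import time

# Hypothetical layout (big-endian, no padding):
#   8 bytes  random magic number
#   2 bytes  cm3 middle-end version
#   2 bytes  backend version
#   8 bytes  build timestamp, seconds since the Unix epoch
MAGIC = bytes.fromhex("9e3c51a07b24d6f8")
HEADER = struct.Struct(">8sHHq")

def pack_header(middle_end_version: int, backend_version: int, timestamp: int) -> bytes:
    return HEADER.pack(MAGIC, middle_end_version, backend_version, timestamp)

def unpack_header(data: bytes) -> tuple[int, int, int]:
    magic, middle_end, backend, ts = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("missing magic number: not a file in this format")
    return middle_end, backend, ts

hdr = pack_header(5, 2, int(time.time()))
print(len(hdr), unpack_header(hdr))
```

Putting the fixed-width random magic first means a reader can reject foreign files before interpreting any version field, and a tool can refuse an incompatible middle-end/backend pairing from the version numbers alone.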

-- hendrik