[M3devel] On magic numbers

Rodney M. Bates rodney_bates at lcwb.coop
Sat Jun 13 20:42:26 CEST 2015



On 06/13/2015 06:37 AM, Hendrik Boom wrote:
> On Fri, Jun 12, 2015 at 08:51:31PM +0200, Elmar Stellnberger wrote:
>
>> Basically any random
>> number should suffice as with 1.000.000 already registered file formats the
>> probability for a clash would just be 1/4000. Nonetheless we could double-
>> check against the database of the "file" program.
>
> For more collision-freeness for the foreseeable future, I'd suggest a
> 64-bit random number.  Even if there were a collision with someone
> else's 32-bit number, the next 32 bits would likely resolve the issue.
>
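
As a rough check of the figures quoted above, here is a small sketch of the
arithmetic.  It assumes (my assumption, not something stated in the thread)
that the already-registered magic numbers are distinct and that the new one is
drawn uniformly at random, so the clash probability is simply
registered / 2^bits.  The function name is made up for illustration.

```python
def clash_probability(registered: int, bits: int) -> float:
    """Chance that one uniformly random `bits`-wide value equals any of
    `registered` distinct values that are already taken (assumed model)."""
    return registered / 2 ** bits


# 1,000,000 registered formats, as in the quoted estimate.
p32 = clash_probability(1_000_000, 32)
p64 = clash_probability(1_000_000, 64)

print(f"32-bit: about 1 in {1 / p32:,.0f}")
print(f"64-bit: about 1 in {1 / p64:.2e}")
```

Under that model the 32-bit figure comes out near 1 in 4300, in line with the
quoted 1/4000, and the 64-bit figure is smaller by a factor of 2^32.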

There already is a 64-bit "signature" hash of type structure computed in
the compiler, but it is only used in pickles.  Elsewhere, a 32-bit "UID"
is used.  It is just the XOR of the halves of the signature.  It was very
difficult for me to ferret this conclusion out of the code.  The UID
is also called something else (I don't remember what) in m3linker and the
messages it emits, and may even have more names, making for confusing
error messages and difficulty understanding the build system.
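
To make the folding concrete, here is a minimal sketch of XORing the two
halves of a 64-bit value into 32 bits.  It is written in Python for brevity,
it is not the compiler's code, and the function name and the example value
are invented for illustration.

```python
MASK32 = 0xFFFFFFFF


def fold_signature(signature: int) -> int:
    """Fold a 64-bit value to 32 bits by XORing its high and low halves.

    Illustrative only: the name and the range check are assumptions,
    not taken from the cm3 sources.
    """
    if not 0 <= signature < (1 << 64):
        raise ValueError("signature must fit in 64 bits")
    high = (signature >> 32) & MASK32
    low = signature & MASK32
    return high ^ low


# Made-up 64-bit value, not a real type signature.
uid = fold_signature(0x0123456789ABCDEF)
print(f"{uid:08x}")  # prints 88888888
```

Folding like this discards information: any two signatures whose halves XOR
to the same value get the same 32-bit result, for example every signature
whose two halves are equal folds to zero.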

The fingerprint algorithm has comments suggesting a careful design by
someone familiar with high-quality hashes.  (That's not me!)  Using the
full 64 bits everywhere would probably create some annoying transitional
compatibility problems.

> It's not too far-fetched to assume that the number of different file
> formats will continue increasing exponentially even as our world-wide
> data storage increases.
>
> And maybe it's time that the hash codes we use for data types also
> increase in length.  I've always considered 32 bits a bit too small for
> this, especially in the days of *huge* program libraries.  Maybe a
> necessary evil as a concession to antiquated linkers, but it could
> legitimately be made platform-dependent.
>
> For backward compatibility, the compiler could just start checking for
> the magic number.  If it's present, skip it.  If it's absent, go on as
> at present.
>
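
The "skip it if present, carry on if absent" check quoted above might look
roughly like the following.  This is only a sketch: the magic value and the
function name are hypothetical, and nothing here reflects an actual cm3 file
format.

```python
from typing import Tuple

# Hypothetical 64-bit magic number, invented for this example.
MAGIC = bytes.fromhex("9f3c71d2a85be046")


def split_magic(data: bytes) -> Tuple[bool, bytes]:
    """Return (had_magic, payload).

    If the data starts with MAGIC, strip it and return the rest.
    Otherwise return the data unchanged, as for an old-style file.
    """
    if data.startswith(MAGIC):
        return True, data[len(MAGIC):]
    return False, data


new_style = MAGIC + b"rest of file"
old_style = b"rest of file"
print(split_magic(new_style))  # (True, b'rest of file')
print(split_magic(old_style))  # (False, b'rest of file')
```

One caveat with this scheme is that an old-style file whose first eight bytes
happen to equal the magic value would be misread, which is another argument
for a long random number.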
>> Not all files have a completely random magic; f.i. pyc (compiled
>> python files)
>> have xx\r\ndddd as a header where xx is a 2-byte number and dddd must be
>> a valid date. However if we can choose things from scratch I would speak for
>> a fixed header f.i. FD,10,01,XX and add things like gcc, cm3 version numbers
>> and timestamps in the following (*).
>> It would be beneficial to have at least a cm3-middleend version number
>> encoded since not every backend can be combined with any middle/front-end.
>
> Of course this should still be appended to the 128 (or however many)
> bits.
>
> -- hendrik
>

-- 
Rodney Bates
rodney.m.bates at acm.org


