[M3devel] cm3: what are *.mc files
Rodney M. Bates
rodney_bates at lcwb.coop
Sat Jun 13 20:23:20 CEST 2015
On 06/12/2015 01:51 PM, Elmar Stellnberger wrote:
> On 12.06.15 at 20:00, Rodney M. Bates wrote:
>>
>>
>> On 06/12/2015 09:51 AM, Elmar Stellnberger wrote:
>>>
>>> Thanks a lot Rodney and Jay;
>>> that will certainly help my implementation.
>>>
>>> So far all *.mc files found on my machine have the
>>> following signature:
>>> 16_FD,00,01,{00}
>>>
>>> except a few text .mc files from PM3, which all
>>> start with "begin_unit".
>>>
>>> Rodney, do you believe that I can rely on the 4th byte
>>> being zero as generated by the Modula-3 middle end?
>>> Or would anyone be ready to uphold such a guarantee
>>> for the future?
>>>
I doubt anyone would want to guarantee any of these 4 bytes, as they are
actual content information and not a magic number. However, I would say
the 1st (FD) and last (00, code for begin_unit) have pretty low probability
of changing. The middle two contain a version number, so can be expected
to change. The second (counting from one) is least significant, and more
likely to change. We already have two possible values here, as I reported.
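To illustrate the point about the middle two bytes being a version rather than
a magic number, here is a minimal Python sketch. It is not part of cm3; the
function name and constants are made up for illustration, and the byte values
are only the ones observed in this thread, not a guaranteed format.

```python
from typing import Optional, Tuple

# Values observed in this thread; none of them is a guaranteed magic number.
FIRST_BYTE = 0xFD          # first byte seen in all binary .mc files so far
BEGIN_UNIT_OPCODE = 0x00   # binary opcode for begin_unit (4th byte)

# The two version byte pairs (2nd, 3rd byte) reported so far.
KNOWN_VERSIONS = {(0x00, 0x01), (0x10, 0x01)}


def split_mc_header(header: bytes) -> Optional[Tuple[int, int]]:
    """Return the two middle (version) bytes if the first four bytes look
    like binary cm3 intermediate code, otherwise None.

    Hypothetical helper: it treats bytes 2 and 3 as an opaque version
    instead of comparing them against fixed values.
    """
    if len(header) < 4:
        return None
    if header[0] != FIRST_BYTE or header[3] != BEGIN_UNIT_OPCODE:
        return None
    return header[1], header[2]
```

A caller can then decide separately whether an unknown version is acceptable,
for example by testing `split_mc_header(head) in KNOWN_VERSIONS`, so that a
future version bump does not silently turn into "not an .mc file".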
>>
>> The 4th byte is not really dependable for the future. It never has had
>> a real magic number. The FD,00,01 is a version number on the binary
>> format, so even it is likely to change.
>>
>> The 4th byte zero is a binary opcode for begin_unit, equivalent
>> to the "begin_unit" in the PM3 text version.
> Well, the begin_unit is exactly what I check for when an .mc appears to be text.
> If 00 encodes begin_unit I believe it should be safe to check for FD,00,01,00
> and FD,10,01,00. How could an .mc file not start with begin_unit? Wouldn't
> that be invalid? Or, if it would still be valid, I believe we have not generated
> such files yet.
> So if in the future it may start with any other command, a fixed 4-byte magic
> which is not already interpreted would be welcome. Basically any random
> number should suffice: with 1,000,000 already registered file formats the
> probability of a clash would be just about 1/4000 (2^32 / 10^6 is roughly 4300).
> Nonetheless we could double-check against the database of the "file" program.
> Not all files have a completely random magic; e.g. pyc (compiled Python files)
> have xx\r\ndddd as a header where xx is a 2-byte number and dddd must be
> a valid date. However, if we can choose things from scratch I would argue for
> a fixed header, e.g. FD,10,01,XX, and add things like gcc and cm3 version numbers
> and timestamps in the bytes that follow (*).
> It would be beneficial to have at least a cm3-middleend version number
> encoded since not every backend can be combined with any middle/front-end.
>
> * With a version-dependent 2-byte header portion I would need a correctly set current
> system date to determine whether it is a .pyc from a future version of Python.
>
>>
>> I think the most reliable long-term way is just to look for file names *.mc and
>> *.ic. Be sure to look for both. *.mc is for a MODULE and *.ic is for an
>> INTERFACE. These can be regenerated from source and will not be needed once a
>> compile is complete, unless you are into vetting/debugging the compiler.
>> So deleting them is quite safe.
> Not all *.mc files belong to Modula-3. I have some *.mc files in my home directory
> which are in fact MS Visual Studio files. That is why I prefer a combination of the
> file extension and file header/magic to determine whether a file can be
> automatically deleted.
OK, file names are not adequate.
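A rough Python sketch of the combined check discussed above (file extension
plus leading bytes). The function name and constants are hypothetical, the
byte patterns are only those reported in this thread, and it assumes the PM3
text form begins with "begin_unit" at offset 0. It is a heuristic, not a
guarantee.

```python
import os

# *.mc is for a MODULE, *.ic is for an INTERFACE.
M3_SUFFIXES = (".mc", ".ic")

# Binary headers reported so far: older cm3 and a very recent head compiler.
BINARY_PREFIXES = (b"\xfd\x00\x01\x00", b"\xfd\x10\x01\x00")

# PM3 text form; assumed to start at the very first byte of the file.
TEXT_PREFIX = b"begin_unit"


def looks_like_m3_intermediate(path: str) -> bool:
    """Heuristic: True only if both the suffix and the leading bytes match."""
    if not path.endswith(M3_SUFFIXES):
        return False
    try:
        with open(path, "rb") as f:
            head = f.read(len(TEXT_PREFIX))
    except OSError:
        # Unreadable or missing files are never treated as candidates.
        return False
    return head.startswith(BINARY_PREFIXES) or head.startswith(TEXT_PREFIX)
```

A cleanup tool would call this on each file it finds, e.g. via `os.walk`, and
leave anything that fails either test alone, which keeps unrelated *.mc files
such as the Visual Studio ones out of the deletion list.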
> For Modula-3 it is also quite safe to look for TARGET directories**. However, if we
> meet a file which does not contain plain human-readable text we may always
> want to determine in some way where the file stems from and what data it may
> contain. File suffixes can be stripped by accident (e.g. on an iso9660 file system)
> or intentionally. And perhaps we do not want to look too deep into a
> binary before determining what it is (e.g. in a file manager). Even the "file" tool
> was already reported to have a security vulnerability ...
>
> ** That will work poorly at best on a non-Unix system where file names are not
> case sensitive.
>
>>
>> I suppose we could add a magic number. We already have a front/back end
>> compatibility change between the release and head compilers. I can do this,
>> if there is consensus we should. How would we choose the number?
>>
>>
>>> Anyone here who has applied "od" on an .mc generated
>>> by a very recent compiler? - do they start with
>>> 16_FD,10,01,?00?
>>>
>>> Most binary file types would guarantee a header of at
>>> least 4 bytes, and it should be more straightforward and
>>> secure to check 32 bits instead of 24 bits if possible.
>>>
>>> Any suggestions?
>>>
>>>
>>> On 10.06.15 at 02:21, Rodney M. Bates wrote:
>>>>
>>>>
>>>> On 06/09/2015 03:02 PM, Elmar Stellnberger wrote:
>>>>> What are *.mc - files?
>>>>> They appear in TARGET - directories;
>>>>> most of them are just called _m3main.mc but some of them have other names.
>>>>>
>>>>> I ask because I am writing a program which should recognize and clear object files.
>>>>> It does not seem to be sufficient to check for uppercase directories which are located together with an src directory.
>>>>>
>>>>> Usually files of a specific type start with a 32bit magic;
>>>>> however the mc files all have different starting sequences.
>>>>>
>>>>> Is there still a straightforward way to recognize an .mc file just by its binary content?
>>>>>
>>>>
>>>> They will start with either 16_FD 16_00 16_01, produced by older versions of cm3,
>>>> or 16_FD 16_10 16_01, produced by a very recent head compiler.
>>>> Ignore the 4th byte.
>>>
>>>
>>> On 09.06.15 at 22:14, Jay K wrote:
>>>> ps:
>>>>
>>>> foo.m3 => foo.mc => cm3cg => foo.ms => as => foo.mo
>>>> foo.i3 => foo.ic => cm3cg => foo.is => as => foo.io
>>>>
>>>> again, see cm3 -keep, err better yet, cm3 -keep -verbose
>>>> You can see it running cm3cg and as and rm.
>>>>
>>>>
>>>> - Jay
>>>>
>>>
>>
>
>
--
Rodney Bates
rodney.m.bates at acm.org