[M3devel] cm3: what are *.mc files

Elmar Stellnberger estellnb at elstel.org
Fri Jun 12 20:51:31 CEST 2015


On 12.06.15 at 20:00, Rodney M. Bates wrote:
>
>
> On 06/12/2015 09:51 AM, Elmar Stellnberger wrote:
>>
>> Thanks a lot Rodney and Jay;
>> that will certainly help my implementation.
>>
>> So far all *.mc files found on my machine have the
>> following signature:
>> 16_FD,00,01,{00}
>>
>> except for a few text .mc files from PM3, which all start
>> with "begin_unit".
>>
>> Rodney, do you believe that I can rely on the 4th byte
>> being zero as generated by the Modula-3 middle end?
>> Or would anyone be ready to uphold such a guarantee
>> for the future?
>>
>
> The 4th byte is not really dependable for the future.  It never has had
> a real magic number.  The FD,00,01 is a version number on the binary
> format, so even it is likely to change.
>
> The 4th byte zero is a binary opcode for begin_unit, equivalent
> to the "begin_unit" in the PM3 text version.
Well, begin_unit is exactly what I check for when an .mc file appears to
be text. If 00 encodes begin_unit, I believe it should be safe to check for
FD,00,01,00 and FD,10,01,00. How could an .mc file not start with begin_unit?
Wouldn't that be invalid? Or, if it would still be valid, I believe we have
not generated such files yet.
So if in the future a file may start with any other command, a fixed 4-byte
magic which is not already interpreted would be welcome. Basically any random
number should suffice: with 1,000,000 already registered file formats, the
probability of a clash would be only about 1/4000 (2^32 / 10^6 is roughly
4300). Nonetheless, we could double-check against the database of the "file"
program.
Not all files have a completely random magic; e.g. pyc (compiled Python) files
have xx\r\ndddd as a header, where xx is a 2-byte number and dddd must be
a valid date. However, if we can choose things from scratch, I would argue for
a fixed header, e.g. FD,10,01,XX, followed by things like gcc and cm3 version
numbers and timestamps (*).
It would be beneficial to have at least a cm3 middle-end version number
encoded, since not every backend can be combined with every middle/front end.

* With a version-dependent 2-byte header portion, I need a validly set
current system date to determine whether a file is a .pyc from a future
version of Python.
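
The header check discussed above could be sketched roughly as follows in
Python. This is only an illustration: the function name, the constant names
and the check_opcode parameter are hypothetical, the byte values are the ones
quoted in this thread, and whether the 4th byte may be relied upon is exactly
the open question here.

```python
# Sketch only: signatures are the ones quoted in this thread, and the
# version bytes are said to be likely to change. All names are hypothetical.
BINARY_PREFIXES = (
    bytes([0xFD, 0x00, 0x01]),  # reported for older versions of cm3
    bytes([0xFD, 0x10, 0x01]),  # reported for a very recent head compiler
)
BEGIN_UNIT_OPCODE = 0x00        # 4th byte: binary opcode for begin_unit
TEXT_PREFIX = b"begin_unit"     # PM3 text form of the same command


def classify_mc_header(head: bytes, check_opcode: bool = True) -> str:
    """Return 'binary', 'text' or 'unknown' for the first bytes of a file.

    With check_opcode=False only the 3 version bytes are compared,
    i.e. the 4th byte is ignored.
    """
    if head[:3] in BINARY_PREFIXES:
        if not check_opcode:
            return "binary"
        if len(head) >= 4 and head[3] == BEGIN_UNIT_OPCODE:
            return "binary"
    # Assumption: the text form starts with begin_unit at offset 0.
    if head.startswith(TEXT_PREFIX):
        return "text"
    return "unknown"
```

Passing check_opcode=False gives the 24-bit check; the default gives the
32-bit check, which only holds as long as every file begins with begin_unit.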

>
> I think the most reliable long-term way is just to look for file names
> *.mc and *.ic.  Be sure to look for both.  *.mc is for a MODULE and *.ic
> is for an INTERFACE.  These can be regenerated from source and will not be
> needed once a compile is complete, unless you are into vetting/debugging
> the compiler.  So deleting them is quite safe.
Not all *.mc files belong to Modula-3. I have some *.mc files in my home
directory which are in fact MS Visual Studio files. That is why I prefer a
combination of the file extension and the file header/magic to determine
whether a file can be automatically deleted.
For Modula-3 it is also quite safe to look for TARGET directories (**).
However, if we meet a file which does not contain plain human-readable text,
we may always want to determine in some way where the file stems from and what
data it may contain. File suffixes can be stripped by accident (e.g. on an
iso9660 file system) or intentionally. And perhaps we do not want to look too
deep into a binary before determining what it is (e.g. in a file manager).
Even the "file" tool has already been reported to have a security
vulnerability ...

** That will work poorly at best on a non-Unix system where file names are
not case sensitive.
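
Combining the file extension with the header, as preferred above, might look
like the following Python sketch. The function names are hypothetical, only
the 3 version bytes quoted in this thread are compared (the 4th byte is
ignored), and nothing is deleted: the generator merely lists candidates.

```python
import os

# Sketch only: suffixes and byte values are the ones named in this thread.
M3_SUFFIXES = (".mc", ".ic")             # MODULE and INTERFACE output
M3_BINARY_PREFIXES = (b"\xfd\x00\x01", b"\xfd\x10\x01")
M3_TEXT_PREFIX = b"begin_unit"           # PM3 text form


def looks_like_m3_ir(path):
    """True only if the suffix AND the header both match."""
    if not path.endswith(M3_SUFFIXES):
        return False
    try:
        with open(path, "rb") as f:
            head = f.read(len(M3_TEXT_PREFIX))
    except OSError:
        return False                     # unreadable: do not touch it
    return head[:3] in M3_BINARY_PREFIXES or head == M3_TEXT_PREFIX


def find_deletable(root):
    """Yield paths under root that pass both checks; deletes nothing."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if looks_like_m3_ir(path):
                yield path
```

An *.mc file from another tool fails the header test and is left alone, while
a Modula-3 file whose suffix was stripped is simply not found by this sketch.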

>
> I suppose we could add a magic number.  We already have a front/back end
> compatibility change between the release and head compilers.  I can do
> this, if there is consensus we should.  How would we choose the number?
>
>
>> Has anyone here applied "od" to an .mc file generated
>> by a very recent compiler? Do they start with
>> 16_FD,10,01,?00?
>>
>> Most binary file types guarantee a header of at
>> least 4 bytes, and it should be more straightforward and
>> secure to check 32 bits instead of 24 bits if possible.
>>
>> Any suggestions?
>>
>>
>> On 10.06.15 at 02:21, Rodney M. Bates wrote:
>>>
>>>
>>> On 06/09/2015 03:02 PM, Elmar Stellnberger wrote:
>>>> What are *.mc files?
>>>> They appear in TARGET directories;
>>>> most of them are just called _m3main.mc, but some of them have other
>>>> names.
>>>>
>>>> I ask because I am writing a program which should recognize and
>>>> clear object files.
>>>> It does not seem to be sufficient to check for uppercase
>>>> directories which are located together with an src directory.
>>>>
>>>> Usually files of a specific type start with a 32-bit magic;
>>>> however, the mc files all have different starting sequences.
>>>>
>>>> Is there still a straightforward way to recognize an .mc file just
>>>> by its binary content?
>>>>
>>>
>>> They will start with either 16_FD 16_00 16_01, produced by older versions of cm3,
>>>                          or 16_FD 16_10 16_01, produced by a very recent head compiler.
>>> Ignore the 4th byte.
>>
>>
>> On 09.06.15 at 22:14, Jay K wrote:
>>> ps:
>>>
>>>   foo.m3 => foo.mc => cm3cg => foo.ms => as => foo.mo
>>>   foo.i3 => foo.ic => cm3cg => foo.is => as => foo.io
>>>
>>>  again, see cm3 -keep, err better yet, cm3 -keep -verbose
>>>  You can see it running cm3cg and as and rm.
>>>
>>>
>>>  - Jay
>>>
>>
>