[M3devel] Heartbleed, initialization, and Modula-3

Fri Jun 6 16:26:18 CEST 2014

On 06/04/2014 08:03 PM, Hendrik Boom wrote:
> On Wed, Jun 04, 2014 at 05:39:30PM -0500, Rodney M. Bates wrote:
>> Olaf's recent mention of safe languages and Heartbleed prompted me to
>> look into the specifics of the bug, particularly to see what Modula-3
>> might have done to prevent it.
>>
>> According to the descriptions I found, a recent protocol extension
>> (called "heartbeat") allows one machine (call it the client) to ask
>> another (call it the server) to echo a verbatim copy of an arbitrary
>> character string, I suppose to check whether the server is alive and
>> responding.
>>
>> The request message contains two redundant string lengths, one part of
>> the string itself (in a way not described in the descriptions I saw,
>> but it doesn't matter) and one prepended at the beginning, so the
>> server can allocate a buffer prior to storing the string.  The two
>> were probably presumed to be equal, but nothing forces this.  How good
>> a protocol design this is could be debated.
>>
>> The real bug is in the server-side implementation, which uses the
>> requested buffer size rather than the sent string length as the length
>> of the string to echo.  So an attacker can give an over-large buffer
>> size and a short string.  The string gets stored in the first few
>> bytes of the buffer, while the rest are uninitialized.  The attacker
>> gets back a lot of left-over bytes from whatever the buffer was used
>> for previously.  It would then have to figure out what it actually got
>> and how to exploit it, but the data is at least there.
>>
>> Modula-3, as it is, would likely not have prevented this.  The language
>> requires only that all newly allocated variables come into existence
>> with some bit pattern that is a legal value of the type.  For many
>> types, this would require a compiler-generated runtime initialization,
>> which would have overlaid the leftover sensitive data.  But for a type
>> whose value set covers all bit patterns of the machine-level
>> representation, the language rule is satisfied with no initialization.
>>
>> Coded in Modula-3, the buffer would almost surely have been typed as an
>> array of CHAR or array of some other discrete type whose range exactly
>> uses a byte, thus requiring no compiler-generated initialization and
>> leaving the sensitive data intact.
>>
>> I have long been ambivalent about this language rule.  I see the
>> argument for it.  It is the minimum runtime penalty that ensures the
>> abstract type system of the language behaves as expected.  But it also
>> allows some things to happen that, although not type-safety bugs, are
>> nevertheless undefined behaviour, even if explainable in the abstract
>> data system of the language.
>>
>> Defined initialization of everything would have prevented the
>> Heartbleed bug from compromising sensitive data.  It would also make
>> uninitialized (by explicit source code) bugs deterministic and
>> repeatable, which is a huge advantage.  The cost would be small
>> constant-time execution speed losses, greatly diluted by other things.
>
> constant-time?  only if they are only ever executed once each.
>
>>
>> This would allow us to claim publicly that modern Modula-3 would
>> have prevented the Heartbleed bug from compromising anything.
>>
>> We could also implement this without stating it in the language, but I
>> think that might be something of the worst of both worlds, since one
>> could not fully rely on its staying that way.
>>
>> What does everyone think?
>>
>> P.S.: It's pretty easy to define the values and pretty easy to
>> implement.
>
> And pretty slow if the first thing you do with that storage is to
> initialize it yourself.
> I'd be in favour of such an initialization as a compile-time option if
> the values were values that are not likely to be welcome in normal use,
> so that the programmer's missed initializations are likely to show
> up.  For example, a NaN for floating point.

Hmm, defining it in the language for floating point is harder than had
occurred to me.  A NaN is obviously right, but not all floating point
representations have NaNs, and the language must not require them.

Actually, it would be sufficient for the original example if variables
were just initialized, not necessarily to anything in particular, as
long as it didn't depend on anything happening at runtime.
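
To make that concrete, here is a rough sketch in C (not Modula-3, and
not the actual OpenSSL code) of the over-read described above.  All the
names in it (pool, alloc_raw, alloc_zeroed, echo, POOL_SIZE) are
invented for illustration, and the one-slot "allocator" only simulates
storage being reused with its old contents still in place.

```c
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 64   /* hypothetical buffer size, chosen for the sketch */

/* Hypothetical one-slot "allocator": every request gets the same storage,
   so whatever the previous user left behind is still there. */
static unsigned char pool[POOL_SIZE];

/* Allocation with no initialization: leftovers stay intact. */
static unsigned char *alloc_raw(void)
{
    return pool;
}

/* Allocation with defined initialization: leftovers are overwritten. */
static unsigned char *alloc_zeroed(void)
{
    memset(pool, 0, POOL_SIZE);
    return pool;
}

/* The buggy echo: stores payload_len bytes in buf, but sends back
   claimed_len bytes.  Returns the number of bytes written to reply. */
static size_t echo(unsigned char *buf,
                   const unsigned char *payload, size_t payload_len,
                   size_t claimed_len, unsigned char *reply)
{
    if (claimed_len > POOL_SIZE)
        claimed_len = POOL_SIZE;
    if (payload_len > claimed_len)
        payload_len = claimed_len;
    memcpy(buf, payload, payload_len);
    memcpy(reply, buf, claimed_len);   /* the bug: should be payload_len */
    return claimed_len;
}
```

If the pool previously held some secret, echo(alloc_raw(), "hi", 2, 21,
reply) returns "hi" followed by whatever followed the first two bytes of
the secret.  With alloc_zeroed() in place of alloc_raw(), the same call
returns "hi" followed by zero bytes.  The length bug is still there, but
it no longer exposes anything.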

> But in production I'd really like the option of turning this off if
> performance is critical.
> And even if it's on, I'd like the optimiser, if any, to remove these
> default initialisations if it can determine that the default
> initializations are dead.

Yes, I would take it for granted that well-known and already implemented
optimizations would do this.

>
> (that said, I hardly ever turn checking options off in practice.)
>
> -- hendrik
>>
>> --
>> Rodney Bates
>> rodney.m.bates at acm.org
>

-- 
Rodney Bates
rodney.m.bates at acm.org