[M3devel] Heartbleed, initialization, and Modula-3
Rodney M. Bates
rodney_bates at lcwb.coop
Thu Jun 5 00:39:30 CEST 2014
Olaf's recent mention of safe languages and Heartbleed prompted me to
look into the specifics of the bug, particularly to see what Modula-3
might have done to prevent it.
According to the descriptions I found, a recent protocol extension
(called "heartbeat") allows one machine (call it the client) to ask
another (call it the server) to echo a verbatim copy of an arbitrary
character string, I suppose to check whether the server is alive and
responding.
The request message contains two redundant string lengths: one as part of
the string itself (in a way not described in the descriptions I saw,
but it doesn't matter) and one prepended at the beginning, so the
server can allocate a buffer prior to storing the string. The two
were probably presumed to be equal, but nothing forces this. How good
a protocol design this is could be debated.
The real bug is in the server-side implementation, which uses the
requested buffer size rather than the sent string length as the length
of the string to echo. So an attacker can give an over-large buffer
size and a short string. The string gets stored in the first few
bytes of the buffer, while the rest are uninitialized. The attacker
gets back a lot of left-over bytes from whatever the buffer was used
for previously. It would then have to figure out what it actually got
and how to exploit it, but the data is at least there.
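
A toy sketch of the flaw as described above, in Python for brevity. This is not the actual server code: the `bytearray` merely stands in for a recycled memory block, and `LEFTOVER`, `handle_heartbeat`, and the sample secret are all hypothetical names and data made up for illustration.

```python
# Toy model of the description above, not real server code.
# LEFTOVER plays the role of whatever the memory block held before reuse.
LEFTOVER = b"password=hunter2;session=abc123"


def handle_heartbeat(recycled: bytearray, claimed_size: int, payload: bytes) -> bytes:
    """Echo handler containing the flaw: it trusts the claimed size."""
    if len(payload) > claimed_size or claimed_size > len(recycled):
        raise ValueError("toy model only handles short payloads in a large block")
    # "Allocate" the buffer: it arrives with its old contents intact.
    buf = recycled[:claimed_size]
    # The short string overwrites only the first few bytes.
    buf[:len(payload)] = payload
    # Bug: the reply is claimed_size bytes long, not len(payload).
    return bytes(buf)


# The attacker sends a 2-byte string but claims 24 bytes.
leak = handle_heartbeat(bytearray(LEFTOVER), 24, b"hi")
print(leak)
```

The reply begins with the two bytes that were sent, and the remainder is whatever the block previously contained.
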
Modula-3, as it stands, would likely not have prevented this. The language
requires only that all newly allocated variables come into existence
with some bit pattern that is a legal value of the type. For many
types, this would require a compiler-generated runtime initialization,
which would have overlaid the leftover sensitive data. But for a type
whose value set covers all bit patterns of the machine-level
representation, the language rule is satisfied with no initialization.
Coded in Modula-3, the buffer would almost surely have been typed as an
array of CHAR or array of some other discrete type whose range exactly
uses a byte, thus requiring no compiler-generated initialization and
leaving the sensitive data intact.
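
The rule can be modelled roughly as a comparison between the number of legal values and the number of bit patterns. The sketch below is in Python and is only an illustration of the reasoning in this message, not of how any Modula-3 compiler actually decides. The function name `needs_runtime_init` is hypothetical, and it assumes an ordinal type stored in a fixed number of bits with each legal value occupying exactly one bit pattern.

```python
def needs_runtime_init(lo: int, hi: int, storage_bits: int) -> bool:
    """Return True if some bit pattern of the storage is not a legal value.

    Assumption: the type is an ordinal range lo..hi stored in storage_bits
    bits, and each legal value corresponds to exactly one bit pattern.
    """
    legal_values = hi - lo + 1
    bit_patterns = 2 ** storage_bits
    # If leftover bits could form an illegal value, the rule forces
    # an initialization; otherwise any leftover bits already satisfy it.
    return legal_values < bit_patterns


# A byte-sized type covering 0..255: every bit pattern is legal.
print(needs_runtime_init(0, 255, 8))
# A subrange 0..9 stored in a byte: most bit patterns are illegal.
print(needs_runtime_init(0, 9, 8))
```

Under these assumptions, a buffer of byte-sized elements whose range fills the byte falls in the first case, so the leftover contents survive allocation.
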
I have long been ambivalent about this language rule. I see the
argument for it. It is the minimum runtime penalty that ensures the
abstract type system of the language behaves as expected. But it also
allows some things to happen that, although not type-safety bugs, are
nevertheless undefined behaviour, even if explainable in the abstract
data system of the language.
Defined initialization of everything would have prevented the
Heartbleed bug from compromising sensitive data. It would also make
bugs involving variables never explicitly initialized in source code
deterministic and repeatable, which is a huge advantage. The cost would be small
constant-time execution speed losses, greatly diluted by other things.
This would allow us to claim publicly that modern Modula-3 would
have prevented the Heartbleed bug from compromising anything.
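
A toy sketch of the effect of defined initialization, again in Python and not real server code. The choice of zero as the defined value is an assumption made here for illustration (this message does not say which values would be defined), and `allocate_initialized`, `handle_heartbeat_initialized`, and the sample secret are hypothetical.

```python
# Toy model only. LEFTOVER stands in for a recycled memory block's old contents.
LEFTOVER = b"password=hunter2;session=abc123"


def allocate_initialized(recycled: bytearray, size: int) -> bytearray:
    """Hand out `size` bytes, overwritten with a defined value first.

    Assumption: the defined initial value is zero.
    """
    if size > len(recycled):
        raise ValueError("toy block is too small")
    recycled[:size] = bytes(size)  # constant-time-per-byte cost of the rule
    return recycled[:size]


def handle_heartbeat_initialized(recycled: bytearray, claimed_size: int,
                                 payload: bytes) -> bytes:
    """Same flawed echo logic, but on an initialized buffer."""
    if len(payload) > claimed_size:
        raise ValueError("toy model only handles short payloads")
    buf = allocate_initialized(recycled, claimed_size)
    buf[:len(payload)] = payload
    # The length bug is still present: claimed_size bytes are echoed.
    return bytes(buf)


reply = handle_heartbeat_initialized(bytearray(LEFTOVER), 24, b"hi")
print(reply)
```

The server still echoes too many bytes, so the logic error is not fixed, but the excess bytes carry the defined value rather than leftover data, and the result is the same on every run.
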
We could also implement this without stating it in the language, but I
think that might be something of the worst of both worlds, since one
could not fully rely on its staying that way.
What does everyone think?
P.S.: It's pretty easy to define the values and pretty easy to
implement.
--
Rodney Bates
rodney.m.bates at acm.org