To summarise the summary:
92% of the catastrophic failures in distributed systems
that Ding Yuan, Yu Luo, Xin Zhuang, Guilherme Renna Rodrigues,
Xu Zhao, Yongle Zhang, Pranay U. Jain, and Michael Stumm
studied "are the result of incorrect handling of nonfatal
errors explicitly signalled in software."
And from the paper:
"We found the majority of catastrophic failures could easily
have been prevented by performing simple testing on error
handling code – the last line of defense – even without an
understanding of the software design."
The second quote is essentially about making sure that the
tests actually exercise the exception handlers.
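To make "tests that exercise the handler" concrete, here is a minimal
sketch (not taken from the paper). The function load_config, the
reader callables and the "defaults" fallback are all hypothetical. The
point is that the except branch only runs when the failure is injected
on purpose; happy-path tests alone leave it completely unexecuted.

```python
# Minimal sketch: exercising an error handler by injecting the failure.
# load_config, read_text and failing_reader are hypothetical names.


def read_text(path):
    """Real reader: may raise OSError (missing file, permissions, ...)."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def load_config(path, reader=read_text):
    """Return the config text, falling back to defaults on I/O errors.

    The except branch is the error-handling code the paper calls
    "the last line of defense"; happy-path tests never execute it.
    """
    try:
        return reader(path)
    except OSError:
        return "defaults"


def failing_reader(path):
    """Test double that simulates a nonfatal I/O error."""
    raise OSError(f"simulated failure reading {path}")


def test_happy_path():
    assert load_config("app.conf", reader=lambda p: "x=1") == "x=1"


def test_error_handler():
    # Without this test the except branch would never run at all.
    assert load_config("app.conf", reader=failing_reader) == "defaults"


if __name__ == "__main__":
    test_happy_path()
    test_error_handler()
    print("both paths exercised")
```

Note that errors the handler was not written for (say, a
ZeroDivisionError raised by the reader) are not caught and simply
propagate, which is a small step in the "Let it Crash!" direction.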
Reading this made me realise just what a big deal
"Let it Crash!" is. It is literally unthinkable
for most programmers, given the way we teach them
with languages like Java. Nobody seems to be able
to think "hey, if we get a lot of crashes due to
'sloppy' exception handling code, maybe we shouldn't
*have* exception handlers." And it's sobering to
realise that *I* would probably never have had this
insight. I am just awed by the mind that could think
such a thing.
Viva Erlang! Semper floreat et crescat! (May it ever flourish and grow!)
(Another article in that issue,
"Too Big NOT to Fail: Embrace failure so it does not embrace you",
may also be of interest.)