Reliable kernels?


Richard A. O'Keefe-2
I was introduced to the concept of a "million-year bug" today.
That's a bug in a program that, if you were running it on a
single CPU, would be expected to show up once in a million years.

With a couple of thousand million Linux kernels around the world
in phones &c, we can expect a million-year bug in the kernel to
show up tens of times a day.
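
A rough check of that arithmetic, as a minimal Python sketch. The figures are assumptions taken from the estimate above (2e9 devices, one occurrence per million CPU-years). The names `expected_events_per_day` and `cores_per_device` are hypothetical, chosen for illustration, and the model simply treats every CPU as failing independently at a constant rate.

```python
DAYS_PER_YEAR = 365.25


def expected_events_per_day(devices: float,
                            years_between_events_per_cpu: float,
                            cores_per_device: int = 1) -> float:
    """Expected daily occurrences of a rare bug across a whole fleet.

    Assumes every CPU hits the bug independently at a constant rate of
    one occurrence per `years_between_events_per_cpu` years.
    """
    total_cpus = devices * cores_per_device
    events_per_year = total_cpus / years_between_events_per_cpu
    return events_per_year / DAYS_PER_YEAR


if __name__ == "__main__":
    # Assumed fleet: "a couple of thousand million" kernels, million-year bug.
    per_device = expected_events_per_day(2e9, 1e6)
    per_core = expected_events_per_day(2e9, 1e6, cores_per_device=8)
    print(f"one CPU per device:  {per_device:.1f} per day")
    print(f"eight cores apiece:  {per_core:.1f} per day")
```

Counting one CPU per device, this gives roughly five occurrences a day. Counting each core of a multi-core phone separately pushes it into the tens.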

There's been a spate of problems with 911 in Dallas being overloaded
with bogus calls apparently sent autonomously by certain mobile
phones, to the point where a man and a child are thought to have
died because genuine 911 callers were put on hold for a long time.
That's probably *not* a million-year bug, but a million-year bug
might do that kind of thing.

Now me, I'm still happily pottering away on 4-core and 16-core
machines (even a 1-core machine that sees a lot of use because it
Just Keeps Working).  But I'm talking to people who want to use
unbelievable amounts of computing power.

I understand the Erlang Way:  write your software as lots of
small things communicating through narrow protocols, *expect*
failure and deal with it.  I believe!  Praise Joe, I believe!
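
The "expect failure and deal with it" idea can be sketched in a few lines, here in Python since that is what the nodes in question run. This is only an illustration of restart-on-failure, not Erlang/OTP supervision and not anything Pyro provides. `supervise` and `make_flaky_worker` are hypothetical names, and the worker is a stand-in that simulates a failing node.

```python
from typing import Callable


def supervise(worker: Callable[[], str], max_restarts: int = 3) -> str:
    """Run `worker`, restarting it after each failure, up to `max_restarts` times."""
    failures = 0
    while True:
        try:
            return worker()
        except Exception as exc:
            failures += 1
            if failures > max_restarts:
                # Give up and pass the failure to whoever supervises us.
                raise RuntimeError(f"worker failed {failures} times") from exc


def make_flaky_worker(fail_times: int) -> Callable[[], str]:
    """Hypothetical worker that crashes `fail_times` times, then succeeds."""
    state = {"calls": 0}

    def worker() -> str:
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise ConnectionError("simulated node failure")
        return f"ok after {state['calls']} attempts"

    return worker


if __name__ == "__main__":
    print(supervise(make_flaky_worker(2)))
```

The worker itself contains no recovery logic. It just fails, and the decision about what to do next lives in one small place outside it.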

That's not the way the people I'm talking to think.  They've got
a somewhat resilient data flow scheme they're proud of that has
thousands and tens of thousands of nodes hooked up through
Python, where the protocol between the nodes is Pyro.  Not,
"uses Pyro", "IS Pyro".

I'm supposed to tell these people what would be a good stripped
down kernel to use (or what would be a good kernel to strip
further), and I'm tempted to start by saying "strip away Python"
which certainly won't make me popular (;-).  But thinking about
error rates and million-year bugs has me thinking harder.

It seems as if we have to think in terms of *expecting* the
kernel itself to be unreliable-at-scale, so that something like
Jailhouse might be the right level to start.

Does anyone have any experience with trying to harden a large system
against faults in the distribution layer and in the kernel, and have
any advice they'd care to share?

None of this is Erlang-specific, it's just that I think the Erlang
community are likely to have more relevant experience than most.

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Re: Reliable kernels?

Michael Truog
On 03/27/2017 09:22 PM, Richard A. O'Keefe wrote:

> Does anyone have any experience with trying to harden a large system
> against faults in the distribution layer and in the kernel, and have
> any advice they'd care to share?

The expectation seems to be that microkernels would eventually replace monolithic kernels and the change would provide better reliability and security.  The two most accessible, well-known, and complete approaches appear to be:

1) seL4 which is open-source and formally verified (https://en.wikipedia.org/wiki/L4_microkernel_family) available at https://github.com/seL4/seL4
2) MINIX 3 (https://en.wikipedia.org/wiki/MINIX_3) available at http://www.minix3.org/

For normal UNIX use you would likely need to go with MINIX 3 and use its ability to install packages from NetBSD's pkgsrc collection.  MINIX has historically been used mainly for teaching, so most people are unlikely to agree with this approach immediately.  There are other microkernel projects, but none appears to have attracted much attention while keeping its SLOC low.

A more typical choice would be FreeBSD instead of Linux (though both are monolithic kernels), since FreeBSD is perceived as being more reliable.  The usual explanation is that features are added to Linux at a much quicker rate, and bugs are added along with them.

Re: Reliable kernels?

duncan
In reply to this post by Richard A. O'Keefe-2
re: "A more typical choice would be FreeBSD instead of Linux"
You might also want to look at HardenedBSD (https://hardenedbsd.org/content/about), a security-hardened fork of FreeBSD. Because it's "security hardened", it might meet your needs.

Another option you might want to look at is NixOS, https://nixos.org/
If I were to try to make a minimal OS, I'd probably start with NixOS, since I think it would be easier to strip things out of.

Duncan Sparrell
sFractal Consulting LLC
iPhone, iTypo, iApologize


-------- Original Message --------
Subject: Re: [erlang-questions] Reliable kernels?
From: Michael Truog <[hidden email]>
Date: Tue, March 28, 2017 1:41 am
To: "Richard A. O'Keefe" <[hidden email]>, Erlang-Questions Questions
<[hidden email]>
