Asynchronous consensus in <40 code lines of Erlang

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Asynchronous consensus in <40 code lines of Erlang

Bryan Ford
Hi all,

I’ve heard about Erlang for a long time, but finally had reason to try it.

So I used it to prototype a simple model of Que Sera Consensus (QSC), a new experimental asynchronous consensus algorithm.  I was impressed to discover that the complete, working algorithm - though just running on a single machine for now - is expressible in only about 37 code lines as counted by CLOC, not counting test code.  Erlang’s selective receive feature, in particular, seems to have been presciently designed three decades ago just to implement the Threshold Logical Clocks (TLC) communication primitive that QSC builds on.

If anyone feels like poking at it, helping to find any flaws, or building more realistic cross-node implementations and experimenting with it in real distributed applications, the code lives here:


I’d be happy to hear any feedback or experiences, either privately, on this list, or via the above GitHub repo.  Is it comprehensible?  Can you break it?  Can you find a competitively small and simple implementation of, say, Paxos or Raft or any other comparable consensus algorithm in Erlang?

For background, the actual algorithms are described in this preprint:

Que Sera Consensus: Simple Asynchronous Agreement with Private Coins and Threshold Logical Clocks

Enjoy.  And by the way, nice language. :)

Cheers
Bryan


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Asynchronous consensus in <40 code lines of Erlang

Loïc Hoguin-3
You can test it using ConcuError.

"Concuerror is a model checking tool for debugging, testing and
verifying concurrent Erlang programs."

https://concuerror.com/

Cheers,

On 06/03/2020 11:52, Bryan Ford wrote:

> Hi all,
>
> I’ve heard about Erlang for a long time, but finally had reason to try it.
>
> So I used it to prototype a simple model of Que Sera Consensus (QSC), a
> new experimental asynchronous consensus algorithm.  I was impressed to
> discover that the complete, working algorithm - though just running on a
> single machine for now - is expressible in only about 37 code lines as
> counted by CLOC, not counting test code.  Erlang’s selective receive
> feature, in particular, seems to have been presciently designed three
> decades ago just to implement the Threshold Logical Clocks (TLC)
> communication primitive that QSC builds on.
>
> If anyone feels like poking at it, helping to find any flaws, or
> building more realistic cross-node implementations and experimenting
> with it in real distributed applications, the code lives here:
>
> https://github.com/dedis/tlc/tree/master/erlang/model
>
> I’d be happy to hear any feedback or experiences, either privately, on
> this list, or via the above GitHub repo.  Is it comprehensible?  Can you
> break it?  Can you find a competitively small and simple implementation
> of, say, Paxos or Raft or any other comparable consensus algorithm in
> Erlang?
>
> For background, the actual algorithms are described in this preprint:
>
> Que Sera Consensus: Simple Asynchronous Agreement with Private Coins and
> Threshold Logical Clocks
> https://arxiv.org/abs/2003.02291
>
> Enjoy.  And by the way, nice language. :)
>
> Cheers
> Bryan
>

--
Loïc Hoguin
https://ninenines.eu
Reply | Threaded
Open this post in threaded view
|

Re: Asynchronous consensus in <40 code lines of Erlang

Bryan Ford
On 6 Mar 2020, at 12:32, Loïc Hoguin <[hidden email]> wrote:

You can test it using ConcuError.

"Concuerror is a model checking tool for debugging, testing and verifying concurrent Erlang programs."

https://concuerror.com/

Thanks - good to know there’s a model checker for Erlang.

I had already model checked an earlier formulation of QSC and TLC written in Promela with the SPIN model checker (see https://github.com/dedis/tlc/tree/master/spin).  That version model checks in just a couple minutes, partly because it’s hugely simplified so that “messages” are just bits and such.  So I’m pretty confident in the algorithm’s basic logic but it’s not a “real” implementation of course.

I’d of course love to have a model-checked implementation in a real programming language like Erlang, so I spent a bit of time getting it to be compatible with Concuerror, e.g., by eliminating some false-positive errors that the test synchronization was producing.  Concuerror didn’t find any errors after running for about a day, but at that point it estimated another week or so to go before the test run got interrupted.  I further simplified the test synchronization to reduce interleavings more; we’ll see how this version model checks.

One potential issue is that QSC is a randomized algorithm, and model checking isn’t perfectly suited towards randomized algorithms.  In the Promela/SPIN model, I could model random number selection as a simple nondeterministic choice, which works for checking safety but of course does nothing to verify (probabilistic) liveness.  I can live with that.  The Erlang version calls rand:uniform/1 to pick random numbers, and I’m not entirely sure how Concuerror deals with that.  I guess the considerations for non-deterministic functions probably apply (https://concuerror.com/faq/#how-does-concuerror-handle-non-deterministic-functions), which means (for any thread or execution trace?) it just picks one random number and memoizes that, and doesn’t explore the space of all the random numbers it could have picked.  That in principle makes the model checking non-exhaustive, so it could in principle miss errors even if it ever finishes.  I didn’t see in the Concuerror documentation any obvious way to express a “nondeterministic choice” for exhaustive search as in Promela/SPIN.  

I tried creating a “random number generator” that chooses between 1 and 2 by setting up a race between two threads sending two messages.  This in principle transforms the “random number generation” into an interleaving choice, which is the state space Concuerror is designed to explore.  But surprise surprise, that led to a further explosion in the estimated time Concuerror model checking would take assuming it ever finishes.

So while a nice exercise, I guess this is another data point supporting the rule of thumb that model checking is great in principle but if the concurrent code isn’t designed to be hopelessly unrealistically trivial, model checking will never finish. :)

Cheers
Bryan

Cheers,

On 06/03/2020 11:52, Bryan Ford wrote:
Hi all,
I’ve heard about Erlang for a long time, but finally had reason to try it.
So I used it to prototype a simple model of Que Sera Consensus (QSC), a new experimental asynchronous consensus algorithm.  I was impressed to discover that the complete, working algorithm - though just running on a single machine for now - is expressible in only about 37 code lines as counted by CLOC, not counting test code.  Erlang’s selective receive feature, in particular, seems to have been presciently designed three decades ago just to implement the Threshold Logical Clocks (TLC) communication primitive that QSC builds on.
If anyone feels like poking at it, helping to find any flaws, or building more realistic cross-node implementations and experimenting with it in real distributed applications, the code lives here:
https://github.com/dedis/tlc/tree/master/erlang/model
I’d be happy to hear any feedback or experiences, either privately, on this list, or via the above GitHub repo.  Is it comprehensible?  Can you break it?  Can you find a competitively small and simple implementation of, say, Paxos or Raft or any other comparable consensus algorithm in Erlang?
For background, the actual algorithms are described in this preprint:
Que Sera Consensus: Simple Asynchronous Agreement with Private Coins and Threshold Logical Clocks
https://arxiv.org/abs/2003.02291
Enjoy.  And by the way, nice language. :)
Cheers
Bryan

--
Loïc Hoguin
https://ninenines.eu


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Asynchronous consensus in <40 code lines of Erlang

Stavros Aronis
Hi Bryan!

Thank you for trying out Concuerror!

You can express nondeterministic choice with a timeout, like this:

R = make_ref(),
S = self(),
spwan(fun() -> S ! R end),
Choice =
  receive
   Ref -> 1
  after
    0 -> 2
end

This is gonna produce the two choices depending on the scheduling of
the delivery of the message before the timeout or after. It is
important to use a ref and a pattern with it, so that this message
doesn't interfere with any other messages.

As a co-author of Concuerror and relevant publications I agree with
the engineering sentiment that "model checking will never finish" when
applied to examples that are not painfully examined and minimized as
much as possible. Just in case you missed it, however, Concuerror also
has a --scheduling_bound option that does not do exhaustive search and
might yield something interesting.

Finally, the "remaining time" estimate is a bit like a magic 8-ball. A
week is not that bad and might actually mean a few hours or a day; a
century, on the other hand, is fairly likely to be bad...!

Regards,

Stavros

On Sat, Mar 7, 2020 at 1:46 PM Bryan Ford <[hidden email]> wrote:

>
> On 6 Mar 2020, at 12:32, Loïc Hoguin <[hidden email]> wrote:
>
>
> You can test it using ConcuError.
>
> "Concuerror is a model checking tool for debugging, testing and verifying concurrent Erlang programs."
>
> https://concuerror.com/
>
>
> Thanks - good to know there’s a model checker for Erlang.
>
> I had already model checked an earlier formulation of QSC and TLC written in Promela with the SPIN model checker (see https://github.com/dedis/tlc/tree/master/spin).  That version model checks in just a couple minutes, partly because it’s hugely simplified so that “messages” are just bits and such.  So I’m pretty confident in the algorithm’s basic logic but it’s not a “real” implementation of course.
>
> I’d of course love to have a model-checked implementation in a real programming language like Erlang, so I spent a bit of time getting it to be compatible with Concuerror, e.g., by eliminating some false-positive errors that the test synchronization was producing.  Concuerror didn’t find any errors after running for about a day, but at that point it estimated another week or so to go before the test run got interrupted.  I further simplified the test synchronization to reduce interleavings more; we’ll see how this version model checks.
>
> One potential issue is that QSC is a randomized algorithm, and model checking isn’t perfectly suited towards randomized algorithms.  In the Promela/SPIN model, I could model random number selection as a simple nondeterministic choice, which works for checking safety but of course does nothing to verify (probabilistic) liveness.  I can live with that.  The Erlang version calls rand:uniform/1 to pick random numbers, and I’m not entirely sure how Concuerror deals with that.  I guess the considerations for non-deterministic functions probably apply (https://concuerror.com/faq/#how-does-concuerror-handle-non-deterministic-functions), which means (for any thread or execution trace?) it just picks one random number and memoizes that, and doesn’t explore the space of all the random numbers it could have picked.  That in principle makes the model checking non-exhaustive, so it could in principle miss errors even if it ever finishes.  I didn’t see in the Concuerror documentation any obvious way to express a “nondeterministic choice” for exhaustive search as in Promela/SPIN.
>
> I tried creating a “random number generator” that chooses between 1 and 2 by setting up a race between two threads sending two messages.  This in principle transforms the “random number generation” into an interleaving choice, which is the state space Concuerror is designed to explore.  But surprise surprise, that led to a further explosion in the estimated time Concuerror model checking would take assuming it ever finishes.
>
> So while a nice exercise, I guess this is another data point supporting the rule of thumb that model checking is great in principle but if the concurrent code isn’t designed to be hopelessly unrealistically trivial, model checking will never finish. :)
>
> Cheers
> Bryan
>
> Cheers,
>
> On 06/03/2020 11:52, Bryan Ford wrote:
>
> Hi all,
> I’ve heard about Erlang for a long time, but finally had reason to try it.
> So I used it to prototype a simple model of Que Sera Consensus (QSC), a new experimental asynchronous consensus algorithm.  I was impressed to discover that the complete, working algorithm - though just running on a single machine for now - is expressible in only about 37 code lines as counted by CLOC, not counting test code.  Erlang’s selective receive feature, in particular, seems to have been presciently designed three decades ago just to implement the Threshold Logical Clocks (TLC) communication primitive that QSC builds on.
> If anyone feels like poking at it, helping to find any flaws, or building more realistic cross-node implementations and experimenting with it in real distributed applications, the code lives here:
> https://github.com/dedis/tlc/tree/master/erlang/model
> I’d be happy to hear any feedback or experiences, either privately, on this list, or via the above GitHub repo.  Is it comprehensible?  Can you break it?  Can you find a competitively small and simple implementation of, say, Paxos or Raft or any other comparable consensus algorithm in Erlang?
> For background, the actual algorithms are described in this preprint:
> Que Sera Consensus: Simple Asynchronous Agreement with Private Coins and Threshold Logical Clocks
> https://arxiv.org/abs/2003.02291
> Enjoy.  And by the way, nice language. :)
> Cheers
> Bryan
>
>
> --
> Loïc Hoguin
> https://ninenines.eu
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Asynchronous consensus in <40 code lines of Erlang

Bryan Ford
Thanks Stavros.  The race-based “random” function I tried was similar to what you suggested, but used two racing threads instead of just one plus a timeout.  I guess the timeout approach is likely more efficient.

And I did find and try the —interleaving_bound option.  I also noticed the —scheduling_bound option too but it wasn’t as clear what it meant or why I should use one or the other (or both).  I’m not a model checking expert, just an occasional dabbler. :). If there’s some kind of “try this, then try that” kind of best-practice process for playing with the various bounds that you find kinda works, it might be nice to add that to the Concuerror documentation, since the current documentation basically just says that the bound parameters exist without saying much about what they mean or how to use them.

In any case, nice work on Concuerrer, and thanks for your help!  I’m letting the maybe-a-week-long Concuerror run go and we’ll see if it’s a day or a week or a century. ;)

Cheers
Bryan

> On 7 Mar 2020, at 14:18, Stavros Aronis <[hidden email]> wrote:
>
> Hi Bryan!
>
> Thank you for trying out Concuerror!
>
> You can express nondeterministic choice with a timeout, like this:
>
> R = make_ref(),
> S = self(),
> spwan(fun() -> S ! R end),
> Choice =
>  receive
>   Ref -> 1
>  after
>    0 -> 2
> end
>
> This is gonna produce the two choices depending on the scheduling of
> the delivery of the message before the timeout or after. It is
> important to use a ref and a pattern with it, so that this message
> doesn't interfere with any other messages.
>
> As a co-author of Concuerror and relevant publications I agree with
> the engineering sentiment that "model checking will never finish" when
> applied to examples that are not painfully examined and minimized as
> much as possible. Just in case you missed it, however, Concuerror also
> has a --scheduling_bound option that does not do exhaustive search and
> might yield something interesting.
>
> Finally, the "remaining time" estimate is a bit like a magic 8-ball. A
> week is not that bad and might actually mean a few hours or a day; a
> century, on the other hand, is fairly likely to be bad...!
>
> Regards,
>
> Stavros
>
> On Sat, Mar 7, 2020 at 1:46 PM Bryan Ford <[hidden email]> wrote:
>>
>> On 6 Mar 2020, at 12:32, Loïc Hoguin <[hidden email]> wrote:
>>
>>
>> You can test it using ConcuError.
>>
>> "Concuerror is a model checking tool for debugging, testing and verifying concurrent Erlang programs."
>>
>> https://concuerror.com/
>>
>>
>> Thanks - good to know there’s a model checker for Erlang.
>>
>> I had already model checked an earlier formulation of QSC and TLC written in Promela with the SPIN model checker (see https://github.com/dedis/tlc/tree/master/spin).  That version model checks in just a couple minutes, partly because it’s hugely simplified so that “messages” are just bits and such.  So I’m pretty confident in the algorithm’s basic logic but it’s not a “real” implementation of course.
>>
>> I’d of course love to have a model-checked implementation in a real programming language like Erlang, so I spent a bit of time getting it to be compatible with Concuerror, e.g., by eliminating some false-positive errors that the test synchronization was producing.  Concuerror didn’t find any errors after running for about a day, but at that point it estimated another week or so to go before the test run got interrupted.  I further simplified the test synchronization to reduce interleavings more; we’ll see how this version model checks.
>>
>> One potential issue is that QSC is a randomized algorithm, and model checking isn’t perfectly suited towards randomized algorithms.  In the Promela/SPIN model, I could model random number selection as a simple nondeterministic choice, which works for checking safety but of course does nothing to verify (probabilistic) liveness.  I can live with that.  The Erlang version calls rand:uniform/1 to pick random numbers, and I’m not entirely sure how Concuerror deals with that.  I guess the considerations for non-deterministic functions probably apply (https://concuerror.com/faq/#how-does-concuerror-handle-non-deterministic-functions), which means (for any thread or execution trace?) it just picks one random number and memoizes that, and doesn’t explore the space of all the random numbers it could have picked.  That in principle makes the model checking non-exhaustive, so it could in principle miss errors even if it ever finishes.  I didn’t see in the Concuerror documentation any obvious way to express a “nondeterministic choice” for exhaustive search as in Promela/SPIN.
>>
>> I tried creating a “random number generator” that chooses between 1 and 2 by setting up a race between two threads sending two messages.  This in principle transforms the “random number generation” into an interleaving choice, which is the state space Concuerror is designed to explore.  But surprise surprise, that led to a further explosion in the estimated time Concuerror model checking would take assuming it ever finishes.
>>
>> So while a nice exercise, I guess this is another data point supporting the rule of thumb that model checking is great in principle but if the concurrent code isn’t designed to be hopelessly unrealistically trivial, model checking will never finish. :)
>>
>> Cheers
>> Bryan
>>
>> Cheers,
>>
>> On 06/03/2020 11:52, Bryan Ford wrote:
>>
>> Hi all,
>> I’ve heard about Erlang for a long time, but finally had reason to try it.
>> So I used it to prototype a simple model of Que Sera Consensus (QSC), a new experimental asynchronous consensus algorithm.  I was impressed to discover that the complete, working algorithm - though just running on a single machine for now - is expressible in only about 37 code lines as counted by CLOC, not counting test code.  Erlang’s selective receive feature, in particular, seems to have been presciently designed three decades ago just to implement the Threshold Logical Clocks (TLC) communication primitive that QSC builds on.
>> If anyone feels like poking at it, helping to find any flaws, or building more realistic cross-node implementations and experimenting with it in real distributed applications, the code lives here:
>> https://github.com/dedis/tlc/tree/master/erlang/model
>> I’d be happy to hear any feedback or experiences, either privately, on this list, or via the above GitHub repo.  Is it comprehensible?  Can you break it?  Can you find a competitively small and simple implementation of, say, Paxos or Raft or any other comparable consensus algorithm in Erlang?
>> For background, the actual algorithms are described in this preprint:
>> Que Sera Consensus: Simple Asynchronous Agreement with Private Coins and Threshold Logical Clocks
>> https://arxiv.org/abs/2003.02291
>> Enjoy.  And by the way, nice language. :)
>> Cheers
>> Bryan
>>
>>
>> --
>> Loïc Hoguin
>> https://ninenines.eu
>>
>>


signature.asc (849 bytes) Download Attachment