question re. message delivery

Miles Fidelman
Folks,

I've just been re-reading Joe Armstrong's thesis, and I'm reminded of a
question that's been nagging me.

As I understand it, message delivery is not guaranteed, but message
order IS. So how, exactly, does that work?  What's the underlying
mechanism that imposes sequencing, but allows messages to get lost? 
(Particularly across a network.)  What are the various scenarios at play?

Inquiring minds want to know!

Thanks,

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: question re. message delivery

zxq9-2
On Sunday, 2017-09-24 16:50:45, Miles Fidelman wrote:
> Folks,
>
> I've just been re-reading Joe Armstrong's thesis, and I'm reminded of a
> question that's been nagging me.
>
> As I understand it, message delivery is not guaranteed, but message
> order IS. So how, exactly does that work?  What's the underlying
> mechanism that imposes sequencing, but allows messages to get lost?  
> (Particularly across a network.)  What are the various scenarios at play?

This is sort of backwards.

Message delivery is guaranteed, assuming the process you are sending a
message to exists and is available, BUT from the perspective of the
sender there is no way to tell whether the receiver actually got it,
crashed, disappeared, fell into a network black hole, or whatever.
Monitoring can tell you whether the process you are trying to reach
is available right at that moment, but that's it.

The point is, though, that whether the receiver is unreachable, has
crashed, got the message and did its work but was unable to report
back about it, or whatever -- it's all the same reality from the
perspective of the sender. "Unavailable" means "unavailable", no matter
what the cause -- because the cause cannot be determined from the
perspective of the sender. You can only know this with an out of
context check of some sort, and that is basically the role the runtime
plays for you with regard to monitors and links.

The OTP synchronous "call" mechanism is actually a complex procedure
built from asynchronous messages, unique reference tags, and monitors.
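
For readers unfamiliar with that pattern, here is a minimal sketch in Python (not Erlang, and not the OTP source) of a synchronous call built from two asynchronous messages and a unique tag. The names `server` and `call` and the `None` shutdown signal are made up for illustration, and a plain timeout stands in for the monitor, which this sketch does not model.

```python
import queue
import threading
import uuid


def server(inbox):
    # Answer every request with the caller's own tag so replies can be matched.
    while True:
        msg = inbox.get()
        if msg is None:              # hypothetical shutdown signal
            return
        tag, reply_to, request = msg
        reply_to.put((tag, request * 2))


def call(server_inbox, request, timeout=1.0):
    tag = uuid.uuid4().hex           # unique reference for this one call
    my_inbox = queue.Queue()
    server_inbox.put((tag, my_inbox, request))   # asynchronous send
    try:
        reply_tag, value = my_inbox.get(timeout=timeout)
    except queue.Empty:
        # No reply: lost request, dead server and slow server all look alike.
        return ("error", "timeout")
    if reply_tag != tag:
        return ("error", "unexpected_reply")
    return ("ok", value)


inbox = queue.Queue()
worker = threading.Thread(target=server, args=(inbox,), daemon=True)
worker.start()
result = call(inbox, 21)
inbox.put(None)
worker.join()

# Nobody is reading this second inbox, so the call can only time out.
dead_result = call(queue.Queue(), 21, timeout=0.05)
```

The second call illustrates the point made above: from the caller's side, every failure mode collapses into the same "no answer".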

What IS guaranteed is the ordering of messages *relative to two processes*.

If A sends B the messages 1, 2 and 3 in that order, they will certainly
arrive in that order (assuming they arrive at all -- meaning that B is
available from the perspective of A). If C sends B the messages 4, 5, 6
in that order those will also certainly arrive in that order for B.
If A sends B and C the messages 1, 2 and 3, and as a reaction C starts
sending B the messages 4, 5, 6 -- we can never know what order of
interleaving these will have.

It could be [1,2,3,4,5,6], or [1,2,4,5,3,6] or [1,4,5,6,2,3] or whatever,
but only the relative ordering between a pair of processes can be known.
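
To make the "any interleaving, as long as each sender's own order survives" rule concrete, here is a small Python sketch (a model of the rule, not of the runtime). The function name `respects_sender_order` is invented for this example, and it assumes no messages are lost.

```python
from itertools import permutations


def respects_sender_order(arrival, sent):
    """True if each sender's messages appear in `arrival` in the order sent.

    Assumes every message arrives (the no-loss case).
    """
    for sender_msgs in sent.values():
        seen = [m for m in arrival if m in sender_msgs]
        if seen != list(sender_msgs):
            return False
    return True


# A sends 1, 2, 3 to B; C sends 4, 5, 6 to B.
sent = {"A": [1, 2, 3], "C": [4, 5, 6]}

ok_1 = respects_sender_order([1, 2, 3, 4, 5, 6], sent)
ok_2 = respects_sender_order([1, 2, 4, 5, 3, 6], sent)
ok_3 = respects_sender_order([1, 4, 5, 6, 2, 3], sent)
bad = respects_sender_order([1, 3, 2, 4, 5, 6], sent)   # A's order is broken

# Every arrival order at B that is consistent with the pairwise guarantee.
valid = [p for p in permutations([1, 2, 3, 4, 5, 6])
         if respects_sender_order(p, sent)]
```

Of the 720 possible arrival orders of six messages, only the ones that keep 1 before 2 before 3 and 4 before 5 before 6 are allowed, and B has no way to predict which of those it will see.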


A digression about design implications...

One magical side effect of these strict guarantees AND strict ambiguities
is that right from the start of a project in Erlang, even one running on
a local system, you wind up staring the CAP theorem straight in the face.
This tends to result in a better understanding of the constraints
introduced by concurrency and distribution because they are present in the
mind of every developer right from the start. The general outcome I've
noticed (but don't know how to quantify with a metric of any sort) is
that consideration of design tradeoffs rules architecture, even on a
subconscious level, and this really bears itself out as a project matures.

-Craig

Re: question re. message delivery

Miles Fidelman
See below....


On 9/24/17 6:10 PM, zxq9 wrote:

> On Sunday, 2017-09-24 16:50:45, Miles Fidelman wrote:
>> Folks,
>>
>> I've just been re-reading Joe Armstrong's thesis, and I'm reminded of a
>> question that's been nagging me.
>>
>> As I understand it, message delivery is not guaranteed, but message
>> order IS. So how, exactly does that work?  What's the underlying
>> mechanism that imposes sequencing, but allows messages to get lost?
>> (Particularly across a network.)  What are the various scenarios at play?
> This is sort of backwards.
>
> Message delivery is guaranteed, assuming the process you are sending a
> message to exists and is available, BUT from the perspective of the
> sender there is no way to tell whether the receiver actually got it,
> has crashed, disappeared, fell into a network blackhole, or whatever.
> Monitoring can tell you whether the process you are trying to reach
> is available right at that moment, but that's it.
>
> The point is, though, that whether the receiver is unreachable, has
> crashed, got the message and did its work but was unable to report
> back about it, or whatever -- its all the same reality from the
> perspective of the sender. "Unavailable" means "unavailable", not matter
> what the cause -- because the cause cannot be determined from the
> perspective of the sender. You can only know this with an out of
> context check of some sort, and that is basically the role the runtime
> plays for you with regard to monitors and links.
>
> The OTP synchronous "call" mechanism is actually a complex procedure
> built from asynchronous messages, unique reference tags, and monitors.
>

Note that I didn't ask about the synchronous calls; I asked about raw
interprocess messages.

> What IS guaranteed is the ordering of messages *relative to two processes*
>
> If A sends B the messages 1, 2 and 3 in that order, they will certainly
> arrive in that order (assuming they arrive at all -- meaning that B is
> available from the perspective of A).

But that's the question.  Particularly when sent via network, 1, 2, 3
may be sent in that order, but, at the protocol level, they may not
arrive in that order.

With a reliable transport protocol - say TCP - if the message-containing
packets arrived as 1, 3, 2, the protocol engine would wait for 2 to
arrive and deliver 1,2,3 in that order.  If it received 1 & 3, but 2 got
lost, it would request a re-transmit, wait for it to arrive, and again,
deliver in that order.

But the implication of Erlang's stated rules is that an unreliable
transport protocol is being used: if you send 1, 2, 3, and what arrives
is 1, 3, 2, then what would be delivered to the receiving pid is 1 & 3,
and 2 would be discarded.  Is that a correct assumption about the
underlying transport mechanism?  Is that guaranteed behavior?


Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra


Re: question re. message delivery

Raimo Niskanen-2
On Sun, Sep 24, 2017 at 11:24:31PM -0700, Miles Fidelman wrote:

> See below....
>
>
> On 9/24/17 6:10 PM, zxq9 wrote:
> > On Sunday, 2017-09-24 16:50:45, Miles Fidelman wrote:
> >> Folks,
> >>
> >> I've just been re-reading Joe Armstrong's thesis, and I'm reminded of a
> >> question that's been nagging me.
> >>
> >> As I understand it, message delivery is not guaranteed, but message
> >> order IS. So how, exactly does that work?  What's the underlying
> >> mechanism that imposes sequencing, but allows messages to get lost?
> >> (Particularly across a network.)  What are the various scenarios at play?
> > This is sort of backwards.
> >
> > Message delivery is guaranteed, assuming the process you are sending a
> > message to exists and is available, BUT from the perspective of the
> > sender there is no way to tell whether the receiver actually got it,
> > has crashed, disappeared, fell into a network blackhole, or whatever.
> > Monitoring can tell you whether the process you are trying to reach
> > is available right at that moment, but that's it.
> >
> > The point is, though, that whether the receiver is unreachable, has
> > crashed, got the message and did its work but was unable to report
> > back about it, or whatever -- its all the same reality from the
> > perspective of the sender. "Unavailable" means "unavailable", not matter
> > what the cause -- because the cause cannot be determined from the
> > perspective of the sender. You can only know this with an out of
> > context check of some sort, and that is basically the role the runtime
> > plays for you with regard to monitors and links.
> >
> > The OTP synchronous "call" mechanism is actually a complex procedure
> > built from asynchronous messages, unique reference tags, and monitors.
> >
>
> Note that I didn't ask about the synchronous calls, I asked about raw
> interprocess messages.
>
> > What IS guaranteed is the ordering of messages *relative to two processes*
> >
> > If A sends B the messages 1, 2 and 3 in that order, they will certainly
> > arrive in that order (assuming they arrive at all -- meaning that B is
> > available from the perspective of A).
>
> But that's the question.  Particularly when sent via network, 1, 2, 3
> may be sent in that order, but, at the protocol level, they may not
> arrive in that order.

What protocol level?

Erlang distribution has to use or implement a reliable protocol.  Today
TCP, but anything is possible.  Note that this protocol is between two
nodes, both containing many processes.  But the emulator relies on the
transport protocol being reliable.

>
> With a reliable transport protocol - say TCP - if the message-containing
> packets arrived as 1, 3, 2, the protocol engine would wait for 2 to
> arrive and deliver 1,2,3 in that order.  If It received 1 & 3, but 2 got
> lost, it would request a re-transmit, wait for it to arrive, and again,
> deliver in that order.
>
> But the implication of Erlang's stated rules is that an unreliable
> transport protocol is being used, if you send 1, 2, 3, and what arrives

What?  What is stated?

> is 1, 3, 2 - then what would be delivered to the receiving PiD is 1 & 3
> and 2 would be discarded.  Is that a correct assumption about the
> underlying transport mechanism?  Is that guaranteed behavior?

Aah.  That would not be possible.  If you send to a registered name,
1 might be delivered to Pid1 and 3 to Pid2.  But if you send to a pid:
1 might be delivered and 2 & 3 discarded; 1 & 2 delivered and 3 discarded;
or 1 & 2 & 3 delivered.

The order is guaranteed while the destination is available, and if it is
not, the messages are discarded.
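
Put differently, when sending to a single pid the set of possible outcomes is "some prefix of what was sent". A tiny Python sketch of that model follows (the function name is invented for illustration, and the registered-name case described next is not covered):

```python
def possible_deliveries(sent):
    """Every prefix of `sent`: delivery stops at some point and never resumes."""
    return [sent[:n] for n in range(len(sent) + 1)]


outcomes = possible_deliveries([1, 2, 3])
# A gap such as "1 and 3 delivered, 2 discarded" is not among the outcomes.
gap_possible = [1, 3] in outcomes
```

The empty prefix is included because the destination may already be gone when the first message is sent.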

But if you send to a registered name the Pid behind that name may change
any time.  Still order is guaranteed - up to a point all messages are
delivered in order, then some may be discarded, then when the new pid is up
and registered all messages are delivered in order again.

>
>
> Miles Fidelman
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.  .... Yogi Berra
>

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Re: question re. message delivery

Miles Fidelman

On 9/24/17 11:53 PM, Raimo Niskanen wrote:

> > But that's the question.  Particularly when sent via network, 1, 2, 3
> > may be sent in that order, but, at the protocol level, they may not
> > arrive in that order.
>
> What protocol level?
>
> Erlang distribution has to use or implement a reliable protocol.  Today
> TCP, but anything is possible.  Note that this protocol is between two
> nodes, both containing many processes.  But the emulator relies on the
> transport protocol being reliable.

No.  It doesn't.  It could simply send UDP packets.  In Joe's thesis, he says that the behavior is a "design choice."  I'm asking about the implementation details: how does BEAM actually handle message delivery, both locally and via the network?


      
> > With a reliable transport protocol - say TCP - if the message-containing
> > packets arrived as 1, 3, 2, the protocol engine would wait for 2 to
> > arrive and deliver 1,2,3 in that order.  If it received 1 & 3, but 2 got
> > lost, it would request a re-transmit, wait for it to arrive, and again,
> > deliver in that order.
> >
> > But the implication of Erlang's stated rules is that an unreliable
> > transport protocol is being used, if you send 1, 2, 3, and what arrives
>
> What?  What is stated?

From Joe Armstrong's Thesis:

"Message passing is assumed to be unreliable with no guarantee of delivery."

"Since we made no assumptions about reliable message passing, and must write our application so that it works in the presence of unreliable message passing it should indeed work in the presence of message passing errors. The initial effort involved will reward us when we try to scale up our systems."

"2. Message passing between a pair of processes is assumed to be ordered meaning that if a sequence of messages is sent and received between any pair of processes then the messages will be received in the same order they were sent."

"Note that point two is a design decision, and does not reflect any underlying semantics in the network used to transmit messages. The underlying network might reorder the messages, but between any pair of processes these messages can be buffered, and re-assembled into the correct order before delivery. This assumption makes programming message passing applications much easier than if we had to always allow for out of order messages."

---
I read this as saying, messages will be delivered in order, but some may be missing. 
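
The "buffered, and re-assembled into the correct order" idea from the thesis quote can be sketched with per-sender sequence numbers. This is a generic Python illustration only; the class name and the sequence-number scheme are assumptions made for the example, and nothing here is taken from the BEAM source.

```python
class ReorderBuffer:
    """Holds back out-of-order packets until the gap before them is filled."""

    def __init__(self):
        self.next_seq = 1     # sequence number expected next
        self.pending = {}     # packets that arrived early, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one packet; return the payloads now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []         # duplicate, ignore
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


# Packets arrive as 1, 3, 2: delivery to the process is still a, b, c.
buf = ReorderBuffer()
delivered = []
for seq, payload in [(1, "a"), (3, "c"), (2, "b")]:
    delivered.extend(buf.receive(seq, payload))

# Packet 2 never arrives: 3 stays buffered instead of being delivered early.
stalled = ReorderBuffer()
stalled_delivered = []
for seq, payload in [(1, "a"), (3, "c")]:
    stalled_delivered.extend(stalled.receive(seq, payload))
```

In this model a lost packet stalls everything behind it rather than producing a gap, which is one way to get "ordered, but possibly truncated" delivery.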

I'm really interested in this design decision, and how it's implemented.  (I'm also interested in the logic of why it's easier to program around missing messages than out-of-order messages.)

Miles
-- 
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra


Re: question re. message delivery

Joe Armstrong-2
I think I should add - that "in the absence of errors message passing
is ordered"

I think we should deliberately separate errors which are detected by
sockets closing, pings timing out etc. from the message passing behaviour.

Imagine a client sends numbered messages 1,2,3,4,5 in order to a server.
If no errors have been observed and the server receives message 5 we can
assume that messages 1..4 have also been received in order.

The client will never know how many messages the server has received
unless the server tells it.
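
A toy Python model of that last point (the classes and the lost-acknowledgement fault are hypothetical, chosen only to illustrate it): the server has everything, but the client's knowledge is limited to the replies that actually came back.

```python
class Server:
    def __init__(self):
        self.received = []

    def handle(self, n):
        self.received.append(n)
        return ("ack", n)     # the reply is itself a message that may not arrive


class Client:
    def __init__(self):
        self.confirmed = 0    # highest message number the server has confirmed

    def on_reply(self, reply):
        if reply is not None:
            _, n = reply
            self.confirmed = max(self.confirmed, n)


server, client = Server(), Client()
for n in [1, 2, 3, 4, 5]:
    reply = server.handle(n)
    # Hypothetical fault: the acks for messages 4 and 5 are lost on the way back.
    client.on_reply(reply if n <= 3 else None)
```

Here the server has received all five messages while the client can only be sure of the first three.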

This is what the system is supposed to do - how it does it will depend
on time (ie how it does it today will differ from how it was done 10 years ago)

In the old days there was just a single socket between a client and
server - so we assumed that what you wrote to the socket
got read in the order it was sent. This is true at the application level:
packets might get fragmented but they are not reordered.
(In lower levels of the TCP stack, out-of-order packets are reordered
and missing packets resent.)

So in the absence of errors (meaning the TCP socket had not closed)
what comes out of a socket has the same ordering as what went in.
(Note: not true for UDP)

I guess we also make assumptions that the underlying layers are also
reliable - so Erlang messaging should be reliable if TCP is reliable.

The subject is complicated by a load of theorems saying that various
things are mathematically impossible (distributed consensus, exactly
once delivery of messages) -

Add multicore processors and multiple sockets between nodes and
the situation becomes a lot more complicated.

Cheers

/Joe



Re: question re. message delivery

Miles Fidelman

Hi Joe,

Thanks for the reply.  But... it raises a few follow-up questions & comments:

Joe Armstrong [hidden email] wrote:

> I think I should add - that "in the absence of errors message passing
> is ordered"
>
> I think we should deliberately separate errors which are detected by
> sockets closing, pings timing out etc. from the message passing behaviour.
>
> Imagine a client sends numbered messages 1,2,3,4,5 in order to a server.
> If no errors have been observed and the server receives message 5 we can
> assume that messages 1..4 have also been received in order.

It's the error cases that raise all the interesting questions!  What happens if message 5 arrives, but message 4 doesn't, or if it arrives later?  Those define the special cases that application logic might have to handle!

------- context of question -----

I was re-reading your thesis the other day (I think a quora interchange with Alan Kay sparked me to do so), and these lines particularly caught my eye:

"2. Message passing between a pair of processes is assumed to be ordered
meaning that if a sequence of messages is sent and received between any pair
of processes then the messages will be received in the same order they were
sent."

"Note that point two is a design decision, and does not reflect any underlying
semantics in the network used to transmit messages. The underlying
network might reorder the messages, but between any pair of processes these
messages can be buffered, and re-assembled into the correct order before
delivery. This assumption makes programming message passing applications
much easier than if we had to always allow for out of order messages."

but... "Message passing is assumed to be unreliable with no guarantee of delivery."

Somehow, I don't see how this makes "programming message passing applications much easier" - maybe it makes the underlying run-time easier, but not the applications.  An awful lot of applications can go south very quickly if messages are lost (account balances in transaction processing systems come to mind as the obvious example).  If the underlying environment doesn't provide BOTH reliable delivery AND ordered delivery, one pretty much has to write one's own protocol to guarantee reliable, ordered message transmission.
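
For concreteness, the kind of hand-rolled protocol meant here might look like the following stop-and-wait sketch in Python. Everything in it is an illustrative assumption (the names, the cumulative-ack scheme, the deterministic drop pattern), and it only models loss on the way to the receiver.

```python
def lossy_channel(drops):
    """Return a send function that loses the transmissions whose index is in `drops`."""
    count = 0

    def send(packet, receiver):
        nonlocal count
        index = count
        count += 1
        if index in drops:
            return None              # packet lost, so no ack comes back
        return receiver(packet)

    return send


class Receiver:
    def __init__(self):
        self.expected = 1
        self.delivered = []

    def __call__(self, packet):
        seq, payload = packet
        if seq == self.expected:     # accept only the next message in sequence
            self.delivered.append(payload)
            self.expected += 1
        return self.expected - 1     # cumulative ack: highest number delivered


def send_all(payloads, send, receiver, max_tries=5):
    for seq, payload in enumerate(payloads, start=1):
        for _ in range(max_tries):
            ack = send((seq, payload), receiver)
            if ack is not None and ack >= seq:
                break                # confirmed, move on to the next message
        else:
            raise RuntimeError(f"gave up on message {seq}")
    return receiver.delivered


receiver = Receiver()
# Transmissions 1 and 2 (zero-based) are dropped, so message 2 is sent three times.
delivered = send_all(["debit 10", "credit 5", "debit 7"],
                     lossy_channel({1, 2}), receiver)
```

Even this toy version needs sequence numbers, acknowledgements, retries and a give-up policy, which is the application-level burden being described.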

Which leads back to the questions of what, in detail, does Erlang do, and how much of that behavior is guaranteed going forward!

------- end context ---------

<Note:  When I ask about a message "showing up," there's the obvious distinction between making it from one processor/node to another and being placed in the receiving process's inbox, ready to be read.>

What happens if messages 1, 2, 3, 5 (but not 4) show up (are there sequence numbers available to the internal message passing mechanisms)? 

What happens if 1, 2, 3, 5 show up, and 4 shows up just a tad later - but before 5 has been read by the receiving process? 

What about if 1,2,3,5 have all been read, and then 4 shows up?

> The client will never know how many messages the server has received
> unless the server tells it.

Well, doesn't that have to do with whether there are sequence numbers embedded in the message passing mechanism?


> This is what the system is supposed to do - how it does it will depend
> on time (ie how it does it today will differ from how it was done 10 years ago)


Well... it also depends on how the semantics of message passing are defined. 


> In the old days there was just a single socket between a client and
> server - so we assumed that what you wrote to the socket
> got read in the order it was sent. This is true at the application level:
> packets might get fragmented but they are not reordered.
> (In lower levels of the TCP stack, out-of-order packets are reordered
> and missing packets resent.)
>
> So in the absence of errors (meaning the TCP socket had not closed)
> what comes out of a socket has the same ordering as what went in.
> (Note: not true for UDP)
>
> I guess we also make assumptions that the underlying layers are also
> reliable - so Erlang messaging should be reliable if TCP is reliable.

Hence my question about mechanisms.  The BEAM Book gives all the gory details about how messages are passed (now) in both single-processor & multi-processor environments; but details are not provided on how the distributed case is handled.

By the way, TCP may be reliable, but if the connection is broken and a new connection established, stuff can get dropped - it's a higher level decision as to whether lost traffic is resent.  And that's where "design decisions" come in - what gets implemented in the run-time system, and what is left to applications.


> The subject is complicated by a load of theorems saying that various
> things are mathematically impossible (distributed consensus, exactly
> once delivery of messages) -
>
> Add multicore processors and multiple sockets between nodes and
> the situation becomes a lot more complicated.

Precisely what sparks my interest.

Inquiring minds want to know!  :-)

Best,

Miles



Cheers

/Joe


On Mon, Sep 25, 2017 at 6:02 PM, Miles Fidelman
<> wrote:
> On 9/24/17 11:53 PM, Raimo Niskanen wrote:
>
> On Sun, Sep 24, 2017 at 11:24:31PM -0700, Miles Fidelman wrote:
>
> See below....
>
>
> On 9/24/17 6:10 PM, zxq9 wrote:
>
> On 2017年09月24日 日曜日 16:50:45 Miles Fidelman wrote:
>
> Folks,
>
> I've just been re-reading Joe Armstrong's thesis, and I'm reminded of a
> question that's been nagging me.
>
> As I understand it, message delivery is not guaranteed, but message
> order IS. So how, exactly does that work?  What's the underlying
> mechanism that imposes sequencing, but allows messages to get lost?
> (Particularly across a network.)  What are the various scenarios at play?
>
> This is sort of backwards.
>
> Message delivery is guaranteed, assuming the process you are sending a
> message to exists and is available, BUT from the perspective of the
> sender there is no way to tell whether the receiver actually got it,
> has crashed, disappeared, fell into a network blackhole, or whatever.
> Monitoring can tell you whether the process you are trying to reach
> is available right at that moment, but that's it.
>
> The point is, though, that whether the receiver is unreachable, has
> crashed, got the message and did its work but was unable to report
> back about it, or whatever -- its all the same reality from the
> perspective of the sender. "Unavailable" means "unavailable", not matter
> what the cause -- because the cause cannot be determined from the
> perspective of the sender. You can only know this with an out of
> context check of some sort, and that is basically the role the runtime
> plays for you with regard to monitors and links.
>
> The OTP synchronous "call" mechanism is actually a complex procedure
> built from asynchronous messages, unique reference tags, and monitors.
>
> Note that I didn't ask about the synchronous calls, I asked about raw
> interprocess messages.
>
> What IS guaranteed is the ordering of messages *relative to two processes*
>
> If A sends B the messages 1, 2 and 3 in that order, they will certainly
> arrive in that order (assuming they arrive at all -- meaning that B is
> available from the perspective of A).
>
> But that's the question.  Particularly when sent via network, 1, 2, 3
> may be sent in that order, but, at the protocol level, they may not
> arrive in that order.
>
> What protocol level?
>
> Erlang distribution has to use or implement a reliable protocol.  Today
> TCP, but anything is possible.  Note that this protocol is between two
> nodes, both containing many processes.  But the emulator relies on the
> transport protocol being reliable.
>
>
> No.  It doesn't.  It could simply send UDP packets.  I'm asking about
> implementation details.  In Joe's thesis, he says that the behavior is a
> "design choice."  I'm asking about the implementation details.  How does
> BEAM actually handle message delivery - locally, via network?
>
> With a reliable transport protocol - say TCP - if the message-containing
> packets arrived as 1, 3, 2, the protocol engine would wait for 2 to
> arrive and deliver 1,2,3 in that order.  If It received 1 & 3, but 2 got
> lost, it would request a re-transmit, wait for it to arrive, and again,
> deliver in that order.
>
> But the implication of Erlang's stated rules is that an unreliable
> transport protocol is being used, if you send 1, 2, 3, and what arrives
>
> What?  What is stated?
>
> From Joe Armstrong's Thesis:
>
> "Message passing is assumed to be unreliable with no guarantee of delivery."
>
> "Since we made no assumptions about reliable message passing, and must write
> our application so that it works in the presence of unreliable message
> passing it should indeed work in the presence of message passing errors. The
> initial effort involved will reward us when we try to scale up our systems."
>
> "2. Message passing between a pair of processes is assumed to be ordered
> meaning that if a sequence of messages is sent and received between any pair
> of processes then the messages will be received in the same order they were
> sent."
>
>  "Note that point two is a design decision, and does not reflect any
> underlying semantics in the network used to transmit messages. The underlying
> network might reorder the messages, but between any pair of processes these
> messages can be buffered, and re-assembled into the correct order before
> delivery. This assumption makes programming message passing applications
> much easier than if we had to always allow for out of order messages."
>
> ---
> I read this as saying, messages will be delivered in order, but some may be
> missing.
>
> I'm really interested in this design decision, and how it's implemented.
> (I'm also interested in the logic of why it's easier to program around
> missing messages than out-of-order messages.)
>
> Miles
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.  .... Yogi Berra
>

-- 
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra


Re: question re. message delivery

Jesper Louis Andersen-2
In reply to this post by Miles Fidelman
On Mon, Sep 25, 2017 at 6:02 PM Miles Fidelman <[hidden email]> wrote:

> (I'm also interested in the logic of why it's easier to program around missing messages than out-of-order messages.)



In principle, you could allow messages to arrive in any order, with missing messages as well. This would force most Erlang processes to implement a reassembly buffer, and have the client set a unique counter in each message, if they need a specific order. It is not at all impossible to do.
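
A minimal sketch of such a reassembly buffer, assuming the sender tags every message with a counter starting at 0. The module and function names (reorder_sketch, new/0, accept/3) are made up for illustration and are not part of Erlang/OTP:

```erlang
-module(reorder_sketch).
-export([new/0, accept/3]).

%% Hypothetical reassembly buffer. The state is {NextExpectedSeq, Pending},
%% where Pending maps sequence numbers to messages that arrived early.
new() ->
    {0, #{}}.

%% Accept one tagged message. Returns the messages that can now be
%% delivered in order, together with the updated buffer.
accept(Seq, Msg, {Next, Pending}) when Seq >= Next ->
    flush(Next, Pending#{Seq => Msg}, []);
accept(_Seq, _Msg, State) ->
    %% Sequence number already delivered: treat it as a duplicate.
    {[], State}.

%% Release messages for as long as the next expected number is present.
flush(Next, Pending, Acc) ->
    case maps:take(Next, Pending) of
        {Msg, Rest} -> flush(Next + 1, Rest, [Msg | Acc]);
        error       -> {lists:reverse(Acc), {Next, Pending}}
    end.
```

Feeding it message 1 before message 0 delivers nothing at first and then both in order. A message that never arrives stalls everything behind it, so a real version would also need a timeout.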

However, ordering is a very nice property in practice, and many streaming scenarios require that messages arrive in order. If you get e.g.

{data, FromPid, Binary} | {done, FromPid}

ordering ensures that you don't need a counter on each 'data' block. Many protocols also have state transitions, and the packets forcing those transitions must arrive in a certain order. Requests that "cancel" stuff usually require that everything before the cancel was correctly processed, in order to build a graceful shutdown (one example: the GOAWAY frame in HTTP/2). Reordering is fatal to correctness here as well, as it destroys the "happens before" relationship.
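
As a rough illustration of the data/done stream above, here is a sketch of a receiver that relies only on per-sender ordering. The module name stream_sketch and the demo sender are assumptions for the example:

```erlang
-module(stream_sketch).
-export([collect/1, demo/0]).

%% Collect {data, From, Bin} chunks from one sender until {done, From}.
%% No per-chunk counter is used: the code relies on chunks from a single
%% sender arriving in the order they were sent, with 'done' last.
collect(From) ->
    collect(From, []).

collect(From, Acc) ->
    receive
        {data, From, Bin} -> collect(From, [Bin | Acc]);
        {done, From}      -> iolist_to_binary(lists:reverse(Acc))
    end.

%% Spawn a sender that streams three chunks to the caller, then collect.
demo() ->
    Self = self(),
    Sender = spawn(fun() ->
        lists:foreach(fun(B) -> Self ! {data, self(), B} end,
                      [<<"a">>, <<"b">>, <<"c">>]),
        Self ! {done, self()}
    end),
    collect(Sender).
```

If 'done' could overtake a 'data' message, collect/1 would return a truncated binary, which is exactly the case the ordering guarantee rules out.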

In short, certain invariants are easier to maintain between mailbox receives if you know that things in the mailbox arrive in order. Furthermore, the guarantee is fairly easy to uphold between any pair of processes.

Missing messages often require some kind of exception or timeout to trigger in order to clean up internal state in a process. But this is often built into typical protocols anyway. I won't say it is easier or harder to handle compared to ordered messages, however.
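
A sketch of what that timeout handling can look like: a request/reply helper built from an asynchronous send, a unique reference, a monitor and a timeout. This is not the OTP gen_server code; the module name call_sketch and the {request, ...} / {reply, ...} message shapes are assumptions for illustration:

```erlang
-module(call_sketch).
-export([call/3, demo/0]).

%% Hypothetical request/reply helper. The monitor reference doubles as
%% the unique tag that ties a reply to its request.
call(Pid, Request, Timeout) ->
    MRef = erlang:monitor(process, Pid),
    Pid ! {request, self(), MRef, Request},
    receive
        {reply, MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Reply};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The receiver died or never existed.
            {error, {down, Reason}}
    after Timeout ->
        %% No reply and no 'DOWN' in time: give up and clean up.
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.

demo() ->
    %% A server that answers exactly one request and then exits.
    Echo = spawn(fun() ->
        receive {request, From, Ref, Msg} -> From ! {reply, Ref, Msg} end
    end),
    {ok, hello} = call(Echo, hello, 1000),
    %% The second call finds the server gone (or on its way out).
    {error, {down, _}} = call(Echo, hello, 1000),
    ok.
```

From the caller's side the three outcomes (reply, 'DOWN', timeout) are all it can ever learn; it cannot tell whether a request that timed out was processed.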

If you squint your eyes a bit, then a missing message is simply a message arriving at timepoint "infinity" (i.e., never). So it can be understood as a reordered message unless *every* message after the missing one is also at timepoint "infinity". BEAM currently uses TCP as the transport between nodes, so this observation holds: reassembly at the TCP level ensures messages arrive in order, and if a message goes missing it means connection loss rather than later messages arriving properly. By this view, UDP is not an allowed protocol because it breaks the squinted-eye view of message ordering.



Re: question re. message delivery

Miles Fidelman

On 9/25/17 2:13 PM, Jesper Louis Andersen wrote:

> On Mon, Sep 25, 2017 at 6:02 PM Miles Fidelman <[hidden email]> wrote:
>
> > (I'm also interested in the logic of why it's easier to program around missing messages than out-of-order messages.)



I think that your examples essentially demonstrate that, for a lot of applications, one pretty much has to implement one's own message passing protocol on top of Erlang's - to guarantee that all messages are delivered, and delivered in order.  Some applications can tolerate missed messages, a lot can't.

Miles


-- 
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra


Re: question re. message delivery

Raimo Niskanen-2
In reply to this post by Miles Fidelman
Since this seems to be about a thesis by Joe, not about the implementation,
Joe can defend his own thesis.

/ Raimo


On Mon, Sep 25, 2017 at 09:02:39AM -0700, Miles Fidelman wrote:

> On 9/24/17 11:53 PM, Raimo Niskanen wrote:
>
> > On Sun, Sep 24, 2017 at 11:24:31PM -0700, Miles Fidelman wrote:
> >> See below....
> >>
> >>
> >> On 9/24/17 6:10 PM, zxq9 wrote:
> >>> On 2017年09月24日 日曜日 16:50:45 Miles Fidelman wrote:
> >>>> Folks,
> >>>>
> >>>> I've just been re-reading Joe Armstrong's thesis, and I'm reminded of a
> >>>> question that's been nagging me.
> >>>>
> >>>> As I understand it, message delivery is not guaranteed, but message
> >>>> order IS. So how, exactly does that work?  What's the underlying
> >>>> mechanism that imposes sequencing, but allows messages to get lost?
> >>>> (Particularly across a network.)  What are the various scenarios at play?
> >>> This is sort of backwards.
> >>>
> >>> Message delivery is guaranteed, assuming the process you are sending a
> >>> message to exists and is available, BUT from the perspective of the
> >>> sender there is no way to tell whether the receiver actually got it,
> >>> has crashed, disappeared, fell into a network blackhole, or whatever.
> >>> Monitoring can tell you whether the process you are trying to reach
> >>> is available right at that moment, but that's it.
> >>>
> >>> The point is, though, that whether the receiver is unreachable, has
> >>> crashed, got the message and did its work but was unable to report
> >>> back about it, or whatever -- it's all the same reality from the
> >>> perspective of the sender. "Unavailable" means "unavailable", no matter
> >>> what the cause -- because the cause cannot be determined from the
> >>> perspective of the sender. You can only know this with an out of
> >>> context check of some sort, and that is basically the role the runtime
> >>> plays for you with regard to monitors and links.
> >>>
> >>> The OTP synchronous "call" mechanism is actually a complex procedure
> >>> built from asynchronous messages, unique reference tags, and monitors.
> >>>
> >> Note that I didn't ask about the synchronous calls, I asked about raw
> >> interprocess messages.
> >>
> >>> What IS guaranteed is the ordering of messages *relative to two processes*
> >>>
> >>> If A sends B the messages 1, 2 and 3 in that order, they will certainly
> >>> arrive in that order (assuming they arrive at all -- meaning that B is
> >>> available from the perspective of A).
> >> But that's the question.  Particularly when sent via network, 1, 2, 3
> >> may be sent in that order, but, at the protocol level, they may not
> >> arrive in that order.
> > What protocol level?
> >
> > Erlang distribution has to use or implement a reliable protocol.  Today
> > TCP, but anything is possible.  Note that this protocol is between two
> > nodes, both containing many processes.  But the emulator relies on the
> > transport protocol being reliable.
>
> No.  It doesn't.  It could simply send UDP packets.  I'm asking about
> implementation details.  In Joe's thesis, he says that the behavior is a
> "design choice."  I'm asking about the implementation details.  How does
> BEAM actually handle message delivery - locally, via network?
>
> >> With a reliable transport protocol - say TCP - if the message-containing
> >> packets arrived as 1, 3, 2, the protocol engine would wait for 2 to
> >> arrive and deliver 1,2,3 in that order.  If it received 1 & 3, but 2 got
> >> lost, it would request a re-transmit, wait for it to arrive, and again,
> >> deliver in that order.
> >>
> >> But the implication of Erlang's stated rules is that an unreliable
> >> transport protocol is being used, if you send 1, 2, 3, and what arrives
> > What?  What is stated?
>  From Joe Armstrong's Thesis:
>
> "Message passing is assumed to be unreliable with no guarantee of
> delivery."
>
> "Since we made no assumptions about reliable message passing, and must
> write our application so that it works in the presence of unreliable
> message passing it should indeed work in the presence of message passing
> errors. The initial effort involved will reward us when we try to scale
> up our systems."
>
> "2. Message passing between a pair of processes is assumed to be ordered
> meaning that if a sequence of messages is sent and received between any
> pair of processes then the messages will be received in the same order
> they were sent."
>
> "Note that point two is a design decision, and does not reflect any
> underlying semantics in the network used to transmit messages. The
> underlying network might reorder the messages, but between any pair of
> processes these messages can be buffered, and re-assembled into the
> correct order before delivery. This assumption makes programming message
> passing applications much easier than if we had to always allow for out
> of order messages."
>
> ---
> I read this as saying, messages will be delivered in order, but some may
> be missing.
>
> I'm really interested in this design decision, and how it's
> implemented.  (I'm also interested in the logic of why it's easier to
> program around missing messages than out-of-order messages.)
>
> Miles
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.  .... Yogi Berra
>



--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Re: question re. message delivery

Joe Armstrong-2
In reply to this post by Miles Fidelman
What I said was "message passing is assumed to be reliable" 

The key word here is *assumed*. My assumption is that if I open a TCP socket
and send it five messages numbered 1 to 5, then if I successfully read message
5 and have seen no error indicators, I can *assume* that messages 1 to 4 also arrived in order.

Actually I have no idea if this is true - but it does seem to be a reasonable
assumption.
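
The same assumption can be observed between two Erlang processes on one node. The sketch below sends 1 to 5 and reports the order in which they were taken out of the mailbox; the module name order_sketch is made up, and a passing run is evidence for the assumption rather than a proof of it:

```erlang
-module(order_sketch).
-export([demo/0]).

%% Send 1..5 from this process to a receiver and report the order in
%% which the receiver saw them.
demo() ->
    Parent = self(),
    Receiver = spawn(fun() -> Parent ! {seen, self(), take(5, [])} end),
    lists:foreach(fun(N) -> Receiver ! N end, lists:seq(1, 5)),
    receive
        {seen, Receiver, Seen} -> Seen
    after 1000 ->
        timeout
    end.

%% Take the next Count messages in mailbox order, whatever they are.
take(0, Acc) ->
    lists:reverse(Acc);
take(Count, Acc) ->
    receive Msg -> take(Count - 1, [Msg | Acc]) end.
```

On a single node the result is expected to be [1,2,3,4,5]: once 5 has been read with no error seen, 1 to 4 were read before it.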

Messages 1 to 4 might have arrived, been put in a buffer prior to my reading them, and accidentally been reordered due to a software bug. An alpha particle might have hit the data in message 3 and changed it -- who knows?

Having assumed that message passing is reliable I build code based on
this assumption.

I'm not, of course, saying that the assumption is true, just that I trust the
implementers of the system have done a good job to try and make it true.
Certainly any repeatable counter examples should have been investigated
to see if there were any errors in the system.

All this builds on layers of trust. I trust that erlang message passing is ordered and reliable in the absence of errors.

The Erlang implementers trust that TCP is reliable.

The TCP implementors trust that the OS is reliable.

The OS implementors trust that the processor is reliable.

The processor implementors trust that the VLSI compilers are correct.

Software runs on physical machines - so really the laws of physics apply, not
maths. Physics takes into account space and time, and the concept of simultaneity does not exist; not so in maths.

It seems to me that software is built upon chains of trust, not upon mathematical chains of proof.

I've just been saying "what we want to achieve" and not "how we can achieve it".

The statements that people make about the system should be in terms
of belief rather than proof.

I'd say "I believe we have reliable message passing".
It would be plain daft to say "we have reliable message passing" or
"we can prove it to be correct" since there is no way of validating this.

Call me old fashioned but I think that claims that, for example,
"we have unlimited storage" and so on are just nuts ...

Just say it how it is - we believe this to be the case, and here is our evidence ...

Cheers

/Joe





On Mon, Sep 25, 2017 at 11:05 PM, Miles Fidelman <[hidden email]> wrote:

> Hi Joe,
>
> Thanks for the reply.  But... it raises a few follow-up questions & comments:
>
> Joe Armstrong [hidden email] wrote:
>
> > I think I should add - that "in the absence of errors message passing
> > is ordered"
> >
> > I think we should deliberately separate errors which are detected by
> > sockets closing, pings timing out etc. from the message passing behaviour.
> >
> > Imagine a client sends numbered messages 1,2,3,4,5 in order to a server.
> > If no errors have been observed and the server receives message 5 we can
> > assume that messages 1..4 have also been received in order.
>
> It's the error cases that raise all the interesting questions!  What happens if message 5 arrives, but message 4 doesn't, or if it arrives later?  Those define the special cases that application logic might have to handle!
>
> ------- context of question -----
>
> I was re-reading your thesis the other day (I think a quora interchange with Alan Kay sparked me to do so), and these lines particularly caught my eye:
>
> "2. Message passing between a pair of processes is assumed to be ordered
> meaning that if a sequence of messages is sent and received between any pair
> of processes then the messages will be received in the same order they were
> sent."
>
> "Note that point two is a design decision, and does not reflect any
> underlying semantics in the network used to transmit messages. The underlying
> network might reorder the messages, but between any pair of processes these
> messages can be buffered, and re-assembled into the correct order before
> delivery. This assumption makes programming message passing applications
> much easier than if we had to always allow for out of order messages."
>
> but... "Message passing is assumed to be unreliable with no guarantee of delivery."
>
> Somehow, I don't see how this makes "programming message passing applications much easier" - maybe it makes the underlying run-time easier, but not the applications.  An awful lot of applications can go south very quickly if messages are lost (account balances in transaction processing systems come to mind as the obvious example).  If the underlying environment doesn't provide BOTH reliable delivery AND ordered delivery, one pretty much has to write one's own protocol to guarantee reliable, ordered message transmission.
>
> Which leads back to the questions of what, in detail, does Erlang do, and how much of that behavior is guaranteed going forward!
>
> ------- end context ---------
>
> <Note:  When I ask about a message "showing up" there's the obvious distinction between making it from one processor/node to another, as opposed to being placed in the receiving process's inbox, ready to be read?>
>
> What happens if messages 1, 2, 3, 5 (but not 4) show up (are there sequence numbers available to the internal message passing mechanisms)?
>
> What happens if 1, 2, 3, 5 show up, and 4 shows up just a tad later - but before 5 has been read by the receiving process?
>
> What about if 1,2,3,5 have all been read, and then 4 shows up?
>
> > The client will never know how many messages the server has received
> > unless the server tells it.
>
> Well, doesn't that have to do with whether there are sequence numbers embedded in the message passing mechanism?
>
> > This is what the system is supposed to do - how it does it will depend
> > on time (ie how it does it today will differ from how it was done 10 years ago)
>
> Well... it also depends on how the semantics of message passing are defined.
>
> > In the old days there was just a single socket between a client and
> > server - so we assumed that what you wrote to the socket
> > got read in the order it was sent. This is true at the application level:
> > packets might get fragmented but they are not reordered.
> > (In lower levels of the TCP stack, out of order packets are reordered
> > and missing packets resent).
> >
> > So in the absence of errors (meaning the TCP socket had not closed)
> > what comes out of a socket has the same ordering as what went in.
> > (Note: not true for UDP)
> >
> > I guess we also make assumptions that the underlying layers are also
> > reliable - So Erlang messaging should be reliable if TCP is reliable.
>
> Hence my question about mechanisms.  The BEAM Book gives all the gory details about how messages are passed (now) in both single-processor & multi-processor environments; but details are not provided on how the distributed case is handled.
>
> By the way, TCP may be reliable, but if the connection is broken and a new connection established, stuff can get dropped - it's a higher level decision as to whether lost traffic is resent.  And that's where "design decisions" come in - what gets implemented in the run-time system, and what is left to applications.
>
> > The subject is complicated by a load of theorems saying that various
> > things are mathematically impossible (distributed consensus, exactly
> > once delivery of messages) -
> >
> > Add multicore processors and multiple sockets between nodes and
> > the situation becomes a lot more complicated.
>
> Precisely what sparks my interest.
>
> Inquiring minds want to know!  :-)
>
> Best,
>
> Miles
>
> > Cheers
> >
> > /Joe




Re: question re. message delivery

Matthias Lang
In reply to this post by Raimo Niskanen-2
Hi,

I'm more than a bit surprised by what I'm reading here and maybe part
of it has to do with people meaning different things by "message
passing protocol".

 MF> I think that your examples essentially demonstrate that, for a lot of
 MF> applications, one pretty much has to implement one's own message passing
 MF> protocol on top of Erlang's - to guarantee that all messages are
 MF> delivered, and delivered in order.  Some applications can tolerate
 MF> missed messages, a lot can't.

I like the 20 year old advice from Per Hedeland which I quote in the
FAQ (10.8 and 10.9)

  http://erlang.org/faq/academic.html#idp33047120

If this advice is wrong, then I should update it, but convincing arguments
and some sort of consensus would be required for a change.

The situations I'm aware of where messages can disappear are:

  1. When the receiving process disappears, for instance because it
     crashed. This applies to both single-node and distributed Erlang.

  2. When the communication between nodes breaks. This applies to
     distributed Erlang only.

  3. Quite a few years ago (2005? 2007?), Hans Svensson demonstrated
     some cases where if you restarted nodes in a distributed Erlang
     system in particular ways, then things could get strange with
     message passing.

  4. Hardware errors, compiler bugs, etc.

For #1 and #2, I don't think it's good to describe the solution as
"implement one's own message passing protocol on top of Erlang's".
The failure is quite specific, you get all messages up to the crash
and then you get none after that. It's not the message passing that's
the problem.
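
Case #1 can be seen in a few lines. In this sketch (the module name lost_sketch is made up), a send to a process that has already exited raises no error, and a monitor is what reveals that the receiver is gone:

```erlang
-module(lost_sketch).
-export([demo/0]).

demo() ->
    %% Start a process that exits at once and wait until it is gone.
    {Pid, MRef} = spawn_monitor(fun() -> ok end),
    receive {'DOWN', MRef, process, Pid, normal} -> ok end,
    %% Sending to the dead pid raises no error: the message is dropped
    %% and the send expression simply evaluates to the message.
    hello = (Pid ! hello),
    %% Monitoring a dead process yields an immediate 'DOWN' message.
    MRef2 = erlang:monitor(process, Pid),
    receive
        {'DOWN', MRef2, process, Pid, Reason} -> {receiver_gone, Reason}
    after 1000 ->
        timeout
    end.
```

The second monitor reports the reason noproc, because the process no longer exists when the monitor is set up.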

For #3, my unreliable recollection is that this was a situation where
the implementation was unexpectedly weak. It may go deeper than that
and it may be that the implementation is better today. I don't know.

#4 seems irrelevant. If you're worried that just the right combination
of flipped bits or compiler errors, no matter how unlikely, can cause
a message to disappear, then putting "one's own message passing
protocol on top of Erlang's" isn't going to eliminate that. There
will be some combination of flipped bits that will defeat it.

Miles, do you have some concrete examples of situations where you're
worried about messages disappearing? Here's one from me: process 1
sends two messages to process 2. The messages are A and B,
respectively. Process 2 sends an ACK for message B back to process
1. For single-node Erlang, if message A disappears then that is a
bug. I'll let others reason about distributed Erlang.
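
The single-node version of that example, as a sketch. Process 2 records what it received before B and includes it in the ACK, so process 1 can check that A got there first. The module name ack_sketch and the shape of the ack tuple are assumptions for illustration:

```erlang
-module(ack_sketch).
-export([demo/0]).

demo() ->
    Parent = self(),
    %% Process 2: remember everything received before b, then ACK b.
    P2 = spawn(fun() -> p2_loop(Parent, []) end),
    P2 ! a,
    P2 ! b,
    receive
        {ack, P2, b, SeenBefore} -> SeenBefore
    after 1000 ->
        timeout
    end.

p2_loop(Parent, Seen) ->
    receive
        b   -> Parent ! {ack, self(), b, lists:reverse(Seen)};
        Msg -> p2_loop(Parent, [Msg | Seen])
    end.
```

On a single node the expected result is [a]. An ACK for b that arrives with an empty list would mean a was lost or overtaken.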

Matthias

--------

Date: 26. September 2017
From: Raimo Niskanen <[hidden email]>
To [hidden email]
Subject: Re: [erlang-questions] question re. message delivery


> Since this seems to be about a thesis by Joe, not about the implementation,
> Joe can defend his own thesis.
>
> / Raimo
>
>
> On Mon, Sep 25, 2017 at 09:02:39AM -0700, Miles Fidelman wrote:
> > On 9/24/17 11:53 PM, Raimo Niskanen wrote:
> >
> > > On Sun, Sep 24, 2017 at 11:24:31PM -0700, Miles Fidelman wrote:
> > >> See below....
> > >>
> > >>
> > >> On 9/24/17 6:10 PM, zxq9 wrote:
> > >>> On 2017年09月24日 日曜日 16:50:45 Miles Fidelman wrote:
> > >>>> Folks,
> > >>>>
> > >>>> I've just been re-reading Joe Armstrong's thesis, and I'm reminded of a
> > >>>> question that's been nagging me.
> > >>>>
> > >>>> As I understand it, message delivery is not guaranteed, but message
> > >>>> order IS. So how, exactly does that work?  What's the underlying
> > >>>> mechanism that imposes sequencing, but allows messages to get lost?
> > >>>> (Particularly across a network.)  What are the various scenarios at play?
> > >>> This is sort of backwards.
> > >>>
> > >>> Message delivery is guaranteed, assuming the process you are sending a
> > >>> message to exists and is available, BUT from the perspective of the
> > >>> sender there is no way to tell whether the receiver actually got it,
> > >>> has crashed, disappeared, fell into a network blackhole, or whatever.
> > >>> Monitoring can tell you whether the process you are trying to reach
> > >>> is available right at that moment, but that's it.
> > >>>
> > >>> The point is, though, that whether the receiver is unreachable, has
> > >>> crashed, got the message and did its work but was unable to report
> > >>> back about it, or whatever -- it's all the same reality from the
> > >>> perspective of the sender. "Unavailable" means "unavailable", no matter
> > >>> what the cause -- because the cause cannot be determined from the
> > >>> perspective of the sender. You can only know this with an out of
> > >>> context check of some sort, and that is basically the role the runtime
> > >>> plays for you with regard to monitors and links.
> > >>>
> > >>> The OTP synchronous "call" mechanism is actually a complex procedure
> > >>> built from asynchronous messages, unique reference tags, and monitors.
> > >>>
> > >> Note that I didn't ask about the synchronous calls, I asked about raw
> > >> interprocess messages.
> > >>
> > >>> What IS guaranteed is the ordering of messages *relative to two processes*
> > >>>
> > >>> If A sends B the messages 1, 2 and 3 in that order, they will certainly
> > >>> arrive in that order (assuming they arrive at all -- meaning that B is
> > >>> available from the perspective of A).
> > >> But that's the question.  Particularly when sent via network, 1, 2, 3
> > >> may be sent in that order, but, at the protocol level, they may not
> > >> arrive in that order.
> > > What protocol level?
> > >
> > > Erlang distribution has to use or implement a reliable protocol.  Today
> > > TCP, but anything is possible.  Note that this protocol is between two
> > > nodes, both containing many processes.  But the emulator relies on the
> > > transport protocol being reliable.
> >
> > No.  It doesn't.  It could simply send UDP packets.  I'm asking about
> > implementation details.  In Joe's thesis, he says that the behavior is a
> > "design choice."  I'm asking about the implementation details.  How does
> > BEAM actually handle message delivery - locally, via network?
> >
> > >> With a reliable transport protocol - say TCP - if the message-containing
> > >> packets arrived as 1, 3, 2, the protocol engine would wait for 2 to
> > >> arrive and deliver 1,2,3 in that order.  If it received 1 & 3, but 2 got
> > >> lost, it would request a re-transmit, wait for it to arrive, and again,
> > >> deliver in that order.
> > >>
> > >> But the implication of Erlang's stated rules is that an unreliable
> > >> transport protocol is being used, if you send 1, 2, 3, and what arrives
> > > What?  What is stated?
> >  From Joe Armstrong's Thesis:
> >
> > "Message passing is assumed to be unreliable with no guarantee of
> > delivery."
> >
> > "Since we made no assumptions about reliable message passing, and must
> > write our application so that it works in the presence of unreliable
> > message passing it should indeed work in the presence of message passing
> > errors. The initial effort involved will reward us when we try to scale
> > up our systems."
> >
> > "2. Message passing between a pair of processes is assumed to be ordered
> > meaning that if a sequence of messages is sent and received between any
> > pair of processes then the messages will be received in the same order
> > they were sent."
> >
> > "Note that point two is a design decision, and does not reflect any
> > underlying semantics in the network used to transmit messages. The
> > underlying network might reorder the messages, but between any pair of
> > processes these messages can be buffered, and re-assembled into the
> > correct order before delivery. This assumption makes programming message
> > passing applications much easier than if we had to always allow for out
> > of order messages."
> >
> > ---
> > I read this as saying, messages will be delivered in order, but some may
> > be missing.
> >
> > I'm really interested in this design decision, and how it's
> > implemented.  (I'm also interested in the logic of why it's easier to
> > program around missing messages than out-of-order messages.)
> >
> > Miles
> >
> > --
> > In theory, there is no difference between theory and practice.
> > In practice, there is.  .... Yogi Berra
> >
>
>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB

Re: question re. message delivery

Miles Fidelman
See in-line...

On 9/26/17 9:39 AM, Matthias Lang wrote:

> Hi,
>
> I'm more than a bit surprised by what I'm reading here and maybe part
> of it has to do with people meaning different things by "message
> passing protocol".
>
>   MF> I think that your examples essentially demonstrate that, for a lot of
>   MF> applications, one pretty much has to implement one's own message passing
>   MF> protocol on top of Erlang's - to guarantee that all messages are
>   MF> delivered, and delivered in order.  Some applications can tolerate
>   MF> missed messages, a lot can't.
>
> I like the 20 year old advice from Per Hedeland which I quote in the
> FAQ (10.8 and 10.9)
>
>    http://erlang.org/faq/academic.html#idp33047120
>
> If this advice is wrong, then I should update it, but convincing arguments
> and some sort of consensus would be required for a change.
>
> The situations I'm aware of where messages can disappear are:
>
>    1. When the receiving process disappears, for instance because it
>       crashed. This applies to both single-node and distributed Erlang.
>
>    2. When the communication between nodes breaks. This applies to
>       distributed Erlang only.

Which is what I'm curious about.  Of course, with multi-processor
architectures, one must also consider communications between processors
on the same node.
>
>    3. Quite a few years ago (2005? 2007?), Hans Svensson demonstrated
>       some cases where if you restarted nodes in a distributed Erlang
>       system in particular ways, then things could get strange with
>       message passing.

Exactly.
>
>    4. Hardware errors, compiler bugs, etc.
>
> For #1 and #2, I don't think it's good to describe the solution as
> "implement one's own message passing protocol on top of Erlang's".
> The failure is quite specific, you get all messages up to the crash
> and then you get none after that. It's not the message passing that's
> the problem.

Of course it is.  If one wants reliable packet delivery, one implements
TCP (or equivalent) above raw IP.  If one wants reliable email, one
implements a return receipt function.  Etc.

>
> For #3, my unreliable recollection was that this was a situation where
> the implementation was unexpectedly weak. It may go deeper than that
> and it may be that the implementation is better today. I don't know.
>
> #4 seems irrelevant. If you're worried that just the right combination
> of flipped bits or compiler errors, no matter how unlikely, can cause
> a message to disappear, then putting "one's own message passing
> protocol on top of Erlang's" isn't going to eliminate that. There
> will be some combination of flipped bits that will defeat it.
>
> Miles, do you have some concrete examples of situations where you're
> worried about messages disappearing? Here's one from me: process 1
> sends two messages to process 2. The messages are A and B,
> respectively. Process 2 sends an ACK for message B back to process
> 1. For single-node Erlang, if message A disappears then that is a
> bug. I'll let others reason about distributed Erlang.

Sure.  Bank transactions.  Edits to a document.  Dispatch commands to a
vehicle.

Both order and missing messages matter.

The question remains: how does the actual Erlang run-time system respond
in the case of various kinds of failures?  And will those behaviors
remain consistent in future releases?


--
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra


Re: question re. message delivery

Matthias Lang
On 26. September 2017, Miles Fidelman wrote:


> > Miles, do you have some concrete examples of situations where you're
> > worried about messages disappearing? Here's one from me: process 1
> > sends two messages to process 2. The messages are A and B,
> > respectively. Process 2 sends an ACK for message B back to process
> > 1. For single-node Erlang, if message A disappears then that is a
> > bug. I'll let others reason about distributed Erlang.
>
> Sure.  Bank transactions.  Edits to a document.  Dispatch commands to a
> vehicle.

Those are not what I consider concrete examples.

Matthias

Re: question re. message delivery

zxq9-2
In reply to this post by Miles Fidelman
On Tuesday, 2017-09-26 09:58:16, Miles Fidelman wrote:
> Sure.  Bank transactions.  Edits to a document.  Dispatch commands to a
> vehicle.
>
> Both order and missing messages matter.

There is this thing called a phased transaction.
A smart guy told me about it once.
Perhaps you would be interested in it.

> The question remains, how does the actual Erlang run-time system respond
> in the case of various kinds of failures.  And will those behaviors
> remain consistent in future releases.

Nope.
It is totally unreliable and utter crap.
All of it.
Give up and find The Right Thing.
You know it's out there.
And you already know it is not here.

-Craig

Re: question re. message delivery

Miles Fidelman
In reply to this post by Matthias Lang

On 9/26/17 10:05 AM, Matthias Lang wrote:

> On 26. September 2017, Miles Fidelman wrote:
>
> > > Miles, do you have some concrete examples of situations where you're
> > > worried about messages disappearing? Here's one from me: process 1
> > > sends two messages to process 2. The messages are A and B,
> > > respectively. Process 2 sends an ACK for message B back to process
> > > 1. For single-node Erlang, if message A disappears then that is a
> > > bug. I'll let others reason about distributed Erlang.
> >
> > Sure.  Bank transactions.  Edits to a document.  Dispatch commands to a
> > vehicle.
>
> Those are not what I consider concrete examples.

How concrete do you need?  There are lots of systems where one needs reliable, ordered message delivery.  One generally implements an acknowledgement, but those can be unordered.

Let's say that a transaction is a test-and-set on a value, with an acknowledgement sent after the transaction is executed.

Send transactions 1,2,3,4,5

If message delivery is ordered & guaranteed, then an acknowledgement from 5 guarantees that transactions 1-4 were performed, in order, even if some of the acknowledgements were not received.

If either a message can get dropped or it can get delivered out of order, then one needs a protocol on top of basic message delivery that waits for an acknowledgement of each and every message before sending the next.
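
A minimal sketch of that per-message acknowledgement scheme (stop-and-wait), written in Python purely as a model. Nothing here is Erlang or BEAM behavior: the `Receiver` class, the `send_all` function, and the `drop` set that simulates a lossy channel are all hypothetical names invented for illustration.

```python
class Receiver:
    """Applies test-and-set transactions at most once each, in sequence order."""

    def __init__(self, value):
        self.value = value
        self.next_seq = 1      # sequence number we expect next
        self.applied = []      # sequence numbers applied so far

    def handle(self, seq, expected, new):
        # Duplicate of an already-applied transaction: re-acknowledge only.
        if seq < self.next_seq:
            return seq
        # Arrived ahead of its turn: no ack, so the sender will retry later.
        if seq != self.next_seq:
            return None
        if self.value == expected:   # the "test" part
            self.value = new         # the "set" part
        self.applied.append(seq)
        self.next_seq += 1
        return seq                   # acknowledgement sent after execution


def send_all(transactions, receiver, drop):
    """Stop-and-wait: resend each transaction until its ack comes back.

    `drop` is a set of (seq, attempt) pairs that the simulated channel loses.
    Returns the number of attempts each sequence number needed.
    """
    attempts = {}
    for seq, (expected, new) in enumerate(transactions, start=1):
        attempt = 0
        while True:
            attempt += 1
            if (seq, attempt) in drop:
                continue  # message lost in transit; time out and retry
            if receiver.handle(seq, expected, new) == seq:
                break
        attempts[seq] = attempt
    return attempts
```

The cost is visible in the structure: the sender never has more than one unacknowledged transaction in flight, which is exactly the overhead that an ordered and guaranteed lower layer would remove.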

There are multiple approaches to implementing a reliable messaging protocol.  What approach would work best somewhat depends on the kinds of failure modes one has to deal with (i.e., the services provided by the next layer down).

Miles
-- 
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra


Re: question re. message delivery

Oliver Korpilla
Hello, Miles.

Not sure if that is the best example.

Trusting blindly in delivery order can cause problems, so financial transactions could, for example, be handled by doing transactions on Mnesia. These transactions should have checks within them that abort the transaction if it cannot be applied. Also, the system does not rely on the arrival of messages, only on each query being responsible for doing its own checking - like disallowing account balances below zero and hence rejecting (or flagging) such a transaction. Ordering problems can then be solved by retries, for example.
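
As a rough illustration of the idea (checks inside the transaction, with retries to absorb ordering problems), here is a small Python model. It does not use Mnesia or any real database API; `Rejected`, `apply_tx`, and `apply_with_retries` are hypothetical names, and the plain dict stands in for a transactional store.

```python
class Rejected(Exception):
    """Raised when a transaction fails its own validity check."""


def apply_tx(balances, account, delta):
    """Apply one balance change, aborting if it would go below zero."""
    new_balance = balances.get(account, 0) + delta
    if new_balance < 0:
        raise Rejected(f"{account}: balance would become {new_balance}")
    balances[account] = new_balance


def apply_with_retries(balances, pending, max_rounds=3):
    """Try each pending (account, delta) pair; retry rejected ones later.

    Returns the transactions that were still rejected after max_rounds.
    """
    for _ in range(max_rounds):
        if not pending:
            break
        still_pending = []
        for account, delta in pending:
            try:
                apply_tx(balances, account, delta)
            except Rejected:
                still_pending.append((account, delta))
        pending = still_pending
    return pending
```

In this model a withdrawal that arrives before the deposit it depends on is rejected in the first round and succeeds in the second, so correctness comes from the check rather than from arrival order.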

If you start to rely on message order and guaranteed delivery, standard Erlang messaging is not a suitable system. IIRC, RabbitMQ guarantees delivery _if possible_. If you enable this mode every message is cached until its processing is acknowledged. Excuse my lack of specifics or any inaccuracy but I have this info from somebody else demonstrating their distributed system to a group I was in.

Guarantees come at a cost. If delivery (if possible) must be assured and non-delivery must be detected, you have to pay that cost somewhere.

Cheers,
Oliver
 
 


Re: question re. message delivery

Miles Fidelman
On 9/26/17 11:25 AM, Oliver Korpilla wrote:

> Hello, Miles.
>
> Not sure if that is the best example.
>
> Trusting blindly in delivery order can cause problems, so financial transactions could, for example, be handled by doing transactions on Mnesia. These transactions should have checks within them that abort the transaction if it cannot be applied. Also, the system does not rely on the arrival of messages, only on each query being responsible for doing its own checking - like disallowing account balances below zero and hence rejecting (or flagging) such a transaction. Ordering problems can then be solved by retries, for example.
>
> If you start to rely on message order and guaranteed delivery, standard Erlang messaging is not a suitable system. IIRC, RabbitMQ guarantees delivery _if possible_. If you enable this mode every message is cached until its processing is acknowledged. Excuse my lack of specifics or any inaccuracy but I have this info from somebody else demonstrating their distributed system to a group I was in.
>
>
I understand that.  What I'm trying to understand are the specific
implementation details & behaviors.  As much out of curiosity as
anything else.

The specific semantics (delivery not guaranteed, order guaranteed) are
just a bit odd, and begging for clarification.  As is the thinking
behind them.

Miles

--
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra


Re: question re. message delivery

Matthias Lang
In reply to this post by Miles Fidelman
> > For #1 and #2, I don't think it's good to describe the solution as
> > "implement one's own message passing protocol on top of Erlang's".
> > The failure is quite specific, you get all messages up to the crash
> > and then you get none after that. It's not the message passing that's
> > the problem.

> Of course it is.  If one wants reliable packet delivery, one implements TCP
> (or equivalent) above raw IP.  If one wants reliable email, one implements a
> return receipt function.  Etc.

Ok, so I think what you mean is that you want the scheme the FAQ
(question 10.9) describes as "Super-safe". Right?

You can't implement that as a change to the message passing. One way
to see that is to consider what would happen if you used TCP to send
your message from one erlang process to another instead of using
message passing---the reason this is different to "Super-safe" is that
the ack gets sent before processing.
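
The distinction can be sketched with a toy Python model. This is an assumption-laden illustration, not how TCP or Erlang is implemented: the `Worker` class and the `"transport_ack"` / `"processed"` values are invented here to stand for "acknowledged on receipt" and "acknowledged after the work is done".

```python
class Worker:
    """Toy receiver that separates 'message arrived' from 'message processed'."""

    def __init__(self, crash_before_processing=False):
        self.crash = crash_before_processing
        self.inbox = []   # messages that have arrived
        self.done = []    # messages whose work has completed

    def deliver(self, msg):
        # Transport-level ack: sent as soon as the message is queued,
        # before any application code has looked at it.
        self.inbox.append(msg)
        return "transport_ack"

    def process_next(self):
        # Application-level ack: only returned once the work is finished.
        msg = self.inbox.pop(0)
        if self.crash:
            raise RuntimeError("worker crashed before processing")
        self.done.append(msg)
        return ("processed", msg)
```

With `crash_before_processing=True` the sender still receives `"transport_ack"` from `deliver`, yet `done` stays empty, which is why an acknowledgement that precedes processing cannot by itself provide the "Super-safe" guarantee.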

Matt

Re: question re. message delivery

Miles Fidelman
In reply to this post by Miles Fidelman

Hi Joe,

Hmmm....

Joe Armstrong wrote:

> What I said was "message passing is assumed to be reliable"
>
> The key word here is *assumed*. My assumption is that if I open a TCP socket
> and send it five messages numbered 1 to 5, then if I successfully read
> message 5 and have seen no error indicators, I can *assume* that messages
> 1 to 4 also arrived in order.


Well yes, but with TCP one has sequence numbers, buffering, and retransmission - and GUARANTEES, by design, that if you (say a socket connection) receive packet 5, then you've also received packets 1-4, in order. 

My understanding is that Erlang does NOT make that guarantee.  As stated:

- message delivery is assumed to be UNRELIABLE

- ordering is guaranteed to be maintained

The implication being that one might well receive packets 1, 2, 3, 5 - and not know that 4 is missing.

> Actually I have no idea if this is true - but it does seem to be a
> reasonable assumption.
>
> Messages 1 to 4 might have arrived got put in a buffer prior to my reading
> them and accidentally reordered due to a software bug. An alpha particle
> might have hit the data in message 3 and changed it -- who knows?


More likely, a TCP connection has dropped, taking a message or two with it, and once the connection is re-established, stuff starts flowing after a gap.

With UDP, packets could arrive out of order as well as get dropped.

There are ways to extend TCP, or write a higher level protocol that will detect dropped connections, and packets, reconnect, request retransmission - with the result that both the sender & receiver are guaranteed both delivery & order.
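
One way such a higher-level protocol can work on the receiving side is sketched below in Python: sequence numbers, a reorder buffer, and a list of gaps to request again. This is a generic illustration under stated assumptions, not a description of the Erlang distribution protocol; `OrderedReceiver` and its methods are hypothetical names.

```python
class OrderedReceiver:
    """Delivers payloads strictly in sequence order and reports gaps."""

    def __init__(self):
        self.next_seq = 1     # next sequence number to hand to the application
        self.buffer = {}      # packets that arrived ahead of their turn
        self.delivered = []   # payloads handed over, in order

    def receive(self, seq, payload):
        """Accept one packet; return the sequence numbers to request again."""
        # Ignore duplicates of packets that were already delivered.
        if seq >= self.next_seq:
            self.buffer[seq] = payload
        # Hand over every consecutive packet that is now available.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        # Anything below the highest buffered number that we do not hold
        # is a gap the sender should retransmit.
        if not self.buffer:
            return []
        return [s for s in range(self.next_seq, max(self.buffer))
                if s not in self.buffer]
```

If packets 1, 3 and 5 arrive, only the payload of 1 is delivered and the receiver asks for 2 and 4 again, so the application never sees 5 without 4 having been handed over first.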

Which brings us back to implementation.


> Having assumed that message passing is reliable I build code based on
> this assumption.

But, for Erlang, we can't make this assumption - the documentation specifically says so. 


> I'm not, of course, saying that the assumption is true, just that I trust
> the implementers of the system have done a good job to try and make it true.
> Certainly any repeatable counter examples should have been investigated
> to see if there were any errors in the system.
>
> All this builds on layers of trust. I trust that erlang message passing is
> ordered and reliable in the absence of errors.
>
> The Erlang implementers trust that TCP is reliable.


Well, that is the question, isn't it?  Lots of things cause TCP to drop connections.  So the question remains - how are dropped connections handled?  And, after a connection is dropped and restored, how are dropped messages and/or messages received out of order handled?

Actually, there's another design question in there - in a multi-node Erlang system, maintaining n^2 TCP connections seems just a tad unwieldy.  Personally, I'd be more likely to use a connectionless protocol, maybe even broadcast.



> The TCP implementors trust that the OS is reliable.
>
> The OS implementors trust that the processor is reliable.
>
> The processor implementors trust that the VLSI compilers are correct.
>
> Software runs on physical machines - so really the laws of physics apply not
> maths. Physics takes into account space and time, and the concept of
> simultaneity does not exist, not so in maths.
>
> It seems to me that software is built upon chains of trust, not upon
> mathematical chains of proof.
>
> I've just been saying "what we want to achieve" and not "how we can achieve
> it".

Which brings us back to:

stated goals:  unreliable delivery, ordered delivery

The BEAM Book details how this works within a node, but is silent on how distributed Erlang is implemented.  I'm really interested in some details.


> The statements that people make about the system should be in terms
> of belief rather than proof.
>
> I'd say "I believe we have reliable message passing"
> It would be plain daft to say "we have reliable message passing" or
> "we can prove it be correct" since there is no way of validating this.

Sure there is.  The state machine model of TCP is very clearly defined, including its various error conditions.  And one can test an implementation for adherence to the state machine model.  (In some cases, one can also demonstrate that software is provably correct - but let's not go there).



> Call me old fashioned but I think that claims that, for example,
> "we have unlimited storage" and so on are just nuts ...

Agreed.  But claims like "when allocated storage reaches 80% use, additional storage is allocated by <mechanism>" are not just reasonable, but mandatory when designing systems that have to scale under uncertain load.

Which brings us back to - how is message passing implemented between Erlang nodes?

Cheers,

Miles

-- 
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra
