question re. message delivery

Re: question re. message delivery

Richard A. O'Keefe-2
(1) I told my concurrent programming class that
     Erlang message delivery should be taken as
     reliable up to the point where communication
     is lost with the receiver, so that *IF* a
     message is received, all previous messages
     from that sender have been received in order.

(2) I also told them that the big problem is
     losing communication for a while and then
     it comes back (e.g., someone accidentally
     pulled a plug and then pushed it back in)
     but that this is why TCP has sequence numbers
     and acks.

(3) I also told them that it is the nature of
     the physical world that when you send someone
     a message (texting on a mobile phone is a
     great example) you can know that you SENT it
     but you can never know they RECEIVED it
     unless they tell you and gave the example of
     my daughter wanting a ride home but my phone's
     extremely limited mailbox filling up so I did
     not get her message until hours later.

(4) As for Joe's general philosophy of belief about
     systems, I'm reminded of Dijkstra's distinction
     between a Sufficiently Large Machine (one which
     is able to run your program without exhausting
     its resources) and a Hopefully Sufficiently
     Large Machine (one which either does the job
     properly or TELLS you it ran into trouble).
     Having learned on a B6700 where the hardware
     checked array subscripts and integer overflow
     -- so that this was not something you could or
     would consider turning off, there being no
     cheaper way to do this -- and then meeting
     the world of PDP-11s and DEC-10s, I quickly
     learned the painful distinction between a
     Hopefully Sufficiently Large Machine (B6700)
     and an Insufficiently Large Machine (the others,
     which just quietly went insane).

     There are all sorts of properties we'd like
     our systems to have, and they sort of
     approximately do, most of the time, but we
     really want to be TOLD when they're unable
     to do their job properly.

     The Armstrong approach, after all, is not
     "ignore errors", but "let it crash".

(5) I've just started looking at the MQTT protocol,
     and noticed that you can ask for
     "at least once", "at most once", or "exactly
     once" delivery.  I suspect that this is another
     area where it's "belief" not proof, and that
     the end-to-end principle applies.
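
One way to read point (5) is that "exactly once" is usually assembled from "at least once" plus a receiver that ignores duplicates. The sketch below is a toy simulation of that idea in Python, not MQTT's actual handshake; the function names, the stop-and-wait sender and the loss model are all assumptions made for illustration.

```python
import random

def deliver_at_least_once(messages, loss_rate, rng):
    """Resend each message until its ack gets through; return raw deliveries."""
    received = []                       # what the receiver saw, duplicates included
    for msg_id, payload in messages:
        while True:
            if rng.random() < loss_rate:
                continue                # message lost on the way out: resend
            received.append((msg_id, payload))
            if rng.random() < loss_rate:
                continue                # ack lost on the way back: sender resends
            break                       # ack arrived: move on to the next message
    return received

def dedupe(received):
    """Receiver-side filter: process each message id only once."""
    seen, processed = set(), []
    for msg_id, payload in received:
        if msg_id not in seen:
            seen.add(msg_id)
            processed.append(payload)
    return processed

rng = random.Random(1)                  # fixed seed so the run is repeatable
msgs = [(i, f"m{i}") for i in range(5)]
raw = deliver_at_least_once(msgs, loss_rate=0.4, rng=rng)
once = dedupe(raw)
```

The sender never learns more than "an ack came back", so the exactly-once effect here lives entirely in the receiver's duplicate filter, which is the end-to-end argument in miniature.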

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: question re. message delivery

Matthias Lang
In reply to this post by Matthias Lang
Hi,

Earlier in this thread, I referred to Hans Svensson's work from 2007.
I have now found the paper and see that Lars-Åke Fredlund is also an author:

http://happy-testing.com/hans/papers/EW2007-pitfalls_recipes.pdf

I like this paper. It has a careful description of what's meant by a
'message passing guarantee'; it's by far the best I'm aware
of. Another reason to like it is that it provides a result which was
surprising to me at the time, i.e. that pid reuse can happen in
practice (4.1). Still another reason is that it provides the scripts
needed to reproduce the failure.

There's at least one more paper on the same subject on that
page (http://happy-testing.com/hans/).

Matthias

Re: question re. message delivery

Joe Armstrong-2
In reply to this post by Miles Fidelman
On Wed, Sep 27, 2017 at 2:16 AM, Miles Fidelman
<[hidden email]> wrote:

> Hi Joe,
>
> Hmmm....
>
> Joe Armstrong wrote:
>
> What I said was "message passing is assumed to be reliable"
>
>
> The key word here is *assumed*: my assumption is that if I open a TCP socket
> and send it five messages numbered 1 to 5, then if I successfully read
> message 5 and have seen no error indicators, I can *assume* that messages
> 1 to 4 also arrived in order.
>
>
> Well yes, but with TCP one has sequence numbers, buffering, and
> retransmission - and GUARANTEES, by design, that if you (say a socket
> connection) receive packet 5, then you've also received packets 1-4, in
> order.
>
> My understanding is that Erlang does NOT make that guarantee.  As stated:
>
> - message delivery is assumed to be UNRELIABLE

That was the short version - Here's the long version.

Actually there is no such thing as "delivering a message" in Erlang.

What "delivering a message" means is "putting the message in the
mailbox of the receiver and scheduling the process for execution"

So all sorts of things can go wrong - the message is put in the mailbox
but an earlier message in the mailbox causes the process to crash
before the process reaches your message.

There *is* a guarantee that if you create a link to a process and the process
dies you get sent a message.

So we can make the statement "in the absence of errors message passing
order is preserved"

What does this mean? If you are linked to a message receiver and see no
error message then it is alive. If you send a sequence of messages
to the receiver and see no error messages then the messages have been
placed in the mailbox in order. This is guaranteed (if the code is correct).

Note that there is no guarantee that the process will ever read the mailbox.

It's like the postal service - the letters get put in the mailbox but
there's no guarantee they get taken out, but you get to know if the
owner of the mailbox dies.

The bit that should be reliable is putting the messages in the mailbox in order
if nothing has crashed (we assume this to be correctly coded)

The bit that is unreliable is the guarantee that the message is removed
from the mailbox and correctly processed.

I'm not sure where you quoted me from - but there should be some small
print nearby with an extended explanation.

The word "unreliable" means different things to different people.
TCP might well be reliable by design - but is it correctly implemented?
I have seen many good designs with bad implementations.

I've helped design fault-tolerant systems for years - so I'm a "trust as
little as possible" sort of person. Assume things will crash and clean up later.

I was told years ago not to trust processes. A wise man said "if you want to
know if a process has done something, get it to send you a reply message;
if you don't get the reply message then you can't assume anything about
the receiving process". So generating unique tags which we send in
round trips becomes important ...

Aside: Telecoms protocols make great use of tags and timeouts:
you send a request with a tag, wait a relatively long time (the timeout) -
much longer than the operation should take. Then on a timeout
assume the worst - crash everything and restart.

Works very well in practice - theory-wise it's very dodgy - millions of lines
of code doing this stuff is way too complex to prove anything about.
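
The tag-and-timeout round trip described above can be sketched roughly as follows. This is a Python illustration rather than Erlang, and `call`, `echo_server` and the queue-based mailboxes are hypothetical names invented for the example; it only shows the shape of the pattern.

```python
import queue
import threading
import time
import uuid

def call(server_inbox, client_inbox, request, timeout):
    """Send a uniquely tagged request; return the matching reply or time out."""
    tag = uuid.uuid4().hex
    server_inbox.put((tag, request, client_inbox))
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no reply for tag {tag}")
        try:
            reply_tag, value = client_inbox.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError(f"no reply for tag {tag}") from None
        if reply_tag == tag:
            return value
        # Any other tag belongs to an earlier request we gave up on: ignore it.

def echo_server(inbox):
    """Toy server: upper-cases requests, silently ignores the request 'drop'."""
    while True:
        tag, request, reply_to = inbox.get()
        if request != "drop":
            reply_to.put((tag, request.upper()))

server_inbox, client_inbox = queue.Queue(), queue.Queue()
threading.Thread(target=echo_server, args=(server_inbox,), daemon=True).start()
```

On a timeout the caller knows nothing about what the server did, which is why the text pairs the timeout with "assume the worst" rather than with a retry.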

Since I basically don't trust any of the underlying layers - you have
to ask what do I trust.

Well nothing really - but I have higher levels of trust for some
things than others.

Round trip confirmations including SHA1 checksums seem pretty good
to me.

If I say to a server "get me a file called 'foo'" and get something back,
it may or may not be correct.

If I say "get me some data that has the sha1 checksum 34ad34..."
and get some data back I can check the data and see if it has the correct
checksum. I don't even need secure sockets. I do need a secure way to
know the checksum - but that is an entirely different problem.

This boils down to system design - in the latter case I need to place no
trust in the layers (I need to trust SHA1 so it's not absolute)
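
A minimal sketch of the "ask for data by checksum" idea, in Python for illustration. The dictionary standing in for a server and the name `fetch_by_digest` are assumptions; SHA-1 is used only because the text names it, and `hashlib.sha256` would slot in the same way.

```python
import hashlib

def fetch_by_digest(store, wanted_hex):
    """Ask an untrusted store for data by digest and verify what comes back."""
    data = store.get(wanted_hex)
    if data is None:
        raise KeyError(wanted_hex)
    actual = hashlib.sha1(data).hexdigest()
    if actual != wanted_hex:
        # The transport or the server gave us the wrong bytes; we can tell
        # without trusting either of them.
        raise ValueError(f"digest mismatch: wanted {wanted_hex}, got {actual}")
    return data

good = b"hello, world\n"
digest = hashlib.sha1(good).hexdigest()
honest_store = {digest: good}               # returns the right bytes
lying_store = {digest: b"something else"}   # returns bytes that do not match
```

The check moves the trust from the layers in between to the hash function and to however the digest was obtained, as the paragraph above says.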


>
> - ordering is guaranteed to be maintained
>
> The implication being that one might well receive packets 1, 2, 3, 5 - and
> not know that 4 is missing.
>
> Actually I have no idea if this is true - but it does seem to be a
> reasonable
> assumption.
>
> Messages 1 to 4 might have arrived got put in a buffer prior to my reading
> them and accidentally reordered due to a software bug. An alpha particle
> might have hit the data in message 3 and changed it -- who knows?
>
>
> More likely, a TCP connection has dropped, taking a message or two with it,
> and once the connection is re-established, stuff starts flowing after a gap.
>
> With UDP, packets could arrive out of order as well as get dropped.
>
> There are ways to extend TCP, or write a higher level protocol that will
> detect dropped connections, and packets, reconnect, request retransmission -
> with the result that both the sender & receiver are guaranteed both delivery
> & order.
>
> Which brings us back to implementation.
>
>
> Having assumed that message passing is reliable I build code based on
> this assumption.
>
> But, for Erlang, we can't make this assumption - the documentation
> specifically says so.
>
>
> I'm not, of course, saying that the assumption is true, just that I trust
> the
> implementers of the system have done a good job to try and make it true.
> Certainly any repeatable counter examples should have been investigated
> to see if there were any errors in the system.
>
> All this builds on layers of trust. I trust that erlang message passing is
> ordered and reliable in the absence of errors.
>
> The Erlang implementers trust that TCP is reliable.
>
>
> Well, that is the question, isn't it.  Lots of things cause TCP to drop
> connections.  So the question remains - how are dropped connections handled?
> And, if after a connection is dropped and restored, how are dropped messages
> and/or messages received out of order handled?
>
> Actually, there's another design question in there - in a multi-node Erlang
> system, maintaining n^2 TCP connections seems just a tad unwieldy.
> Personally, I'd be more likely to use a connectionless protocol, maybe even
> broadcast.
>
>
>
> The TCP implementors trust that the OS is reliable.
>
> The OS implementors trust that the processor is reliable.
>
> The processor implementors trust that the VLSI compilers are correct.
>
> Software runs on physical machines - so really the laws of physics apply not
> maths. Physics takes into account space and time, and the concept of
> simultaneity does not exist, not so in maths.
>
> It seems to me that software is built upon chains of trust, not upon
> mathematical chains of proof.
>
> I've just been saying "what we want to achieve" and not "how we can achieve
> it".
>
> Which brings us back to:
>
> stated goals:  unreliable delivery, ordered delivery
>
> The BEAM Book details how this works within a node, but is silent on how
> distributed Erlang is implemented.  I'm really interested in some details.



>
> The statements that people make about the system should be in terms
> of belief rather than proof.
>
> I'd say "I believe we have reliable message passing"
> It would be plain daft to say "we have reliable message passing" or
> "we can prove it to be correct" since there is no way of validating this.
>
> Sure there is.  The state machine model of TCP is very clearly defined,
> including its various error conditions.  And one can test an implementation
> for adherence to the state machine model.  (In some cases, one can also
> demonstrate that software is provably correct - but let's not go there).
>
>
>
> Call me old fashioned but I think that claims that, for example,
> "we have unlimited storage" and so on are just nuts ...
>
> Agreed.  But claims like "when allocated storage reaches 80% use, additional
> storage is allocated by <mechanism>" are not just reasonable, but mandatory
> when designing systems that have to scale under uncertain load.
>
> Which brings us back to - how is message passing implemented between Erlang
> nodes?
>
> Cheers,
>
> Miles
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.  .... Yogi Berra
>
>

Re: question re. message delivery

Raimo Niskanen-2
In reply to this post by Miles Fidelman
I'll try to summarize what I know on the topic, acting a bit as a ghost
writer for the VM Team.

See also the old, outdated language specification, which is the best we have.
It is still the soother that the VM Team sucks on when they do not know
what else to do, and an updated version is in the pipeline.
See especially 10.6.2 Order of signals:

    http://erlang.org/download/erl_spec47.ps.gz


Also see, especially 2.1 Passing of Signals "signals can be lost":

    http://erlang.org/doc/apps/erts/communication.html


Message order between a pair of processes is guaranteed.  I.e. a message
sent after another message will not be received before that other message.

Messages may be dropped.  In particular due to a communication link
between nodes going down and up.

If you set a monitor (or a link) on a process you will get a 'DOWN'
message if the other process vanishes, i.e. it dies or the communication link
is lost. That 'DOWN' message is guaranteed to be delivered (the same applies
to links and 'EXIT' messages).

An example: process P1 first sets a monitor on process P2 and then
sends messages M1, M2 and M3 to P2.  P2 acknowledges M1 with M1a and M3
with M3a.  Then if P1 gets M1a it knows that P2 has seen M1, and P1 is
guaranteed to eventually get either M3a or 'DOWN'.  If it gets M3a then
it knows P2 has seen M2 and M3.  If it gets 'DOWN' then M2 may have been
either dropped or seen by P2, and the same applies to M3; P1 may still
eventually get M3a, and then knows that P2 has seen M3 but cannot know
whether it has seen M2.
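
The reasoning in this example can be written down as a small function. The sketch below is Python used as a notation for the inference, not a model of the VM; `definitely_seen` and the event encoding are hypothetical, and it assumes P2 handles its mailbox in order.

```python
def definitely_seen(sent, events, ack_of):
    """Messages P1 can be sure P2 has seen, given P1's received events in order."""
    seen = set()
    down = False
    for ev in events:
        if ev == "DOWN":
            down = True
        elif ev in ack_of:
            msg = ack_of[ev]
            if down:
                # After a 'DOWN', earlier messages may have been dropped,
                # so the ack proves only the message it acknowledges.
                seen.add(msg)
            else:
                # No 'DOWN' so far: ordering implies every earlier message
                # reached P2 before this one.
                seen.update(sent[: sent.index(msg) + 1])
    return seen

sent = ["M1", "M2", "M3"]
ack_of = {"M1a": "M1", "M3a": "M3"}
no_failure = definitely_seen(sent, ["M1a", "M3a"], ack_of)
late_ack = definitely_seen(sent, ["M1a", "DOWN", "M3a"], ack_of)
only_down = definitely_seen(sent, ["M1a", "DOWN"], ack_of)
```

In the `late_ack` case M2 is absent from the result: P1 cannot tell whether it was dropped or seen, which matches the last sentence of the example.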

Another example: gen_server:call first sets a monitor on the server process,
then sends the query.  By that it knows it will eventually either get
the reply or 'DOWN'.  If it gets 'DOWN' it actually may get a late reply
(e.g. network down-up), which is often overlooked.
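
The late-reply case is easy to miss, so here is a rough sketch of it. It is plain Python with a list standing in for the caller's mailbox; `wait_for`, `drop_stale` and the tuple layout are made up for the illustration and are not how gen_server is implemented.

```python
def wait_for(mailbox, ref):
    """Pop the first 'reply' or 'DOWN' event carrying ref; None if neither is queued."""
    for i, (kind, msg_ref, _payload) in enumerate(mailbox):
        if msg_ref == ref and kind in ("reply", "DOWN"):
            return mailbox.pop(i)
    return None

def drop_stale(mailbox, finished_refs):
    """Discard replies whose call has already finished (e.g. ended with 'DOWN')."""
    mailbox[:] = [m for m in mailbox
                  if not (m[0] == "reply" and m[1] in finished_refs)]

# The link drops, 'DOWN' arrives, and the caller gives up on ref1 ...
mailbox = [("DOWN", "ref1", "noconnection")]
outcome = wait_for(mailbox, "ref1")
# ... then the network comes back and the old reply shows up anyway.
mailbox.append(("reply", "ref1", "late answer"))
mailbox.append(("reply", "ref2", "fresh answer"))   # reply to a newer call
drop_stale(mailbox, finished_refs={"ref1"})
```

Because every call carries its own reference, the caller can tell the late reply apart from the reply to a newer call and throw only the former away.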

The distribution communication is by default implemented with TCP links
between nodes.  The VM relies on the distribution transport to deliver
messages in order, or to die meaning that the link has failed and that any
number of messages at the end of the sequence may have been dropped.

Process links and monitors towards processes on remote nodes are registered
in the local node on the distribution channel entry for that remote node,
so the VM can trigger 'DOWN' and 'EXIT' messages for all links and monitors
when a communication link goes down.  These messages are guaranteed to be
delivered (if their owner exists).

I hope this clears things up.
/ Raimo Niskanen




--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Re: question re. message delivery

Matthias Lang
In reply to this post by Richard A. O'Keefe-2
On 27. September 2017, Richard A. O'Keefe wrote:

> (3) I also told them that it is the nature of
>     the physical world that when you send someone
>     a message (texting on a mobile phone is a
>     great example) you can know that you SENT it
>     but you can never know they RECEIVED it
>     unless they tell you and gave the example of
>     my daughter wanting a ride home but my phone's
>     extremely limited mailbox filling up so I did
>     not get her message until hours later.

SMS is a good example, and there's a detail you omitted. Your daughter
could most likely have seen that the message had not been delivered.

SMS has a mechanism for telling the sender that a particular message
was delivered to the phone. Exactly how that information is displayed
is different on different phones and it's often disabled by default.

This is an interesting detail because it illustrates the difference
between ACKs in different layers.

Matthias

---

Note 1: On the Android phone I have, it's enabled in the SMS app by
selecting settings/advanced/'get SMS delivery reports'. The word
'delivered' appears on each SMS when the receiving phone has it.

Note 2: In the old GSM standards, e.g. 03.40 (from the 1990s), the
'status report capabilities' are listed as optional. I haven't noticed
any mobile networks that don't have it.

Re: question re. message delivery

Miles Fidelman
In reply to this post by Joe Armstrong-2
On 9/27/17 4:08 AM, Joe Armstrong wrote:

> On Wed, Sep 27, 2017 at 2:16 AM, Miles Fidelman
> <[hidden email]> wrote:
>> Hi Joe,
>>
>> Hmmm....
>>
>> Joe Armstrong wrote:
>>
>> What I said was "message passing is assumed to be reliable"
>>
>>
>> The key word here is *assumed*: my assumption is that if I open a TCP socket
>> and send it five messages numbered 1 to 5, then if I successfully read
>> message 5 and have seen no error indicators, I can *assume* that messages
>> 1 to 4 also arrived in order.
>>
>>
>> Well yes, but with TCP one has sequence numbers, buffering, and
>> retransmission - and GUARANTEES, by design, that if you (say a socket
>> connection) receive packet 5, then you've also received packets 1-4, in
>> order.
>>
>> My understanding is that Erlang does NOT make that guarantee.  As stated:
>>
>> - message delivery is assumed to be UNRELIABLE
> That was the short version - Here's the long version.
>
> Actually there is no such thing as "delivering a message" in Erlang.
>
> What "delivering a message" means is "putting the message in the
> mailbox of the receiver and scheduling the process for execution"

Understood.  What's less clear are the details of exactly how the
message is copied, and the step by step flow of control.  There are a
lot of details spelled out (in the BEAM book) about how this happens on
a single node, but no details at all about how messages are sent between
nodes.
>
> So all sorts of things can go wrong - the message is put in the mailbox
> but an earlier message in the mailbox causes the process to crash
> before the process reaches your message.

Exactly.  I'm trying to understand the implementation details. (Call it
an itch I feel compelled to scratch, from a systems engineer who's
worked on a lot of distributed systems, and at one time was involved in
designing & implementing network protocols.)

>
> There *is* a guarantee that if you create a link to a process and the process
> dies you get sent a message.
>
> So we can make the statement "in the absence of errors message passing
> order is preserved"

But, is that true between nodes?  There are lots of ways to implement
message passing that don't preserve ordering.  Hence my interest in details.

One can imagine all kinds of mechanisms, and failure modes, including
ones where both the sending and receiving processes continue to run just
fine.

- on a single node, sending a message is implemented by spawning a
process that copies the message from the sender's memory space to the
receiver's - easy enough for such a process to die and lose a message,
or for scheduling to cause messages to be delivered out of order

- across nodes, even if TCP is used to move messages, a connection can
fail and be restarted - losing data in the process - again both
processes remain alive and well

The BEAM book details how a message is moved between processes on the
same node (but doesn't actually say who's doing the moving). There's
no description about how messages are moved between nodes. Again, my
interest in details.

>
> What does this mean? If you are linked to a message receiver and see no
> error message then it is alive. If you send a sequence of messages
> to the receiver and see no error messages then the messages have been
> placed in the mailbox in order. This is guaranteed (if the code is correct).

Assumes facts not in evidence.  Depending on implementation, the
"postman" could die (to use the postal mail analogy).

>
> Note that there is no guarantee that the process will ever read the mailbox.
>
> It's like the postal service - the letters get put in the mailbox but
> there's no guarantee they get taken out, but you get to know if the
> owner of the mailbox dies.

Not a really great analogy.  First off, you don't know if the owner of
the mailbox dies (I'm dealing with that right now, still receiving mail
for a dead relative, forwarded from an older address, courtesy of the
post office).  Beyond that, you never know if a piece of mail falls behind a
piece of sorting equipment.  And... there are plenty of cases of lost
mail showing up days, or even years later.

Now, some forms of mail delivery come with a tracking number - so you
can determine where a piece of mail is.  And you can send mail with a
return receipt, and resend if the receipt isn't received in a timely
fashion (which adds the potential for duplicate delivery if the mail is
just delayed, or the receipt gets lost).

>
> The bit that should be reliable is putting the messages in the mailbox in order
> if nothing has crashed (we assume this to be correctly coded)

Which mailbox?  The sender's outbox (or the local postbox) - agreed. 
The receiver's inbox - now that's the question on the table.

>
> The bit that is unreliable is the guarantee that the message is removed
> from the mailbox and correctly processed.

Also unreliable:
- that a message gets from the sender's outbox to the receiver's inbox
- that messages get placed in the receiver's inbox in order of transmission


>
> I'm not sure where you quoted me from - but there should be some small
> print nearby with an extended explanation.

The quotes are from your thesis.  There is no extended explanation that
I could find, other than that the guarantee of order is a "design decision."

>
> The word "unreliable" means different things to different people.
> TCP might well be reliable by design - but is it correctly implemented?
> I have seen many good designs with bad implementations.

Sure, so let's stick with two very specific uses of the word:

i. the protocol sense:  UDP is defined as an unreliable service, TCP is
defined as a reliable one
ii. failure to execute a function, or provide a service, as specified

For the purposes of this discussion, I'm trying to get very clear about
what, exactly is specified, how it's implemented, and what failure modes
might be there.


>
> I've helped design fault-tolerant systems for years - so I'm a "trust as
> little as possible" sort of person. Assume things will crash and clean up later.

Well... me too.  But with the focus of how to build distributed systems
that continue to function in the face of "challenged" networks.  So my
focus tends to be on what kinds of things might fail, how to detect
those failures, and how to clean up afterwards.

(Side note:  I first discovered Erlang while working on distributed
simulation systems, and the protocols used for updating distributed state.)

>
> I was told years ago not to trust processes. A wise man said "if you want to
> know if a process has done something, get it to send you a reply message;
> if you don't get the reply message then you can't assume anything about
> the receiving process". So generating unique tags which we send in
> round trips becomes important ...

Absolutely!


>
> Aside: Telecoms protocols make great use of tags and timeouts:
> you send a request with a tag, wait a relatively long time (the timeout) -
> much longer than the operation should take. Then on a timeout
> assume the worst - crash everything and restart.

Well, I assume you mean telephony when you say "telecoms" - in the data
network world we use sequence numbers, and include those numbers in
acknowledgement messages.  On the sending end, you resend if you don't
receive an ACK within a timeout window; on the receiving end, you use
sequence numbers to order delivery - and sometimes you request a
retransmission if you detect a missing sequence number.
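
The receiving side of that scheme can be sketched as a small reorder buffer. This is an illustrative Python fragment under simplifying assumptions (integer sequence numbers starting at 0, no wrap-around, no window limit); `reorder` is a hypothetical name, not part of any protocol stack.

```python
def reorder(packets, first_seq=0):
    """Release payloads in sequence order; return (delivered, missing)."""
    buffered = {}                         # out-of-order packets waiting for a gap to fill
    delivered = []
    expected = first_seq
    for seq, payload in packets:
        if seq < expected or seq in buffered:
            continue                      # duplicate caused by a retransmission
        buffered[seq] = payload
        while expected in buffered:       # release any run that is now in order
            delivered.append(buffered.pop(expected))
            expected += 1
    # Sequence numbers below the highest one seen that never arrived:
    # these are what the receiver would ask the sender to retransmit.
    highest = max(buffered, default=expected - 1)
    missing = [s for s in range(expected, highest + 1) if s not in buffered]
    return delivered, missing

arrivals = [(0, "a"), (2, "c"), (1, "b"), (1, "b"), (4, "e")]
delivered, missing = reorder(arrivals)
```

Here packet 4 stays buffered because 3 never arrived, so the receiver can both preserve order and name the exact gap, which is the information a bare ordered channel does not give it.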

Where it gets trickier is if a connection (more accurately, an
"association") drops.  Things get sort of interesting if you want to run
a reliable telnet or ssh connection from a cell phone in a moving
vehicle - your IP address tends to change as you roam from tower to
tower, so you need something "above" TCP to re-establish connections,
and retransmit packets lost in transit.  (Well, my personal experience
is with tactical mesh networks.  Cellular data protocols are somewhat
odder than basic TCP/IP).

>
> Works very well in practice - theory-wise it's very dodgy - millions of lines
> of code doing this stuff is way too complex to prove anything about.
>
> Since I basically don't trust any of the underlying layers - you have
> to ask what do I trust.
>
> Well nothing really - but I have higher levels of trust for some
> things than others.
>
> Round trip confirmations including SHA1 checksums seem pretty good
> to me.
>
> If I say to a server "get me a file called 'foo'" and get something back,
> it may or may not be correct.
>
> If I say "get me some data that has the sha1 checksum 34ad34..."
> and get some data back I can check the data and see if it has the correct
> checksum. I don't even need secure sockets. I do need a secure way to
> know the checksum - but that is an entirely different problem.
>
> This boils down to system design - in the latter case I need to place no
> trust in the layers (I need to trust SHA1 so it's not absolute)

Which  brings me back to my interest in details of Erlang's message
passing implementation (well actually, the design - let's stay away from
the question of correct implementation).

Cheers,

Miles

--
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra


Re: question re. message delivery

Miles Fidelman
In reply to this post by Richard A. O'Keefe-2


On 9/26/17 10:58 PM, Richard A. O'Keefe wrote:
> (1) I told my concurrent programming class that
>     Erlang message delivery should be taken as
>     reliable up to the point where communication
>     is lost with the receiver, so that *IF* a
>     message is received, all previous messages
>     from that sender have been received in order.

That seems like an awfully big assumption, particularly when working
with a distributed Erlang system.  Absent sequence numbers, there are
lots of ways that messages can be lost invisibly to the sender and
receiver.  (And what does it mean to "lose communication" when dealing
with connectionless message delivery?)
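To make the point about sequence numbers concrete, here is a minimal sketch of gap detection at the receiver. It is plain Python rather than Erlang, `find_gaps` is a hypothetical helper, and it assumes what the thread says Erlang does provide: messages from one sender arrive in order, but some may be missing.

```python
def find_gaps(seq_numbers):
    """Return the sequence numbers missing between received messages.

    Assumes in-order arrival from a single sender. Loss of the very
    first or the very last messages cannot be detected this way; that
    needs an acknowledgement or a failure notification.
    """
    missing = []
    expected = None
    for n in seq_numbers:
        if expected is not None and n > expected:
            missing.extend(range(expected, n))
        expected = n + 1
    return missing


# Receiving 1, 2, 3, 5 reveals that 4 was lost.
assert find_gaps([1, 2, 3, 5]) == [4]
assert find_gaps([1, 2, 3]) == []
```

Without the numbers, the receiver sees only four messages in a plausible order and has no way to notice the hole.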

>
> (2) I also told them that the big problem is
>     losing communication for a while and then
>     it comes back (e.g., someone accidentally
>     pulled a plug and then pushed it back in)
>     but that this is why TCP has sequence numbers
>     and acks.

Yes, but one also has failures where a TCP association fails and a new
connection is established.  Stuff can get lost if a connection breaks
and a new one is established.  Ordering, however, is maintained.  (It
gets more interesting if you're trying to maintain something like an SSH
session from a moving cell phone - you need something more than TCP to
maintain continuity across changes in IP address that cause TCP
associations to drop.)

>
> (3) I also told them that it is the nature of
>     the physical world that when you send someone
>     a message (texting on a mobile phone is a
>     great example) you can know that you SENT it
>     but you can never know they RECEIVED it
>     unless they tell you and gave the example of
>     my daughter wanting a ride home but my phone's
>     extremely limited mailbox filling up so I did
>     not get her message until hours later.
>
> (4) As for Joe's general philosophy of belief about
>     systems, I'm reminded of Dijkstra's distinction
>     between a Sufficiently Large Machine (one which
>     is able to run your program without exhausting
>     its resources) and a Hopefully Sufficiently
>     Large Machine (one which either does the job
>     properly or TELLS you it ran into trouble).
>     Having learned on a B6700 where the hardware
>     checked array subscripts and integer overflow
>     -- so that this was not something you could or
>     would consider turning off, there being no
>     cheaper way to do this -- and then meeting
>     the world of PDP-11s and DEC-10s, I quickly
>     learned the painful distinction between a
>     Hopefully Sufficiently Large Machine (B6700)
>     and an Insufficiently Large Machine (the others,
>     which just quietly went insane).
>
>     There are all sorts of properties we'd like
>     our systems to have, and they sort of
>     approximately do, most of the time, but we
>     really want to be TOLD when they're unable
>     to do their job properly.

Yes, indeed.
>
>     The Armstrong approach, after all, is not
>     "ignore errors", but "let it crash".

Yes, indeed.  But then one has to do something after the crash. :-)

>
> (5) I've just started looking at the MQTT protocol,
>     and noticed that you can ask for
>     "at least once", "at most once", or "exactly
>     once" delivery.  I suspect that this is another
>     area where it's "belief" not proof, and that
>     the end-to-end principle applies.

Haven't studied MQTT specifically, but where protocols are concerned,
there are certainly ways to "prove" things about the specifications of a
protocol.  Belief enters into implementation correctness and failure
modes.  One can also support belief through testing & validation of
implementations.  We do that all the time.

Cheers,

Miles


Re: question re. message delivery

Miles Fidelman
In reply to this post by Matthias Lang
Thanks, Matthias.

That's the kind of analysis that I've been looking for!

Miles


On 9/27/17 2:11 AM, Matthias Lang wrote:

> Hi,
>
> Earlier in this thread, I referred to Hans Svensson's work from 2007.
> I have now found the paper and see that Lars-Åke Fredlund is also an author:
>
> http://happy-testing.com/hans/papers/EW2007-pitfalls_recipes.pdf
>
> I like this paper. It has a careful description of what's meant by a
> 'message passing guarantee', it's by far the best I'm aware
> of. Another reason to like it is that it provides a result which was
> surprising to me at the time, i.e. that pid reuse can happen in
> practice (4.1). Still another reason is that it provides the scripts
> needed to reproduce the failure.
>
> There's at least one more paper on the same subject on that
> page (http://happy-testing.com/hans/).
>
> Matthias

Re: question re. message delivery

Miles Fidelman
In reply to this post by Raimo Niskanen-2
Raimo,

Thanks for the details - exactly what I was looking for!

A few questions - inline, near the end...


On 9/27/17 7:42 AM, Raimo Niskanen wrote:

> I'll try to summarize what I know on the topic, acting a bit as a ghost
> writer for the VM Team.
>
> See also the old, outdated language specification, which is the best we have.
> It is still the soother that the VM Team sucks on when they do not know
> what else to do, and an updated version is in the pipeline.
> See especially 10.6.2 Order of signals:
>
>      http://erlang.org/download/erl_spec47.ps.gz
>
>
> Also see, especially 2.1 Passing of Signals "signals can be lost":
>
>      http://erlang.org/doc/apps/erts/communication.html
>
>
> Message order between a pair of processes is guaranteed.  I.e. a message
> sent after another message will not be received before that other message.
>
> Messages may be dropped.  In particular due to a communication link
> between nodes going down and up.
>
> If you set a monitor (or a link) on a process you will get a 'DOWN'
> message if the other process vanishes, i.e. it dies or the communication link is lost.
> That 'DOWN' message is guaranteed to be delivered (the same applies to
> links and 'EXIT' messages).

Ahh.. now that's important.  (One might quibble about the semantics of
"DOWN" vs. "not reachable," but that's another topic.)

>
> An example: process P1 first sets a monitor on process P2 and then
> sends messages M1, M2 and M3 to P2.  P2 acknowledges M1 with M1a and M3
> with M3a.  Then if P1 gets M1a it knows that P2 has seen M1, and P1 is
> guaranteed to eventually get either M3a or 'DOWN'.  If it gets M3a then
> it knows P2 has seen M2 and M3.  If it gets 'DOWN' then M2 may have been
> either dropped or seen by P2, and the same applies to M3; P1 may still
> eventually get M3a, knowing then that P2 has seen M3, but it cannot know
> whether P2 has seen M2.
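
The reasoning in the quoted P1/P2 example can be written down mechanically. The following is a sketch in Python, not Erlang, and it models only the inference, not the runtime: `seen_by_p2` is a hypothetical function, acknowledgements are named by appending "a" to the message name as in the example, and 'DOWN' stands for the monitor notification.

```python
def seen_by_p2(events, sent=("M1", "M2", "M3")):
    """Return the messages P1 can be sure that P2 has seen.

    `events` lists what P1 received, in order, for example
    ["M1a", "DOWN", "M3a"]. Before a 'DOWN', ordered delivery means an
    acknowledgement of Mk also covers everything sent before Mk. After a
    'DOWN', an acknowledgement vouches only for its own message.
    """
    known = set()
    link_intact = True
    for event in events:
        if event == "DOWN":
            link_intact = False
            continue
        message = event[:-1]  # "M3a" -> "M3"
        index = sent.index(message)
        if link_intact:
            known.update(sent[: index + 1])
        else:
            known.add(message)
    return known


assert seen_by_p2(["M1a", "M3a"]) == {"M1", "M2", "M3"}
assert seen_by_p2(["M1a", "DOWN"]) == {"M1"}
assert seen_by_p2(["M1a", "DOWN", "M3a"]) == {"M1", "M3"}
```

The last case is the one the example warns about: the late M3a says nothing about M2.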
>
> Another example: gen_server:call first sets a monitor on the server process,
> then sends the query.  By that it knows it will eventually either get
> the reply or 'DOWN'.  If it gets 'DOWN' it actually may get a late reply
> (e.g. network down-up), which is often overlooked.
>
> The distribution communication is by default implemented with TCP links
> between nodes.  The VM relies on the distribution transport to deliver
> messages in order, or to die, meaning that the link has failed and that any
> number of messages at the end of the sequence may have been dropped.

Are those TCP links created on the fly, between processes, or are they
kept running between nodes?

I ask, because there's an obvious n^2 issue in a system with lots of
nodes.  (I used to work on protocols linking distributed simulators -
we'd either use IP multicast to distribute state updates, or establish a
sparser network "above" TCP - n^2 links between servers on each local
network, broadcast on LANs).


>
> Process links and monitors towards processes on remote nodes are registered
> in the local node on the distribution channel entry for that remote node,
> so the VM can trigger 'DOWN' and 'EXIT' messages for all links and monitors
> when a communication link goes down.  These messages are guaranteed to be
> delivered (if their owner exists).
>
> I hope this clears things up.
> / Raimo Niskanen

Yes.  Thanks!

Miles

>
>
>
> On Tue, Sep 26, 2017 at 05:16:56PM -0700, Miles Fidelman wrote:
>> Hi Joe,
>>
>> Hmmm....
>>
>> Joe Armstrong wrote:
>>
>>> What I said was "message passing is assumed to be reliable".
>>> The key word here is *assumed*: my assumption is that if I open a TCP
>>> socket and send it five messages numbered 1 to 5, then if I successfully
>>> read message 5 and have seen no error indicators, I can *assume* that
>>> messages 1 to 4 also arrived in order.
>>>
>> Well yes, but with TCP one has sequence numbers, buffering, and
>> retransmission - and GUARANTEES, by design, that if you (say a socket
>> connection) receive packet 5, then you've also received packets 1-4, in
>> order.
>>
>> My understanding is that Erlang does NOT make that guarantee.  As stated:
>>
>> - message delivery is assumed to be UNRELIABLE
>>
>> - ordering is guaranteed to be maintained
>>
>> The implication being that one might well receive packets 1, 2, 3, 5 -
>> and not know that 4 is missing.
>>
>>> Actually I have no idea if this is true - but it does seem to be a
>>> reasonable assumption.
>>>
>>> Messages 1 to 4 might have arrived, got put in a buffer prior to my reading
>>> them, and been accidentally reordered due to a software bug. An alpha particle
>>> might have hit the data in message 3 and changed it -- who knows?
>>
>> More likely, a TCP connection has dropped, taking a message or two with
>> it, and once the connection is re-established, stuff starts flowing
>> after a gap.
>>
>> With UDP, packets could arrive out of order as well as get dropped.
>>
>> There are ways to extend TCP, or write a higher level protocol that will
>> detect dropped connections, and packets, reconnect, request
>> retransmission - with the result that both the sender & receiver are
>> guaranteed both delivery & order.
>>
>> Which brings us back to implementation.
>>
>>> Having assumed that message passing is reliable I build code based on
>>> this assumption.
>> But, for Erlang, we can't make this assumption - the documentation
>> specifically says so.
>>
>>> I'm not, of course, saying that the assumption is true, just that I trust
>>> the implementers of the system have done a good job to try and make it true.
>>> Certainly any repeatable counter examples should have been investigated
>>> to see if there were any errors in the system.
>>>
>>> All this builds on layers of trust. I trust that erlang message passing is
>>> ordered and reliable in the absence of errors.
>>>
>>> The Erlang implementers trust that TCP is reliable.
>>
>> Well, that is the question, isn't it.  Lots of things cause TCP to drop
>> connections.  So the question remains - how are dropped connections
>> handled?  And, if after a connection is dropped and restored, how are
>> dropped messages and/or messages received out of order handled?
>>
>> Actually, there's another design question in there - in a multi-node
>> Erlang system, maintaining n^2 TCP connections seems just a tad
>> unwieldy.  Personally, I'd be more likely to use a connectionless
>> protocol, maybe even broadcast.
>>
>>
>>> The TCP implementors trust that the OS is reliable.
>>>
>>> The OS implementors trust that the processor is reliable.
>>>
>>> The processor implementors trust that the VLSI compilers are correct.
>>>
>>> Software runs on physical machines - so really the laws of physics
>>> apply, not maths. Physics takes into account space and time, and the
>>> concept of simultaneity does not exist; not so in maths.
>>>
>>> It seems to me that software is built upon chains of trust, not upon
>>> mathematical chains of proof.
>>>
>>> I've just been saying "what we want to achieve" and not "how we can
>>> achieve it".
>> Which brings us back to:
>>
>> stated goals:  unreliable delivery, ordered delivery
>>
>> The BEAM Book details how this works within a node, but is silent on how
>> distributed Erlang is implemented.  I'm really interested in some details.
>>
>>> The statements that people make about the system should be in terms
>>> of belief rather than proof.
>>>
>>> I'd say "I believe we have reliable message passing"
>>> It would be plain daft to say "we have reliable message passing" or
>>> "we can prove it to be correct" since there is no way of validating this.
>> Sure there is.  The state machine model of TCP is very clearly defined,
>> including its various error conditions.  And one can test an
>> implementation for adherence to the state machine model.  (In some
>> cases, one can also demonstrate that software is provably correct - but
>> let's not go there).
>>
>>
>>> Call me old fashioned but I think that claims that, for example,
>>> "we have unlimited storage" and so on are just nuts ...
>> Agreed.  But claims like "when allocated storage reaches 80% use,
>> additional storage is allocated by <mechanism>" are not just reasonable,
>> but mandatory when designing systems that have to scale under uncertain
>> load.
>>
>> Which brings us back to - how is message passing implemented between
>> Erlang nodes?
>>
>> Cheers,
>>
>> Miles

Re: question re. message delivery

Raimo Niskanen-2
Miles; great to hear!  Replies inline.

On Wed, Sep 27, 2017 at 09:10:45AM -0700, Miles Fidelman wrote:

> Raimo,
>
> Thanks for the details - exactly what I was looking for!
>
> A few questions - inline, near the end...
>
>
> On 9/27/17 7:42 AM, Raimo Niskanen wrote:
> > I'll try to summarize what I know on the topic, acting a bit as a ghost
> > writer for the VM Team.
> >
> > See also the old, outdated language specification, which is the best we have.
> > It is still the soother that the VM Team sucks on when they do not know
> > what else to do, and an updated version is in the pipeline.
> > See especially 10.6.2 Order of signals:
> >
> >      http://erlang.org/download/erl_spec47.ps.gz
> >
> >
> > Also see, especially 2.1 Passing of Signals "signals can be lost":
> >
> >      http://erlang.org/doc/apps/erts/communication.html
> >
> >
> > Message order between a pair of processes is guaranteed.  I.e. a message
> > sent after another message will not be received before that other message.
> >
> > Messages may be dropped.  In particular due to a communication link
> > between nodes going down and up.
> >
> > If you set a monitor (or a link) on a process you will get a 'DOWN'
> > message if the other process vanishes, i.e. it dies or the communication link is lost.
> > That 'DOWN' message is guaranteed to be delivered (the same applies to
> > links and 'EXIT' messages).
>
> Ahh.. now that's important.  (One might quibble about the semantics of
> "DOWN" vs. "not reachable," but that's another topic.)

"not reachable" here means link down, which for TCP means socket error or
closed, or a link tick timeout.  The VM sends ticks on every link when no
other data is being sent, to ensure that data flows; if none arrives, the link is closed.

If the process dies in the remote VM, the latter produces a 'DOWN' message.
That message may be dropped, but if so it is due to link down and then
the local VM produces a different 'DOWN' message because there is a monitor
on a process in the node at the other end of the link that went down.

>
> >
> > An example: process P1 first sets a monitor on process P2 and then
> > sends messages M1, M2 and M3 to P2.  P2 acknowledges M1 with M1a and M3
> > with M3a.  Then if P1 gets M1a it knows that P2 has seen M1, and P1 is
> > guaranteed to eventually get either M3a or 'DOWN'.  If it gets M3a then
> > it knows P2 has seen M2 and M3.  If it gets 'DOWN' then M2 may have been
> > either dropped or seen by P2, and the same applies to M3; P1 may still
> > eventually get M3a, knowing then that P2 has seen M3, but it cannot know
> > whether P2 has seen M2.
> >
> > Another example: gen_server:call first sets a monitor on the server process,
> > then sends the query.  By that it knows it will eventually either get
> > the reply or 'DOWN'.  If it gets 'DOWN' it actually may get a late reply
> > (e.g. network down-up), which is often overlooked.
> >
> > The distribution communication is by default implemented with TCP links
> > between nodes.  The VM relies on the distribution transport to deliver
> > messages in order, or to die, meaning that the link has failed and that any
> > number of messages at the end of the sequence may have been dropped.
>
> Are those TCP links created on the fly, between processes, or are they
> kept running between nodes?

Today a link is between nodes, created on the fly and kept running.
If you bang a process on another node the link is brought up, and then it is
kept up until erlang:disconnect_node(Node) is called, until it fails, or
until it is closed from the other end.

I do not think we can have one link per process or process pair - that
would exhaust OS resources.  Possibly multiple links between nodes, but
that would also be pushing it.

Closing idle links could be useful to save resources, though.

>
> I ask, because there's an obvious n^2 issue in a system with lots of
> nodes.  (I used to work on protocols linking distributed simulators -
> we'd either use IP multicast to distribute state updates, or establish a
> sparser network "above" TCP - n^2 links between servers on each local
> network, broadcast on LANs).

Yes there is an n^2 problem.  We are contemplating mitigations.  But today
global (the global name registry) does what it can to create a fully
connected network.
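
For a sense of scale, the link count in a fully connected network is easy to work out. This is a back-of-the-envelope sketch in Python with a hypothetical helper name; it assumes exactly one distribution link per pair of nodes, as described above.

```python
def full_mesh_links(nodes):
    """Number of node-to-node links when every node connects to every other."""
    return nodes * (nodes - 1) // 2


for n in (10, 100, 1000):
    print(n, "nodes:", full_mesh_links(n), "links,", n - 1, "per node")

assert full_mesh_links(10) == 45
assert full_mesh_links(100) == 4950
assert full_mesh_links(1000) == 499500
```

Each node holds n - 1 connections itself, so growing the cluster tenfold grows the total number of links roughly a hundredfold.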

We are looking into replacing global with some DHT based registry to avoid
the fully connected network, and into closing idle links.

Routing of links through other nodes has also been discussed.

There is also the option to attach as a "hidden" node that will not cause
secondary connections to be set up.

This topic often comes up at Erlang conferences, since it limits how many
nodes are feasible in a network, and it results in all kinds of
interesting solutions, workarounds and suggestions.

>
>
> >
> > Process links and monitors towards processes on remote nodes are registered
> > in the local node on the distribution channel entry for that remote node,
> > so the VM can trigger 'DOWN' and 'EXIT' messages for all links and monitors
> > when a communication link goes down.  These messages are guaranteed to be
> > delivered (if their owner exists).
> >
> > I hope this clears things up.
> > / Raimo Niskanen
>
> Yes.  Thanks!
>
> Miles

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Re: question re. message delivery

Peer Stritzinger-3
In reply to this post by Raimo Niskanen-2
IIRC there is one guarantee in newer versions of Erlang (for a pretty conservative definition of new ;-)

All messages from another node arrive between its nodeup and nodedown message.

This means that you can always detect if there is a possible message loss between two nodes
by monitoring the node.

Or, as I understand it, monitoring a remote process also does this implicitly, since a link going down triggers
DOWN messages for all monitored processes across this link, as well as a nodedown.

Please correct me if this is not true (modulo the Hans Svensson/Lars-Åke Fredlund paper).

Cheers,
-- Peer


Re: question re. message delivery

Raimo Niskanen-2
I have no objections.  You are absolutely right.
/ Raimo

On Thu, Sep 28, 2017 at 12:52:25PM +0200, Peer Stritzinger wrote:

> IIRC there is one guarantee in newer versions of Erlang (for a pretty conservative definition of new ;-)
>
> All messages from another node arrive between its nodeup and nodedown message.
>
> This means that you can always detect if there is a possible message loss between two nodes
> by monitoring the node.  
>
> Or as I understand monitoring a remote process also does this implicitly since a link-down triggers
> DOWN messages on all monitored processes across this link and a nodedown
>
> Please correct me if this is not true (modulo the Hans Svensson/Lars-Åke Fredlund paper).
>
> Cheers,
> -- Peer
>
> > On 27.09.2017, at 16:42, Raimo Niskanen <[hidden email]> wrote:
> >
> > I'll try to summarize what I know on the topic, acting a bit as a ghost
> > writer for the VM Team.
> >
> > See also the old, outdated language specification, which is the best we have.
> > It is still the soother that the VM Team sucks on when they do not know
> > what else to do, and an updated version is in the pipeline.
> > See especially 10.6.2 Order of signals:
> >
> >    http://erlang.org/download/erl_spec47.ps.gz
> >
> >
> > Also see, especially 2.1 Passing of Signals "signals can be lost":
> >
> >    http://erlang.org/doc/apps/erts/communication.html
> >
> >
> > Message order between a pair of processes is guaranteed.  I.e. a message
> > sent after another message will not be received before that other message.
> >
> > Messages may be dropped.  In particular due to a communication link
> > between nodes going down and up.
> >
> > If you set a monitor (or a link) on a process you will get a 'DOWN'
> > message if the other process vanishes i.e. dies or communication link lost.
> > That 'DOWN' message is guaranteed to be delivered (the same applies to
> > links and 'EXIT' messages).
> >
> > An example: if process P1 first sets a monitor on process P2 and then
> > sends messages M1, M2 and M3 to P2.  P2 acknowledges M1 with M1a and M3
> > with M3a.  Then if P1 gets M1a it knows that P2 has seen M1 and P1 is
> > guaranteed to eventually get either M3a or 'DOWN'.  If it gets M3a then
> > it knows P2 have seen M2 and M3.  If it gets 'DOWN' then M2 may have been
> > either dropped or seen by P2, the same applies to M3, and P1 may eventually
> > get M3a knowing that P2 has seen M3, but can not know if it has seen M2.
> >
> > Another example: gen_server:call first sets a monitor on the server process,
> > then sends the query.  By that it knows it will eventually either get
> > the reply or 'DOWN'.  If it gets 'DOWN' it actually may get a late reply
> > (e.g. network down-up), which is often overlooked.
> >
> > The distribution communication is by default implemented with TCP links
> > between nodes.  The VM relies on the distribution transport either to deliver
> > messages in order or to die, meaning that the link has failed and that any
> > number of messages at the end of the sequence may have been dropped.
> >
> > Process links and monitors towards processes on remote nodes are registered
> > in the local node on the distribution channel entry for that remote node,
> > so the VM can trigger 'DOWN' and 'EXIT' messages for all links and monitors
> > when a communication link goes down.  These messages are guaranteed to be
> > delivered (if their owner exists).
> >
> > I hope this clears things up.
> > / Raimo Niskanen
> >
> >
> >
> > On Tue, Sep 26, 2017 at 05:16:56PM -0700, Miles Fidelman wrote:
> >> Hi Joe,
> >>
> >> Hmmm....
> >>
> >> Joe Armstrong wrote:
> >>
> >>> What I said was "message passing is assumed to be reliable"
> >>
> >>>
> >>> The key word here is *assumed*: my assumption is that if I open a TCP
> >>> socket and send it five messages numbered 1 to 5, then if I successfully
> >>> read message 5 and have seen no error indicators, I can *assume* that
> >>> messages 1 to 4 also arrived in order.
> >>>
> >>
> >> Well yes, but with TCP one has sequence numbers, buffering, and
> >> retransmission - and GUARANTEES, by design, that if you (say a socket
> >> connection) receive packet 5, then you've also received packets 1-4, in
> >> order.
> >>
> >> My understanding is that Erlang does NOT make that guarantee.  As stated:
> >>
> >> - message delivery is assumed to be UNRELIABLE
> >>
> >> - ordering is guaranteed to be maintained
> >>
> >> The implication being that one might well receive packets 1, 2, 3, 5 -
> >> and not know that 4 is missing.
> >>
> >>> Actually I have no idea if this is true - but it does seem to be a
> >>> reasonable assumption.
> >>>
> >>> Messages 1 to 4 might have arrived, been put in a buffer prior to my
> >>> reading them, and been accidentally reordered due to a software bug. An
> >>> alpha particle might have hit the data in message 3 and changed it -- who knows?
> >>
> >>
> >> More likely, a TCP connection has dropped, taking a message or two with
> >> it, and once the connection is re-established, stuff starts flowing
> >> after a gap.
> >>
> >> With UDP, packets could arrive out of order as well as get dropped.
> >>
> >> There are ways to extend TCP, or write a higher level protocol that will
> >> detect dropped connections, and packets, reconnect, request
> >> retransmission - with the result that both the sender & receiver are
> >> guaranteed both delivery & order.
> >>
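
The detection half of such a higher-level protocol can be sketched with
per-sender sequence numbers. This is a minimal Python model, assuming a
hypothetical `GapDetector` class and numbering that starts at 1; reconnecting
and requesting retransmission are left out.

```python
class GapDetector:
    """Receiver-side check that reports sequence numbers that went missing."""

    def __init__(self):
        self.expected = 1  # next sequence number we expect to see

    def receive(self, seq):
        """Return the list of sequence numbers skipped before `seq`."""
        if seq < self.expected:
            # Duplicate or retransmitted message: nothing new is missing.
            return []
        missing = list(range(self.expected, seq))
        self.expected = seq + 1
        return missing
```

Fed the sequence 1, 2, 3, 5 from the example above, the detector reports 4 as
missing when 5 arrives, so the receiver does know about the gap.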
> >> Which brings us back to implementation.
> >>
> >>>
> >>> Having assumed that message passing is reliable I build code based on
> >>> this assumption.
> >>
> >> But, for Erlang, we can't make this assumption - the documentation
> >> specifically says so.
> >>
> >>>
> >>> I'm not, of course, saying that the assumption is true, just that I trust
> >>> the implementers of the system have done a good job to try and make it true.
> >>> Certainly any repeatable counter examples should have been investigated
> >>> to see if there were any errors in the system.
> >>>
> >>> All this builds on layers of trust. I trust that erlang message passing is
> >>> ordered and reliable in the absence of errors.
> >>>
> >>> The Erlang implementers trust that TCP is reliable.
> >>
> >>
> >> Well, that is the question, isn't it.  Lots of things cause TCP to drop
> >> connections.  So the question remains - how are dropped connections
> >> handled?  And, if after a connection is dropped and restored, how are
> >> dropped messages and/or messages received out of order handled?
> >>
> >> Actually, there's another design question in there - in a multi-node
> >> Erlang system, maintaining n^2 TCP connections seems just a tad
> >> unwieldy.  Personally, I'd be more likely to use a connectionless
> >> protocol, maybe even broadcast.
> >>
> >>
> >>>
> >>> The TCP implementors trust that the OS is reliable.
> >>>
> >>> The OS implementors trust that the processor is reliable.
> >>>
> >>> The processor implementors trust that the VLSI compilers are correct.
> >>>
> >>> Software runs on physical machines - so really the laws of physics
> >>> apply, not maths. Physics takes into account space and time, and the
> >>> concept of simultaneity does not exist; not so in maths.
> >>>
> >>> It seems to me that software is built upon chains of trust, not upon
> >>> mathematical chains of proof.
> >>>
> >>> I've just been saying "what we want to achieve" and not "how we can
> >>> achieve it".
> >>
> >> Which brings us back to:
> >>
> >> stated goals:  unreliable delivery, ordered delivery
> >>
> >> The BEAM Book details how this works within a node, but is silent on how
> >> distributed Erlang is implemented.  I'm really interested in some details.
> >>
> >>>
> >>> The statements that people make about the system should be in terms
> >>> of belief rather than proof.
> >>>
> >>> I'd say "I believe we have reliable message passing"
> >>> It would be plain daft to say "we have reliable message passing" or
> >>> "we can prove it to be correct" since there is no way of validating this.
> >>
> >> Sure there is.  The state machine model of TCP is very clearly defined,
> >> including its various error conditions.  And one can test an
> >> implementation for adherence to the state machine model.  (In some
> >> cases, one can also demonstrate that software is provably correct - but
> >> let's not go there).
> >>
> >>
> >>>
> >>> Call me old fashioned but I think that claims that, for example,
> >>> "we have unlimited storage" and so on are just nuts ...
> >>
> >> Agreed.  But claims like "when allocated storage reaches 80% use,
> >> additional storage is allocated by <mechanism>" are not just reasonable,
> >> but mandatory when designing systems that have to scale under uncertain
> >> load.
> >>
> >> Which brings us back to - how is message passing implemented between
> >> Erlang nodes?
> >>
> >> Cheers,
> >>
> >> Miles
> >>
> >> --
> >> In theory, there is no difference between theory and practice.
> >> In practice, there is.  .... Yogi Berra
> >>
> >
> >> _______________________________________________
> >> erlang-questions mailing list
> >> [hidden email]
> >> http://erlang.org/mailman/listinfo/erlang-questions
> >
> >
> > --
> >
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
>

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Re: question re. message delivery

Miles Fidelman
In reply to this post by Peer Stritzinger-3
That's a very useful piece of information (i.e., it's useful when
designing applications)!

Note to whomever:  This is the kind of thing that should be stated
explicitly in documentation (both user & design specs).

Thanks!

Miles


On 9/28/17 3:52 AM, Peer Stritzinger wrote:

> IIRC there is one guarantee in newer versions of Erlang (for a pretty conservative definition of new ;-)
>
> All messages from another node arrive between its nodeup and nodedown message.
>
> This means that you can always detect if there is a possible message loss between two nodes
> by monitoring the node.
>
> Or, as I understand it, monitoring a remote process also does this implicitly, since a link going down triggers
> DOWN messages for all monitored processes across this link, as well as a nodedown.
>
> Please correct me if this is not true (modulo the Hans Svensson/Lars-Åke Fredlund paper).
>
> Cheers,
> -- Peer
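
Read this way, the traffic from one remote node falls into connection epochs
delimited by its nodeup and nodedown messages, and every nodedown marks a
point where loss is possible. A minimal Python model of that bookkeeping,
using a hypothetical `split_into_epochs` function and plain strings for
events (an illustration only, not Erlang code):

```python
def split_into_epochs(events):
    """Group messages from one node by connection epoch.

    `events` is the ordered list of what the local process received from the
    node monitor and the remote node: "nodeup", "nodedown", or a message.
    """
    epochs = []
    current = None  # None means "no connection to the node right now"
    for event in events:
        if event == "nodeup":
            current = []
        elif event == "nodedown":
            if current is not None:
                epochs.append(current)
            current = None
        elif current is None:
            # Would contradict the guarantee described above.
            raise ValueError(f"message {event!r} outside nodeup/nodedown")
        else:
            current.append(event)
    if current is not None:
        epochs.append(current)  # connection is still up
    return epochs
```

Each boundary between two returned epochs is a place where the application
may need to resynchronise with the other node.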

--
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra


Re: question re. message delivery

Richard A. O'Keefe-2
In reply to this post by Matthias Lang


On 28/09/17 3:57 AM, Matthias Lang wrote:

> SMS is a good example, and there's a detail you omitted. Your daughter
> could most likely have seen that the message had not been delivered.

I omitted that detail because of outright ignorance.
I can't find anything about that feature in my phone's settings.
(It's a Nokia feature-phone, which I like because it is too dumb
to do bad things...)

> SMS has a mechanism for telling the sender that a particular message
> was delivered to the phone.

And _that_ is a detail that I _did_ mention.
Delivery to the *phone* is not the same as delivery to the *owner*;
someone else might have read it (I have known this to happen) and
the fumble-fingered owner might have accidentally deleted it before
actually reading it (I haven't quite managed that yet, but I am
certainly fumble-fingered enough to accidentally delete messages
before sending them).

In short, you still can't depend on the person you were trying to
communicate with having received the message unless/until they
tell you they did.


Re: question re. message delivery

Raimo Niskanen-2
In reply to this post by Miles Fidelman
On Thu, Sep 28, 2017 at 12:35:24PM -0700, Miles Fidelman wrote:
> That's a very useful piece of information (i.e., it's useful when
> designing applications)!
>
> Note to whomever:  This is the kind of thing that should be stated
> explicitly in documentation (both user & design specs).

Oh yes!

We are anticipating an updated Erlang Specification, which the VM team has on
their ToDo list.  When it is good enough, it will be published as a draft to
be scrutinized by the Community, especially by the Old Masters.

/ Raimo



--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB