How does Erlang TCP determine the end of a TCP stream?

asdf asdf
Hello,

From my understanding, TCP is a stream-based protocol, and you can’t really tell one packet from another. You can make sure the bytes arrive in order, but whether or not they should be “grouped” is not visible to the application layer.

But somehow Erlang ssl and tcp abstract this away.

In my code, I have something like the following:

handle_info({tcp, Sock, Data}, State) -> … (or {ssl, Sock, Data} for ssl sockets), and every time I receive data on the socket, it comes as the “full package”: when another server sends a message over the socket, it is received as the full length of the packet every time, with no extra bytes here or there. On the other hand, a co-worker’s server has to implement socket reads and can’t tell one “group” from another. They have to read in the length or set delimiters.


What is special about OTP that delimits these packets?
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: How does Erlang TCP determine the end of a TCP stream?

Vimal Kumar
Hi,

I believe you are using the socket with the 'active' option of inet:setopts/2 set to true? Set it to false or once and then use gen_tcp:recv/2,3 to read from the TCP stream up to the length you want.
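
For illustration, a passive-mode read along these lines could look like the sketch below. It assumes, purely for the example, that the sender prefixes each message with a 4-byte big-endian length; the module name passive_read and the helper recv_frame/1 are made up here and do not come from the thread.

```erlang
-module(passive_read).
-export([recv_frame/1, demo/0]).

%% Read one frame from a passive ({active, false}), raw, binary socket.
%% Assumption: every message is preceded by a 4-byte big-endian length.
recv_frame(Sock) ->
    case gen_tcp:recv(Sock, 4) of
        {ok, <<Len:32/big>>} when Len > 0 ->
            %% With Length > 0, recv returns exactly Len bytes or an error.
            gen_tcp:recv(Sock, Len);
        {ok, <<0:32>>} ->
            %% recv(Sock, 0) would mean "whatever is available", so an
            %% empty frame is handled without touching the socket.
            {ok, <<>>};
        {error, _} = Err ->
            Err
    end.

%% Loopback demo: two frames written in a single send still come out
%% as two separate frames on the reading side.
demo() ->
    Opts = [binary, {packet, raw}, {active, false}],
    {ok, L} = gen_tcp:listen(0, [{ip, {127,0,0,1}} | Opts]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port, Opts),
    {ok, S} = gen_tcp:accept(L),
    ok = gen_tcp:send(C, [<<5:32/big>>, <<"hello">>, <<3:32/big>>, <<"abc">>]),
    {ok, A} = recv_frame(S),
    {ok, B} = recv_frame(S),
    ok = gen_tcp:close(C),
    ok = gen_tcp:close(S),
    ok = gen_tcp:close(L),
    [A, B].
```

The framing here comes from the length header, not from how the bytes happened to be grouped on the wire.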


Re: How does Erlang TCP determine the end of a TCP stream?

scott ribe
In reply to this post by asdf asdf
On Oct 31, 2017, at 8:41 AM, code wiget <[hidden email]> wrote:
>
> Handle_info({tcp/ssl, Sock, Data}, State) -> … and every time that I receive data into the socket, it comes as the “full package”, meaning that when another server sends a message to it over the socket, It is received as the full length of the packet every time with no extra bytes here or there.

Pure coincidence. Yes, it is highly likely that a single send on one end comes through a single receive on the other end, but there is no guarantee so you must figure out framing yourself.

--
Scott Ribe
https://www.linkedin.com/in/scottribe/
(303) 722-0567

Re: How does Erlang TCP determine the end of a TCP stream?

asdf asdf
Vimal and Scott,  

I would like to clarify quickly that this is not a coincidence, and that I am processing ~500 messages/min over 1 socket connected to 1 client server and all of the messages are “framed” properly. There is something in the OTP that is framing the packets.

Vimal - I am using {active, once}. And I like when it comes in as a full packet, so I do not want to change anything that could eliminate that behavior. When I get a handle info and ensure that the data is the right length in the guard, I can then do validation on the packet (we use a start code and stop code of 4 bytes) to ensure the validity of the packet from there. Having a stream and using gen_tcp:recv would complicate the gen_server in my opinion and it is clean now.

I just went to the Erlang man pages for inet:setopts/2, and it seems like {active, once} works for my purposes:

"If the value is once ({active, once}), one data message from the socket is sent to the process…

Use active mode only if your high-level protocol provides its own flow control (for example, acknowledging received messages) or the amount of data exchanged is small."

It does not define how long “one data message” is, but from reading on, it seems like the small size of my packets is the reason it is working. All of the packets I am expecting are < 500 bytes. 

This settles this a bit for me, I can sleep at night, but we are working on an assumption about erlang’s “one data message”.

If anyone can elaborate on this, or has some more in depth knowledge, I would greatly appreciate your expertise.




Re: How does Erlang TCP determine the end of a TCP stream?

scott ribe
On Oct 31, 2017, at 10:10 AM, code wiget <[hidden email]> wrote:
>
> I would like to clarify quickly that this is not a coincidence, and that I am processing ~500 messages/min over 1 socket connected to 1 client server and all of the messages are “framed” properly. There is something in the OTP that is framing the packets.

It is a coincidence and there is nothing framing the *MESSAGES*.

> It does not define how long “one data message” is, but from reading on, it seems like the small size of my packets is the reason it is working. All of the packets I am expecting are < 500 bytes.

That does make it extremely likely that each message will be delivered in a single packet, but there is still no guarantee. There is also no guarantee 2 or more messages won't be delivered in a single packet.

Re: How does Erlang TCP determine the end of a TCP stream?

Stanislaw Klekot
In reply to this post by asdf asdf
On Tue, Oct 31, 2017 at 12:10:49PM -0400, code wiget wrote:
> Vimal - I am using {active, once}. And I like when it comes in as
> a full packet, so I do not want to change anything that could
> eliminate that behavior.

This behaviour is a coincidence that relies on your Erlang code being
able to keep up with the sender, and as such, it's brittle.

Do a test: wait some time (1s) in your code before setting {active,once}
again and see what you'll get. My bet is several packets clumped
together.

This is how read() and recv() work on stream BSD sockets. If the
socket's buffer is non-empty, they return whatever is in the buffer (up
to read length). There's nothing magical that Erlang would do on the
sockets for {packet,raw}.
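
A rough way to try this test on one machine is sketched below. The module name clump_demo and the message contents are invented for the example. The sender does two separate sends, the receiver waits a second before asking for data with {active, once}, and the function returns whatever arrives as the first data message. On a typical setup that is likely to be both sends glued together, but nothing guarantees it either way, which is the point.

```erlang
-module(clump_demo).
-export([run/0]).

%% Returns the first chunk delivered after a deliberately slow receiver
%% finally enables {active, once} on a {packet, raw} socket.
run() ->
    Opts = [binary, {packet, raw}, {active, false}],
    {ok, L} = gen_tcp:listen(0, [{ip, {127,0,0,1}} | Opts]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port, Opts),
    {ok, S} = gen_tcp:accept(L),
    %% Two separate application-level "messages".
    ok = gen_tcp:send(C, <<"first">>),
    ok = gen_tcp:send(C, <<"second">>),
    %% Pretend the receiving process is busy for a while.
    timer:sleep(1000),
    ok = inet:setopts(S, [{active, once}]),
    Chunk = receive
                {tcp, S, Data} -> Data
            after 2000 ->
                timeout
            end,
    ok = gen_tcp:close(C),
    ok = gen_tcp:close(S),
    ok = gen_tcp:close(L),
    Chunk.
```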

--
Stanislaw Klekot

Re: How does Erlang TCP determine the end of a TCP stream?

asdf asdf
Stanislaw,

Is there a common solution to this problem? Erlang was built to be used in servers/switches, so I would assume that there is a solution built into OTP?

If not, do you handle it just at the application layer with gen_tcp:recv() and read X bytes at a time from the socket based on delimiters and such? How would you reconcile this with a gen_server implementation that receives messages from the ssl socket in handle_info?

I appreciate the help, I’m happy I didn’t roll this further

Re: How does Erlang TCP determine the end of a TCP stream?

Jesper Louis Andersen-2
In reply to this post by asdf asdf
On Tue, Oct 31, 2017 at 3:41 PM code wiget <[hidden email]> wrote:
> Handle_info({tcp/ssl, Sock, Data}, State) -> … and every time that I receive data into the socket, it comes as the “full package”, meaning that when another server sends a message to it over the socket, It is received as the full length of the packet every time with no extra bytes here or there. On the other hand, a co-worker’s server has to implement socket reads and can’t determine one “group” from another. They have to read in the length or set delimiters.

If you set an option such as {packet, 4} on the socket, then the VM will expect a 4 byte big endian length header followed by that many bytes in payload. Running with {active, once} or {active, N} will have the socket send you payloads one message at a time, stripped of said header.
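
A small loopback sketch of that behaviour follows; the module name packet4_demo and the helper next/1 are made up for the example. Both ends use {packet, 4}, so gen_tcp:send/2 adds the 4-byte header on the way out and the receiving side strips it again. A non-Erlang peer would have to write that big-endian header itself.

```erlang
-module(packet4_demo).
-export([run/0]).

%% Send two messages back to back and collect them one at a time with
%% {active, once}. With {packet, 4} each data message is one payload.
run() ->
    Opts = [binary, {packet, 4}, {active, false}],
    {ok, L} = gen_tcp:listen(0, [{ip, {127,0,0,1}} | Opts]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port, Opts),
    %% The accepted socket inherits the listen socket's options.
    {ok, S} = gen_tcp:accept(L),
    ok = gen_tcp:send(C, <<"first">>),
    ok = gen_tcp:send(C, <<"second">>),
    %% Give both messages time to pile up in the receive buffer.
    timer:sleep(200),
    First = next(S),
    Second = next(S),
    ok = gen_tcp:close(C),
    ok = gen_tcp:close(S),
    ok = gen_tcp:close(L),
    [First, Second].

%% Ask for exactly one framed message and wait for it.
next(S) ->
    ok = inet:setopts(S, [{active, once}]),
    receive
        {tcp, S, Data} -> Data
    after 2000 ->
        timeout
    end.
```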

If you don't set an option such as {packet, 4}, then the VM can send you anything from a single byte up to, after buffering for a while, a megabyte in one go. In this case, you are receiving chunks of the stream at a time. It is common to receive around the MTU of the underlying network if your rate is fairly low (something like 1440-1460 bytes on ethernet is typical). You will have to do some work on your end in order to handle the case where one chunk doesn't have all the data necessary for a correct decode.

The typical solution is to buffer the partial chunk in the process and then append it when the next message arrives, trying to decode then.
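
A minimal sketch of that buffering approach is below. It assumes a 4-byte big-endian length prefix as the framing rule (a start/stop code scheme would need a different split function), and the module name frame_buffer and the function feed/2 are invented for the example.

```erlang
-module(frame_buffer).
-export([feed/2]).

%% feed(Buffer, Chunk) -> {CompleteFrames, NewBuffer}
%% Append the new chunk to whatever was left over from earlier chunks,
%% then peel off as many complete frames as are available.
feed(Buffer, Chunk) when is_binary(Buffer), is_binary(Chunk) ->
    split(<<Buffer/binary, Chunk/binary>>, []).

%% A frame is complete when the header and Len payload bytes are present.
split(<<Len:32/big, Payload:Len/binary, Rest/binary>>, Acc) ->
    split(Rest, [Payload | Acc]);
%% Otherwise keep the incomplete tail until the next chunk arrives.
split(Incomplete, Acc) ->
    {lists:reverse(Acc), Incomplete}.

%% Possible use inside a gen_server (State is assumed to be a map with
%% a buf key; this clause is illustrative only):
%%
%% handle_info({tcp, Sock, Data}, #{buf := Buf} = State) ->
%%     {Frames, Rest} = frame_buffer:feed(Buf, Data),
%%     ok = inet:setopts(Sock, [{active, once}]),
%%     %% ... handle each frame in Frames ...
%%     {noreply, State#{buf := Rest}}.
```

Keeping the leftover bytes in the process state means handle_info can stay a single clause, whether a chunk holds half a message or several.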

It is a coincidence if you are in the latter of the above cases and happen to receive things in "full package" form. It'll break in any real network setting. Beware localhost as an interface, which often has a 16K or 64K MTU. Packets are then not broken here, but will be in any real network.

If you can, run with something like {packet, 4}. It is simple and works. Delimiters are worse because you have to scan for them and you often need to escape them in your payload. A Length-Value-Type encoding provides framing in which you don't have to scan and you don't need to escape. Putting the type last in the packet is debatable, but one advantage is that other languages have to allocate a full buffer which can often eliminate certain security-concerning bad implementations.


Re: How does Erlang TCP determine the end of a TCP stream?

scott ribe
On Oct 31, 2017, at 1:38 PM, Jesper Louis Andersen <[hidden email]> wrote:
>
> If you can, run with something like {packet, 4}. It is simple and works. Delimiters are worse because you have to scan for them and you often need to escape them in your payload. A Length-Value-Type encoding provides framing in which you don't have to scan and you don't need to escape. Putting the type last in the packet is debatable, but one advantage is that other languages have to allocate a full buffer which can often eliminate certain security-concerning bad implementations.

I like length-prefixed messages, and I like that Erlang has such direct & easy support built-in. I'm just going to suggest one more thing: add your own checksum at the end. In case there's any bug anywhere in decoding packets, this prevents the kind of error where you get out of sync and are then taking what should be message data as your 4 bytes of length.

Admittedly, that is far less likely with OTP's built-in implementation than if you're writing your own buffering and decoding. But it's also possible for a sender to send a bogus length, or maybe the length is right but it fails to send the last byte before sending the length of the next message...
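
As a sketch of what a trailing checksum could look like: the example below appends a CRC32 to each payload and checks it on the receiving side. CRC32 via erlang:crc32/1 is just a convenient choice for the example, and the module name checked_frame is made up. With {packet, 4} on the socket, the binary handed to decode/1 would be one delivered message.

```erlang
-module(checked_frame).
-export([encode/1, decode/1]).

%% Append a 4-byte big-endian CRC32 of the payload.
encode(Payload) when is_binary(Payload) ->
    Crc = erlang:crc32(Payload),
    <<Payload/binary, Crc:32/big>>.

%% Split off the trailing CRC32 and compare it with a fresh one.
decode(Frame) when is_binary(Frame), byte_size(Frame) >= 4 ->
    PayloadLen = byte_size(Frame) - 4,
    <<Payload:PayloadLen/binary, Crc:32/big>> = Frame,
    case erlang:crc32(Payload) of
        Crc -> {ok, Payload};
        _Other -> {error, bad_checksum}
    end;
decode(_TooShort) ->
    {error, too_short}.
```

If the stream ever gets out of sync, the mismatch shows up on the next frame instead of message data being silently read as a length.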

Re: How does Erlang TCP determine the end of a TCP stream?

Eric des Courtis-3


I think this is probably unnecessary since an implementation for something like {packet,4} is trivial to begin with so you will likely never see an issue with it. Not to mention UDP and TCP both have checksums checked by the OS anyway. 

Re: How does Erlang TCP determine the end of a TCP stream?

scott ribe
On Oct 31, 2017, at 5:05 PM, Eric des Courtis <[hidden email]> wrote:
>
> I think this is probably unnecessary since an implementation for something like {packet,4} is trivial to begin with so you will likely never see an issue with it. Not to mention UDP and TCP both have checksums checked by the OS anyway.

As I pointed out, you can have a bug on the sending end as well. And the TCP checksum is extremely weak, basically just sum of words--that's fine until you have a terribly noisy corrupt transport...

So yes, in a lot of circumstances it's overkill, but I prefer to be paranoid when a network is involved.

Re: How does Erlang TCP determine the end of a TCP stream?

Eric des Courtis-3
Depends on the data and what you are doing with it. If it's say a video over a gigantic P2P network then yes I would agree but then you might want something like SHA256 or SHA3 as your checksum.

Re: How does Erlang TCP determine the end of a TCP stream?

Eric des Courtis-3
Then again that checksum should probably stick around with the data because hard drives, CPUs and RAM are more likely to screw up the data.

Re: How does Erlang TCP determine the end of a TCP stream?

scott ribe
In reply to this post by Eric des Courtis-3
On Oct 31, 2017, at 6:09 PM, Eric des Courtis <[hidden email]> wrote:
>
> Depends on the data and what you are doing with it. If it's say a video over a gigantic P2P network then yes I would agree but then you might want something like SHA256 or SHA3 as your checksum.

Exactly. I'd probably do MurmurHash or SipHash for performance, but that's basically the idea.

> Then again that checksum should probably stick around with the data because hard drives, CPUs and RAM are more likely to screw up the data.

I'm a fan of ZFS ;-)

Re: How does Erlang TCP determine the end of a TCP stream?

Сергей Прохоров-2
In reply to this post by asdf asdf
You may look at epgsql's packet parser as an example. It has a packet structure prefixed with packet type + packet length, and it doesn't use the {packet, N} option.

The parts to look at are the handle_info({_, Sock, Data2}, ...) clause where the socket data arrives, the tail buffering, and the place where the packet bounds are checked.
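
To give a feel for this kind of parser without quoting epgsql itself, here is an independent sketch. It assumes the PostgreSQL wire layout: one type byte, then a 4-byte big-endian length that counts itself and the payload but not the type byte. The module name typed_frames and its decode/1 are invented for the example.

```erlang
-module(typed_frames).
-export([decode/1]).

%% decode(Bin) -> {[{Type, Payload}], Tail}
%% Tail is the incomplete remainder to prepend to the next socket chunk.
decode(Bin) when is_binary(Bin) ->
    decode(Bin, []).

decode(<<Type:8, Len:32/big, Rest/binary>> = Bin, Acc) when Len >= 4 ->
    PayloadLen = Len - 4,
    case Rest of
        <<Payload:PayloadLen/binary, Tail/binary>> ->
            %% Packet bounds check passed: a whole packet is available.
            decode(Tail, [{Type, Payload} | Acc]);
        _ ->
            %% Header seen but payload not complete yet: keep everything.
            {lists:reverse(Acc), Bin}
    end;
%% Fewer than 5 bytes, or a length below 4. This sketch treats both as
%% "wait for more data"; a real parser should reject the second case.
decode(Tail, Acc) ->
    {lists:reverse(Acc), Tail}.
```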
 