gen_sctp: What delays SACK?

Oliver Korpilla
Hello.

We're using an Elixir application as a sort of protocol tester. It communicates with the system-under-test over SCTP as a transport.

We're observing delay and unsent messages and due to the nature of the SCTP protocol we're not sure which side causes the issue.

The BEAM side has the NO_DELAY option set and pumps out a burst of messages but then waits for responses (so it will not burst indefinitely; it bursts once and then responds).

The C++ application has the DELAYED_SACK option set - we tried with both sack_freq 1 (which supposedly disables the algorithm) and higher (the default in our system).
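
For readers less familiar with the option, here is a minimal Python sketch of how a delayed-SACK receiver decides when to acknowledge. It is a toy model, not the kernel's implementation: it assumes in-order arrival with no gaps, and the function name `sack_times` and its parameters are made up for illustration (`sack_freq` and `sack_delay_ms` mirror the two settings of the DELAYED_SACK option). RFC 4960 recommends a SACK for at least every second packet and within 200 ms of any unacknowledged DATA chunk, which is what the defaults encode.

```python
def sack_times(arrivals_ms, sack_freq=2, sack_delay_ms=200):
    """Return the times (ms) at which a simplified receiver would emit a SACK.

    Toy model (assumption): packets arrive in order, without loss or gaps.
    """
    sacks = []
    pending = 0       # packets received but not yet acknowledged
    deadline = None   # when the delay timer for the oldest pending packet fires
    for t in sorted(arrivals_ms):
        # The delay timer may have fired before this packet arrived.
        if deadline is not None and deadline <= t:
            sacks.append(deadline)
            pending, deadline = 0, None
        pending += 1
        if pending >= sack_freq:
            # Enough packets are pending: acknowledge immediately.
            sacks.append(t)
            pending, deadline = 0, None
        elif deadline is None:
            # First unacknowledged packet: start the delay timer.
            deadline = t + sack_delay_ms
    if deadline is not None:
        # A trailing packet is acknowledged only when the timer expires.
        sacks.append(deadline)
    return sacks


if __name__ == "__main__":
    print(sack_times([0, 10, 20], sack_freq=1))
    print(sack_times([0, 10, 20], sack_freq=2))
```

With `sack_freq=1` every packet is acknowledged on arrival, which matches the expectation that a frequency of 1 turns the delay off. With the defaults, a lone trailing packet is acknowledged only when the timer expires.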

(We also increased the receive window on both sides to ensure that senders would not block.)

But we're stumped. The C++ side stops responding at some point. When we once ran a test on the actual target and analyzed the tcpdump of the interfaces, we saw the SCTP messages sent from the system-under-test just stop: the C++ application emitted nothing more on the wire and, accordingly, nothing was received.


Our latest area of inquiry is to find out if maybe the Elixir part is simply not getting scheduled - but can this impact, for example, SACK latency? Who acknowledges a message - the SCTP stack by itself or the application? And will the protocol block the sender until SACK?
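
On the last question, a rough sketch may help. SCTP does not wait for a SACK after each message, but the sender may only keep a limited amount of unacknowledged data in flight: RFC 4960 forbids sending new data once `cwnd` or more bytes are outstanding, or when the peer's advertised receive window has no room. The Python below is a simplified model of those two rules only. `sendable_before_sack` is a hypothetical helper, the "chunk must fit into the remaining window" check is a simplification, and the 4380-byte default is the RFC's initial `cwnd` for a 1500-byte MTU, not a value measured in this setup.

```python
def sendable_before_sack(chunk_sizes, cwnd=4380, peer_rwnd=65535):
    """Count how many chunks a sender may emit before any SACK arrives.

    Simplified model (assumption): one destination, no bundling, no SACKs.
    """
    outstanding = 0   # bytes sent but not yet acknowledged
    sent = 0
    for size in chunk_sizes:
        # Receive-window rule (simplified): stop if the peer has no room.
        if peer_rwnd < size:
            break
        # Congestion-window rule: stop once cwnd or more bytes are in flight.
        if outstanding >= cwnd:
            break
        outstanding += size
        peer_rwnd -= size
        sent += 1
    return sent


if __name__ == "__main__":
    # Ten 1000-byte messages in one burst with the default windows.
    print(sendable_before_sack([1000] * 10))
```

In this model a burst stalls after a handful of messages until a SACK opens the window again, so a late SACK shows up as a pause on the sending side even though no single message is waited for explicitly.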

I'm sorry for asking such vague questions but SCTP know-how is spread thin in our outfit and we're not the experts...

Thank you,
Oliver
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: gen_sctp: What delays SACK?

Jesper Louis Andersen-2
Hi,

Use tcpdump(1) on the flow and look for who is adding the latency. The usual rule of protocol debugging is to start at the lowest level and verify each level as you go up, because then you have an audit trail of the events that happened, which can inform you at a higher level.


--
J.

Re: gen_sctp: What delays SACK?

Oliver Korpilla
Hello, Jesper.

The problem I see is that the C++ side just fails to send more messages back, but I'm stumped as to why.

It _looks_ like it fails to respond to my protocol requests for some reason.

But does it really? Or is something blocking/buffering/delaying/missing in the stack? And which side causes it?

I'm very, very stumped, because I've seen the tcpdump in Wireshark and the C++ side stops sending. It just stops. (If I had more trust in my SCTP knowledge I would _assume_ there's some sort of deadlock on the C++ side.)

Thank you very much,
Oliver 

Re: gen_sctp: What delays SACK?

Andreas Schultz-3
Hi Oliver,

One reason for your problems might be that Erlang's SCTP implementation is unbelievably slow.

Apparently SCTP SACKs are only sent after the application has actually received the payload (via the recv syscall).
So what I'm seeing is that if your application takes too long to read the payload, the SACK will not be sent and the sender will start retransmitting the SCTP packet.
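
To make the retransmission effect concrete, here is a small Python sketch of the sender's timer, assuming the exponential backoff described in RFC 4960: when the retransmission timer expires without a SACK, the chunk is resent and the timeout is doubled, up to a maximum. The helper `retransmit_times` is hypothetical, and the defaults (1 s minimum, 60 s maximum) are the RFC's suggested values, not settings read from either system in this thread.

```python
def retransmit_times(sack_arrives_ms, rto_ms=1000, rto_max_ms=60000):
    """Return the times (ms after the first send) at which a chunk is resent.

    Toy model (assumption): a single chunk, and a SACK that arrives
    sack_arrives_ms after the first transmission.
    """
    times = []
    t = 0
    # Keep retransmitting while the timer expires before the SACK arrives.
    while t + rto_ms < sack_arrives_ms:
        t += rto_ms
        times.append(t)
        # Back off: double the timeout, capped at the maximum.
        rto_ms = min(rto_ms * 2, rto_max_ms)
    return times


if __name__ == "__main__":
    print(retransmit_times(500))    # SACK in time
    print(retransmit_times(3500))   # SACK held back for 3.5 s
```

Under these assumptions a SACK that is held back for a few seconds already causes more than one retransmission of the same chunk.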

I have written a simple socket benchmark client [1].

When testing with TCP, UDP and SCTP on OTP 21.1, sending 1000 packets with a length of 100 bytes, I get these results:

* server is socat to /dev/null, e.g. "socat -u SCTP-LISTEN:6000,reuseaddr,keepalive,rcvbuf=131071,reuseaddr /dev/null"
* times are in microseconds

$ ./bench.escript sctp 127.0.0.1 6000 100 1000
sctp,100,1000,817250
tcp,100,1000,8301
udp,100,1000,17477

As you can see, TCP is fastest at ~8 ms because the Nagle algorithm combines the packets before sending; UDP is still OK at ~17.5 ms. SCTP takes an astonishing 817 ms. That is roughly 47 times slower than UDP.
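
The comparison can be checked with a few lines of Python. This is only arithmetic on the three totals printed above (microseconds for 1000 packets); the names `results_us`, `per_packet_us` and `slowdown` are made up for the illustration.

```python
# Totals in microseconds, copied from the benchmark output above.
results_us = {"sctp": 817250, "tcp": 8301, "udp": 17477}
PACKETS = 1000


def per_packet_us(total_us, count=PACKETS):
    """Average time spent per packet, in microseconds."""
    return total_us / count


def slowdown(slow, fast):
    """How many times longer `slow` took than `fast`."""
    return results_us[slow] / results_us[fast]


if __name__ == "__main__":
    for proto, total in results_us.items():
        print(f"{proto}: {total / 1000:.1f} ms total, "
              f"{per_packet_us(total):.1f} us per packet")
    print(f"SCTP vs UDP: {slowdown('sctp', 'udp'):.2f}x")
    print(f"SCTP vs TCP: {slowdown('sctp', 'tcp'):.2f}x")
```

The SCTP/UDP ratio comes out just under 47, and against TCP the gap is close to a factor of 100. Per packet, that is roughly 0.8 ms for SCTP versus a few microseconds to tens of microseconds for the other two.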

Regards
Andreas


--
Dipl.-Inform. Andreas Schultz

----------------------- enabling your networks ----------------------
Travelping GmbH                     Phone:  +49-391-81 90 99 0
Roentgenstr. 13                     Fax:    +49-391-81 90 99 299
39108 Magdeburg                     Email:  [hidden email]
GERMANY                             Web:    http://www.travelping.com

Company Registration: Amtsgericht Stendal        Reg No.:   HRB 10578
Geschaeftsfuehrer: Holger Winkelmann          VAT ID No.: DE236673780
---------------------------------------------------------------------

Re: gen_sctp: What delays SACK?

Oliver Korpilla
Hello, Andreas.

We also see retransmissions, and they do have an impact on latency. (We also see retransmissions on another link we're not testing with BEAM.)

But my biggest concern is this sudden stop of all sending from the C++ side, which I cannot wrap my head around.

Thank you very much,
Oliver
 

[1]: https://gist.github.com/RoadRunnr/53c19861aa4e4fa5bd45c072727ab971 (the socket benchmark client from Andreas Schultz's message)
 