Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03


Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03

Defnull
Hello list, we have run into some gen_tcp-related problems after switching from
Erlang R15B03 to R16B03-01.

The problem is as follows: when the server produces data faster than the consumer
can handle, then once the server's outgoing buffers and the client's incoming
buffers are full, a gen_tcp:send/2 call on the server side blocks forever in
erts_internal:port_command/3. After this, even when the client has consumed all
the data and the buffers are empty, the server process remains suspended in
that call.

This problem does not occur every time, but it happens quite often.

Some implementation details are below; I have also shrunk the example down to
a small app so you can check the code:
https://github.com/define-null/tcp_failing_ex

The server is implemented so that it listens on port 8899; when a client
connects, it spawns a main srv process and plenty of workers, which start
writing to the socket after the client sends a special msg. The main process
is responsible for commands from the client and sends responses via
gen_tcp:send/2, while the workers write stream data to the client using
erlang:port_command/3 with nosuspend. This way the workers send only up-to-date
data, dropping anything when the client is slow.
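
The workers' send-or-drop logic can be sketched roughly like this (a minimal sketch with a hypothetical `next_frame/0` data source, not the actual code from the linked repo):

```erlang
%% Minimal sketch of a worker's send-or-drop loop. With [nosuspend],
%% erlang:port_command/3 returns false instead of suspending the caller
%% when the port is busy, so the worker can drop the stale frame and
%% carry on. next_frame/0 is a hypothetical data source.
worker_loop(Socket) ->
    Data = next_frame(),
    case erlang:port_command(Socket, Data, [nosuspend]) of
        true  -> ok;        %% data queued on the port
        false -> dropped    %% port busy: drop this frame
    end,
    worker_loop(Socket).
```

This works on a gen_tcp socket because the socket handle is itself a port.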

The behaviour we see is as follows: in the first phase the producer fills the
OS and Erlang driver buffers. The consumer reads data as it arrives, and the
server drops the data it cannot send, so we see the buffer sizes growing on
both sides: the out queue of the server and the in queue of the client,
respectively.

After some moment in time, I guess when the buffers are completely filled, the
server tries to respond to a ping message from the client using gen_tcp:send/2.
That call then blocks forever, even after the client has consumed all the
messages. The situation never recovers: the srv process remains suspended,
while its incoming buffer begins to grow as the client sends more ping
messages.

Below is the output on a system with two slow clients, where the server
process for the first client is already blocked in the gen_tcp:send/2 call,
while the second client is served fine.

Every 2.0s: netstat -al | grep 8899                    Tue Jun 24 16:34:51 2014

tcp4      36      0  localhost.8899         localhost.63263        ESTABLISHED
tcp4       0      0  localhost.63263        localhost.8899         ESTABLISHED
tcp4       0 130990  localhost.8899         localhost.63257        ESTABLISHED
tcp4  619190      0  localhost.63257        localhost.8899         ESTABLISHED
tcp4       0      0  *.8899                 *.*                    LISTEN

This is the output of the client process from the github example, where we can
see that after a send operation (the ping msg) no incoming msgs arrive any
more.
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,48}} before send
{{2014,6,24},{16,33,48}} after send ok
{{2014,6,24},{16,34,9}} before send
{{2014,6,24},{16,34,9}} after send ok
{{2014,6,24},{16,34,30}} before send
{{2014,6,24},{16,34,30}} after send ok
{{2014,6,24},{16,34,51}} before send
{{2014,6,24},{16,34,51}} after send ok
{{2014,6,24},{16,35,12}} before send
....
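
For context, the client in the example behaves roughly as follows (a hedged sketch with hypothetical names, including the `log/1` helper; see the repo for the real code):

```erlang
%% Rough shape of the example client: log incoming stream messages as
%% they arrive (the socket is in active mode), and send a ping after the
%% line has been idle for a while. log/1 is a hypothetical helper.
client_loop(Socket) ->
    receive
        {tcp, Socket, _Data} ->
            log("msg"),
            client_loop(Socket);
        {tcp_closed, Socket} ->
            closed
    after 20000 ->
        log("before send"),
        ok = gen_tcp:send(Socket, <<"ping">>),
        log("after send ok"),
        client_loop(Socket)
    end.
```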

Server blocked process output:

{{2014,6,24},{16,33,21}}: <0.95.0> ping
{{2014,6,24},{16,33,21}} bsend: <0.95.0>
{{2014,6,24},{16,33,21}} asend: <0.95.0> ok
{{2014,6,24},{16,33,48}}: <0.95.0> ping
{{2014,6,24},{16,33,48}} bsend: <0.95.0>
%% (no asend message after it)

(tcp_failing_node)1> erlang:process_info(pid(0,95,0)).
[{current_function,{erts_internal,port_command,3}},
 {initial_call,{proc_lib,init_p,5}},
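
The same diagnosis can be extended with standard introspection calls; a sketch from the shell (here `Socket` stands for the connection's socket, which is not named in the snippet above):

```erlang
%% A process parked on a busy port reports 'suspended' as its status:
erlang:process_info(pid(0,95,0), [current_function, status, message_queue_len]).

%% The port side: how much data is still queued for this socket.
erlang:port_info(Socket, queue_size).   %% bytes in the port/driver queue
inet:getstat(Socket, [send_pend]).      %% bytes pending in the send queue
```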

The bug is not always reproducible, but it occurs quite often. The problem is
that even after the server's out buffers are empty, data does not arrive at
the client, and the server's incoming buffer grows as the client sends more
ping messages. (So erlang:port_command/3 with nosuspend always returns false
once the other, main server process for this connection is suspended in
gen_tcp:send/2.)

And then it only gets worse, as already mentioned:

Every 2.0s: netstat -al | grep 8899                    Tue Jun 24 16:56:59 2014

tcp4     804      0  localhost.8899         localhost.63263        ESTABLISHED
tcp4       0      0  localhost.63263        localhost.8899         ESTABLISHED
tcp4       0      0  localhost.8899         localhost.63257        ESTABLISHED
tcp4       0      0  localhost.63257        localhost.8899         ESTABLISHED
tcp4       0      0  *.8899                 *.*                    LISTEN

We ran into this after switching from R15B03 to R16B03. I know there were some
changes in port_command handling; could that explain the behaviour we are
seeing?
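
One mitigation worth noting for anyone hitting this, independent of whether the R16 behaviour is a bug: the send_timeout socket option bounds how long gen_tcp:send/2 may block. A hedged sketch (timeout value and the `Reply`/`slow_client` names are illustrative):

```erlang
%% With {send_timeout, Ms} set, a send that cannot make progress returns
%% {error, timeout} instead of suspending the caller indefinitely;
%% {send_timeout_close, true} additionally closes the socket on timeout.
ok = inet:setopts(Socket, [{send_timeout, 5000},
                           {send_timeout_close, true}]),
case gen_tcp:send(Socket, Reply) of
    ok               -> ok;
    {error, timeout} -> slow_client;    %% give up on this reply
    {error, Reason}  -> {error, Reason}
end.
```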


Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03

Lukas Larsson-7
Hello,

I was able to reproduce your test case after removing all the "msg"
printouts and starting a couple of clients at the same time. It seems
that the sockets are hitting the high_msgq_watermark limit, and then, as
the data gets flushed, they are not set runnable again. I'll see if I can
dig out what is causing this behavior.
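
These watermarks are tunable per socket; for reference, a sketch of adjusting them with inet:setopts/2 (the byte sizes below are arbitrary examples, not recommendations):

```erlang
%% The port message-queue watermarks, and their driver-queue
%% counterparts, can be raised per socket. Sizes are in bytes and
%% purely illustrative.
ok = inet:setopts(Socket, [{high_msgq_watermark, 128 * 1024},
                           {low_msgq_watermark,   32 * 1024},
                           {high_watermark,      128 * 1024},
                           {low_watermark,        32 * 1024}]).
```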

Lukas
_______________________________________________
erlang-bugs mailing list
erlang-bugs
http://erlang.org/mailman/listinfo/erlang-bugs


Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03

Defnull
The problem remains even if only erlang:port_command/3 with the nosuspend
option is used (no calls to gen_tcp:send/2, and many processes writing to the
same socket).


2014-06-24 21:28 GMT+04:00 Lukas Larsson <lukas>:



Re: Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03

Vasily Demidenok
Hello again,
A friend of mine ran into the same problem. He believes it is not a TCP-driver
problem but is caused by the port-internals changes made in the R16 release.
Downgrading to R15B03 fixed it for them. The +n option with the d/s/a
parameters did not help either.
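
For reference, the flag tried above is the erl +n emulator flag introduced alongside the R16 port rewrite; the three parameters were tried like this (see erl(1) for the exact semantics of each mode):

```shell
# The +n flag controls how signals (such as port_command) are delivered
# to ports; d/s/a select the default, synchronous and asynchronous modes.
# None of the three made a difference in this case.
erl +n d
erl +n s
erl +n a
```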

The app to reproduce the bug:


2014-07-05 0:47 GMT+04:00 Vasily Demidenok <[hidden email]>:

Re: Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03

o鸡蛋黄o
Dear list,

There is a bug in the Erlang VM source code: in port_get_data_1 the process argument is NULL. I have fixed it, thanks.


------------------ Original Message ------------------
From: "Vasily Demidenok" <[hidden email]>
Sent: Monday, 28 July 2014, 17:16
To: "Lukas Larsson" <[hidden email]>
Cc: "erlang-bugs" <[hidden email]>
Subject: Re: [erlang-bugs] Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03


Re: Re: Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03

Vasily Demidenok
Great to hear that you have already fixed that bug.
Could you kindly point me to the patch, or to the branch with that fix? We would like to merge it rather than wait for it to appear in the new R17 release, if it's production-ready.




2014-07-28 13:41 GMT+04:00 o鸡蛋黄o <[hidden email]>:
Dear,

      a bug  in erlang vm sourcecode  , port_get_data_1 process arg is NULL , I had fixed it,thanks.


------------------ 原始邮件 ------------------
发件人: "Vasily Demidenok";<[hidden email]>;
发送时间: 2014年7月28日(星期一) 下午5:16
收件人: "Lukas Larsson"<[hidden email]>;
抄送: "erlang-bugs"<[hidden email]>;
主题: Re: [erlang-bugs] Weird behaviour of gen_tcp:send/2 anderlang:port_command/3 with nosuspend to the same port on R16B03

Hello again
The friend of mine faced with the same problem. He assumes it's not tcp driver related problem, but port internals changes made in R16 release.
Downgrading to R15B03 fixed their problem. +n options with d/s/a params also did not help.

The app to reproduce bug:


2014-07-05 0:47 GMT+04:00 Vasily Demidenok <[hidden email]>:
The problem remains even if only erlang:port_command/3 with nosuspend option is used. (no calls for gen_tcp:send and many processes write to the same socket)


2014-06-24 21:28 GMT+04:00 Lukas Larsson <[hidden email]>:

Hello,

I was able to reproduce your testcase after removing all the "msg" printouts and starting a couple of clients at the same time. It seems that the sockets are hitting the high_msgq_watermark limit and then as data gets flushes they are not set to run again. I'll see if I can dig out what it is that is causing this behavior.

Lukas

On 24/06/14 16:56, Vasily Demidenok wrote:
Hello list, we have run into some gen_tcp-related problems after switching from Erlang R15B03 to R16B03-01.

The problem is as follows: when the server produces data faster than the consumer can handle, once the server's outgoing buffers and the client's incoming buffers are full, a gen_tcp:send/2 call on the server side blocks forever in erts_internal:port_command/3. After this, even when the client consumes all
the data and the buffers are empty, the server process remains suspended in that call.

The problem does not always occur, but it happens quite often.

Some implementation details are below; I have also shrunk the example down to
this small app so you can check the code: https://github.com/define-null/tcp_failing_ex

The server listens on port 8899; when a client connects, it spawns a main server process and plenty of workers, which start writing to the port after the client sends a special msg. The main process is responsible for handling commands from the client and sends responses via gen_tcp:send/2, while the workers stream data to the client using erlang:port_command/3 with nosuspend. The workers thus send only up-to-date data, dropping anything the client is too slow to receive.
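The worker pattern described above can be sketched like this (illustrative module and function names, not the code from the linked repo):

```erlang
%% Sketch of the "drop when busy" worker send described above.
-module(stream_worker).
-export([send_frame/2]).

%% Try to queue Frame on the socket's port without ever suspending.
%% Returns ok when queued, dropped when the port signal queue is busy.
send_frame(Socket, Frame) ->
    try erlang:port_command(Socket, Frame, [nosuspend]) of
        true  -> ok;      %% accepted onto the port queue
        false -> dropped  %% port busy: drop stale stream data instead of blocking
    catch
        error:badarg -> {error, closed}  %% socket/port already closed
    end.
```

gen_tcp:send/2 has no such escape hatch, which is why the main process ends up suspended when the bug hits.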

The behaviour we see is as follows:
In the first phase the producer fills the OS and Erlang driver buffers. The consumer reads data as it arrives and the server drops data it cannot send, so we see the buffer sizes growing on both sides: the server's out queue and the client's in queue, respectively.

At some point in time, I guess when the buffers are completely full, the server tries to respond to
a ping message from the client using gen_tcp:send/2. After that it blocks there forever, even after the client consumes all the messages. The situation does not change and the server process remains in the suspended state, while its incoming buffer starts to grow as the client sends more ping messages.

Below is the output on a system with two slow clients, where the server process for the first client is already blocked in a gen_tcp:send/2 call, while the second client is served fine.

Every 2.0s: netstat -al | grep 8899                                                                                                               Tue Jun 24 16:34:51 2014

tcp4      36      0  localhost.8899         localhost.63263        ESTABLISHED
tcp4       0       0  localhost.63263        localhost.8899         ESTABLISHED
tcp4       0 130990  localhost.8899         localhost.63257        ESTABLISHED
tcp4  619190   0  localhost.63257        localhost.8899         ESTABLISHED
tcp4       0       0  *.8899                 *.*                    LISTEN

This is the output for the client process from the GitHub example, where we see that
after the send operation (a ping msg) no incoming msgs arrive any more.
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,28}}: msg
{{2014,6,24},{16,33,48}} before send
{{2014,6,24},{16,33,48}} after send ok
{{2014,6,24},{16,34,9}} before send
{{2014,6,24},{16,34,9}} after send ok
{{2014,6,24},{16,34,30}} before send
{{2014,6,24},{16,34,30}} after send ok
{{2014,6,24},{16,34,51}} before send
{{2014,6,24},{16,34,51}} after send ok
{{2014,6,24},{16,35,12}} before send
....

Server blocked process output:

{{2014,6,24},{16,33,21}}: <0.95.0> ping
{{2014,6,24},{16,33,21}} bsend: <0.95.0>
{{2014,6,24},{16,33,21}} asend: <0.95.0> ok
{{2014,6,24},{16,33,48}}: <0.95.0> ping
{{2014,6,24},{16,33,48}} bsend: <0.95.0>
%% (no asend message after it)

([hidden email])1> erlang:process_info(pid(0,95,0)).
[{current_function,{erts_internal,port_command,3}},
 {initial_call,{proc_lib,init_p,5}},

The bug is not always reproducible, but it occurs quite often. The problem is that even
after the server's out buffers are empty, data does not arrive at the client, and the server's incoming buffer grows
as the client sends more ping messages. (So erlang:port_command/3 with nosuspend always returns false
while the main server process for this connection is suspended in gen_tcp:send/2.)
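A quick way to look for this state from the shell, based on the process_info output above, is to list processes whose current function is the one gen_tcp:send/2 suspends in (a sketch; module and function names are illustrative):

```erlang
%% Sketch: find processes stuck where gen_tcp:send/2 suspends.
-module(stuck).
-export([senders/0]).

senders() ->
    [P || P <- erlang:processes(),
          erlang:process_info(P, current_function)
              =:= {current_function, {erts_internal, port_command, 3}}].
```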

And then it only gets worse, as already mentioned:

Every 2.0s: netstat -al | grep 8899                                                                                                               Tue Jun 24 16:56:59 2014

tcp4     804      0  localhost.8899         localhost.63263        ESTABLISHED
tcp4       0      0  localhost.63263        localhost.8899         ESTABLISHED
tcp4       0      0  localhost.8899         localhost.63257        ESTABLISHED
tcp4       0      0  localhost.63257        localhost.8899         ESTABLISHED
tcp4       0      0  *.8899                 *.*                    LISTEN

We ran into this after switching from R15B03 to R16B03. I know there were some changes in port_command handling; could that be why we see this behaviour?



Re: Weird behaviour of gen_tcp:send/2 and erlang:port_command/3 with nosuspend to the same port on R16B03

Rickard Green-2
In reply to this post by Vasily Demidenok
Hi!

Sorry for the late response. We've been understaffed during vacation times.

A fix for this issue can be found in the
rickard/nosuspend-bug/OTP-12082 branch in my github repo
<https://github.com/rickard-green/otp/tree/rickard/nosuspend-bug/OTP-12082>.

The fix is based on OTP-17.1 and will be included in the next
maintenance patch package (most likely as it is in the above branch).

Release note for the fix:

OTP-12082

    A bug in the VM code implementing sending of signals to ports
    could cause the receiving port queue to remain in a busy state
    forever. When this state had been reached, processes sending
    command signals to the port either got suspended forever, or,
    if the nosuspend feature was used, always failed to send to
    the port.

    In order for this bug to be triggered on a port, one had to at
    least once utilize the nosuspend functionality when passing a
    signal to the port. This by either calling
    -- port_command(Port, Data, [nosuspend | Options]),
    -- erlang:send(Port, {PortOwner, {command, Data}}, [nosuspend | Options]),
    -- erlang:send_nosuspend(Port, {PortOwner, {command, Data}}), or
    -- erlang:send_nosuspend(Port, {PortOwner, {command, Data}}, Options).

    Thanks Vasily Demidenok for reporting the issue, and Sergey
    Kudryashov for providing a testcase.
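For completeness, the four nosuspend entry points listed in the note can be exercised side by side (a sketch; note the differing return conventions: port_command/3 and send_nosuspend return booleans, while erlang:send/3 returns ok or nosuspend):

```erlang
%% Sketch: the four nosuspend send forms from the release note,
%% applied to a gen_tcp socket owned by the calling process.
-module(ns_forms).
-export([demo/1]).

demo(Socket) ->
    Owner = self(),  %% must be the port owner for the raw {command, Data} forms
    R1 = erlang:port_command(Socket, <<"a">>, [nosuspend]),              %% true | false
    R2 = erlang:send(Socket, {Owner, {command, <<"b">>}}, [nosuspend]),  %% ok | nosuspend
    R3 = erlang:send_nosuspend(Socket, {Owner, {command, <<"c">>}}),     %% true | false
    R4 = erlang:send_nosuspend(Socket, {Owner, {command, <<"d">>}}, []), %% true | false
    {R1, R2, R3, R4}.
```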

Regards,
Rickard Green, Erlang/OTP, Ericsson AB


On Mon, Jul 28, 2014 at 11:16 AM, Vasily Demidenok
<[hidden email]> wrote:

> Hello again
> The friend of mine faced with the same problem. He assumes it's not tcp
> driver related problem, but port internals changes made in R16 release.
> Downgrading to R15B03 fixed their problem. +n options with d/s/a params also
> did not help.
>
> The app to reproduce bug:
> https://github.com/kudryashov-sv/ErlangCPort.git