Strange BEAM slowdown

Timothy Legant
Hello,

We have an application where we read a huge volume of small messages
from ZMQ sockets and distribute them to Erlang processes.  We are
seeing strange behavior where, after a short while, beam.smp's load
drops quite a bit and then the data begins queuing, eating memory
until we either stop the program or the Linux OOM killer does it for
us.

DETAILS
-------
CentOS release 6.6 (Final)
Erlang/OTP 17 [erts-6.4] [source-2e19e2f] [64-bit] [smp:56:56] [async-threads:20] [hipe] [kernel-poll:true]

beam.smp is started with the flags: +sbt db +sub true

We have 60+ data sources (TCP/ZMQ sockets), each of which feeds an
independent set of processes; there is no interaction between the
processes handling the data from one socket and the processes handling
data from other sockets.

Our first implementation used the erlzmq2 library to read the socket.
We then parsed the messages in Erlang and sent Erlang terms to the
data handling processes.

After seeing the problem behavior we suspected that the repeated calls
to erlzmq:recv() and parsing in Erlang might be the cause of the
backup so we rewrote that code as a NIF (background thread + several
API calls).  Our NIF implementation reads the ZMQ socket, parses the
data and then sends it to the data handling processes.  We (obviously,
I suppose) create one of these background threads for each of the 60+
data source sockets.

Despite the entirely different implementation of ZMQ handling, parsing
and dispatch of the data, we are seeing the same issue: first the load
drops off precipitously and then the data starts queuing in the ZMQ
socket buffers and the program is unusable.


We are curious if anyone has seen this sort of behavior with BEAM or
might have suggestions on where to look for the issue.


Thanks,

Tim
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Re: Strange BEAM slowdown

dmkolesnikov
Hello,

I've seen similar behavior when the message processing rate was slower than the message arrival rate. I would use etop to check whether some process has a large mailbox or a very high reduction count, and the recon tool to check for binary leaks.

There is a very good book by Fred, Erlang in Anger, that might shed some light on your problem:

https://s3.amazonaws.com/erlang-in-anger/text.v1.0.2.pdf
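
As a rough sketch of the mailbox check, the module below ranks processes by message queue length using only built-in functions. The module name mailbox_top and the function top/1 are made up for illustration. With recon loaded, recon:proc_count(message_queue_len, 10) gives a similar ranking and recon:bin_leak(10) helps look for binary leaks.

```erlang
-module(mailbox_top).
-export([top/1]).

%% Return up to N entries of the form {QueueLen, Pid, Label},
%% longest message queue first. Label is the registered name if the
%% process has one, otherwise its initial call.
top(N) when is_integer(N), N >= 0 ->
    Rows = [Row || Pid <- erlang:processes(), Row <- row(Pid)],
    lists:sublist(lists:reverse(lists:sort(Rows)), N).

%% process_info/2 returns undefined for a process that has already
%% exited, so each pid yields either one row or none.
row(Pid) ->
    Items = [message_queue_len, registered_name, initial_call],
    case erlang:process_info(Pid, Items) of
        undefined ->
            [];
        [{message_queue_len, Len},
         {registered_name, Name},
         {initial_call, MFA}] ->
            Label = case Name of
                        [] -> MFA;
                        _ -> Name
                    end,
            [{Len, Pid, Label}]
    end.
```

Calling mailbox_top:top(10) from a shell on the affected node while the data is backing up should show whether a few handler processes are accumulating messages faster than they consume them.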

Best Regards,
Dmitry


Re: Strange BEAM slowdown

Michael Truog
In reply to this post by Timothy Legant
If this is related to scheduler threads locking up because calls to
erlzmq:recv spend too much time (roughly more than 1-2 ms) in the
erlzmq2 NIF, you can create your ZeroMQ sockets in active mode instead
of passive mode. The messages are then received in the Erlang process
as ordinary messages, without any call to erlzmq:recv.  I don't quite
understand the need to rewrite the NIF, since erlzmq2
(https://github.com/zeromq/erlzmq2) already uses a background thread
for the receive.  An example of using active mode for recv is at
https://github.com/CloudI/cloudi_service_zeromq/blob/master/src/cloudi_service_zeromq.erl
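
A minimal sketch of what an active-mode receiver might look like follows. It assumes that an active erlzmq2 socket delivers each message to the process that opened it as a tuple of the form {zmq, Socket, Msg, Flags}; check that shape against the erlzmq2 documentation before relying on it. The module name active_recv and the {data, Msg} forwarding format are invented for illustration, and opening the socket itself is left out so the sketch does not depend on the library.

```erlang
-module(active_recv).
-export([start/1, loop/1]).

%% Spawn a receiver that forwards every payload to Handler.
%% In a real system this process would also open the ZeroMQ socket
%% in active mode, so that the socket's messages arrive in its mailbox.
start(Handler) when is_pid(Handler) ->
    spawn(?MODULE, loop, [Handler]).

%% Assumed message shape: {zmq, Socket, Msg, Flags}.
%% No blocking recv call is made; the process only does a receive.
loop(Handler) ->
    receive
        {zmq, _Socket, Msg, _Flags} ->
            Handler ! {data, Msg},
            loop(Handler);
        stop ->
            ok
    end.
```

Because the loop only waits in receive, a slow consumer shows up as a growing mailbox on this process rather than as time spent inside a NIF call.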


Re: Strange BEAM slowdown

Lukas Larsson-8
In reply to this post by Timothy Legant
Hello Timothy,

On Mon, Feb 22, 2016 at 6:14 PM, Timothy Legant <[hidden email]> wrote:

> Hello,
>
> We have an application where we read a huge volume of small messages
> from ZMQ sockets and distribute them to Erlang processes.  We are
> seeing strange behavior where, after a short while, beam.smp's load
> drops quite a bit and then the data begins queuing, eating memory
> until we either stop the program or the Linux OOM killer does it for
> us.

When you say load, what do you mean? Scheduler utilization? Average CPU utilization? Requests per second?
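
If scheduler utilization is the number in question, one way to measure it is sketched below using erlang:statistics(scheduler_wall_time). The module name sched_util and the function sample/1 are hypothetical names chosen for this example.

```erlang
-module(sched_util).
-export([sample/1]).

%% Sample per-scheduler utilization over IntervalMs milliseconds.
%% Returns [{SchedulerId, Utilization}], where Utilization is the
%% fraction of wall-clock time the scheduler was busy in the interval.
sample(IntervalMs) when is_integer(IntervalMs), IntervalMs > 0 ->
    %% Enable the counters and remember the previous setting.
    Old = erlang:system_flag(scheduler_wall_time, true),
    T0 = lists:sort(erlang:statistics(scheduler_wall_time)),
    timer:sleep(IntervalMs),
    T1 = lists:sort(erlang:statistics(scheduler_wall_time)),
    %% Restore the previous setting.
    erlang:system_flag(scheduler_wall_time, Old),
    [{Id, ratio(A1 - A0, W1 - W0)}
     || {{Id, A0, W0}, {Id, A1, W1}} <- lists:zip(T0, T1)].

%% Guard against a zero-length interval for a scheduler.
ratio(_Active, 0) -> 0.0;
ratio(Active, Total) -> Active / Total.
```

Comparing sched_util:sample(1000) before and after the slowdown would show whether the schedulers themselves go idle or whether only the OS-level CPU figure changes.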

Lukas
