How to pin-point High CPU utilization in Erlang VM

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

How to pin-point High CPU utilization in Erlang VM

Marcial Rosales
We are experiencing a very high cpu utilization in 3 clustered Erlang VMs running RabbitMQ. We have deployed another cluster in an attempt to reproduce the same behaviour without much success.

Our goals are:

  • Find out where the CPU is being utilized
  • Choose the right tools to analyze CPU utilization

Our observations so far:

  • The BAD cluster observes a pretty excessive CPU utilization, both user and system ones, and also network.
  • The BAD cluster also observes a higher Erlang scheduler utilization, specially on microstate emulator and other. We are yet to understand what other could be. According to Erlang documentation is unaccounted things.
  • The BAD cluster observes a considerably higher number of system calls which we are yet to identify (dunno how) why is that.
  • The BAD cluster does not necessarily run higher number of reductions. In fact, the GOOD cluster runs more reductions and yet has a lower scheduler utilization.
METRICBADGOOD
user cpu46% - 57%19% - 40%
system cpu20% - 37%1% - 10%
network traffic6M - 19Mup to 8M
system interrupts120k - 196k10k - 20k
syscalls1.6M - 2.1M49k - 110k
task-clock 10sec6825512324
cpu profiling info

We have gathered lots of metrics in attempt to identify why the BAD cluster uses so much CPU. All the information can be found here https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841  along with the environment information.


We appreciate a lot any insights as to what could be causing the issue and/or in relation to additional tools we could use.

Many thanks
 
--
Marcial Rosales
Pivotal, Inc.  EMEA


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: How to pin-point High CPU utilization in Erlang VM

Danil Zagoskin-2
Hi!

I know that feel when a load is very high and you don't know why.

More things to see:
* extended msacc: configure OTP with --with-microstate-accounting=extra
* LCNT: lcnt:apply(timer, sleep, [5000]), lcnt:conflicts(), lcnt:inspect, etc.
* Check if there are some processes spending too much time running. You may find this tool useful: https://gist.github.com/stolen/9a28ed9403c724541b0ee5fcd822613e
* network buffers. Check if rmem/wmem in sysctl are the same. Also check network interfaces — MTU, drops, etc.
* NUMA and scheduler bindings. Try running the whole application on single NUMA node to avoid interconnect cost
* Other processes on the host. Once we saw a malware miner that hid itself from ps and was active only when the server was busy doing its main job. Use perf top on the every CPU core to detect this.


On Wed, Jul 18, 2018 at 6:34 AM Marcial Rosales <[hidden email]> wrote:
We are experiencing a very high cpu utilization in 3 clustered Erlang VMs running RabbitMQ. We have deployed another cluster in an attempt to reproduce the same behaviour without much success.

Our goals are:

  • Find out where the CPU is being utilized
  • Choose the right tools to analyze CPU utilization

Our observations so far:

  • The BAD cluster observes a pretty excessive CPU utilization, both user and system ones, and also network.
  • The BAD cluster also observes a higher Erlang scheduler utilization, specially on microstate emulator and other. We are yet to understand what other could be. According to Erlang documentation is unaccounted things.
  • The BAD cluster observes a considerably higher number of system calls which we are yet to identify (dunno how) why is that.
  • The BAD cluster does not necessarily run higher number of reductions. In fact, the GOOD cluster runs more reductions and yet has a lower scheduler utilization.
METRICBADGOOD
user cpu46% - 57%19% - 40%
system cpu20% - 37%1% - 10%
network traffic6M - 19Mup to 8M
system interrupts120k - 196k10k - 20k
syscalls1.6M - 2.1M49k - 110k
task-clock 10sec6825512324
cpu profiling info

We have gathered lots of metrics in attempt to identify why the BAD cluster uses so much CPU. All the information can be found here https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841  along with the environment information.


We appreciate a lot any insights as to what could be causing the issue and/or in relation to additional tools we could use.

Many thanks
 
--
Marcial Rosales
Pivotal, Inc.  EMEA

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


--
Danil Zagoskin | [hidden email]

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: How to pin-point High CPU utilization in Erlang VM

Lukas Larsson-8
In reply to this post by Marcial Rosales
Hello,

On Wed, Jul 18, 2018 at 5:34 AM Marcial Rosales <[hidden email]> wrote:
We are experiencing a very high cpu utilization in 3 clustered Erlang VMs running RabbitMQ. We have deployed another cluster in an attempt to reproduce the same behaviour without much success.

Our goals are:

  • Find out where the CPU is being utilized
  • Choose the right tools to analyze CPU utilization

Our observations so far:

  • The BAD cluster observes a pretty excessive CPU utilization, both user and system ones, and also network.
  • The BAD cluster also observes a higher Erlang scheduler utilization, specially on microstate emulator and other. We are yet to understand what other could be. According to Erlang documentation is unaccounted things.
If you compile Erlang with "./configure --with-microstate-accounting=extra" (as suggested by Danil), the other part will be broken into more granular parts. However, looking at your perf recordings I would guess that most of the other time is time spent spinning before going to sleep.
 
  • The BAD cluster observes a considerably higher number of system calls which we are yet to identify (dunno how) why is that.
Maybe use strace and then write some small script that counts the different syscall made?
 
  • The BAD cluster does not necessarily run higher number of reductions. In fact, the GOOD cluster runs more reductions and yet has a lower scheduler utilization.
METRICBADGOOD
user cpu46% - 57%19% - 40%
system cpu20% - 37%1% - 10%
network traffic6M - 19Mup to 8M
system interrupts120k - 196k10k - 20k
syscalls1.6M - 2.1M49k - 110k
task-clock 10sec6825512324
cpu profiling info

We have gathered lots of metrics in attempt to identify why the BAD cluster uses so much CPU. All the information can be found here https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841  along with the environment information.


In the BAD cluster, would it be possible to see the annotated versions of process_main and scheduler_wait in the "perf record" run?
There doesn't seem to be any kernel symbols in the BAD cluster perf run, do you know why?
 

We appreciate a lot any insights as to what could be causing the issue and/or in relation to additional tools we could use.

My gut tells me that there is some syscall that is a lot slower on XEN than it is in the KVM. In virtualized environments I always tend to suspect the time source first. Different hypervisors have very different performance for getting the time and the Erlang VM does a lot of time fetching.

Lukas

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: How to pin-point High CPU utilization in Erlang VM

Marcial Rosales
In reply to this post by Danil Zagoskin-2
 Hi Danil ! Thank you very much for the suggestions. For the time being, we can't change the Bad cluster with a new erlang with the extended msacc and lcnt. But in the good cluster we have lcnt enabled and we are going to enable msacc. Although we wont have anything to compare against.

We'll report back any findings with regards the other suggestions very soon.

Thanks !
  

On Wed, Jul 18, 2018 at 11:28 AM Danil Zagoskin <[hidden email]> wrote:
Hi!

I know that feel when a load is very high and you don't know why.

More things to see:
* extended msacc: configure OTP with --with-microstate-accounting=extra
* LCNT: lcnt:apply(timer, sleep, [5000]), lcnt:conflicts(), lcnt:inspect, etc.
* Check if there are some processes spending too much time running. You may find this tool useful: https://gist.github.com/stolen/9a28ed9403c724541b0ee5fcd822613e
* network buffers. Check if rmem/wmem in sysctl are the same. Also check network interfaces — MTU, drops, etc.
* NUMA and scheduler bindings. Try running the whole application on single NUMA node to avoid interconnect cost
* Other processes on the host. Once we saw a malware miner that hid itself from ps and was active only when the server was busy doing its main job. Use perf top on the every CPU core to detect this.


On Wed, Jul 18, 2018 at 6:34 AM Marcial Rosales <[hidden email]> wrote:
We are experiencing a very high cpu utilization in 3 clustered Erlang VMs running RabbitMQ. We have deployed another cluster in an attempt to reproduce the same behaviour without much success.

Our goals are:

  • Find out where the CPU is being utilized
  • Choose the right tools to analyze CPU utilization

Our observations so far:

  • The BAD cluster observes a pretty excessive CPU utilization, both user and system ones, and also network.
  • The BAD cluster also observes a higher Erlang scheduler utilization, specially on microstate emulator and other. We are yet to understand what other could be. According to Erlang documentation is unaccounted things.
  • The BAD cluster observes a considerably higher number of system calls which we are yet to identify (dunno how) why is that.
  • The BAD cluster does not necessarily run higher number of reductions. In fact, the GOOD cluster runs more reductions and yet has a lower scheduler utilization.
METRICBADGOOD
user cpu46% - 57%19% - 40%
system cpu20% - 37%1% - 10%
network traffic6M - 19Mup to 8M
system interrupts120k - 196k10k - 20k
syscalls1.6M - 2.1M49k - 110k
task-clock 10sec6825512324
cpu profiling info

We have gathered lots of metrics in attempt to identify why the BAD cluster uses so much CPU. All the information can be found here https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841  along with the environment information.


We appreciate a lot any insights as to what could be causing the issue and/or in relation to additional tools we could use.

Many thanks
 
--
Marcial Rosales
Pivotal, Inc.  EMEA

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


--
Danil Zagoskin | [hidden email]


--
Marcial Rosales
Advisory Solution Architect (Customer Success Organization)
Pivotal, Inc.  EMEA



_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: How to pin-point High CPU utilization in Erlang VM

Marcial Rosales
In reply to this post by Lukas Larsson-8
 Hi Lukas, sorry for responding this late to you. I dont have yet all the information but I ought to send you a reply. 
 See my replies in-line:

On Thu, Jul 19, 2018 at 11:51 AM Lukas Larsson <[hidden email]> wrote:
Hello,

On Wed, Jul 18, 2018 at 5:34 AM Marcial Rosales <[hidden email]> wrote:
We are experiencing a very high cpu utilization in 3 clustered Erlang VMs running RabbitMQ. We have deployed another cluster in an attempt to reproduce the same behaviour without much success.

Our goals are:

  • Find out where the CPU is being utilized
  • Choose the right tools to analyze CPU utilization

Our observations so far:

  • The BAD cluster observes a pretty excessive CPU utilization, both user and system ones, and also network.
  • The BAD cluster also observes a higher Erlang scheduler utilization, specially on microstate emulator and other. We are yet to understand what other could be. According to Erlang documentation is unaccounted things.
If you compile Erlang with "./configure --with-microstate-accounting=extra" (as suggested by Danil), the other part will be broken into more granular parts. However, looking at your perf recordings I would guess that most of the other time is time spent spinning before going to sleep.
 
MR > Thanks! We are going to recompile Erlang with lock counting and extra microstate-accounting. Hopefully we can get more visibility the next time we get a cluster in that bad state.

  • The BAD cluster observes a considerably higher number of system calls which we are yet to identify (dunno how) why is that.
Maybe use strace and then write some small script that counts the different syscall made?

MR >  The strace reports that 97% of the calls are `futex`, should that helped. 

 
  • The BAD cluster does not necessarily run higher number of reductions. In fact, the GOOD cluster runs more reductions and yet has a lower scheduler utilization.
METRICBADGOOD
user cpu46% - 57%19% - 40%
system cpu20% - 37%1% - 10%
network traffic6M - 19Mup to 8M
system interrupts120k - 196k10k - 20k
syscalls1.6M - 2.1M49k - 110k
task-clock 10sec6825512324
cpu profiling info

We have gathered lots of metrics in attempt to identify why the BAD cluster uses so much CPU. All the information can be found here https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841  along with the environment information.


In the BAD cluster, would it be possible to see the annotated versions of process_main and scheduler_wait in the "perf record" run?
There doesn't seem to be any kernel symbols in the BAD cluster perf run, do you know why?

MR> I dont really know. We are running with the same OS image. Any ideas what could that be?

 

We appreciate a lot any insights as to what could be causing the issue and/or in relation to additional tools we could use.

My gut tells me that there is some syscall that is a lot slower on XEN than it is in the KVM. In virtualized environments I always tend to suspect the time source first. Different hypervisors have very different performance for getting the time and the Erlang VM does a lot of time fetching.

MR> We are in the process of provisioning a cluster in AWS in an attempt to discard or confirm the hypervisor.
 
Lukas


--
Marcial Rosales
Advisory Solution Architect (Customer Success Organization)
Pivotal, Inc.  EMEA



_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: How to pin-point High CPU utilization in Erlang VM

Eric des Courtis-3
We've had issues recently with Xen VMs on AWS being much slower than usual. It wasn't even running Erlang.

On Wed, Jul 25, 2018, 4:18 AM Marcial Rosales <[hidden email]> wrote:
 Hi Lukas, sorry for responding this late to you. I dont have yet all the information but I ought to send you a reply. 
 See my replies in-line:

On Thu, Jul 19, 2018 at 11:51 AM Lukas Larsson <[hidden email]> wrote:
Hello,

On Wed, Jul 18, 2018 at 5:34 AM Marcial Rosales <[hidden email]> wrote:
We are experiencing a very high cpu utilization in 3 clustered Erlang VMs running RabbitMQ. We have deployed another cluster in an attempt to reproduce the same behaviour without much success.

Our goals are:

  • Find out where the CPU is being utilized
  • Choose the right tools to analyze CPU utilization

Our observations so far:

  • The BAD cluster observes a pretty excessive CPU utilization, both user and system ones, and also network.
  • The BAD cluster also observes a higher Erlang scheduler utilization, specially on microstate emulator and other. We are yet to understand what other could be. According to Erlang documentation is unaccounted things.
If you compile Erlang with "./configure --with-microstate-accounting=extra" (as suggested by Danil), the other part will be broken into more granular parts. However, looking at your perf recordings I would guess that most of the other time is time spent spinning before going to sleep.
 
MR > Thanks! We are going to recompile Erlang with lock counting and extra microstate-accounting. Hopefully we can get more visibility the next time we get a cluster in that bad state.

  • The BAD cluster observes a considerably higher number of system calls which we are yet to identify (dunno how) why is that.
Maybe use strace and then write some small script that counts the different syscall made?

MR >  The strace reports that 97% of the calls are `futex`, should that helped. 

 
  • The BAD cluster does not necessarily run higher number of reductions. In fact, the GOOD cluster runs more reductions and yet has a lower scheduler utilization.
METRICBADGOOD
user cpu46% - 57%19% - 40%
system cpu20% - 37%1% - 10%
network traffic6M - 19Mup to 8M
system interrupts120k - 196k10k - 20k
syscalls1.6M - 2.1M49k - 110k
task-clock 10sec6825512324
cpu profiling info

We have gathered lots of metrics in attempt to identify why the BAD cluster uses so much CPU. All the information can be found here https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841  along with the environment information.


In the BAD cluster, would it be possible to see the annotated versions of process_main and scheduler_wait in the "perf record" run?
There doesn't seem to be any kernel symbols in the BAD cluster perf run, do you know why?

MR> I dont really know. We are running with the same OS image. Any ideas what could that be?

 

We appreciate a lot any insights as to what could be causing the issue and/or in relation to additional tools we could use.

My gut tells me that there is some syscall that is a lot slower on XEN than it is in the KVM. In virtualized environments I always tend to suspect the time source first. Different hypervisors have very different performance for getting the time and the Erlang VM does a lot of time fetching.

MR> We are in the process of provisioning a cluster in AWS in an attempt to discard or confirm the hypervisor.
 
Lukas


--
Marcial Rosales
Advisory Solution Architect (Customer Success Organization)
Pivotal, Inc.  EMEA


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions