How to test multi-pollset?


How to test multi-pollset?

pablo platt-3
Hi,

What is the expected effect of the multi-pollset PR [1] on a UDP socket on the sender/receiver side?
My use case is a media server with several broadcasters and many viewers.
Each stream uses 1 Mbps (approx. 100 × 1500-byte packets per second).
Should I expect improvement when gen_udp is sending packets, receiving packets or both?
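As a rough sanity check (my own arithmetic, assuming 1500-byte packets and counting payload only, ignoring IP/UDP overhead), the quoted rate works out:

```shell
# 100 packets/s * 1500 bytes/packet * 8 bits/byte = 1,200,000 bits/s,
# i.e. roughly the 1 Mbps per stream mentioned above.
echo $((100 * 1500 * 8))   # prints 1200000
```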

Is it reasonable to pick a point in master and use it on a production system after testing?

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: How to test multi-pollset?

Lukas Larsson-8
Hello,

On Mon, Jan 29, 2018 at 9:22 AM, pablo platt <[hidden email]> wrote:
Hi,

What is the expected effect of the multi-pollset PR [1] on a UDP socket on the sender/receiver side?
My use case is a media server with several broadcasters and many viewers.
Each stream uses 1 Mbps (approx. 100 × 1500-byte packets per second).
Should I expect improvement when gen_udp is sending packets, receiving packets or both?

Yes, I believe you will see an improvement. It depends on the hardware you are running on: typically, the more logical CPUs you have, the more you gain from the improvements in I/O polling [1]. The exact usage pattern also matters.
 
Is it reasonable to pick a point in master and use it on a production system after testing?

I would take the latest tip of master and test it thoroughly for your application. Everything we merge into master has gone through all our testing first, so it is as stable as the maint branch. However, we make many more changes in master than in maint, so there is a greater chance of a bug slipping through.

If you do decide to give the improved I/O polling implementation a go, please do come back with any negative or positive findings that you get!

Lukas

[1]: The largest change in the PR is not actually the ability to use multiple pollsets, but that polling has been lifted out to dedicated threads.


Re: How to test multi-pollset?

pablo platt-3
It's great to see all the hard work invested in performance in master.
Thanks.


Re: How to test multi-pollset?

pablo platt-3
Master with multi poll-sets is running on my dev machine without errors so far.

What's the difference between polling threads and poll-sets (+IOt  and +IOp)?
How do I know if I should increase the number of polling threads or poll-sets?
I'm using an 8 or 16 vCPU machine (vCPU = hyper-thread) running Ubuntu 16.04.

I've measured with msacc, as the docs [1] recommend, on a 1 vCPU machine.
Each gen_udp socket receiving 100 UDP packets per second increases the 'poll' row load by about 0.05%.
Adding several gen_udp sockets that send 100 UDP packets per second barely affects the load.
Is it expected that receiving packets with gen_udp produces high load while sending produces very low load?

How can I compare master with multiple poll-sets with erlang/otp 20?
msacc on otp 20 doesn't have stats about poll.



Re: How to test multi-pollset?

Lukas Larsson-8
Hello,

On Fri, Feb 23, 2018 at 3:15 PM, pablo platt <[hidden email]> wrote:
Master with multi poll-sets is running on my dev machine without errors so far.

great!
 

What's the difference between polling threads and poll-sets (+IOt  and +IOp)?

It's explained in this presentation: http://www.erlang-factory.com/euc2017/kenneth-lundin

Polling threads can be added when their load (as measured by msacc) increases. In my stress tests I've seen a small decrease in latency on large machines when adding a second thread.
The number of poll-sets can be increased to deal with scalability issues in the underlying libc/kernel implementation. Modern operating systems rarely have any problems with this, so I'm not sure how useful changing this value actually is.
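As a concrete illustration (the flag values are examples only, not recommendations), both settings are passed as emulator flags when starting a VM built from master:

```shell
# +IOt sets the number of polling threads, +IOp the number of pollsets.
# Measure your own workload before settling on any values.
erl +IOt 2 +IOp 2
```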
 
How do I know if I should increase the number of polling threads or poll-sets?

The best way is probably through experimentation. Measure your application's latency/throughput, then play with the settings and see if anything changes.
 
I'm using an 8 or 16 vCPU machine (vCPU = hyper-thread) running Ubuntu 16.04.

I've measured with msacc, as the docs [1] recommend, on a 1 vCPU machine.
Each gen_udp socket receiving 100 UDP packets per second increases the 'poll' row load by about 0.05%.
Adding several gen_udp sockets that send 100 UDP packets per second barely affects the load.
Is it expected that receiving packets with gen_udp produces high load while sending produces very low load?

Yes, especially if you use active mode when receiving. Sending UDP shouldn't go through the poll implementation at all; if the kernel buffer is full, the packet is simply discarded.
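For what it's worth, on Linux the kernel's own UDP counters can confirm this drop behaviour. This is a generic sketch, not something from the thread:

```shell
# "receive buffer errors" in the Udp: section counts packets dropped
# because a socket's receive buffer was full.
netstat -su
# /proc/net/udp also exposes a per-socket "drops" column (last field).
cat /proc/net/udp
```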
 

How can I compare master with multiple poll-sets with erlang/otp 20?

Run your application and measure the difference in throughput/latency using external tools.

Lukas
 
msacc on otp 20 doesn't have stats about poll.



Re: How to test multi-pollset?

pablo platt-3


On Fri, Feb 23, 2018 at 5:10 PM, Lukas Larsson <[hidden email]> wrote:
Is it expected that receiving packets with gen_udp produces high load while sending produces very low load?

Yes, especially if you use active mode when receiving. Sending UDP shouldn't go through the poll implementation at all; if the kernel buffer is full, the packet is simply discarded.

I'm using gen_udp with {active, once} and {recbuf, 16384}.
So if I have 5 gen_udp sockets receiving packets and 500 gen_udp sockets sending packets, I probably won't see a big performance difference from multi poll-sets.
How can I compare master with multiple poll-sets with erlang/otp 20?

Run your application and measure the difference in throughput/latency using external tools.

I will. Thanks.
 


Re: How to test multi-pollset?

pablo platt-3
Hello,

Updating with my test.

I've been using master in production for two days on 4 servers.
Each server has 4 vCPUs.
Each server handles about 10K UDP packets per second: approx. 1K incoming and 9K outgoing.

I didn't see any issues. It seems to work fine.
CPU load increased from 58% on OTP 20 to 68% on OTP 21 (master). Does this make sense?
This is a real production system that does other things besides sending UDP packets, but the OTP version is the only change.

Thanks





Re: How to test multi-pollset?

Lukas Larsson-8
Hello!

On Thu, Mar 8, 2018 at 8:43 PM, pablo platt <[hidden email]> wrote:
Hello,

Updating with my test.

I've been using master in production for two days on 4 servers.
Each server has 4 vCPUs.
Each server handles about 10K UDP packets per second. Approx 1K incoming packets per second and 9K outgoing packets per second.

I didn't see any issues. Seems to work fine.
 
That's great! Thanks for testing it! 

CPU load increased from 58% on OTP 20 to 68% on OTP 21 (master). Does this make sense?

Hmm, no, not really. I would have expected it to decrease.

Would you mind helping me to figure out why the CPU usage has gone up?

To start with I'd like to have a look at the output of

1> msacc:start(30000), msacc:print().

for both versions in the Erlang shell. Using this I hope to be able to narrow down where the extra CPU time is being spent.
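A sketch of how such a sample could be captured to a file for side-by-side comparison (the msacc calls are the ones quoted above; the -noshell/-eval wrapper and the file name are my own additions):

```shell
# Collect a 30-second microstate-accounting sample and save it,
# so the OTP 20 and master outputs can be diffed later.
erl -noshell -eval 'msacc:start(30000), msacc:print(), halt().' > msacc_master.txt
```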

It would also be great if you could run a few perf commands to compare the systems.

Ideally, it would be best if you could recompile beam with the --build-id linker flag, i.e.

> ./configure LDFLAGS="-Wl,--build-id"

That way it is possible to use "perf archive"[1] to collect the symbols of beam.smp.

> sudo perf stat -d -p $BEAMPID -- sleep 30 2> stat.log
> sudo perf record -g -p $BEAMPID -- sleep 30
> sudo perf archive
> sudo tar czf $BEAMPID.tar.gz stat.log perf.data.tar.bz2 perf.data

and send me the tar.gz for OTP-20 and master.

If you cannot recompile erts with --build-id, then just run "sudo perf report" and send me a screenshot of what the tool shows you.

If you prefer to do this off-list, we can do that as well.

Lukas

[1]: If the "perf archive" command does not work (which it doesn't on my machine), just download the script from here: https://elixir.bootlin.com/linux/v3.18/source/tools/perf/perf-archive.sh


Re: How to test multi-pollset?

pablo platt-3


On Fri, Mar 9, 2018 at 12:38 PM, Lukas Larsson <[hidden email]> wrote:
Ideally, it would be best if you could recompile beam with the --build-id linker flag, i.e.

> ./configure LDFLAGS="-Wl,--build-id"

I'll try to compile with kerl:
export LDFLAGS="-Wl,--build-id" && kerl build git https://github.com/erlang/otp 9bc4a096025254aed157e4753743be61ce1f7489 master

How can I verify that the flag was actually used?
 

That way it is possible to use "perf archive"[1] to collect the symbols of beam.smp.

> sudo perf stat -d -p $BEAMPID -- sleep 30 2> stat.log
> sudo perf record -g -p $BEAMPID -- sleep 30
> sudo perf archive
> sudo tar czf $BEAMPID.tar.gz stat.log perf.data.tar.bz2 perf.data


Can I run it on a dev machine with a smaller load, or does it have to be on a real production server?
What's the effect of perf (and msacc) on a production system? When I tried to do Erlang profiling in the past, it crashed my server.
 

Re: How to test multi-pollset?

Lukas Larsson-8


On Fri, Mar 9, 2018 at 11:58 AM, pablo platt <[hidden email]> wrote:


I'll try to compile with kerl:
export LDFLAGS="-Wl,--build-id" && kerl build git https://github.com/erlang/otp 9bc4a096025254aed157e4753743be61ce1f7489 master

How can I verify that the flag was actually used?

If you run "file path/to/beam.smp" you should get something like:

> file bin/x86_64-unknown-linux-gnu/beam.smp
bin/x86_64-unknown-linux-gnu/beam.smp: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=1a8ff129828d9ed4a8197d47e7731b10015b2456, not stripped

Notice the BuildID[sha1] part.

 


Can I run it on a dev machine with a smaller load, or does it have to be on a real production server?

If you can observe the same difference in a small system, then that is fine.
 
What's the effect of perf (and msacc) on a production system? When I tried to do Erlang profiling in the past, it crashed my server.

It should be very minimal; I doubt you will notice it. If you are unsure, you can lower the frequency at which perf collects data by adding the "-F NUMBER" flag to "perf record" with a lower-than-default value. The default is 4000.
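For example (the 500 Hz value is illustrative, not a recommendation), sampling at a reduced frequency while recording:

```shell
# Lower sampling frequency reduces perf's overhead on a loaded node.
# $BEAMPID is the pid of the running beam.smp process.
sudo perf record -F 500 -g -p $BEAMPID -- sleep 30
```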
 
 

Re: How to test multi-pollset?

pablo platt-3


On Fri, Mar 9, 2018 at 2:34 PM, Lukas Larsson <[hidden email]> wrote:



I'll try to reproduce on a dev machine and get the msacc and perf reports.
Thanks
 
 