UDP receive performance


Re: UDP receive performance

Jonas Falkevik


On Fri, Jun 15, 2018 at 12:07 PM, Jonas Falkevik <[hidden email]> wrote:
>
>
> Interesting... I wonder if maybe it wouldn't be better to solve this problem in the code for realloc so that no copy is done when a realloc of the same size is issued... that way we solve it in all places instead of only in the inet_driver
>

From 444fb00ff2a9d1f40a8c66f48bea1cf3f07ca86c Mon Sep 17 00:00:00 2001
From: jonasf <[hidden email]>
Date: Fri, 15 Jun 2018 15:55:38 +0200
Subject: [PATCH] erts realloc optimization same size

optimize the case when realloc is called with same size
---
 erts/emulator/beam/erl_binary.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/erts/emulator/beam/erl_binary.h b/erts/emulator/beam/erl_binary.h
index 7dfd0c273a9..9ba6aac1a0d 100644
--- a/erts/emulator/beam/erl_binary.h
+++ b/erts/emulator/beam/erl_binary.h
@@ -419,6 +419,8 @@ erts_bin_realloc_fnf(Binary *bp, Uint size)
     ErtsAlcType_t type = (bp->intern.flags & BIN_FLAG_DRV) ? ERTS_ALC_T_DRV_BINARY
 	                                            : ERTS_ALC_T_BINARY;
     ASSERT((bp->intern.flags & BIN_FLAG_MAGIC) == 0);
+    if (bp->orig_size == size)
+	return bp;
     if (bsize < size) /* overflow */
 	return NULL;
     nbp = erts_realloc_fnf(type, (void *) bp, bsize);
@@ -436,6 +438,8 @@ erts_bin_realloc(Binary *bp, Uint size)
     ErtsAlcType_t type = (bp->intern.flags & BIN_FLAG_DRV) ? ERTS_ALC_T_DRV_BINARY
 	                                            : ERTS_ALC_T_BINARY;
     ASSERT((bp->intern.flags & BIN_FLAG_MAGIC) == 0);
+    if (bp->orig_size == size)
+	return bp;
     if (bsize < size) /* overflow */
 	erts_realloc_enomem(type, bp, size);
     nbp = erts_realloc_fnf(type, (void *) bp, bsize);



Re: UDP receive performance

Lukas Larsson-8


On Fri, Jun 15, 2018 at 10:54 PM Jonas Falkevik <[hidden email]> wrote:


On Fri, Jun 15, 2018 at 12:07 PM, Jonas Falkevik <[hidden email]> wrote:
>
>
> Interesting... I wonder if maybe it wouldn't be better to solve this problem in the code for realloc so that no copy is done when a realloc of the same size is issued... that way we solve it in all places instead of only in the inet_driver
>



This makes it so that the absolute and relative single block shrink thresholds are respected for reallocs made on a remote scheduler. This should solve the problem that you have found as well, though as I still haven't reproduced it I can't test that it actually solves it.
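
(For anyone who wants to see what those thresholds are set to on their own node, here is a minimal shell sketch. It assumes the thresholds in question show up among the binary allocator's reported per-instance options; the exact key names can differ between releases.)

case erlang:system_info({allocator, binary_alloc}) of
    false ->
        %% binary_alloc is disabled on this node
        binary_alloc_disabled;
    Instances ->
        %% One entry per allocator instance; the single block carrier shrink
        %% settings should be among the reported options.
        [{instance, N, proplists:get_value(options, Info)}
         || {instance, N, Info} <- Instances]
end.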

Lukas


Re: UDP receive performance

Jonas Falkevik

On Mon, Jun 18, 2018 at 3:57 PM, Lukas Larsson <[hidden email]> wrote:



This makes it so that the absolute and relative single block shrink thresholds are respected for reallocs made on a remote scheduler. This should solve the problem that you have found as well, though as I still haven't reproduced it I can't test that it actually solves it.

 
Are you adding the multicast network to the loopback interface, or using some other interface that does not allow multicast traffic?

I have been able to reproduce it in macOS Sierra (Darwin Kernel Version 16.7.0: Fri Apr 27 17:59:46 PDT 2018; root:xnu-3789.73.13~1/RELEASE_X86_64 x86_64)
and also on Ubuntu Linux 16.04 LTS with kernel 4.4.0-112-generic.

Danil Zagoskin, do you have time to try and see if there is any change in the behaviour for you with a patched system?


With the bugfix, the BEAM spends most of its time in sched_spin_wait on Linux and sched_yield on macOS.

Stats below are from OTP 21, and then from OTP 21 with the bugfix, on Linux.

Erlang/OTP 21 [erts-10.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]

Eshell V10.0  (abort with ^G)
1> udptest:start_sender({239,9,9,9}, 3999).
<0.78.0>
2> [udptest:start_reader({239,9,9,9}, 3999) || _ <- lists:seq(1, 40)].
[<0.80.0>,<0.81.0>,<0.82.0>,<0.83.0>,<0.84.0>,<0.85.0>,
 <0.86.0>,<0.87.0>,<0.88.0>,<0.89.0>,<0.90.0>,<0.91.0>,
 <0.92.0>,<0.93.0>,<0.94.0>,<0.95.0>,<0.96.0>,<0.97.0>,
 <0.98.0>,<0.99.0>,<0.100.0>,<0.101.0>,<0.102.0>,<0.103.0>,
 <0.104.0>,<0.105.0>,<0.106.0>,<0.107.0>,<0.108.0>|...]
3> msacc:start(10000), msacc:print().
Average thread real-time    : 10002138 us
Accumulated system run-time : 62576106 us
Average scheduler run-time  :  7775309 us

        Thread      aux check_io emulator       gc    other     port    sleep

Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
       aux( 1)    0.06%    0.96%    0.00%    0.00%    0.07%    0.00%   98.90%
dirty_cpu_( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
      poll( 0)    0.00%    2.63%    0.00%    0.00%    0.00%    0.00%   97.37%
 scheduler( 1)    0.82%    0.00%    0.52%    0.35%    7.33%   67.91%   23.08%
 scheduler( 2)    0.76%    0.00%    0.79%    0.34%    7.30%   68.61%   22.21%
 scheduler( 3)    0.74%    0.00%    0.83%    0.33%    7.17%   68.54%   22.38%
 scheduler( 4)    0.76%    0.00%    0.72%    0.34%    7.38%   68.84%   21.96%
 scheduler( 5)    0.79%    0.00%    0.82%    0.34%    7.21%   68.94%   21.90%
 scheduler( 6)    0.80%    0.00%    0.74%    0.31%    7.15%   68.56%   22.44%
 scheduler( 7)    0.87%    0.00%    0.72%    0.37%    7.24%   68.43%   22.36%
 scheduler( 8)    0.77%    0.00%    0.68%    0.35%    7.41%   69.02%   21.77%

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
           aux    0.06%    0.96%    0.00%    0.00%    0.07%    0.00%   98.90%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
          poll    0.00%    2.63%    0.00%    0.00%    0.00%    0.00%   97.37%
     scheduler    0.79%    0.00%    0.73%    0.34%    7.27%   68.61%   22.26%
ok
4> 


$ ../src/udp_performace_bug_fix/bin/erl
Erlang/OTP 21 [RELEASE CANDIDATE 2] [erts-9.3.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]

Eshell V9.3.2  (abort with ^G)
1> udptest:start_sender({239,9,9,9}, 3999).
<0.78.0>
2> [udptest:start_reader({239,9,9,9}, 3999) || _ <- lists:seq(1, 40)].
[<0.80.0>,<0.81.0>,<0.82.0>,<0.83.0>,<0.84.0>,<0.85.0>,
 <0.86.0>,<0.87.0>,<0.88.0>,<0.89.0>,<0.90.0>,<0.91.0>,
 <0.92.0>,<0.93.0>,<0.94.0>,<0.95.0>,<0.96.0>,<0.97.0>,
 <0.98.0>,<0.99.0>,<0.100.0>,<0.101.0>,<0.102.0>,<0.103.0>,
 <0.104.0>,<0.105.0>,<0.106.0>,<0.107.0>,<0.108.0>|...]
3> msacc:start(10000), msacc:print().
Average thread real-time    : 10000416 us
Accumulated system run-time : 19541445 us
Average scheduler run-time  :  2279822 us

        Thread      aux check_io emulator       gc    other     port    sleep

Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
       aux( 1)    0.00%   10.72%    0.00%    0.00%    0.00%    0.00%   89.28%
dirty_cpu_( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
      poll( 0)    0.00%    2.31%    0.00%    0.00%    0.00%    0.00%   97.69%
 scheduler( 1)    0.46%    0.00%    0.43%    0.14%   17.57%    4.07%   77.33%
 scheduler( 2)    0.50%    0.00%    0.44%    0.15%   17.13%    4.21%   77.57%
 scheduler( 3)    0.49%    0.00%    0.47%    0.16%   17.07%    4.35%   77.46%
 scheduler( 4)    0.45%    0.00%    0.42%    0.14%   16.25%    4.02%   78.72%
 scheduler( 5)    0.48%    0.00%    0.42%    0.14%   17.04%    4.04%   77.88%
 scheduler( 6)    0.46%    0.00%    0.39%    0.14%   16.57%    3.77%   78.67%
 scheduler( 7)    0.53%    0.00%    1.88%    0.27%   17.60%    8.22%   71.50%
 scheduler( 8)    0.47%    0.00%    0.42%    0.15%   16.60%    3.86%   78.50%

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
           aux    0.00%   10.72%    0.00%    0.00%    0.00%    0.00%   89.28%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
          poll    0.00%    2.31%    0.00%    0.00%    0.00%    0.00%   97.69%
     scheduler    0.48%    0.00%    0.61%    0.16%   16.98%    4.57%   77.20%
ok

Jonas



Re: UDP receive performance

Max Lapshin-2
Wow!  This is a serious change.




Re: UDP receive performance

Lukas Larsson-8
In reply to this post by Jonas Falkevik
On Wed, Jun 20, 2018 at 1:56 PM Jonas Falkevik <[hidden email]> wrote:
 
Are you adding the multicast network to the loopback interface, or using some other interface that does not allow multicast traffic?


I thought I had, but apparently not... I managed to reproduce the test case now.

I also spent some time staring at the code in the inet_driver and realized what the problem was in there. It would seem that a performance/feature fix for SCTP in R15B inadvertently introduced this behaviour for UDP :( I've pushed a bugfix for that problem in the inet_driver to the same branch. In my tests the scheduler utilization goes from 71% to about 4% with both patches. With the fix in the inet driver, my allocator patch does not really make any difference for this specific testcase, but I'll keep that fix anyway as it is a "good thing". I should probably also add a benchmark for this so that it does not happen again...

I'd also like to add that changing the test-case to use a smaller user-space buffer also has the same effect. So if anyone is running a performance critical UDP server that has set the "recbuf" to a high value, I really recommend that you lower the "buffer" size to something close to the max expected packet size. In the example that Danil provided I applied the following patch:

diff --git a/udptest.erl b/udptest.erl
index 16a1798..4edeef0 100644
--- a/udptest.erl
+++ b/udptest.erl
@@ -33,7 +33,9 @@ send_packet(ID, S, Addr, Port) ->
 
 start_reader(Addr, Port) ->
   GwIP = {0,0,0,0}, % {127,0,0,1},
-  Common = [binary,{reuseaddr,true},{recbuf,2*1024*1024},inet,{read_packets,100},{active,500}],
+  Common = [binary,{reuseaddr,true},
+            {buffer,1500}, %% 1500 is just an example value, don't just copy this. You need to know what your max UDP packet size will be.
+            {recbuf,2*1024*1024},inet,{read_packets,100},{active,500}],
   Multicast = [{multicast_ttl,4},{multicast_loop,true},{ip,Addr},{add_membership,{Addr,GwIP}}],
   Options = Common ++ Multicast,
   spawn(fun() -> run_reader(Port, Options) end).

and the scheduler utilization dropped to about 4% there as well.

Lukas



Re: UDP receive performance

Rich Neswold-2
On Wed, Jun 20, 2018 at 12:07 PM Lukas Larsson <[hidden email]> wrote:
So if anyone is running a performance critical UDP server that has set the "recbuf" to a high value, I really recommend that you lower the "buffer" size to something close to the max expected packet size.

That's interesting because the official documentation says:

"It is recommended to have val(buffer) >= max(val(sndbuf),val(recbuf)) to avoid performance issues because of unnecessary copying."

Maybe the documentation is wrong because that doesn't make much sense; sndbuf and recbuf are sent to the kernel and are supposed to be bigger than the user's buffer.

--
Rich


Re: UDP receive performance

Lukas Larsson-8
On Wed, Jun 20, 2018 at 10:00 PM Rich Neswold <[hidden email]> wrote:
On Wed, Jun 20, 2018 at 12:07 PM Lukas Larsson <[hidden email]> wrote:
So if anyone is running a performance critical UDP server that has set the "recbuf" to a high value, I really recommend that you lower the "buffer" size to something close to the max expected packet size.

That's interesting because the official documentation says:

"It is recommended to have val(buffer) >= max(val(sndbuf),val(recbuf)) to avoid performance issues because of unnecessary copying."

Maybe the documentation is wrong because that doesn't make much sense; sndbuf and recbuf are sent to the kernel and are supposed to be bigger than the user's buffer.

I think that part of the documentation is mainly written with TCP in mind, not UDP. Also, following the docs would work fine if it weren't for the bug uncovered in this mail thread. I'll see what I can do about making the docs better.


Re: UDP receive performance

Сергей Прохоров-2
In reply to this post by Danil Zagoskin-2
Regarding this suggestion in the docs: I had some fun times because I blindly applied it to an application with a large number of not-very-active sockets.

On my machine, the default settings for the kernel buffers were
[{sndbuf,87040},{recbuf,372480}]

So I ended up with a ~360 kB userspace buffer per socket, and I had tens of thousands of sockets.

That was very difficult to track down (`erlang:memory()` showed a lot of `binary` memory, while the sum of the sizes of binaries referenced by processes (via `process_info(P, binary)`) was two orders of magnitude smaller).
I found the root cause only by intuition, and I still don't know of any tool that could have pointed me in the right direction. Even `[erlang:port_info(Port, memory) || Port <- erlang:ports()]` didn't show this memory.
I also had a Prometheus BEAM allocators dashboard like this one https://github.com/deadtrickster/beam-dashboards/blob/master/BEAM-memory_allocators.png and it showed 90% allocator utilization. So nothing looked suspicious except the extremely high memory usage.
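
A rough shell sketch of the kind of check that would have helped here: it sums the configured userspace buffer option over all inet ports, giving an estimate of the binary memory that no process references. (The driver names "udp_inet", "tcp_inet" and "sctp_inet" are an assumption about which ports carry such a buffer.)

%% Approximate userspace buffer memory held by inet ports. This memory is
%% reported as 'binary' by erlang:memory/1 but is not referenced by any
%% process, so process_info(P, binary) never shows it.
InetBufferBytes = fun() ->
    lists:sum(
      [case inet:getopts(P, [buffer]) of
           {ok, [{buffer, N}]} -> N;
           _ -> 0
       end || P <- erlang:ports(),
              lists:member(erlang:port_info(P, name),
                           [{name, "udp_inet"},
                            {name, "tcp_inet"},
                            {name, "sctp_inet"}])])
end,
InetBufferBytes().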


Re: UDP receive performance

Danil Zagoskin-2
In reply to this post by Jonas Falkevik
Hi!
Sorry for the late reply.



The patch works like a charm. Huge performance boost!
Before:
        Thread    alloc      aux      bif busy_wait check_io emulator      ets       gc  gc_full      nif    other     port     send    sleep   timers
Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
           aux    0.04%    0.05%    0.00%    0.00%    2.02%    0.00%    0.00%    0.00%    0.00%    0.00%    0.01%    0.00%    0.00%   97.88%    0.00%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
          poll    0.04%    0.00%    0.00%    0.00%    6.03%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   93.93%    0.00%
     scheduler   56.63%    0.80%    0.48%   16.13%    0.00%    4.07%    0.00%    1.24%    0.00%    0.00%    2.69%   14.89%    0.00%    2.80%    0.25%


After:
        Thread    alloc      aux      bif busy_wait check_io emulator      ets       gc  gc_full      nif    other     port     send    sleep   timers
Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
           aux    0.00%    0.00%    0.00%    0.00%   14.57%    0.00%    0.00%    0.00%    0.00%    0.00%    1.24%    0.00%    0.00%   84.19%    0.00%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
          poll    0.00%    0.00%    0.00%    0.00%    3.10%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   96.90%    0.00%
     scheduler    2.86%    0.78%    0.38%   57.22%    0.00%    2.81%    0.00%    0.49%    0.00%    0.00%    7.32%    9.87%    0.00%   18.08%    0.18%

The machine is quad-core i5-6400@2.70GHz with Linux 4.13.0
Load was generated with these commands (10 senders and 10 readers on each address):
[udptest:start_sender({239,9,9,X}, 3999) || X <- lists:seq(1,10)],
[udptest:start_reader({239,9,9,X}, 3999) || X <- lists:seq(1,10), _ <- lists:seq(1, 10)].

Thank you for the awesome work!

P.S. For some reason there is increased check_io time in the aux thread, but I think it's not critical and may be related to more frequent socket reads due to the large amount of idle time.


--
Danil Zagoskin | [hidden email]


Re: UDP receive performance

Danil Zagoskin-2
In reply to this post by Lukas Larsson-8
Hmm, it seems the value should be very close to the expected size.
Setting buffer to 2048 did not change anything significantly.
The results in my previous message are with a {buffer, 2048} option left over from earlier experiments.
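
For reference, a quick way to double-check what a socket actually ends up with (just a sketch; the reported values depend on the OS, and Linux for example reports roughly double the requested recbuf):

%% Open a throwaway socket with the same buffer/recbuf options as the test
%% and read back the effective values.
{ok, S} = gen_udp:open(0, [binary, {reuseaddr, true},
                           {buffer, 2048}, {recbuf, 2*1024*1024}]),
inet:getopts(S, [buffer, recbuf, sndbuf]).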



--
Danil Zagoskin | [hidden email]


Re: UDP receive performance

Michael Stellar-2
Looks good. Have these patches already been committed upstream?


Re: UDP receive performance

pablo platt-3
Does this bug only affect multicast UDP or also normal UDP?

What values should I use for buffer and recbuf on a UDP socket receiving 1 Mbps with a 1500-byte MTU?
I thought I needed an N*MTU recbuf so the pid would be able to handle a small burst of UDP packets.
This is what I currently have:
gen_udp:open(0, [binary, {active, once}, {recbuf, 16*1024}])



Re: UDP receive performance

Lukas Larsson-8
In reply to this post by Michael Stellar-2

On Fri, 22 Jun 2018, 03:08 Michael Stellar, <[hidden email]> wrote:
Looks good. Have these patches already been committed upstream?

No, and it will be a couple of weeks until it is. I'll post here when it is merged.


Re: UDP receive performance

Lukas Larsson-8
In reply to this post by pablo platt-3
On Fri, 22 Jun 2018, 09:24 pablo platt, <[hidden email]> wrote:
Does this bug only affect multicast UDP or also normal UDP?

It affects the reception of all UDP messages.


What values should I use for buffer and recbuf on a UDP socket receiving 1 Mbps with a 1500-byte MTU?
I thought I needed an N*MTU recbuf so the pid would be able to handle a small burst of UDP packets.

There is no need to set buffer larger than your MTU. The recbuf however should be large enough to handle any bursts that may happen. 

This is what I currently have:
gen_udp:open(0, [binary, {active, once}, {recbuf, 16*1024}])

I doubt that you will see any large performance difference by setting the buffer size to, let's say, 2*1024. But as always, it will depend on your application.
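
Something along these lines, purely as an illustration of the above (the numbers are made up, not a recommendation):

%% Userspace buffer close to one max-size packet; kernel recbuf sized to
%% absorb bursts.
{ok, Socket} = gen_udp:open(0, [binary, {active, once},
                                {buffer, 2048},
                                {recbuf, 256*1024}]).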


Re: UDP receive performance

Valentin Micic-5
In reply to this post by Lukas Larsson-8

Just one observation about inet:setopts/2; this behaviour may not be present in OTP 21, but it certainly was in earlier releases.

When setting the "user buffer", the order in which the options are specified seems to matter, e.g.

inet:setopts( S, [{buffer, 512}, {recbuf, 16#1ffff}, {sndbuf, 16#1ffff}]).  

will set the buffer to 512, whilst:

inet:setopts( S, [{recbuf, 16#1ffff}, {sndbuf, 16#1ffff}, {buffer, 512}]). 

will leave it at 16#1ffff! (i.e. set to the recbuf size).
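
A quick way to check this on a given release (a sketch using two sockets so the two calls do not influence each other):

{ok, A} = gen_udp:open(0, []),
{ok, B} = gen_udp:open(0, []),
ok = inet:setopts(A, [{buffer, 512}, {recbuf, 16#1ffff}, {sndbuf, 16#1ffff}]),
ok = inet:setopts(B, [{recbuf, 16#1ffff}, {sndbuf, 16#1ffff}, {buffer, 512}]),
{inet:getopts(A, [buffer, recbuf, sndbuf]),
 inet:getopts(B, [buffer, recbuf, sndbuf])}.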

Kind regards

V/



Re: UDP receive performance

Valentin Micic-5
In reply to this post by Lukas Larsson-8

High-volume testing we performed using earlier versions (R15+) yielded far fewer (or rather, no) packet drops when recbuf was set to a higher value.
I cannot recall the hardware specs, but I remember that without adjusting recbuf, packet drops started appearing at about 10,000 packets/sec.
With recbuf adjusted (arbitrarily set to 1 MB), we were able to push 70,000 packets/sec without a single drop throughout the tests.
The testing was not performed to measure the impact on CPU, but rather to establish that packet drops were a function of the recbuf size (and/or the sender's sndbuf size).
We concluded this to be the case for packet sizes not exceeding the MTU.

Kind regards

V/



Re: UDP receive performance

Lukas Larsson-8
In reply to this post by Lukas Larsson-8
On Fri, Jun 22, 2018 at 10:14 AM Lukas Larsson <[hidden email]> wrote:

On Fri, 22 Jun 2018, 03:08 Michael Stellar, <[hidden email]> wrote:
Looks good. Have these patches already been committed upstream?

No, and it will be a couple of weeks until it is. I'll post here when it is merged.

FYI: I just opened a PR with the fixes, if you want to track this issue: https://github.com/erlang/otp/pull/1876
 


Re: UDP receive performance

Max Lapshin-2
Will try to check it, because right now we are running a separate thread with libevent for capturing UDP =(


Re: UDP receive performance

Lukas Larsson-8
On Tue, Jul 17, 2018 at 11:10 AM Max Lapshin <[hidden email]> wrote:
Will try to check it, because right now we are running a separate thread with libevent for capturing UDP =(

I suppose you didn't find time to test it yet? If you need any further help or find that it still isn't fast enough for you, I would be very interested in trying to make it fast enough.

I stumbled upon the recvmmsg syscall today, which could replace the read_packets option and significantly decrease the number of roundtrips into the kernel that you have to make.

Lukas


Re: UDP receive performance

pablo platt-3
The PR is merged. Will the fix be in the OTP 21 branch and the next point release?
