Dirty CPU schedulers stuck at zero utilization

Dirty CPU schedulers stuck at zero utilization

Jesse Stimpson
Hello,

We run a large-scale WebRTC-based VoIP application, and we are currently evaluating whether dirty CPU schedulers can help us get more consistent performance with some critical NIFs. We've created a test that mimics our production load and have run it on AWS EC2 instances of various sizes. When dirty CPU schedulers are working as expected, we do see measurable improvement in performance, but unfortunately we have seen that they can drop to 0 utilization. They do not recover from this state.

The timings from 2 of our 12 NIFs are in the vicinity of the 1 msec rule of thumb, so we have experimented with them both on and off dirty CPU schedulers. We've left the other NIFs on the traditional schedulers. More detailed NIF timings can be found below.

At the end of this message is the concerning data (from recon:scheduler_usage/1). We have schedulers 1-36 as the traditional schedulers, bound to the first hardware thread of each core, and schedulers 37-72 as dirty CPU schedulers. Based on mpstat, it appears the dirty CPU schedulers are not bound to threads. After about 10 minutes of running the load test, schedulers 42, 43, 44, 61, 63, 64, 66, 68, 72 went to 0 utilization and stayed there indefinitely. As time goes on, more dirty CPU schedulers enter this 0-utilization state. Schedulers 1-36 do not exhibit this behaviour. We have reproduced this consistently on other EC2 c5 instances of various sizes.
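(Side note for anyone reproducing this: the dirty CPU slots can be picked out of the recon output with a sketch like the one below. It is not part of our test code; it just assumes, as in the dump at the end of this message, that scheduler ids above erlang:system_info(schedulers) are the dirty CPU schedulers.)

%% Sketch only: sample scheduler usage for one second and list the
%% dirty CPU schedulers that report zero utilization.
1> Normal = erlang:system_info(schedulers).
2> DirtyUsage = [SU || {Id, _} = SU <- recon:scheduler_usage(1000), Id > Normal].
3> StuckDirty = [Id || {Id, U} <- DirtyUsage, U == 0.0].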

We have tried various beam adjustments, but we have seen the same behaviour each time. Some examples of things we have tried:
+sbwt very_long +sbwtdcpu very_long +swt very_low +swtdcpu very_low
+sfwi 500

Is there anything else that we can try to avoid this 0-utilization issue? We are happy to provide more data if needed.

NIF: webrtc_acm_add_data_10ms
median: 606 usec
90 percentile: 1230 usec
99 percentile: 2285 usec
99.9 percentile: 3223 usec

NIF: webrtc_acm_playout_data_10ms
median: 98 usec
90 percentile: 371 usec
99 percentile: 410 usec
99.9 percentile: 449 usec
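(The percentiles above are per-call timings in microseconds. A minimal sketch of how samples like these could be gathered is below; the module, function, and table names are placeholders, not our actual instrumentation.)

%% Assumes ets:new(nif_timings, [named_table, public, duplicate_bag])
%% was created at startup.
timed_add_data_10ms(Handle, Frame) ->
    {Usecs, Result} = timer:tc(webrtc_acm, add_data_10ms, [Handle, Frame]),
    ets:insert(nif_timings, {add_data_10ms, Usecs}),
    Result.

percentile(Name, P) when P > 0, P =< 1 ->
    Samples = lists:sort([U || {N, U} <- ets:tab2list(nif_timings), N =:= Name]),
    lists:nth(max(1, round(P * length(Samples))), Samples).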


EC2 instance: c5.18xlarge

OS: Linux 4.4.0-1073-aws #83-Ubuntu SMP Sat Nov 17 00:26:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

beam flags: +P 4194304 +sbt db +SP 50:50 +sub true +A 128 +K true

Erlang banner: Erlang/OTP 21 [erts-10.2] [source] [64-bit] [smp:36:36] [ds:36:36:10] [async-threads:128] [hipe]


cpu_topology:
[{node,[{processor,[{core,[{thread,{logical,0}},
                           {thread,{logical,36}}]},
                    {core,[{thread,{logical,1}},{thread,{logical,37}}]},
                    {core,[{thread,{logical,2}},{thread,{logical,38}}]},
                    {core,[{thread,{logical,3}},{thread,{logical,39}}]},
                    {core,[{thread,{logical,4}},{thread,{logical,40}}]},
                    {core,[{thread,{logical,5}},{thread,{logical,41}}]},
                    {core,[{thread,{logical,6}},{thread,{logical,42}}]},
                    {core,[{thread,{logical,7}},{thread,{logical,43}}]},
                    {core,[{thread,{logical,8}},{thread,{logical,44}}]},
                    {core,[{thread,{logical,9}},{thread,{logical,45}}]},
                    {core,[{thread,{logical,10}},{thread,{logical,46}}]},
                    {core,[{thread,{logical,11}},{thread,{logical,47}}]},
                    {core,[{thread,{logical,12}},{thread,{logical,48}}]},
                    {core,[{thread,{logical,13}},{thread,{logical,49}}]},
                    {core,[{thread,{logical,14}},{thread,{logical,50}}]},
                    {core,[{thread,{logical,15}},{thread,{logical,51}}]},
                    {core,[{thread,{logical,16}},{thread,{logical,52}}]},
                    {core,[{thread,{logical,17}},{thread,{logical,53}}]}]}]},
 {node,[{processor,[{core,[{thread,{logical,18}},
                           {thread,{logical,54}}]},
                    {core,[{thread,{logical,19}},{thread,{logical,55}}]},
                    {core,[{thread,{logical,20}},{thread,{logical,56}}]},
                    {core,[{thread,{logical,21}},{thread,{logical,57}}]},
                    {core,[{thread,{logical,22}},{thread,{logical,58}}]},
                    {core,[{thread,{logical,23}},{thread,{logical,59}}]},
                    {core,[{thread,{logical,24}},{thread,{logical,60}}]},
                    {core,[{thread,{logical,25}},{thread,{logical,61}}]},
                    {core,[{thread,{logical,26}},{thread,{logical,62}}]},
                    {core,[{thread,{logical,27}},{thread,{logical,63}}]},
                    {core,[{thread,{logical,28}},{thread,{logical,64}}]},
                    {core,[{thread,{logical,29}},{thread,{logical,65}}]},
                    {core,[{thread,{logical,30}},{thread,{logical,66}}]},
                    {core,[{thread,{logical,31}},{thread,{logical,67}}]},
                    {core,[{thread,{logical,32}},{thread,{logical,68}}]},
                    {core,[{thread,{logical,33}},{thread,{logical,69}}]},
                    {core,[{thread,{logical,34}},{thread,{logical,70}}]},
                    {core,[{thread,{logical,35}},{thread,{logical,71}}]}]}]}]

scheduler_bindings: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29,30,31,32,33,34,35}
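(The topology and bindings above are as reported by erlang:system_info/1, e.g.:)

1> erlang:system_info(cpu_topology).
2> erlang:system_info(scheduler_bindings).
3> erlang:system_info(dirty_cpu_schedulers).   %% 36 on this instance, per the [ds:36:36:10] banner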

CPU utilization (from mpstat) during our load test:

Linux 4.4.0-1073-aws (isw-core-x-qa-awsoh02)    01/04/2019 _x86_64_ (72 CPU)


08:55:15 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
08:55:15 PM  all   51.58    0.00    1.64    0.01    0.00    0.42    0.00    0.00    0.00   46.36
08:55:15 PM    0   62.34    0.00    2.69    0.06    0.00    0.00    0.00    0.00    0.00   34.90
08:55:15 PM    1   62.84    0.00    2.58    0.02    0.00    0.00    0.00    0.00    0.00   34.55
08:55:15 PM    2   62.04    0.00    2.52    0.00    0.00    0.00    0.00    0.00    0.00   35.44
08:55:15 PM    3   62.17    0.00    2.62    0.00    0.00    0.00    0.00    0.00    0.00   35.21
08:55:15 PM    4   62.14    0.00    2.54    0.02    0.00    0.00    0.00    0.00    0.00   35.30
08:55:15 PM    5   62.06    0.00    2.51    0.00    0.00    0.00    0.00    0.00    0.00   35.43
08:55:15 PM    6   61.93    0.00    2.58    0.01    0.00    0.00    0.02    0.00    0.00   35.46
08:55:15 PM    7   61.99    0.00    2.54    0.00    0.00    0.00    0.00    0.00    0.00   35.48
08:55:15 PM    8   62.01    0.00    2.52    0.00    0.00    0.00    0.00    0.00    0.00   35.47
08:55:15 PM    9   61.88    0.00    2.49    0.00    0.00    0.00    0.00    0.00    0.00   35.63
08:55:15 PM   10   62.18    0.00    2.48    0.00    0.00    0.00    0.00    0.00    0.00   35.34
08:55:15 PM   11   61.89    0.00    2.47    0.00    0.00    0.00    0.00    0.00    0.00   35.64
08:55:15 PM   12   61.98    0.00    2.50    0.00    0.00    0.00    0.00    0.00    0.00   35.52
08:55:15 PM   13   61.80    0.00    2.54    0.00    0.00    0.00    0.00    0.00    0.00   35.66
08:55:15 PM   14   61.96    0.00    2.56    0.00    0.00    0.00    0.00    0.00    0.00   35.48
08:55:15 PM   15   62.11    0.00    2.57    0.00    0.00    0.00    0.00    0.00    0.00   35.31
08:55:15 PM   16   62.03    0.00    2.55    0.00    0.00    0.00    0.00    0.00    0.00   35.42
08:55:15 PM   17   61.90    0.00    2.53    0.00    0.00    0.00    0.00    0.00    0.00   35.57
08:55:15 PM   18   58.36    0.00    2.41    0.09    0.00    3.72    0.00    0.00    0.00   35.43
08:55:15 PM   19   58.29    0.00    2.45    0.02    0.00    4.00    0.00    0.00    0.00   35.24
08:55:15 PM   20   58.45    0.00    2.36    0.02    0.00    3.73    0.00    0.00    0.00   35.44
08:55:15 PM   21   58.56    0.00    2.34    0.04    0.00    3.54    0.00    0.00    0.00   35.51
08:55:15 PM   22   58.12    0.00    2.46    0.00    0.00    3.84    0.00    0.00    0.00   35.58
08:55:15 PM   23   58.24    0.00    2.36    0.01    0.00    3.82    0.00    0.00    0.00   35.57
08:55:15 PM   24   58.84    0.00    2.42    0.00    0.00    3.60    0.00    0.00    0.00   35.14
08:55:15 PM   25   58.35    0.00    2.43    0.00    0.00    3.72    0.00    0.00    0.00   35.51
08:55:15 PM   26   62.09    0.00    2.47    0.00    0.00    0.00    0.00    0.00    0.00   35.45
08:55:15 PM   27   62.09    0.00    2.46    0.00    0.00    0.00    0.00    0.00    0.00   35.45
08:55:15 PM   28   62.15    0.00    2.48    0.00    0.00    0.00    0.00    0.00    0.00   35.37
08:55:15 PM   29   62.09    0.00    2.46    0.00    0.00    0.00    0.00    0.00    0.00   35.45
08:55:15 PM   30   62.74    0.00    2.45    0.00    0.00    0.00    0.00    0.00    0.00   34.81
08:55:15 PM   31   62.14    0.00    2.47    0.00    0.00    0.00    0.00    0.00    0.00   35.39
08:55:15 PM   32   62.35    0.00    2.52    0.00    0.00    0.00    0.00    0.00    0.00   35.13
08:55:15 PM   33   62.13    0.00    2.42    0.00    0.00    0.00    0.00    0.00    0.00   35.46
08:55:15 PM   34   62.08    0.00    2.47    0.00    0.00    0.00    0.00    0.00    0.00   35.45
08:55:15 PM   35   62.15    0.00    2.49    0.00    0.00    0.00    0.00    0.00    0.00   35.35
08:55:15 PM   36   43.19    0.00    0.80    0.00    0.00    0.00    0.00    0.00    0.00   56.01
08:55:15 PM   37   43.09    0.00    0.83    0.01    0.00    0.00    0.00    0.00    0.00   56.06
08:55:15 PM   38   42.59    0.00    0.78    0.01    0.00    0.00    0.00    0.00    0.00   56.63
08:55:15 PM   39   42.60    0.00    0.82    0.00    0.00    0.00    0.00    0.00    0.00   56.58
08:55:15 PM   40   42.94    0.00    0.76    0.02    0.00    0.00    0.00    0.00    0.00   56.28
08:55:15 PM   41   42.11    0.00    0.76    0.01    0.00    0.00    0.00    0.00    0.00   57.13
08:55:15 PM   42   43.26    0.00    0.85    0.01    0.00    0.00    0.00    0.00    0.00   55.88
08:55:15 PM   43   42.23    0.00    0.77    0.01    0.00    0.00    0.00    0.00    0.00   56.99
08:55:15 PM   44   42.24    0.00    0.80    0.00    0.00    0.00    0.00    0.00    0.00   56.95
08:55:15 PM   45   41.65    0.00    0.79    0.01    0.00    0.00    0.00    0.00    0.00   57.55
08:55:15 PM   46   41.76    0.00    0.77    0.01    0.00    0.00    0.00    0.00    0.00   57.46
08:55:15 PM   47   42.66    0.00    0.82    0.00    0.00    0.00    0.00    0.00    0.00   56.52
08:55:15 PM   48   41.83    0.00    0.76    0.01    0.00    0.00    0.00    0.00    0.00   57.40
08:55:15 PM   49   42.04    0.00    0.82    0.00    0.00    0.00    0.00    0.00    0.00   57.13
08:55:15 PM   50   41.55    0.00    0.78    0.01    0.00    0.00    0.00    0.00    0.00   57.67
08:55:15 PM   51   42.37    0.00    0.78    0.00    0.00    0.00    0.00    0.00    0.00   56.84
08:55:15 PM   52   41.66    0.00    0.77    0.00    0.00    0.00    0.00    0.00    0.00   57.57
08:55:15 PM   53   41.77    0.00    0.79    0.00    0.00    0.00    0.00    0.00    0.00   57.45
08:55:15 PM   54   40.33    0.00    0.88    0.01    0.00    0.00    0.00    0.00    0.00   58.79
08:55:15 PM   55   41.62    0.00    0.78    0.01    0.00    0.00    0.00    0.00    0.00   57.59
08:55:15 PM   56   41.51    0.00    0.82    0.00    0.00    0.00    0.00    0.00    0.00   57.67
08:55:15 PM   57   40.35    0.00    0.76    0.00    0.00    0.00    0.00    0.00    0.00   58.89
08:55:15 PM   58   41.92    0.00    0.79    0.01    0.00    0.00    0.00    0.00    0.00   57.28
08:55:15 PM   59   41.48    0.00    0.80    0.00    0.00    0.00    0.00    0.00    0.00   57.72
08:55:15 PM   60   40.83    0.00    0.75    0.01    0.00    0.00    0.00    0.00    0.00   58.41
08:55:15 PM   61   40.95    0.00    0.78    0.01    0.00    0.00    0.00    0.00    0.00   58.26
08:55:15 PM   62   40.85    0.00    0.80    0.01    0.00    0.00    0.00    0.00    0.00   58.35
08:55:15 PM   63   42.11    0.00    0.77    0.01    0.00    0.00    0.00    0.00    0.00   57.12
08:55:15 PM   64   42.22    0.00    0.78    0.01    0.00    0.00    0.00    0.00    0.00   56.99
08:55:15 PM   65   41.55    0.00    0.78    0.00    0.00    0.00    0.00    0.00    0.00   57.67
08:55:15 PM   66   41.68    0.00    0.81    0.00    0.00    0.00    0.00    0.00    0.00   57.51
08:55:15 PM   67   41.06    0.00    0.87    0.01    0.00    0.00    0.00    0.00    0.00   58.07
08:55:15 PM   68   41.80    0.00    0.79    0.00    0.00    0.00    0.00    0.00    0.00   57.41
08:55:15 PM   69   41.38    0.00    0.75    0.00    0.00    0.00    0.00    0.00    0.00   57.87
08:55:15 PM   70   42.17    0.00    0.75    0.00    0.00    0.00    0.00    0.00    0.00   57.08
08:55:15 PM   71   41.26    0.00    0.81    0.00    0.00    0.00    0.01    0.00    0.00   57.92



1> rp(recon:scheduler_usage(1000)).
[{1,0.8581586716045433},
 {2,0.8607597847736659},
 {3,0.8501543449289042},
 {4,0.8619718426658023},
 {5,0.8660226578705742},
 {6,0.8555763933364806},
 {7,0.8557195972097537},
 {8,0.8668284625654983},
 {9,0.858588675506532},
 {10,0.8650501369640686},
 {11,0.8615557214275474},
 {12,0.8523392789683996},
 {13,0.8469329348407398},
 {14,0.8615637534991641},
 {15,0.8591930739986557},
 {16,0.8578596164201188},
 {17,0.8549696086049403},
 {18,0.8677098230685564},
 {19,0.8556328340054443},
 {20,0.8642542459479063},
 {21,0.8670655179064826},
 {22,0.8562638308551557},
 {23,0.8511202058055239},
 {24,0.8697339264798696},
 {25,0.8594823462601853},
 {26,0.8634550008001265},
 {27,0.8568590130172198},
 {28,0.864025757952233},
 {29,0.8575614770622326},
 {30,0.8611760239143281},
 {31,0.8589789125669829},
 {32,0.8537754468447621},
 {33,0.8675538912040871},
 {34,0.8607522803519515},
 {35,0.8639389099445753},
 {36,0.8743759133171144},
 {37,0.9526507413773687},
 {38,0.948925611917577},
 {39,0.9511904612810083},
 {40,0.9581015421004746},
 {41,0.9534977201467516},
 {42,0.0},
 {43,0.0},
 {44,0.0},
 {45,0.9522518499948341},
 {46,0.9453306755503674},
 {47,0.9586374601140047},
 {48,0.9574611845671873},
 {49,0.950914304057791},
 {50,0.9582124294398642},
 {51,0.9448074463900029},
 {52,0.9474771999511407},
 {53,0.9450778098401229},
 {54,0.0},
 {55,0.9478167911711937},
 {56,0.9612587860764551},
 {57,0.943468696358506},
 {58,0.9509498642900455},
 {59,0.9603595134495414},
 {60,0.9517397912369205},
 {61,0.0},
 {62,0.9587029756970658},
 {63,0.0},
 {64,0.0},
 {65,0.9526167902065421},
 {66,0.0},
 {67,0.9475668680715805},
 {68,0.0},
 {69,0.9630301804839606},
 {70,0.9475359592879946},
 {71,0.9541718747681374},
 {72,0.0}]



Thanks for your time,

Jesse Stimpson

Re: Dirty CPU schedulers stuck at zero utilization

Jonas Falkevik
Hi,
This sounds like the issue I bumped into.
After some time, many of the dirty schedulers were not performing any work anymore, which turned out to be related to a corrupted list of sleeping dirty schedulers.

I haven’t found what triggered it in my case (maybe a GC?). For me it took approx. 1 day before it happened.

Doing a Multi Scheduler Block can trigger the problem: the list of sleeping dirty schedulers gets corrupted.
The sleepers list can end up containing only one or a few schedulers. That list is later used to wake up schedulers when there is work to do.
For each enqueued dirty-cpu or dirty-io job, it wakes the first scheduler in the sleepers list.
But if this list only points to a few schedulers, only those are woken up to pick jobs from the queue.
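(Purely as an illustration of that wake-up pattern, not the actual ERTS C code: a scheduler id missing from the sleepers list is never woken, no matter how many jobs are enqueued.)

%% Toy Erlang model of the behaviour described above.
woken(Sleepers, NJobs) -> woken(Sleepers, NJobs, []).

woken([], _NJobs, Acc) -> lists:usort(Acc);
woken(_Sleepers, 0, Acc) -> lists:usort(Acc);
woken([First | Rest], NJobs, Acc) ->
    %% the head sleeper is woken for the job and later goes back to sleep
    woken(Rest ++ [First], NJobs - 1, [First | Acc]).

With a healthy list, woken(lists:seq(37, 72), 1000) covers every dirty scheduler; with a corrupted list that retained only a few entries, only those ids can ever show up, and every scheduler that fell out of the list stays at zero utilization.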

Since fixing that specific problem I have not seen the collapse of dirty schedulers anymore.

It would be great if you can test and see if it fixes your problem as well, https://github.com/erlang/otp/pull/2027
Any feedback on the bug fix is appreciated.

/Jonas

Re: Dirty CPU schedulers stuck at zero utilization

Jesse Stimpson
Rikard and Jonas,

Thank you for your response. With the patch at OTP PR-2027, we do observe that the dirty CPU schedulers do not drop to zero utilization, but in addition we see that the performance of the traditional schedulers suffers. Specifically, when we apply our load test, scheduler usage toggles between the following two states. I've snipped the output of similar values for readability.

[{1,0.8933020805456553},
 ...
 {36,0.8882068476700309},
 {37,0.4932569692588096},
 ...
 {72,0.49842881346116297}]

and

[{1,1.0},
 ...
 {36,1.0},
 {37,0.42909590032946493},
 ...
 {72,0.4641339106204525}]


When applying this same load to the un-patched Erlang 21.2, the traditional schedulers are utilized at about 75%. So it appears that the 2027 patch may put undesirable pressure on the schedulers as a trade-off for consistent performance from the dirty CPU schedulers. Are you observing a similar response on your system?
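(For reference, a small helper along these lines could be used to track the two ranges over time; this is only a sketch, not part of our load test tooling.)

%% Sketch: average utilization of the normal vs dirty CPU scheduler ranges.
avg_usage(WindowMs) ->
    Normal = erlang:system_info(schedulers),
    {Norm, Dirty} = lists:partition(fun({Id, _}) -> Id =< Normal end,
                                    recon:scheduler_usage(WindowMs)),
    Avg = fun(L) -> lists:sum([U || {_, U} <- L]) / length(L) end,
    {Avg(Norm), Avg(Dirty)}.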

Thanks,
Jesse

--
Jesse Stimpson
Site Reliability Engineering
m: 9199950424
RepublicWireless.com


Re: Dirty CPU schedulers stuck at zero utilization

Jonas Falkevik
Hi Jesse,
we have not observed the behaviour you see.
Can you share microstate accounting stats for the different cases?
Maybe that could shed some light on what the schedulers are doing?
1> msacc:start(1000), msacc:print().
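For example, it may help to dump the raw stats of each run to a file so they can be compared afterwards (the file name is just an example; msacc ships with runtime_tools):

1> msacc:start(1000).                          %% collect microstate accounting for ~1 second
2> msacc:print().                              %% per-thread-type summary
3> msacc:to_file("msacc_patched_run.stats").   %% keep the raw stats for later comparison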

It might also be interesting to run perf top as root on one of the scheduler threads that are fully utilised.
# perf top --tid <traditional scheduler thread id>

/Jonas

> 08:55:15 PM   54   40.33    0.00    0.88    0.01    0.00    0.00    0.00    0.00    0.00   58.79
> 08:55:15 PM   55   41.62    0.00    0.78    0.01    0.00    0.00    0.00    0.00    0.00   57.59
> 08:55:15 PM   56   41.51    0.00    0.82    0.00    0.00    0.00    0.00    0.00    0.00   57.67
> 08:55:15 PM   57   40.35    0.00    0.76    0.00    0.00    0.00    0.00    0.00    0.00   58.89
> 08:55:15 PM   58   41.92    0.00    0.79    0.01    0.00    0.00    0.00    0.00    0.00   57.28
> 08:55:15 PM   59   41.48    0.00    0.80    0.00    0.00    0.00    0.00    0.00    0.00   57.72
> 08:55:15 PM   60   40.83    0.00    0.75    0.01    0.00    0.00    0.00    0.00    0.00   58.41
> 08:55:15 PM   61   40.95    0.00    0.78    0.01    0.00    0.00    0.00    0.00    0.00   58.26
> 08:55:15 PM   62   40.85    0.00    0.80    0.01    0.00    0.00    0.00    0.00    0.00   58.35
> 08:55:15 PM   63   42.11    0.00    0.77    0.01    0.00    0.00    0.00    0.00    0.00   57.12
> 08:55:15 PM   64   42.22    0.00    0.78    0.01    0.00    0.00    0.00    0.00    0.00   56.99
> 08:55:15 PM   65   41.55    0.00    0.78    0.00    0.00    0.00    0.00    0.00    0.00   57.67
> 08:55:15 PM   66   41.68    0.00    0.81    0.00    0.00    0.00    0.00    0.00    0.00   57.51
> 08:55:15 PM   67   41.06    0.00    0.87    0.01    0.00    0.00    0.00    0.00    0.00   58.07
> 08:55:15 PM   68   41.80    0.00    0.79    0.00    0.00    0.00    0.00    0.00    0.00   57.41
> 08:55:15 PM   69   41.38    0.00    0.75    0.00    0.00    0.00    0.00    0.00    0.00   57.87
> 08:55:15 PM   70   42.17    0.00    0.75    0.00    0.00    0.00    0.00    0.00    0.00   57.08
> 08:55:15 PM   71   41.26    0.00    0.81    0.00    0.00    0.00    0.01    0.00    0.00   57.92
>
>
> 1> rp(recon:scheduler_usage(1000)).
> [{1,0.8581586716045433},
>  {2,0.8607597847736659},
>  {3,0.8501543449289042},
>  {4,0.8619718426658023},
>  {5,0.8660226578705742},
>  {6,0.8555763933364806},
>  {7,0.8557195972097537},
>  {8,0.8668284625654983},
>  {9,0.858588675506532},
>  {10,0.8650501369640686},
>  {11,0.8615557214275474},
>  {12,0.8523392789683996},
>  {13,0.8469329348407398},
>  {14,0.8615637534991641},
>  {15,0.8591930739986557},
>  {16,0.8578596164201188},
>  {17,0.8549696086049403},
>  {18,0.8677098230685564},
>  {19,0.8556328340054443},
>  {20,0.8642542459479063},
>  {21,0.8670655179064826},
>  {22,0.8562638308551557},
>  {23,0.8511202058055239},
>  {24,0.8697339264798696},
>  {25,0.8594823462601853},
>  {26,0.8634550008001265},
>  {27,0.8568590130172198},
>  {28,0.864025757952233},
>  {29,0.8575614770622326},
>  {30,0.8611760239143281},
>  {31,0.8589789125669829},
>  {32,0.8537754468447621},
>  {33,0.8675538912040871},
>  {34,0.8607522803519515},
>  {35,0.8639389099445753},
>  {36,0.8743759133171144},
>  {37,0.9526507413773687},
>  {38,0.948925611917577},
>  {39,0.9511904612810083},
>  {40,0.9581015421004746},
>  {41,0.9534977201467516},
>  {42,0.0},
>  {43,0.0},
>  {44,0.0},
>  {45,0.9522518499948341},
>  {46,0.9453306755503674},
>  {47,0.9586374601140047},
>  {48,0.9574611845671873},
>  {49,0.950914304057791},
>  {50,0.9582124294398642},
>  {51,0.9448074463900029},
>  {52,0.9474771999511407},
>  {53,0.9450778098401229},
>  {54,0.0},
>  {55,0.9478167911711937},
>  {56,0.9612587860764551},
>  {57,0.943468696358506},
>  {58,0.9509498642900455},
>  {59,0.9603595134495414},
>  {60,0.9517397912369205},
>  {61,0.0},
>  {62,0.9587029756970658},
>  {63,0.0},
>  {64,0.0},
>  {65,0.9526167902065421},
>  {66,0.0},
>  {67,0.9475668680715805},
>  {68,0.0},
>  {69,0.9630301804839606},
>  {70,0.9475359592879946},
>  {71,0.9541718747681374},
>  {72,0.0}]
>
>
> Thanks for your time,
>
> Jesse Stimpson
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions



--

Jesse Stimpson

Site Reliability Engineering

m: 9199950424
RepublicWireless.com


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Jesse Stimpson
Jonas,

Below is an excerpt from msacc for one scheduler while the system is under moderate load (not saturated, unlike in my previous data). On a periodic basis, the schedulers under the 2027 patch show significantly higher utilization in the emulator column. At times I saw this spike occur every 5 minutes on the nose; other times it was less predictable, but it typically recurs on the order of minutes (3 - 7 minutes). The full output is included as well.

       Thread    aux check_io emulator       gc other port sleep
scheduler(36)    0.68% 0.22% 35.16%    0.55% 55.38% 1.35% 6.66%
scheduler(36)    0.39% 0.08% 76.21%    0.48% 19.52% 1.03% 2.29%


I've not yet been able to get the thread id for a scheduler in order to run perf. There is a past mailing list thread suggesting the use of "ps -Leo pid,tid,comm | grep beam", but this is not returning the thread ids on my current system.
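One alternative I may try is reading the thread names directly from /proc on the node itself. This is an untested sketch; it assumes a Linux /proc filesystem and that ERTS names its OS threads, and the exact thread names vary between OTP releases:

%% Untested sketch: list {OsThreadId, ThreadName} for every thread of this
%% emulator by reading /proc/<os pid>/task/<tid>/comm (Linux only).
%% Assumes ERTS names its threads; the exact names vary by OTP release.
list_beam_threads() ->
    Dir = "/proc/" ++ os:getpid() ++ "/task",
    {ok, Tids} = file:list_dir(Dir),
    [begin
         {ok, Name} = file:read_file(filename:join([Dir, Tid, "comm"])),
         {Tid, string:trim(Name)}
     end || Tid <- Tids].

If the names distinguish normal from dirty schedulers, the matching tid could then be passed to perf top --tid.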



Average thread real-time    : 1000868 us

Accumulated system run-time : 52140693 us

Average scheduler run-time  : 929256 us


       Thread    aux check_io emulator       gc other port sleep


Stats per thread:

    async( 0)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 1)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 2)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 3)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 4)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 5)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 6)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 7)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 8)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 9)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(10)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(11)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(12)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(13)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(14)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(15)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(16)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(17)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(18)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(19)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(20)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(21)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(22)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(23)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(24)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(25)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(26)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(27)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(28)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(29)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(30)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(31)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(32)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(33)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(34)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(35)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(36)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(37)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(38)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(39)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(40)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(41)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(42)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(43)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(44)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(45)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(46)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(47)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(48)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(49)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(50)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(51)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(52)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(53)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(54)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(55)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(56)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(57)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(58)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(59)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(60)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(61)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(62)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(63)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(64)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(65)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(66)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(67)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(68)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(69)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(70)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(71)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(72)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(73)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(74)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(75)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(76)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(77)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(78)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(79)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(80)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(81)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(82)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(83)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(84)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(85)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(86)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(87)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(88)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(89)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(90)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(91)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(92)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(93)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(94)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(95)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(96)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(97)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(98)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(99)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

      aux( 1)    3.61% 1.11%    0.00% 0.00% 0.03%    0.00% 95.24%

dirty_cpu_( 1)    0.00% 0.00% 35.00%    0.00% 19.49% 0.00% 45.51%

dirty_cpu_( 2)    0.00% 0.00% 34.84%    0.00% 18.28% 0.00% 46.89%

dirty_cpu_( 3)    0.00% 0.00% 35.72%    0.00% 19.16% 0.00% 45.11%

dirty_cpu_( 4)    0.00% 0.00% 30.97%    0.00% 16.72% 0.00% 52.30%

dirty_cpu_( 5)    0.00% 0.00% 30.20%    0.00% 18.21% 0.00% 51.58%

dirty_cpu_( 6)    0.00% 0.00% 33.56%    0.00% 17.62% 0.00% 48.82%

dirty_cpu_( 7)    0.00% 0.00% 33.94%    0.00% 17.94% 0.00% 48.12%

dirty_cpu_( 8)    0.00% 0.00% 32.99%    0.00% 18.24% 0.00% 48.77%

dirty_cpu_( 9)    0.00% 0.00% 31.68%    0.00% 17.47% 0.00% 50.86%

dirty_cpu_(10)    0.00% 0.00% 34.06%    0.00% 19.18% 0.00% 46.75%

dirty_cpu_(11)    0.00% 0.00% 33.91%    0.00% 17.87% 0.00% 48.23%

dirty_cpu_(12)    0.00% 0.00% 30.02%    0.00% 19.52% 0.00% 50.46%

dirty_cpu_(13)    0.00% 0.00% 32.33%    0.00% 18.73% 0.00% 48.94%

dirty_cpu_(14)    0.00% 0.00% 36.54%    0.00% 19.58% 0.00% 43.88%

dirty_cpu_(15)    0.00% 0.00% 37.19%    0.00% 18.62% 0.00% 44.19%

dirty_cpu_(16)    0.00% 0.00% 30.57%    0.00% 17.45% 0.00% 51.99%

dirty_cpu_(17)    0.00% 0.00% 30.65%    0.00% 18.08% 0.00% 51.28%

dirty_cpu_(18)    0.00% 0.00% 36.95%    0.00% 19.01% 0.00% 44.04%

dirty_cpu_(19)    0.00% 0.00% 34.25%    0.00% 18.26% 0.00% 47.49%

dirty_cpu_(20)    0.00% 0.00% 34.76%    0.00% 17.07% 0.00% 48.17%

dirty_cpu_(21)    0.00% 0.00% 34.37%    0.00% 18.78% 0.00% 46.85%

dirty_cpu_(22)    0.00% 0.00% 29.20%    0.00% 17.98% 0.00% 52.82%

dirty_cpu_(23)    0.00% 0.00% 32.41%    0.00% 18.01% 0.00% 49.58%

dirty_cpu_(24)    0.00% 0.00% 32.83%    0.00% 17.95% 0.00% 49.23%

dirty_cpu_(25)    0.00% 0.00% 34.42%    0.00% 21.18% 0.00% 44.40%

dirty_cpu_(26)    0.00% 0.00% 34.99%    0.00% 18.62% 0.00% 46.39%

dirty_cpu_(27)    0.00% 0.00% 31.30%    0.00% 17.76% 0.00% 50.94%

dirty_cpu_(28)    0.00% 0.00% 32.23%    0.00% 17.42% 0.00% 50.35%

dirty_cpu_(29)    0.00% 0.00% 33.25%    0.00% 18.83% 0.00% 47.92%

dirty_cpu_(30)    0.00% 0.00% 34.15%    0.00% 18.93% 0.00% 46.93%

dirty_cpu_(31)    0.00% 0.00% 33.88%    0.00% 17.94% 0.00% 48.17%

dirty_cpu_(32)    0.00% 0.00% 35.30%    0.00% 17.34% 0.00% 47.36%

dirty_cpu_(33)    0.00% 0.00% 34.17%    0.00% 17.96% 0.00% 47.87%

dirty_cpu_(34)    0.00% 0.00% 32.21%    0.00% 18.94% 0.00% 48.85%

dirty_cpu_(35)    0.00% 0.00% 35.37%    0.00% 19.08% 0.00% 45.56%

dirty_cpu_(36)    0.00% 0.00% 31.21%    0.00% 17.71% 0.00% 51.07%

dirty_io_s( 1)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 2)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 3)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 4)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 5)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 6)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 7)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 8)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 9)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s(10)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

     poll( 0)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

scheduler( 1)    0.72% 0.18% 36.27%    0.51% 55.29% 1.59% 5.44%

scheduler( 2)    0.62% 0.19% 33.97%    0.55% 54.90% 1.49% 8.27%

scheduler( 3)    0.64% 0.22% 34.90%    0.53% 55.50% 1.43% 6.78%

scheduler( 4)    0.56% 0.16% 31.66%    0.49% 55.55% 1.33% 10.25%

scheduler( 5)    0.71% 0.20% 35.66%    0.58% 54.59% 1.49% 6.77%

scheduler( 6)    0.59% 0.23% 31.19%    0.48% 57.81% 1.28% 8.42%

scheduler( 7)    0.65% 0.23% 34.66%    0.56% 56.09% 1.60% 6.22%

scheduler( 8)    0.78% 0.23% 35.04%    0.59% 55.32% 1.58% 6.47%

scheduler( 9)    0.64% 0.21% 35.12%    0.47% 56.39% 1.50% 5.66%

scheduler(10)    0.80% 0.21% 37.18%    0.53% 54.64% 1.59% 5.04%

scheduler(11)    0.63% 0.20% 31.56%    0.53% 57.86% 1.48% 7.74%

scheduler(12)    0.69% 0.22% 37.50%    0.57% 54.38% 1.75% 4.89%

scheduler(13)    0.78% 0.21% 35.19%    0.55% 55.96% 1.54% 5.76%

scheduler(14)    0.70% 0.22% 31.07%    0.48% 54.35% 1.30% 11.88%

scheduler(15)    0.64% 0.23% 36.66%    0.51% 54.74% 1.66% 5.56%

scheduler(16)    0.68% 0.18% 33.76%    0.50% 55.47% 1.28% 8.12%

scheduler(17)    0.55% 0.28% 27.34%    0.43% 57.13% 1.24% 13.03%

scheduler(18)    0.64% 0.18% 32.85%    0.47% 56.12% 1.32% 8.42%

scheduler(19)    0.76% 0.24% 35.84%    0.53% 55.13% 1.56% 5.94%

scheduler(20)    0.57% 0.19% 27.58%    0.42% 57.17% 1.12% 12.96%

scheduler(21)    0.67% 0.20% 33.02%    0.44% 56.36% 1.36% 7.95%

scheduler(22)    0.81% 0.23% 34.44%    0.54% 56.18% 1.35% 6.46%

scheduler(23)    0.72% 0.22% 36.49%    0.53% 55.47% 1.53% 5.04%

scheduler(24)    0.65% 0.24% 34.68%    0.45% 55.89% 1.44% 6.65%

scheduler(25)    0.71% 0.21% 35.66%    0.46% 55.12% 1.65% 6.18%

scheduler(26)    0.73% 0.20% 33.23%    0.46% 56.45% 1.40% 7.54%

scheduler(27)    0.71% 0.20% 35.40%    0.60% 54.82% 1.47% 6.79%

scheduler(28)    0.61% 0.18% 31.44%    0.48% 58.58% 1.37% 7.34%

scheduler(29)    0.73% 0.22% 36.53%    0.52% 55.65% 1.61% 4.75%

scheduler(30)    0.71% 0.17% 36.66%    0.48% 55.07% 1.49% 5.44%

scheduler(31)    0.76% 0.21% 37.13%    0.50% 54.88% 1.51% 5.02%

scheduler(32)    0.75% 0.17% 32.85%    0.49% 56.76% 1.35% 7.62%

scheduler(33)    0.72% 0.23% 34.21%    0.54% 54.55% 1.42% 8.34%

scheduler(34)    0.73% 0.19% 35.86%    0.58% 55.21% 1.41% 6.03%

scheduler(35)    0.62% 0.19% 36.14%    0.51% 55.15% 1.57% 5.82%

scheduler(36)    0.68% 0.22% 35.16%    0.55% 55.38% 1.35% 6.66%


Stats per type:

        async 0.00%    0.00% 0.00% 0.00%    0.00% 0.00% 100.00%

          aux 3.61%    1.11% 0.00%  0.00% 0.03% 0.00%   95.24%

dirty_cpu_sche    0.00% 0.00% 33.37%    0.00% 18.36% 0.00% 48.27%

dirty_io_sched    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

         poll 0.00%    0.00% 0.00% 0.00%    0.00% 0.00% 100.00%

    scheduler    0.69% 0.21%   34.27% 0.51% 55.72%    1.46% 7.15%




Average thread real-time    : 1001205 us

Accumulated system run-time : 53032153 us

Average scheduler run-time  : 976005 us


       Thread    aux check_io emulator       gc other port sleep


Stats per thread:

    async( 0)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 1)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 2)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 3)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 4)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 5)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 6)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 7)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 8)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async( 9)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(10)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(11)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(12)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(13)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(14)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(15)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(16)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(17)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(18)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(19)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(20)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(21)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(22)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(23)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(24)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(25)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(26)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(27)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(28)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(29)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(30)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(31)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(32)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(33)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(34)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(35)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(36)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(37)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(38)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(39)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(40)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(41)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(42)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(43)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(44)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(45)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(46)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(47)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(48)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(49)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(50)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(51)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(52)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(53)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(54)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(55)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(56)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(57)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(58)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(59)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(60)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(61)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(62)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(63)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(64)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(65)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(66)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(67)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(68)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(69)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(70)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(71)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(72)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(73)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(74)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(75)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(76)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(77)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(78)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(79)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(80)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(81)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(82)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(83)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(84)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(85)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(86)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(87)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(88)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(89)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(90)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(91)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(92)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(93)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(94)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(95)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(96)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(97)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(98)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(99)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

    async(**)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

      aux( 1)    4.15% 0.43%    0.00% 0.00% 0.21%    0.00% 95.21%

dirty_cpu_( 1)    0.00% 0.00% 27.65%    0.00% 19.43% 0.00% 52.92%

dirty_cpu_( 2)    0.00% 0.00% 33.04%    0.00% 21.14% 0.00% 45.82%

dirty_cpu_( 3)    0.00% 0.00% 27.82%    0.00% 18.95% 0.00% 53.23%

dirty_cpu_( 4)    0.00% 0.00% 28.83%    0.00% 18.41% 0.00% 52.76%

dirty_cpu_( 5)    0.00% 0.00% 31.85%    0.00% 19.13% 0.00% 49.02%

dirty_cpu_( 6)    0.00% 0.00% 37.99%    0.00% 22.95% 0.00% 39.06%

dirty_cpu_( 7)    0.00% 0.00% 30.23%    0.00% 19.39% 0.00% 50.38%

dirty_cpu_( 8)    0.00% 0.00% 27.41%    0.00% 19.20% 0.00% 53.39%

dirty_cpu_( 9)    0.00% 0.00% 25.97%    0.00% 17.06% 0.00% 56.97%

dirty_cpu_(10)    0.00% 0.00% 33.72%    0.00% 22.95% 0.00% 43.33%

dirty_cpu_(11)    0.00% 0.00% 29.36%    0.00% 17.68% 0.00% 52.96%

dirty_cpu_(12)    0.00% 0.00% 31.76%    0.00% 21.85% 0.00% 46.39%

dirty_cpu_(13)    0.00% 0.00% 29.66%    0.00% 16.06% 0.00% 54.28%

dirty_cpu_(14)    0.00% 0.00% 28.40%    0.00% 19.99% 0.00% 51.61%

dirty_cpu_(15)    0.00% 0.00% 30.92%    0.00% 17.43% 0.00% 51.65%

dirty_cpu_(16)    0.00% 0.00% 26.79%    0.00% 16.10% 0.00% 57.11%

dirty_cpu_(17)    0.00% 0.00% 27.51%    0.00% 17.93% 0.00% 54.56%

dirty_cpu_(18)    0.00% 0.00% 29.95%    0.00% 17.58% 0.00% 52.47%

dirty_cpu_(19)    0.00% 0.00% 32.60%    0.00% 20.00% 0.00% 47.40%

dirty_cpu_(20)    0.00% 0.00% 28.34%    0.00% 16.17% 0.00% 55.49%

dirty_cpu_(21)    0.00% 0.00% 30.26%    0.00% 16.87% 0.00% 52.87%

dirty_cpu_(22)    0.00% 0.00% 32.03%    0.00% 19.08% 0.00% 48.88%

dirty_cpu_(23)    0.00% 0.00% 28.59%    0.00% 19.82% 0.00% 51.59%

dirty_cpu_(24)    0.00% 0.00% 29.55%    0.00% 17.38% 0.00% 53.07%

dirty_cpu_(25)    0.00% 0.00% 30.51%    0.00% 18.91% 0.00% 50.58%

dirty_cpu_(26)    0.00% 0.00% 29.72%    0.00% 17.58% 0.00% 52.70%

dirty_cpu_(27)    0.00% 0.00% 28.25%    0.00% 18.94% 0.00% 52.81%

dirty_cpu_(28)    0.00% 0.00% 33.06%    0.00% 19.75% 0.00% 47.19%

dirty_cpu_(29)    0.00% 0.00% 27.28%    0.00% 19.40% 0.00% 53.32%

dirty_cpu_(30)    0.00% 0.00% 34.10%    0.00% 22.20% 0.00% 43.70%

dirty_cpu_(31)    0.00% 0.00% 32.76%    0.00% 21.75% 0.00% 45.49%

dirty_cpu_(32)    0.00% 0.00% 30.09%    0.00% 22.01% 0.00% 47.90%

dirty_cpu_(33)    0.00% 0.00% 31.58%    0.00% 20.70% 0.00% 47.72%

dirty_cpu_(34)    0.00% 0.00% 29.87%    0.00% 20.26% 0.00% 49.86%

dirty_cpu_(35)    0.00% 0.00% 31.85%    0.00% 21.93% 0.00% 46.21%

dirty_cpu_(36)    0.00% 0.00% 28.77%    0.00% 18.50% 0.00% 52.73%

dirty_io_s( 1)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 2)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 3)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 4)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 5)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 6)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 7)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 8)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s( 9)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

dirty_io_s(10)    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

     poll( 0)    0.00% 0.00%    0.00% 0.00% 0.00%    0.00% 100.00%

scheduler( 1)    0.40% 0.09% 76.76%    0.50% 18.81% 1.15% 2.30%

scheduler( 2)    0.36% 0.14% 77.57%    0.49% 18.74% 1.08% 1.61%

scheduler( 3)    0.35% 0.08% 76.81%    0.46% 18.96% 1.03% 2.32%

scheduler( 4)    0.36% 0.08% 78.06%    0.40% 18.20% 1.11% 1.78%

scheduler( 5)    0.33% 0.07% 76.77%    0.40% 19.36% 0.88% 2.19%

scheduler( 6)    0.31% 0.06% 74.23%    0.38% 19.66% 0.81% 4.54%

scheduler( 7)    0.40% 0.08% 76.56%    0.48% 18.86% 1.25% 2.36%

scheduler( 8)    0.48% 0.07% 75.85%    0.50% 19.36% 1.40% 2.34%

scheduler( 9)    0.36% 0.07% 76.28%    0.52% 19.21% 1.39% 2.17%

scheduler(10)    0.32% 0.07% 75.76%    0.43% 19.01% 1.19% 3.23%

scheduler(11)    0.35% 0.07% 76.43%    0.52% 19.00% 1.19% 2.43%

scheduler(12)    0.31% 0.06% 75.31%    0.41% 19.55% 1.02% 3.35%

scheduler(13)    0.38% 0.07% 77.07%    0.59% 18.84% 1.33% 1.73%

scheduler(14)    0.41% 0.18% 77.68%    0.48% 18.25% 1.34% 1.66%

scheduler(15)    0.34% 0.08% 76.80%    0.44% 19.12% 1.05% 2.16%

scheduler(16)    0.38% 0.06% 77.68%    0.44% 18.22% 1.13% 2.08%

scheduler(17)    0.36% 0.06% 76.70%    0.49% 18.93% 1.19% 2.27%

scheduler(18)    0.40% 0.08% 77.05%    0.44% 18.97% 1.25% 1.82%

scheduler(19)    0.35% 0.07% 75.97%    0.45% 19.40% 1.07% 2.69%

scheduler(20)    0.37% 0.07% 76.11%    0.46% 19.42% 1.14% 2.43%

scheduler(21)    0.37% 0.08% 76.33%    0.47% 19.34% 1.01% 2.40%

scheduler(22)    0.46% 0.08% 77.41%    0.50% 18.75% 0.97% 1.83%

scheduler(23)    0.43% 0.08% 75.50%    0.51% 19.85% 1.13% 2.51%

scheduler(24)    0.39% 0.07% 74.49%    0.49% 20.29% 1.07% 3.20%

scheduler(25)    0.36% 0.07% 74.69%    0.48% 19.32% 0.98% 4.10%

scheduler(26)    0.35% 0.07% 75.96%    0.41% 18.79% 0.98% 3.44%

scheduler(27)    0.36% 0.06% 76.86%    0.43% 19.18% 0.96% 2.15%

scheduler(28)    0.32% 0.08% 75.08%    0.46% 19.71% 1.02% 3.34%

scheduler(29)    0.43% 0.14% 77.10%    0.43% 18.80% 1.22% 1.88%

scheduler(30)    0.35% 0.06% 76.45%    0.47% 18.64% 1.11% 2.92%

scheduler(31)    0.38% 0.08% 78.35%    0.52% 18.21% 1.17% 1.29%

scheduler(32)    0.37% 0.06% 74.47%    0.50% 19.71% 1.34% 3.56%

scheduler(33)    0.38% 0.06% 77.12%    0.44% 18.59% 1.12% 2.29%

scheduler(34)    0.46% 0.07% 75.58%    0.51% 19.87% 0.93% 2.58%

scheduler(35)    0.38% 0.13% 75.24%    0.46% 19.68% 1.03% 3.07%

scheduler(36)    0.39% 0.08% 76.21%    0.48% 19.52% 1.03% 2.29%


Stats per type:

        async 0.00%    0.00% 0.00% 0.00%    0.00% 0.00% 100.00%

          aux 4.15%    0.43% 0.00%  0.00% 0.21% 0.00%   95.21%

dirty_cpu_sche    0.00% 0.00% 30.22%    0.00% 19.29% 0.00% 50.48%

dirty_io_sched    0.00% 0.00% 0.00%    0.00% 0.00% 0.00% 100.00%

         poll 0.00%    0.00% 0.00% 0.00%    0.00% 0.00% 100.00%

    scheduler    0.38% 0.08%   76.34% 0.47% 19.11%    1.11% 2.51%




On Wed, Jan 9, 2019 at 6:06 PM Jonas Falkevik <[hidden email]> wrote:
Hi Jesse,
we have not observed the behaviour you see.
Can you share microstate accounting stats for the different cases?
Maybe that could shed some light on what the schedulers are doing?
1> msacc:start(1000), msacc:print().

And maybe also running perf top as root on one of the scheduler threads that is fully utilised could be interesting.
# perf top --tid <traditional scheduler thread id>

/Jonas

On Wed, Jan 9, 2019 at 10:52 PM Jesse Stimpson <[hidden email]> wrote:
Rickard and Jonas,

Thank you for your response. With the patch at OTP PR-2027, we do observe that the dirty CPU schedulers no longer drop to zero utilization, but in addition we see that the performance of the traditional schedulers suffers. Specifically, when we apply our load test, scheduler usage toggles between the following two states. I've snipped the output of similar values for readability.

[{1,0.8933020805456553},

...

{36,0.8882068476700309},

{37,0.4932569692588096},

...

{72,0.49842881346116297}]


and


[{1,1.0},

...

{36,1.0},

{37,0.42909590032946493},

...

{72,0.4641339106204525}]


When applying this same load to the unpatched Erlang 21.2, the traditional schedulers are utilized at about 75%. So it appears that the 2027 patch may put undesirable pressure on the traditional schedulers as a trade-off for consistent performance from the dirty CPU schedulers. Are you observing a similar response on your system?
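In case it's useful, the usage above can be sampled repeatedly from the shell with a small loop along these lines (rough sketch; assumes recon is on the code path):

F = fun Loop(0) -> ok;
        Loop(N) ->
            %% print a timestamp and a 1-second scheduler usage sample
            io:format("~p~n~p~n", [calendar:local_time(), recon:scheduler_usage(1000)]),
            timer:sleep(4000),
            Loop(N - 1)
    end,
F(30).  %% ~30 samples, one roughly every 5 seconds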

Thanks,
Jesse

On Wed, Jan 9, 2019 at 3:33 AM Jonas Falkevik <[hidden email]> wrote:
Hi,
This sounds like the issue I bumped into.
After some time, many of the dirty schedulers stop performing any work, which was related to a corrupted list of sleeping dirty schedulers.

I haven’t found what triggered it in my case (maybe a GC?). For me it took approx. 1 day before it happened.

Doing a multi-scheduler block can trigger the problem, where the list of sleeping dirty schedulers gets corrupted.
The sleepers list can end up containing only one or a few schedulers. That list is later used to wake up schedulers when there is work to do.
For each enqueued dirty-cpu or dirty-io job, the runtime wakes the first scheduler in the sleepers list.
But if this list only points to a few schedulers, only those are ever woken up to pick jobs from the queue.

Since fixing that specific problem I have not seen the collapse of dirty schedulers anymore.
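If you want to try to trigger it yourself, multi-scheduler blocking can be forced from the shell while the node is under dirty-CPU load; a rough sketch:

%% Rough sketch: repeatedly block/unblock multi-scheduling under load,
%% then check recon:scheduler_usage/1 for dirty schedulers stuck at zero.
[begin
     erlang:system_flag(multi_scheduling, block),
     timer:sleep(50),
     erlang:system_flag(multi_scheduling, unblock)
 end || _ <- lists:seq(1, 100)].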

It would be great if you could test and see if it fixes your problem as well: https://github.com/erlang/otp/pull/2027
Any feedback on the bug fix is appreciated.

/Jonas

> On 8 Jan 2019, at 17:13, Jesse Stimpson <[hidden email]> wrote:
> [original message snipped; it is quoted in full earlier in the thread]



--

Jesse Stimpson

Site Reliability Engineering

m: 9199950424
RepublicWireless.com



--

Jesse Stimpson

Site Reliability Engineering

m: 9199950424
RepublicWireless.com


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Rickard Green-2
Please try the fix in PR-2093 <https://github.com/erlang/otp/pull/2093>

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Frank Muller
Hello Jesse

Did Rickard's patch (PR-2093) fix your dirty scheduler issue?

Best
/Frank

Please try the fix in PR-2093 <https://github.com/erlang/otp/pull/2093>

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Jesse Stimpson
My apologies as I've been distracted by other matters recently. I hope to be able to report back within the next few days.

Thanks

On Tue, Jan 15, 2019 at 12:18 PM Frank Muller <[hidden email]> wrote:
Hello Jesse

Did Rickard's patch (PR-2093) fix your dirty scheduler issue?

Best
/Frank

Please try the fix in PR-2093 <https://github.com/erlang/otp/pull/2093>

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


--

Jesse Stimpson

Site Reliability Engineering

m: (919) 995-0424
RepublicWireless.com


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Jesse Stimpson
Behavior of the schedulers appears to show the same issue with the PR-2093 patch.

But I did notice something new in the msacc output. There is a very brief period, at approximately the same time as the normal scheduler usage spikes, where all the dirty CPU schedulers show significant sleep time. I've included timestamped excerpts below, starting with the increase in dirty CPU sleep and ending with a "steady state" utilization.

        Thread      aux check_io emulator       gc    other     port    sleep

{{2019,1,15},{21,57,34}}
dirty_cpu_( 1)    0.00%    0.00%   26.80%    0.00%   22.75%    0.00%   50.45%
 scheduler( 1)    0.17%    0.00%   98.22%    0.50%    0.18%    0.77%    0.16%

{{2019,1,15},{21,57,36}}
dirty_cpu_( 1)    0.00%    0.00%   76.92%    0.00%   17.73%    0.00%    5.35%
 scheduler( 1)    0.61%    0.00%   94.20%    1.23%    0.61%    3.35%    0.00%

{{2019,1,15},{21,57,38}}
dirty_cpu_( 1)    0.00%    0.00%   87.16%    0.00%    6.52%    0.00%    6.31%
 scheduler( 1)    0.94%    0.14%   93.72%    1.51%    0.63%    2.84%    0.22%

{{2019,1,15},{21,57,39}}
dirty_cpu_( 1)    0.00%    0.00%   87.92%    0.00%    6.85%    0.00%    5.23%
 scheduler( 1)    0.98%    0.14%   90.88%    1.32%    3.71%    2.71%    0.26%

{{2019,1,15},{21,57,40}}
dirty_cpu_( 1)    0.00%    0.00%   67.47%    0.00%   19.51%    0.00%   13.02%
 scheduler( 1)    1.09%    0.17%   81.95%    1.07%   12.01%    3.08%    0.64%

{{2019,1,15},{21,57,41}}
dirty_cpu_( 1)    0.00%    0.00%   64.25%    0.00%   18.86%    0.00%   16.88%
 scheduler( 1)    0.99%    0.28%   74.67%    0.98%   19.56%    2.80%    0.71%
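
For reference, snapshots like these can be collected with something along the following lines (a minimal sketch assuming the msacc module from runtime_tools; not necessarily the exact commands behind the excerpts above):

msacc_samples(0, _IntervalMs) ->
    ok;
msacc_samples(N, IntervalMs) ->
    msacc:start(IntervalMs),   %% reset counters, collect for IntervalMs, then stop
    io:format("~n~p~n", [calendar:local_time()]),
    msacc:print(),             %% per-thread aux/check_io/emulator/gc/other/port/sleep breakdown
    msacc_samples(N - 1, IntervalMs).

For example, msacc_samples(10, 1000) prints ten one-second snapshots.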

Thank you for the continuing support!

Jesse


On Tue, Jan 15, 2019 at 4:14 PM Jesse Stimpson <[hidden email]> wrote:
My apologies as I've been distracted by other matters recently. I hope to be able to report back within the next few days.

Thanks

On Tue, Jan 15, 2019 at 12:18 PM Frank Muller <[hidden email]> wrote:
Hello Jesse

Did Rickard's patch (PR-2093) fix your dirty scheduler issue?

Best
/Frank

Please try the fix in PR-2093 <https://github.com/erlang/otp/pull/2093>

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


--

Jesse Stimpson

Site Reliability Engineering

m: (919) 995-0424
RepublicWireless.com





_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Rickard Green-2
On 2019-01-15 23:11, Jesse Stimpson wrote:
> Behavior of the schedulers appears to have the same issue with 2093 patch.
>
> But I did notice something new in the msacc output. There is a very
> brief period, at approx the same time as the normal schedulers usage
> spikes, where all the dirty cpu schedulers have a significant sleep
> time. I've included timestamped excerpts below, starting with the
> increase in dirty cpu sleep, and ending with a "steady state" utilization.
>

We just released OTP-21.2.3 containing PR-2093.

I don't think PR-2093 causes the spikes. This change does not affect how work is moved between normal and dirty schedulers; it only prevents the "loss" of dirty schedulers.

If a process is scheduled on a dirty scheduler, it won't make progress until it has executed on a dirty scheduler, and vice versa (for normal schedulers). This is the same both before and after PR-2093. Since dirty schedulers aren't "lost" after PR-2093, progress of such processes will happen earlier, which of course changes the behavior, but that is due to the workload.
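
As a reference point, dirty CPU scheduler utilization can also be sampled directly with erlang:statistics/1; scheduler_wall_time_all (OTP 20+) covers dirty schedulers as well, with their ids following the normal scheduler ids. A minimal sketch (not part of the PR):

dirty_cpu_utilization(IntervalMs) ->
    erlang:system_flag(scheduler_wall_time, true),
    T0 = lists:sort(erlang:statistics(scheduler_wall_time_all)),
    timer:sleep(IntervalMs),
    T1 = lists:sort(erlang:statistics(scheduler_wall_time_all)),
    Normal = erlang:system_info(schedulers),
    DirtyCpu = erlang:system_info(dirty_cpu_schedulers),
    %% utilization per dirty CPU scheduler over the sampling interval
    [{Id, case Tot1 - Tot0 of
              0 -> 0.0;
              Tot -> (Act1 - Act0) / Tot
          end}
     || {{Id, Act0, Tot0}, {Id, Act1, Tot1}} <- lists:zip(T0, T1),
        Id > Normal, Id =< Normal + DirtyCpu].

A dirty CPU scheduler that has stopped picking up work shows up here as (close to) 0.0.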

Regards,
Rickard
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Frank Muller
Thanks Rickard.

@Jonas: did the PR-2093 patch help in your case?

/Frank

On 2019-01-15 23:11, Jesse Stimpson wrote:
> Behavior of the schedulers appears to have the same issue with 2093 patch.
>
> But I did notice something new in the msacc output. There is a very
> brief period, at approx the same time as the normal schedulers usage
> spikes, where all the dirty cpu schedulers have a significant sleep
> time. I've included timestamped excerpts below, starting with the
> increase in dirty cpu sleep, and ending with a "steady state" utilization.
>

We just released OTP-21.2.3 containing PR-2093.

I don't think PR-2093 causes the spikes. This change does not affect how work is moved between normal and dirty schedulers; it only prevents the "loss" of dirty schedulers.

If a process is scheduled on a dirty scheduler, it won't make progress until it has executed on a dirty scheduler, and vice versa (for normal schedulers). This is the same both before and after PR-2093. Since dirty schedulers aren't "lost" after PR-2093, progress of such processes will happen earlier, which of course changes the behavior, but that is due to the workload.

Regards,
Rickard

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Jesse Stimpson
In reply to this post by Rickard Green-2
It's possible that during our tests the utilization spike was masked by the collapse issue fixed in the recent PRs. Is there any other analysis I can provide on the utilization spike/sleep behavior we're seeing, or any other debugging or code reading you would recommend? As far as I can tell, there's nothing about our workload that would cause periodic behavior like this. The application is slinging RTP audio via UDP to remote endpoints at a 20 msec ptime. Each function call for the NIF in question adds 10 msec of audio to the WebRTC buffer.
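
One more thing we could log alongside the msacc data is run queue lengths; if work is piling up for the dirty CPU schedulers while they sleep, it should be visible there. A minimal sketch (assuming OTP 20+, where run_queue_lengths_all also reports the dirty CPU run queue after the normal ones):

log_run_queues(0, _IntervalMs) ->
    ok;
log_run_queues(N, IntervalMs) ->
    %% timestamp plus per-run-queue lengths, dirty CPU run queue included
    io:format("~p ~w~n",
              [calendar:local_time(),
               erlang:statistics(run_queue_lengths_all)]),
    timer:sleep(IntervalMs),
    log_run_queues(N - 1, IntervalMs).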

As a point of corroboration, this user on Stack Overflow appears to be having the same or a similar issue: https://stackoverflow.com/questions/49563067/erlang-schedulers-just-sleep-why

As always, the level of support from the Erlang community is second to none. Thanks to all for your time!

Jesse



On Wed, Jan 16, 2019 at 6:35 AM Rickard Green <[hidden email]> wrote:
On 2019-01-15 23:11, Jesse Stimpson wrote:
> Behavior of the schedulers appears to have the same issue with 2093 patch.
>
> But I did notice something new in the msacc output. There is a very
> brief period, at approx the same time as the normal schedulers usage
> spikes, where all the dirty cpu schedulers have a significant sleep
> time. I've included timestamped excerpts below, starting with the
> increase in dirty cpu sleep, and ending with a "steady state" utilization.
>

We just released OTP-21.2.3 containing PR-2093.

I don't think PR-2093 causes the spikes. This change does not affect how work is moved between normal and dirty schedulers; it only prevents the "loss" of dirty schedulers.

If a process is scheduled on a dirty scheduler, it won't make progress until it has executed on a dirty scheduler, and vice versa (for normal schedulers). This is the same both before and after PR-2093. Since dirty schedulers aren't "lost" after PR-2093, progress of such processes will happen earlier, which of course changes the behavior, but that is due to the workload.

Regards,
Rickard


--

Jesse Stimpson

Site Reliability Engineering

m: (919) 995-0424
RepublicWireless.com


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Frank Muller
Hey Jesse

Glad to hear your dirty scheduler collapse issue was solved by Rickard's PR.

The 21.2.3 release fixed a similar problem we had with dirty schedulers.

Jesse: can't you rewrite your dirty NIF as a yielding one? There's a nice slide deck from Andrew Bennett describing this technique (page 58):
Best
/Frank

It's possible that during our tests the utilization spike was masked by the collapse issue fixed in the recent PRs. Is there any other analysis I can provide on the utilization spike/sleep behavior we're seeing, or any other debugging or code reading you would recommend? As far as I can tell, there's nothing about our workload that would cause periodic behavior like this. The application is slinging RTP audio via UDP to remote endpoints at a 20 msec ptime. Each function call for the NIF in question adds 10 msec of audio to the WebRTC buffer.

As a point of corroboration, this user on Stack Overflow appears to be having the same or a similar issue: https://stackoverflow.com/questions/49563067/erlang-schedulers-just-sleep-why

As always, the level of support from the Erlang community is second to none. Thanks to all for your time!

Jesse



On Wed, Jan 16, 2019 at 6:35 AM Rickard Green <[hidden email]> wrote:
On 2019-01-15 23:11, Jesse Stimpson wrote:
> Behavior of the schedulers appears to have the same issue with 2093 patch.
>
> But I did notice something new in the msacc output. There is a very
> brief period, at approx the same time as the normal schedulers usage
> spikes, where all the dirty cpu schedulers have a significant sleep
> time. I've included timestamped excerpts below, starting with the
> increase in dirty cpu sleep, and ending with a "steady state" utilization.
>

We just released OTP-21.2.3 containing PR-2093.

I don't think PR-2093 causes the spikes. This change does not affect how work is moved between normal and dirty schedulers; it only prevents the "loss" of dirty schedulers.

If a process is scheduled on a dirty scheduler, it won't make progress until it has executed on a dirty scheduler, and vice versa (for normal schedulers). This is the same both before and after PR-2093. Since dirty schedulers aren't "lost" after PR-2093, progress of such processes will happen earlier, which of course changes the behavior, but that is due to the workload.

Regards,
Rickard


--

Jesse Stimpson

Site Reliability Engineering

m: (919) 995-0424
RepublicWireless.com


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Max Lapshin-2
In reply to this post by Jesse Stimpson
Why do you do it via a NIF?

On Wed, Jan 16, 2019 at 6:00 PM Jesse Stimpson <[hidden email]> wrote:
It's possible that during our tests the utilization spike was masked by the collapse issue fixed in the recent PRs. Is there any other analysis I can provide on the utilization spike/sleep behavior we're seeing, or any other debugging or code reading you would recommend? As far as I can tell, there's nothing about our workload that would cause periodic behavior like this. The application is slinging RTP audio via UDP to remote endpoints at a 20 msec ptime. Each function call for the NIF in question adds 10 msec of audio to the WebRTC buffer.

As a point of corroboration, this user on Stack Overflow appears to be having the same or a similar issue: https://stackoverflow.com/questions/49563067/erlang-schedulers-just-sleep-why

As always, the level of support from the Erlang community is second to none. Thanks to all for your time!

Jesse



On Wed, Jan 16, 2019 at 6:35 AM Rickard Green <[hidden email]> wrote:
On 2019-01-15 23:11, Jesse Stimpson wrote:
> Behavior of the schedulers appears to have the same issue with 2093 patch.
>
> But I did notice something new in the msacc output. There is a very
> brief period, at approx the same time as the normal schedulers usage
> spikes, where all the dirty cpu schedulers have a significant sleep
> time. I've included timestamped excerpts below, starting with the
> increase in dirty cpu sleep, and ending with a "steady state" utilization.
>

We just released OTP-21.2.3 containing PR-2093.

I don't think PR-2093 causes the spikes. This change does not affect how work is moved between normal and dirty schedulers; it only prevents the "loss" of dirty schedulers.

If a process is scheduled on a dirty scheduler, it won't make progress until it has executed on a dirty scheduler, and vice versa (for normal schedulers). This is the same both before and after PR-2093. Since dirty schedulers aren't "lost" after PR-2093, progress of such processes will happen earlier, which of course changes the behavior, but that is due to the workload.

Regards,
Rickard


--

Jesse Stimpson

Site Reliability Engineering

m: (919) 995-0424
RepublicWireless.com

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Jesse Stimpson
To clarify our workload: the NIF execution itself takes around 1 msec, but the data on which it operates represents 10 msec of audio. Apologies if my last message was unclear.

Out of convenience, we're using the open-source WebRTC project to take advantage of its built-in PLC, FEC, Opus, etc. The project is written in C++, so we have integrated with it via a NIF. Unfortunately, rewriting it as a yielding NIF, or rewriting it in Erlang, is not as straightforward as we would like, although I admit it would alleviate our scheduling issues.
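
For reference, one Erlang-side variant of the yielding idea is to chunk the work: split each 10 msec buffer so that every individual NIF call stays well under the 1 msec guideline. A rough sketch with hypothetical names (a true yielding NIF would instead re-schedule itself from C with enif_schedule_nif):

%% add_chunk_nif/2 and the 2 msec chunk size are hypothetical stand-ins
add_data_10ms(AcmRef, Pcm10ms) ->
    lists:foreach(fun(Chunk) -> ok = add_chunk_nif(AcmRef, Chunk) end,
                  split_chunks(Pcm10ms, chunk_size_bytes())),
    ok.

split_chunks(<<>>, _Size) ->
    [];
split_chunks(Bin, Size) when byte_size(Bin) =< Size ->
    [Bin];
split_chunks(Bin, Size) ->
    <<Chunk:Size/binary, Rest/binary>> = Bin,
    [Chunk | split_chunks(Rest, Size)].

chunk_size_bytes() ->
    2 * 48 * 2.   %% e.g. 2 msec of 16-bit mono audio at 48 kHz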

We'll continue our testing with 21.2.3 and report back if there appear to be any other leads.

Thanks,
Jesse

On Tue, Jan 22, 2019 at 2:56 PM Max Lapshin <[hidden email]> wrote:
Why do you do it via a NIF?

On Wed, Jan 16, 2019 at 6:00 PM Jesse Stimpson <[hidden email]> wrote:
It's possible that during our tests the utilization spike was masked by the collapse issue fixed in the recent PRs. Is there any other analysis I can provide on the utilization spike/sleep behavior we're seeing, or any other debugging or code reading you would recommend? As far as I can tell, there's nothing about our workload that would cause periodic behavior like this. The application is slinging RTP audio via UDP to remote endpoints at a 20 msec ptime. Each function call for the NIF in question adds 10 msec of audio to the WebRTC buffer.

As a point of corroboration, this user on Stack Overflow appears to be having the same or a similar issue: https://stackoverflow.com/questions/49563067/erlang-schedulers-just-sleep-why

As always, the level of support from the Erlang community is second to none. Thanks to all for your time!

Jesse



On Wed, Jan 16, 2019 at 6:35 AM Rickard Green <[hidden email]> wrote:
On 2019-01-15 23:11, Jesse Stimpson wrote:
> Behavior of the schedulers appears to have the same issue with 2093 patch.
>
> But I did notice something new in the msacc output. There is a very
> brief period, at approx the same time as the normal schedulers usage
> spikes, where all the dirty cpu schedulers have a significant sleep
> time. I've included timestamped excerpts below, starting with the
> increase in dirty cpu sleep, and ending with a "steady state" utilization.
>

We just released OTP-21.2.3 containing PR-2093.

I don't think PR-2093 causes the spikes. This change does not affect how work is moved between normal and dirty schedulers; it only prevents the "loss" of dirty schedulers.

If a process is scheduled on a dirty scheduler, it won't make progress until it has executed on a dirty scheduler, and vice versa (for normal schedulers). This is the same both before and after PR-2093. Since dirty schedulers aren't "lost" after PR-2093, progress of such processes will happen earlier, which of course changes the behavior, but that is due to the workload.

Regards,
Rickard


--

Jesse Stimpson

Site Reliability Engineering

m: (919) 995-0424
RepublicWireless.com

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


--

Jesse Stimpson

Site Reliability Engineering

m: (919) 995-0424
RepublicWireless.com


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dirty CPU schedulers stuck at zero utilization

Jesse Stimpson
After more testing with 21.2.3, it appears that the behavior we are seeing with respect to dirty schedulers going to sleep occurs only when the dirty scheduler runs on a CPU hyperthread. On our test instance, I disabled hyperthreading and used +SP 50:50, and the problem has gone away. This gives our NIF workload and our non-NIF workload an even split of CPU resources, which works well for our purposes.
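
For completeness, the resulting split can be sanity-checked at runtime; a small sketch (the exact numbers depend on the instance size):

%% with hyperthreading off and +SP 50:50, schedulers_online should be about
%% half of the logical processors, and the dirty CPU scheduler count
%% defaults to the number of normal schedulers
scheduler_split() ->
    #{logical_processors => erlang:system_info(logical_processors_available),
      schedulers_online => erlang:system_info(schedulers_online),
      dirty_cpu_schedulers_online => erlang:system_info(dirty_cpu_schedulers_online)}.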

It's unclear to us why hyperthreading causes this behavior, but we do wonder if it may have something to do with AWS virtualization. In any case, thanks to everyone for the assistance!

Best,
Jesse

On Wed, Jan 23, 2019 at 1:14 PM Jesse Stimpson <[hidden email]> wrote:
To clarify our workload: the NIF execution itself takes around 1 msec, but the data on which it operates represents 10 msec of audio. Apologies if my last message was unclear.

Out of convenience, we're using the open-source WebRTC project to take advantage of its built-in PLC, FEC, Opus, etc. The project is written in C++, so we have integrated with it via a NIF. Unfortunately, rewriting it as a yielding NIF, or rewriting it in Erlang, is not as straightforward as we would like, although I admit it would alleviate our scheduling issues.

We'll continue our testing with 21.2.3 and report back if there appear to be any other leads.

Thanks,
Jesse

On Tue, Jan 22, 2019 at 2:56 PM Max Lapshin <[hidden email]> wrote:
Why do you do it via a NIF?

On Wed, Jan 16, 2019 at 6:00 PM Jesse Stimpson <[hidden email]> wrote:
It's possible that during our tests the utilization spike was masked by the collapse issue fixed in the recent PRs. Is there any other analysis I can provide on the utilization spike/sleep behavior we're seeing, or any other debugging or code reading you would recommend? As far as I can tell, there's nothing about our workload that would cause periodic behavior like this. The application is slinging RTP audio via UDP to remote endpoints at a 20 msec ptime. Each function call for the NIF in question adds 10 msec of audio to the WebRTC buffer.

As a point of corroboration, this user on Stack Overflow appears to be having the same or a similar issue: https://stackoverflow.com/questions/49563067/erlang-schedulers-just-sleep-why

As always, the level of support from the Erlang community is second to none. Thanks to all for your time!

Jesse



On Wed, Jan 16, 2019 at 6:35 AM Rickard Green <[hidden email]> wrote:
On 2019-01-15 23:11, Jesse Stimpson wrote:
> Behavior of the schedulers appears to have the same issue with 2093 patch.
>
> But I did notice something new in the msacc output. There is a very
> brief period, at approx the same time as the normal schedulers usage
> spikes, where all the dirty cpu schedulers have a significant sleep
> time. I've included timestamped excerpts below, starting with the
> increase in dirty cpu sleep, and ending with a "steady state" utilization.
>

We just released OTP-21.2.3 containing PR-2093.

I don't think PR-2093 causes the spikes. This change does not affect how work is moved between normal and dirty schedulers; it only prevents the "loss" of dirty schedulers.

If a process is scheduled on a dirty scheduler, it won't make progress until it has executed on a dirty scheduler, and vice versa (for normal schedulers). This is the same both before and after PR-2093. Since dirty schedulers aren't "lost" after PR-2093, progress of such processes will happen earlier, which of course changes the behavior, but that is due to the workload.

Regards,
Rickard


--

Jesse Stimpson

Site Reliability Engineering

m: (919) 995-0424
RepublicWireless.com

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


--

Jesse Stimpson

Site Reliability Engineering

m: (919) 995-0424
RepublicWireless.com





_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions