Port locks with high time under LCNT


Port locks with high time under LCNT

Frank Muller
Hi All

My custom-made web server, which serves only static files (à la Hugo: https://gohugo.io/), is showing this under LCNT:

Can someone explain how to dig further to understand what’s going on and why these Port locks are taking so much time?

/Frank
Re: Port locks with high time under LCNT

Frank Muller
I was able to find out which lock is used by run_queue:

Question: how can I check the lock behind #Port<app@127.0.0.1.19659> for example?

/Frank
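
(For reference, a minimal lcnt session that drills into one specific lock could look roughly like the sketch below; the exact calls and option names are my assumption based on the lcnt module's documentation, so verify them against the docs for your OTP release.)

%% Assumed lcnt workflow; run on an emulator built with lock counting.
lcnt:rt_opt({copy_save, true}).     % keep counters for locks destroyed mid-run
lcnt:clear().                       % reset all counters
%% ... put the server under load here ...
lcnt:collect().                     % snapshot the counters into the lcnt server
lcnt:conflicts([{max_locks, 10}]).  % list the most contended locks, ports included
lcnt:swap_pid_keys().               % show registered names instead of pids where possible
%% Drill into a single lock, using the name/id exactly as printed by conflicts/1
%% (for a port lock that is the port id); LockId below is only a placeholder:
lcnt:inspect(LockId, [{locations, true}]).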

Re: Port locks with high time under LCNT

Frank Muller
Thanks to @Peti Gömöri, I was able to identify these Port locks:
https://gist.github.com/frankmullerl/008174c6594ca27584ac7f4e6724bee5

Some of them are taking up to 6.7sec (6707846 usec) :-/

My application is serving static images by calling file:sendfile/2 (https://erlang.org/doc/man/file.html#sendfile-2).

Can someone please explain how I can avoid these locks, or at least reduce their impact?

/Frank
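
(For context, a per-request handler along these lines is roughly what such a server does with file:sendfile/2; this is a minimal sketch, not the actual server code, with header handling and error cases simplified.)

-module(static_handler).
-export([serve_file/2]).

-include_lib("kernel/include/file.hrl").

%% Minimal sketch: reply on an already-accepted gen_tcp socket with a static file.
serve_file(Socket, Path) ->
    case file:read_file_info(Path) of
        {ok, #file_info{size = Size}} ->
            Header = ["HTTP/1.1 200 OK\r\nContent-Length: ",
                      integer_to_list(Size), "\r\n\r\n"],
            ok = gen_tcp:send(Socket, Header),
            %% This is the call that ends up holding the port lock while the
            %% sendfile(2) syscall runs.
            {ok, _BytesSent} = file:sendfile(Path, Socket),
            ok;
        {error, _} ->
            gen_tcp:send(Socket, "HTTP/1.1 404 Not Found\r\n\r\n")
    end.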


Re: Port locks with high time under LCNT

Mikael Pettersson

Using the file module has been known to cause synchronization overheads. Often we (Klarna) use prim_file instead, but it has sendfile/8, not /2. Anyway, that may be an avenue worth investigating.
Re: Port locks with high time under LCNT

Frank Muller
Hi Mikael

Thanks for pointing out prim_inet. To my knowledge it's undocumented and can be changed by the OTP team at any time; that's why I avoided it in the first place.

By the way, you probably mean prim_inet:sendfile/4, not /8, correct?

I made a quick change to my app and found that the throughput is a little better (3% faster).

But these port locks are still there :-/
Any other ideas?

/Frank

Re: Port locks with high time under LCNT

Frank Muller
The more load we have, the more lock contention we get:


/Frank


Re: Port locks with high time under LCNT

Dániel Szoboszlay
Hi Frank,

I think some context would help us better understand your problem. For example, what kind of problems do these lock conflicts cause in your system? Are there Erlang processes that get stuck for several seconds? What are they trying to do at that time? Or is it only that the throughput you experience stays below your expectations?

The lock stats in the gists don't seem that wrong to me. I don't have much experience with lock counters, and I can't figure out from the gists what these ports are (the image files? TCP connections?) or how they are used, but my guess is they are the TCP connections. Each port lock is tried 4 or 5 times and there's only 1 (albeit often very long) collision. This is how I'd expect a web server to handle a TCP connection: do one read, one write, close it; a handful of operations only. And if you have an unresponsive client or a bad network, one operation may easily take 6-7 seconds, blocking the next one. So just a guess, but is it possible you attempt to use the same TCP socket from two different processes?

Cheers,
Daniel

Re: Port locks with high time under LCNT

Frank Muller
Hi Daniel

Thanks for the feedback. Sorry for being unclear; this thread got split when the year rolled over to 2020. I gave some context when I started it at:

More context:
1. I'm building a Hugo-like service (https://gohugo.io/) in Erlang.
Actually, it's much simpler than Hugo: you ask for a file, and you get it (or not).

2. The design is simple: one Erlang process handles one incoming connection (see the sketch below).
So it's not possible to mix up TCP sockets here; either the file exists and gets delivered to the caller, or it doesn't.

3. Average file size is ~150KB. Max file size: ~3MB

4. I’m using Erlang 22.2 on both Linux (prod system) and MacOS (dev).

5. The service is capping at 4,000 req/sec.

6. @Mikael Pettersson suggested using prim_file:sendfile/8, which gave us a ~3% boost in throughput.

7. For now, I’m only using HTTP (no HTTPS).

$ ulimit -n
655360
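
(To make point 2 concrete, the per-connection shape is roughly the sketch below; a minimal illustration under my assumptions about the listener options, not the production code.)

-module(static_acceptor).
-export([start/1]).

%% One acceptor loop; every accepted connection gets its own process,
%% so no two processes ever touch the same socket.
start(Port) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false},
                                        {reuseaddr, true}]),
    accept_loop(LSock).

accept_loop(LSock) ->
    {ok, Sock} = gen_tcp:accept(LSock),
    Pid = spawn(fun() -> receive go -> handle(Sock) end end),
    ok = gen_tcp:controlling_process(Sock, Pid),
    Pid ! go,
    accept_loop(LSock).

handle(Sock) ->
    %% Read the request, map it to a file on disk, reply (for example with
    %% file:sendfile/2 or pread + gen_tcp:send), then close. Request parsing
    %% is elided: this placeholder just drains one packet.
    _ = gen_tcp:recv(Sock, 0, 5000),
    gen_tcp:close(Sock).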

Anything else you’d like to know?

I completely agree with you regarding the Port locks. But when one collision occurs, it takes 72,690,401 µs (about 72.7 seconds).
That’s my concern :-/

Can these Port locks be avoided?
Is there any undocumented feature I can try?

I'm open to testing any idea and/or changing my design/architecture if needed.

Thanks in advance.

/Frank


Re: Port locks with high time under LCNT

Max Lapshin
I'd advise trying to move away from sendfile and use pread/write instead.

Our Flussonic can serve 10-20 gigabits per second. We removed all "optimisations" like mmap and so on and are left with plain old read/write.
Re: Port locks with high time under LCNT

Frank Muller
Hi Max

Those numbers are amazing. Can you please elaborate more?

So, you read data from disk with file:pread/3 and send it out using gen_tcp:send/2. Am I right?

/Frank 

Re: Port locks with high time under LCNT

Valentin Micic
Just my 2c (for what it's worth)…

There are some versions of the Linux kernel that are known to cause performance problems. Recently, we had a situation where a particular OS kernel patch caused a serious performance degradation. We were lucky, as the same software ran on multiple machines with almost the same hardware configuration, so we did not have to doubt Erlang, or our code for that matter.
BTW: the worst-performing machine was running at 30% capacity relative to the best-performing machine. After a brief wild goose chase with a hardware vendor, they (the vendor) established that the problem was not related to the Intel CPU (SpeedStep technology), as *we* initially suspected, but to the actual OS kernel version.

When you become desperate enough, you could check your OS kernel version/patch level as well.

V/ 


Re: Port locks with high time under LCNT

Max Lapshin
Yes.
And we repack from mp4 to mpegts on the fly. We cannot use sendfile because of this requirement, and it turned out we do not need it.

So 10 Gbps is an absolutely normal load for an E5.

But of course you can't be green =) Disable all powersave.
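
(For what it's worth, the plain pread + send approach described above could look roughly like this; a sketch with an arbitrary 64 KB chunk size, not Flussonic's code.)

-module(plain_send).
-export([send_file/2]).

-define(CHUNK, 65536).  % arbitrary 64 KB chunk size

%% Stream a file over a socket with file:pread + gen_tcp:send instead of sendfile.
send_file(Sock, Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try
        send_chunks(Sock, Fd, 0)
    after
        file:close(Fd)
    end.

send_chunks(Sock, Fd, Offset) ->
    case file:pread(Fd, Offset, ?CHUNK) of
        {ok, Data} ->
            ok = gen_tcp:send(Sock, Data),
            send_chunks(Sock, Fd, Offset + byte_size(Data));
        eof ->
            ok
    end.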

Re: Port locks with high time under LCNT

Frank Muller
Gonna update my design, test and report back 

/Frank

Re: Port locks with high time under LCNT

John Högberg
What you're seeing is caused by how we're using the `sendfile(2)` system call. Like all other file operations it's impossible to make it non-blocking on Linux, and since it's used from the output callback in the inet driver it will block the scheduler until completion (while the port lock is held, hence its visibility with `lcnt`). Fixing this is non-trivial so we might force it to use the read/send fallback in a patch.

/John
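
(If one wanted to stay on sendfile, an assumption worth testing, not something John confirmed, is that sending the file in bounded slices with file:sendfile/5 limits how long any single call blocks the scheduler and holds the port lock, at the cost of more system calls. A sketch under that assumption:)

-module(sliced_sendfile).
-export([send/2]).

-define(SLICE, 262144).  % arbitrary 256 KB upper bound per sendfile call

%% Send a file in bounded slices so each (blocking) file:sendfile/5 call is
%% limited to ?SLICE bytes. Whether this meaningfully reduces port lock hold
%% times is an assumption to verify with lcnt.
send(Sock, Path) ->
    Size = filelib:file_size(Path),
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try
        send_slices(Sock, Fd, 0, Size)
    after
        file:close(Fd)
    end.

send_slices(_Sock, _Fd, Offset, Size) when Offset >= Size ->
    ok;
send_slices(Sock, Fd, Offset, Size) ->
    Bytes = min(?SLICE, Size - Offset),
    {ok, Sent} = file:sendfile(Fd, Sock, Offset, Bytes, []),
    send_slices(Sock, Fd, Offset + Sent, Size).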

Re: Port locks with high time under LCNT

Frank Muller
Hi John

Thanks for clarifying that the sendfile call is synchronous (maybe this should be added to the docs).

Switching from prim_file:sendfile to file:pread + gen_tcp:send (per @Max Lapshin's suggestion) gave me a different landscape in terms of locks:

Tail latency (99th percentile and above) is still high and the throughput is more or less the same.

How should I interpret this new LCNT info to improve things?

Best
/Frank

