High latency when exchanging small messages between different Erlang nodes


High latency when exchanging small messages between different Erlang nodes

Konstantinos Kallas

Hello,

I have an Erlang application where latency is crucial and a lot of small messages (tuples with an atom and an integer) are exchanged between processes on different nodes.

The main procedure is that a main process sends a small message to 4 worker processes in other Erlang nodes, the worker processes do some negligible processing, and then they reply back to the main node with a small message.

Each separate Erlang node is on a different docker container (generated from the erlang:21 docker image), and all the containers are connected using a standard docker bridge network.

I have noticed that latency (the time from when the first message is sent until its replies arrive) increases linearly with time: it starts at 1 second, and after 30 seconds of execution it has reached 10 seconds.
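
The measurement described above can be sketched roughly as follows. This is only a minimal illustration, not the poster's actual code: the module name `latency_probe`, the message shapes `{request, From, Ref, N}` and `{reply, Ref, N}`, and the "processing" step (adding 1) are all assumptions made for the example.

```erlang
-module(latency_probe).
-export([measure/2, worker_loop/0]).

%% Hypothetical worker: answers each request with a trivially
%% "processed" value. The real workers' logic is not shown in the post.
worker_loop() ->
    receive
        {request, From, Ref, N} ->
            From ! {reply, Ref, N + 1},
            worker_loop();
        stop ->
            ok
    end.

%% Sends one small message to every worker pid and waits for all replies.
%% Returns {ok, ElapsedMicroseconds} or {error, timeout}.
%% Note: TimeoutMs applies to each individual reply, not to the whole round.
measure(Workers, TimeoutMs) ->
    Ref = make_ref(),
    T0 = erlang:monotonic_time(microsecond),
    [W ! {request, self(), Ref, 42} || W <- Workers],
    case collect(Ref, length(Workers), TimeoutMs) of
        ok -> {ok, erlang:monotonic_time(microsecond) - T0};
        timeout -> {error, timeout}
    end.

%% Waits until Remaining replies tagged with Ref have arrived.
collect(_Ref, 0, _TimeoutMs) ->
    ok;
collect(Ref, Remaining, TimeoutMs) ->
    receive
        {reply, Ref, _Value} -> collect(Ref, Remaining - 1, TimeoutMs)
    after TimeoutMs ->
        timeout
    end.
```

With the module loaded on every node, something like `Workers = [spawn(N, latency_probe, worker_loop, []) || N <- nodes()]` followed by repeated calls to `latency_probe:measure(Workers, 5000)` would show whether the round-trip time itself grows, independently of the rest of the application.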

I have tried running all processes on the same Erlang node, and then latency is (as expected) a couple of milliseconds, so my assumption is that the problem could be caused by one (or more) of the following:

- Some misconfiguration of the Erlang nodes

- Some misconfiguration of the docker network/containers

- Some penalty imposed by the operating system/docker because a lot of small messages are exchanged

Has anyone encountered this issue, or does anyone know how to configure Erlang nodes (and the operating system) to reduce message latency?

Thanks in advance.

Best,

Konstantinos


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: High latency when exchanging small messages between different Erlang nodes

Led
In decent societies, you get kicked for saying "docker in production".

On Thu, 11 Apr 2019 at 22:07, Konstantinos Kallas <[hidden email]> wrote:

> Each separate Erlang node is on a different docker container (generated from the erlang:21 docker image), and all the containers are connected using a standard docker bridge network.


--
Led.


Re: High latency when exchanging small messages between different Erlang nodes

zxq9-2
On Thursday, 11 April 2019, 22:16:09 JST, Led wrote:
> In decent societies, you get kicked for saying "docker in production".


Process in a runtime in a docker in a VM in a host in a cloud platform in a...


We need to go deeper.
-- Leonardo DiCaprio: System Engineer


-Craig

Re: High latency when exchanging small messages between different Erlang nodes

Tristan Sloughter-4
In reply to this post by Konstantinos Kallas
How frequently are the initial messages sent? Are the worker processes' mailboxes growing over time as well?
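
One way to answer the mailbox question is to sample `message_queue_len` for the main and worker processes while the system runs. The sketch below is illustrative only; the module and function names (`mailbox_check`, `queue_len/1`, `sample/3`) are made up for this example. It relies on `erlang:process_info/2`, which only accepts local pids, so remote pids are queried through `rpc:call/4`.

```erlang
-module(mailbox_check).
-export([queue_len/1, sample/3]).

%% Returns the number of messages waiting in Pid's mailbox, or
%% 'undefined' if the process is dead or its node cannot be reached.
queue_len(Pid) when node(Pid) =:= node() ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} -> Len;
        undefined -> undefined
    end;
queue_len(Pid) ->
    %% process_info/2 rejects remote pids, so ask the owning node.
    case rpc:call(node(Pid), erlang, process_info, [Pid, message_queue_len]) of
        {message_queue_len, Len} -> Len;
        _DeadOrBadRpc -> undefined
    end.

%% Samples the queue lengths of all Pids Count times, IntervalMs apart.
%% Returns one row (list of lengths) per sample, oldest first.
sample(_Pids, 0, _IntervalMs) ->
    [];
sample(Pids, Count, IntervalMs) ->
    Row = [queue_len(P) || P <- Pids],
    timer:sleep(IntervalMs),
    [Row | sample(Pids, Count - 1, IntervalMs)].
```

If the rows returned by `mailbox_check:sample(Pids, 30, 1000)` keep growing for some process, messages are arriving faster than that process handles them, which would be consistent with latency that increases steadily over time.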

Tristan


Re: High latency when exchanging small messages between different Erlang nodes

Jesper Louis Andersen-2
In reply to this post by Konstantinos Kallas
My first recommendation is to add instrumentation to the system, so you can see what is going on:

* Tristan already suggested looking at mailbox sizes.
* Network blocking is worth investigating as well. Many small messages can lead to network overload.
* Docker/Kubernetes environments tend to be noisy if a lot of work is running in them. In particular, if high-throughput systems are co-located with low-latency systems, you are going to run into trouble.
* Enable the Erlang system monitor and get it to report on blocked ports and processes.
* Add VM metrics, for instance with Prometheus.
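
As a rough illustration of the system monitor suggestion in the list above, the sketch below registers a process with `erlang:system_monitor/2` and prints the events it receives. The module name `sysmon_probe` and the 100 ms thresholds are arbitrary choices for this example, not recommended values.

```erlang
-module(sysmon_probe).
-export([start/0, loop/0]).

%% Spawns a process that receives system monitor events and registers it
%% with the VM. The thresholds (in milliseconds) are illustrative only.
start() ->
    Pid = spawn(?MODULE, loop, []),
    _Previous = erlang:system_monitor(Pid, [busy_port,
                                            busy_dist_port,
                                            {long_schedule, 100},
                                            {long_gc, 100}]),
    Pid.

loop() ->
    receive
        {monitor, Suspended, busy_dist_port, Port} ->
            %% A process was suspended because the send buffer of a
            %% distribution channel to another node was full.
            io:format("busy_dist_port: ~p suspended on ~p~n", [Suspended, Port]),
            loop();
        {monitor, Suspended, busy_port, Port} ->
            io:format("busy_port: ~p suspended on ~p~n", [Suspended, Port]),
            loop();
        {monitor, Subject, Event, Info} ->
            %% long_schedule and long_gc events end up here.
            io:format("~p: ~p ~p~n", [Event, Subject, Info]),
            loop();
        stop ->
            ok
    end.
```

A node has a single system monitor, so calling `sysmon_probe:start()` replaces any monitor that was already configured. Frequent `busy_dist_port` events on the main node would point at the distribution channel rather than at the application code.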

The problem could be anywhere: inside your code, the VM, docker, the kernel, hardware, ... Your first goal is to narrow that down. Verify that things look correct in each layer before moving on to the next.

The fact that latency starts out at 1 second, whereas it is at the millisecond level locally, suggests that the problem has to do with distribution, either in your own code or in the underlying setup.



--
J.


Re: High latency when exchanging small messages between different Erlang nodes

Tristan Sloughter-4
Yeah, instrumentation from the beginning is a good bet. Shameless plug: https://opencensus.io/quickstart/erlang/ :) And prometheus.erl for VM metrics, as Jesper suggests.

Tristan

On Fri, Apr 12, 2019, at 03:53, Jesper Louis Andersen wrote:
> My first recommendation is to add instrumentation to the system, so you can see what is going on: [...]

Re: High latency when exchanging small messages between different Erlang nodes

Konstantinos Kallas

Thanks for the constructive feedback :)

On 12/4/19 9:34 a.m., Tristan Sloughter wrote:
> Yeah, instrumentation from the beginning is a good bet. Shameless plug: https://opencensus.io/quickstart/erlang/ :) And prometheus.erl for VM metrics, as Jesper suggests.
