Node mysteriously sends 11 MB while spawning a process on another node

Filip Niksic
Hi all,

I am trying to understand why a node sends 11 MB of unknown data to another node while spawning a process on that node.

Let me briefly explain my setup. There are two nodes involved: main and a. I am running them in two Docker containers on a simulated network, where I can inspect and analyze the traffic with Wireshark. Once the nodes are started, main spawns a process on node a with spawn_link(). In Wireshark I can observe an exchange of ErlDP (distribution protocol) packets. The spawn_link causes a colossal REG_SEND message to be sent from main to a; the message has length 11011057 bytes (11 MB) and is broken into 7605 TCP packets.

Now, it has to be noted that one of the arguments to the spawned process is a function closure. Could it be that this closure causes the runtime to pack all of its data structures and pass them along with the message? If so, how can such a situation be avoided? Is there some general rule of thumb that function closures should not be passed as arguments in a distributed setting?

Thanks,

Filip


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Node mysteriously sends 11 MB while spawning a process on another node

dmkolesnikov
Hello

On 19 Apr 2019, at 17.39, Filip Niksic <[hidden email]> wrote:

> Now, it has to be noted that one of the arguments to the spawned process is a function closure. Could it be that this closure causes the runtime to pack all of its data structures and pass them along with the message? If so, how can such a situation be avoided? Is there some general rule of thumb that function closures should not be passed as arguments in a distributed setting?

I would bet that the closure is the main reason. Passing an anonymous function to another node serialises its entire environment, that is, every variable it captures. That can easily grow to 11 MB.
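One way to check this without Wireshark is to ask the runtime how large a fun would be in the external term format, which is the encoding used between nodes. The following is a minimal sketch; the module name closure_size and the one-million-element list are made up for illustration and stand in for whatever data the real closure captures.

```erlang
-module(closure_size).
-export([demo/0]).

%% Hypothetical demo module; none of these names come from the thread.
demo() ->
    Big = lists:seq(1, 1000000),          % stand-in for a large data structure
    Captures = fun() -> length(Big) end,  % closure that captures Big
    Plain = fun() -> ok end,              % closure with an empty environment
    External = fun lists:seq/2,           % external fun: module, name and arity only
    %% erlang:external_size/1 returns an upper bound on the size in bytes
    %% of the term when encoded in the external term format.
    [{captures_big, erlang:external_size(Captures)},
     {plain,        erlang:external_size(Plain)},
     {external,     erlang:external_size(External)}].
```

Calling closure_size:demo() after compiling the module should show the first entry dwarfing the other two, since only that fun carries the list in its environment.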

Some hints and discussion can be found here

The best practice is to use fun Mod:Fun/N rather than fun() -> … end.
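As a rough illustration of that advice, the sketch below passes a handler to a remotely spawned process in two ways. The module remote_work, its functions, and the Table argument are all hypothetical; the original code was not posted. It assumes the same module is loaded on both nodes.

```erlang
-module(remote_work).
-export([start_with_closure/2, start_with_external_fun/1, loop/1, handle/1]).

%% Variant 1: the anonymous fun captures Table, so the whole of Table is
%% serialised and sent to Node together with the spawn request.
start_with_closure(Node, Table) ->
    Handler = fun(Msg) -> maps:get(Msg, Table, undefined) end,
    spawn_link(Node, ?MODULE, loop, [Handler]).

%% Variant 2: an external fun carries no environment; only the module,
%% function name and arity travel to Node.
start_with_external_fun(Node) ->
    Handler = fun ?MODULE:handle/1,
    spawn_link(Node, ?MODULE, loop, [Handler]).

%% Runs on the remote node and applies Handler to every message.
loop(Handler) ->
    receive
        stop -> ok;
        Msg  -> Handler(Msg), loop(Handler)
    end.

handle(Msg) ->
    io:format("got ~p on ~p~n", [Msg, node()]).
```

With the second variant, any data the handler needs has to be passed explicitly as an argument or looked up on the remote node, which makes the cost of shipping it visible at the call site.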

Best Regards,
Dmitry

Re: Node mysteriously sends 11 MB while spawning a process on another node

Jesper Louis Andersen-2
In reply to this post by Filip Niksic
As Dmitry says, the closure has to be sent along with the spawn request. If the data it captures is large, you can expect something like the 11 MB to go over the wire when the process is spawned.

The best way around it is to avoid capturing a large body of data when you create the fun. My guess is that you are referencing a large map or list, which in turn gets copied. In some situations this will also hurt when you spawn the same function locally, so there is good reason to avoid it there as well (though there are some caveats if the referenced data is part of the literal arena in the memory allocation system, and so on).
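A small sketch of what avoiding the capture can look like: pull the needed value out of the large structure before building the fun, so the fun's environment holds only that value. The module narrow_capture and the id key are assumptions made for the example.

```erlang
-module(narrow_capture).
-export([wide/1, narrow/1]).

%% Captures the whole State map, however large it is.
wide(State) ->
    fun() -> maps:get(id, State) end.

%% Extracts the needed value first, so the fun captures only Id.
narrow(State) ->
    Id = maps:get(id, State),
    fun() -> Id end.
```

Both funs return the same value when called, but comparing erlang:external_size(narrow_capture:wide(S)) with erlang:external_size(narrow_capture:narrow(S)) for a large map S should show that only the first one grows with the size of S.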


--
J.
