VM leaking memory

Re: VM leaking memory

Frank Muller
Hi Michael

All packets in transit have a “seq_id” (sequential number).

This means that in theory packet1, packet2...packetN can be checked in parallel and in any order (which is not the case in my current design), but they must be sent to the next processing stage in order: packet1 first, then packet2, and so on.

I would love to hear how I can turn this long-lived process into multiple short-lived ones while still enforcing ordering.

/Frank

On 2/1/19 8:11 AM, Frank Muller wrote:
I tried two solutions to reduce the memory usage of the problematic process:

1. Calling erlang:garbage_collect/0 after processing N packets (varying N=10..128).
Nothing changed at all, and the binary_alloc memory stayed fragmented, as you can see:
http://147.135.36.188:3400/observer_cli_BEFORE.jpg

The call to instrument:carriers/0:
http://147.135.36.188:3400/instrument_carriers.jpg

The call to instrument:allocations/0:
http://147.135.36.188:3400/instrument_allocations.jpg


2. Hibernating the process after processing N packets (varying N=10..128).
The HIT rates went above 90% immediately.
http://147.135.36.188:3400/observer_cli_AFTER.jpg

What is the long-term effect of hibernating this process?
This process receives about 1200 packets/sec under normal load and can reach about 3000 packets/sec under heavy load.

Is there a better way of solving the problem by tweaking the binary allocator's SBC/MBC settings?


/Frank
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

If you move the creation of temporary binaries out of any long-lived Erlang processes and into short-lived Erlang processes, you will no longer have this problem.  The tuning discussions, allocator options, hibernate use, etc. do not address the cause of the problem.  Source code should not need to call erlang:garbage_collect/0; using temporary Erlang processes makes garbage collection occur naturally, at a pace that shouldn't require special tuning.

Best Regards,
Michael


Re: VM leaking memory

Frank Muller
In reply to this post by Fred Hebert-2
Hi Fred

I will implement your idea of hibernating every N minutes on Monday and report back.

Thanks again.

/Frank



On Fri, Feb 1, 2019 at 11:11 AM Frank Muller <[hidden email]> wrote:

2. Hibernating the process after processing N packets (varying N=10..128).
The HIT rates went above 90% immediately.
http://147.135.36.188:3400/observer_cli_AFTER.jpg

What is the long-term effect of hibernating this process?
This process receives about 1200 packets/sec under normal load and can reach about 3000 packets/sec under heavy load.

Is there a better way of solving the problem by tweaking the binary allocator's SBC/MBC settings?

So hibernation will do a few things:

- a full-sweep garbage collection
- drop the stack
- memory compaction on the process.

Unless specified otherwise, a call to `erlang:garbage_collect(Pid)` forces a major collection, so what you have may be more or less a case of a process spiking with a lot of memory and then becoming mostly idle while still holding enough references to refc binaries that old data is not collected. Rinse and repeat, and you get a lot of old stuff.

Fragmentation of this kind is often resolved by passing settings such as `+MBas aobf +MBlmbcs 512` to the emulator. This changes the allocation strategy to one that favors lower addresses, and reduces the size of a multiblock carrier so that more of them are used. The objective is to reuse existing blocks and to make it easier to deallocate carriers by reducing the chance that some binary keeps one active.
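
To confirm that flags such as `+MBas aobf +MBlmbcs 512` actually took effect, one option is to read the settings back from the running node. The following is a minimal sketch, not something from this thread. It assumes the `{instance, N, Info}` list with an `options` entry that `erlang:system_info({allocator, binary_alloc})` returns on current OTP releases; that layout is an implementation detail and may change, so the code falls back to returning the raw term. The module and function names are hypothetical.

```erlang
-module(bin_alloc_opts_sketch).
-export([options/0]).

%% Returns the allocation strategy (as) and largest multiblock carrier
%% size (lmbcs) of the first binary_alloc instance, if they can be found.
options() ->
    case erlang:system_info({allocator, binary_alloc}) of
        [{instance, _No, Info} | _] when is_list(Info) ->
            %% ASSUMPTION: Info carries an {options, [...]} entry.
            Opts = proplists:get_value(options, Info, []),
            #{as => proplists:get_value(as, Opts),
              lmbcs => proplists:get_value(lmbcs, Opts)};
        Other ->
            %% Allocator disabled (false), unknown (undefined),
            %% or a layout this sketch does not recognise.
            {unavailable, Other}
    end.
```

Calling `bin_alloc_opts_sketch:options()` before and after changing the emulator flags gives a quick check that the node is running with the intended strategy.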

If what you have is really one process though, you may get better results by running some hibernation from time to time, but only experimentation will confirm that. If the allocator strategies don't cut it (or can't be used because you want to keep the 5-year live-upgrade streak going), do something like counting the packets you receive, and every N of them force a hibernation to shed some memory. Pick a large N so that a compaction runs once every 10 minutes or so; you can choose it based on the leak rate.
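
A minimal sketch of the count-and-hibernate idea above, assuming the long-lived process is (or can be wrapped in) a gen_server. The module name, the `check/1` placeholder and the `HibernateEvery` parameter are hypothetical. At the roughly 1200 packets/sec quoted earlier in the thread, one hibernation every 10 minutes works out to N = 1200 * 600 = 720000 packets.

```erlang
-module(packet_hibernate_sketch).
-behaviour(gen_server).

-export([start_link/1, packet/2, count/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% HibernateEvery: hibernate once after this many packets.
start_link(HibernateEvery)
  when is_integer(HibernateEvery), HibernateEvery > 0 ->
    gen_server:start_link(?MODULE, HibernateEvery, []).

packet(Server, Packet) when is_binary(Packet) ->
    gen_server:cast(Server, {packet, Packet}).

%% Number of packets handled so far.
count(Server) ->
    gen_server:call(Server, count).

init(HibernateEvery) ->
    {ok, #{every => HibernateEvery, seen => 0}}.

handle_call(count, _From, State = #{seen := Seen}) ->
    {reply, Seen, State}.

handle_cast({packet, Packet}, State = #{every := Every, seen := Seen0}) ->
    ok = check(Packet),
    Seen = Seen0 + 1,
    NewState = State#{seen := Seen},
    case Seen rem Every of
        %% Every Nth packet: ask gen_server to hibernate, which triggers
        %% a full-sweep collection and shrinks the process heap.
        0 -> {noreply, NewState, hibernate};
        _ -> {noreply, NewState}
    end.

%% Placeholder for the real per-packet work.
check(_Packet) ->
    ok.
```

The process wakes up again as soon as the next message arrives, so the cost is one full collection per N packets rather than per packet.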



Re: VM leaking memory

Michael Truog
In reply to this post by Frank Muller
On 2/1/19 10:53 PM, Frank Muller wrote:

> Hi Michael
>
> All packets in transit have a “seq_id” (sequential number).
>
> This means that in theory packet1, packet2...packetN can be checked in
> parallel and in any order (which is not the case in my current
> design), but they must be sent to the next processing stage in order:
> packet1 first, then packet2, and so on.
>
> I would love to hear how I can turn this long-lived process into
> multiple short-lived ones while still enforcing ordering.

An easy way to think about it comes from a module I used in the past
called immediate_gc
(https://gist.github.com/okeuday/dee991d580eeb00cd02c).  The sync_fun/2
function is below:


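%% Run F with arguments A in a short-lived linked process and block
%% until it sends back the result.
sync_fun(F, A) when is_function(F), is_list(A) ->
     Parent = self(),
     Child = erlang:spawn_opt(fun() ->
         %% Temporary binaries created by F belong to this process only.
         Parent ! {self(), erlang:apply(F, A)},
         erlang:garbage_collect()
     %% link: an abnormal exit of the child also reaches the caller.
     %% {fullsweep_after, 0}: every collection in the child is a full sweep.
     end, [link, {fullsweep_after, 0}]),
     receive
         {Child, Result} -> Result
     end.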

That is all you need to use a temporary process in a blocking way that
consumes all the temporary binary data as quickly as the BEAM allows. 
However, that example is more complex than it needs to be, with the
child process using the fullsweep_after option and
erlang:garbage_collect/0.  The extra complexity in the example is really
not necessary or desirable, though it does force the garbage collection
to occur as quickly as possible when consuming the temporary binary data
(binary data that nothing else references).

Spawn a similar child process before you start decoding a large binary,
so the temporary Erlang process has a lifetime the length of the single
request (packet in your situation) or less. Spawning Erlang processes is
cheap, so you shouldn't hesitate to use them; just ensure they are
linked so their failures may be tracked.
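
Since the packets can be checked in parallel but must be forwarded in seq_id order, one way to combine that with temporary processes is sketched below. It is an illustration only, not code from immediate_gc: the module name, `check_in_order/2`, and the `{SeqId, Binary}` packet shape are assumptions. One linked process is spawned per packet, and the caller then collects the results in ascending seq_id order using a selective receive.

```erlang
-module(ordered_check_sketch).
-export([check_in_order/2]).

%% Packets: list of {SeqId, Binary}. CheckFun: hypothetical per-packet
%% check. Returns [{SeqId, Result}] in ascending SeqId order.
check_in_order(CheckFun, Packets) when is_function(CheckFun, 1) ->
    Parent = self(),
    Tag = make_ref(),   % keeps replies apart from unrelated messages
    Pending =
        [begin
             %% One short-lived process per packet; temporary binaries
             %% created by CheckFun are released when it exits.
             _ = spawn_link(fun() ->
                     Parent ! {Tag, SeqId, CheckFun(Bin)}
                 end),
             SeqId
         end || {SeqId, Bin} <- lists:keysort(1, Packets)],
    %% Workers may finish in any order; the selective receive waits for
    %% each SeqId in turn, so the output order is deterministic.
    [receive {Tag, SeqId, Result} -> {SeqId, Result} end
     || SeqId <- Pending].
```

Because the workers are linked and the caller does not trap exits, a crash in any worker also terminates the caller, which matches the advice to keep the temporary processes linked.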

CloudI (https://cloudi.org) internal services use temporary processes
for handling service requests, in a way that is tunable with the service
configuration options request_pid_uses and info_pid_uses, so you can
control how many requests are processed in a temporary Erlang process
before a new one is created (with the exit exception being used to
terminate the Erlang process with its last result).  CloudI internal
services also have the hibernate service configuration option, with the
hibernate based on request rate checked every few seconds (the service
configuration options are described at
http://cloudi.org/api.html#2_services_add_config_opts).

The idea of using a temporary Erlang process for consuming temporary
binary data is likely unusual for people new to Erlang/Elixir and may
not have found its way into Erlang/Elixir books, though it is important
to know about if you want to avoid excessive memory consumption (and
potentially causing the BEAM to die due to memory use).

Best Regards,
Michael



Re: VM leaking memory

Frank Muller
Hello Michael

Thanks again for the tip. I will update my design accordingly and let you know how it goes. 

/Frank


Re: VM leaking memory

Frank Muller
Hello Michael and Fred

Combining both solutions gave the best result. There is no need to touch the memory allocators at all, and the system has looked stable for the last few days.

Michael: I’m passing a sub_binary to the child process created with your sync_fun/2. Is it still a good idea to call binary:copy/1 on this sub_bin inside the Fun? I think it’s useless because this process is garbage collected ASAP anyway.

/Frank


Re: VM leaking memory

Michael Truog
On 2/6/19 2:53 AM, Frank Muller wrote:
> Hello Michael and Fred
>
> Combining both solutions gave the best result. There is no need to
> touch the memory allocators at all, and the system has looked stable
> for the last few days.
>
> Michael: I’m passing a sub_binary to the child process created with
> your sync_fun/2. Is it still a good idea to call binary:copy/1 on this
> sub_bin inside the Fun? I think it’s useless because this process is
> garbage collected ASAP anyway.

It isn't necessary.  It should be better and simpler to only create a
sub_binary inside the temporary process, but it may depend on your
situation.  The reference counting will ensure all the binaries you need
after the temporary process is done are kept and you don't need to do
anything special for that to happen (only keep all the temporary
binaries inside the temporary process).
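
To see how much memory a sub_binary is actually keeping alive, `binary:referenced_byte_size/1` can be compared with `byte_size/1`. The sketch below is illustrative only: the module name and the sizes are arbitrary, and the slice is kept well above 64 bytes so that it stays a real sub_binary rather than being copied onto the process heap.

```erlang
-module(sub_binary_sketch).
-export([demo/0]).

demo() ->
    Big = binary:copy(<<"x">>, 1024 * 1024),  % 1 MiB reference-counted binary
    Sub = binary:part(Big, 0, 1024),          % sub_binary pointing into Big
    Copy = binary:copy(Sub),                  % independent copy of the slice
    #{sub_size => byte_size(Sub),
      %% Size of the underlying binary that Sub keeps alive.
      sub_refs => binary:referenced_byte_size(Sub),
      %% Size of the underlying binary that Copy keeps alive.
      copy_refs => binary:referenced_byte_size(Copy)}.
```

If a sub_binary is only used inside the temporary process, the large `sub_refs` value does not matter, because the reference goes away when that process exits. `binary:copy/1` is mainly worth considering when a small slice is stored somewhere long-lived.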

Best Regards,
Michael

Re: VM leaking memory

Frank Muller
Hi Michael

My sub_bin is built outside, but visible from inside the Fun’s scope.

/Frank
 