Memory usage in OTP 21

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Memory usage in OTP 21

Devon Estes
Hey everyone,

First off, I would like to apologize for the following code example in Elixir - it's the language I know best, and where we're having the problem.

Anyway, I'm one of the maintainers of an Elixir benchmarking tool called Benchee. A few months ago we added memory measurement as a feature in our benchmarking tool. However, with the release of OTP 21, we're seeing some measurements that seem very strange. For example, according to our measurements, the following function uses 0 bytes of memory: `Enum.to_list(1..10)`. On OTP 20, this always uses between 350-380 byes depending on the platform. There are a few other instances of these kinds of functions which we believe should be using memory somewhere, but the results that we get back from our measurements say they are not using any memory, and all of them are around functions that in OTP 20 used very little memory. We are also seeing somewhat frequently what appears to be _negative_ net memory usage, which again seems really strange.

So, is this some sort of optimization that we're missing, or is there somewhere else (maybe in a heap fragment?) that these structures might be stored? And if they are somewhere else, is it possible to measure this memory usage?

At the moment we're measuring memory usage by using `erlang:trace/3` to listen to the garbage collection events, and calculate the used memory from there, and adding any remaining used memory on the heap at the end of the function (after the last GC run).

Thanks again for any help y'all might be able to offer!

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Memory usage in OTP 21

Jesper Louis Andersen-2
I'm pretty sure that's not it, but a function such as

Enum.to_list(1..10)

contains an enumeration which is a constant and to_list can be unfolded to produce [1, ..., 10].

Since that is a constant it ends up as a literal in the beam bytecode and thus it never ever generates any garbage when called. I'm pretty sure that Erlang compiler is not smart enough to make this unfolding, but it doesn't take a lot of work to make a compiler constant fold such a case. Especially if it is common in code since compiler developers tend to target that. Another common trick is if escape analysis shows the result doesn't outlive its scope in which case data can be stack-allocated, making it far more unlikely to produce heap bump allocation and thus trigger the GC.

This assumption can be verifed by disassembly of the beam bytecode and looking for what the system is doing.

The negative allocation sounds strange to me though. That warrants investigation in what the trace calls are returning IMO to verify it happens at that level or lower.

On Thu, Jul 19, 2018 at 7:42 AM Devon Estes <[hidden email]> wrote:
Hey everyone,

First off, I would like to apologize for the following code example in Elixir - it's the language I know best, and where we're having the problem.

Anyway, I'm one of the maintainers of an Elixir benchmarking tool called Benchee. A few months ago we added memory measurement as a feature in our benchmarking tool. However, with the release of OTP 21, we're seeing some measurements that seem very strange. For example, according to our measurements, the following function uses 0 bytes of memory: `Enum.to_list(1..10)`. On OTP 20, this always uses between 350-380 byes depending on the platform. There are a few other instances of these kinds of functions which we believe should be using memory somewhere, but the results that we get back from our measurements say they are not using any memory, and all of them are around functions that in OTP 20 used very little memory. We are also seeing somewhat frequently what appears to be _negative_ net memory usage, which again seems really strange.

So, is this some sort of optimization that we're missing, or is there somewhere else (maybe in a heap fragment?) that these structures might be stored? And if they are somewhere else, is it possible to measure this memory usage?

At the moment we're measuring memory usage by using `erlang:trace/3` to listen to the garbage collection events, and calculate the used memory from there, and adding any remaining used memory on the heap at the end of the function (after the last GC run).

Thanks again for any help y'all might be able to offer!
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


--
J.

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Memory usage in OTP 21

Lukas Larsson-8
In reply to this post by Devon Estes
Hello,

On Thu, Jul 19, 2018 at 7:42 AM Devon Estes <[hidden email]> wrote:
Hey everyone,

First off, I would like to apologize for the following code example in Elixir - it's the language I know best, and where we're having the problem.

Anyway, I'm one of the maintainers of an Elixir benchmarking tool called Benchee. A few months ago we added memory measurement as a feature in our benchmarking tool. However, with the release of OTP 21, we're seeing some measurements that seem very strange. For example, according to our measurements, the following function uses 0 bytes of memory: `Enum.to_list(1..10)`. On OTP 20, this always uses between 350-380 byes depending on the platform. There are a few other instances of these kinds of functions which we believe should be using memory somewhere, but the results that we get back from our measurements say they are not using any memory, and all of them are around functions that in OTP 20 used very little memory. We are also seeing somewhat frequently what appears to be _negative_ net memory usage, which again seems really strange.

So, is this some sort of optimization that we're missing, or is there somewhere else (maybe in a heap fragment?) that these structures might be stored? And if they are somewhere else, is it possible to measure this memory usage?

At the moment we're measuring memory usage by using `erlang:trace/3` to listen to the garbage collection events, and calculate the used memory from there, and adding any remaining used memory on the heap at the end of the function (after the last GC run).

Could you please provide a minimal example showcasing the behaviour? I can't think of any optimization done that would explain the change in behaviour.

One thing that you could try is to compile the code using OTP 20 and run in on OTP 21. That way we'll know if it is the compiler that has gotten better, or if the run-time somehow is responsible for the change.

Lukas

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Memory usage in OTP 21

Tobias Pfeiffer
In reply to this post by Jesper Louis Andersen-2
Hi there and thank you for your help!

I already disassembled it to make sure it's not inlined (I had this with
another benchmark) and it's not:

   {:function, :test, 0, 10,
    [
      {:label, 9},
      {:line, [{:location, 'nofile', 2}]},
      {:func_info, {:atom, Test}, {:atom, :test}, 0},
      {:label, 10},
      {:move, {:literal, 1..10}, {:x, 0}},
      {:line, [{:location, 'nofile', 3}]},
      {:call_ext_only, 1, {:extfunc, Enum, :to_list, 1}}
    ]},


Disassembled with this code:
https://github.com/PragTob/elixir_playground/blob/master/lib/asm.ex

I'll try to give some more background information on the problem we have
in the coming days.

Dank you everyone!
Tobi

On 07/19/2018 05:28 PM, Jesper Louis Andersen wrote:

> I'm pretty sure that's not it, but a function such as
>
> Enum.to_list(1..10)
>
> contains an enumeration which is a constant and to_list can be unfolded
> to produce [1, ..., 10].
>
> Since that is a constant it ends up as a literal in the beam bytecode
> and thus it never ever generates any garbage when called. I'm pretty
> sure that Erlang compiler is not smart enough to make this unfolding,
> but it doesn't take a lot of work to make a compiler constant fold such
> a case. Especially if it is common in code since compiler developers
> tend to target that. Another common trick is if escape analysis shows
> the result doesn't outlive its scope in which case data can be
> stack-allocated, making it far more unlikely to produce heap bump
> allocation and thus trigger the GC.
>
> This assumption can be verifed by disassembly of the beam bytecode and
> looking for what the system is doing.
>
> The negative allocation sounds strange to me though. That warrants
> investigation in what the trace calls are returning IMO to verify it
> happens at that level or lower.
>
> On Thu, Jul 19, 2018 at 7:42 AM Devon Estes <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hey everyone,
>
>     First off, I would like to apologize for the following code example
>     in Elixir - it's the language I know best, and where we're having
>     the problem.
>
>     Anyway, I'm one of the maintainers of an Elixir benchmarking tool
>     called Benchee. A few months ago we added memory measurement as a
>     feature in our benchmarking tool. However, with the release of OTP
>     21, we're seeing some measurements that seem very strange. For
>     example, according to our measurements, the following function uses
>     0 bytes of memory: `Enum.to_list(1..10)`. On OTP 20, this always
>     uses between 350-380 byes depending on the platform. There are a few
>     other instances of these kinds of functions which we believe should
>     be using memory somewhere, but the results that we get back from our
>     measurements say they are not using any memory, and all of them are
>     around functions that in OTP 20 used very little memory. We are also
>     seeing somewhat frequently what appears to be _negative_ net memory
>     usage, which again seems really strange.
>
>     So, is this some sort of optimization that we're missing, or is
>     there somewhere else (maybe in a heap fragment?) that these
>     structures might be stored? And if they are somewhere else, is it
>     possible to measure this memory usage?
>
>     At the moment we're measuring memory usage by using `erlang:trace/3`
>     to listen to the garbage collection events, and calculate the used
>     memory from there, and adding any remaining used memory on the heap
>     at the end of the function (after the last GC run).
>
>     Thanks again for any help y'all might be able to offer!
>     _______________________________________________
>     erlang-questions mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
> --
> J.
>
>
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions
>

--
http://www.pragtob.info/
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Memory usage in OTP 21

Devon Estes
I now have an idea as to what's going on here. While looking at a function that uses a little bit more memory, we're not getting any tracing messages that indicate garbage collection is happening. And yet, when I check the process's garbage collection info immediately after executing the function, I see this:

[
  old_heap_block_size: 610,
  heap_block_size: 376,
  mbuf_size: 0,
  recent_size: 100,
  stack_size: 14,
  old_heap_size: 58,
  heap_size: 177,
  bin_vheap_size: 0,
  bin_vheap_block_size: 46422,
  bin_old_vheap_size: 0,
  bin_old_vheap_block_size: 46422
]

The fact that the heap_block_size has been increased from 233 to 376 words, and that there's an old_heap that's been allocated and now has data on it, indicates that garbage collection happened, but our tracer process was never sent a message. I tried flushing out the process mailbox for our tracer to see if maybe we were just missing the message, and there was still nothing in there after the function was run. I also tried on a much larger function that generated a lot of data, and I did see plenty of garbage collection messages sent to our tracer in that case, both minor and major GC runs.

So, is it possible that the tracer is configured incorrectly? When we set the tracer we're using `:erlang.trace(pid, true, [:garbage_collection, tracer: self()])` (which I think is `erlang:trace(Pid, true, [garbage_collection, {tracer, self()}])` in Erlang syntax if that's more understandable for y'all).

This would explain the negative measurements, as there might have been some data on the heap that was collected away, but we were never received any messages about it. Also, this behavior is showing up in both OTP 20 and 21, but it appears to be more pronounced in OTP 21.

I hope this helps jog some ideas as to what might be going on here, and thanks again for any help y'all can offer.

Cheers,
Devon

On Sat, Jul 21, 2018 at 12:45 PM Tobias Pfeiffer <[hidden email]> wrote:
Hi there and thank you for your help!

I already disassembled it to make sure it's not inlined (I had this with
another benchmark) and it's not:

   {:function, :test, 0, 10,
    [
      {:label, 9},
      {:line, [{:location, 'nofile', 2}]},
      {:func_info, {:atom, Test}, {:atom, :test}, 0},
      {:label, 10},
      {:move, {:literal, 1..10}, {:x, 0}},
      {:line, [{:location, 'nofile', 3}]},
      {:call_ext_only, 1, {:extfunc, Enum, :to_list, 1}}
    ]},


Disassembled with this code:
https://github.com/PragTob/elixir_playground/blob/master/lib/asm.ex

I'll try to give some more background information on the problem we have
in the coming days.

Dank you everyone!
Tobi

On 07/19/2018 05:28 PM, Jesper Louis Andersen wrote:
> I'm pretty sure that's not it, but a function such as
>
> Enum.to_list(1..10)
>
> contains an enumeration which is a constant and to_list can be unfolded
> to produce [1, ..., 10].
>
> Since that is a constant it ends up as a literal in the beam bytecode
> and thus it never ever generates any garbage when called. I'm pretty
> sure that Erlang compiler is not smart enough to make this unfolding,
> but it doesn't take a lot of work to make a compiler constant fold such
> a case. Especially if it is common in code since compiler developers
> tend to target that. Another common trick is if escape analysis shows
> the result doesn't outlive its scope in which case data can be
> stack-allocated, making it far more unlikely to produce heap bump
> allocation and thus trigger the GC.
>
> This assumption can be verifed by disassembly of the beam bytecode and
> looking for what the system is doing.
>
> The negative allocation sounds strange to me though. That warrants
> investigation in what the trace calls are returning IMO to verify it
> happens at that level or lower.
>
> On Thu, Jul 19, 2018 at 7:42 AM Devon Estes <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hey everyone,
>
>     First off, I would like to apologize for the following code example
>     in Elixir - it's the language I know best, and where we're having
>     the problem.
>
>     Anyway, I'm one of the maintainers of an Elixir benchmarking tool
>     called Benchee. A few months ago we added memory measurement as a
>     feature in our benchmarking tool. However, with the release of OTP
>     21, we're seeing some measurements that seem very strange. For
>     example, according to our measurements, the following function uses
>     0 bytes of memory: `Enum.to_list(1..10)`. On OTP 20, this always
>     uses between 350-380 byes depending on the platform. There are a few
>     other instances of these kinds of functions which we believe should
>     be using memory somewhere, but the results that we get back from our
>     measurements say they are not using any memory, and all of them are
>     around functions that in OTP 20 used very little memory. We are also
>     seeing somewhat frequently what appears to be _negative_ net memory
>     usage, which again seems really strange.
>
>     So, is this some sort of optimization that we're missing, or is
>     there somewhere else (maybe in a heap fragment?) that these
>     structures might be stored? And if they are somewhere else, is it
>     possible to measure this memory usage?
>
>     At the moment we're measuring memory usage by using `erlang:trace/3`
>     to listen to the garbage collection events, and calculate the used
>     memory from there, and adding any remaining used memory on the heap
>     at the end of the function (after the last GC run).
>
>     Thanks again for any help y'all might be able to offer!
>     _______________________________________________
>     erlang-questions mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
> --
> J.
>
>
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions
>

--
http://www.pragtob.info/

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Memory usage in OTP 21

Lukas Larsson-8
Hello,

On Fri, Jul 27, 2018 at 10:11 PM Devon Estes <[hidden email]> wrote:
I hope this helps jog some ideas as to what might be going on here, and thanks again for any help y'all can offer.

I think I have figured out what is going on. It seems to be the allocation of the term containing the garbage_collection_info that is messing with measurements. I'll try to change the code inside process_info to give a more accurate view of the current memory usage, however as long as the call to process_info happens in the process doing the call there will be some miss match.

I would suggest that you instead do a call to erlang:garbage_collect() before and after the benchmark is run and rely on tracing to get the correct information. Alternatively you could call erlang:process_info on the process from the trace collector, though then you would need some type of synchronization in between the processes.

I have not been able to reproduce the scenario you mentioned where the heap_block_size is increased without getting a trace message. What code did you run to trigger that? As always an Erlang example would be preferable, but I should manage with an Elixir one as well.

Lukas

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Memory usage in OTP 21

Devon Estes
Thanks so much for this! That's probably a good idea calling `erlang:garbage_collect()` and relying on the trace information. I'll play around with that to see how that affects things.

I'm seeing the behavior with the heap_block_size on this branch: https://github.com/PragTob/benchee/tree/memory-measurement-issues

We have some samples in there, usually to show how to use benchee, but I've added one to reproduce this issue (in `samples/wtf.exs`). If you run `mix run samples/wtf.exs`, you'll see the process_info data printed to the console, both before and after the function to be measured is run, and you won't see any indication that a garbage collection event happened. If you bump up the size of the range in the sample `samples/wtf.exs` to `1..5000`, you'll see that 9 GC runs take place (7 minor, 2 major).

Also, if it helps, the code around memory measurement is all in `lib/benchee/benchmark/measure/memory.ex`.

I'll try tomorrow to write this up in Erlang to reproduce it that way as well.

On Tue, Jul 31, 2018 at 9:39 AM Lukas Larsson <[hidden email]> wrote:
Hello,

On Fri, Jul 27, 2018 at 10:11 PM Devon Estes <[hidden email]> wrote:
I hope this helps jog some ideas as to what might be going on here, and thanks again for any help y'all can offer.

I think I have figured out what is going on. It seems to be the allocation of the term containing the garbage_collection_info that is messing with measurements. I'll try to change the code inside process_info to give a more accurate view of the current memory usage, however as long as the call to process_info happens in the process doing the call there will be some miss match.

I would suggest that you instead do a call to erlang:garbage_collect() before and after the benchmark is run and rely on tracing to get the correct information. Alternatively you could call erlang:process_info on the process from the trace collector, though then you would need some type of synchronization in between the processes.

I have not been able to reproduce the scenario you mentioned where the heap_block_size is increased without getting a trace message. What code did you run to trigger that? As always an Erlang example would be preferable, but I should manage with an Elixir one as well.

Lukas

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Memory usage in OTP 21

Devon Estes
And here's a gist with an Erlang reproduction: https://gist.github.com/devonestes/abe4dc60484e5426d4890c4494430194. When I run this escript as is, I see that the `recent_size` is different (which is only supposed to change after a GC run, right?) but no trace messages indicating that garbage collection took place. However, with this Erlang example, the `heap_block_size` and `old_heap_block_size` remain the same. When I bump up the size of the list I'm mapping over (ToMap) from 50 to 100, I see garbage collection trace messages indicating that a single minor collection took place.

On Tue, Jul 31, 2018 at 8:08 PM Devon Estes <[hidden email]> wrote:
Thanks so much for this! That's probably a good idea calling `erlang:garbage_collect()` and relying on the trace information. I'll play around with that to see how that affects things.

I'm seeing the behavior with the heap_block_size on this branch: https://github.com/PragTob/benchee/tree/memory-measurement-issues

We have some samples in there, usually to show how to use benchee, but I've added one to reproduce this issue (in `samples/wtf.exs`). If you run `mix run samples/wtf.exs`, you'll see the process_info data printed to the console, both before and after the function to be measured is run, and you won't see any indication that a garbage collection event happened. If you bump up the size of the range in the sample `samples/wtf.exs` to `1..5000`, you'll see that 9 GC runs take place (7 minor, 2 major).

Also, if it helps, the code around memory measurement is all in `lib/benchee/benchmark/measure/memory.ex`.

I'll try tomorrow to write this up in Erlang to reproduce it that way as well.

On Tue, Jul 31, 2018 at 9:39 AM Lukas Larsson <[hidden email]> wrote:
Hello,

On Fri, Jul 27, 2018 at 10:11 PM Devon Estes <[hidden email]> wrote:
I hope this helps jog some ideas as to what might be going on here, and thanks again for any help y'all can offer.

I think I have figured out what is going on. It seems to be the allocation of the term containing the garbage_collection_info that is messing with measurements. I'll try to change the code inside process_info to give a more accurate view of the current memory usage, however as long as the call to process_info happens in the process doing the call there will be some miss match.

I would suggest that you instead do a call to erlang:garbage_collect() before and after the benchmark is run and rely on tracing to get the correct information. Alternatively you could call erlang:process_info on the process from the trace collector, though then you would need some type of synchronization in between the processes.

I have not been able to reproduce the scenario you mentioned where the heap_block_size is increased without getting a trace message. What code did you run to trigger that? As always an Erlang example would be preferable, but I should manage with an Elixir one as well.

Lukas

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Memory usage in OTP 21

Lukas Larsson-8


On Wed, Aug 1, 2018 at 12:30 PM Devon Estes <[hidden email]> wrote:
And here's a gist with an Erlang reproduction: https://gist.github.com/devonestes/abe4dc60484e5426d4890c4494430194. When I run this escript as is, I see that the `recent_size` is different (which is only supposed to change after a GC run, right?) but no trace messages indicating that garbage collection took place. However, with this Erlang example, the `heap_block_size` and `old_heap_block_size` remain the same. When I bump up the size of the list I'm mapping over (ToMap) from 50 to 100, I see garbage collection trace messages indicating that a single minor collection took place.

I think that is because you do not synchronize the start of the tracer with the benchmark. i.e. by the time that the tracer is started and applies the trace the benchmark has already completed. I forked your gist and updated it here: https://gist.github.com/garazdawi/01467ecd6b8c7bcb08ddc8f3eb03d1af. With that change i get trace messages from three collections.






_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Memory usage in OTP 21

Devon Estes
Well, between the race condition that you spotted and removing the calls to `process_info/2` and replacing them with calls to `erlang:garbage_collect/0`, we've now got this working like a charm. Thank you so much for the help!

On Wed, Aug 1, 2018 at 1:35 PM Lukas Larsson <[hidden email]> wrote:
On Wed, Aug 1, 2018 at 12:30 PM Devon Estes <[hidden email]> wrote:
And here's a gist with an Erlang reproduction: https://gist.github.com/devonestes/abe4dc60484e5426d4890c4494430194. When I run this escript as is, I see that the `recent_size` is different (which is only supposed to change after a GC run, right?) but no trace messages indicating that garbage collection took place. However, with this Erlang example, the `heap_block_size` and `old_heap_block_size` remain the same. When I bump up the size of the list I'm mapping over (ToMap) from 50 to 100, I see garbage collection trace messages indicating that a single minor collection took place.

I think that is because you do not synchronize the start of the tracer with the benchmark. i.e. by the time that the tracer is started and applies the trace the benchmark has already completed. I forked your gist and updated it here: https://gist.github.com/garazdawi/01467ecd6b8c7bcb08ddc8f3eb03d1af. With that change i get trace messages from three collections.






_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions