# Cost of Copy Classic List Threaded 12 messages Open this post in threaded view
|

## Cost of Copy

 I am still honestly marveling about this. In case anyone finds time to answer. Especially on the theory that more deeply nested structures will be faster to mutate. As fewer internal 'sibling'-pointers will have to be copied. http://www.trapexit.org/forum/viewtopic.php?p=55001Thanks, Henning
Open this post in threaded view
|

## Re: Cost of Copy

 If you ignore message passing, then the copying/sharing issues in Erlang are just about the same as in Lisp and Scheme, so you may find some information that way. The short version is that some values are represented directly in a memory "cell," like atoms and (most) integers. Others are pointers to data stored elsewhere (list elements, or conses, which are two consecutive cells in memory) or tuples (N consecutive cells plus one for the size). Just by looking at code you can see where conses and tuples are created. If you use an existing "pointer" to data, then that data is not copied. Examples: B = {A, A, A} This creates a three element tuple (four memory cells). It doesn't matter if A is a list of a million elements or a tuple or the number 2. B just contains three cell-sized values (plus the size of the tuple). X = [57|List] This creates a single cons (2 cells). The first element is 57. The second is a pointer to List. So even though you've got two lists (X and List), all of the data is shared except for the first cons of X.
Open this post in threaded view
|

## Re: Cost of Copy

 Thanks a lot James! am I right then to conclude that mutating something in a list will be the faster the more the data is spread out deeper over several levels? A = { B, C, D, E, F, G, H } will be slower to change than, say A = { { B, C }, { D, E },  {F, G, H } } because if I change e.g. B, only { B', C } will be created as new   internal list of pointers, as opposed to the whole of { B', C, D, E, F, G, H }. Which should for larger lists and tuples, and frequent changes, start to make a difference? So this would be the fastes option? A = { { B }, { C }, { D }, { E },  { F }, { G }, { H } } And if the tuple may be expected to change in size (elements not only changeing but being added/deleted), this: A = { { { { B }, { C } }, { { D }, { E } } }, { { { F }, { G } }, { { H } } } } Maybe I am wrong that there may be a difference between tuples and lists, but is it  the same for A = [ B, C, D, E, F, G, H ]  slower to change than A = [ { B, C }, { D, E },  { F, G, H } ] ? Or are list-elements changed faster, in general, virtually 'in-place' as only one pointer is replaced? Hardly I guess, because the list itself must not be changed? So the whole new list must be copied anyway, just like a tuple. Thanks for your take, Henning James Hague wrote: > If you ignore message passing, then the copying/sharing issues in Erlang are > just about the same as in Lisp and Scheme, so you may find some information > that way. > > The short version is that some values are represented directly in a memory > "cell," like atoms and (most) integers. Others are pointers to data stored > elsewhere (list elements, or conses, which are two consecutive cells in > memory) or tuples (N consecutive cells plus one for the size). Just by > looking at code you can see where conses and tuples are created. If you use > an existing "pointer" to data, then that data is not copied. > > Examples: > > B = {A, A, A} > This creates a three element tuple (four memory cells). It doesn't matter if > A is a list of a million elements or a tuple or the number 2. B just > contains three cell-sized values (plus the size of the tuple). > > X = [57|List] > This creates a single cons (2 cells). The first element is 57. The second is > a pointer to List. So even though you've got two lists (X and List), all of > the data is shared except for the first cons of X. > >
Open this post in threaded view
|

## Re: Cost of Copy

 On Thu, May 27, 2010 at 9:09 PM, Henning Diedrich <[hidden email]>wrote: > Thanks a lot James! > > am I right then to conclude that mutating something in a list will be the > faster the more the data is spread out deeper over several levels? > > > You can't mutate! So your examples don't really make sense. You can only build new structures. You cannot change them once their built. For mutation you need to look at ets, dets, mnesia, processes and messages. Robby
Open this post in threaded view
|

## Re: Cost of Copy

 In reply to this post by Henning Diedrich-2 On Thu, 27 May 2010 22:09:21 +0200, Henning Diedrich <[hidden email]>   wrote: > A = { { B, C }, { D, E },  {F, G, H } } > because if I change e.g. B, only { B', C } will be created as new > internal list of pointers, as opposed to the whole of { B', C, D, E, F, > G, H }. Which should for larger lists and tuples, and frequent changes, > start to make a difference? Not exactly.  Lets say we have A as above and want to construct a new tuple X = {{B', C}, {D, E},  {F, G, H }} What we then have to do is construct a new tuple with three elements X = {  _ , _ , _} We then have to construct a new tuple for {B', C} X = {{_, _}, _, _} and then we of course have to construct B' X = {{B',_}, _,_} so we have constructed two tuples, one of size tree and one of size two   (plus B'), all other structurs C, {D,E} and {F,G,H} are shared. If we instead have A = {B, C, D, E, F, G, H } and want to construct X = {B', C, D, E, F, G, H} we construct one new tuple X = {_, _, _ , _, _, _, _} and then B' X = {B', _, _ , _, _, _, _} so now we have only constructed one tuple of size seven. All other   structures C,D,E,F,G,H are shared. The difference is in this case marginal (and I don't know which one   executes faster). If we look at larger structures the difference become   apparent. Assume you have a N elements in a set and want to update these   element one by one. One way is to represent this a tuple of size N. Each   update operation would then construct a new tuple of size N. All elements   in the tuple would be shared but the new tuple has one element that   differs. If we instead represent this as a list of length N we would have to   construct a new list to the point where we have located the element that   we want to change. Assume we want to change E: A = [B | [C | [D | [E | [F | [G | [H| []]]]]]]] X = [_ | [_ | [_ | [E' | _]]]] Depending in how far down E is found we have to construct new list cells   for all element before E. In the example above this is four list cells   [_|_]. In average thins means N/2 new list cells for each update. This is for   large N more efficient than having to construct a new tuple of size N. A even better way, if N is large, is to construct a tree. A = {tree,        {tree,           {tree, {leaf, B}, {leaf, C}},           {tree, {leaf, D}, {leaf, E}}}        {tree,           {tree, {leaf, F}, {leaf, G}},           {leaf, H}}} Now if we want to change E then we have A = {tree,        {tree,           _,           {tree, _, {leaf, E'}}}        _} So we construct new tree tuples all the way down to the location of E. All   other structures are shared. In general this means that we will construct   lg(N) new tree tuples. So a tuple of size N, N/2 new list cells or lg(N) new tree tuples. The data structures of course have other implications when it comes to   reading and adding elements.   Johan -- Associate Professor Johan Montelius Royal Institute of Technology - KTH School of Information and Communication Technology - ICT ________________________________________________________________ erlang-questions (at) erlang.org mailing list. See http://www.erlang.org/faq.htmlTo unsubscribe; mailto:[hidden email]
Open this post in threaded view
|

## Re: Cost of Copy

 On Fri, May 28, 2010 at 6:15 PM, Johan Montelius <[hidden email]> wrote: > If we instead have > > A = {B, C, D, E, F, G, H } > > and want to construct > > X = {B', C, D, E, F, G, H} > > we construct one new tuple > > X = {_, _, _ , _, _, _, _} > > and then B' > > X = {B', _, _ , _, _, _, _} > > so now we have only constructed one tuple of size seven. All other > structures C,D,E,F,G,H are shared. but... IIRC, if A is not used in the code after X is bound, the erlang compiler can optimise this to an in-place operation where the storage for A is reused for X. So only the B element pointer is changed to point to B' and no allocation is required. Is this not one of the reasons for record syntax? eg. A = #myRecord{B,C,D,E,F,G,H}, X = A#myRecord{B2}, ________________________________________________________________ erlang-questions (at) erlang.org mailing list. See http://www.erlang.org/faq.htmlTo unsubscribe; mailto:[hidden email]
Open this post in threaded view
|

## Re: Cost of Copy

 On Fri, 28 May 2010 12:46:50 +0200, Richard Andrews   <[hidden email]> wrote: > IIRC, if A is not used in the code after X is bound, the erlang > compiler can optimise this to an in-place operation where the storage > for A is reused for X. So only the B element pointer is changed to > point to B' and no allocation is required. Is this not one of the > reasons for record syntax? I would be happily surprised if that is the case. Determining if A is the   last reference to the structure involves either a lot of program analysis   or run-time reference counting. The record syntax solves a problem of   readability, maintainability etc of src code. To see why the compiler will have a hard time identifying when it is safe   to do destructive updates look at the following: foo(X) ->         case X of                 {A, B, C} ->                         B1 = mod(B),                         Y = {A, mod(B), C}                         zot(Y)         end. If X is the last reference to the structure {A, B, C} we can replace B   with B1 and just hand it to zot/1. The problem is of course that we don't   know if X is used somewhere else. If this was the call to foo/1      :    foo(X),    bar(X),      : then we would make a mistake in destroying the structure.  To figure out   at compile time when it is safe to reclaim structures is tricky.   Johan -- Associate Professor Johan Montelius Royal Institute of Technology - KTH School of Information and Communication Technology - ICT ________________________________________________________________ erlang-questions (at) erlang.org mailing list. See http://www.erlang.org/faq.htmlTo unsubscribe; mailto:[hidden email]
Open this post in threaded view
|

## Re: Cost of Copy

Open this post in threaded view
|

## Re: Cost of Copy

 In reply to this post by James Hague Hi James, can you clarify this for me, what you mean by "shared" ... I think I understood the rest of your post alright, thanks! Henning > X = [57|List] > This creates a single cons (2 cells). The first element is 57. The second is > a pointer to List. So even though you've got two lists (X and List), all of > the data is shared except for the first cons of X. > >
Open this post in threaded view
|