Cost of Copy


Cost of Copy

Henning Diedrich-2
I am still honestly marveling about this. In case anyone finds time to
answer.

Especially on the theory that more deeply nested structures will be
faster to mutate, since fewer internal 'sibling' pointers will have to
be copied.

http://www.trapexit.org/forum/viewtopic.php?p=55001

Thanks,
Henning

Re: Cost of Copy

James Hague
If you ignore message passing, then the copying/sharing issues in Erlang are
just about the same as in Lisp and Scheme, so you may find some information
that way.

The short version is that some values are represented directly in a memory
"cell," like atoms and (most) integers. Others are pointers to data stored
elsewhere (list elements, or conses, which are two consecutive cells in
memory) or tuples (N consecutive cells plus one for the size). Just by
looking at code you can see where conses and tuples are created. If you use
an existing "pointer" to data, then that data is not copied.

Examples:

B = {A, A, A}
This creates a three element tuple (four memory cells). It doesn't matter if
A is a list of a million elements or a tuple or the number 2. B just
contains three cell-sized values (plus the size of the tuple).

X = [57|List]
This creates a single cons (2 cells). The first element is 57. The second is
a pointer to List. So even though you've got two lists (X and List), all of
the data is shared except for the first cons of X.
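
A quick way to see this sharing from the shell is the pair of internal,
undocumented functions erts_debug:size/1 and erts_debug:flat_size/1, which
report a term's size in words counting shared subterms once versus per
reference (a sketch; erts_debug is an ERTS internal and may change):

```erlang
%% size/1 counts each shared subterm once; flat_size/1 counts it per
%% reference, i.e. as if the term were fully copied out.
1> List = lists:seq(1, 1000).
2> B = {List, List, List}.
3> erts_debug:size(B).       %% roughly the list once, plus the 4-word tuple
4> erts_debug:flat_size(B).  %% roughly three times the list
```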

Re: Cost of Copy

Henning Diedrich-2
Thanks a lot James!

am I right then to conclude that mutating something in a list will be
faster the more deeply the data is spread out over several levels?

A = { B, C, D, E, F, G, H }

will be slower to change than, say

A = { { B, C }, { D, E },  {F, G, H } }

because if I change e.g. B, only { B', C } will be created as a new
internal tuple of pointers, as opposed to the whole of { B', C, D, E, F,
G, H }. Which should, for larger lists and tuples and frequent changes,
start to make a difference?

So this would be the fastest option?

A = { { B }, { C }, { D }, { E },  { F }, { G }, { H } }

And if the tuple may be expected to change in size (elements not only
changing but being added/deleted), this:

A = { { { { B }, { C } }, { { D }, { E } } }, { { { F }, { G } }, { { H
} } } }

Maybe I am wrong that there is a difference between tuples and lists,
but is it the same here: is

A = [ B, C, D, E, F, G, H ]

slower to change than

A = [ { B, C }, { D, E },  { F, G, H } ]

? Or are list elements changed faster, in general, virtually 'in-place',
as only one pointer is replaced?

Hardly, I guess, because the list itself cannot be changed? So the whole
new list must be copied anyway, just like a tuple.

Thanks for your take,
Henning




Re: Cost of Copy

Robert Raschke
On Thu, May 27, 2010 at 9:09 PM, Henning Diedrich <[hidden email]> wrote:

> Thanks a lot James!
>
> am I right then to conclude that mutating something in a list will be
> faster the more deeply the data is spread out over several levels?
You can't mutate! So your examples don't really make sense.

You can only build new structures. You cannot change them once they're built.
For mutation you need to look at ets, dets, mnesia, processes and messages.
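
For what it's worth, a minimal sketch of real mutation via ets, one of the
options Robby lists (table and key names are made up for illustration):

```erlang
%% An ets table lives outside the process heap and is genuinely mutable;
%% insert/2 on a set table with an existing key overwrites the old object.
T = ets:new(demo, [set]),
true = ets:insert(T, {counter, 1}),
true = ets:insert(T, {counter, 2}),   %% replaces {counter, 1} in place
[{counter, 2}] = ets:lookup(T, counter),
ets:delete(T).
```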

Robby

Re: Cost of Copy

Johan Montelius-2
In reply to this post by Henning Diedrich-2
On Thu, 27 May 2010 22:09:21 +0200, Henning Diedrich <[hidden email]>  
wrote:

> A = { { B, C }, { D, E },  {F, G, H } }
> because if I change e.g. B, only { B', C } will be created as new
> internal list of pointers, as opposed to the whole of { B', C, D, E, F,
> G, H }. Which should for larger lists and tuples, and frequent changes,
> start to make a difference?

Not exactly.  Let's say we have A as above and want to construct a new tuple

X = {{B', C}, {D, E},  {F, G, H }}

What we then have to do is construct a new tuple with three elements

X = {  _ , _ , _}

We then have to construct a new tuple for {B', C}

X = {{_, _}, _, _}

and then we of course have to construct B'

X = {{B',_}, _,_}

so we have constructed two tuples, one of size three and one of size two
(plus B'); all other structures C, {D,E} and {F,G,H} are shared.

If we instead have

A = {B, C, D, E, F, G, H }

and want to construct

X = {B', C, D, E, F, G, H}

we construct one new tuple

X = {_, _, _ , _, _, _, _}

and then B'

X = {B', _, _ , _, _, _, _}

so now we have only constructed one tuple of size seven. All other  
structures C,D,E,F,G,H are shared.

The difference in this case is marginal (and I don't know which one
executes faster). If we look at larger structures the difference becomes
apparent. Assume you have N elements in a set and want to update these
elements one by one. One way is to represent this as a tuple of size N.
Each update operation would then construct a new tuple of size N. All
elements in the tuple would be shared, but the new tuple has one element
that differs.

If we instead represent this as a list of length N, we would have to
construct a new list down to the point where we have located the element
that we want to change. Assume we want to change E:

A = [B | [C | [D | [E | [F | [G | [H| []]]]]]]]

X = [_ | [_ | [_ | [E' | _]]]]

Depending on how far down E is found, we have to construct new list cells
for all elements before E. In the example above this is four list cells
[_|_].

On average this means N/2 new list cells for each update. For large N this
is more efficient than having to construct a new tuple of size N.
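
The N/2 argument can be made concrete with a small helper (the name
set_nth/3 is mine, not a standard library function) that rebuilds the
prefix and shares the tail:

```erlang
%% Replace the I-th element (1-based). Allocates I-1 new cons cells for
%% the prefix; the tail after the replaced element is shared, not copied.
set_nth(1, New, [_ | T]) -> [New | T];
set_nth(I, New, [H | T]) when I > 1 -> [H | set_nth(I - 1, New, T)].
```

For example, set_nth(4, 'E2', [B, C, D, E, F, G, H]) builds four new cells
and shares [F, G, H] with the original list.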

An even better way, if N is large, is to construct a tree.

A = {tree,
       {tree,
          {tree, {leaf, B}, {leaf, C}},
          {tree, {leaf, D}, {leaf, E}}},
       {tree,
          {tree, {leaf, F}, {leaf, G}},
          {leaf, H}}}

Now if we want to change E then we construct

X = {tree,
       {tree,
          _,
          {tree, _, {leaf, E'}}},
       _}

So we construct new tree tuples all the way down to the location of E. All  
other structures are shared. In general this means that we will construct  
lg(N) new tree tuples.

So a tuple of size N, N/2 new list cells or lg(N) new tree tuples.
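
For the {tree, Left, Right} / {leaf, V} shape above, the update can be
sketched as follows (the path-as-list representation is my own, not from
the thread):

```erlang
%% Rebuilds only the lg(N) tree nodes on the path from the root to the
%% leaf; every subtree off the path is shared with the old tree.
tree_update([], F, {leaf, V}) -> {leaf, F(V)};
tree_update([l | P], F, {tree, L, R}) -> {tree, tree_update(P, F, L), R};
tree_update([r | P], F, {tree, L, R}) -> {tree, L, tree_update(P, F, R)}.
```

Changing E in the tree above would be tree_update([l, r, r], Fun, A):
three new tree tuples are built, everything else is shared.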

The data structures of course have other implications when it comes to  
reading and adding elements.


  Johan







--
Associate Professor Johan Montelius
Royal Institute of Technology - KTH
School of Information and Communication Technology - ICT

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:[hidden email]


Re: Cost of Copy

Richard Andrews-5
On Fri, May 28, 2010 at 6:15 PM, Johan Montelius <[hidden email]> wrote:

> If we instead have
>
> A = {B, C, D, E, F, G, H }
>
> and want to construct
>
> X = {B', C, D, E, F, G, H}
>
> we construct one new tuple
>
> X = {_, _, _ , _, _, _, _}
>
> and then B'
>
> X = {B', _, _ , _, _, _, _}
>
> so now we have only constructed one tuple of size seven. All other
> structures C,D,E,F,G,H are shared.

but...

IIRC, if A is not used in the code after X is bound, the Erlang
compiler can optimise this to an in-place operation where the storage
for A is reused for X. So only the B element pointer is changed to
point to B' and no allocation is required. Is this not one of the
reasons for record syntax?

e.g.

A = #myRecord{b=B, c=C, d=D, e=E, f=F, g=G, h=H},
X = A#myRecord{b=B2},



Re: Cost of Copy

Johan Montelius-2
On Fri, 28 May 2010 12:46:50 +0200, Richard Andrews  
<[hidden email]> wrote:

> IIRC, if A is not used in the code after X is bound, the erlang
> compiler can optimise this to an in-place operation where the storage
> for A is reused for X. So only the B element pointer is changed to
> point to B' and no allocation is required. Is this not one of the
> reasons for record syntax?

I would be happily surprised if that were the case. Determining whether A
is the last reference to the structure involves either a lot of program
analysis or run-time reference counting. The record syntax solves a
problem of readability, maintainability etc. of source code.

To see why the compiler has a hard time identifying when it is safe
to do destructive updates, look at the following:


foo(X) ->
        case X of
                {A, B, C} ->
                        B1 = mod(B),
                        Y = {A, B1, C},
                        zot(Y)
        end.


If X is the last reference to the structure {A, B, C}, we could replace B
with B1 in place and just hand the structure to zot/1. The problem is of
course that we don't know whether X is used somewhere else. If this were
the call to foo/1


     :
   foo(X),
   bar(X),
     :

then we would make a mistake in destroying the structure. Figuring out
at compile time when it is safe to reclaim structures is tricky.

  Johan





--
Associate Professor Johan Montelius
Royal Institute of Technology - KTH
School of Information and Communication Technology - ICT



Re: Cost of Copy

Robert Virding
In reply to this post by Henning Diedrich-2
As others have pointed out, data in Erlang is defined to be immutable.
This implies that if you wish to do destructive operations then you
have to be sure they are not seen anywhere. In general this can be
quite difficult, as it is often not trivial to see whether a data
structure, or a portion of it, is shared. The compiler is very
conservative about this and IIRC only does it when there is a sequence
of setelement/3 calls where the temporaries are not exported and nothing
is done between them. You typically get this when you do a single
multiple-field record update, for example

A1 = A0#foo{bar=57,baz="Robert",zip=now()}

It is better if you can combine record updates like this instead of
doing them one-by-one.

I think it is easiest to view a complex data structure as a tree with
generally different-sized nodes. If you modify one of the nodes then
the only other nodes you will have to rewrite are those between the
root of the tree and the modified node. Unless of course you are doing
something special along the way. Or you have botched your code. :-)

What size the nodes should be depends entirely upon what you are doing;
the only general hint I can give is to use whatever fits best into your
application. I mean it, really. Don't go complicating matters by
picking weirdo data structures. So if you want a dynamic sequence then
a list is usually the best, though not always. If you have 7 things
which go together then put them in one 7-element tuple or in a record.
I would only really start worrying about splitting my tuples if they
get really big. Or if you feel that grouping them is what feels best.

The only time I would seriously start thinking about what size to use
is if you are writing more general library-type packages. Generally,
having smaller nodes but a deeper resulting tree will lead to less
rewriting; but as the tree is deeper it will cost more to traverse,
and though the rewrites will be smaller there will be more of them.
The opposite holds for wider, flatter trees. It also depends on how
you select which subnode to go to; this is, of course, very algorithm
dependent. So in the array module each node contains 10 slots, IIRC,
but there it is easy to select the right one using its index. Dicts
also use indices and are wider. For more general search trees it
becomes more complex and binary trees are popular, but I would say
that it is more the algorithms which decide this.

A list is really a degenerate binary tree.

It all depends. This is clearly optimization and I would read
http://en.wikipedia.org/wiki/Program_optimization ("When to optimize")
before getting too deep into this.

Robert




Re: Cost of Copy

Henning Diedrich-2
In reply to this post by James Hague
Hi James,

can you clarify what you mean by "shared"? I think I understood the
rest of your post alright, thanks!

Henning

> X = [57|List]
> This creates a single cons (2 cells). The first element is 57. The second is
> a pointer to List. So even though you've got two lists (X and List), all of
> the data is shared except for the first cons of X.


Re: Cost of Copy

Henning Diedrich-2
In reply to this post by Robert Virding
Thanks Robert V, Robert R, James, Richard, thanks Dmitry:

I did a naive benchmark on this meanwhile:
http://www.eonblast.com/blog/cost-of-mutation/

From what I understand now, there is a clear and simple answer to my
basic question: prefer deeper nesting over flatter structures.

The benchmark compares structures like

tupelMutateFlat_10(FlatTupel, NewD) ->
    { A, B, C, _, E, F, G, H, I, J } = FlatTupel,
    { A, B, C, NewD, E, F, G, H, I, J }.

tupelMutateNested_10(NestedTupel, NewD) ->
    { { AB, {C, _}}, EFGH, IJ } = NestedTupel,
    { { AB, {C, NewD}}, EFGH, IJ }.
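
For anyone wanting a rough measurement without the full benchmark,
timer:tc/1 is enough for a sketch (the harness names bench/2 and loop/2
are mine, not from the blog post):

```erlang
%% Run a mutation fun N times and report microseconds. A toy harness:
%% no warm-up, no repetition statistics, unlike a careful benchmark.
bench(Fun, N) ->
    {Micros, ok} = timer:tc(fun() -> loop(Fun, N) end),
    Micros.

loop(_Fun, 0) -> ok;
loop(Fun, N) -> Fun(), loop(Fun, N - 1).
```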


Any comment on erroneous benchmark construction highly welcome.

Specifically in answer to Robert Virding:
> If you modify one of the nodes then
> the only other nodes you will have to rewrite are those nodes between
> the root of the tree and the modified node.
Plus the sibling nodes, I think, which were my focus.

> What sized nodes depends entirely upon what you are doing, the only
> general hint I can give is use that which fits best into your
> application. I mean it, really. Don't go complicating matters by
> picking weirdo data structures.
>  
I was asking this general question to get guidance on the basic
decisions in application and data design.

> The only time I would seriously start thinking about what size thing
> to use is if you are writing more general library type packages.
>  
Yes. Plus, I think, the structure of the data you write away to disk
would equally affect an app's performance so profoundly that it may
warrant this kind of reflection.
> Generally having smaller nodes but a deeper resulting tree will lead
> to less rewriting, but as the tree is deeper it will cost more to
> traverse it and though the rewrites will be smaller there will be more
> of them.
How strong that effect is, that is what I test in the above benchmark.
In the extreme, it gets pretty high:

    100 element flat list const Bin    (discarding result, 1M): run 1450 wall 1512 ms.
    100 element flat list integer N    (discarding result, 1M): run 1470 wall 1550 ms.
    100 element nested list const Bin  (discarding result, 1M): run 400 wall 462 ms. (72% faster)
    100 element nested list integer N  (discarding result, 1M): run 380 wall 414 ms. (74% faster)


Let me know please if you find where the assumptions are wrong.

On the traversal statement, Rob, I can't immediately follow --- maybe
you mean with much larger structures? I am talking pretty much about a
usual database record size, say, a customer with a lot of associated
data attached.

Thanks,
Henning



Re: Cost of Copy

Henning Diedrich-2
In reply to this post by Johan Montelius-2
Hi Johan,

Johan Montelius wrote:
> If we look at larger structures the difference becomes apparent.
As per the mailing list, I did a benchmark on this meanwhile.
>
> If we instead represent this as a list of length N we would have to
> construct a new list to the point where we have located the element
> that we want to change.
> On average this means N/2 new list cells for each update. This is for
> large N more efficient than having to construct a new tuple of size N.
This is something I haven't dug into deeply yet; thanks a lot for
clarifying the difference that lists make.

Thanks a lot for your explanations,
Henning

Re: Cost of Copy

James Hague
In reply to this post by Henning Diedrich-2
A cons (to use the Lisp terminology) is a "something" and a pointer. In the
[57|List] example, one cons is created where the something is 57 and the
pointer is List. That's it. Two memory cells.

