I've spent a bit of time recently testing the read/write performance of
different Erlang data types:
Test Machine: Apple G4 tower Mac (400 MHz, Mac OS X)
ETS : about 500,000 reads or writes per second (at any ets table size)
Tuples (read) : about 4,500,000 reads (at any tuple size)
Tuples (write): slower than ets when tuple size > 100
number of writes = c/TupleSize (where c is some constant)
Tested sizes = 1 - 100,000
Note: old (R5?) OTP had performance problems with large tuples because they
were converted to lists.
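A minimal sketch (not the original harness; module and function names are
illustrative) of how numbers like these can be measured, timing repeated
element/2 reads against repeated ets:lookup/2 reads with timer:tc:

```erlang
-module(rw_bench).
-export([run/1]).

%% Compare N tuple reads against N ets reads for a tuple of the
%% given size. Returns both timings in microseconds.
run(Size) ->
    N = 100000,
    T = erlang:make_tuple(Size, 0),
    Tab = ets:new(bench, [set]),
    true = ets:insert(Tab, {key, 0}),
    {TupleUs, ok} = timer:tc(fun() -> tuple_reads(T, Size, N) end),
    {EtsUs, ok} = timer:tc(fun() -> ets_reads(Tab, N) end),
    ets:delete(Tab),
    {tuple_reads_us, TupleUs, ets_reads_us, EtsUs}.

tuple_reads(_T, _I, 0) -> ok;
tuple_reads(T, I, N) ->
    _ = element(I, T),              %% O(1) indexed read
    tuple_reads(T, I, N - 1).

ets_reads(_Tab, 0) -> ok;
ets_reads(Tab, N) ->
    _ = ets:lookup(Tab, key),       %% hash lookup + copy out of the table
    ets_reads(Tab, N - 1).
```

The loop overhead is included in both timings, so the absolute numbers are
pessimistic, but the ratio between the two should be comparable.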
Some problems, like matrix multiplication or rescaling bitmap pictures, are
very simple to do if one has an array-like data type (O(1) read and write).
ets has array-like behaviour but a rather large overhead compared to things
like a tuple. According to my tests, tuples and lists are about the fastest
data types in Erlang, so I started to think about whether it was possible to
increase tuple write speed:
* A destructive update, rather than creating a copy of the pointer array that
  keeps track of the elements of the tuple, should be similar in performance
  to a tuple read operation.
* Destructive updates are problematic if several processes share the same
  tuple, but this is not the case in Erlang: each process keeps its own copy,
  so there is only one GC memory heap to consider.
* Execution in a single process is sequential, i.e. functions are run in the
  order they are written. This means that it should be risk free to
  destructively update a tuple if the old version of it will not be used
  later in the same clause. It can easily be determined at runtime if a
  destructive update can be used or if it must be a copy update.
* It seems reasonable to assume that we could get about 4,500,000 tuple write
  operations per second as well, using such an optimization.
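For reference, the copying behaviour being optimized away is that of
setelement/3, which allocates a fresh tuple on every write. A shell
transcript illustrating the cost:

```erlang
1> T0 = erlang:make_tuple(5, 0).
{0,0,0,0,0}
2> T1 = setelement(3, T0, x).   %% copies all 5 slots -> O(TupleSize) per write
{0,0,x,0,0}
3> T0.                          %% the old version is still intact
{0,0,0,0,0}
```

If the compiler or VM could prove that T0 is dead after line 2, the write
could safely be done in place, which is exactly the optimization proposed
above.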
So I'm wondering if someone has considered adding such an optimization to the
Erlang compiler/VM, or are there any problems I forgot about?
Hakan Stenholm <etxhste> writes:
>So I'm wondering if someone has considered adding such an optimization to the
>Erlang compiler/VM, or are there any problems I forgot about?
This has been tried in two different forms:
1. The first, from before ETS, was nukeable arrays. The write
operation was destructive and returned a new reference to the array
while the old reference was "nuked". This ensured that the
destructive operation preserved non-destructive semantics by
maintaining uniqueness at runtime: a runtime version of uniqueness
typing, if you like.
2. The second, which exists in the current implementation though it is
not reachable, is vectors. The write operation is also destructive and
returns a new reference, while the old reference is updated with enough
info so that it still looks the same as before. It preserves
non-destructive semantics but prioritizes access through the modified
version.
One problem is that vectors are very sensitive to keeping old references
around: the update operations can become quite costly in both time and
space. Björn tested my dict implementation, which tries to be smart
by heavily reusing tuples, and it apparently went slower with vectors.
The first is more efficient, but the second preserves Erlang semantics.
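To make the vector semantics concrete, here is a hypothetical sketch (no
such `vector` module is reachable in current releases; all names are
illustrative). Writes are destructive underneath, but the old handle is
patched with an undo record so reads through it still see the old values:

```erlang
V0 = vector:new(100, 0),
V1 = vector:set(V0, 7, foo),  %% in-place write; V0 is patched with {7, 0} undo info
foo = vector:get(V1, 7),      %% fast path: reads the real array directly
0   = vector:get(V0, 7).      %% slow path: must go through the undo record
```

Every further write through V1 lengthens the chain of undo records hanging
off older handles, which is why code that keeps old references around (like
the tuple-reusing dict implementation mentioned above) can end up slower.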
A datatype like this would be useful; I suppose it is just a matter of
deciding what you want. Having both would be a Bad Thing!