Heap based mutable arrays?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Heap based mutable arrays?

Sean Hinde-2
All,

Support for mutable arrays has almost got to the point of a FAQ.

The current choices seem to be:

tuples - slow when of any decent size due to the copying involved
ets - quite a lot of fixed overhead (min 50uS on my machine).
clever combinations of smaller tuples such as dynarray and friends.. not too
bad - comparable to ets.

We hear that supporting automatically mutable tuples is not possible due to
the nature of the current garbage collector - in R8 there is an optimisation
for multiple updates in one go within a record but it still gets copied.

So..

Why not have a new bif library called e.g array with the same semantics as
ets but without the hashing and buckets overhead?

It could have much the same interface as ets e.g.

A = array:new(array_name, [{initial_offset, 1}, {max_index, 1000}]).
array:insert(A, N, Term).
array:delete(A, N).
array:update_counter(A, N, 1).
array:delete(A).

It would have the same copying overhead as ets (copying the term to the heap
- unless storing a large binary) but this is probably not the end of the
world.

I don't see there would be a need for garbage collection - just delete the
whole thing on the demise of the creating process. It would need to do some
tidying up work in the background if we insert a larger term than last time
(or some default minimum?)

Any thoughts?

- Sean



NOTICE AND DISCLAIMER:
This email (including attachments) is confidential.  If you have received
this email in error please notify the sender immediately and delete this
email from your system without copying or disseminating it or placing any
reliance upon its contents.  We cannot accept liability for any breaches of
confidence arising through use of email.  Any opinions expressed in this
email (including attachments) are those of the author and do not necessarily
reflect our opinions.  We will not accept responsibility for any commitments
made by our employees outside the scope of our business.  We do not warrant
the accuracy or completeness of such information.




Reply | Threaded
Open this post in threaded view
|

Heap based mutable arrays?

Thomas Lindgren-2

> Support for mutable arrays has almost got to the point of a FAQ.
>
> The current choices seem to be:
>
> tuples - slow when of any decent size due to the copying involved
> ets - quite a lot of fixed overhead (min 50uS on my machine).
> clever combinations of smaller tuples such as dynarray and friends.. not too
> bad - comparable to ets.

HIPE used a third solution for a long time. However, it requires support
from a generational garbage collector, since it relies on destructive
updates 'under the hood'.

The HIPE solution I wrote had these characteristics:

- declarative, so old versions of array still available if needed
- constant time access (read/write) to the most recent version

Here is the implementation. The idea was taken from Prolog work in the
80's (shown in manly ascii graphics):

1. There is no direct access to the array per se; all accesses go through
   a _handle_ which is updated to reflect array changes

The original array is implemented as a tuple (array) with a handle:
<handle1> -> <array>

2. How to write element N with value V1 and old value V0:

<handle1>-> <N:V0> -> <array where N:=V1>
                        ^
                        |
           <newhandle>--+

Thus: (a) an exception element <N:V0> is allocated
      (b) the array/tuple is updated with N := V1
      (c) a new handle is allocated to point to the array
      (d) the new handle is returned

This is a constant time operation, but of course not as cheap as a direct
update.

3. How to look up an element N in array (starting from a handle):
   - if there is an exception chain, check if N occurs in the chain;
     if so, return that value
   - otherwise, return element N of the array

Accessing the most recent array is thus a constant time operation,
since there is no exception list to scan.

What were the drawbacks:

- Implemented in Erlang (except for the destructive update), so one
  could subvert the implementation. By moving to C, handles etc can
  be made invisible (just like they are for binaries).

- The exception chain is costly if an old array is used. This can be
  solved by using another method for lookup: if an exception chain is
  found, _copy_ the array, install the exception chain in the copy and
  repoint the handle to the new array. (This can also be done at GC-time.)

  The effect: constant time access to the old array (after the copy),
  but sometimes requires more space. Also thrashing if a single lookup
  is made in the old array.

  (What to do requires more study, I suspect; the policy could be
  handled by setting options at array creation time.)

- Possibilities to create circular references, and references from the
  old generation to the new generation in a gen-gc system. There are
  well-known solutions to this, since almost all gen-gcs have to handle
  the problem. (Basically, keep a stack or list of pointers from old to
  new generation; push a new element when such a pointer is created.)

  I think some other copying functions in erts rely on non-circularity,
  which also needs changing.

What were the advantages:

- Very useful, in short :-) For HIPE, graph algorithms became much
  more pleasant. I also implemented hash tables on top of the arrays,
  which rapidly spread through the compiler. Many common algorithms
  are much easier to write when your array ADT has the right
  complexity.

I know there has been discussions about implementing something like
this in OTP, but I don't know the precise status. Bj?rn G. probably
knows more :-)

                        Thomas
--
Thomas Lindgren thomas+junk
Alteon WebSystems


Reply | Threaded
Open this post in threaded view
|

Heap based mutable arrays?

Björn Gustavsson-3
Thomas Lindgren <thomasl> writes:

.
.
.

>
> I know there has been discussions about implementing something like
> this in OTP, but I don't know the precise status. Bj?rn G. probably
> knows more :-)

Actually, the code for a vector type is included in R7B, but the
BIFs used to create vectors (vector:new/2 and vector:from_list/1)
are disabled. By a few simple changes to the erl_vector.c source
file in the open source distribution vector support can be enabled.

The following features are not implemented (and will crasch the emulator
if you try to use them):

   - Term comparisions of whole vectors.
   - Taking hash values of vectors.
   - term_to_binary(Vector)
   - Creating circular terms will cause an infinite loop in term copying
     (and perhaps in other places).

I bypass the garbage collection problem by forcing a fullsweep collection
as soon as any vector is updated.

My time measurements were somewhat of a disappointment. You'll get a noticeable
performance improvement if you use vectos with many elements (typically more
than several hundred elements) compared to using a tuple of the same size,
but for many algorithm data structures using lists and/or tuples are faster
than vectors.

Vectors may get more useful if we'll fix the garbage collector (implementing
a proper write barrier).

/Bjorn
--
Bj?rn Gustavsson            Ericsson Utvecklings AB
bjorn      ?T2/UAB/F/P
                            BOX 1505
+46 8 727 56 87    125 25 ?lvsj?


Reply | Threaded
Open this post in threaded view
|

Heap based mutable arrays?

Thomas Lindgren-2

> I bypass the garbage collection problem by forcing a fullsweep collection
> as soon as any vector is updated. /.../
>
> My time measurements were somewhat of a disappointment. You'll get a
> noticeable performance improvement if you use vectos with many
> elements (typically more than several hundred elements) compared to
> using a tuple of the same size, but for many algorithm data
> structures using lists and/or tuples are faster than vectors.

I'm impressed that you see any improvement at all ... the copying
work in a full sweep sounds much larger than copying a single tuple,
even if the tuple is large.

Anyway, good work and here's to a write barrier in subsequent
versions.

                     Thomas
--
Thomas Lindgren thomas+junk
Alteon WebSystems