Optimized lists

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimized lists

Zoltan Peter Toth-3
Hi,

Currently there's no length information stored in the list elements, so
length/1 is an O(n) operation.
(This is not so nice if an application wants to safeguard against
operating on too long lists, but even length itself may last too long.)
However, list handling BIFs seem to count as a fixed number of
reductions, irrespectively of list length.
(See http://www.erlang.org/ml-archive/erlang-questions/200502/msg00168.html)

Idea:
Store in each list element the length of the remaining part of the list.

This way, length() would become an O(1) operation,
[this may be a benefit if it's used frequently enough]
but also, the list BIFs could count as variable number
of reductions (based on length), keeping the real time behaviour.
Tail recursion would also work, as the length information in the
tail of the list would be correct too.

(Currently I think it is possible to call a BIF on a long list that keeps
the beam busy in that process so long that the net_ticktime elapses and
other nodes lose contact with the node. Am I wrong ?)

How feasible this would be to implement ?
(The more ambitious solution would be that if a list BIF would take more
than 1000 reductions (the limit for rescheduling) then it is scheduled out.
This part may be tricky, though.)
Cheers,
        Zoltan


Reply | Threaded
Open this post in threaded view
|

Optimized lists

Vlad Dumitrescu-4
From: "Zoltan Peter Toth" <zoltan.peter.toth>
> Idea:
> Store in each list element the length of the remaining part of the list.

The basic idea is thinkable, but I see one big disadvantage: strings are already
stored in a bloated way (8 bytes per char).

On the other hand, it may be useful to have in any case strings as a type or a
nicer handling of binaries_as_strings.

regards,
Vlad


Reply | Threaded
Open this post in threaded view
|

Optimized lists

Matthias Lang-2
In reply to this post by Zoltan Peter Toth-3
Zoltan Peter Toth writes:

 > Idea:
 > Store in each list element the length of the remaining part of the list.
[...]

 > How feasible this would be to implement ?

Pretty straightforward:

   -module(zoltan).
   -compile([export_all]).
   
   from_list(List) ->
     from_list(lists:reverse(List), [], 0).
   
   from_list([], Acc, _) -> Acc;
   from_list([H|T], Acc, N) -> from_list(T, [{H, N}|Acc], N+1).
   
   to_list(List) -> [H || {H,_} <- List].
   
   length([]) -> 0;
   length([{_, N}|_]) -> N + 1.

I'm not sure many people will use it, given that it imposes even more
space overhead than lists already have.

Implementing it inside the runtime would be possible too, with likely
space and performance gains. I'm sure a bunch of people would
implement it right away, if only the Erlang source code wasn't kept a
closely guarded secret.

Matthias


Reply | Threaded
Open this post in threaded view
|

Optimized lists

Luke Gorrie-3
In reply to this post by Zoltan Peter Toth-3
Zoltan Peter Toth <zoltan.peter.toth> writes:

> Currently there's no length information stored in the list elements, so
> length/1 is an O(n) operation.

You might want to look in jungerl/lib/ermacs/src/cord.erl. This is an
efficient string data structure built on a balanced tree (with ~2KB
binaries for leaves). Length check is O(1), space is about one byte
per character, and I've loaded 100MB log files with it.

Boehm's idea. He used them for efficient immutable strings in C.

Of course cords aren't IO-lists.




Reply | Threaded
Open this post in threaded view
|

Optimized lists

Zoltan Peter Toth-3
In reply to this post by Matthias Lang-2
Matthias Lang wrote:

> Zoltan Peter Toth writes:
>
>  > Idea:
>  > Store in each list element the length of the remaining part of the
> list.
> [...]
>
>  > How feasible this would be to implement ?
>
> Pretty straightforward:
...
> I'm not sure many people will use it, given that it imposes even more
> space overhead than lists already have.
>
> Implementing it inside the runtime would be possible too, with likely
> space and performance gains. I'm sure a bunch of people would
> implement it right away, if only the Erlang source code wasn't kept a
> closely guarded secret.

Hi Matthias,

The Erlang solution is really pretty straightforward, and may be an option
for new code that uses only the new list structure.
But it would not solve the problem when we get a long conventional list that
we don't want to handle and want to check its size with length().
Further on, transforming to/from conventional list would also be O(n).
Therefore I propose a change to the implementation of the Erlang list construct
in the emulator.

(Afaik, currently the head structures are linked into a list and each of them has
a pointer to the next head and the actual list element.
What I propose is to include a length field into the head structure. So it would
not consume too much extra memory I believe.)

Cheers,
        Zoltan


Reply | Threaded
Open this post in threaded view
|

Optimized lists

Matthias Lang-2
Zoltan Peter Toth writes:

 > The Erlang solution is really pretty straightforward, and may
 > be an option for new code that uses only the new list structure.
 > But it would not solve the problem when we get a long conventional
 > list that we don't want to handle and want to check its size with
 > length().

 > Further on, transforming to/from conventional list would also be O(n).
 > Therefore I propose a change to the implementation of the Erlang
 > list construct in the emulator.

Transforming a conventional list to one with the size in every element
will cost O(n), even if the emulator does it for you.

Or is the idea that the emulator be changed to _only_ allow
tail-size-in-every-element lists? The benefit is that length() becomes
O(1). The cost is O(n) space for all lists and O(n) extra execution
time for most string operations.

Matthias


Reply | Threaded
Open this post in threaded view
|

Optimized lists

Zoltan Peter Toth-3
Hi,

> Or is the idea that the emulator be changed to _only_ allow
> tail-size-in-every-element lists?

Yes, that's the idea. Quite a change, I admit.

> The benefit is that length() becomes
> O(1). The cost is O(n) space for all lists and O(n) extra execution
> time for most string operations.

True, but for list BIFs the extra time would probably mean memcpy-ing a struct
that's one word longer.
I don't have any profiling information about execution time
of BIFs, but I guess this would not be the dominant part on todays CPUs.
It won't be noticable if list operations concern only one or a few elements
(like prepending an element to the list).
If it becomes substantially slower, then probably we have such a long list
where it would be risky to call length().

The same for the O(n) space increase: if it really matters, then calling length()
would be a real time problem.

Further on, this would allow to count the BIF reductions in a length-dependent
way, which requires to get the length in O(1) steps (or faster :)).

Regards,
        Zoltan