mnesia:match_object() on an ordered_set

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

mnesia:match_object() on an ordered_set

Ulf Wiger-4

I started writing some code happily assuming that calling
mnesia:match_object/2 on an ordered_set table will return
an ordered set of objects. For a brief moment, some doubt
snuck in and I decided to check the actual implementation...

It turns out, AFAICT, that mnesia simply prepends objects
written during the transaction, thus breaking the
ordered_set semantics, and -- In my opinion -- violating the
Principle of Least Surprise(tm).

The Mnesia documentation doesn't make any promises regarding
order, so this behaviour doesn't formally violate the Mnesia
interface. However, where anything is written about the
expected semantics of match_object(), the following can be
found in the ETS Reference Manual:

"On tables of the ordered_set type, the result is in the
same order as in a first/next traversal."

Given the current implementation of mnesia:match_object(), I
don't think preserving the order would have to impact
performance, esp. not on large sets (and on small sets, I
don't think it matters.)

I've not made a patch. I could suggest a fix, but not today.

/Uffe
--
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson AB, Connectivity and Control Nodes



Reply | Threaded
Open this post in threaded view
|

mnesia:match_object() on an ordered_set

Håkan Mattsson-6
On Thu, 24 Apr 2003, Ulf Wiger wrote:

Uffe> I started writing some code happily assuming that calling
Uffe> mnesia:match_object/2 on an ordered_set table will return
Uffe> an ordered set of objects. For a brief moment, some doubt
Uffe> snuck in and I decided to check the actual implementation...
Uffe>
Uffe> It turns out, AFAICT, that mnesia simply prepends objects
Uffe> written during the transaction, thus breaking the
Uffe> ordered_set semantics, and -- In my opinion -- violating the
Uffe> Principle of Least Surprise(tm).
Uffe>
Uffe> The Mnesia documentation doesn't make any promises regarding
Uffe> order, so this behaviour doesn't formally violate the Mnesia
Uffe> interface. However, where anything is written about the
Uffe> expected semantics of match_object(), the following can be
Uffe> found in the ETS Reference Manual:
Uffe>
Uffe> "On tables of the ordered_set type, the result is in the
Uffe> same order as in a first/next traversal."

The Mnesia API does only support first/next traversal with dirty
semantics and for a match_object with dirty semantics there are no
transaction store to bother about. ;-)

Uffe> Given the current implementation of mnesia:match_object(), I
Uffe> don't think preserving the order would have to impact
Uffe> performance, esp. not on large sets (and on small sets, I
Uffe> don't think it matters.)

For fragmented tables it would have a tremendous impact, as the order
only is preserved within each fragment. In order to return an ordered
result, it would require a quite substantial sort operation.

Of course Mnesia could ensure that the result of a match_object on an
ordered_set always should be ordered, but is it worth the performance
penalty?

/H?kan

---
H?kan Mattsson
Ericsson
High Availability Software, DBMS Internals
http://www.ericsson.com/cslab/~hakan/



Reply | Threaded
Open this post in threaded view
|

mnesia:match_object() on an ordered_set

Ulf Wiger-4
On Thu, 24 Apr 2003, Hakan Mattsson wrote:

>On Thu, 24 Apr 2003, Ulf Wiger wrote:
>
>Uffe> Given the current implementation of mnesia:match_object(), I
>Uffe> don't think preserving the order would have to impact
>Uffe> performance, esp. not on large sets (and on small sets, I
>Uffe> don't think it matters.)
>
>For fragmented tables it would have a tremendous impact, as
>the order only is preserved within each fragment. In order
>to return an ordered result, it would require a quite
>substantial sort operation.

True, unless one writes a custom mnesia_frag_hash callback
which manages fragments with some global order. This is
what I had in mind to do with XML Query for Mnesia.

It does bring up the question how far one should try to
preserve the semantics. It would of course be nice if one
could guarantee the same semantics for fragmented tables as
for basic tables, but certainly not at any price (allowing
only the lowest common denominator is also a price to pay,
of course.)

>Of course Mnesia could ensure that the result of a
>match_object on an ordered_set always should be ordered,
>but is it worth the performance penalty?

Given the current implementation, I'd say yes. ;-)

This is the function in mnesia.erl that adds objects from
the transaction store to the found set:

add_match([], Objs, Type) ->
    Objs;
add_match([{Oid, _, delete}|R], Objs, Type) ->
    add_match(R, deloid(Oid, Objs), Type);
add_match([{Oid, Val, delete_object}|R], Objs, Type) ->
    add_match(R, lists:delete(Val, Objs), Type);
add_match([{Oid, Val, write}|R], Objs, Type) ->
    case Type of
        bag ->
            add_match(R, [Val | lists:delete(Val, Objs)], Type);
        _ ->
            add_match(R, [Val | deloid(Oid,Objs)],Type)
    end.

deloid(Oid, []) ->
    [];
deloid({Tab, Key}, [H | T]) when element(2, H) == Key ->
    deloid({Tab, Key}, T);
deloid(Oid, [H | T]) ->
    [H | deloid(Oid, T)].


Given that Type == ordered_set, Objs would be an ordered
list to start with, and sort_insert(Key, Val, Objs) would
not be slower than [Val | deloid(Oid,Objs)]. In fact, the
following sort_insert(Key,Val,Objs) is actually more than
twice as fast as the current implementation in mnesia.erl
(on a given sample with 1000 Objs, 10 written objs in
the lower key range, and 10 in the upper key range.)

sort_insert(Key, Val, [H|T]) when element(2,H) == Key ->
    [Val | T];
sort_insert(Key, Val, [H|_]=L) when element(2,H) > Key ->
    [Val | L];
sort_insert(Key, Val, [H|T]) ->
    [H | sort_insert(Key, Val, T)];
sort_insert(_, _, []) ->
    [].

The reason for this is mostly that deloid/2 is too generic.
It traverses the whole list each time, even after it's found
a matching key. Assuming set semantics, this is not
necessary. I fail to see why lists:keydelete/3 wouldn't do
the job.

The modified mnesia:add_match/3 would then become:

add_match([{{Tab,Key}, Val, write}|R], Objs, Type) ->
    case Type of
        bag ->
            add_match(R, [Val | lists:delete(Val, Objs)], Type);
        ordered_set ->
            add_match(R, sort_insert(Key,Val,Objs), Type);
        _ ->
            add_match(R, [Val |
                          lists:keydelete(Key,2,Objs)],Type)
    end.

Even with this fix, my sort_insert/3 is some 20% faster. (:
At the moment, I fail to see why on this particular test.

On a non-contiguous range of objects, where R contains new
records as well as updated ones, my sort_insert version is
much faster, as deloid/keydelete must traverse the entire
set for each new object.

/Uffe
--
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson AB, Connectivity and Control Nodes




Reply | Threaded
Open this post in threaded view
|

mnesia:match_object() on an ordered_set

Ulf Wiger-4
On Fri, 25 Apr 2003, Ulf Wiger wrote:

>On Thu, 24 Apr 2003, Hakan Mattsson wrote:
>
>>On Thu, 24 Apr 2003, Ulf Wiger wrote:
>>
>>Uffe> Given the current implementation of mnesia:match_object(), I
>>Uffe> don't think preserving the order would have to impact
>>Uffe> performance, esp. not on large sets (and on small sets, I
>>Uffe> don't think it matters.)
>>
>>For fragmented tables it would have a tremendous impact, as
>>the order only is preserved within each fragment. In order
>>to return an ordered result, it would require a quite
>>substantial sort operation.
>
>True, unless one writes a custom mnesia_frag_hash callback
>which manages fragments with some global order. This is
>what I had in mind to do with XML Query for Mnesia.

Actually, given the current implementation of
mnesia:match_object/2, the order is _not_ preserved, even
within the fragment. ;-)

As to preserving a global order, mnesia_frag in its default
implementation makes this hard. On the other hand, what's
the point of having multiple ordered set fragments if
objects are distributed among the fragments using a hashing
algorithm?  (:  Each ordered_set fragment will contain only
a pseudo-random subset of the global set, so it's doubtful
whether there is any point to the local ordering anyway.

Using a custom fragmentation scheme, one may distribute
objects such that e.g. all objects belonging to one
particular document fall into the same fragment; then there
is some point to having an ordered_set fragment. It's
still quite feasible to have a hashing algorithm at the top,
so that the order among documents is undefined, but the
order _within_ documents is well defined.

Fixing mnesia:match_object/2 for ordered sets should make
this scheme work well.

(A random thought: if an ets:match_object/3 could allow one
to specify a target ets table, a corresponding
mnesia:match_object/3 run over a fragmented table could drop
all found objects in an ordered_set table, and the problem
would be solved. This ets:match_object/3 function would also
have very pleasant memory characteristics, since it would
not build large heap structures.)

/Uffe
--
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson AB, Connectivity and Control Nodes



Reply | Threaded
Open this post in threaded view
|

mnesia:match_object() on an ordered_set

Ulf Wiger-4
On Fri, 25 Apr 2003, Ulf Wiger wrote:

>(A random thought: if an ets:match_object/3 could allow one
>to specify a target ets table, a corresponding
>mnesia:match_object/3 run over a fragmented table could
>drop all found objects in an ordered_set table, and the
>problem would be solved. This ets:match_object/3 function
>would also have very pleasant memory characteristics, since
>it would not build large heap structures.)

(I notice that I keep following up on my own posts...)

Perhaps this is really a version of ets:new()...?


e.g.

ets:new(found_set, [{type, ordered_set},
                    {select, {Tab, Pattern}}]).

This would create a new ETS table, and populate it with the
results from ets:select(Tab, Pattern).

The next step would be to perform a join operation. ;-)

/Uffe
--
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson AB, Connectivity and Control Nodes



Reply | Threaded
Open this post in threaded view
|

mnesia:match_object() on an ordered_set

Håkan Mattsson-6
In reply to this post by Ulf Wiger-4
On Fri, 25 Apr 2003, Ulf Wiger wrote:

Uffe> On Fri, 25 Apr 2003, Ulf Wiger wrote:
Uffe>
Uffe> >On Thu, 24 Apr 2003, Hakan Mattsson wrote:
Uffe> >
Uffe> >>On Thu, 24 Apr 2003, Ulf Wiger wrote:
Uffe> >>
Uffe> >>Uffe> Given the current implementation of mnesia:match_object(), I
Uffe> >>Uffe> don't think preserving the order would have to impact
Uffe> >>Uffe> performance, esp. not on large sets (and on small sets, I
Uffe> >>Uffe> don't think it matters.)
Uffe> >>
Uffe> >>For fragmented tables it would have a tremendous impact, as
Uffe> >>the order only is preserved within each fragment. In order
Uffe> >>to return an ordered result, it would require a quite
Uffe> >>substantial sort operation.
Uffe> >
Uffe> >True, unless one writes a custom mnesia_frag_hash callback
Uffe> >which manages fragments with some global order. This is
Uffe> >what I had in mind to do with XML Query for Mnesia.
Uffe>
Uffe> Actually, given the current implementation of
Uffe> mnesia:match_object/2, the order is _not_ preserved, even
Uffe> within the fragment. ;-)

With "as the order only is preserved within each fragment", I was
referring to the persistent order, not the transient order during the
transaction. Sorry for the confusion.

Uffe> As to preserving a global order, mnesia_frag in its default
Uffe> implementation makes this hard. On the other hand, what's
Uffe> the point of having multiple ordered set fragments if
Uffe> objects are distributed among the fragments using a hashing
Uffe> algorithm?  (:  Each ordered_set fragment will contain only
Uffe> a pseudo-random subset of the global set, so it's doubtful
Uffe> whether there is any point to the local ordering anyway.

In most cases it would not be any point. But if you have an access
pattern with frequent matching of a partitially bound key, it could
make a big difference. You would in the standard case (using the
default hash function) still need to perform the match in all
fragments, but ets would not need to perform an exhaustive search of
the entire table.

Uffe> Using a custom fragmentation scheme, one may distribute
Uffe> objects such that e.g. all objects belonging to one
Uffe> particular document fall into the same fragment; then there
Uffe> is some point to having an ordered_set fragment. It's
Uffe> still quite feasible to have a hashing algorithm at the top,
Uffe> so that the order among documents is undefined, but the
Uffe> order _within_ documents is well defined.
Uffe>
Uffe> Fixing mnesia:match_object/2 for ordered sets should make
Uffe> this scheme work well.

If the documented behavior of mnesia:match_object/2 on an ordered_set,
is changed to also guarantee an ordered result, then this implies that
this is consequently implemented. The semantics should be the same for
both ordinary tables and fragmented tables and not rely on any
customized fragmentation scheme.

/H?kan

---
H?kan Mattsson
Ericsson
High Availability Software, DBMS Internals
http://www.ericsson.com/cslab/~hakan/



Reply | Threaded
Open this post in threaded view
|

mnesia:match_object() on an ordered_set

Ulf Wiger-4
On Fri, 25 Apr 2003, Hakan Mattsson wrote:

>Uffe> As to preserving a global order, mnesia_frag in its default
>Uffe> implementation makes this hard. On the other hand, what's
>Uffe> the point of having multiple ordered set fragments if
>Uffe> objects are distributed among the fragments using a hashing
>Uffe> algorithm?  (:  Each ordered_set fragment will contain only
>Uffe> a pseudo-random subset of the global set, so it's doubtful
>Uffe> whether there is any point to the local ordering anyway.
>
>In most cases it would not be any point. But if you have an
>access pattern with frequent matching of a partitially
>bound key, it could make a big difference. You would in the
>standard case (using the default hash function) still need
>to perform the match in all fragments, but ets would not
>need to perform an exhaustive search of the entire table.

Good point.


>If the documented behavior of mnesia:match_object/2 on an
>ordered_set, is changed to also guarantee an ordered
>result, then this implies that this is consequently
>implemented. The semantics should be the same for both
>ordinary tables and fragmented tables and not rely on any
>customized fragmentation scheme.

Considering that my suggested change basically doubles the
performance of the merge of the found set and the
transaction store, given an ordered_set, one might want to
implement it anyway, without documenting it as a guarantee.

/Uffe
--
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson AB, Connectivity and Control Nodes



Reply | Threaded
Open this post in threaded view
|

mnesia:match_object() on an ordered_set

Sean Hinde-2
In reply to this post by Ulf Wiger-4
> Perhaps this is really a version of ets:new()...?
>
>
> e.g.
>
> ets:new(found_set, [{type, ordered_set},
>                     {select, {Tab, Pattern}}]).
>
> This would create a new ETS table, and populate it with the
> results from ets:select(Tab, Pattern).
>
> The next step would be to perform a join operation. ;-)
>
>

I like this - yes please OTP team!

Sean