mnesia_leveldb + prefix keys

Matthias Rieber
Hello,

I have a question regarding the prefix keys in mnesia_eleveldb. I have
records like these:

-record(key, {
  jid       :: { binary(), binary() },
  timestamp :: erlang:timestamp()
}).

-record(table, {
  key = #key{},
  payload :: term()
}).


As far as I understand these queries are fast:

mnesia:match_object(
  #table{key=#key{jid={<<"user1">>, <<"localhost">>},
                  timestamp = '_'},
         payload='_'})

Since the keys are ordered, I expected this query to be fast as well:

mnesia:select(table, ets:fun2ms(
 fun(#table{key=#key{jid={<<"user1">>, <<"localhost">>},
                     timestamp = Timestamp},
            payload=P}) when Timestamp > {1,2,3} -> P
 end))

But performance tests show that this reads all records (or probably only
the ones with the matching jid) instead of just the requested timestamp
range. Is it possible to run this select in an efficient manner?

Best regards,
Matthias
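
For readers who want to reproduce the second query outside Mnesia, here is a minimal sketch that evaluates a hand-written match specification, equivalent in intent to the fun2ms call, against a plain ETS ordered_set. The module name ms_sketch, the function demo/0 and the sample payloads are made up for illustration, and the records are declared without type annotations to keep the example short. It only shows what the query returns, not how mnesia_eleveldb executes it.

```erlang
-module(ms_sketch).
-export([demo/0]).

%% Same shape as the records in the post, without type annotations.
-record(key, {jid, timestamp}).
-record(table, {key = #key{}, payload}).

%% Builds a throwaway ordered_set table, inserts three sample rows and
%% runs the "jid matches, timestamp > {1,2,3}" query as a match spec.
demo() ->
    T = ets:new(demo, [ordered_set, {keypos, #table.key}]),
    Jid = {<<"user1">>, <<"localhost">>},
    Other = {<<"user2">>, <<"localhost">>},
    ets:insert(T, [
        #table{key = #key{jid = Jid, timestamp = {1,2,2}}, payload = old},
        #table{key = #key{jid = Jid, timestamp = {1,2,4}}, payload = new},
        #table{key = #key{jid = Other, timestamp = {1,2,5}}, payload = other}
    ]),
    %% Head binds the timestamp to '$1' and the payload to '$2'.
    %% A literal tuple inside a guard has to be wrapped in {const, ...}.
    MS = [{#table{key = #key{jid = Jid, timestamp = '$1'}, payload = '$2'},
           [{'>', '$1', {const, {1,2,3}}}],
           ['$2']}],
    Res = ets:select(T, MS),
    ets:delete(T),
    Res.
```

Calling ms_sketch:demo() returns [new]. When ets:fun2ms is used in a compiled module rather than in the shell, the module also needs -include_lib("stdlib/include/ms_transform.hrl").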
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Re: mnesia_leveldb + prefix keys

Mikael Pettersson
On Tue, Jan 22, 2019 at 11:32 AM Matthias Rieber <[hidden email]> wrote:

> [...]
>
> But performance tests show that this will read all records (probably only
> the ones with the matching jid). Is it possible to select this in an
> efficient manner?

Your observations are correct.  The second case is a "range select"
which LevelDB can support efficiently, but mnesia_eleveldb doesn't
implement that optimization yet.  I've had a ticket to implement this
for a while now, but the initial prototype didn't work and I haven't
had time to up-prioritize it yet.

Meanwhile we've implemented another optimization in mnesia_eleveldb:
we no longer store the tags of records in mnesia_eleveldb tables as
those tags are invariant and redundant.  This reduces I/O and CPU
usage.

/Mikael
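
To make the record-tag remark concrete: an Erlang record is a tuple whose first element is the record name, so every #table{} row starts with the atom table. The sketch below only illustrates the idea of dropping that constant element before storing a row and putting it back after reading. The module and function names are hypothetical, and this is not claimed to be how mnesia_eleveldb implements it.

```erlang
-module(tag_sketch).
-export([strip_tag/1, restore_tag/2]).

%% Drop the record tag (the first tuple element) before storing.
%% {table, Key, Payload} becomes {Key, Payload}.
strip_tag(Rec) when is_tuple(Rec), tuple_size(Rec) >= 1 ->
    erlang:delete_element(1, Rec).

%% Put the tag back after reading. The tag is the same for every row
%% of a table, so it can be kept once per table instead of per row.
restore_tag(Tag, Stripped) when is_atom(Tag), is_tuple(Stripped) ->
    erlang:insert_element(1, Stripped, Tag).
```

For example, tag_sketch:strip_tag({table, k, v}) gives {k, v}, and tag_sketch:restore_tag(table, {k, v}) gives back {table, k, v}.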

Re: mnesia_leveldb + prefix keys

Dániel Szoboszlay
Until the optimisations mentioned by Mikael are implemented, you can reach good performance by writing the loop manually:

loop(mnesia:next(table, #key{jid={<<"user1">>, <<"localhost">>}, timestamp = {1,2,3}}), []).

loop(K = #key{jid={<<"user1">>, <<"localhost">>}}, Acc) ->
  loop(mnesia:next(table, K), mnesia:read(table, K) ++ Acc);
loop(_, Acc) -> Acc.
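
A self-contained variant of the loop above might look like the following sketch. The module name range_scan and the function select_after/3 are made up for illustration. It assumes a table with ordered keys (for example ordered_set, or a backend that keeps keys sorted), because only then can mnesia:next/2 be started from a key that is not stored. mnesia:next/2 and mnesia:read/2 need a transaction context, so the loop runs inside mnesia:transaction/1.

```erlang
-module(range_scan).
-export([select_after/3]).

-record(key, {jid       :: {binary(), binary()},
              timestamp :: erlang:timestamp()}).

%% Returns all records in Tab for Jid whose timestamp is strictly
%% greater than After, in key order (oldest first).
select_after(Tab, Jid, After) ->
    %% The start key itself is skipped: next/2 yields the key after it.
    Start = #key{jid = Jid, timestamp = After},
    {atomic, Recs} = mnesia:transaction(
        fun() -> walk(Tab, Jid, mnesia:next(Tab, Start), []) end),
    lists:reverse(Recs).

%% Keep walking while the key still belongs to Jid. Jid appears twice
%% in the clause head, so the clause only matches keys with that jid.
walk(Tab, Jid, K = #key{jid = Jid}, Acc) ->
    walk(Tab, Jid, mnesia:next(Tab, K), mnesia:read(Tab, K) ++ Acc);
%% Another jid or '$end_of_table': the range is exhausted.
walk(_Tab, _Jid, _Other, Acc) ->
    Acc.
```

Usage would be range_scan:select_after(table, {<<"user1">>, <<"localhost">>}, {1,2,3}). Unlike the original loop, this version reverses the accumulator so the result comes back oldest first.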

On Tue, 22 Jan 2019 at 13:41 Mikael Pettersson <[hidden email]> wrote:
> [...]

Re: mnesia_leveldb + prefix keys

Matthias Rieber
Hi Mikael,

On Tue, 22 Jan 2019, Mikael Pettersson wrote:

> On Tue, Jan 22, 2019 at 11:32 AM Matthias Rieber <[hidden email]> wrote:

[...]

> Your observations are correct.  The second case is a "range select"
> which LevelDB can support efficiently, but mnesia_eleveldb doesn't
> implement that optimization yet.  I've had a ticket to implement this
> for a while now, but the initial prototype didn't work and I haven't
> had time to up-prioritize it yet.
>
> Meanwhile we've implemented another optimization in mnesia_eleveldb:
> we no longer store the tags of records in mnesia_eleveldb tables as
> those tags are invariant and redundant.  This reduces I/O and CPU
> usage.

Thanks for your answer. I'll try that!

Matthias

Re: mnesia_leveldb + prefix keys

Matthias Rieber
Hi Dániel,

On Tue, 22 Jan 2019, Dániel Szoboszlay wrote:

> Until the optimisations mentioned by Mikael are implemented, you can reach
> good performance by writing the loop manually:
>
> loop(mnesia:next(table, #key{jid={<<"user1">>, <<"localhost">>}, timestamp = {1,2,3}}), []).
>
> loop(K = #key{jid={<<"user1">>, <<"localhost">>}}, Acc) ->
>   loop(mnesia:next(table, K), mnesia:read(table, K) ++ Acc);
> loop(_, Acc) -> Acc.

Thanks! This works. I was not aware that the key passed to mnesia:next
does not have to exist in the table.

Matthias