Checking if the keys of a list of tuples match a list

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Checking if the keys of a list of tuples match a list

Silas
Sorry for the confusing subject.  I couldn't find another way to explain
this simple problem.  Before just making the question, I'll ask you
about the architecture of the system because it is probably the root of
the doubt.

I'm making a small bookmark tag system.  The user adds some information
about a URL (url, title, etc.) and its corresponding tags.  I have two
dets tables that use the records:

    -record(url, {url, title, date, description, refs, tags}).
    -record(tag, {tag, url}).

The url record is stored in a "urls" table.

You see that the url record has some information and also a list of tags
these bookmark has.  To speed up tags lookup I created the "tags" table
that stores tag records.  So I have something like this:

    #url{url="wikipedia.org", title="Wikipedia", ... , tags=["internet","documentation"]}
    #url{url="erlang.org", title="Erlang", ... , tags=["documentation","programming"]}

    #tag{tag="internet", url="wikipedia.org"}
    #tag{tag="documentation", url="wikipedia.org"}
    #tag{tag="documentation", url="erlang.org"}
    #tag{tag="programming", url="erlang.org"}

(Note I also have tags information in #url record).

It works pretty fine for one tag: I just have to lookup at the tags
table, retrieve all urls related to that tag and make a lookup for each
url at the "urls" table.

The problem arises when I tried to implement more than one tag lookup
with AND, so when searching for "documentation" and "programming" it
would return "erlang.org" only.

The solution I found works, but it doesn't look quite elegant for me:

a. I first lookup one of the tags the user provided (in the example
   above, "documentation".

b. I then retrieve all urls matching that that.

c. I then filter out all urls that don't have ALL the remaining tags
   (in the example above, just "programming").  I do this by looking
   at the "tags" field of the "url" record.

d. I print all the remaining urls records.

Although it works, I see two problems/questions:

1. Retrieving all urls information and discarding then looks just waste
   of processing.  I may fix this over a rather complex (and slow)
   algorithm over "tags" table or using a third table to speed this
   (maybe {url, tag}?).  Any sugestions?

2. Now a question about Erlang data structures: To check if a list has
   all elements of another list (regardless of the order), I'm using
   this:

    % This function returns true if all elements in List2 are in List1, regardless
    % of the order.
    sublistmember(_, []) -> true;
    sublistmember(List1, List2) ->
        [Head|Tail] = List2,
        case lists:member(Head, List1) of
            true -> sublistmember(List1, Tail);
            false -> false
        end.

Its complexity is O(len(List1)) * O(len(List2)) which is not actually a
problem for small lists (I can limit the number of tags the user may
provide) but I was wondering if there is a more effective way.  I
thought about using sets to simplify but list -> set -> list conversion
may be even slower, right?  Any suggestions on this?

Thanks!

--
Silas
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Checking if the keys of a list of tuples match a list

Mikael Pettersson-5
On Fri, Apr 13, 2018 at 5:18 PM, Silas <[hidden email]> wrote:

> 2. Now a question about Erlang data structures: To check if a list has   all
> elements of another list (regardless of the order), I'm using   this:
>
>    % This function returns true if all elements in List2 are in List1,
> regardless
>    % of the order.
>    sublistmember(_, []) -> true;
>    sublistmember(List1, List2) ->
>        [Head|Tail] = List2,
>        case lists:member(Head, List1) of
>            true -> sublistmember(List1, Tail);
>            false -> false
>        end.
>
> Its complexity is O(len(List1)) * O(len(List2)) which is not actually a
> problem for small lists (I can limit the number of tags the user may
> provide) but I was wondering if there is a more effective way.  I thought
> about using sets to simplify but list -> set -> list conversion may be even
> slower, right?  Any suggestions on this?

lists:member/2 is a BIF, so implemented by C code in the VM.  For
short lists this is going to be pretty much optimal.  I'd move the
[Head|Tail] extraction into the function head however.

You can easily do the check in linear time if the lists are sorted;
otherwise you may pre-sort for O(N*logN) overall cost.  But this is
only worthwhile for longer lists, so you'd have to run some
measurements to determine a suitable cut-off point.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions