Exercise: ETS with secondary indices

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Exercise: ETS with secondary indices

Roberto Ostinelli-5
All,
I'm doing a conceptual exercise to evaluate the usage of ETS with secondary indices.

The exercise is really simple:
  1. There are elements which can belong to groups, in a m-to-n relationship (every element can belong to one or more groups).
  2. Elements and groups can be created/deleted at will.
  3. We need to find the groups an element belongs to.
  4. We need to find the elements included into a group.
  5. We need to not keep an element if it does not belong to a group, nor a group if it does not have elements in it (this is to avoid memory leaks since groups / elements get added / removed).
All of these operations should be optimized as much as possible.


A. One possibility is to have 2 ETS tables of type set, that contain the following terms:

1. table elements with tuple format {Element, Groups}
2. table groups with tuple format {Group, Elements}

Groups and Elements would be maps (so that finding an element of the map does not require traversing an array).

Adding an Element to a Group would mean:
  • An insert in the table elements where the Group gets added to the Groups map (bonus of using a map: no duplicates can be created).
  • An insert in the table groups where the Element gets added to the Elements map.
Retrieving the groups an element belongs to is simple as getting the Element in the table elements, the groups will be keys of the Groups map.
Retrieving the elements a group contains to is simple as getting the Group in the table groups, the elements will be keys of the Elements map.

Deleting is far from being optimized though. For instance, deleting an Element means:
  • Getting the Groups it belongs to with a lookup in the table elements.
  • For every Group in the Groups map, remove the Element from the Elements map.
It becomes something like this:

case ets:lookup(elements, Element) of
    [{Element, Groups}] ->
        ets:delete(elements, Element),
        lists:foreach(fun({Group}) ->
            case ets:lookup(groups, Group) of
                [{Group, Elements}] ->
                    case maps:remove(Element, Elements) of
                        Elements1 when map_size(Elements1) == 0 ->
                            ets:delete(groups, Group);
                        Elements1 ->
                            ets:insert(groups, {Group, Elements1})
                    end;
                _ ->
                    ok
            end
        end, maps:keys(Groups))
    _ ->
        ok
end.


Ouch. Same goes for groups. And this is to delete one element, imagine if bigger operations need to be made.


B. Another possibility would be to have 2 ETS tables of type bag, that contain the following terms:

1. table elements with tuple format {Element, Group}
2. table groups with tuple format {Group, Element}

Adding an Element to a Group means:
  • An insert in the table elements of the tuple {Element, Group}.
  • An insert in the table groups of the tuple {Group, Element}.
Retrieving the groups an element belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(elements, {Element, _ = '_'}).

Retrieving the elements a group belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(groups, {Group, _ = '_'}).

So retrieving requires the traversing of a table, but it should be relatively quick, given that the match is done on the index itself.

Deleting is not particularly optimized though, because it requires the traversing of a table with elements that are not indexed. For instance, deleting an Element would look something like:
ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is being traversed.


What would you do? Any takers are welcome ^^_

Cheers,
r.


Reply | Threaded
Open this post in threaded view
|

Re: Exercise: ETS with secondary indices

Sverker Eriksson-5
Here is a variant of your second solution. But instead insert each element-group pair as a key
and use ordered_set which has an optimization for traversal with partially bound key.


ets:new(elements,[ordered_set,named_table]).
ets:new(groups,[ordered_set,named_table]).

% Insert element
[ets:insert(elements, {{Element,G}}) || G <- Groups],
[ets:insert(groups, {{G,Element}}) || G <- Groups],


% Lookup groups from element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),

% Lookup elements from group
Elements = ets:select(groups, [{{{Group,'$1'}}, [], ['$1']}]),

% Delete element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
ets:match_delete(elements, {{Element, '_'}}),
[ets:delete(groups, {G,Element}) || G <- Groups],


All select and match traversals use a partially bound key and will
only search the part of the ordered_set (tree) that may contain such keys.


/Sverker

On lör, 2019-12-07 at 18:54 +0100, Roberto Ostinelli wrote:
All,
I'm doing a conceptual exercise to evaluate the usage of ETS with secondary indices.

The exercise is really simple:
  1. There are elements which can belong to groups, in a m-to-n relationship (every element can belong to one or more groups).
  2. Elements and groups can be created/deleted at will.
  3. We need to find the groups an element belongs to.
  4. We need to find the elements included into a group.
  5. We need to not keep an element if it does not belong to a group, nor a group if it does not have elements in it (this is to avoid memory leaks since groups / elements get added / removed).
All of these operations should be optimized as much as possible.


A. One possibility is to have 2 ETS tables of type set, that contain the following terms:

1. table elements with tuple format {Element, Groups}
2. table groups with tuple format {Group, Elements}

Groups and Elements would be maps (so that finding an element of the map does not require traversing an array).

Adding an Element to a Group would mean:
  • An insert in the table elements where the Group gets added to the Groups map (bonus of using a map: no duplicates can be created).
  • An insert in the table groups where the Element gets added to the Elements map.
Retrieving the groups an element belongs to is simple as getting the Element in the table elements, the groups will be keys of the Groups map.
Retrieving the elements a group contains to is simple as getting the Group in the table groups, the elements will be keys of the Elements map.

Deleting is far from being optimized though. For instance, deleting an Element means:
  • Getting the Groups it belongs to with a lookup in the table elements.
  • For every Group in the Groups map, remove the Element from the Elements map.
It becomes something like this:

case ets:lookup(elements, Element) of
    [{Element, Groups}] ->
        ets:delete(elements, Element),
        lists:foreach(fun({Group}) ->
            case ets:lookup(groups, Group) of
                [{Group, Elements}] ->
                    case maps:remove(Element, Elements) of
                        Elements1 when map_size(Elements1) == 0 ->
                            ets:delete(groups, Group);
                        Elements1 ->
                            ets:insert(groups, {Group, Elements1})
                    end;
                _ ->
                    ok
            end
        end, maps:keys(Groups))
    _ ->
        ok
end.


Ouch. Same goes for groups. And this is to delete one element, imagine if bigger operations need to be made.


B. Another possibility would be to have 2 ETS tables of type bag, that contain the following terms:

1. table elements with tuple format {Element, Group}
2. table groups with tuple format {Group, Element}

Adding an Element to a Group means:
  • An insert in the table elements of the tuple {Element, Group}.
  • An insert in the table groups of the tuple {Group, Element}.
Retrieving the groups an element belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(elements, {Element, _ = '_'}).

Retrieving the elements a group belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(groups, {Group, _ = '_'}).

So retrieving requires the traversing of a table, but it should be relatively quick, given that the match is done on the index itself.

Deleting is not particularly optimized though, because it requires the traversing of a table with elements that are not indexed. For instance, deleting an Element would look something like:
ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is being traversed.


What would you do? Any takers are welcome ^^_

Cheers,
r.


Reply | Threaded
Open this post in threaded view
|

Re: Exercise: ETS with secondary indices

Roberto Ostinelli-5
Sverker,
This looks very nice. I was considering using composite keys but I didn't know that partial lookups would work.

Seems definitely cleaner, and if I understand correctly the lookup time will be the same as with a single key, with the added benefit of the delete functionalities (please correct me if I'm wrong).

Thank you so much,
r.


On Mon, Dec 9, 2019 at 8:15 PM Sverker Eriksson <[hidden email]> wrote:
Here is a variant of your second solution. But instead insert each element-group pair as a key
and use ordered_set which has an optimization for traversal with partially bound key.


ets:new(elements,[ordered_set,named_table]).
ets:new(groups,[ordered_set,named_table]).

% Insert element
[ets:insert(elements, {{Element,G}}) || G <- Groups],
[ets:insert(groups, {{G,Element}}) || G <- Groups],


% Lookup groups from element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),

% Lookup elements from group
Elements = ets:select(groups, [{{{Group,'$1'}}, [], ['$1']}]),

% Delete element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
ets:match_delete(elements, {{Element, '_'}}),
[ets:delete(groups, {G,Element}) || G <- Groups],


All select and match traversals use a partially bound key and will
only search the part of the ordered_set (tree) that may contain such keys.


/Sverker

On lör, 2019-12-07 at 18:54 +0100, Roberto Ostinelli wrote:
All,
I'm doing a conceptual exercise to evaluate the usage of ETS with secondary indices.

The exercise is really simple:
  1. There are elements which can belong to groups, in a m-to-n relationship (every element can belong to one or more groups).
  2. Elements and groups can be created/deleted at will.
  3. We need to find the groups an element belongs to.
  4. We need to find the elements included into a group.
  5. We need to not keep an element if it does not belong to a group, nor a group if it does not have elements in it (this is to avoid memory leaks since groups / elements get added / removed).
All of these operations should be optimized as much as possible.


A. One possibility is to have 2 ETS tables of type set, that contain the following terms:

1. table elements with tuple format {Element, Groups}
2. table groups with tuple format {Group, Elements}

Groups and Elements would be maps (so that finding an element of the map does not require traversing an array).

Adding an Element to a Group would mean:
  • An insert in the table elements where the Group gets added to the Groups map (bonus of using a map: no duplicates can be created).
  • An insert in the table groups where the Element gets added to the Elements map.
Retrieving the groups an element belongs to is simple as getting the Element in the table elements, the groups will be keys of the Groups map.
Retrieving the elements a group contains to is simple as getting the Group in the table groups, the elements will be keys of the Elements map.

Deleting is far from being optimized though. For instance, deleting an Element means:
  • Getting the Groups it belongs to with a lookup in the table elements.
  • For every Group in the Groups map, remove the Element from the Elements map.
It becomes something like this:

case ets:lookup(elements, Element) of
    [{Element, Groups}] ->
        ets:delete(elements, Element),
        lists:foreach(fun({Group}) ->
            case ets:lookup(groups, Group) of
                [{Group, Elements}] ->
                    case maps:remove(Element, Elements) of
                        Elements1 when map_size(Elements1) == 0 ->
                            ets:delete(groups, Group);
                        Elements1 ->
                            ets:insert(groups, {Group, Elements1})
                    end;
                _ ->
                    ok
            end
        end, maps:keys(Groups))
    _ ->
        ok
end.


Ouch. Same goes for groups. And this is to delete one element, imagine if bigger operations need to be made.


B. Another possibility would be to have 2 ETS tables of type bag, that contain the following terms:

1. table elements with tuple format {Element, Group}
2. table groups with tuple format {Group, Element}

Adding an Element to a Group means:
  • An insert in the table elements of the tuple {Element, Group}.
  • An insert in the table groups of the tuple {Group, Element}.
Retrieving the groups an element belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(elements, {Element, _ = '_'}).

Retrieving the elements a group belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(groups, {Group, _ = '_'}).

So retrieving requires the traversing of a table, but it should be relatively quick, given that the match is done on the index itself.

Deleting is not particularly optimized though, because it requires the traversing of a table with elements that are not indexed. For instance, deleting an Element would look something like:
ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is being traversed.


What would you do? Any takers are welcome ^^_

Cheers,
r.


Reply | Threaded
Open this post in threaded view
|

Re: Exercise: ETS with secondary indices

Roberto Ostinelli-5
Related, is there any difference in terms of performance in this case (ordered_set) between:

ets:select(groups, [{
    {{Name, '$2'}, '_', '_', '_'},
    [],
    ['$2']
}])


and

ets:select(groups, [{
    {{$1, '$2'}, '_', '_', '_'},
    [{'=:=', '$1', Name}],
    ['$2']
}])


On Tue, Dec 10, 2019 at 12:34 PM Roberto Ostinelli <[hidden email]> wrote:
Sverker,
This looks very nice. I was considering using composite keys but I didn't know that partial lookups would work.

Seems definitely cleaner, and if I understand correctly the lookup time will be the same as with a single key, with the added benefit of the delete functionalities (please correct me if I'm wrong).

Thank you so much,
r.


On Mon, Dec 9, 2019 at 8:15 PM Sverker Eriksson <[hidden email]> wrote:
Here is a variant of your second solution. But instead insert each element-group pair as a key
and use ordered_set which has an optimization for traversal with partially bound key.


ets:new(elements,[ordered_set,named_table]).
ets:new(groups,[ordered_set,named_table]).

% Insert element
[ets:insert(elements, {{Element,G}}) || G <- Groups],
[ets:insert(groups, {{G,Element}}) || G <- Groups],


% Lookup groups from element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),

% Lookup elements from group
Elements = ets:select(groups, [{{{Group,'$1'}}, [], ['$1']}]),

% Delete element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
ets:match_delete(elements, {{Element, '_'}}),
[ets:delete(groups, {G,Element}) || G <- Groups],


All select and match traversals use a partially bound key and will
only search the part of the ordered_set (tree) that may contain such keys.


/Sverker

On lör, 2019-12-07 at 18:54 +0100, Roberto Ostinelli wrote:
All,
I'm doing a conceptual exercise to evaluate the usage of ETS with secondary indices.

The exercise is really simple:
  1. There are elements which can belong to groups, in a m-to-n relationship (every element can belong to one or more groups).
  2. Elements and groups can be created/deleted at will.
  3. We need to find the groups an element belongs to.
  4. We need to find the elements included into a group.
  5. We need to not keep an element if it does not belong to a group, nor a group if it does not have elements in it (this is to avoid memory leaks since groups / elements get added / removed).
All of these operations should be optimized as much as possible.


A. One possibility is to have 2 ETS tables of type set, that contain the following terms:

1. table elements with tuple format {Element, Groups}
2. table groups with tuple format {Group, Elements}

Groups and Elements would be maps (so that finding an element of the map does not require traversing an array).

Adding an Element to a Group would mean:
  • An insert in the table elements where the Group gets added to the Groups map (bonus of using a map: no duplicates can be created).
  • An insert in the table groups where the Element gets added to the Elements map.
Retrieving the groups an element belongs to is simple as getting the Element in the table elements, the groups will be keys of the Groups map.
Retrieving the elements a group contains to is simple as getting the Group in the table groups, the elements will be keys of the Elements map.

Deleting is far from being optimized though. For instance, deleting an Element means:
  • Getting the Groups it belongs to with a lookup in the table elements.
  • For every Group in the Groups map, remove the Element from the Elements map.
It becomes something like this:

case ets:lookup(elements, Element) of
    [{Element, Groups}] ->
        ets:delete(elements, Element),
        lists:foreach(fun({Group}) ->
            case ets:lookup(groups, Group) of
                [{Group, Elements}] ->
                    case maps:remove(Element, Elements) of
                        Elements1 when map_size(Elements1) == 0 ->
                            ets:delete(groups, Group);
                        Elements1 ->
                            ets:insert(groups, {Group, Elements1})
                    end;
                _ ->
                    ok
            end
        end, maps:keys(Groups))
    _ ->
        ok
end.


Ouch. Same goes for groups. And this is to delete one element, imagine if bigger operations need to be made.


B. Another possibility would be to have 2 ETS tables of type bag, that contain the following terms:

1. table elements with tuple format {Element, Group}
2. table groups with tuple format {Group, Element}

Adding an Element to a Group means:
  • An insert in the table elements of the tuple {Element, Group}.
  • An insert in the table groups of the tuple {Group, Element}.
Retrieving the groups an element belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(elements, {Element, _ = '_'}).

Retrieving the elements a group belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(groups, {Group, _ = '_'}).

So retrieving requires the traversing of a table, but it should be relatively quick, given that the match is done on the index itself.

Deleting is not particularly optimized though, because it requires the traversing of a table with elements that are not indexed. For instance, deleting an Element would look something like:
ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is being traversed.


What would you do? Any takers are welcome ^^_

Cheers,
r.


Reply | Threaded
Open this post in threaded view
|

Re: Exercise: ETS with secondary indices

Sverker Eriksson-5
The ordered_set optimization traversal with partially bound key
does only trigger on the MatchHead tuple and not the MatchConditions list.

So both the select calls will have the same result, but the second one will traverse the entire search tree.


With a partially bound key like {Name, '_'} the total "lookup time" is roughly  the same as doing one lookup per matching key.

With a partially bound key like {Name, '_', Other} the the total lookup time may be longer
as it have to visit all keys matching {Name, '_', '_'} and check if Other also matches.

/Sverker

On tis, 2019-12-10 at 20:26 +0100, Roberto Ostinelli wrote:
Related, is there any difference in terms of performance in this case (ordered_set) between:

ets:select(groups, [{
    {{Name, '$2'}, '_', '_', '_'},
    [],
    ['$2']
}])


and

ets:select(groups, [{
    {{$1, '$2'}, '_', '_', '_'},
    [{'=:=', '$1', Name}],
    ['$2']
}])


On Tue, Dec 10, 2019 at 12:34 PM Roberto Ostinelli <[hidden email]> wrote:
Sverker,
This looks very nice. I was considering using composite keys but I didn't know that partial lookups would work.

Seems definitely cleaner, and if I understand correctly the lookup time will be the same as with a single key, with the added benefit of the delete functionalities (please correct me if I'm wrong).

Thank you so much,
r.


On Mon, Dec 9, 2019 at 8:15 PM Sverker Eriksson <[hidden email]> wrote:
Here is a variant of your second solution. But instead insert each element-group pair as a key
and use ordered_set which has an optimization for traversal with partially bound key.


ets:new(elements,[ordered_set,named_table]).
ets:new(groups,[ordered_set,named_table]).

% Insert element
[ets:insert(elements, {{Element,G}}) || G <- Groups],
[ets:insert(groups, {{G,Element}}) || G <- Groups],


% Lookup groups from element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),

% Lookup elements from group
Elements = ets:select(groups, [{{{Group,'$1'}}, [], ['$1']}]),

% Delete element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
ets:match_delete(elements, {{Element, '_'}}),
[ets:delete(groups, {G,Element}) || G <- Groups],


All select and match traversals use a partially bound key and will
only search the part of the ordered_set (tree) that may contain such keys.


/Sverker

On lör, 2019-12-07 at 18:54 +0100, Roberto Ostinelli wrote:
All,
I'm doing a conceptual exercise to evaluate the usage of ETS with secondary indices.

The exercise is really simple:
  1. There are elements which can belong to groups, in a m-to-n relationship (every element can belong to one or more groups).
  2. Elements and groups can be created/deleted at will.
  3. We need to find the groups an element belongs to.
  4. We need to find the elements included into a group.
  5. We need to not keep an element if it does not belong to a group, nor a group if it does not have elements in it (this is to avoid memory leaks since groups / elements get added / removed).
All of these operations should be optimized as much as possible.


A. One possibility is to have 2 ETS tables of type set, that contain the following terms:

1. table elements with tuple format {Element, Groups}
2. table groups with tuple format {Group, Elements}

Groups and Elements would be maps (so that finding an element of the map does not require traversing an array).

Adding an Element to a Group would mean:
  • An insert in the table elements where the Group gets added to the Groups map (bonus of using a map: no duplicates can be created).
  • An insert in the table groups where the Element gets added to the Elements map.
Retrieving the groups an element belongs to is simple as getting the Element in the table elements, the groups will be keys of the Groups map.
Retrieving the elements a group contains to is simple as getting the Group in the table groups, the elements will be keys of the Elements map.

Deleting is far from being optimized though. For instance, deleting an Element means:
  • Getting the Groups it belongs to with a lookup in the table elements.
  • For every Group in the Groups map, remove the Element from the Elements map.
It becomes something like this:

case ets:lookup(elements, Element) of
    [{Element, Groups}] ->
        ets:delete(elements, Element),
        lists:foreach(fun({Group}) ->
            case ets:lookup(groups, Group) of
                [{Group, Elements}] ->
                    case maps:remove(Element, Elements) of
                        Elements1 when map_size(Elements1) == 0 ->
                            ets:delete(groups, Group);
                        Elements1 ->
                            ets:insert(groups, {Group, Elements1})
                    end;
                _ ->
                    ok
            end
        end, maps:keys(Groups))
    _ ->
        ok
end.


Ouch. Same goes for groups. And this is to delete one element, imagine if bigger operations need to be made.


B. Another possibility would be to have 2 ETS tables of type bag, that contain the following terms:

1. table elements with tuple format {Element, Group}
2. table groups with tuple format {Group, Element}

Adding an Element to a Group means:
  • An insert in the table elements of the tuple {Element, Group}.
  • An insert in the table groups of the tuple {Group, Element}.
Retrieving the groups an element belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(elements, {Element, _ = '_'}).

Retrieving the elements a group belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(groups, {Group, _ = '_'}).

So retrieving requires the traversing of a table, but it should be relatively quick, given that the match is done on the index itself.

Deleting is not particularly optimized though, because it requires the traversing of a table with elements that are not indexed. For instance, deleting an Element would look something like:
ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is being traversed.


What would you do? Any takers are welcome ^^_

Cheers,
r.




Reply | Threaded
Open this post in threaded view
|

Re: Exercise: ETS with secondary indices

Roberto Ostinelli-5
Thank you Sverker!

On Wed, Dec 11, 2019 at 3:06 PM Sverker Eriksson <[hidden email]> wrote:
The ordered_set optimization traversal with partially bound key
does only trigger on the MatchHead tuple and not the MatchConditions list.

So both the select calls will have the same result, but the second one will traverse the entire search tree.


With a partially bound key like {Name, '_'} the total "lookup time" is roughly  the same as doing one lookup per matching key.

With a partially bound key like {Name, '_', Other} the the total lookup time may be longer
as it have to visit all keys matching {Name, '_', '_'} and check if Other also matches.

/Sverker

On tis, 2019-12-10 at 20:26 +0100, Roberto Ostinelli wrote:
Related, is there any difference in terms of performance in this case (ordered_set) between:

ets:select(groups, [{
    {{Name, '$2'}, '_', '_', '_'},
    [],
    ['$2']
}])


and

ets:select(groups, [{
    {{$1, '$2'}, '_', '_', '_'},
    [{'=:=', '$1', Name}],
    ['$2']
}])


On Tue, Dec 10, 2019 at 12:34 PM Roberto Ostinelli <[hidden email]> wrote:
Sverker,
This looks very nice. I was considering using composite keys but I didn't know that partial lookups would work.

Seems definitely cleaner, and if I understand correctly the lookup time will be the same as with a single key, with the added benefit of the delete functionalities (please correct me if I'm wrong).

Thank you so much,
r.


On Mon, Dec 9, 2019 at 8:15 PM Sverker Eriksson <[hidden email]> wrote:
Here is a variant of your second solution. But instead insert each element-group pair as a key
and use ordered_set which has an optimization for traversal with partially bound key.


ets:new(elements,[ordered_set,named_table]).
ets:new(groups,[ordered_set,named_table]).

% Insert element
[ets:insert(elements, {{Element,G}}) || G <- Groups],
[ets:insert(groups, {{G,Element}}) || G <- Groups],


% Lookup groups from element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),

% Lookup elements from group
Elements = ets:select(groups, [{{{Group,'$1'}}, [], ['$1']}]),

% Delete element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
ets:match_delete(elements, {{Element, '_'}}),
[ets:delete(groups, {G,Element}) || G <- Groups],


All select and match traversals use a partially bound key and will
only search the part of the ordered_set (tree) that may contain such keys.


/Sverker

On lör, 2019-12-07 at 18:54 +0100, Roberto Ostinelli wrote:
All,
I'm doing a conceptual exercise to evaluate the usage of ETS with secondary indices.

The exercise is really simple:
  1. There are elements which can belong to groups, in a m-to-n relationship (every element can belong to one or more groups).
  2. Elements and groups can be created/deleted at will.
  3. We need to find the groups an element belongs to.
  4. We need to find the elements included into a group.
  5. We need to not keep an element if it does not belong to a group, nor a group if it does not have elements in it (this is to avoid memory leaks since groups / elements get added / removed).
All of these operations should be optimized as much as possible.


A. One possibility is to have 2 ETS tables of type set, that contain the following terms:

1. table elements with tuple format {Element, Groups}
2. table groups with tuple format {Group, Elements}

Groups and Elements would be maps (so that finding an element of the map does not require traversing an array).

Adding an Element to a Group would mean:
  • An insert in the table elements where the Group gets added to the Groups map (bonus of using a map: no duplicates can be created).
  • An insert in the table groups where the Element gets added to the Elements map.
Retrieving the groups an element belongs to is simple as getting the Element in the table elements, the groups will be keys of the Groups map.
Retrieving the elements a group contains to is simple as getting the Group in the table groups, the elements will be keys of the Elements map.

Deleting is far from being optimized though. For instance, deleting an Element means:
  • Getting the Groups it belongs to with a lookup in the table elements.
  • For every Group in the Groups map, remove the Element from the Elements map.
It becomes something like this:

case ets:lookup(elements, Element) of
    [{Element, Groups}] ->
        ets:delete(elements, Element),
        lists:foreach(fun({Group}) ->
            case ets:lookup(groups, Group) of
                [{Group, Elements}] ->
                    case maps:remove(Element, Elements) of
                        Elements1 when map_size(Elements1) == 0 ->
                            ets:delete(groups, Group);
                        Elements1 ->
                            ets:insert(groups, {Group, Elements1})
                    end;
                _ ->
                    ok
            end
        end, maps:keys(Groups))
    _ ->
        ok
end.


Ouch. Same goes for groups. And this is to delete one element, imagine if bigger operations need to be made.


B. Another possibility would be to have 2 ETS tables of type bag, that contain the following terms:

1. table elements with tuple format {Element, Group}
2. table groups with tuple format {Group, Element}

Adding an Element to a Group means:
  • An insert in the table elements of the tuple {Element, Group}.
  • An insert in the table groups of the tuple {Group, Element}.
Retrieving the groups an element belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(elements, {Element, _ = '_'}).

Retrieving the elements a group belongs to requires an ets:match_object/2 or similar such as:
ets:match_object(groups, {Group, _ = '_'}).

So retrieving requires the traversing of a table, but it should be relatively quick, given that the match is done on the index itself.

Deleting is not particularly optimized though, because it requires the traversing of a table with elements that are not indexed. For instance, deleting an Element would look something like:
ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is being traversed.


What would you do? Any takers are welcome ^^_

Cheers,
r.