Tuples referencing each other problem

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Tuples referencing each other problem

Grzegorz Junka
Is it possible to create two tuples that reference each other in Erlang?
Something like:

T1 = {T2}
T2 = {T1}

My understanding is that when using pointers (i.e. in C++) one needs to
mutate the first tuple in order to store the reference to the second
tuple once it has been created.

This is of course a generalization of a more specific problem of
creating liked structures like a double linked list or a tree with leafs
containing links to each other. Are there any techniques that would
allow creating algorithms operating on such structures in Erlang?

GrzegorzJ


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Krzysztof Jurewicz
Grzegorz Junka writes:

> Is it possible to create two tuples that reference each other in Erlang?
> Something like:
>
> T1 = {T2}
> T2 = {T1}

Data structures in Erlang are generally immutable, hence it is not possible to do it in a simple way. To achieve this, you would need to introduce mutability, for example store tuples in a database and reference them by database ids (you may want to look at ETS for this purpose).
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Mikael Pettersson-5
In reply to this post by Grzegorz Junka
Grzegorz Junka writes:
 > Is it possible to create two tuples that reference each other in Erlang?
 > Something like:
 >
 > T1 = {T2}
 > T2 = {T1}
 >
 > My understanding is that when using pointers (i.e. in C++) one needs to
 > mutate the first tuple in order to store the reference to the second
 > tuple once it has been created.
 >
 > This is of course a generalization of a more specific problem of
 > creating liked structures like a double linked list or a tree with leafs
 > containing links to each other. Are there any techniques that would
 > allow creating algorithms operating on such structures in Erlang?

Short answer: No, you cannot create circular data structures in
Erlang.

Erlang "terms" are made up of leafs (pids, ports, numbers, atoms,
the empty list [], binaries, refs) and aggregates (list cells,
tuples, function closures, maps).  Leafs do not refer to other
terms, and aggregates are write-once: they're fully initialized
when created and then remain immutable.  When you see an "update"
operation like setelement/2, it actually creates a copy where
the target element has been changed, leaving the old term unchanged.

There are various ways to simulate assignment:

1. Put labels in your data where you'd otherwise have direct
   references, and have a mapping from labels to terms.  Update
   that mapping when needed -- but you also have to pass it around
   so the update is seen by subsequent code.

   Erlang now has a built-in "map" type that provides such mappings,
   otherwise there are tree and list based mapping ADTs in the standard
   library, or you implement your own.

2. Use an ETS table.  This is an external (to your process) bag
   of Erlang terms, where you can insert/lookup/delete terms.
   This is a stateful object, so inserts/deletes update the
   existing table rather than creating a copy with that change.

   You still have to "indirect" via labels, but you avoid the
   need for passing around an updated mapping.

3. Use a helper process that keeps the state and allows you to
   query or update it through messages.  Semantically this is
   very similar to an ETS table.

4. You can use the process dictionary, but it's not recommended
   as it's a shared resource with both the standard libraries and
   any other code that might have the same idea.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Richard A. O'Keefe-2
In reply to this post by Grzegorz Junka


On 17/09/17 7:27 PM, Grzegorz Junka wrote:
> Is it possible to create two tuples that reference each other in Erlang?
> Something like:
>
> T1 = {T2}
> T2 = {T1}

No.
(1) When you create a new term in Erlang it must be fully
     specified.
(2) You cannot change it once it has been created.

> My understanding is that when using pointers (i.e. in C++) one needs to
> mutate the first tuple in order to store the reference to the second
> tuple once it has been created.

In Lisp also.  Oddly enough, not in Prolog:
| ?- T1 = {T2}, T2 = {T1}.
T1 = {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{

Erlang is closer to Prolog than it is to C++, but logical variables
are one of the things it did not copy.
>
> This is of course a generalization of a more specific problem of
> creating liked structures like a double linked list or a tree with leafs
> containing links to each other. Are there any techniques that would
> allow creating algorithms operating on such structures in Erlang?

No, and thank goodness!  You wouldn't *believe* how much trouble
such structures are in Lisp.  Or the grief it caused in Prolog.
Just as an example, what do you expect term comparison to do?

Doubly linked lists are almost always a very bad idea.
The back links in the Document Object Model (trees) are
a *stunningly* bad idea, because they prevent sharing.  (When
processing SGML and XML I use a "Document *Value* Model"
library I developed years ago.  It is so much easier and so
much more efficient that it really is not funny.)

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Richard A. O'Keefe-2
On further reflection, I suppose I should ask:
  Why is it that you think you need cyclic structures?
  What do you actually want to *do* with them?

As a matter of curiosity, have you read Okasaki's
book on functional data structures?

(Didn't someone convert Okasaki's Edison library
to Erlang at some point?  What happened to that?)
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Grzegorz Junka

Many thanks to all for your responses and suggestions.

To answer your question Richard, I don't actually need a cyclical structure. As I mentioned, this is a simplification of a more specific problem. I didn't want to describe all the specifics to ask a question. I need to design a tree, a variant of a HAMT tree, but which can be traversed in both directions, from the root to leafs and from leafs to the root (independently, i.e. starting from a reference to a leaf rather than recursively). Tuples that reference each other is one solution but not the only one fortunately, i.e. I can add ids to each node and use another data structure to store the mapping id-node, then store the id of the parent in the child.

Thank you for the pointer to the book.

GrzegorzJ


On 18/09/2017 00:53, Richard A. O'Keefe wrote:
On further reflection, I suppose I should ask:
 Why is it that you think you need cyclic structures?
 What do you actually want to *do* with them?

As a matter of curiosity, have you read Okasaki's
book on functional data structures?

(Didn't someone convert Okasaki's Edison library
to Erlang at some point?  What happened to that?)
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Brady Powers
When I think about a data structure that needs to be traversed in both directions, I immediately think of a zipper.
https://ferd.ca/yet-another-article-on-zippers.html


It might not be appropriate, it's just where my mind goes. 
On Thursday, September 21, 2017, 10:24:17 AM EDT, Grzegorz Junka <[hidden email]> wrote:


Many thanks to all for your responses and suggestions.

To answer your question Richard, I don't actually need a cyclical structure. As I mentioned, this is a simplification of a more specific problem. I didn't want to describe all the specifics to ask a question. I need to design a tree, a variant of a HAMT tree, but which can be traversed in both directions, from the root to leafs and from leafs to the root (independently, i.e. starting from a reference to a leaf rather than recursively). Tuples that reference each other is one solution but not the only one fortunately, i.e. I can add ids to each node and use another data structure to store the mapping id-node, then store the id of the parent in the child.

Thank you for the pointer to the book.

GrzegorzJ


On 18/09/2017 00:53, Richard A. O'Keefe wrote:
On further reflection, I suppose I should ask:
 Why is it that you think you need cyclic structures?
 What do you actually want to *do* with them?

As a matter of curiosity, have you read Okasaki's
book on functional data structures?

(Didn't someone convert Okasaki's Edison library
to Erlang at some point?  What happened to that?)
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Richard A. O'Keefe-2
In reply to this post by Grzegorz Junka


On 22/09/17 2:23 AM, Grzegorz Junka wrote:
> To answer your question Richard, I don't actually need a cyclical
> structure. As I mentioned, this is a simplification of a more specific
> problem. I didn't want to describe all the specifics to ask a question.
> I need to design a tree, a variant of a HAMT tree, but which can be
> traversed in both directions, from the root to leafs and from leafs to
> the root (independently, i.e. starting from a reference to a leaf rather
> than recursively).

Of course this pushes back the question a further step.
Why do you want to traverse the tree in both directions?

Might the "zipper" approach be any use?

If you want references to leaves, such references could be paths.
I used that technique once in a programming language that used
reference counting, so that link cycles meant no collection.
(Yes, I know modern reference-based collectors have solved that
problem.)  I've also used that technique for processing XML in
a pure functional language.

In an imperative context, that doesn't work very well because the
tree structure might change.  The DOM suffers from trying to make
this "work" in some sense, but trying to iterate over a graph
while something else is changing it is never going to be easy.

It is possible that the Haskell "Functional Graph Library"
may be relevant to your needs.

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Grzegorz Junka
Zipper looks interesting but I don't think it would suit here. Keeping
paths to leafs could work but since the tree is dynamic those paths
would often change and it would be very difficult to update paths that
have already been stored.

In short, I have keys, which may be any Erlang terms, and numerical Ids
assigned to those terms. Keys must be sorted. Numerical Ids are
increasing monotonically. Then the following lookups should be efficient:

1. Having a key quickly search for its numeric Id
2. Having a numeric Id quickly get back the key

Also the following conditions should be met:

3. The same key should be always assigned the same numeric Id
4. One key has always one numeric Id (and vice versa), so they are
always added or removed together

Since keys can be of any length I don't want to store them more than
once. And since they must be sorted I was thinking about using a B+Tree.
Then another structure to map the numeric Id back to their keys.

An obvious solution would be to make the B+Tree bidirectional - the key
would be recreated by traversing the tree back from a leaf to the root.
The other structure would then need to keep a mapping between the
numeric Id and the reference to the leaf for the key corresponding to
that numeric Id.

But an equally valid solution would be to tag all nodes of the B+Tree
with numbers and store tags to parents in their children. Then in
another structure, e.g. an array, the mapping between the tag and its
node. That would also allow to traverse the tree backward from leafs to
the root.

GrzegorzJ


On 22/09/2017 00:37, Richard A. O'Keefe wrote:

>
>
> On 22/09/17 2:23 AM, Grzegorz Junka wrote:
>> To answer your question Richard, I don't actually need a cyclical
>> structure. As I mentioned, this is a simplification of a more specific
>> problem. I didn't want to describe all the specifics to ask a question.
>> I need to design a tree, a variant of a HAMT tree, but which can be
>> traversed in both directions, from the root to leafs and from leafs to
>> the root (independently, i.e. starting from a reference to a leaf rather
>> than recursively).
>
> Of course this pushes back the question a further step.
> Why do you want to traverse the tree in both directions?
>
> Might the "zipper" approach be any use?
>
> If you want references to leaves, such references could be paths.
> I used that technique once in a programming language that used
> reference counting, so that link cycles meant no collection.
> (Yes, I know modern reference-based collectors have solved that
> problem.)  I've also used that technique for processing XML in
> a pure functional language.
>
> In an imperative context, that doesn't work very well because the
> tree structure might change.  The DOM suffers from trying to make
> this "work" in some sense, but trying to iterate over a graph
> while something else is changing it is never going to be easy.
>
> It is possible that the Haskell "Functional Graph Library"
> may be relevant to your needs.
>
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Jesper Louis Andersen-2
Consider an ETS table and insert

[{Key, Id}, {Id, Key}]

This ensures fast lookup in both directions. Alternatively, run this in a map. I often do this when mapping monitor references in code.

On Sat, Sep 23, 2017 at 3:50 PM Grzegorz Junka <[hidden email]> wrote:
Zipper looks interesting but I don't think it would suit here. Keeping
paths to leafs could work but since the tree is dynamic those paths
would often change and it would be very difficult to update paths that
have already been stored.

In short, I have keys, which may be any Erlang terms, and numerical Ids
assigned to those terms. Keys must be sorted. Numerical Ids are
increasing monotonically. Then the following lookups should be efficient:

1. Having a key quickly search for its numeric Id
2. Having a numeric Id quickly get back the key

Also the following conditions should be met:

3. The same key should be always assigned the same numeric Id
4. One key has always one numeric Id (and vice versa), so they are
always added or removed together

Since keys can be of any length I don't want to store them more than
once. And since they must be sorted I was thinking about using a B+Tree.
Then another structure to map the numeric Id back to their keys.

An obvious solution would be to make the B+Tree bidirectional - the key
would be recreated by traversing the tree back from a leaf to the root.
The other structure would then need to keep a mapping between the
numeric Id and the reference to the leaf for the key corresponding to
that numeric Id.

But an equally valid solution would be to tag all nodes of the B+Tree
with numbers and store tags to parents in their children. Then in
another structure, e.g. an array, the mapping between the tag and its
node. That would also allow to traverse the tree backward from leafs to
the root.

GrzegorzJ


On 22/09/2017 00:37, Richard A. O'Keefe wrote:
>
>
> On 22/09/17 2:23 AM, Grzegorz Junka wrote:
>> To answer your question Richard, I don't actually need a cyclical
>> structure. As I mentioned, this is a simplification of a more specific
>> problem. I didn't want to describe all the specifics to ask a question.
>> I need to design a tree, a variant of a HAMT tree, but which can be
>> traversed in both directions, from the root to leafs and from leafs to
>> the root (independently, i.e. starting from a reference to a leaf rather
>> than recursively).
>
> Of course this pushes back the question a further step.
> Why do you want to traverse the tree in both directions?
>
> Might the "zipper" approach be any use?
>
> If you want references to leaves, such references could be paths.
> I used that technique once in a programming language that used
> reference counting, so that link cycles meant no collection.
> (Yes, I know modern reference-based collectors have solved that
> problem.)  I've also used that technique for processing XML in
> a pure functional language.
>
> In an imperative context, that doesn't work very well because the
> tree structure might change.  The DOM suffers from trying to make
> this "work" in some sense, but trying to iterate over a graph
> while something else is changing it is never going to be easy.
>
> It is possible that the Haskell "Functional Graph Library"
> may be relevant to your needs.
>
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Grzegorz Junka

Thanks but that won't work. As mentioned keys must be sorted so that I can quickly query for range from min/max key. I could add an index but in that case the key would be stored twice, once in the index and once in the value, which I wanted to avoid.

GrzegorzJ


On 24/09/2017 15:29, Jesper Louis Andersen wrote:
Consider an ETS table and insert

[{Key, Id}, {Id, Key}]

This ensures fast lookup in both directions. Alternatively, run this in a map. I often do this when mapping monitor references in code.

On Sat, Sep 23, 2017 at 3:50 PM Grzegorz Junka <[hidden email]> wrote:
Zipper looks interesting but I don't think it would suit here. Keeping
paths to leafs could work but since the tree is dynamic those paths
would often change and it would be very difficult to update paths that
have already been stored.

In short, I have keys, which may be any Erlang terms, and numerical Ids
assigned to those terms. Keys must be sorted. Numerical Ids are
increasing monotonically. Then the following lookups should be efficient:

1. Having a key quickly search for its numeric Id
2. Having a numeric Id quickly get back the key

Also the following conditions should be met:

3. The same key should be always assigned the same numeric Id
4. One key has always one numeric Id (and vice versa), so they are
always added or removed together

Since keys can be of any length I don't want to store them more than
once. And since they must be sorted I was thinking about using a B+Tree.
Then another structure to map the numeric Id back to their keys.

An obvious solution would be to make the B+Tree bidirectional - the key
would be recreated by traversing the tree back from a leaf to the root.
The other structure would then need to keep a mapping between the
numeric Id and the reference to the leaf for the key corresponding to
that numeric Id.

But an equally valid solution would be to tag all nodes of the B+Tree
with numbers and store tags to parents in their children. Then in
another structure, e.g. an array, the mapping between the tag and its
node. That would also allow to traverse the tree backward from leafs to
the root.

GrzegorzJ


On 22/09/2017 00:37, Richard A. O'Keefe wrote:
>
>
> On 22/09/17 2:23 AM, Grzegorz Junka wrote:
>> To answer your question Richard, I don't actually need a cyclical
>> structure. As I mentioned, this is a simplification of a more specific
>> problem. I didn't want to describe all the specifics to ask a question.
>> I need to design a tree, a variant of a HAMT tree, but which can be
>> traversed in both directions, from the root to leafs and from leafs to
>> the root (independently, i.e. starting from a reference to a leaf rather
>> than recursively).
>
> Of course this pushes back the question a further step.
> Why do you want to traverse the tree in both directions?
>
> Might the "zipper" approach be any use?
>
> If you want references to leaves, such references could be paths.
> I used that technique once in a programming language that used
> reference counting, so that link cycles meant no collection.
> (Yes, I know modern reference-based collectors have solved that
> problem.)  I've also used that technique for processing XML in
> a pure functional language.
>
> In an imperative context, that doesn't work very well because the
> tree structure might change.  The DOM suffers from trying to make
> this "work" in some sense, but trying to iterate over a graph
> while something else is changing it is never going to be easy.
>
> It is possible that the Haskell "Functional Graph Library"
> may be relevant to your needs.
>
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Jesper Louis Andersen-2
Of course it will work: the ETS table is an ordered_set. This ensures that you have Term order. Now, in Erlang, numbers and thus integers have order less than everything else. In short, min/1 is given by a call to ets:first/1. To get max, we observe that the atom 'x' is not part of your keys, nor part of the integers. So it is larger than any integer. If we ask ets:prev/2 on that 'x', we get max/1.

This covers the range operations on IDs. For the range operations on keys, it is even easier because they already have sorted order. You can also get rather fast traversal by using ets:select/1,2 and using the continuations to walk the data set in larger chunks.

On Sun, Sep 24, 2017 at 6:02 PM Grzegorz Junka <[hidden email]> wrote:

Thanks but that won't work. As mentioned keys must be sorted so that I can quickly query for range from min/max key. I could add an index but in that case the key would be stored twice, once in the index and once in the value, which I wanted to avoid.

GrzegorzJ


On 24/09/2017 15:29, Jesper Louis Andersen wrote:
Consider an ETS table and insert

[{Key, Id}, {Id, Key}]

This ensures fast lookup in both directions. Alternatively, run this in a map. I often do this when mapping monitor references in code.

On Sat, Sep 23, 2017 at 3:50 PM Grzegorz Junka <[hidden email]> wrote:
Zipper looks interesting but I don't think it would suit here. Keeping
paths to leafs could work but since the tree is dynamic those paths
would often change and it would be very difficult to update paths that
have already been stored.

In short, I have keys, which may be any Erlang terms, and numerical Ids
assigned to those terms. Keys must be sorted. Numerical Ids are
increasing monotonically. Then the following lookups should be efficient:

1. Having a key quickly search for its numeric Id
2. Having a numeric Id quickly get back the key

Also the following conditions should be met:

3. The same key should be always assigned the same numeric Id
4. One key has always one numeric Id (and vice versa), so they are
always added or removed together

Since keys can be of any length I don't want to store them more than
once. And since they must be sorted I was thinking about using a B+Tree.
Then another structure to map the numeric Id back to their keys.

An obvious solution would be to make the B+Tree bidirectional - the key
would be recreated by traversing the tree back from a leaf to the root.
The other structure would then need to keep a mapping between the
numeric Id and the reference to the leaf for the key corresponding to
that numeric Id.

But an equally valid solution would be to tag all nodes of the B+Tree
with numbers and store tags to parents in their children. Then in
another structure, e.g. an array, the mapping between the tag and its
node. That would also allow to traverse the tree backward from leafs to
the root.

GrzegorzJ


On 22/09/2017 00:37, Richard A. O'Keefe wrote:
>
>
> On 22/09/17 2:23 AM, Grzegorz Junka wrote:
>> To answer your question Richard, I don't actually need a cyclical
>> structure. As I mentioned, this is a simplification of a more specific
>> problem. I didn't want to describe all the specifics to ask a question.
>> I need to design a tree, a variant of a HAMT tree, but which can be
>> traversed in both directions, from the root to leafs and from leafs to
>> the root (independently, i.e. starting from a reference to a leaf rather
>> than recursively).
>
> Of course this pushes back the question a further step.
> Why do you want to traverse the tree in both directions?
>
> Might the "zipper" approach be any use?
>
> If you want references to leaves, such references could be paths.
> I used that technique once in a programming language that used
> reference counting, so that link cycles meant no collection.
> (Yes, I know modern reference-based collectors have solved that
> problem.)  I've also used that technique for processing XML in
> a pure functional language.
>
> In an imperative context, that doesn't work very well because the
> tree structure might change.  The DOM suffers from trying to make
> this "work" in some sense, but trying to iterate over a graph
> while something else is changing it is never going to be easy.
>
> It is possible that the Haskell "Functional Graph Library"
> may be relevant to your needs.
>
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Grzegorz Junka

Fair enough, I missed that you can set ordered_set in ETS, but nevertheless it won't work with maps. However, even if I use ETS the Key is still stored twice, isn't? (once as the key for ordered_set, once as the value when the Id is the key).


On 24/09/2017 16:43, Jesper Louis Andersen wrote:
Of course it will work: the ETS table is an ordered_set. This ensures that you have Term order. Now, in Erlang, numbers and thus integers have order less than everything else. In short, min/1 is given by a call to ets:first/1. To get max, we observe that the atom 'x' is not part of your keys, nor part of the integers. So it is larger than any integer. If we ask ets:prev/2 on that 'x', we get max/1.

This covers the range operations on IDs. For the range operations on keys, it is even easier because they already have sorted order. You can also get rather fast traversal by using ets:select/1,2 and using the continuations to walk the data set in larger chunks.

On Sun, Sep 24, 2017 at 6:02 PM Grzegorz Junka <[hidden email]> wrote:

Thanks but that won't work. As mentioned keys must be sorted so that I can quickly query for range from min/max key. I could add an index but in that case the key would be stored twice, once in the index and once in the value, which I wanted to avoid.

GrzegorzJ


On 24/09/2017 15:29, Jesper Louis Andersen wrote:
Consider an ETS table and insert

[{Key, Id}, {Id, Key}]

This ensures fast lookup in both directions. Alternatively, run this in a map. I often do this when mapping monitor references in code.

On Sat, Sep 23, 2017 at 3:50 PM Grzegorz Junka <[hidden email]> wrote:
Zipper looks interesting but I don't think it would suit here. Keeping
paths to leafs could work but since the tree is dynamic those paths
would often change and it would be very difficult to update paths that
have already been stored.

In short, I have keys, which may be any Erlang terms, and numerical Ids
assigned to those terms. Keys must be sorted. Numerical Ids are
increasing monotonically. Then the following lookups should be efficient:

1. Having a key quickly search for its numeric Id
2. Having a numeric Id quickly get back the key

Also the following conditions should be met:

3. The same key should be always assigned the same numeric Id
4. One key has always one numeric Id (and vice versa), so they are
always added or removed together

Since keys can be of any length I don't want to store them more than
once. And since they must be sorted I was thinking about using a B+Tree.
Then another structure to map the numeric Id back to their keys.

An obvious solution would be to make the B+Tree bidirectional - the key
would be recreated by traversing the tree back from a leaf to the root.
The other structure would then need to keep a mapping between the
numeric Id and the reference to the leaf for the key corresponding to
that numeric Id.

But an equally valid solution would be to tag all nodes of the B+Tree
with numbers and store tags to parents in their children. Then in
another structure, e.g. an array, the mapping between the tag and its
node. That would also allow to traverse the tree backward from leafs to
the root.

GrzegorzJ


On 22/09/2017 00:37, Richard A. O'Keefe wrote:
>
>
> On 22/09/17 2:23 AM, Grzegorz Junka wrote:
>> To answer your question Richard, I don't actually need a cyclical
>> structure. As I mentioned, this is a simplification of a more specific
>> problem. I didn't want to describe all the specifics to ask a question.
>> I need to design a tree, a variant of a HAMT tree, but which can be
>> traversed in both directions, from the root to leafs and from leafs to
>> the root (independently, i.e. starting from a reference to a leaf rather
>> than recursively).
>
> Of course this pushes back the question a further step.
> Why do you want to traverse the tree in both directions?
>
> Might the "zipper" approach be any use?
>
> If you want references to leaves, such references could be paths.
> I used that technique once in a programming language that used
> reference counting, so that link cycles meant no collection.
> (Yes, I know modern reference-based collectors have solved that
> problem.)  I've also used that technique for processing XML in
> a pure functional language.
>
> In an imperative context, that doesn't work very well because the
> tree structure might change.  The DOM suffers from trying to make
> this "work" in some sense, but trying to iterate over a graph
> while something else is changing it is never going to be easy.
>
> It is possible that the Haskell "Functional Graph Library"
> may be relevant to your needs.
>
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions



_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Jesper Louis Andersen-2
On Mon, Sep 25, 2017 at 8:35 AM Grzegorz Junka <[hidden email]> wrote:

Fair enough, I missed that you can set ordered_set in ETS, but nevertheless it won't work with maps. However, even if I use ETS the Key is still stored twice, isn't? (once as the key for ordered_set, once as the value when the Id is the key).



It depends. If it is a large binary (larger than 64 characters) it will go on the binary heap once unless you form it again and again in your code. Otherwise it will take up the double amount of space.

We still don't know what the underlying problem statement is. This makes it harder to solve because whenever we come up with a solution, a new constraint is added and we have to adapt.

One advantage of using something like a gb_trees for the above is that you only have the key once in the process heap and everything else will be pointers. It is also possible to use the above scheme with gb_trees. But then again, with ETS you can do key lookup from any process, whereas it has to factor through the tree owner with gb_trees.

There is also the risk your problem is entirely too large to fit in memory at all, but we don't yet know the size of your N and how large of a machine you are willing to scale to, so it is hard to do any napkin math on the size here.

In short, we need more info if we are to come up with a better solution :)

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Grzegorz Junka

On 25/09/2017 15:48, Jesper Louis Andersen wrote:
On Mon, Sep 25, 2017 at 8:35 AM Grzegorz Junka <[hidden email]> wrote:

Fair enough, I missed that you can set ordered_set in ETS, but nevertheless it won't work with maps. However, even if I use ETS the Key is still stored twice, isn't? (once as the key for ordered_set, once as the value when the Id is the key).



It depends. If it is a large binary (larger than 64 characters) it will go on the binary heap once unless you form it again and again in your code. Otherwise it will take up the double amount of space.

I don't think it's that easy. If ordered_set is using B+Tree then the key would be split into segments to annotate nodes of the tree. But the value would be left untouched. Binaries could potentially be reused by referencing segments of the same binary. So if binaries would be reused or not depends on the actual implementation of ordered_set. Also, with ETS the data must be copied between the database and the process. It's possible that the VM will not actually copy the binary but instead will create another reference to it in the process. But this is all valid only for binaries. For any other Erlang term what I wrote earlier would hold. I am not sure I want to rely on a solution with so many unknowns.


We still don't know what the underlying problem statement is. This makes it harder to solve because whenever we come up with a solution, a new constraint is added and we have to adapt.

I did state all the constraints I have:

In short, I have keys, which may be any Erlang terms, and numerical Ids assigned to those terms. Keys must be sorted. Numerical Ids are increasing monotonically. Then the following lookups should be efficient:

1. Having a key quickly search for its numeric Id
2. Having a numeric Id quickly get back the key

Also the following conditions should be met:

3. The same key should be always assigned the same numeric Id
4. One key has always one numeric Id (and vice versa), so they are always added or removed together

Since keys can be of any length I don't want to store them more than once.


One advantage of using something like a gb_trees for the above is that you only have the key once in the process heap and everything else will be pointers. It is also possible to use the above scheme with gb_trees. But then again, with ETS you can do key lookup from any process, whereas it has to factor through the tree owner with gb_trees.

gb_tree would not prevent from having to store the key twice (once as the key for gb_tree and once as the value). Not sure why you mention gb_tree here?


There is also the risk your problem is entirely too large to fit in memory at all, but we don't yet know the size of your N and how large of a machine you are willing to scale to, so it is hard to do any napkin math on the size here.

It's part of a bigger problem and I don't want to be getting into describing it in its entirety. Essentially I want to design a sorted index of terms that can hold billions of entries. For that I want it to be distributed among multiple processes, each process holding a part of the whole index. For now I am just looking into a data structure suitable for implementing one of those processes. ETS would be a valid solution if not that it's opaque - I don't have control over how the data is stored. Let's say that I am investigating alternatives.


In short, we need more info if we are to come up with a better solution :)

That's the whole point. How would you implement an RDF/triple database in Erlang? :)

GrzegorzJ


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Jesper Louis Andersen-2
On Mon, Sep 25, 2017 at 9:16 PM Grzegorz Junka <[hidden email]> wrote:

gb_tree would not prevent from having to store the key twice (once as the key for gb_tree and once as the value). Not sure why you mention gb_tree here?


The reason I mention this is because if we have:

Key = T, % for some arbitrary term T
Id = Horizon, % for some horizon value
Tree2 = gb_trees:enter(Key, Id, Tree1),
Tree3 = gb_trees:enter(Id, Key, Tree2),

then the `Key` is immutable and thus shared. It will in practice be of pointer size (4-8 bytes). This only holds if you are in a gb_trees setting because if we start going to ETS for this, then we obviously might lose the sharing. Perhaps it does work if you insert multiple entries at once.

That's the whole point. How would you implement an RDF/triple database in Erlang? :)


The obvious candidate to beat is Postgres with an appropriately designed table structure and indexes. If we can't beat this, we are not even in game. It is also fairly easy to write, so first, we should use that as a baseline implementation.

If our RDF store is fairly small and can fit in memory, I'd probably just store one ETS table doing a bidrectional ID<->Key mapping as above. And then, depending on query needs store those ID's as {{spo, Subject, Predicate}, Object} or such in ETS. This should allow fairly efficient query, as most of the data resides in memory and lookup is going to be fast. Some data waste is expected in these solutions. It might be beneficial to just ignore the ID mapping for a start and do it naively. This also establishes a model which can be run in unison with other models via QuickCheck so we can check for agreement with a simpler model.

Going to billions of RDF triples will break the above scheme. I'd probably do like the Cayley datastore in Go: store data in eleveldb and steal the Cayley-format. In this format you store everything in one leveldb database where different parts of the storage are mapped by different prefixes (IDs are <<"z", SHA1:160/integer>> for instance. To run fast SPO, OSP, POS ... indexed hashes are stored for these mappings as well. In addition a horizon and history is tracked for each quad so we can answer a query in the past faithfully if needed. Sharding/Batching might come in handy here at a later point.

Leveled trees such as the one leveldb uses tend to be quite efficient compared to B+-trees in many situations. It might be in this case too. Because Cayley uses unique SHA1 hashes for data, we obtain that keys with the same subject will be iterable because they will have the same prefix in that part of the data store. We essentially pay with writes up-front to make the lookups faster. Also note that leveldb will naturally keep often used data in memory if tuned.

TL;DR: either data fits in memory or it doesn't :P


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Richard A. O'Keefe-2
In reply to this post by Grzegorz Junka


On 25/09/17 7:35 PM, Grzegorz Junka wrote:
> Fair enough, I missed that you can set ordered_set in ETS, but
> nevertheless it won't work with maps. However, even if I use ETS the Key
> is still stored twice, isn't? (once as the key for ordered_set, once as
> the value when the Id is the key).

And you care about this why?
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Grzegorz Junka

On 25/09/2017 21:06, Richard A. O'Keefe wrote:

>
>
> On 25/09/17 7:35 PM, Grzegorz Junka wrote:
>> Fair enough, I missed that you can set ordered_set in ETS, but
>> nevertheless it won't work with maps. However, even if I use ETS the Key
>> is still stored twice, isn't? (once as the key for ordered_set, once as
>> the value when the Id is the key).
>
> And you care about this why?
> ________________________________

Because I assume that the key can be of any length. So obviously want to
avoid duplication.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Grzegorz Junka
In reply to this post by Jesper Louis Andersen-2

On 25/09/2017 20:53, Jesper Louis Andersen wrote:
On Mon, Sep 25, 2017 at 9:16 PM Grzegorz Junka <[hidden email]> wrote:

gb_tree would not prevent from having to store the key twice (once as the key for gb_tree and once as the value). Not sure why you mention gb_tree here?


The reason I mention this is because if we have:

Key = T, % for some arbitrary term T
Id = Horizon, % for some horizon value
Tree2 = gb_trees:enter(Key, Id, Tree1),
Tree3 = gb_trees:enter(Id, Key, Tree2),

then the `Key` is immutable and thus shared. It will in practice be of pointer size (4-8 bytes). This only holds if you are in a gb_trees setting because if we start going to ETS for this, then we obviously might lose the sharing. Perhaps it does work if you insert multiple entries at once.

That's the whole point. How would you implement an RDF/triple database in Erlang? :)


The obvious candidate to beat is Postgres with an appropriately designed table structure and indexes. If we can't beat this, we are not even in game. It is also fairly easy to write, so first, we should use that as a baseline implementation.

If our RDF store is fairly small and can fit in memory, I'd probably just store one ETS table doing a bidrectional ID<->Key mapping as above. And then, depending on query needs store those ID's as {{spo, Subject, Predicate}, Object} or such in ETS. This should allow fairly efficient query, as most of the data resides in memory and lookup is going to be fast. Some data waste is expected in these solutions. It might be beneficial to just ignore the ID mapping for a start and do it naively. This also establishes a model which can be run in unison with other models via QuickCheck so we can check for agreement with a simpler model.

Going to billions of RDF triples will break the above scheme. I'd probably do like the Cayley datastore in Go: store data in eleveldb and steal the Cayley-format. In this format you store everything in one leveldb database where different parts of the storage are mapped by different prefixes (IDs are <<"z", SHA1:160/integer>> for instance. To run fast SPO, OSP, POS ... indexed hashes are stored for these mappings as well. In addition a horizon and history is tracked for each quad so we can answer a query in the past faithfully if needed. Sharding/Batching might come in handy here at a later point.

Leveled trees such as the one leveldb uses tend to be quite efficient compared to B+-trees in many situations. It might be in this case too. Because Cayley uses unique SHA1 hashes for data, we obtain that keys with the same subject will be iterable because they will have the same prefix in that part of the data store. We essentially pay with writes up-front to make the lookups faster. Also note that leveldb will naturally keep often used data in memory if tuned.

TL;DR: either data fits in memory or it doesn't :P


Yeah, there dozens of possible solutions. Bear in mind that if you implement RDF on top of Postgres then you need to implement SPARQL on top of SQL. This is no longer an Erlang database. You can implement Riak or Couch like database on top of Postgres. The question is, why? The main reason for using Erlang is to be able distribute the database across multiple cores and nodes. Postgres as an SQL database scales very well but up to some point. After a terabyte of data or so it starts breaking down. You can extend the performance with some tricks but that's it.

It's not as much about finding a solution that is good enough (Postgres, ETS) but about designing an algorithm that allows to store and query triples in a way that can scale into thousands of nodes. Erlang is perfect for that because you can start with implementing an algorithm that works with a few thousand of processes on multiple CPU cores on one computer. If that works then the same algorithm can transparently be extended to run those processes across multiple nodes. And if that works, the algorithm can probably run millions of processes on those nodes.

You can compare this problem to a choice between a single big and complex solution (Postgres/SQL or Cray/mainframe) and many small distributed and cooperating nodes (Riak/NoSQL or a mesh of interconnected PCs).

RDF is by design a way of representing data that is most capable of distributing across nodes (distributing table cells rather than whole columns or tables). In such an environment the biggest problem, in my opinion, is to efficiently handle the LIMIT and ORDER query modifiers. Imagine having to sort a billion of RDF triples across a thousand of nodes just to return 10 triples with the least values/objects.

GrzegorzJ



_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Tuples referencing each other problem

Richard A. O'Keefe-2
In reply to this post by Grzegorz Junka


On 26/09/17 11:01 AM, Grzegorz Junka wrote:
>
> Because I assume that the key can be of any length. So obviously want to
> avoid duplication.

In Unix, simple file names can typically be up to 255 characters
long.  In two rather large directories I just checked, the
average length was 20 characters in one and 11 in the other,
a long way from the maximum.

So the "obviously" doesn't really apply.  It's only if
many keys *are* rather long that you have a problem.

If you use an in-memory data base such as a gb_tree,
there will be no duplication.


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
12