Choice in Distributed Databases for a Key/Value Store

asdf asdf
Hello everyone,

I am at the point where I have many Erlang nodes, and I am going to have to move to a distributed database. Right now, I am using a basic setup: each Erlang node has a copy of the same Redis DB, and all of those DBs are slaves (non-writable copies) of a master. A big problem with this is obvious: if the DB goes down, the node goes down, and if the master goes down, the slaves won’t get updated. So I would like to move to a distributed DB that all of my nodes can read from and write to, and that cannot, or at least does not, go down.

The nodes do ~50 reads per write and are constantly reading, so read speed and consistency are my real concerns. I believe this will be the node’s main speed factor.

Another thing is that all of my data is key/key/value, with a structure like ID -> name -> “Fred”, ID -> age -> 20, so I don’t need a SQL DB.
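The ID -> field -> value layout described above is a two-level map. A minimal Python sketch, using plain dicts as a stand-in for whatever store is eventually chosen; the helper names `put`, `get`, and `flatten` and the `ID:field` composite-key convention are hypothetical, and only illustrate how the same data fits a store that offers nothing but flat keys:

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-in for the "key/key/value" layout:
# record ID -> field name -> value.
Records = Dict[str, Dict[str, object]]


def put(store: Records, record_id: str, field: str, value: object) -> None:
    """Set one field of one record, creating the record if needed."""
    store.setdefault(record_id, {})[field] = value


def get(store: Records, record_id: str, field: str) -> Optional[object]:
    """Return the field's value, or None if the record or field is missing."""
    return store.get(record_id, {}).get(field)


def flatten(store: Records, sep: str = ":") -> Dict[str, object]:
    """Flatten to single-level composite keys like '42:name', the shape a
    plain key/value store would hold (the 'ID:field' convention is assumed)."""
    return {
        f"{record_id}{sep}{field}": value
        for record_id, fields in store.items()
        for field, value in fields.items()
    }


users: Records = {}
put(users, "42", "name", "Fred")
put(users, "42", "age", 20)
print(get(users, "42", "name"))  # Fred
print(flatten(users))            # {'42:name': 'Fred', '42:age': 20}
```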

A big thing also is that I don’t need disc copies, as I have a large backup store from which the values are generated.

I have looked at as many options as I can:

Voldemort: http://project-voldemort.com/
- Looks perfect, but there are zero resources for learning how to use it outside of their docs, and there is no Erlang driver. That is a big deal, because I would have to learn how to write a C driver and everything else about it just to get it working.

Cassandra: http://cassandra.apache.org/
- Looks good too, but apparently it has a small community and isn’t updated often.

Scalaris: https://github.com/scalaris-team/scalaris/blob/master/user-dev-guide/main.pdf
- Looks very cool and seems great, but there is no active community and their GitHub isn’t updated often. This is a distributed, all in-memory database written in Erlang.

So from my research, which relied heavily on this blog (https://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores), I have narrowed it down to these three.

BUT you are the real experts and have built huge applications in Erlang. What do you use? What do you have experience with that performs well with Erlang nodes spread across multiple machines and possibly multiple data centers?

Thanks for your time.


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Choice in Distributed Databases for a Key/Value Store

Phillip Toland
I have some experience with Cassandra, and based on your description of your needs it sounds like Cassandra would be overkill. I think you could make it work, but it may not be worth the effort.

Cassandra uses a ring architecture similar to Riak. Reads and writes happen on multiple nodes and the entire transaction doesn't succeed until it has succeeded on a certain number of nodes. For your read-heavy workload you would probably want to set that value higher for writes and lower for reads. For example, you might say that a write happens on 3 nodes and succeeds when it is successful on all of those nodes. You might feel safe then saying that a read only needs to succeed on one node to be successful. This is something you would definitely want to test thoroughly so that you understand the performance tradeoffs of changing those values.
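Phil's read/write settings rest on a simple counting argument: with N replicas, a write acknowledged by W of them and a read that consults R of them must share at least one replica whenever R + W > N. A sketch of that check plus a brute-force cross-check; the function names are hypothetical, and in Cassandra terms Phil's example corresponds roughly to a replication factor of 3 with writes at consistency level ALL and reads at ONE:

```python
from itertools import combinations


def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """R + W > N: every read set of R replicas must share at least one
    replica with every write set of W replicas (pigeonhole argument)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("need 1 <= W <= N and 1 <= R <= N")
    return r + w > n


def overlaps_by_enumeration(n: int, w: int, r: int) -> bool:
    """Brute-force cross-check: try every possible write set and read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Phil's example: 3 replicas, a write must succeed on all 3, a read on 1.
print(quorum_overlaps(n=3, w=3, r=1))  # True
# Cheaper writes combined with single-replica reads lose the guarantee.
print(quorum_overlaps(n=3, w=2, r=1))  # False
```

Overlap is necessary but not sufficient for reading the latest value: node failures, hinted handoff, and timestamp-based conflict resolution all affect what a real cluster returns, which is why testing matters. Note also that W = N means a write cannot succeed while any replica is down.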

Also, recent versions of Cassandra have adopted a table model and query language (CQL) that are superficially similar to RDBMS tables and SQL, but are actually completely different. This led to a lot of cognitive dissonance for my team as we would do things that made sense for an RDBMS, and that we could express in CQL, but were totally the wrong thing to do for Cassandra's architecture.

Personally, I would look at Riak before Cassandra if you think that the ring architecture makes sense for you. Because it doesn't have the trappings of tables and a SQL-like language, we found it much more straightforward to reason about the strengths and limitations of the system. It is very much a straightforward key/value data store.

However, you mentioned read consistency being important, and Cassandra and Riak both trade off read consistency for availability. They are "eventually consistent" systems (https://en.wikipedia.org/wiki/Eventual_consistency). I suggest you read up on the CAP theorem (https://en.wikipedia.org/wiki/CAP_theorem) and decide what tradeoffs you are willing to make before choosing a database. 

Good luck!

~phil

On September 15, 2017 at 11:43:52 AM, code wiget ([hidden email]) wrote:


Re: Choice in Distributed Databases for a Key/Value Store

Heinz Nikolaus Gies-2
In reply to this post by asdf asdf
Have you considered Riak? It’s a distributed Erlang NoSQL key/value store.

Re: Choice in Distributed Databases for a Key/Value Store

Bence Golda
...or Barrel-DB: https://barrel-db.org/ ?

BR, Bence

On 09/15/2017 19:48, Heinz N. Gies wrote:


Re: Choice in Distributed Databases for a Key/Value Store

Nathaniel Waisbrot
In reply to this post by asdf asdf
Scatter-shot reply:

Since you're using Redis right now, have you considered Redis Cluster (https://redis.io/topics/cluster-tutorial)?

I'm using Cassandra and don't feel that it's got a small community or slow pace of updates. There are a lot of NoSQL databases and they all have quite different tradeoffs which tends to fragment the community, so your expectations may be too high.

Riak, ElasticSearch, EtcD, MongoDB, etc. You have many (too many!) options. When you say "read speed and consistency" what sort of consistency are you looking for? Is eventual consistency good, or do you require that every read that takes place after a write gets the new data?
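The difference Nathaniel is asking about can be shown with a toy model. The sketch below is hypothetical and not how any particular database implements replication: one primary applies writes immediately, one replica receives them only when `sync()` runs, and each write returns a version token. A plain replica read may miss a fresh write (eventual consistency); a read that passes the token and falls back to the primary always sees the caller's own write (read-your-writes):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    # key -> (version, value); a higher version wins
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def apply(self, key: str, version: int, value: str) -> None:
        current = self.data.get(key)
        if current is None or current[0] < version:
            self.data[key] = (version, value)


class ToyCluster:
    """Hypothetical model: writes land on a primary at once and reach the
    replica only when sync() runs (asynchronous replication)."""

    def __init__(self) -> None:
        self.primary = Replica()
        self.replica = Replica()
        self.pending: List[Tuple[str, int, str]] = []
        self.next_version = 1

    def write(self, key: str, value: str) -> int:
        """Apply on the primary, queue for the replica, return a version token."""
        version = self.next_version
        self.next_version += 1
        self.primary.apply(key, version, value)
        self.pending.append((key, version, value))
        return version

    def sync(self) -> None:
        """Deliver queued writes to the replica (replication catching up)."""
        for key, version, value in self.pending:
            self.replica.apply(key, version, value)
        self.pending.clear()

    def read_any(self, key: str) -> Optional[str]:
        """Eventually consistent read: may be stale or missing."""
        entry = self.replica.data.get(key)
        return None if entry is None else entry[1]

    def read_at_least(self, key: str, min_version: int) -> Optional[str]:
        """Read-your-writes: use the replica only if it has caught up to the
        caller's own write; otherwise fall back to the primary."""
        entry = self.replica.data.get(key)
        if entry is None or entry[0] < min_version:
            entry = self.primary.data.get(key)
        return None if entry is None else entry[1]


cluster = ToyCluster()
token = cluster.write("conn:42", "node-a")
print(cluster.read_any("conn:42"))              # None: replica has not caught up
print(cluster.read_at_least("conn:42", token))  # node-a
cluster.sync()
print(cluster.read_any("conn:42"))              # node-a
```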





Re: Choice in Distributed Databases for a Key/Value Store

asdf asdf
Hi,

Thank you all for your replies.

Nathaniel: The reads must be 'eventually' consistent, at least within a second. The problem is that the write updates user connection information, and users will be unable to connect if our read does not see that write. So if a user connects after we update but before the write is fully committed, the connection will fail. I suppose it is OK if they cannot connect and just have to reconnect, but ideally they should be able to connect every time.
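If a missed read only means a retry, and replicas are expected to converge within about a second, the client can hide most of that window by retrying with backoff until a deadline. A minimal sketch; `read_with_deadline` and its default delays are assumptions rather than any library's API, and the clock and sleep functions are injectable so the logic can be tested without waiting:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_with_deadline(read: Callable[[], Optional[T]],
                       deadline_s: float = 1.0,
                       first_delay_s: float = 0.01,
                       max_delay_s: float = 0.2,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> Optional[T]:
    """Retry a read that may miss a not-yet-replicated write.

    Returns the first non-None result, or None once the deadline passes.
    Delays double after each miss (exponential backoff), are capped at
    max_delay_s, and never sleep past the deadline.
    """
    start = clock()
    delay = first_delay_s
    while True:
        result = read()
        if result is not None:
            return result
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            return None
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)


# Hypothetical lookup whose data becomes visible on the third attempt.
attempts = {"n": 0}


def flaky_lookup() -> Optional[str]:
    attempts["n"] += 1
    return "session-ok" if attempts["n"] >= 3 else None


print(read_with_deadline(flaky_lookup))  # session-ok, after two short sleeps
```

This trades extra latency on the miss path for fewer failed connects; it does not make the store itself any more consistent.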

So Riak seems like a great solution, but its speed really worries me. We are trying to connect as many clients as possible per server; this is very important because it saves us money. If the reads take 2-3x as long, this could be very slow and bad. According to this article (https://github.com/citrusbyte/redis-comparison), Riak is up to 10x slower than Redis. This would really hurt our operations.

To those who suggested redis-cluster: my problem with it is that redis-cluster seemed to be in an experimental stage. It also has the problem that if all copies of a node die, the cluster loses all of that node's data, and it is up to the user to prevent that. All of this has to be handled by the user, and it seems like it will get tedious when there are many nodes and all it takes is one admin to mess it up.
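For context on the data-loss concern: per the Redis Cluster specification, every key maps to one of 16384 hash slots via CRC16 (the XMODEM variant) modulo 16384, and each slot is served by exactly one master and its replicas, so losing every copy of one master loses that slot range. Keys can share a slot through hash tags: only the part between the first `{` and the following `}` is hashed, provided it is non-empty. The sketch below reimplements that calculation for illustration only; `CLUSTER KEYSLOT` on a live server is the authoritative answer:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Hash slot for a key, honouring {hash tags}: if the key contains '{'
    followed later by '}' with at least one character between them, only
    that substring is hashed; otherwise the whole key is."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(hex(crc16_xmodem(b"123456789")))  # 0x31c3, the standard check value
print(key_hash_slot("user:{42}:name") == key_hash_slot("user:{42}:age"))  # True
```

With tags such as `user:{42}:name` and `user:{42}:age`, all fields of one ID land in the same slot, which Redis Cluster requires for multi-key operations.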

So this is where Aerospike comes in. Reading about it on the web, it comes across as the perfect tool: essentially a distributed version of Redis (https://stackoverflow.com/questions/24482337/how-is-aerospike-different-from-other-key-value-nosql-databases). But for some reason, it doesn’t get as much attention as Redis.

Does anyone have experience with Aerospike? For my application, it seems like a no-brainer.

Thank you all again,
On Sep 15, 2017, at 2:02 PM, Nathaniel Waisbrot <[hidden email]> wrote:


Re: Choice in Distributed Databases for a Key/Value Store

Heinz Nikolaus Gies-2
I would not put much stock in those ‘benchmarks’; they’re highly bogus, and that’s if you’re treating them kindly.

For a start, it uses default settings, and those are not even provided. Redis is an in-memory store by default; is it even saving the data? How are Riak or Cassandra set up? Unlike Mongo or Redis, those are built to be clustered; do the default configs used for them disable useless overhead? Or does it mean that Riak, which stores every write on disk, perhaps 3 times, is only 10x slower than a database that never writes to disk and keeps only one copy?

For your own sanity, print that benchmark, find a fireproof area (safety matters!) and set it on fire, then move on and benchmark for yourself with a real use case and sensible data.
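Benchmarking with your own workload can start from something as small as the harness below. It is a sketch under stated assumptions: the store is wrapped in two callables (`read(key)` and `write(key, value)`), the mix follows the ~50 reads per write from the original post, and a local dict stands in for a real database. Latencies are per operation in microseconds from a single client thread, so this measures latency rather than cluster throughput:

```python
import random
import statistics
import time
from typing import Callable, Dict, List


def run_mix(read: Callable[[str], object],
            write: Callable[[str, object], None],
            keys: List[str],
            ops: int = 51_000,
            reads_per_write: int = 50,
            seed: int = 1) -> Dict[str, Dict[str, float]]:
    """Run a read-heavy mix (reads_per_write reads per write) and report
    per-operation latency in microseconds: count, p50, p99 and max."""
    rng = random.Random(seed)
    samples: Dict[str, List[float]] = {"read": [], "write": []}
    for i in range(ops):
        key = rng.choice(keys)
        is_write = i % (reads_per_write + 1) == 0
        start = time.perf_counter()
        if is_write:
            write(key, i)
        else:
            read(key)
        elapsed_us = (time.perf_counter() - start) * 1e6
        samples["write" if is_write else "read"].append(elapsed_us)

    report: Dict[str, Dict[str, float]] = {}
    for kind, xs in samples.items():
        if len(xs) < 2:  # statistics.quantiles needs at least two samples
            continue
        cuts = statistics.quantiles(xs, n=100)  # 99 cut points: p1..p99
        report[kind] = {"count": len(xs), "p50": cuts[49], "p99": cuts[98], "max": max(xs)}
    return report


# Stand-in store: a local dict. Swap in real client calls to measure a database.
store: Dict[str, object] = {}
report = run_mix(store.get, store.__setitem__, keys=[f"user:{n}" for n in range(1000)])
for kind, stats in report.items():
    print(kind, {name: round(value, 2) for name, value in stats.items()})
```

To compare real systems, swap in actual client calls, use realistic keys and values, and run many concurrent clients against a cluster configured the way it would run in production.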


On 18. Sep 2017, at 17:34, code wiget <[hidden email]> wrote:


Re: Choice in Distributed Databases for a Key/Value Store

Paul Oliver-2
I'd recommend checking jepsen.io for testing of distributed systems. There's a very thorough review of Aerospike there with some results that may give you pause. https://aphyr.com/posts/324-jepsen-aerospike
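Jepsen's analyses check, among other things, whether writes that a database acknowledged are still present after faults. A much-reduced version of that idea, in the spirit of a set-style check: record which elements clients attempted to add and which were acknowledged, then compare them against a final read. The function and the sample history are hypothetical, and nothing here injects faults; that part is what Jepsen itself provides:

```python
from typing import Dict, Iterable, Set


def check_set_history(attempted: Iterable[int],
                      acknowledged: Iterable[int],
                      final_read: Iterable[int]) -> Dict[str, Set[int]]:
    """Compare what clients tried to add with what the store returns at the end.

    lost:       acknowledged as written, but missing from the final read
    unexpected: present in the final read, but never attempted
    recovered:  not acknowledged (e.g. the client timed out), yet present
    """
    attempted_set, acked, final = set(attempted), set(acknowledged), set(final_read)
    return {
        "lost": acked - final,
        "unexpected": final - attempted_set,
        "recovered": (final & attempted_set) - acked,
    }


# Hypothetical history: elements 0..9 attempted, some acknowledged.
result = check_set_history(
    attempted=range(10),
    acknowledged=[0, 1, 2, 3, 5, 6, 8],
    final_read=[0, 1, 3, 4, 5, 8],
)
print(sorted(result["lost"]))       # [2, 6]
print(sorted(result["recovered"]))  # [4]
```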

On Tue, Sep 19, 2017 at 4:54 AM Heinz N. Gies <[hidden email]> wrote:
I would not give too much on those ‘benchmarks’, they’re highly bogus and that’s if you’re treating them kindly.

For a starter it uses default settings and they are not even provided. Redis is a in memory store by default, is it even saving the data? How are risk or Cassandra set up, unlike mongo or redis the others those are build to be clustered, are the default configs used for them disabling unless overhead? Does it mean risk, that is storing every write on disks, perhaps 3 times, is only 10x slower compared to a database that never writes to disk and only keeps one copy?

For you own sanity, print that benchmark, find a burn proof area (safety matters!) and set it on fire then move on and benchmark for yourself with a real use case and sensible data.


On 18. Sep 2017, at 17:34, code wiget <[hidden email]> wrote:

HI,

Thank you all for your replies.

Nathaniel: The reads must be 'eventually' consistent, at least within a second. The problem is that it updates user connection information, and they will be unable to connect if our read does not get information from the write. So if we update, the connection before the write is fully committed will fail. I suppose it is ok if they cannot connect and just have to reconnect, but ideally they should be able to connect every time.

So Riak seems like a great solution, but speed wise really worries me. We are trying to connect as many clients as possible per server, this is very important as it saves us money. If the reads take 2-3x as long, this could be very slow and bad. According to this article: https://github.com/citrusbyte/redis-comparison, Riak is up to 10x slower than Redis. This would really hurt our operations.

To those who commented redis-cluster, my problem with a cluster solution is that redis-cluster seemed to be in an experimental stage. It also has the problem where if all copies of a node die, then the cluster will lose all that data and it is up to the user to not lose that data. All of this has to be handled by the user, and this seems like it will get tedious when there are multiple nodes and all it would take is for one admin to mess it up.

So this is where Aerospike comes in. Reading about them on the web they come off as the perfect tool for a version of redis that is distributed: https://stackoverflow.com/questions/24482337/how-is-aerospike-different-from-other-key-value-nosql-databases . But for some reason, they don’t get as much attention as redis

Does anyone have experience with Aerospike? For my application, it seems like a no brainer.

Thank you all again,
On Sep 15, 2017, at 2:02 PM, Nathaniel Waisbrot <[hidden email]> wrote:

Scatter-shot reply:

Since you're using Redis right now, have you considered Redis Cluster (https://redis.io/topics/cluster-tutorial)?

I'm using Cassandra and don't feel that it's got a small community or slow pace of updates. There are a lot of NoSQL databases and they all have quite different tradeoffs which tends to fragment the community, so your expectations may be too high.

Riak, ElasticSearch, EtcD, MongoDB, etc. You have many (too many!) options. When you say "read speed and consistency" what sort of consistency are you looking for? Is eventual consistency good, or do you require that every read that takes place after a write gets the new data?




On Sep 15, 2017, at 12:43 PM, code wiget <[hidden email]> wrote:

Hello everyone,

I am at the point where I have many Erlang nodes, and I am going to have to move to a distributed database. Right now, I am using a basic setup: each Erlang node has a copy of the same Redis DB, and all of those DBs are slaves(non-writable copies) of a master. A big problem with this is obvious - If the db goes down, the node goes down. If the master goes down, the slaves won’t get updated, so I would like to move to a distributed db that all of my nodes can read/write to that can not/does not go down.

The nodes do ~50 reads per write, and are constantly reading, so read speed and consistency is my real concern. I believe this will be the node’s main speed factor.

Another thing is that all of my data is key/key/value , so it would mimic the structure of ID -> name -> “Fred”, ID->age->20, so I don’t need a SQL DB.

A big thing also is that I don’t need disc copies, as a I have a large backup store where the values are generated from.

I have looked at as many options as I can ->

- looks perfect, but there are 0 resources on learning how to use it outside of their docs and no Erlang driver, which is huge because I would both have to learn how to write a c driver and everything about this just to get it to work. 

- looks good too, but apparently there is a small community and apparently isn’t updated often

- Looks very very cool, seems great, but there is 0 active community and their GitHub isn’t updated often. This is a distributed all in-memory database, written in Erlang.


So from my research, which consisted heavily of this blog:https://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores , I have narrowed it down to these three.

But you are the real experts and have built huge applications in Erlang: what do you use? What have you used that performs well with Erlang nodes spread across multiple machines and possibly multiple data centers?

Thanks for your time.

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


Re: Choice in Distributed Databases for a Key/Value Store

Ulf Wiger-2
O.T.

Hehe, that Aerospike review mentions that they claim 100% uptime (with seemingly undisclosed precision).

«This makes Aerospike’s uptime infinitely better than the Ericsson AXD301 switch, which delivered somewhere between five and nine nines of availability in a system comprised of 1.1 million lines of Erlang.»

(Never mind that the claim that 100% would be "infinitely better" than e.g. 99.999% is ludicrous in itself. I assume it was made tongue-in-cheek.)

Of course, this is ridiculous, as the in-depth review also demonstrates. Uptime figures are only useful in context, and comparing uptime claims of systems with different purposes is not particularly meaningful. For example, for the AXD 301, a disturbance would be registered as downtime if one network interface (of potentially hundreds) was unable to process calls for 15 seconds (which is, BTW, what happened in the "9 nines" case: a restart of a single device board). Is a database cluster still "up" if it stays responsive but has lost your data?

When claiming availability, you have to be very specific.

The article also mentions cluster sizes of "between 1 and 100 nodes". Guaranteeing 100% uptime in a 1-node 'cluster' is simply not possible.

Anyway, that article is two years old. Today, Aerospike claims "demonstrated uptime of five 9s". Still very good, but I guess "infinitely worse" than it used to be. ;-) (http://www.aerospike.com/benefits/high-availability/)

BR,
Ulf W

PS This is not to defend the "9 nines" claim, which was never officially made by Ericsson. It was made in a press release by British Telecom. Ericsson doesn't divulge the actual uptime figures of its systems, but at least at one time it was ok to claim publicly that the average recorded field uptime of AXD 301 systems was "better than 5 nines".
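To put those figures in perspective, an availability A leaves a downtime budget of (1 − A) times the measurement period. The small Python sketch below assumes a 365.25-day year. As the post stresses, it says nothing about what counts as "down". Five nines allows roughly 5.26 minutes of downtime per year, nine nines allows roughly 32 milliseconds, and 100% allows exactly zero.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: average (Julian) year


def downtime_per_year(nines: int) -> float:
    """Allowed downtime, in seconds per year, for an availability of `nines` nines."""
    unavailability = 10.0 ** -nines  # e.g. 5 nines = 99.999% available -> 1e-5 unavailable
    return unavailability * SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (5, 9):
        seconds = downtime_per_year(n)
        print(f"{n} nines: {seconds:.4f} s/year ({seconds / 60:.4f} min/year)")
```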

2017-09-19 1:00 GMT+02:00 Paul Oliver <[hidden email]>:
I'd recommend checking jepsen.io for its tests of distributed systems. There's a very thorough review of Aerospike there, with some results that may give you pause: https://aphyr.com/posts/324-jepsen-aerospike

On Tue, Sep 19, 2017 at 4:54 AM Heinz N. Gies <[hidden email]> wrote:
I would not put much stock in those ‘benchmarks’; they’re highly bogus, and that’s treating them kindly.

For starters, it uses default settings, and those settings are not even published. Redis is an in-memory store by default: is it even saving the data? How are Riak and Cassandra set up? Unlike MongoDB or Redis, those are built to be clustered; are the default configs used for them, or is unnecessary overhead disabled? Does it mean that Riak, which stores every write on disk, perhaps three times, is only 10x slower than a database that never writes to disk and keeps only one copy?

For your own sanity, print that benchmark, find a fireproof area (safety matters!), set it on fire, and then move on and benchmark for yourself with a real use case and sensible data.
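Below is a minimal Python harness along those lines. It assumes the read/write mix from the original post (~50 reads per write). It replays that mix against any pair of `get`/`put` callables and reports nearest-rank latency percentiles. The in-process dict in the usage block is only a stand-in for a real client, and the key scheme and parameter defaults are hypothetical. It runs single-threaded, so it measures per-operation latency, not cluster throughput.

```python
import math
import random
import time
from typing import Callable, Dict, List, Optional


def run_workload(
    get: Callable[[str], Optional[str]],
    put: Callable[[str, str], None],
    ops: int = 10_000,
    reads_per_write: int = 50,
    keyspace: int = 1_000,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Issue `ops` operations (one write, then `reads_per_write` reads, repeating),
    recording each operation's latency in seconds."""
    rng = random.Random(seed)
    latencies: Dict[str, List[float]] = {"read": [], "write": []}
    for i in range(ops):
        key = f"user:{rng.randrange(keyspace)}"  # hypothetical key scheme
        is_write = i % (reads_per_write + 1) == 0
        start = time.perf_counter()
        if is_write:
            put(key, "conn-info")
        else:
            get(key)
        latencies["write" if is_write else "read"].append(time.perf_counter() - start)
    return latencies


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    index = max(0, math.ceil(p * len(ordered) / 100) - 1)
    return ordered[min(index, len(ordered) - 1)]


if __name__ == "__main__":
    store: Dict[str, str] = {}  # stand-in; swap in functions wrapping a real client
    result = run_workload(store.get, store.__setitem__)
    for kind, samples in result.items():
        print(f"{kind}: n={len(samples)} "
              f"p50={percentile(samples, 50) * 1e6:.1f} us "
              f"p99={percentile(samples, 99) * 1e6:.1f} us")
```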


On 18. Sep 2017, at 17:34, code wiget <[hidden email]> wrote:

Hi,

Thank you all for your replies.

Nathaniel: The reads must be eventually consistent, within a second at most. The problem is that the writes update user connection information, and users will be unable to connect if our read does not see the data from that write. So if a user connects after we update but before the write is fully committed, the connection will fail. I suppose it is OK if they cannot connect and just have to reconnect, but ideally they should be able to connect every time.
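One possible mitigation, not something proposed in this thread: if a replica may lag the write by up to about a second, the connection handler can retry a missing lookup for a bounded time instead of failing the client immediately. Below is a minimal Python sketch. `lookup` is a hypothetical zero-argument function that wraps whatever read call the chosen store provides and returns `None` when the key is not (yet) visible.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_with_retry(
    lookup: Callable[[], Optional[T]],
    deadline_s: float = 1.0,  # assumption: replicas converge within ~1 s
    initial_delay_s: float = 0.01,
) -> Optional[T]:
    """Call `lookup` until it returns a value or `deadline_s` seconds elapse.

    Sleeps with exponential backoff between attempts; returns None on timeout.
    """
    start = time.monotonic()
    delay = initial_delay_s
    while True:
        value = lookup()
        if value is not None:
            return value
        remaining = deadline_s - (time.monotonic() - start)
        if remaining <= 0:
            return None
        time.sleep(min(delay, remaining))
        delay *= 2


if __name__ == "__main__":
    # Simulate a write that becomes visible to this reader 50 ms from now.
    visible_at = time.monotonic() + 0.05

    def fake_lookup() -> Optional[str]:
        return "conn-info" if time.monotonic() >= visible_at else None

    print(read_with_retry(fake_lookup))
```

The trade-off is that a genuinely missing key now costs up to the full deadline before the connection is rejected. The deadline should therefore stay close to the replication lag actually observed in the store.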

Riak seems like a great solution, but its speed really worries me. We are trying to connect as many clients as possible per server; this is very important, as it saves us money. If reads take 2-3x as long, that would be slow and costly for us. According to this benchmark (https://github.com/citrusbyte/redis-comparison), Riak is up to 10x slower than Redis. That would really hurt our operations.

To those who suggested redis-cluster: my problem with a cluster solution is that redis-cluster seemed to be at an experimental stage. It also has the problem that if all copies of a node die, the cluster loses all of that data, and it is up to the user to make sure that doesn't happen. All of this has to be handled by the user, which seems like it will get tedious with multiple nodes, and all it would take is for one admin to mess it up.
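Some context on that failure mode: Redis Cluster splits the key space into 16384 hash slots, and each slot range is served by one master and its replicas. If a master and all of its replicas are lost, the data in those slots is no longer available from the cluster. The cluster specification maps a key to a slot with CRC16 (the XMODEM variant) modulo 16384. When the key contains a non-empty `{...}` hash tag, only the part inside the braces is hashed. Below is a Python sketch of that mapping. The `user:{42}:...` key names are hypothetical examples of keeping every field of one ID in the same slot.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: bytes) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the key
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


if __name__ == "__main__":
    # Hypothetical per-field keys; the shared {42} tag puts them in one slot.
    for key in (b"user:{42}:name", b"user:{42}:age", b"user:{43}:name"):
        print(key.decode(), "->", hash_slot(key))
```

The key/key/value layout described at the start of the thread could also be stored as one Redis hash per ID. A hash is a single key, so all of its fields live in one slot.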

This is where Aerospike comes in. From what I have read online, it comes across as the perfect tool: a distributed version of Redis (https://stackoverflow.com/questions/24482337/how-is-aerospike-different-from-other-key-value-nosql-databases). But for some reason, it doesn’t get as much attention as Redis.

Does anyone have experience with Aerospike? For my application, it seems like a no-brainer.

Thank you all again,
On Sep 15, 2017, at 2:02 PM, Nathaniel Waisbrot <[hidden email]> wrote:

Scatter-shot reply:

Since you're using Redis right now, have you considered Redis Cluster (https://redis.io/topics/cluster-tutorial)?

I'm using Cassandra and don't feel that it has a small community or a slow pace of updates. There are a lot of NoSQL databases, and they all have quite different tradeoffs, which tends to fragment the community, so your expectations may be too high.

Riak, Elasticsearch, etcd, MongoDB, etc. You have many (too many!) options. When you say "read speed and consistency", what sort of consistency are you looking for? Is eventual consistency good enough, or do you require that every read that takes place after a write sees the new data?



