mnesia -- a naive question

Lloyd R. Prentice-2
Hello,

Wasabi is a new cloud storage service that promotes lower storage costs and greater speed than Amazon S3:

https://wasabi.com/

During the dev phase I'm running mnesia on the back-end of my current web project. I very much like the seamless way that mnesia integrates into Erlang, as well as its replication feature. But folks have warned about the hassles of mnesia netsplits.

The problem is that I have no operations experience with which to weigh the options objectively. But I do want to bridge over all points of failure as cost-and-time-effectively as possible.

So my question is whether, and how, I can integrate Wasabi (or Amazon S3, for that matter) into my operation to significantly reduce the probability of data loss.


Many thanks,

LRP



_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: mnesia -- a naive question

Jesper Louis Andersen-2
A couple of points:

* Mnesia protects you against the scenario where one of your nodes fails. It doesn't automatically protect you against a network split, and it requires some manual recovery after such an event. For rather small clusters, this is manageable by manual operation. Larger systems will be far harder to maintain because the risk of netsplits and node loss goes up whenever you add a new node.

* I don't know about Wasabi, but Amazon's EC2 nodes are ephemeral in the sense that they can go away at a moment's notice. When this happens, the data on the node is gone. Thus, to achieve persistent storage, you must either store data off the EC2 node (presumably in S3, RDS, DynamoDB and so on) or use an EBS volume attached to the EC2 node to provide persistent disk space, on which your mnesia database can reside.

* The game is all about risk mitigation. If you regularly take a mnesia backup and store it in S3, or something like it, you can recover quickly to that point in time should an accident happen. If you want better point-in-time recovery, you can try running two mnesia nodes, but you need to heed two important caveats:
    - You probably want your nodes to run in different zones so a failure in one zone doesn't take down everything.
    - Amazon's network is brittle and likely to drop connections, and mnesia sees those drops as netsplits.

* Mnesia mitigates risk by assuming that the nodes, and the network between them, are fairly robust and stable. If you buy good, expensive hardware, this is a reasonable assumption and the error rate will be low. Manual intervention in the case of an error is then probably what is needed anyway (to fix the faulty hardware as well).

* Amazon and other leased environments tend to have brittle network connections and flaky machines. To mitigate this, your system must make no assumptions about stability and handle this up front. Mnesia wasn't really built to work in such an environment.
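
To put rough numbers on the first point (risk goes up with every node added), here is a minimal Python sketch. It assumes, purely for illustration, that each node fails independently with the same probability in some period, and that the cluster is fully connected so every pair of nodes shares a link that could split. The 1% figure and both function names are hypothetical and do not come from this thread.

```python
from math import comb


def p_any_node_fails(n: int, p: float) -> float:
    """Probability that at least one of n nodes fails, assuming independent failures."""
    return 1.0 - (1.0 - p) ** n


def pairwise_links(n: int) -> int:
    """Number of node-to-node connections in a fully connected cluster of n nodes."""
    return comb(n, 2)


if __name__ == "__main__":
    p = 0.01  # hypothetical per-node failure probability for the period
    for n in (2, 3, 5, 10):
        print(f"{n:2d} nodes: {pairwise_links(n):2d} links, "
              f"P(at least one node fails) = {p_any_node_fails(n, p):.3f}")
```

Both quantities grow with the node count, and the number of links grows quadratically, which is one way to see why a two- or three-node cluster is much easier to operate by hand than a large one.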



On Sat, Jul 29, 2017 at 10:23 PM <[hidden email]> wrote:
> So, my question is if and how I can integrate Wasabi (or Amazon S3 for that matter) into my operation to significantly reduce the probability of data loss?

Re: mnesia -- a naive question

Lloyd R. Prentice-2
Hi Jesper,

Your points are reassuring. Thank you.

Wasabi promotes their site as 6x faster and 1/5th the cost of Amazon S3. In the spirit of due diligence my next steps are:

1. Do upload/recovery tests with large files to estimate the minimum likely recovery time
2. Visit Wasabi to check them out. They're in Boston, so that's easy to do
3. For dev/testing/very early production I'm thinking of hosting two or maybe three Erlang Nitrogen + mnesia servers in house
4. See if I can come up with a script to detect an outage and initiate recovery
5. This doesn't address replication across zones, but one step at a time

I had been considering Riak KV, but this seems easier to implement with less overhead.

I still have many questions. But I'm months from actual beta launch, so this plan at least provides a starting point for critique and refinement.

Wish me luck.

All the best,

Lloyd

-----Original Message-----
From: "Jesper Louis Andersen" <[hidden email]>
Sent: Sunday, July 30, 2017 9:13am
To: [hidden email], "Erlang" <[hidden email]>
Subject: Re: [erlang-questions] mnesia -- a naive question

A couple of points:

* The game is all about risk mitigation. If you regularly take a mnesia backup and store it into S3, or something like it, you can get speedy recovery to that point in time should the accident happen.

Re: mnesia -- a naive question

Jesper Louis Andersen-2
I often recommend that people crawl before they walk, walk before they run, run before they fly, and fly before they teleport.

Mnesia is a fine choice due to the low impedance mismatch: just store Erlang terms and you are up and running. As long as you are trying to validate a product or a solution, it is more important to move fast than it is to worry too much about operational problems. The reason is that your data size is likely to be small and thus it is fairly easy to just restore everything.

Once you are more established and have a valid proof-of-concept, you can start looking into a solution that has better durability and resilience. The key aspect is to design your system with this change and extension in mind: if you plan on using something like Riak, which is AP and has no transactions, your current solution shouldn't rely too much on transactions and similar guarantees. A PostgreSQL instance is also likely to work fine up to a couple dozen terabytes.

On the other hand: Mnesia seems to have served Klarna well. And their business is likely to be far larger than yours for the coming years. So perhaps one can scale a Mnesia-based system somewhat easily while keeping the system operational.

A key observation is that a modern server is so friggin' large that we cut it up into small pieces and lease those out as virtual machines: most systems don't need a full machine anymore. It also means that vertical scaling is likely to work up to a point far beyond what was possible earlier.

As for operations: almost all of Google's SRE handbook is worth studying. In this particular case, you want to have a target availability set before you deploy the system. Are you going for 99.9% uptime over 3 months, or more? Most systems are actually fine around 99% and well-designed Erlang systems are likely to give you more than that in the software, leaving most errors to be hardware faults. At 99% you usually have ample time to recover. 
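
As a worked example of what an availability target means in practice, the following Python sketch converts a target into a downtime budget. It is only an illustration: the function name is hypothetical, and "3 months" is approximated as 90 days.

```python
def downtime_budget_hours(availability: float, window_days: float) -> float:
    """Hours of allowed downtime in the window for a given availability (e.g. 0.999)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * window_days * 24.0


if __name__ == "__main__":
    window_days = 90  # assumption: "3 months" approximated as 90 days
    for target in (0.99, 0.999, 0.9999):
        hours = downtime_budget_hours(target, window_days)
        print(f"{target:.2%} over {window_days} days allows {hours:.2f} hours of downtime")
```

Under that assumption, 99% leaves roughly 21.6 hours of downtime per quarter, while 99.9% leaves only about 2.2 hours, which is the difference between a relaxed manual restore and a rushed one.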

On Sun, Jul 30, 2017 at 9:33 PM <[hidden email]> wrote:
> I had been considering Riak KV, but this seems easier to implement with less overhead.

Re: mnesia -- a naive question

Lloyd R. Prentice-2
Thank you Jesper for your thoughtful and generous insights and advice.

I will definitely look at Google's SRE handbook.

It would be great to see a definitive book on deployment and maintenance of Erlang systems from beta to full production in the Erlang canon.

Thanks again,

Lloyd

-----Original Message-----
From: "Jesper Louis Andersen" <[hidden email]>
Sent: Sunday, July 30, 2017 3:58pm
To: [hidden email]
Cc: "Erlang" <[hidden email]>
Subject: Re: [erlang-questions] mnesia -- a naive question

As for operations: almost all of Google's SRE handbook is worth studying.