Mnesia


Mnesia

Steve Davis
Hi,

I’m almost ashamed to say that it’s taken me over 5 years to come around to understanding the value of mnesia.

I bought the “well-known” negatives too fast. I have explored relational connectors, DHT solutions and a number of other approaches...

It’s dawned on me *finally* that 90% of the time, a well implemented mnesia solution would have been better, faster, and cheaper.

Did I mention "better"?

Has anyone else had this experience?

/s
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Mnesia

Steve Davis
Yes :)

Let’s discuss what the tick boxes are.
And be specific on where they break down (e.g. terabyte data storage).
And given where they break down, what paths to take.

I suspect there aren’t many use cases that aren’t addressed, and those few unreasonably dominate discussion about mnesia.

/s

In general, 

On Oct 12, 2015, at 7:29 PM, T Ty <[hidden email]> wrote:

Always lol

Use mnesia until it is no longer useful. As long as you have all the tick boxes marked, mnesia is great. Once it breaks down, either consider moving to another system or radically change your architecture the way WhatsApp did.


Re: Mnesia

Chaitanya Chalasani-4
In reply to this post by Steve Davis
I have almost always managed to deliver solutions using just mnesia. In one case, when the data volumes were too high, I partitioned the data model so that only the transactions used mnesia and the rest went to Oracle.


Re: Mnesia

Martin Karlsson-2
In reply to this post by Steve Davis
Hi Steve,

I'm with you:)

The negatives are being mentioned a lot and unfortunately put people
off instead of just being used as good things to know about when you
are picking your technology.

If you know about the quirks you can usually come up with a workaround
that matches your use-case.

It is about weighing pros and cons. Why not list a few of the
negatives and what you can do to work around them.

PROBLEM: Net splits must be manually handled.
WORKAROUND:
- https://github.com/uwiger/unsplit for automatic healing
- Use the majority option (and when you need to heal, choose the
majority partition when you pick master nodes). This should reduce
(eliminate?) the risk of data loss.
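The majority rule can be modelled in a few lines. The Python below is a hypothetical sketch, not Mnesia code. It assumes a table replicated on a known set of nodes and permits a write only when the writer's side of a split can reach a strict majority of those replicas. Mnesia's own `{majority, true}` table option applies a similar check to transactional updates. The node names and function names here are made up for illustration.

```python
def has_majority(replicas: set, reachable: set) -> bool:
    """True if the reachable nodes hold a strict majority of the table's replicas."""
    # Only nodes that actually hold a replica count towards the quorum.
    alive = replicas & reachable
    return len(alive) * 2 > len(replicas)


def try_write(replicas, reachable, table, key, value):
    """Apply the write only on the majority side of a split; refuse it elsewhere."""
    if not has_majority(replicas, reachable):
        return "aborted"  # minority side: refuse rather than let replicas diverge
    table[key] = value
    return "ok"


replicas = {"a@host1", "b@host2", "c@host3"}
table = {}
# A netsplit leaves {a, b} on one side and {c} on the other.
print(try_write(replicas, {"a@host1", "b@host2"}, table, "k", 1))  # ok
print(try_write(replicas, {"c@host3"}, table, "k", 2))             # aborted
```

With an even number of replicas, a 2/2 split leaves neither side writable. That is one reason odd replica counts are the usual choice.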

PROBLEM: disc_only_copies uses dets and is limited to 2GB
WORKAROUND:
- Partition the tables
- Use the mnesia_ext patch with a leveldb backend (to be released soon, I
think; it was mentioned in another email just recently)
- Perhaps you can use disc_copies and hold your data-set in RAM
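Partitioning keeps each dets file under its size limit by spreading keys over several smaller tables. Mnesia ships its own mechanism for this: table fragmentation, via the `frag_properties` table option and the `mnesia_frag` access module, which uses its own hashing scheme. The Python below only sketches the general idea with a fixed modulo hash. The fragment count and the `base_N` naming are hypothetical.

```python
import zlib

N_FRAGMENTS = 8  # hypothetical fragment count

# fragment table name -> {key: value}; stands in for N separate tables
fragments: dict = {}


def fragment_name(base: str, key: str, n: int = N_FRAGMENTS) -> str:
    """Map a key to one of n fragment tables using a stable hash."""
    # crc32 gives the same answer on every run and node, unlike Python's salted hash() for str.
    return f"{base}_{zlib.crc32(key.encode('utf-8')) % n}"


def put(base: str, key: str, value: object) -> None:
    fragments.setdefault(fragment_name(base, key), {})[key] = value


def get(base: str, key: str) -> object:
    return fragments.get(fragment_name(base, key), {}).get(key)


for i in range(1000):
    put("events", f"user{i}", i)

print(get("events", "user42"))                     # 42
print(sorted(len(f) for f in fragments.values()))  # keys held by each fragment
```

A fixed modulo means changing the fragment count moves most keys. Schemes such as linear hashing exist to soften that.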

PROBLEM: Prone to overloading
WORKAROUND: Load regulate. Reduce concurrency. I think I read
somewhere that mnesia and ets work best with a max concurrency of the
number of scheduler threads and degrade from there on.
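Load regulation can be as simple as an admission gate in front of the database. The Python below is a hypothetical sketch. It takes the rule of thumb above at face value (it is not verified here) and uses the CPU count as a stand-in for the number of Erlang schedulers. Requests beyond the limit are shed immediately rather than queued.

```python
import os
import threading
from typing import Optional


class LoadGate:
    """Admit at most `limit` concurrent operations and shed the rest."""

    def __init__(self, limit: Optional[int] = None):
        # Default to the CPU count as a stand-in for the number of Erlang schedulers.
        self.limit = limit or os.cpu_count() or 1
        self._slots = threading.BoundedSemaphore(self.limit)

    def run(self, fn, *args):
        # Non-blocking acquire: when every slot is busy, reject instead of queueing.
        if not self._slots.acquire(blocking=False):
            return ("rejected", None)
        try:
            return ("ok", fn(*args))
        finally:
            self._slots.release()


gate = LoadGate(limit=1)
# While the outer call holds the only slot, the nested call is shed.
print(gate.run(lambda: gate.run(lambda: "inner")))  # ('ok', ('rejected', None))
print(gate.run(lambda: "after"))                    # ('ok', 'after')
```

Rejecting early keeps latency bounded under overload. Queueing with a timeout is the obvious alternative when callers would rather wait.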

PROBLEM: Slow startup
WORKAROUND: Increase the number of table loaders.

PROBLEM: Upgrading table definition. transform_table can have problems
with big and distributed tables.
WORKAROUND: Treat large tables like a key-value store to reduce the risk
of having to modify the record.
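One way to read the key-value workaround is this: keep a stable key, store the mutable part of the record as an open-ended map with a version tag, and upgrade old values lazily on read instead of with a bulk transform_table. The Python below is a hypothetical sketch of that pattern. In Erlang the value would likely be a map inside a `{Key, Value}` record. The field names and the `vsn` tag are assumptions.

```python
DEFAULTS_V2 = {"email": None}  # hypothetical field added in version 2


def upgrade(value: dict) -> dict:
    """Bring a stored value up to the current shape on read, not via a bulk transform."""
    if value.get("vsn", 1) < 2:
        # Fill in the new field's default; existing fields are kept as they are.
        value = {**DEFAULTS_V2, **value, "vsn": 2}
    return value


store = {"alice": {"vsn": 1, "name": "Alice"}}  # written before the schema change
store["bob"] = {"vsn": 2, "name": "Bob", "email": "bob@example.com"}

print(upgrade(store["alice"]))
print(upgrade(store["bob"]))
```

Old rows are only rewritten if and when the application writes them back, so no table-wide migration is needed.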

I'm sure there are more problems and workarounds, especially if you
start pushing it (for example, like WhatsApp did).

Also, I've heard dirty_write is not safe with replicated tables. I
don't know if this is true, but it is holding me back from using dirty
operations rather than transactions (which I don't need apart from
safety on replicated tables). If anyone knows about this one, please
let me know.
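The general hazard behind the question can be shown with a toy model. The Python below is a hypothetical simulation, not Mnesia's replication protocol, and it does not settle how dirty_write actually behaves. It shows how two unlocked writes to the same key, delivered to replicas in different orders, leave the replicas disagreeing.

```python
def apply_in_order(updates):
    """A replica applies (key, value) updates in arrival order; the last arrival wins."""
    state = {}
    for key, value in updates:
        state[key] = value
    return state


# Two nodes write the same key at about the same time, without taking locks.
from_node_a = ("counter", "A")
from_node_b = ("counter", "B")

# Network delays deliver the two updates to the replicas in different orders.
replica_1 = apply_in_order([from_node_a, from_node_b])
replica_2 = apply_in_order([from_node_b, from_node_a])

print(replica_1, replica_2)  # {'counter': 'B'} {'counter': 'A'}
```

A transaction that takes a write lock on the key serializes the two writers, so every replica applies them in the same order.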

Feel free to add/remove/correct items:)

Cheers,
Martin



Re: Mnesia

Garrett Smith-5
There's no substitute for production experience. If Mnesia is what you
really really really want to use - it's super awesome - just dive in
and use it. As you say, if you run into issues, you can just work
around them.

Personally, I'm interested in the physics of a system at scale, less
so the brands involved. Get something in motion, measure it, fix it.


Re: Mnesia

Lloyd R. Prentice-2
In reply to this post by Martin Karlsson-2
Hello,

During a break at the Chicago Erlang Workshop Fred Hebert mentioned that Mnesia is a fine database for the 1990s.

Mnesia was a major draw when I was first attracted to Erlang in part because I hate transiting syntactic boundaries when I'm programming.

I asked Fred what it would take to upgrade Mnesia for the 21st century (or, at least, for the next decade). He didn't know.

Martin's list of negatives may be a good place to start.

So, just how much effort and knowledge would it take to overcome or ameliorate these and other unspecified negatives of Mnesia to produce Mnesia2?

Best,

LRP



Re: Mnesia

zxq9-2
As with any view, perspective is what drives the perception of utility.

If we assume that the only thing that matters is web applications or server-oriented architectures where every potential user in the world is going to be pounding the crap out of a central data center or, even worse, a floating set of vaguely provisioned services within something like AWS, then sure. Maybe Mnesia isn't the thing for that, and certainly a LOT of development is focused on that case.

Most of that development consists of iterative rehashes of what everyone else has already done, though. This is a red flag to me. Social networks, messaging systems, click analytics, GIS frameworks for click analytics, and lots of other things that are called "analytics" but really boil down to adtech, adtech, adtech, adtech, and some more adtech, and marketing campaigns about campaigns about adtech.

The elephant in the room is that the ad market is falling apart because it isn't worth anywhere near what we imagined. If views were what mattered, it would be impossible for Twitter to be in such trouble, for Ars Technica to be struggling to maintain its editorial standards, or for sites like Huffington Post and Forbes to be forced to display clickbait panels and load 40 to 70 third-party plants per page load.

For this case -- the forefront of tech buzz and where a huge and very publicly visible percentage of investment and development effort goes -- maybe Mnesia is not the best tool.

But for other use cases it is immensely useful, *especially* as a live cache with db features. Small and medium businesses represent about 90% of the activity in many economies. Here in Japan the figure floats around 92%; in the US it is comparable. When small businesses latch on to a solution that *actually* helps them become more efficient (instead of shelling out for yet another upgrade to a spreadsheet program that will be used to do exactly what they would have done with a spreadsheet in the 80's -- or, even funnier, for a version of that spreadsheet that still does the same things, but slower, in a browser), and that solution becomes widespread enough to have an actual impact, *the entire economy* improves without anyone really noticing why.

That is AMAZING. This is much more interesting than chasing click statistics in the interest of brokering ad sales at the speed of light so that users can continue to ignore them.

I mention the SMB use case because it is one with which I am familiar. Fred holds his views because of the sector he deals with. Mnesia is by no means the *only* database useful in developing for the SMB market, but it is a profoundly useful tool among several necessary to make cross-platform development for SMBs a non-suicidal task. A hilariously low percentage of tech investment targets the SMB market (aiming instead at huge business, the web, and the consumer electronics tie-in (the only interesting part of the list)), and there are accordingly very few tools available to make development for that market reasonable.

[Insert rant about the deleterious effects of a legally enforced monopoly on small business software.]

So is mnesia a db for the 1990s? Maybe. I don't know. That is a very broad (and totally unqualified) statement to make, so it is hard to argue about. I suppose Postgres, Oracle, DB2 and anything else with a schema fall in the same category, but last I heard those were pretty darn useful for a wide variety of very real problems people experienced in the 1990s and continue to experience today. What I can say for certain is that I continue to find Mnesia to be of profound utility today, in 2015, whether or not it is "a good database for the 1990s". The *variety* of data I have dealt with in business is so broad that no single database paradigm, and certainly no particular database system, can manage it. Not reasonably, anyway. I'm not aware of any other systems that compare very well with Mnesia in functionality, especially in interoperating with Erlang (or almost any language/runtime) the way it does, so it's hard to compare. That leaves Mnesia in the "unique tool" category, as opposed to the "easy to compare within a commodity market of similar alternatives" category.

-Craig

Re: Mnesia

Lloyd R. Prentice-2
Hi Craig,

Among the wisest things I've heard on the topic so far.

How can we get a crisp summary of your points, and of the implications of Martin's negatives, high up in the official mnesia docs? (I really don't know the process.) It would have saved me many hours of uncertainty.

All the best,

Lloyd




Re: Mnesia

Richard A. O'Keefe-2
In reply to this post by Lloyd R. Prentice-2

On 14/10/2015, at 6:46 am, <[hidden email]> <[hidden email]> wrote:
>
> I asked Fred what it would it take to upgrade Mnesia for the 21st century (or, at least, for the next decade). He didn't know.

There's one thing that strikes me.

I could go to a shop today and buy a 1 TB external drive for
NZD 75, including Goods and Services Tax of 15%.  (At least
that's what the ad I saw a couple of days ago said.)
That's almost exactly USD 50.  This is a drive that fits in
a shirt pocket, with room left over for all sorts of junk.

To make Mnesia a data base for the 2010s, it has to be able to
handle at least 1TB of data.  Heck, I've got enough goodies-for-
research money left that I could get the department to buy me
ten of these gadgets, so let's say Mnesia
 - should be able to handle a single table in the low TB
 - should be able to handle a collection of tables in the
   tens of TB
 - where "handle" includes creating, populating, checking,
   recovering, and accessing in "a reasonable time".

That's a "single machine data base for the 2010s".
Of course there are multicore, cluster, and network issues as
well.




Re: Mnesia

Lloyd R. Prentice-2
Hi Richard,

So if we dig into the code, what exactly needs to change to make that happen?

This is a great teachable moment in which the wizards of Erlang can help those of us with less understanding significantly advance our Erlang skills or, if nothing else, guide us through the design decisions that shaped mnesia, its architecture, and the significant code passages that impose its current limitations.

Why? To broaden the base of folks capable of extending and advancing the Erlang legacy.

All the best,

Lloyd


Re: Mnesia

Richard A. O'Keefe-2

On 14/10/2015, at 4:19 pm, Lloyd R. Prentice <[hidden email]> wrote:

> Hi Richard,
>
> So if we dig into the code, what exactly needs to change to make that happen?

How should I know?
I couldn't implement a data base to save my life.
It will certainly be more than just finding the place
where it says "2GB" and changing a number.
Things *change* at scale.

I've got a data set that's 18GB as raw text,
and a student wants to do some data mining on a recent
data set that's too big to fit on the 8GB memory stick
he keeps bringing me excerpts on; it would be a good
fit for Mnesia...

mnesia_ext + LevelDB looks really good; it would be nice
to know, on downloading a new release of Erlang/OTP, that
it would be _there_ and *the documentation integrated*.

By the way, a key fact about why Mnesia is the way it is
can be found in the documentation:
        Mnesia is primarily intended to be a
        memory-resident database.  Some of its
        design tradeoffs reflect this.
But on a 16GiB 64-bit machine (yeah, I know it's small,
but it's a couple of years old) a 32-bit limit doesn't
make as much sense for a memory-resident data base as it
used to either.




Re: Mnesia

Kenneth Lakin
On 10/13/2015 10:24 PM, Richard A. O'Keefe wrote:
> But on a 16GiB 64-bit machine ... a 32-bit limit doesn't
> make as much sense for a memory-resident data base as it
> used to either.

I'm slightly confused.

As I understood it, the size of a disc_copies Mnesia table was only
limited by the system's available RAM, and one didn't need to worry
about the size of the on-disk representation of the data.

Did I misunderstand this, or were you referring to DETS-backed
disc_only_copies tables?



Re: Mnesia

Gordon Guthrie-5
In reply to this post by Richard A. O'Keefe-2
Mnesia is also a pain when it partitions - you have to write your own reconciliation programmes.

There has been a lot of work done on eventual consistency in *the other* Erlang database, Riak (disclaimer: I am working at Basho now).

Riak implements eventual consistency and post-partition self-healing using Conflict-free Replicated Data Types (CRDTs), and the canonical set of standalone CRDT libraries is written in Erlang:

There is a comprehensive reading list here:

Combining Klarna’s (forthcoming) leveldb backend with a CRDT eventual-consistency layer on top would be an interesting start towards a distributed transactional database with eventual consistency.
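A small example helps show why CRDTs avoid hand-written reconciliation after a partition. Below is a minimal state-based grow-only counter (G-Counter), one of the simplest CRDTs. It is a Python sketch of the textbook construction, not code from Riak or its CRDT libraries. The node names are arbitrary.

```python
class GCounter:
    """State-based grow-only counter: one slot per node, merged by taking maxima."""

    def __init__(self, node: str):
        self.node = node
        self.slots: dict = {}

    def increment(self, n: int = 1) -> None:
        # Each node only ever bumps its own slot, so concurrent increments never clash.
        self.slots[self.node] = self.slots.get(self.node, 0) + n

    def value(self) -> int:
        return sum(self.slots.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge whatever order (or however often) merges happen.
        for node, count in other.slots.items():
            self.slots[node] = max(self.slots.get(node, 0), count)


# Two replicas keep counting on both sides of a netsplit...
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
# ...and heal by exchanging state in either direction.
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```

The price is a restricted data model. Richer types (sets, maps, registers) need more elaborate merge rules than a simple max.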

Gordon

On 14 Oct 2015, at 02:16, Richard A. O'Keefe <[hidden email]> wrote:


On 14/10/2015, at 6:46 am, <[hidden email]> <[hidden email]> wrote:

I asked Fred what it would it take to upgrade Mnesia for the 21st century (or, at least, for the next decade). He didn't know.

There's one thing that strikes me.

I could go to a shop today and buy a 1 TB external drive for
NZD 75, including Goods and Services Tax of 15%.  (At least
that's what the ad I saw a couple of days ago said.)
That's almost exactly USD 50.  This is a drive that fits in
a shirt pocket, with room left over for all sorts of junk.

To make Mnesia a data base for the 2010s, it has to be able to
handle at least 1TB of data.  Heck, I've got enough goodies-for-
research money left that I could get the department to buy me
ten of these gadgets, so let's say Mnesia
- should be able to handle a single table in the low TB
- should be able to handle a collection of tables in the
  tens of TB
- where "handle" includes creating, populating, checking,
  recovering, and accessing in "a reasonable time".

That's a "single machine data base for the 2010s".
Of course there are multicore, cluster, and network issues as
well.



Re: Mnesia

Richard Carlsson-3
The simple view of Mnesia is that it's a transaction layer on top of ETS tables, with various forms of backing storage. Transactions are pessimistic, based on locking. Some small optimizations have been made to the locking mechanism in recent years, but perhaps more could be done in that area. It might also be possible to add built-in support for optimistic transactions based on timestamps or compare-and-swap, for certain usage patterns. The semantics of dirty reads/writes need to be better documented, and if possible cleaned up a bit, because the behaviour can depend on the table type and on whether the tables are local or remote. The table size problems can probably be considered solved by mnesia_ext with leveldb or other backends. None of this will make it less of a 1990s database, though. Adding eventual consistency (e.g. based on vector clocks) as an alternative to transactions would make it more modern.
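
Pessimistic transactions take locks before touching data. An optimistic scheme instead reads without holding locks, computes the new value, and commits only if nobody else changed the record in the meantime, retrying otherwise. Below is a minimal Python sketch of the version-stamp/compare-and-swap variant. `VersionedTable` and `optimistic_update` are hypothetical names, and this is not how Mnesia behaves today.

```python
import threading


class VersionedTable:
    """Key-value table whose rows carry a version number (hypothetical)."""

    def __init__(self):
        self._rows = {}                     # key -> (version, value)
        self._cas_lock = threading.Lock()   # held only for the commit check

    def read(self, key):
        return self._rows.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, new_value):
        with self._cas_lock:
            current_version, _ = self._rows.get(key, (0, None))
            if current_version != expected_version:
                return False                # another writer committed first
            self._rows[key] = (current_version + 1, new_value)
            return True


def optimistic_update(table, key, fn, max_retries=10):
    """Read, compute, try to commit; on a conflict, re-read and retry."""
    for _ in range(max_retries):
        version, value = table.read(key)
        if table.compare_and_swap(key, version, fn(value)):
            return True
    return False
```

The lock here only makes the final check-and-write atomic, which is the job a hardware compare-and-swap or a commit-time validation step would do in a real system. Under low contention, this lets readers proceed without ever blocking each other.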

The big thing that would help, as Gordon mentioned, is a new distribution/replication layer. The existing one basically assumes that tables are not huge and the network between nodes is fast and reliable with netsplits being rare, like in a small cluster in a telecom base station. We use Mnesia, but not in distributed mode - we have a custom distribution layer (stable, but very limited) on top of local Mnesia instances that are not directly aware of each other.

    /Richard



Re: Mnesia

Vance Shipley
On Wed, Oct 14, 2015 at 2:16 PM, Richard Carlsson
<[hidden email]> wrote:
> The simple view of Mnesia is that it's a transaction layer on top of ETS tables, with some varying forms of backing storage.

I am careful not to use the term "database" when referring to mnesia
as it leads to unfair comparisons and unreasonable expectations.  I
use terms such as "distributed tables" or "data store".  It's a
lower-level tool than what most people think of as a "database".

In training I describe a progression in scale out like this:

     single process - store data in StateData (e.g. gb_trees)
     multiple processes, single node - store data in ets tables
     multiple processes, multiple nodes - store data in mnesia

... and that's before we talk about persistence or transactions.

--
     -Vance

Re: Mnesia

Gordon Guthrie-5
In reply to this post by Richard Carlsson-3
I always planned reliability for Mnesia as hot/cold:

A shared, resilient network storage fabric underneath both a running Mnesia instance and a non-running one.

Failover would then consist of losing one server and bringing up the other. Obviously this can be expensive with large tables that need to be fully loaded into memory, etc., etc.

Gordon

Re: Mnesia

Magnus Henoch-6
In reply to this post by Steve Davis
"Lloyd R. Prentice" <[hidden email]> writes:

> Hi Craig,
>
> Among the wisest things I've heard on the topic so far.
>
> How can we get a crisp summary of your points and the implications
> of Martin's negatives high up in the official mnesia docs? (I
> really don't know the process.) It would have saved me many hours
> of uncertainty.

The process for adding something to the documentation is
submitting a pull request to the erlang/otp repository on Github.
The Mnesia documentation is in the lib/mnesia/doc/src directory.

You could either clone the repository, edit the documentation in
your favourite text editor, commit and push your changes, and open
a pull request from your branch, or you could edit the
documentation through the Github web interface and submit the pull
request from there.  Perhaps the Mnesia overview page would be a
good place for this:

https://github.com/erlang/otp/blob/maint/lib/mnesia/doc/src/Mnesia_overview.xml

Click the pen icon near the top to start editing.

Regards,
Magnus

Re: Mnesia

Richard A. O'Keefe-2
In reply to this post by Kenneth Lakin
According to http://www.erlang.org/faq/mnesia.html
"Dets uses 32 bit integers for file offsets,
so the largest possible mnesia table (for now) is 4Gb".
There is nothing there about the limits being different
for in memory vs on disc.
Poke around a bit more and you will find explicit claims
that this limit applies both to disc only and to disc copies,
e.g., "disc_copies tables are limited by their dets backend".
You will also find the figures of 2GB and 3GB floating around.

One thing that would be really nice would be to have
accurate current limits prominently signposted (not necessarily
*displayed*, just pointed to will do) near the beginning of the
Mnesia manual.

I've had serious problems with the Mnesia documentation in the
past, so I'm quite prepared to believe that I have my facts
totally wrong.


Re: Mnesia

Dan Gudmundsson-2
You are right in that you are wrong, and so is the documentation, then:
disc_copies tables no longer use dets files as their backing storage.
That restriction was removed (a long time ago).

However, disc_only_copies tables still use dets files and have the 32-bit limit.

A pointer (or a patch) to the documentation would be nice.


Re: Mnesia

Valentin Micic-2
Well, one can always use table fragmentation: individual dets files are still limited by their 32-bit offsets, but the table as a whole can grow beyond that.
We have used this approach in the past (we fragmented a very big table across 512 dets files), and it worked reasonably well.
mnesia did have a few performance problems on specific OS platforms (Solaris in particular) with very poor file I/O caching.
But the one thing missing from mnesia that actually made us abandon it for projects requiring huge data volumes was a lack of flexibility regarding the storage mechanism used.
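
Some back-of-the-envelope arithmetic for the fragmentation approach: a 32-bit file offset addresses at most 2^32 bytes (4 GiB) per dets file. So 512 fragments give a theoretical ceiling of 2 TiB, and O'Keefe's 1 TB target needs at least 233 fragments even if every one were filled to the limit. The Python sketch below also shows the general idea of routing a key to a fragment by hashing. The CRC32-modulo scheme is an assumption for illustration only; Mnesia's default fragment hashing (`mnesia_frag_hash`) works differently.

```python
import zlib

DETS_MAX_FILE_BYTES = 2 ** 32   # 32-bit offsets: at most 4 GiB per dets file


def capacity_ceiling(n_fragments: int) -> int:
    """Upper bound on table size if every fragment reached the full 4 GiB."""
    return n_fragments * DETS_MAX_FILE_BYTES


def fragments_needed(total_bytes: int) -> int:
    """Smallest fragment count whose combined ceiling covers total_bytes."""
    return -(-total_bytes // DETS_MAX_FILE_BYTES)   # integer ceiling division


def fragment_for(key: bytes, n_fragments: int) -> int:
    """Route a key to a fragment (illustrative CRC32 modulo, not mnesia_frag_hash)."""
    return zlib.crc32(key) % n_fragments
```

One weakness of plain modulo hashing is that changing the fragment count remaps almost every key. A table that grows one fragment at a time (as `mnesia:change_table_frag/2` with `{add_frag, ...}` allows) needs a scheme that moves only a small share of keys on each addition.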

And before you start thinking that I used more pot than usual, let me clarify:

Yes, one may choose between RAM, disc, or a combination of the two. However, if you want to control the amount of RAM used by mnesia, you are left with only one choice: disc_only_copies, which imposes some performance constraints.
Thankfully, Erlang lends itself very nicely to "roll-your-own" approach and we managed to do just that with a relative ease.

Thus, if I were to suggest an improvement for mnesia, I'd say that having disc_only_copies with flexible caching may do the trick (well, this is what we had to do to make our lives good again: we did not use mnesia, but a combination of ets and dets files with custom dets fragmentation).
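
A disc-only table fronted by a cache whose RAM use you can bound can be sketched in a few lines. This is a hypothetical Python illustration of the idea, using least-recently-used eviction and write-through to the backing store. It is not an existing Mnesia option. `backing` stands in for an on-disc table such as a dets file and can be any dict-like object.

```python
from collections import OrderedDict


class BoundedCacheStore:
    """LRU cache with a fixed entry budget in front of a slower backing store."""

    def __init__(self, backing, max_entries):
        self.backing = backing           # authoritative copy (stand-in for disc)
        self.max_entries = max_entries   # the knob that bounds RAM use
        self.cache = OrderedDict()       # key -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key, default=None):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        if key not in self.backing:
            return default
        value = self.backing[key]
        self._remember(key, value)
        return value

    def put(self, key, value):
        self.backing[key] = value        # write-through: disc stays authoritative
        self._remember(key, value)

    def _remember(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # evict the least recently used entry
```

Write-through keeps the disc copy authoritative, so dropping the cache loses nothing. The trade-off is that every write still pays for disc I/O, while reads of the hot set are served from memory.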

Kind regards

V/

