Mnesia, disk logging, and synchronous disk logging

Scott Lystig Fritchie
Howdy.  Is there a difference of opinion/definition on what
"synchronous" in Mnesia's synchronous disk logging means?

In the context of disc_copies tables ... it seems to me that Mnesia's
use of the phrase means:

    * The transaction coordinator waits synchronously for 2PC votes
      from all participants.

    * Each participant uses disk_log:log/2 or disk_log:blog/2 to
      record local votes and commit/abort decisions, but participants
      are *not* using the disk_log:sync/1 to force the log to disk.

The disk_log:sync/1 function has an extremely high penalty, but
sometimes that penalty is worth the cost.  For example, some
read+write transactions may contain data that you *really* do not want
to lose.  For data that important, if all replicas suddenly lose
power, it is possible to lose the logs and thus the newly-updated data
before it is written safely to disk on each replica machine.

But I can't find a Mnesia transaction knob/button that I can
twist/press to request that level of safety.  Is there such a thing?

-Scott

Re: Mnesia, disk logging, and synchronous disk logging

Håkan Mattsson-2
On Mon, 23 Jan 2006, Scott Lystig Fritchie wrote:

SLF> Howdy.  Is there a difference of opinion/definition on what
SLF> "synchronous" in Mnesia's synchronous disk logging means?
SLF>
SLF> In the context of disc_copies tables ... it seems to me that Mnesia's
SLF> use of the phrase means:
SLF>
SLF>     * The transaction coordinator waits synchronously for 2PC votes
SLF>       from all participants.

In Mnesia the coordinator does always wait
synchronously for 2PC (and 3PC) votes from all
participants, regardless of the transaction being
"synchronous" or not.

In "synchronous" transactions, the coordinator also
waits for the participants to complete their part
of the commit work in the transaction before
control is returned to the caller.

SLF>     * Each participant uses disk_log:log/2 or disk_log:blog/2 to
SLF>       record local votes and commit/abort decisions, but participants
SLF>       are *not* using the disk_log:sync/1 to force the log to disk.

Correct.

SLF> The disk_log:sync/1 function has an extremely high penalty, but
SLF> sometimes that penalty is worth the cost.  For example, some
SLF> read+write transactions may contain data that you *really* do not want
SLF> to lose.  For data that important, if all replicas suddenly lose
SLF> power, it is possible to lose the logs and thus the newly-updated data
SLF> before it is written safely to disk on each replica machine.

I agree that such a feature can be useful.
At least if there are no write caches
enabled in the disk hardware. Otherwise
you could lose some data anyway in case
of a power failure.

SLF> But I can't find a Mnesia transaction knob/button that I can
SLF> twist/press to request that level of safety.  Is there such a thing?

No, currently there is no such thing in Mnesia.

/Håkan

Re: Mnesia, disk logging, and synchronous disk logging

Scott Lystig Fritchie
>>>>> "hm" == Hakan Mattsson <[hidden email]> writes:

hm> In Mnesia the coordinator does always wait synchronously for 2PC
hm> (and 3PC) votes from all participants, regardless of the
hm> transaction being "synchronous" or not.

That makes sense ... the coordinator can do Very Bad Things if it
doesn't gather all votes.

hm> I agree that such a feature can be useful.  At least if there
hm> are no write caches enabled in the disk hardware. Otherwise you
hm> could lose some data anyway in case of a power failure.

Even if your disk subsystem(*) has an NVRAM write-back cache, there is
risk of data loss unless you explicitly use the fsync(2) system call.

With Mnesia using the disk_log module, which in turn usually uses
write(2) only, you are not certain that the OS will have copied
write(2)'s data to the disk device.  In most cases, the kernel can
(and will) wait for many seconds before flushing that data to the disk
device.
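
A minimal sketch of that difference, in Python rather than Erlang and
assuming a POSIX system.  The function names and the log file name are
made up for illustration; this is not how disk_log is implemented.

```python
import os
import tempfile


def write_only(path, data):
    """Append data with write(2) only; it may sit in the kernel's cache."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # The bytes are handed to the kernel here, but nothing forces
        # them onto the disk device yet.
        os.write(fd, data)
    finally:
        os.close(fd)


def write_and_fsync(path, data):
    """Append data, then block in fsync(2) until the kernel has flushed it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, data)
        # fsync(2) returns only after the kernel reports the file's data
        # as transferred to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "example.log")
    write_only(log_path, b"cheap record\n")
    write_and_fsync(log_path, b"important record\n")
```

Both calls leave the same bytes in the file; the difference only shows
up if the machine loses power shortly after the call returns.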

SLF> But I can't find a Mnesia transaction knob/button that I can
SLF> twist/press to request that level of safety.  Is there such a
SLF> thing?

hm> No, currently there is no such thing in Mnesia.

That's what I'd thought.

Assuming that I wanted to try to add that to Mnesia ... I think I'd
need to add extra info to the commit record that's sent to each
participant.  Something that said: this log record is important enough
to use fsync after writing.  Hm.
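
The idea can be sketched roughly like this, again in Python and only as
an illustration.  The record layout, the "force_sync" key, and the
log_commit function are all hypothetical; none of them correspond to
Mnesia's real commit record or internals.

```python
import json
import os
import tempfile


def log_commit(fd, record, sync=os.fsync):
    """Append one commit record; force it to disk only if the record asks.

    `record` is a plain dict.  The "force_sync" key is a hypothetical
    per-transaction flag set by the coordinator.
    Returns True if a sync was performed, False otherwise.
    """
    line = json.dumps(record, sort_keys=True).encode("utf-8") + b"\n"
    os.write(fd, line)
    if record.get("force_sync", False):
        # Pay the fsync penalty only for records flagged as important.
        sync(fd)
        return True
    return False


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "commit.log")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        log_commit(fd, {"tid": 1, "decision": "commit"})
        log_commit(fd, {"tid": 2, "decision": "commit", "force_sync": True})
    finally:
        os.close(fd)
```

Ordinary transactions keep today's cost, and only the flagged ones
wait for the disk.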

I suppose a poor man's safety net would be to run a shell script like
this on each Mnesia node with disc_copies or disc_only_copies:

    # Flush all dirty kernel buffers to disk once per second.
    while true; do
        sync
        sleep 1
    done

Easy to do, doesn't require code changes, and would limit worst-case
data loss to roughly 1-2 seconds.  (Assuming that disk_log and the
file Port that disk_log uses do not do any buffering.)  On the other
hand, performance may suck.

Too bad disk drives are so darn slow.

-Scott

(*) Even if the logical disk device is an NVRAM/solid-state disk drive.

Re: Mnesia, disk logging, and synchronous disk logging

Dan Gudmundsson

Talking about poor man's solutions, you can also use mnesia:dump_log(),
which closes the files after the operation.

Log dumping is otherwise automatic, and you can control it by time or
by number of transactions; see the manual.

A per-transaction disk sync option requires some hacking though.

/Dan

--
Dan Gudmundsson               Project:    Mnesia, Erlang/OTP
Ericsson Utvecklings AB       Phone:      +46  8 727 5762
UAB/F/P                       Mobile:     +46 70 519 9469
S-125 25 Stockholm            Visit addr: Armborstv 1


Re: Mnesia, disk logging, and synchronous disk logging

Bengt Kleberg
In reply to this post by Scott Lystig Fritchie
On 2006-01-25 22:54, Scott Lystig Fritchie wrote:
...deleted
> Even if your disk subsystem(*) has an NVRAM write-back cache, there is
> risk of data loss unless you explicitly use the fsync(2) system call.

If you are running Linux, also remember this:

''The Linux fsync man page says:


"It does not necessarily ensure that the entry in the directory
containing the file has also reached disk. For that an explicit fsync on
the file descriptor of the directory is also needed."''

(http://archives.postgresql.org/pgsql-hackers/2004-10/msg01037.php)
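
A minimal sketch of what that implies for a newly created file, in
Python and assuming Linux or another POSIX system where a directory can
be opened read-only and passed to fsync.  The function name
create_durably is made up for the example.

```python
import os
import tempfile


def create_durably(path, data):
    """Create path with data, then fsync both the file and its directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        # Flush the file's own contents and metadata.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the directory entry that names the file; without this the
    # data may be on disk while the name pointing at it is not.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "new.dat")
    create_durably(target, b"payload")
```

The directory fsync matters mainly when a file is created, renamed, or
deleted; appending to an existing log file only needs the fsync on the
file itself.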


bengt