HEART


Re: HEART

Michael L Martin
But who watches the watchdog?


On 05/12/2015 05:37 PM, Miles Fidelman wrote:

>> On Mon, May 11, 2015 at 9:13 PM, Roberto Ostinelli wrote:
>>
>> > In non-Erlang systems, I would have standard watchdogs that launch an
>> > application on OS boot, and then monitor it and relaunch it if necessary.
>>
>> The heart system in Erlang is a simple watchdog, mostly used if you have
>> nothing else that will restart your application. In a SysV init system,
>> there is no automatic watching and restart. In RcNG in FreeBSD, there is
>> no restart. In OpenBSD's rc, there is no automatic restart.
>>
>
> Wait a minute - isn't that what respawn does in a SysV init environment?
>
> Of course that only works if the VM well and truly dies.  If it locks
> up, you still have a problem.
>
> Anybody see a reason you couldn't:
> 1. start BEAM with respawn
> 2. start a separate watchdog process that periodically runs a few
> tests to see if BEAM is running properly,
> and if not, KILLs the process - at which point respawn would take care
> of a restart.
>
> Miles Fidelman
>
>
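
Miles's two-step idea could be sketched roughly as below. This is only an illustration: `check_or_kill` and `CHECK_INTERVAL` are hypothetical names, not part of Erlang/OTP or any release tool, and the health check is whatever command you trust to detect a hung VM.

```shell
# Hypothetical watchdog helper (an assumption for illustration only).
# check_or_kill PID MAX_FAILURES CHECK_CMD...
# Runs CHECK_CMD up to MAX_FAILURES times. If every attempt fails, sends
# SIGKILL to PID so that init's respawn can start a fresh VM.
check_or_kill() {
    local pid="$1"
    local max_failures="$2"
    shift 2
    local attempt=0
    while [ "$attempt" -lt "$max_failures" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0    # health check passed, leave the VM alone
        fi
        attempt=$((attempt + 1))
        sleep "${CHECK_INTERVAL:-1}"
    done
    kill -KILL "$pid" 2> /dev/null
    return 1
}
```

Assuming the release script offers `getpid` and `ping` commands, a loop such as `while sleep 30; do check_or_kill "$(bin/myapp getpid)" 3 bin/myapp ping; done` could drive it. The retry count is there so that one slow reply does not kill a healthy node.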

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: HEART

zxq9-2
On Tuesday, 12 May 2015 19:59:55 Michael L Martin wrote:
> But who watches the watchdog?

Depends on which type you're talking about.
Some (most?) watchdog/procdoc type systems start two processes that watch each other in addition to the target process(es).

When writing these for my own daemons I usually make an "undead mode" where a monitor daemon is created to watch the service daemon, and the service daemon itself acts as the monitor for the monitor daemon.
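
A minimal sketch of that mutual-monitoring arrangement, assuming each daemon records its PID in a file and both run as the same user. `ensure_alive` is a hypothetical helper name, not taken from any existing tool.

```shell
# Hypothetical helper (an assumption for illustration only).
# ensure_alive PIDFILE START_CMD...
# If the PID recorded in PIDFILE is missing or no longer alive, run START_CMD.
# Note: kill -0 also fails for processes owned by another user.
ensure_alive() {
    local pidfile="$1"
    shift
    local pid
    pid=$(cat "$pidfile" 2> /dev/null) || pid=""
    if [ -n "$pid" ] && kill -0 "$pid" 2> /dev/null; then
        return 0    # peer is still running
    fi
    "$@"            # peer is gone, bring it back
}
```

The monitor daemon would call this periodically on the service's PID file, and the service would call it on the monitor's PID file, so either one can resurrect the other.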

-Craig

Re: HEART

Roberto Ostinelli
In reply to this post by Roger Lipscombe-2
Hi,
Still experiencing weirdnesses though.

My upstart script is:

script
  export HOME=/root
  cd /usr/local/myapp
  exec bin/myapp foreground > /dev/null 2>&1
end script


When I try to attach to the app's node, I get:

$ /usr/local/myapp/bin/myapp attach
pong
Can't access pipe directory /tmp//usr/local/myapp/: No such file or directory


However, if I start my app manually:

$ /usr/local/myapp/bin/myapp start

Then everything works fine:

$ /usr/local/myapp/bin/myapp attach
pong
Attaching to /tmp//usr/local/myapp/erlang.pipe.1 (^D to exit)

(myapp@myapp.example.com)1>


Can some kind soul explain to me what is going on?

Thank you,
r.







On Tue, May 12, 2015 at 8:44 PM, Roger Lipscombe <[hidden email]> wrote:
> On 12 May 2015 at 18:45, Roberto Ostinelli <[hidden email]> wrote:
>> Right. Unfortunately I can't find a way to pass this pid to the original script that starts it (using upstart).
>
> We use relx-generated releases with upstart. Simply run "bin/myapp
> foreground" from the upstart script.



Re: HEART

dmkolesnikov
Hi,

You use the ‘start’ command when you start the node manually, but the upstart script uses ‘foreground’.
Try using ‘start’ in both places. I think ‘foreground’ bypasses pipe creation.

- Dmitry


Re: HEART

Tristan Sloughter-4
In reply to this post by Roberto Ostinelli
Use remote_console instead of attach.
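
A small sketch of how a wrapper could pick between the two, assuming (as the error message earlier in the thread suggests) that `attach` needs `erlang.pipe.N` files in the pipe directory and that a node started with `foreground` has none. `console_command` is a hypothetical helper, not part of the release scripts.

```shell
# Hypothetical helper (an assumption for illustration only).
# console_command PIPE_DIR
# Prints "attach" when pipe files exist in PIPE_DIR, which is the case the
# thread shows after "start"; prints "remote_console" otherwise.
console_command() {
    local pipe_dir="$1"
    local pipe
    for pipe in "$pipe_dir"/erlang.pipe.*; do
        if [ -e "$pipe" ]; then
            echo attach
            return 0
        fi
    done
    echo remote_console
}
```

With the paths from this thread, that would be used as `bin/myapp "$(console_command /tmp//usr/local/myapp)"`.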
 
--
Tristan Sloughter
 
 
 

Re: HEART

Roberto Ostinelli
Ok,
After some experimenting, I want to give some feedback in return for the great help provided here.
I've come up with an init.d script (currently running on Ubuntu) that gives me what I needed.

As a reminder:
  • I have an Erlang 17.4 release generated with rebar.
  • I want to have the release started on system boot.
  • I want the VM to be monitored and restarted if it crashes.

First, ensure that the `-heart` option is used in your `vm.args` file. Heart will monitor the VM and restart it if needed.
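
As a sketch, the flag could be added idempotently from a deployment script. `ensure_heart_flag` is a hypothetical helper name, and the path to `vm.args` depends on how your release is laid out.

```shell
# Hypothetical helper (an assumption for illustration only).
# ensure_heart_flag VM_ARGS_FILE
# Appends a "-heart" line unless the file already contains one.
# Assumes the file, if it exists, ends with a newline.
ensure_heart_flag() {
    local vm_args="$1"
    if ! grep -q '^-heart[[:space:]]*$' "$vm_args" 2> /dev/null; then
        printf '%s\n' '-heart' >> "$vm_args"
    fi
}
```

As far as I know, heart also relies on the `HEART_COMMAND` environment variable to know how to restart the node. Release start scripts typically export it, but it is worth checking in yours.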

Second, create the file `/etc/init.d/myapp`:


########################################################################

#!/usr/bin/env bash
# myapp daemon
# chkconfig: 345 20 80
# description: myapp daemon
# processname: myapp

NAME=myapp
PROJECT_ROOT_PATH=/usr/local/$NAME
APP_SCRIPT="bin/$NAME"

# environment for the release scripts
export HOME=/root

case "$1" in
start)
    printf "%-50s" "Starting $NAME..."

    # APP_SCRIPT is relative, so run it from the release root
    cd "$PROJECT_ROOT_PATH" || exit 1
    "$APP_SCRIPT" start > /dev/null 2>&1

    # wait up to 10 seconds for the VM to report its pid
    PID=""
    for (( i=0; i<10; ++i )); do
        if OUT=$("$APP_SCRIPT" getpid); then PID=$OUT; break; fi
        sleep 1
    done

    if [ -z "$PID" ]; then
        printf "%s\n" "Failed"
        exit 1
    else
        printf "%s\n" "Ok"
    fi
;;
status)
    printf "%-50s" "Checking $NAME..."

    # getpid succeeds only while the node is running
    cd "$PROJECT_ROOT_PATH" || exit 1
    if "$APP_SCRIPT" getpid > /dev/null 2>&1; then
        printf "%s\n" "Ok"
    else
        printf "%s\n" "Node is not running!"
        exit 1
    fi
;;
stop)
    printf "%-50s" "Stopping $NAME..."

    # cd and stop
    cd "$PROJECT_ROOT_PATH" || exit 1
    if "$APP_SCRIPT" stop > /dev/null 2>&1; then
        printf "%s\n" "Ok"
    else
        printf "%s\n" "Node is not running!"
    fi
;;

restart)
    "$0" stop
    "$0" start
;;

*)
    echo "Usage: $0 {status|start|stop|restart}"
    exit 1
esac


########################################################################

You can use this script like any other service:

```
$ sudo service myapp start
Starting myapp...                                Ok
```


Third, ensure this script is used at boot time:

`sudo update-rc.d myapp defaults`



Side note: the start branch of the script does not return until the PID has been retrieved from the VM.
This is not strictly necessary, but it means you could write the PID to a PID file or hook in other kinds of monitoring instead of using HEART.
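
For instance, the PID-file variant might look like the sketch below. `write_pidfile` is a hypothetical helper, and the PID file location is up to you.

```shell
# Hypothetical helper (an assumption for illustration only).
# write_pidfile PIDFILE PID
# Writes PID to PIDFILE, refusing anything that is not a plain number
# (for example an error message printed instead of a pid).
write_pidfile() {
    local pidfile="$1"
    local pid="$2"
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$pid" > "$pidfile"
}
```

In the start branch this could be called as `write_pidfile "/var/run/$NAME.pid" "$PID"` once the wait loop has a value.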


Hope this helps someone in the same spot.

Best,
r.





Re: HEART

Daniil Churikov-2
Sorry Roberto, forgot to cc the list.

If you use heart, be aware of the OOM killer. We had some issues with it:
in essence, the OOM killer kills both heart and the VM.


Re: HEART

Matthias Lang
In reply to this post by Michael L Martin
On 13. May 2015, Michael L Martin wrote:

> But who watches the watchdog?

Late reply. One approach is to have a hardware watchdog.

On the embedded system I work on, Erlang kicks a custom 'heart'
program.  The heart program kicks the hardware watchdog.

Hangs in Erlang code are dealt with by timeouts and supervisors.
Hangs in the VM are dealt with by 'heart'.
Hangs in 'heart' are dealt with by the hardware watchdog.

This approach is sufficient to make hangs an insignificant contributor
to downtime in a five-nines environment (signalling in the SS7
network), in my experience.
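
The middle layer of that chain can be sketched in shell. This is not Matt's actual heart program: it assumes a Linux-style watchdog device that is kicked by writing to it, and `feed_watchdog` is a hypothetical name.

```shell
# Hypothetical helper (an assumption for illustration only).
# feed_watchdog DEVICE INTERVAL CHECK_CMD...
# Keeps DEVICE open and writes one byte to it every INTERVAL seconds for as
# long as CHECK_CMD succeeds. Once the check fails, kicking stops, so a real
# hardware watchdog would time out and reset the machine.
feed_watchdog() {
    local device="$1"
    local interval="$2"
    shift 2
    {
        while "$@" > /dev/null 2>&1; do
            printf '.' >&3      # any write counts as a kick
            sleep "$interval"
        done
    } 3> "$device"
    return 1
}
```

On real hardware the device would be something like `/dev/watchdog` and the check would be whatever proves the layer below is alive. If this process itself hangs, the kicks stop as well, which is the point of the layering.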

Matt

Re: HEART

Dominic Letz
Even later reply. To chime in on Daniil's comment: it is a Linux-specific problem that the kernel OOM killer will go ahead and kill the whole process tree, meaning Erlang and heart at the same time, because heart is a child process of erl.

For this reason I would strongly advise against using heart on Linux-based systems at this time. One option is supervisord, which is very easy to configure and use with Erlang, and does not have the above problem.

Here is a complete sample configuration with '...' as placeholders for your deployment:

/etc/supervisord/conf.d/erl.conf:
[program:erl]
autorestart=true
command=/usr/bin/erl -noinput -noshell +K true -boot ... -config ... 
directory=...
environment=HOME=...
redirect_stderr=true
user=...


Best





--
Dominic Letz
Director of R&D

