supervisor:restart_child/2

supervisor:restart_child/2

Sean Hinde-2
Ulf>

> When I call
>
> supervisor:terminate_child(ThisSuper, ThatChild),
> supervisor:restart_child(ThisSuper, ThatChild).
>
> Then the supervisor doesn't count this towards the restart
> frequency. That is, I can do this forever, and the supervisor will
> never escalate.
>

Seems reasonable to me. Presumably the assumption is that as you are
explicitly calling these functions you are in control of things therefore it
is not a fault condition. Is it?

- Sean








supervisor:restart_child/2

Ulf Wiger-4
On Fri, 23 Feb 2001, Sean Hinde wrote:

>Ulf>
>
>> When I call
>>
>> supervisor:terminate_child(ThisSuper, ThatChild),
>> supervisor:restart_child(ThisSuper, ThatChild).
>>
>> Then the supervisor doesn't count this towards the restart
>> frequency. That is, I can do this forever, and the supervisor will
>> never escalate.
>
>Seems reasonable to me. Presumably the assumption is that as you are
>explicitly calling these functions you are in control of things
>therefore it is not a fault condition. Is it?

Well, I'm not sure... (:

The reason I even care is that I have re-written supervisor so that
it is able to tell the child how many times it has restarted, and
by extension, whether it is starting for the first time, or whether
it is, for example, an escalated restart.

In this context, what does it mean when someone explicitly
terminates a child and restarts it? Shouldn't I update the count of
how many times it's restarted? And if I do, shouldn't I terminate
the supervisor if the restart intensity is exceeded?

One alternative is to pretend as if nothing has happened. This will,
of course, not fool the child -- it will know that _something_
happened...

Another alternative is to reset the restart count. This might have
surprising effects.

My guess is that if you're going to explicitly terminate and restart a
child, you're probably only going to do it once in a fortnight or so,
so it most likely won't matter. So, I made restart_child/2 also update
the restart count. This means that it can also trigger an escalated
restart.
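
To make the bookkeeping concrete, here is a minimal sketch of
sliding-window restart accounting, where an explicit restart is counted
exactly like a crash-induced one. The module name restart_window and the
function add_restart/4 are hypothetical and are not taken from the hacked
supervisor.erl mentioned in this post; MaxR and MaxT stand for the usual
maximum restart intensity and period (in seconds).

```erlang
-module(restart_window).
-export([add_restart/4]).

%% Hypothetical helper, for illustration only.
%% Restarts is a list of timestamps (in seconds) of earlier restarts,
%% newest first. Now is the timestamp of the restart being recorded.
add_restart(Now, Restarts, MaxR, MaxT) ->
    %% Record the new restart and keep only those inside the last MaxT seconds.
    InWindow = [T || T <- [Now | Restarts], Now - T =< MaxT],
    case length(InWindow) > MaxR of
        true  -> {terminate, InWindow};   %% intensity exceeded: escalate
        false -> {ok, InWindow}           %% still within limits
    end.
```

With MaxR = 1 and MaxT = 5, a first restart at time 0 is accepted, and a
second one at time 1 returns terminate. If the second restart comes at
time 10 instead, the first has aged out of the window and the call
returns ok again.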



My hacked supervisor.erl and the hacked behaviour modules seem to
work just fine. I can't post them, as the archive is too large, but if
anyone wants to check them out, let me know.

BTW, I've also rewritten the supervisor to use monitor instead of
links (actually, it uses both), so now a child can't mess up the
supervision by explicitly unlinking.
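
The effect of monitoring can be shown in isolation. The following is only
a sketch under my own assumptions (the module monitor_sketch and the
function demo/0 are made up for illustration and are not part of the
rewritten supervisor): a process that unlinks from its parent and then
exits is still noticed through erlang:monitor/2.

```erlang
-module(monitor_sketch).
-export([demo/0]).

%% Hypothetical demo, for illustration only.
demo() ->
    Parent = self(),
    Pid = spawn_link(fun() ->
        receive go -> ok end,
        unlink(Parent),      %% the child removes the link on its own
        exit(boom)           %% no exit signal reaches Parent via the link
    end),
    %% The monitor is owned by the parent; the child cannot remove it.
    Ref = erlang:monitor(process, Pid),
    Pid ! go,
    receive
        {'DOWN', Ref, process, Pid, Reason} -> {down, Reason}
    after 1000 ->
        timeout
    end.
```

Because the child waits for the go message, the monitor is in place
before the child unlinks and exits, so the parent receives the 'DOWN'
message carrying the exit reason.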


/Uffe
--
Ulf Wiger                                    tfn: +46  8 719 81 95
Senior System Architect                      mob: +46 70 519 81 95
Strategic Product & System Management    ATM Multiservice Networks
Data Backbone & Optical Services Division      Ericsson Telecom AB





supervisor:restart_child/2 (throttling)

Pascal Brisset
Ulf Wiger writes:
 > The reason I even care is that I have re-written supervisor so that
 > it is able to tell the child how many times it has restarted, and
 > by extension, whether it is starting for the first time, or whether
 > it is, for example, an escalated restart.

I would support some extensions to supervisor.erl too.

Suppose a supervised server tries to acquire some resource when it
starts, and crashes or terminates if it can't. If the resource is
unavailable, we don't really want the child to be shut down with
reached_max_restart_intensity.  The child can avoid this by waiting
for a fixed time on startup or before crashing, but then the
availability of the server would be less than optimal (if the child
has been running correctly for some time, we should try to restart it
immediately, and begin to delay the restarts only if it keeps crashing).

Letting the child know that it is having restart problems would
definitely help. It would be even better if the supervisor itself
could be configured to throttle the child down, i.e. regulate its
restart frequency with some kind of exponential back-off.
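
As a rough illustration of that kind of back-off, here is a minimal
sketch of the delay computation. The module backoff_sketch, the function
delay/3, and the parameters BaseMs and MaxMs are assumptions made for
this example; nothing like them exists in supervisor.erl.

```erlang
-module(backoff_sketch).
-export([delay/3]).

%% Hypothetical helper, for illustration only.
%% N is the number of consecutive failed restarts so far.
%% The first restart (N = 0) is immediate; after that the delay starts at
%% BaseMs milliseconds and doubles each time, capped at MaxMs.
delay(0, _BaseMs, _MaxMs) ->
    0;
delay(N, BaseMs, MaxMs) when N > 0 ->
    %% Bound the shift so a very large N cannot build a huge integer.
    Shift = min(N - 1, 20),
    min(MaxMs, BaseMs bsl Shift).
```

A supervisor using this would presumably schedule the restart with
something like erlang:send_after(Delay, self(), Msg) rather than block in
a receive, and reset N to 0 once the child has stayed up for a while.
With BaseMs = 100 and MaxMs = 5000 the delays run 0, 100, 200, 400, ...
and level off at 5000.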

-- Pascal

--- supervisor.erl Thu Dec 14 09:25:59 2000
+++ rsupervisor.erl     Sun Jan 21 20:51:28 2001
@@ -445,9 +462,11 @@
        {ok, NState} ->
            restart(NState#state.strategy, Child, NState);
        {terminate, NState} ->
-           report_error(shutdown, reached_max_restart_intensity,
+           report_error(regulate, restart_delayed,
                         Child, State#state.name),
-           {shutdown, remove_child(Child, NState)}
+           %% We should use a smarter delay here, and sleep asynchronously.
+           receive after State#state.period * 1000 -> ok end,
+           restart(Child, State)
     end.