Stopping a master process and all its workers

Torben Hoffmann
Hi,

I am a bit in doubt about what would be the cleanest way to stop a part of my supervision tree.

The tree looks like this:
top_sup
  ├─ master
  └─ worker_sup
       └─ worker

I have multiple instances of the top_sup supervisor, one for each master.
So when I need to stop a master and all its workers I would have to stop its top_sup and everything below it.

Controlled stopping needs to be different from a crash as a controlled stop should remove some persistent data for the master and workers.

Should I just do an exit(top_sup, normal) for the controlled stop?
Or should I implement a stop function all the way down?
Are there any subtleties that I need to cater for? Have I given enough information for this question to make sense?

Either will do the job. Just wondering what experience others have on this.

Cheers,
Torben
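
One subtlety with the `exit(top_sup, normal)` idea can be sketched with a plain process. An exit signal with reason `normal` is ignored by a process that does not trap exits, and a process that does trap exits (as supervisors do) only receives it as an ordinary `{'EXIT', From, normal}` message, so the target keeps running. Note also that `exit/2` expects a pid, not a registered name. The module and function names below are made up for this illustration.

```erlang
-module(exit_normal_demo).
-export([run/0]).

%% Hypothetical demo: send exit(Pid, normal) to a process that traps
%% exits and report what it received and whether it is still alive.
run() ->
    Owner = self(),
    Pid = spawn(fun() ->
                        process_flag(trap_exit, true),
                        Owner ! ready,
                        loop(Owner)
                end),
    %% Wait until the target is trapping exits before signalling it.
    receive ready -> ok end,
    exit(Pid, normal),
    Got = receive
              {got_exit, Reason} -> Reason
          after 1000 ->
              timeout
          end,
    Alive = is_process_alive(Pid),
    %% Clean up the demo process; kill cannot be trapped.
    exit(Pid, kill),
    {Got, Alive}.

loop(Owner) ->
    receive
        {'EXIT', _From, Reason} ->
            %% The signal arrived as a message; the process carries on.
            Owner ! {got_exit, Reason},
            loop(Owner)
    end.
```

Calling `exit_normal_demo:run()` should show that the signal was delivered as a message and that the process survived it, which suggests a reason such as `shutdown`, or a stop through the parent supervisor, is needed to actually bring the tree down.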

--

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Stopping a master process and all its workers

YD Jiang
Yes, just exit the top supervisor. Any cleanup can be done when the process (which must trap exits) receives the {'EXIT', ...} message, or in handle_info({'EXIT', ...}, State) if it is a gen_server.
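
A minimal sketch of such cleanup in a gen_server, under some assumptions: the module name `resource_worker`, the `ResourceId` argument and `remove_persistent_data/1` are hypothetical placeholders, not anything from the original post. When the exit comes from the parent supervisor, a gen_server that traps exits gets its `terminate/2` callback invoked rather than `handle_info/2`; exits from other linked processes arrive in `handle_info/2`.

```erlang
-module(resource_worker).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

start_link(ResourceId) ->
    gen_server:start_link(?MODULE, ResourceId, []).

init(ResourceId) ->
    %% Without trap_exit the shutdown signal kills the process
    %% before terminate/2 can run.
    process_flag(trap_exit, true),
    {ok, #{resource => ResourceId}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Exit signals from linked processes other than the parent end up here.
handle_info({'EXIT', _Pid, _Reason}, State) ->
    {noreply, State};
handle_info(_Info, State) ->
    {noreply, State}.

%% A supervisor stops its children with reason shutdown.
terminate(shutdown, #{resource := ResourceId}) ->
    remove_persistent_data(ResourceId);
terminate(_OtherReason, _State) ->
    ok.

%% Placeholder for whatever storage the real system uses (assumption).
remove_persistent_data(_ResourceId) ->
    ok.
```

One caveat with this design: `shutdown` is also the reason a worker sees when the whole application stops or when its supervisor restarts siblings, so the reason alone does not distinguish a requested stop from those cases. Removing the persistent data in an explicit stop call before terminating the tree avoids that ambiguity.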


Re: Stopping a master process and all its workers

Jesper Louis Andersen
In reply to this post by Torben Hoffmann
On Thu, Apr 12, 2018 at 4:47 PM Torben Hoffmann <[hidden email]> wrote:
Are there any subtleties that I need to cater for? Have I given enough information for this question to make sense?


Yes:

* What is the API accessing this tree? If we start stopping the tree, how are those API calls going to behave while the tree is being closed down?

* Many such trees need some kind of "connection draining phase" in which they finish their current work but don't start new work while they are being drained.

* If you dynamically start/stop workers, then you might be able to set the number of workers to the special case of 0 and then stop the tree.

* Surely, there is a supervisor on top of `top_sup`, and it is the one that needs to terminate its child. Consider that some supervisor in your application has to be "permanent/persistent" over the lifetime of the application, so you always have a point to which you can "hang" your workers. This allows you to use supervisor:terminate_child/2, but do note its documentation about restarting: your child is likely to be temporary, which means you need to have some kind of management for this if restarts happen in the system.

* Dynamic alteration of the state should be logged: "worker state was changed from 8 workers to 0", but it shouldn't report such an event as an ERROR in the syslog sense. This is INFO/NOTICE level.
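
The terminate-from-the-parent and logging points above could be sketched roughly as follows. This is only an illustration: `master_manager`, `stop_master/2`, `ParentSup` and `MasterId` are hypothetical names, and `logger` assumes OTP 21 or later.

```erlang
-module(master_manager).
-export([stop_master/2]).

%% ParentSup: the long-lived supervisor above top_sup (pid or name).
%% MasterId: the child id under which this top_sup was started.
%% Both are assumptions made for this sketch.
stop_master(ParentSup, MasterId) ->
    case supervisor:terminate_child(ParentSup, MasterId) of
        ok ->
            %% For a temporary child the spec is removed when the child
            %% terminates, so delete_child/2 may return
            %% {error, not_found}; that result is ignored here.
            _ = supervisor:delete_child(ParentSup, MasterId),
            %% A requested stop is an informational event, not an error.
            logger:info("master ~p stopped on request", [MasterId]),
            ok;
        {error, _Reason} = Error ->
            Error
    end.
```

If the parent is a `simple_one_for_one` supervisor, `supervisor:terminate_child/2` has to be given the child's pid instead of an id.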

Final important comment:

Do extensive tests of the failure scenario! Graceful recovery is nice, but if you don't test it somewhat, you are essentially sacrificing a goat on the altar of the god of your choice and you pray to said god that things end up being nice for you.




Re: Stopping a master process and all its workers

Torben Hoffmann
I omitted a detail: all of the processes are proxies for external resources that they manage, i.e., they simply have to start and stop them and respond to monitoring events from the resources.
So no real work is actually being done in the processes.
This simplifies things and I should have added that in the first place.

I have a supervisor above the top_sup and that is indeed the one that will kill top_sup - I framed the question to get a focus on what happens from top_sup and down.

Given that my "workers" monitor external resources, they are all transient: if my program crashes, the external resources may still be around after I restart, so I am currently building persistence to handle this.

All of this will be tested quite heavily. The correspondence to the external resources will be funny to deal with, e.g., what if an external resource has died while my program was doing a reset? Fun times ahead.

Cheers,
Torben

p.s. sorry about the top reply, but Gmail's Inbox has removed that feature or I'm too stupid to figure it out.
