simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Oliver Korpilla
Hello.

I have transient workers under a simple_one_for_one supervisor. I start them dynamically through start_child/2.

How exactly do these get restarted? With the same parameter list I supplied when first starting them?

Concrete example:

I have a central process listening to a TCP port, then accepting a connection, creating a worker for it through a supervisor, and then using gen_tcp:controlling_process/2 to transfer control of that new socket to the worker. The worker starts with one parameter: the socket.

Now let's say the worker crashes on unexpected input and gets restarted. Is the socket now closed? Will the same socket again be provided to the process on restart?
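A minimal sketch of the setup described above, to make the question concrete. The module names `conn_sup` and `conn_worker` are hypothetical and do not come from the post, and the listen socket is assumed to have been opened elsewhere with `{active, false}`. As far as the supervisor documentation describes it, a simple_one_for_one supervisor appends the list given to start_child/2 to the start arguments in the child spec, and a restarted transient child is started again with that same list. The socket itself is normally closed when its controlling process dies, so a restarted worker would likely be handed a stale socket.

```erlang
-module(conn_sup).
-behaviour(supervisor).

-export([start_link/0, accept_loop/1]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 5,
                 period => 10},
    %% conn_worker:start_link/1 is assumed to exist (hypothetical module).
    Worker = #{id => conn_worker,
               start => {conn_worker, start_link, []},
               restart => transient,
               shutdown => 5000,
               type => worker},
    {ok, {SupFlags, [Worker]}}.

%% Accept a connection, start a worker for it and hand the socket over.
accept_loop(ListenSocket) ->
    {ok, Socket} = gen_tcp:accept(ListenSocket),
    %% [Socket] is appended to the child spec's argument list, so the
    %% supervisor calls conn_worker:start_link(Socket).
    {ok, Pid} = supervisor:start_child(?MODULE, [Socket]),
    %% Make the worker the owner of the socket.
    ok = gen_tcp:controlling_process(Socket, Pid),
    accept_loop(ListenSocket).
```

Setting `restart => temporary` instead would keep the supervision and the shutdown timeout while never restarting a worker whose socket is gone.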

Thanks,
Oliver
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Chandru-4
Hi Oliver,

There is no point supervising a process which is handling a TCP connection as even if the supervisor restarts this process, the socket will not be reinstated. You just have to spawn a new process which will stay alive as long as the connection is active. Once the TCP connection is lost, the process dies, and a new process is spawned when the client reconnects.


cheers,
Chandru


Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Ryan Stewart
Oliver: there is no restart for children under a simple 1-1 supervisor as documented under Supervision Principles at http://erlang.org/doc/man/supervisor.html#id243029

Otoh, I have to strongly disagree with Chandru about there being "no point" in supervising a connection process. Depending on the nature of your application, there can be a *huge* benefit in supervision, which is orderly shutdown. With a proper supervisory tree, you can ensure that all connection processes have a chance to finish their work and shut down cleanly when you stop your application.


Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Chandru-4
On 14 May 2016 at 00:59, Ryan Stewart <[hidden email]> wrote:
> Oliver: there is no restart for children under a simple 1-1 supervisor as documented under Supervision Principles at http://erlang.org/doc/man/supervisor.html#id243029
>
> Otoh, I have to strongly disagree with Chandru about there being "no point" in supervising a connection process. Depending on the nature of your application, there can be a *huge* benefit in supervision, which is orderly shutdown. With a proper supervisory tree, you can ensure that all connection processes have a chance to finish their work and shut down cleanly when you stop your application.

Cleanup can be done in the terminate callback if you use process_flag(trap_exit, true). You need to supervise a process primarily if you want it *restarted*, not for cleanup. Yes, you do have orderly shutdown mechanisms if the number of restarts exceeds a "normal" value, but that is not the situation here.
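A minimal sketch of the cleanup-in-terminate approach mentioned here, assuming a hypothetical `conn_worker` gen_server whose state is just the socket it was started with. Nothing in it is taken from the poster's actual code.

```erlang
-module(conn_worker).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

start_link(Socket) ->
    gen_server:start_link(?MODULE, Socket, []).

init(Socket) ->
    %% With trap_exit set, an exit signal from the parent (for example
    %% 'shutdown') makes gen_server call terminate/2 before exiting.
    process_flag(trap_exit, true),
    {ok, Socket}.

handle_call(_Request, _From, Socket) ->
    {reply, ok, Socket}.

handle_cast(_Msg, Socket) ->
    {noreply, Socket}.

%% The peer closed the connection: stop normally.
handle_info({tcp_closed, Socket}, Socket) ->
    {stop, normal, Socket};
handle_info(_Info, Socket) ->
    {noreply, Socket}.

%% Not called when the process is killed with exit(Pid, kill), since
%% that signal cannot be trapped.
terminate(_Reason, Socket) ->
    gen_tcp:close(Socket),
    ok.
```

Cleanup written this way runs only on exits the process can observe, so an untrappable kill skips it entirely.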

Oliver has a server process to which clients connect. He does not need a supervisor for this because it is up to the client to reconnect if a connection is lost.

cheers,
Chandru


Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Loïc Hoguin-3
On 05/14/2016 09:46 AM, Chandru wrote:

> On 14 May 2016 at 00:59, Ryan Stewart <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Oliver: there is no restart for children under a simple 1-1
>     supervisor as documented under Supervision Principles at
>     http://erlang.org/doc/man/supervisor.html#id243029
>
>     Otoh, I have to strongly disagree with Chandru about there being "no
>     point" in supervising a connection process. Depending on the nature
>     of your application, there can be a *huge* benefit in supervision,
>     which is orderly shutdown. With a proper supervisory tree, you can
>     ensure that all connection processes have a chance to finish their
>     work and shut down cleanly when you stop your application.
>
>
> Cleanup can be done in the terminate callback if you use
> process_flag(trap_exit, true). You need to supervise a process primarily
> if you want it *restarted*, not for cleanup. Yes, you do have orderly
> shutdown mechanisms if the number of restarts exceed a "normal" value,
> but that is not the situation here.

This is bad advice. Cleanup should be done in a separate process,
otherwise a kill signal sent to the wrong process will leave
things dirty.

> Oliver has a server process to which clients connect. He does not need a
> supervisor for this because it is up to the client to reconnect if a
> connection is lost.

This is incorrect.

The primary function of supervisors is to provide a hierarchy of
processes belonging to an application. With such a hierarchy it becomes
possible to find and query information about any single process in your
system in a standard manner.

Secondary functions of supervisors include restarting, reporting (with
SASL) and upgrades. But they're entirely optional.

Having processes run without being attached to a supervisor is a big
mistake, because they either become invisible, or force you to implement
a non-standard way to find them. This makes introspecting a system more
complex than it should be.
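As an illustration of the "standard manner" of introspection, here is a small sketch that uses only the stock supervisor API. The module name `sup_inspect` is made up for this example, and `Sup` stands for whatever supervisor name or pid your application uses.

```erlang
-module(sup_inspect).

-export([worker_pids/1, summary/1]).

%% Pids of all children that are currently running under Sup.
%% which_children/1 reports 'undefined' or 'restarting' in place of a
%% pid for children that are not alive right now, so filter those out.
worker_pids(Sup) ->
    [Pid || {_Id, Pid, _Type, _Mods} <- supervisor:which_children(Sup),
            is_pid(Pid)].

%% {ActiveChildren, WorkerSpecs} as reported by count_children/1.
summary(Sup) ->
    Counts = supervisor:count_children(Sup),
    {proplists:get_value(active, Counts),
     proplists:get_value(workers, Counts)}.
```

The same calls work from a remote shell against any supervisor in the tree, with no application-specific bookkeeping.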

The only acceptable use for non-supervised processes is to perform
asynchronous calls to functions. Those processes however should be
relatively short-lived, and you should really make sure that they can't
get stuck indefinitely.

--
Loïc Hoguin
http://ninenines.eu
Author of The Erlanger Playbook,
A book about software development using Erlang

Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Jesper Louis Andersen-2
In reply to this post by Chandru-4

On Sat, May 14, 2016 at 9:46 AM, Chandru <[hidden email]> wrote:
> Cleanup can be done in the terminate callback if you use process_flag(trap_exit, true).

There is a subtlety here which, in some circumstances, creates trouble and one has to be aware of. This is in addition to what Loïc mentions.

Observation: In many systems, processes have dependencies and need each other to operate. Nice cleanup cannot be done in the situation where your immediate dependencies are gone, but under a controlled shutdown, you can arrange the shutdown order such that the shutdown is graceful.

Example: You are processing some kind of request in a web server. You decide to close down the application. Now, what you want to happen is a graceful shutdown: you stop accepting new requests, but you run the current requests to an end (up to some timeout). In settings where the server count is very dynamic and servers can get "elastically" added or removed, this is important because otherwise you would be losing requests under shutdown.

Observation: processes under supervision have an ordering imposed on them. Their termination happens in the opposite order of spawning. This can be used to enforce the gracefulness constraint.

Another trick, which works cross-application, is to use the Module:prep_stop/1 phase of termination to tell your other applications (ranch, cowboy, yaws) that they should start graceful termination of their workers. Once completed, you can stop your own supervision tree.
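A rough sketch of the prep_stop/1 idea, assuming an application callback module called `my_app`, a top supervisor `my_sup` and a helper `my_acceptor:stop_accepting/0`. All three names are hypothetical. How you actually tell ranch, cowboy or yaws to drain their workers depends on the library and version, so that part is deliberately left as a placeholder.

```erlang
-module(my_app).
-behaviour(application).

-export([start/2, prep_stop/1, stop/1]).

start(_Type, _Args) ->
    %% my_sup is a hypothetical top-level supervisor.
    my_sup:start_link().

%% Called before the application's supervision tree is shut down.
%% The return value is passed on to stop/1.
prep_stop(State) ->
    %% Hypothetical helper: stop taking new connections and wait (with
    %% a timeout of your choosing) for in-flight requests to finish.
    my_acceptor:stop_accepting(),
    State.

%% Called after the supervision tree has terminated.
stop(_State) ->
    ok.
```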

"Naked" processes which are not linked into a supervision tree can terminate in any order they see fit. In particular, the Erlang VM regards itself as being "done" once all its applications are shut down, so a lone process with process_flag(trap_exit, true) waiting on disk is not going to be allowed to hold up termination of the VM. A naked process has its uses, but one has to be extremely careful around them. If they start failing, you have no easy way to know what their relation is.

--
J.


Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Oliver Korpilla
Hello,

and thank you all for your responses.

I originally adopted simple_one_for_one supervisor because I had a problem with how other supervisors clean up processes.

For the TCP connectors simple_one_for_one will be fine. As noted by others, they cannot really come back unless the client reconnects, so that is fine. So, does a simple_one_for_one supervisor treat every child, regardless of child spec, as if it were temporary?

I have another big batch of processes independent of the connectors. These serve individual requests emanating from the TCP layer, where an ID establishes which handler belongs to which batch of messages (i.e. each TCP payload contains an ID in its own proprietary header). Now, I originally saw these as transient workers I would like to have restarted, but since they are stateless and can be created on demand, I either can supervise them simple_one_for_one (and create them on demand when the one for a given ID is missing) or I can create them as transient children under a one_for_one and let that restart it on a crash.

I originally went for simple_one_for_one because of the better performance and because it cleans up children after they terminate. I guess in case of one_for_one I have to clean up all children which shut down normally by calling terminate_child and delete_child on them. (I originally hoped one_for_one would do this if a child exited normally, but either I bungled my tests or it simply doesn't, even for transient children).
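For reference, a small sketch of the manual cleanup described here. As far as I know, a one_for_one supervisor keeps the child specification of a transient child that exited normally and reports its pid as `undefined` until delete_child/2 is called. The module and function names below are invented for the example.

```erlang
-module(handler_cleanup).

-export([remove_if_stopped/2]).

%% Delete the child spec Id from supervisor Sup, but only if the child
%% is not running. Returns ok or {error, Reason}.
remove_if_stopped(Sup, Id) ->
    case lists:keyfind(Id, 1, supervisor:which_children(Sup)) of
        {Id, undefined, _Type, _Mods} ->
            %% Child has stopped; only its spec is left.
            supervisor:delete_child(Sup, Id);
        {Id, _PidOrRestarting, _Type2, _Mods2} ->
            %% Child is alive or about to be restarted; leave it alone.
            {error, still_active};
        false ->
            {error, not_found}
    end.
```

A simple_one_for_one supervisor needs none of this, because it drops the entry for a child as soon as the child is gone for good.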

Any recommendations?

Thanks,
Oliver
 


Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Garrett Smith-5
On Sat, May 14, 2016 at 5:21 AM, Oliver Korpilla <[hidden email]> wrote:
> Hello,
>
> and thank you all for your responses.
>
> I originally adopted simple_one_for_one supervisor because I had a problem with how other supervisors clean up processes.
>
> For the TCP connectors simple_one_for_one will be fine. As noted by others, they cannot really come back unless they reconnect, so that is fine. So, a simple_one_for_one supervisor acts like every child, regardless of child spec, as if it was temporary?
>
> I have another big batch of processes independent of the connectors. These serve individual requests emanating from the TCP layer, where an ID establishes which handler belongs to which batch of messages (i.e. each TCP payload contains an ID in its own proprietary header). Now, I originally saw these as transient workers I would like to have restarted, but since they are stateless and can be created on demand, I either can supervise them simple_one_for_one (and create them on demand when the one for a given ID is missing) or I can create them as transient children under a one_for_one and let that restart it on a crash.

If these processes only ever act on behalf of the TCP connection,
consider not using them at all. Just let the TCP connections do the
work.

Processes should correspond to _real world_ independent threads of
execution, not mental abstractions.

If you do have separate threads of execution (e.g. TCP connection is
providing updates to the client while it waits on these spawned
workers) use a separate simple_one_for_one (sofo) supervisor for the
workers and link your connection/worker processes.

> I originally went for simple_one_for_one because of the better performance and because it cleans up children after they terminate. I guess in case of one_for_one I have to clean up all children which shut down normally by calling terminate_child and delete_child on them. (I originally hoped one_for_one would do this if a child exited normally, but either I bungled my tests or it simply doesn't, even for transient children).

If you're ever routinely "cleaning up" after a supervisor, it's a bad
sign. Configure (one-time init payload) your supervisors and let them
do their thing. If you're accumulating a lot of terminated child
processes, you want a sofo supervisor.

> Any recommendations?

It sounds like you're motivated to get a "restart" scenario here. What
is your goal from the end-user (client of your app) point of view
here? Without a specific goal that you understand and can defend, your
default approach I think is always crash - and let the client
reestablish a connection.

Some worthy goals:

- Don't abruptly close the connection but return a well formed error
(e.g. HTTP 500, etc.)
- Handle specific well understood error conditions with limited
retries (e.g. reconnect to a database with the hope the outage is
short term)
- Tell the client to retry a different end-point (e.g. HTTP 302)

Each of these goals needs to be implemented - you're not going
to get any of them with a supervisor process restart. Short of a
worthy goal, just crash, maintaining your system integrity for
processing new connections, and rely on the client (outside your
system) to perform the "restart".
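The first goal in the list above can be sketched roughly as follows. The module name, the error payload and the `Fun` callback are all assumptions made for illustration; a real protocol would send its own well-formed error message.

```erlang
-module(safe_reply).

-export([handle/3]).

%% Run Fun(Request) and send its result on Socket. If Fun raises, log
%% the failure, send a fixed error payload and close the socket, so the
%% client sees an explicit error rather than a dropped connection.
handle(Socket, Request, Fun) ->
    try Fun(Request) of
        Reply ->
            gen_tcp:send(Socket, Reply)
    catch
        Class:Reason ->
            error_logger:error_msg("request failed: ~p:~p~n",
                                   [Class, Reason]),
            %% Placeholder payload; ignore send errors at this point.
            _ = gen_tcp:send(Socket, <<"ERROR internal\r\n">>),
            gen_tcp:close(Socket)
    end.
```

This only covers exceptions raised inside `Fun`. If the connection process itself is killed, nothing is sent and the client falls back to reconnecting.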

Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Oliver Korpilla
Hello, Garrett.
 
The TCP layer is a stand-in for a real, more complex protocol stack with very different characteristics. I assure you the abstractions are absolutely necessary and map to real-world entities being served in a system considerably more complex than a TCP server. While the project is currently at the scale of a technology demonstration, it is supposed to grow into a full-fledged application. Sorry I have to be so vague.
 
Relying on the client for retries might work. My client handlers are "stateless" in that they can come back at any time from the DB and serve a request, even if they have to tell the client to abort its current operation. I have to investigate the behavior of the given clients more. That's likely the real solution.
 
Thanks!
Oliver
 

Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Garrett Smith-5
I trust your judgement on the need for separate processes :)

But complexity doesn't justify a _process_ - just a relatively more
complex function. If the real world entities being served map to
independent _threads of execution_ in time and space, process. If not,
consider not a process. I only underscore this because it's not
uncommon to see folks use processes (a runtime construct) to
model abstract programming logic (a design time construct at best,
often simply an emotional/mental fancy). Not suggesting you fall into
that category, just a highlight ;)


Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Ryan Stewart
In reply to this post by Loïc Hoguin-3
On Sat, May 14, 2016 at 3:28 AM Loïc Hoguin <[hidden email]> wrote:
> The primary function of supervisors is to provide a hierarchy of
> processes belonging to an application. With such a hierarchy it becomes
> possible to find and query information about any single process in your
> system in a standard manner.

I'm rather interested in this aspect of supervised processes. Especially in the case of a simple_one_for_one supervisor, it's unlikely that the supervised processes will be registered, and it's possible that there could be a rather large number of them--maybe in the tens or hundreds of thousands, depending on the use case. I'm curious how others deal with finding a specific temporary process if, for instance, you want to check on the progress of the work it's doing.

My current solution is to have a locally registered "manager" process as a supervised sibling to the SOFO worker supervisor, and the manager just has a dict that maps UUIDs to worker PIDs. I.e. creating a "worker" process entails both a supervisor:start_child() call and storing the worker id -> pid mapping in the manager. Is this a typical way to handle temporary workers?
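A compact sketch of such a manager, under a few assumptions: the simple_one_for_one supervisor is registered as `worker_sup` (a made-up name), a map is used in place of the dict mentioned above, and monitors are added so that entries disappear when a worker dies.

```erlang
-module(worker_registry).
-behaviour(gen_server).

-export([start_link/0, start_worker/2, whereis_worker/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Args must be a list; it is appended to the child spec's arguments.
start_worker(Id, Args) ->
    gen_server:call(?MODULE, {start_worker, Id, Args}).

%% Returns {ok, Pid} or error.
whereis_worker(Id) ->
    gen_server:call(?MODULE, {whereis, Id}).

init([]) ->
    {ok, #{}}.

handle_call({start_worker, Id, Args}, _From, Workers) ->
    %% worker_sup is a hypothetical simple_one_for_one supervisor.
    case supervisor:start_child(worker_sup, Args) of
        {ok, Pid} ->
            Ref = erlang:monitor(process, Pid),
            {reply, {ok, Pid}, Workers#{Id => {Pid, Ref}}};
        Other ->
            %% Any other result is passed back and nothing is tracked.
            {reply, Other, Workers}
    end;
handle_call({whereis, Id}, _From, Workers) ->
    case maps:find(Id, Workers) of
        {ok, {Pid, _Ref}} -> {reply, {ok, Pid}, Workers};
        error -> {reply, error, Workers}
    end.

handle_cast(_Msg, Workers) ->
    {noreply, Workers}.

%% Drop the entry whose monitor fired.
handle_info({'DOWN', Ref, process, _Pid, _Reason}, Workers) ->
    {noreply, maps:filter(fun(_Id, {_P, R}) -> R =/= Ref end, Workers)};
handle_info(_Info, Workers) ->
    {noreply, Workers}.
```

Because every lookup goes through one process, this can become a bottleneck with very many workers; an ETS table or a registry library avoids that.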


Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Oliver Korpilla
Hello, Ryan.

Isn't this normally the job of a registry like gproc or am I misunderstanding your requirements?

I usually start the worker dynamically through the supervisor and let it do its own registration.
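A minimal sketch of that pattern, assuming gproc is running and that each worker is identified by some term `Id`. The module name `id_worker` and the `{worker, Id}` key shape are illustrative choices only.

```erlang
-module(id_worker).

-export([start_link/1, init/1, lookup/1]).

%% Intended to be called by the supervisor; returns {ok, Pid} once the
%% worker has registered itself.
start_link(Id) ->
    proc_lib:start_link(?MODULE, init, [Id]).

init(Id) ->
    %% Register a unique local name. gproc:reg/1 raises badarg if the
    %% name is already taken, which makes the start fail.
    true = gproc:reg({n, l, {worker, Id}}),
    proc_lib:init_ack({ok, self()}),
    loop(Id).

loop(Id) ->
    receive
        stop -> ok;
        _Other -> loop(Id)
    end.

%% Returns the worker's pid, or undefined if none is registered.
lookup(Id) ->
    gproc:where({n, l, {worker, Id}}).
```

gproc removes the registration automatically when the worker exits, so no separate cleanup step is needed.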

Cheers,
Oliver

Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Ryan Stewart
Oliver, gproc looks exactly right; thanks for the pointer! I just didn't know about it, so essentially I was building my own with very limited features. Pardon my ignorance. I've been working in Erlang for a little over 2 years, and I only just feel like I have a handle on how everything fits together. I still don't know what all libraries are out there to do common jobs.

Hmm, that seems like a good subject for a fresh thread...


Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Garrett Smith-5
I'm glossing over these responses, but the points about gproc raise
an issue I had with some earlier responses - that plugging a process
into a supervisory hierarchy lets you somehow keep track of it. I
don't think that's good advice - querying supervisors is a bad idea.
If you want to track processes, use gproc.

gproc should be in core IMO - at least the publish/discovery facility
for processes. Though maybe I've missed a recent update.


Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Chandru-4
In reply to this post by Loïc Hoguin-3
On 14 May 2016 at 09:28, Loïc Hoguin <[hidden email]> wrote:
> On 05/14/2016 09:46 AM, Chandru wrote:
>> On 14 May 2016 at 00:59, Ryan Stewart <[hidden email]> wrote:
>>
>>> Oliver: there is no restart for children under a simple 1-1
>>> supervisor as documented under Supervision Principles at
>>> http://erlang.org/doc/man/supervisor.html#id243029
>>>
>>> Otoh, I have to strongly disagree with Chandru about there being "no
>>> point" in supervising a connection process. Depending on the nature
>>> of your application, there can be a *huge* benefit in supervision,
>>> which is orderly shutdown. With a proper supervisory tree, you can
>>> ensure that all connection processes have a chance to finish their
>>> work and shut down cleanly when you stop your application.
>>
>> Cleanup can be done in the terminate callback if you use
>> process_flag(trap_exit, true). You need to supervise a process primarily
>> if you want it *restarted*, not for cleanup. Yes, you do have orderly
>> shutdown mechanisms if the number of restarts exceed a "normal" value,
>> but that is not the situation here.
>
> This is bad advice. Cleanup should be done in a separate process, otherwise a kill signal sent to the wrong process will leave things dirty.

No, it's not. The reason a terminate callback is provided in a gen_server is so that a process can clean up when it terminates, not to delegate it to other processes.

>> Oliver has a server process to which clients connect. He does not need a
>> supervisor for this because it is up to the client to reconnect if a
>> connection is lost.
>
> This is incorrect.
>
> The primary function of supervisors is to provide a hierarchy of processes belonging to an application. With such a hierarchy it becomes possible to find and query information about any single process in your system in a standard manner.

No, it's not. From the manual:

    The supervisor is responsible for starting, stopping and monitoring its child processes. The basic idea of a supervisor is that it shall keep its child processes alive by restarting them when necessary.

> Secondary functions of supervisors include restarting, reporting (with SASL) and upgrades. But they're entirely optional.

See above - what you state is secondary is its primary purpose. In fact, making the process hierarchy discoverable is a secondary benefit of using a supervisor.

> Having processes run without being attached to a supervisor is a big mistake, because they either become invisible, or force you to implement a non-standard way to find them. This makes introspecting a system more complex than it should be.

Look carefully at the example I provided in the gist and Oliver's use case. It is perfectly sound advice. If you are ever walking your supervisor hierarchy to do something with your application, you are doing it wrong.

Chandru



Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Fred Hebert-2
On 05/16, Chandru wrote:
>
>No, it's not. The reason a terminate callback is provided in a gen_server
>is so that a process can clean up when it terminates, not to delegate it to
>other processes.
>

I'm gonna side with Loïc here. The terminate callback is good for any
process-local cleanup or optimistic work, but is by no means a safe way
to terminate anything.

For example, if you have many children to terminate and through some
interleaving brutal_kill is triggered (or anyone calls exit(Pid,
kill)), whatever work you wanted to do in terminate will be skipped by a
non-trappable exit signal.

Using terminate as your sole termination clean up is risky. It is better
to assume that it will not be called every time, only in controlled
terminations and some accidental ones. This is especially true of
non-collected resources -- not ports nor ETS tables -- specifically live
dependencies such as other processes mid-discussion.

The other side has to be able to cope with the termination of its peer;
this can be done through monitors, sometimes through link+trap_exit. If
recovery is not possible, just dying is appropriate.
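
A minimal sketch of the point about non-trappable exits, assuming a bare process that traps exits as a stand-in for a gen_server with a terminate callback. The module name kill_demo is hypothetical. The victim would report its cleanup if it ever saw an 'EXIT' message, but exit(Pid, kill) cannot be trapped, so only the monitoring peer learns about the death.

```erlang
-module(kill_demo).
-export([run/0]).

run() ->
    Parent = self(),
    Pid = spawn(fun() ->
        process_flag(trap_exit, true),
        Parent ! ready,
        receive
            {'EXIT', _From, Why} ->
                %% Only reached for trappable exit signals.
                Parent ! {cleaned_up, Why}
        end
    end),
    %% Wait until the victim is trapping exits before killing it.
    receive ready -> ok end,
    Ref = erlang:monitor(process, Pid),
    exit(Pid, kill),
    %% The peer copes with the death through its monitor.
    Reason = receive {'DOWN', Ref, process, Pid, R} -> R end,
    %% Did the victim get a chance to run its cleanup branch?
    CleanedUp = receive {cleaned_up, _} -> true after 100 -> false end,
    {Reason, CleanedUp}.
```

Calling kill_demo:run() should return {killed, false}: the monitor fires with reason killed and the cleanup branch never runs.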

>
>No, it's not. From the manual:
>
>The supervisor is responsible for starting, stopping and monitoring its
>child processes. The basic idea of a supervisor is that it shall keep its
>child processes alive by restarting them when necessary.
>

In practice, the release handling mechanisms will make use of that
supervision structure to walk the tree: that's why you declare whether a
supervisor's children are workers or supervisors (leaf or inner node!)

The tree is being walked the entire way through.

That being said, I personally try to avoid calling the supervisor to
know who its children are and prefer named nodes. For me the supervisor
is first and foremost a definition of a unit of failure, of dependencies
between workers or subtrees.

>
>Look carefully at the example I provided in the gist and Oliver's use case.
>It is perfectly sound advice. If you are ever walking your supervisor
>hierarchy to do something with your application, you are doing it wrong.
>

See release upgrades; if you need to walk your entire system at once,
doing it through supervisors is not a bad idea.

Funnily enough, the supervision structure isn't all that is being
trusted though. When an app is shut down, the application controller (or
is it the master?) also runs through all of the processes on the node
and looks for those for which it is the group leader and then force
kills them -- preventing the terminate function from being called.

Regards,
Fred.

Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Fred Hebert-2
On 05/16, Fred Hebert wrote:
>Funnily enough, the supervision structure isn't all that is being
>trusted though. When an app is shut down, the application controller
>(or is it the master?) also runs through all of the processes on the
>node and looks for those for which it is the group leader and then
>force kills them -- preventing the terminate function from being
>called.
>

I wanted to add that this is only done once all of the supervision trees
are shut down recursively for that app. What is then cleaned up brutally
is unsupervised processes.

Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Oliver Korpilla
In reply to this post by Fred Hebert-2
Hello, Fred.
 

> Funnily enough, the supervision structure isn't all that is being
> trusted though. When an app is shut down, the application controller (or
> is it the master?) also runs through all of the processes on the node
> and looks for those for which it is the group leader and then force
> kills them -- preventing the terminate function from being called.

Thanks for this information, I did not know that!

Soooo...

Do supervisors with strategies _other than_ simple_one_for_one restart dynamically started children? Like: I add a transient child dynamically with its full child spec. Will one_for_one restart it with its original parameters if it fails?

Beyond that, my scenario is the following:

I have thousands of clients at any time. These may run semi-complex procedures, setting stuff up, changing their centrally managed communication, quitting.

I personally thought this was exactly the scenario transient was made for: cleaning up behind workers that exit normally (I mean: releasing the child info from the supervisor data structures) and restarting those that crashed. But in my system even those exiting normally seem to persist in the supervisor data structures, which (I thought) by the very definition of transient should not happen and will eventually leak memory (and cost performance when walking lists).

Now, I call terminate_child on the supervisor myself for these children, and that may be what causes the problem. This may be a bug in my design, which is like this:

Level A) Supervisor (one_for_one strategy)
           -- 1 to n relationship -->
Level B)    Semi-permanent Worker that runs individual procedures in parallel and acts as monitor
              -- 1 to n relationship -->
Level C)         Individual short-lived procedure spawned to run one short message sequence that ends with an update of system or client state

Now, I want my Semi-Permanent Worker (B) as stateless as possible. It maintains a set of flags that can be reloaded from DB at any time so that it can be determined which procedures are allowed to start. It starts C procedures (or routes messages to running ones) and monitors them. It acts like a supervisor that needs to know more about the children because it is designed to start the right ones, so I implemented it as a worker.

The individual C procedures are short, either honoring the OTP principles or simple gen_fsms, walking through one or several steps of message exchange in the predefined protocol with the client.

This would all be fine and dandy, but it is an individual procedure that is handling client shutdown (and hence the need to terminate its "boss" B). Currently I finish it by making a call to the supervisor to terminate the boss child and then cleanup occurs. This works but requires me still to call delete_child in order to make sure the supervisor data structures are not full of zombie children.

I could also signal in the C child procedure exit to its B boss monitor that this shutdown means a takedown of B altogether ("no more interaction with this client"). I don't want to have an error message pop up because of this (as some EXIT signals produce automatically, it seems) as this is supposed to be a regular case. So, maybe exit with shutdown?

The worst solution to me would be if the boss worker needs to track if one of the procedures he runs is a shutdown of a client and finish when it finishes. That would imply giving it more state and logic which I hoped to avoid.

The end result, however, is that I want the supervisor to keep as clean an internal state as possible because not only there will be 1000s of clients in the system at any time, hopefully this will sum up to millions of clients during its uptime due to the transient nature of these clients.

Now, can this requirement be fulfilled with the existing one_for_one supervisor? Is there any scenario where one_for_one is guaranteed to clean up a child from its data structures when the child is configured as transient and exits normally? Or is there no such scenario? Previous emails left me confused about this.
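
A minimal sketch for checking this on a running node, assuming a one_for_one supervisor with no static children and one dynamically added transient child. The module name transient_demo and the child id client_1 are hypothetical. It lets the child exit normally, lists the supervisor's children, then calls supervisor:delete_child/2 and lists them again.

```erlang
-module(transient_demo).
-behaviour(supervisor).
-export([run/0, init/1, start_worker/0]).

init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 1, period => 5}, []}}.

%% Trivial worker: lives until it receives stop, then exits normally.
start_worker() ->
    {ok, proc_lib:spawn_link(fun() -> receive stop -> ok end end)}.

run() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    Spec = #{id => client_1,
             start => {?MODULE, start_worker, []},
             restart => transient},
    {ok, Pid} = supervisor:start_child(Sup, Spec),
    Ref = erlang:monitor(process, Pid),
    Pid ! stop,
    receive {'DOWN', Ref, process, Pid, normal} -> ok end,
    %% Give the supervisor a moment to handle the child's exit.
    timer:sleep(50),
    Before = supervisor:which_children(Sup),
    ok = supervisor:delete_child(Sup, client_1),
    After = supervisor:which_children(Sup),
    %% Stop the supervisor without taking the caller down with it.
    unlink(Sup),
    exit(Sup, shutdown),
    {Before, After}.
```

On the OTP releases I know, Before still contains {client_1, undefined, worker, [transient_demo]} and After is empty: a one_for_one supervisor keeps the child spec of a transient child that exited normally until delete_child is called. Only children with restart type temporary have their spec removed automatically.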

Thank you and cheers,
Oliver

Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Chandru-4
In reply to this post by Fred Hebert-2
On 17 May 2016 at 00:49, Fred Hebert <[hidden email]> wrote:
> On 05/16, Chandru wrote:
>
>> No, it's not. The reason a terminate callback is provided in a gen_server
>> is so that a process can clean up when it terminates, not to delegate it to
>> other processes.
>
> I'm gonna side with Loïc here. The terminate callback is good for any process-local cleanup or optimistic work, but is by no means a safe way to terminate anything.
>
> For example, if you have many children to terminate and through some interleaving brutal_kill is triggered (or anyone calls exit(Pid, kill)), whatever work you wanted to do in terminate will be skipped by a non-trappable exit signal.

Agreed, but how does adding the connection handling process (in Oliver's use case) to a simple_one_for_one supervisor help? It doesn't give him anything other than the illusion of being "supervised".

Chandru



Re: simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)

Ryan Stewart
I think you're still ignoring the ordering guarantees of supervision. If "top_sup" supervises [important_server_process, "worker_sup"], and "worker_sup" is a simple_one_for_one where you keep your transient/temporary workers, then "top_sup" will ensure that important_server_process doesn't get the shutdown signal until all workers are finished. It may not apply in this specific case, but I've found it invaluable.
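
A minimal sketch of that ordering, assuming a top supervisor whose children are an important server followed by a simple_one_for_one worker supervisor. The module name shutdown_order, the registered name worker_sup and the rest_for_one strategy are assumptions for illustration. The same trivial process stands in for both the server and a worker; it traps exits and reports to the caller when its supervisor tells it to shut down.

```erlang
-module(shutdown_order).
-behaviour(supervisor).
-export([run/0, init/1, start_proc/2, proc_init/3]).

init({top, Reporter}) ->
    Server = #{id => important_server,
               start => {?MODULE, start_proc, [important_server, Reporter]}},
    WorkerSup = #{id => worker_sup,
                  start => {supervisor, start_link,
                            [{local, worker_sup}, ?MODULE, workers]},
                  type => supervisor},
    %% Children start in list order and are stopped in reverse order.
    {ok, {#{strategy => rest_for_one}, [Server, WorkerSup]}};
init(workers) ->
    Worker = #{id => worker,
               start => {?MODULE, start_proc, []},
               restart => temporary},
    {ok, {#{strategy => simple_one_for_one}, [Worker]}}.

%% Runs in the supervisor, so self() is the new process's parent.
start_proc(Name, Reporter) ->
    proc_lib:start_link(?MODULE, proc_init, [self(), Name, Reporter]).

proc_init(Parent, Name, Reporter) ->
    process_flag(trap_exit, true),
    proc_lib:init_ack(Parent, {ok, self()}),
    receive
        {'EXIT', Parent, shutdown} ->
            Reporter ! {stopped, Name},
            exit(shutdown)
    end.

run() ->
    {ok, Top} = supervisor:start_link(?MODULE, {top, self()}),
    {ok, _Worker} = supervisor:start_child(worker_sup, [worker, self()]),
    unlink(Top),
    Ref = erlang:monitor(process, Top),
    exit(Top, shutdown),
    receive {'DOWN', Ref, process, Top, _} -> ok end,
    First = receive {stopped, N1} -> N1 end,
    Second = receive {stopped, N2} -> N2 end,
    [First, Second].
```

shutdown_order:run() should return [worker, important_server]: the worker is told to stop, and has stopped, before the server receives its shutdown signal.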
