Suspending Erlang Processes

Duncan Paul Attard
I am tracing an Erlang process, say, `P`, by invoking the BIF `erlang:trace(Pid_P, true, [set_on_spawn, procs, send, 'receive'])` from some process. As per the Erlang docs, the latter process becomes the tracer for `P`, which I shall call `Trc_P`.

Suppose now, that process `P` spawns a new process `Q`. Since the flag `set_on_spawn` was specified in the call to `erlang:trace/3` above, `Q` will automatically be traced by `Trc_P` as well.
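
The inheritance described above can be checked with a small experiment. This is only a sketch: the module name `sos_demo` and the `go`/`stop`/`q_pid` messages are made up for illustration, and the calling process plays the role of `Trc_P`. It uses `erlang:trace/3` and `erlang:trace_info/2` only.

```erlang
-module(sos_demo).
-export([run/0]).

%% Hypothetical demo: the caller acts as Trc_P.
run() ->
    Self = self(),
    %% P waits for a go-ahead, spawns Q, and reports Q's pid back to us.
    P = spawn(fun() ->
                  receive go -> ok end,
                  Child = spawn(fun() -> receive stop -> ok end end),
                  Self ! {q_pid, Child},
                  receive stop -> ok end
              end),
    %% The caller becomes the tracer of P.
    1 = erlang:trace(P, true, [set_on_spawn, procs, send, 'receive']),
    P ! go,
    Q = receive {q_pid, Pid} -> Pid end,
    %% Q should have inherited both the tracer and the trace flags of P.
    {tracer, Tracer} = erlang:trace_info(Q, tracer),
    {flags, Flags} = erlang:trace_info(Q, flags),
    Q ! stop,
    P ! stop,
    {Tracer =:= self(), lists:member(set_on_spawn, Flags)}.
```

Trace messages for `P` and `Q` also land in the caller's mailbox; the selective `receive` above simply skips them.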

---

I want to spawn a **new** tracer, `Trc_Q`, and transfer the ownership of tracing `Q` to it, so that the resulting configuration will be that of process `P` being traced by tracer `Trc_P`, `Q` by `Trc_Q`.

However, Erlang permits **at most** one tracer per process, so I cannot achieve said configuration by invoking `erlang:trace(Pid_Q, true, ..)` from `Trc_Q`. The only way possible is to do it in two steps:

1. Tracer `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
2. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`.

In the time span between steps **1.** and **2.** above, it might be possible that trace events by process `Q` are **lost** because at that moment, there is no tracer attached. One way of mitigating this is to perform the following:

1. Suspend process `Q` by calling `erlang:suspend_process(Pid_Q)` from `Trc_Q` (note that as per Erlang docs, `Trc_Q` remains blocked until `Q` is eventually suspended by the VM);
2. `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
3. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`;
4. Finally, `Trc_Q` calls `erlang:resume_process(Pid_Q)` so that `Q` can continue executing.
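
The four steps above can be sketched as a helper that the new tracer calls. The module and function names (`handover`, `take_over/2`) are hypothetical, and the sketch assumes, as the steps do, that any process may switch tracing off for `Q`. It does not make the handover atomic; it only narrows the window.

```erlang
-module(handover).
-export([take_over/2]).

%% Hypothetical helper, to be called by the new tracer (Trc_Q).
%% Flags must cover every flag currently set on PidQ, for example
%% [set_on_spawn, procs, send, 'receive']; otherwise the old tracer
%% stays attached and step 3 fails with badarg.
take_over(PidQ, Flags) ->
    %% Step 1: blocks until Q has actually been suspended.
    true = erlang:suspend_process(PidQ),
    try
        %% Step 2: clear the flags, which detaches the old tracer.
        erlang:trace(PidQ, false, Flags),
        %% Step 3: the caller becomes the tracer of Q.
        erlang:trace(PidQ, true, Flags)
    after
        %% Step 4: always resume Q, even if step 2 or 3 raised.
        true = erlang:resume_process(PidQ)
    end.
```

The `try ... after` is there so that a failure in step 2 or 3 cannot leave `Q` suspended forever.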

From what I was able to find out, while `Q` is suspended, messages sent to it are queued, and when resumed, `Trc_Q` receives the `{trace, Pid_Q, receive, Msg}` trace events accordingly without any loss.

I have one constraint which led me to look at suspend/resume process: I cannot modify the code of `P` or `Q`, so inserting `receive` expressions to block said processes is out of the question.

However, I am hesitant to use suspend/resume, since the Erlang docs explicitly say that these are to be used for *debugging purposes only*. Any idea as to why this is the case?

Thanks.

Duncan.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Suspending Erlang Processes

Rickard Green-2


On Mon, Sep 30, 2019 at 1:57 PM Duncan Paul Attard <[hidden email]> wrote:
>
> I am tracing an Erlang process, say, `P`, by invoking the BIF `erlang:trace(Pid_P, true, [set_on_spawn, procs, send, 'receive'])` from some process. As per the Erlang docs, the latter process becomes the tracer for `P`, which I shall call `Trc_P`.
>
> Suppose now, that process `P` spawns a new process `Q`. Since the flag `set_on_spawn` was specified in the call to `erlang:trace/3` above, `Q` will automatically be traced by `Trc_P` as well.
>
> ---
>
> I want to spawn a **new** tracer, `Trc_Q`, and transfer the ownership of tracing `Q` to it, so that the resulting configuration will be that of process `P` being traced by tracer `Trc_P`, `Q` by `Trc_Q`.
>

Unfortunately I do not have any ideas on how to accomplish this.

> However, Erlang permits **at most** one tracer per process, so I cannot achieve said configuration by invoking `erlang:trace(Pid_Q, true, ..)` from `Trc_Q`. The only way possible is to do it in two steps:
>
> 1. Tracer `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
> 2. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`.
>
> In the time span between steps **1.** and **2.** above, it might be possible that trace events by process `Q` are **lost** because at that moment, there is no tracer attached. One way of mitigating this is to perform the following:
>
> 1. Suspend process `Q` by calling `erlang:suspend_process(Pid_Q)` from `Trc_Q` (note that as per Erlang docs, `Trc_Q` remains blocked until `Q` is eventually suspended by the VM);
> 2. `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
> 3. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`;
> 4. Finally, `Trc_Q` calls `erlang:resume_process(Pid_Q)` so that `Q` can continue executing.
>
> From what I was able to find out, while `Q` is suspended, messages sent to it are queued, and when resumed, `Trc_Q` receives the `{trace, Pid_Q, receive, Msg}` trace events accordingly without any loss.
>

This is not a feature, it is a bug (introduced in erts 10.0, OTP 21.0) that will be fixed. The trace message should have been delivered even though the receiver was suspended.

You cannot even rely on this behavior while this bug is present. If you (or any process in the system) send the suspended process a non-message signal (monitor, demonitor, link, unlink, exit, process_info, ...), the bug will be bypassed and the trace message will be delivered.

> However, I am hesitant to use suspend/resume, since the Erlang docs explicitly say that these are to be used for *debugging purposes only*.

Mission accomplished! :-)

> Any idea as to why this is the case?
>

The language was designed with other communication primitives intended for use. Suspend/resume was explicitly introduced for debugging purposes only, and not for use by ordinary Erlang programs. They will most likely not disappear, but debug functionality in general is not treated as carefully by us at OTP as other, ordinary functionality with regard to compatibility, etc. For example, we removed the automatic deadlock prevention in suspend_process() that existed prior to erts 10.0 for performance reasons.

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB

Re: Suspending Erlang Processes

Duncan Paul Attard
Thanks for the explanation and for pointing the bug out. So it seems to me that there is no way to stop ‘receive’ trace events from being generated, despite the use of suspend. I guess this stems from the asynchronous nature of the actor model.

> The language was designed with other communication primitives intended for use. Suspend/resume was explicitly introduced for debugging purposes only, and not for use by ordinary Erlang programs. They will most likely not disappear, but debug functionality in general is not treated as carefully by us at OTP as other, ordinary functionality with regard to compatibility, etc. For example, we removed the automatic deadlock prevention in suspend_process() that existed prior to erts 10.0 for performance reasons.

I understand and do agree that synchronisation in Erlang, and in the actor model in general, is modelled via message exchanges only, and that utilising other primitives such as suspend and resume does not adhere to this model.

Yet, in my particular use case (runtime verification), I am restricting myself to systems/processes that *cannot* be instrumented with additional instructions (in this case, receive clauses) so as to block their execution at specific points. Thus, the only way left to me would be to suspend a process whilst it is executing, *without* my having the knowledge of what instruction the process in question is executing. To give you a bit of context, I am creating a monitoring system ‘M’ that is layered on top of a given system that one wishes to monitor, ‘S’. ‘M’ observes the execution of ‘S’ via EVM tracing to try and detect infringements of certain logical properties specified over ‘S’.

The docs mention that suspend and resume are reserved for debugging purposes and, as you said in your reply, draw attention to the fact that careless use of these two functions can lead to inadvertent deadlocks. You also mentioned that automatic deadlock prevention has been removed, hinting that the implementation of suspend and resume might change in future releases of Erlang. I understand that. Besides this, however, is there any other reason that suspend and resume should *not* be used? For instance, would executing suspend at any point, say, mess up the internal state of the suspended process? This question is in light of what I said above, namely that I would suspend a process whilst it is executing without having knowledge of what instruction the suspendee is executing. ‘suspend_process/1’ blocks the suspender until the suspendee is eventually suspended: does "eventually suspended" mean that it is safe to assume that the VM brings the suspendee to a state where it is ok to suspend it?

And out of sheer curiosity, is a suspendee suspended as soon as possible, or does the scheduler execute its remaining number of reductions before suspending it and returning control to the suspender?

Once again, thanks a lot for your kind help, Rickard.

Best regards,
Duncan.







Re: Suspending Erlang Processes

Kenneth Lundin-5
In reply to this post by Rickard Green-2
As a follow-up to Rickard's answer, I think it would be interesting if you could explain why you want different tracers per process.
If we know what problem you want to solve, we can most probably come up with better suggestions.

I also recommend that you use tracing via the dbg module which is intended to be a more user friendly API towards tracing. The trace BIFs might give some more detailed control but dbg has support for most use cases and makes it easier to do the right thing, at least that is the intention.
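
As a rough illustration of the dbg route, the sketch below starts a dbg tracer with a handler fun and enables message, process-event and set-on-spawn tracing for one process. The module name `dbg_demo` and the `ping`/`pong` messages are made up; `dbg:tracer/2`, `dbg:p/2` and `dbg:stop/0` are the standard dbg calls, with `m`, `p` and `sos` being dbg's short flag names for send/'receive', procs and set_on_spawn.

```erlang
-module(dbg_demo).
-export([run/0]).

%% Hypothetical demo: trace one process through dbg instead of erlang:trace/3.
run() ->
    Self = self(),
    %% The handler runs inside the dbg tracer process for every trace event.
    %% Here it forwards each event to us and counts events in its state.
    Handler = fun(Event, Count) -> Self ! {seen, Event}, Count + 1 end,
    {ok, _Tracer} = dbg:tracer(process, {Handler, 0}),
    P = spawn(fun() -> receive {ping, From} -> From ! pong end end),
    %% m = send/'receive', p = procs, sos = set_on_spawn.
    {ok, _} = dbg:p(P, [m, p, sos]),
    P ! {ping, self()},
    receive pong -> ok end,
    Result = receive
                 {seen, {trace, P, 'receive', {ping, _}}} -> traced
             after 1000 ->
                 timeout
             end,
    dbg:stop(),
    Result.
```

Note that dbg still uses the same underlying mechanism, so the one-tracer-per-process restriction applies here as well.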

Also worth mentioning is that the tracing mechanisms are really not intended to be used to implement functionality that is part of the application; they are intended to be used temporarily for debugging/profiling purposes. Since there can only be one tracer at a time, using tracing as part of the "ordinary" implementation of an application will cause conflicts as soon as any tracing or profiling is needed, and the intended functionality of the application will then probably be broken.

/Kenneth, Erlang/OTP Ericsson

Re: Suspending Erlang Processes

Rickard Green-2
In reply to this post by Duncan Paul Attard


On Wed, Oct 2, 2019 at 8:41 AM Duncan Paul Attard <[hidden email]> wrote:
> Thanks for the explanation and for pointing the bug out. So it seems to me that there is no way to stop ‘receive’ trace events from being generated, despite the use of suspend. I guess this stems from the asynchronous nature of the actor model.
>
> > The language was designed with other communication primitives intended for use. Suspend/resume was explicitly introduced for debugging purposes only, and not for use by ordinary Erlang programs. They will most likely not disappear, but debug functionality in general is not treated as carefully by us at OTP as other, ordinary functionality with regard to compatibility, etc. For example, we removed the automatic deadlock prevention in suspend_process() that existed prior to erts 10.0 for performance reasons.
>
> I understand and do agree that synchronisation in Erlang, and in the actor model in general, is modelled via message exchanges only, and that utilising other primitives such as suspend and resume does not adhere to this model.
>
> Yet, in my particular use case (runtime verification), I am restricting myself to systems/processes that *cannot* be instrumented with additional instructions (in this case, receive clauses) so as to block their execution at specific points. Thus, the only way left to me would be to suspend a process whilst it is executing, *without* my having the knowledge of what instruction the process in question is executing. To give you a bit of context, I am creating a monitoring system ‘M’ that is layered on top of a given system that one wishes to monitor, ‘S’. ‘M’ observes the execution of ‘S’ via EVM tracing to try and detect infringements of certain logical properties specified over ‘S’.
>
> The docs mention that suspend and resume are reserved for debugging purposes and, as you said in your reply, draw attention to the fact that careless use of these two functions can lead to inadvertent deadlocks. You also mentioned that automatic deadlock prevention has been removed, hinting that the implementation of suspend and resume might change in future releases of Erlang. I understand that. Besides this, however, is there any other reason that suspend and resume should *not* be used? For instance, would executing suspend at any point, say, mess up the internal state of the suspended process?

Not unless there is a bug in the suspend functionality.
 
> This question is in light of what I said above, namely that I would suspend a process whilst it is executing without having knowledge of what instruction the suspendee is executing. ‘suspend_process/1’ blocks the suspender until the suspendee is eventually suspended: does "eventually suspended" mean that it is safe to assume that the VM brings the suspendee to a state where it is ok to suspend it?


Yes. The suspender sends the suspendee an asynchronous signal. The suspendee won't suspend until it receives and handles the suspend signal. Currently, handling of such signals only occurs when a process is scheduled in, when executing a receive (depending on the state of the message queue), and when scheduled out from a dirty scheduler. Signals are handled in the order they were received. If there are a lot of signals to handle, not all of them are necessarily handled before execution continues. That is, the suspendee may pass one of these points where it handles incoming signals and then continue execution even if there is a suspend signal waiting for it.
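
The externally visible effect can be observed with a small experiment. This is only a sketch: the module name `suspend_demo` and the `tick` messages are made up, and it shows nothing about where inside the suspendee the signal is handled, only that the suspendee makes no progress once `erlang:suspend_process/1` has returned.

```erlang
-module(suspend_demo).
-export([run/0]).

%% Hypothetical demo: a ticker process goes quiet while suspended.
run() ->
    Self = self(),
    Ticker = spawn(fun() -> tick(Self) end),
    receive tick -> ok end,                  % the ticker is running
    %% Returns only after the ticker has handled the suspend signal.
    true = erlang:suspend_process(Ticker),
    flush(),                                 % drop ticks sent before that point
    Quiet = receive tick -> false after 200 -> true end,
    true = erlang:resume_process(Ticker),
    Resumed = receive tick -> true after 1000 -> false end,
    exit(Ticker, kill),
    {Quiet, Resumed}.

%% Send a tick roughly every 10 ms, forever.
tick(To) ->
    To ! tick,
    receive after 10 -> ok end,
    tick(To).

%% Remove any ticks already sitting in the caller's mailbox.
flush() ->
    receive tick -> flush() after 0 -> ok end.
```

`Quiet` records whether any tick arrived during the suspension window, and `Resumed` whether ticks started again afterwards.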
 
> And out of sheer curiosity, is a suspendee suspended as soon as possible, or does the scheduler execute its remaining number of reductions before suspending it and returning control to the suspender?


As described above, it suspends when it sees the suspend signal which might or might not happen before the remaining number of reductions has been exhausted.
 
> Once again, thanks a lot for your kind help, Rickard.
>
> Best regards,
> Duncan.




Regards,
Rickard

--
Rickard Green, Erlang/OTP, Ericsson AB

Re: Suspending Erlang Processes

Duncan Paul Attard
In reply to this post by Kenneth Lundin-5
Kenneth, Rickard,

Let me give you a bit of context. 

I’m working on a runtime verification (RV) tool that focusses on component systems in an asynchronous setting. I’ve chosen Erlang because it nicely models this setting and also facilitates certain aspects of the development of said tool. Very briefly, in RV, the concept is that of instrumenting the system with other processes (called monitors in the RV community, though they have nothing to do with Erlang monitors) that analyse parts of the system (e.g., one process or a group of them, which I will refer to as a "system component") to detect and flag the infringement of some property specified over the component.

These properties (which are written using a high-level logic such as Linear Temporal Logic, Hennessy-Milner Logic, etc.), define things like “Process P cannot send message M to Q when such and such condition arises” or “Process P must exit when a particular message M is sent to it”, etc. A monitor, or rather, the monitor source code, is synthesised from a property and “attached” to the component to be monitored. The following is more or less the general workflow:

1. A property is written in a text file using one of the logics mentioned;
2. The property is parsed and compiled to generate the monitor (in Erlang source code, in my particular case);
3. The monitor is spawned as a process that analyses a system component of interest as this executes.

The monitor needs to somehow acquire the runtime events emitted by processes, and this it does via the built-in Erlang tracing (i.e., the monitor is itself a tracer process). The important thing to note is that the monitors, despite being processes themselves, may be considered as a meta-layer over the system, and therefore, do not technically form part of the “ordinary” implementation of the system. This means that monitors can be introduced or removed from the system as needed, and merely function as a second layer that strives to observe the system with *minimal* interference. 

This brings me to Kenneth’s point, that tracing is a tool intended for debugging/profiling purposes. I agree, and in fact, RV might be considered as a flavour of debugging or profiling that is done at runtime. It differs (amongst other things) from debugging and profiling, in that monitors are the product of autogenerated code resulting from *formal* logical properties. From what I gather, debugging or profiling obtains trace events in a similar way to the one I’m using for monitoring. I also understand and agree with you Kenneth that, if a system process is being monitored by one of my monitors, then it cannot be profiled or debugged due to the one-tracer limit imposed by the EVM. 

Also, the reason I’m not using ‘dbg’ but ‘erlang:trace/3’ directly is that I want the full flexibility of tracing (I might require it in later stages of my research). Way back when I started the project I was not aware of the extent of the functionality ‘dbg’ offers, and so to play it safe (and after reading Francesco and Simon’s book), I decided to go for the tracing BIFs.

Finally, the reason I require different tracers (in my case, monitors) for different system processes (or groups) is that it makes the specification of correctness properties much more manageable. The gist of the idea is that it is far easier to specify a property over a restricted set of processes (e.g., just one process which exhibits *sequential* execution) than it is for a large number of processes, as then the property needs to account for all the possible interleavings of trace events exhibited by different processes. So in a sense, different monitors over different system components allow me to partition and view the otherwise whole trace of the system as a collection of separate traces for different components. Naturally, the monitors generated from smaller properties tend to be small and lightweight themselves, and are easier to work with. Moreover, this allows me to switch off certain monitors dynamically at runtime for system components that might not require monitoring anymore, while leaving others on. 

Since a system can be viewed as always starting from one root process, I attach (i.e., start tracing) a special root monitor to this system root process. The root monitor creates new monitors on the fly for certain child processes that are spawned by the root system process. Now, to collect trace events without loss, the root monitor is configured with ‘set_on_spawn’, meaning that new children of the root system process are automatically traced by the root monitor at first. To spawn a dedicated monitor ‘Mon_C’ for some child process ‘C’, the following is executed:

1. Root monitor ‘Mon_R’ is currently tracing the new child process ‘C’ (the ‘set_on_spawn’ flag was set on ‘Mon_R’);
2. The new monitor ‘Mon_C’ created for child process ‘C’ switches tracing *off* for ‘C’ (i.e., erlang:trace(Pid_C, false, ..)), so the (previous) monitor ‘Mon_R’ stops being the tracer of ‘C’;
3. New monitor ‘Mon_C’ switches tracing back *on* for child process ‘C’ and becomes its new tracer.
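
The steps above can be sketched as a root monitor that reacts to `spawn` trace events. Everything named here (`root_mon`, `start/2`, the `Notify` pid and the `monitoring`/`gone` messages) is hypothetical and exists only to make the sketch observable; the actual analysis of events is left out, and the window between steps 2 and 3 is still open.

```erlang
-module(root_mon).
-export([start/2]).

-define(FLAGS, [set_on_spawn, procs, send, 'receive']).

%% Start Mon_R for the system root process Root. Notify is a hypothetical
%% pid that is told whenever a dedicated child monitor takes over.
start(Root, Notify) ->
    Caller = self(),
    MonR = spawn(fun() ->
                     1 = erlang:trace(Root, true, ?FLAGS),
                     Caller ! {self(), ready},
                     loop(Root, Notify)
                 end),
    receive {MonR, ready} -> MonR after 5000 -> exit(timeout) end.

loop(Root, Notify) ->
    receive
        %% Step 1: Root spawned Child, which Mon_R traces via set_on_spawn.
        {trace, Root, spawn, Child, _MFA} ->
            spawn(fun() -> child_monitor(Child, Notify) end),
            loop(Root, Notify);
        _OtherEvent ->
            loop(Root, Notify)
    end.

%% Runs in the new monitor Mon_C.
child_monitor(Child, Notify) ->
    try
        erlang:trace(Child, false, ?FLAGS),   % step 2: Mon_R is detached
        erlang:trace(Child, true, ?FLAGS)     % step 3: Mon_C takes over
    of
        _ ->
            Notify ! {monitoring, Child, self()},
            child_loop(Child)
    catch
        error:badarg ->
            Notify ! {gone, Child}            % Child exited before the handover
    end.

%% Placeholder: a real monitor would analyse each trace event here.
child_loop(Child) ->
    receive _Event -> child_loop(Child) end.
```

Events that ‘C’ generated before the handover remain in the mailbox of ‘Mon_R’, so a real implementation would also have to decide how to route those to ‘Mon_C’.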

To minimise trace event loss between steps 2 and 3, I was thinking of suspending child process ‘C’ before step 2, and resuming it after step 3. This way, ‘C’ is at least blocked, and cannot spawn new processes itself or send messages. I cannot, however, prevent other processes from sending ‘C’ messages, meaning that there might be a chance of ‘receive’ events being lost in the space of time between steps 2 and 3. Therefore, my suggestion still does not banish the problem but merely mitigates it, as steps 2 and 3 do not happen atomically. I wonder whether a BIF could be realised such that the ownership of tracing can be transferred atomically between tracers without incurring any loss of trace events (between monitors ‘Mon_R’ and ‘Mon_C’ in my case).

FYI, much of the work I’ve discussed has already been published in a previous paper of ours. The paper can be found here: http://staff.um.edu.mt/afra1/papers/sefm17.pdf. If you’re interested please let me know.

Many thanks for your help!
Duncan




On 02 Oct 2019, at 09:11, Kenneth Lundin <[hidden email]> wrote:

As a follow up on Rickards answer I think it would be interesting if you can explain why you want different tracers per process?
If we know what problem you want to solve we can most probably come with better suggestions.

I also recommend that you use tracing via the dbg module which is intended to be a more user friendly API towards tracing. The trace BIFs might give some more detailed control but dbg has support for most use cases and makes it easier to do the right thing, at least that is the intention.

Also worth mentioning is that the tracing mechanisms are really not intended to use to achieve a certain functionality which is part of the application, they are intended to be used temporarily for debugging/profiling purposes. Since there is only one tracer at the time the use of tracing as part of the "ordinary" implementation of an application there will be conflicts as soon as any tracing or profiling is needed and probably the intended functionality of the application will then be broken.

/Kenneth, Erlang/OTP Ericsson

On Tue, Oct 1, 2019 at 10:07 PM Rickard Green <[hidden email]> wrote:


On Mon, Sep 30, 2019 at 1:57 PM Duncan Paul Attard <[hidden email]> wrote:
>
> I am tracing an Erlang process, say, `P` by invoking the BIF `erlang:trace(Pid_P, true, [set_on_spawn, procs, send, 'receive'])` from some process. As per the Erlang docs, the latter process becomes the tracer for `P`, which I shall call `Trc_Q`.
>
> Suppose now, that process `P` spawns a new process `Q`. Since the flag `set_on_spawn` was specified in the call to `erlang:trace/3` above, `Q` will automatically be traced by `Trc_P` as well.
>
> ---
>
> I want to spawn a **new** tracer, `Trc_Q`, and transfer the ownership of tracing `Q` to it, so that the resulting configuration will be that of process `P` being traced by tracer `Trc_P`, `Q` by `Trc_Q`.
>

Unfortunately I do not have any ideas on how to accomplish this.

> However, Erlang permits **at most** one tracer per process, so I cannot achieve said configuration by invoking `erlang:trace(Pid_Q, true, ..)` from `Trc_Q`. The only way possible is to do it in two steps:
>
> 1. Tracer `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
> 2. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`.
>
> In the time span between steps **1.** and **2.** above, it might be possible that trace events by process `Q` are **lost** because at that moment, there is no tracer attached. One way of mitigating this is to perform the following:
>
> 1. Suspend process `Q` by calling `erlang:suspend_process(Pid_Q)` from `Trc_Q` (note that as per Erlang docs, `Trc_Q` remains blocked until `Q` is eventually suspended by the VM);
> 2. `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
> 3. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`;
> 4. Finally, `Trc_Q` calls `erlang:resume_process(Pid_Q)` so that `Q` can continue executing.
>
> From what I was able to find out, while `Q` is suspended, messages sent to it are queued, and when resumed, `Trc_Q` receives the `{trace, Pid_Q, receive, Msg}` trace events accordingly without any loss.
>

This is not a feature, it is a bug (introduced in erts 10.0, OTP 21.0) that will be fixed. The trace message should have been delivered even though the receiver was suspended.

You cannot even rely on this behavior while this bug is present. If you (or any process in the system) send the suspended process a non-message signal (monitor, demonitor, link, unlink, exit, process_info, ...), the bug will be bypassed and the trace message will be delivered.

> However, I am hesitant to use suspend/resume, since the Erlang docs explicitly say that these are to be used for *debugging purposes only*.

Mission accomplished! :-)

> Any idea as to why this is the case?
>

The language was designed with other communication primitives intended for use. Suspend/resume was explicitly introduced for debugging purposes only, and not for use by ordinary Erlang programs. These functions will most likely not disappear, but debug functionality in general is not treated as carefully by us at OTP as ordinary functionality with regard to compatibility, etc. For example, we removed the automatic deadlock prevention in suspend_process() that existed prior to erts 10.0 for performance reasons.

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Suspending Erlang Processes

Duncan Paul Attard
Hi Kenneth, Rickard,

I was wondering whether you have any suggestions regarding this please.


All the best,

Duncan


On 03 Oct 2019, at 11:27, Duncan Paul Attard <[hidden email]> wrote:

Kenneth, Rickard,

Let me give you a bit of context. 

I’m working on a runtime verification (RV) tool that focuses on component systems in an asynchronous setting. I’ve chosen Erlang because it nicely models this setting and also facilitates certain aspects in the development of said tool. Very briefly, in RV, the concept is that of instrumenting the system with other processes (called monitors in the RV community; these have nothing to do with Erlang monitors) that analyse parts of the system (e.g., one process or a group of them, which I will refer to as a "system component") to detect and flag the infringement of some property specified over the component.

These properties (which are written using a high-level logic such as Linear Temporal Logic, Hennessy-Milner Logic, etc.), define things like “Process P cannot send message M to Q when such and such condition arises” or “Process P must exit when a particular message M is sent to it”, etc. A monitor, or rather, the monitor source code, is synthesised from a property and “attached” to the component to be monitored. The following is more or less the general workflow:

1. A property is written in a text file using one of the logics mentioned;
2. The property is parsed and compiled to generate the monitor (in Erlang source code, in my particular case);
3. The monitor is spawned as a process that analyses a system component of interest as this executes.

The monitor needs to somehow acquire the runtime events emitted by processes, and this it does via the built-in Erlang tracing (i.e., the monitor is itself a tracer process). The important thing to note is that the monitors, despite being processes themselves, may be considered as a meta-layer over the system, and therefore, do not technically form part of the “ordinary” implementation of the system. This means that monitors can be introduced or removed from the system as needed, and merely function as a second layer that strives to observe the system with *minimal* interference. 
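As a rough illustration of a monitor that is itself a tracer process, here is a minimal sketch. The module name `mini_monitor`, the function `start/2`, and the `{monitor_ready, _}` and `{observed, _}` messages are hypothetical names chosen for this example; a real synthesised monitor would check its property instead of forwarding events to an owner process.

```erlang
-module(mini_monitor).
-export([start/2]).

%% Hypothetical helper: spawns a monitor process that becomes the tracer
%% of Tracee and reports every observed event to Owner.
start(Tracee, Owner) ->
    spawn(fun() ->
        %% The process calling erlang:trace/3 (this monitor) becomes the
        %% tracer. For a single live pid the documented return value is 1.
        1 = erlang:trace(Tracee, true, [send, 'receive']),
        Owner ! {monitor_ready, self()},
        loop(Tracee, Owner)
    end).

loop(Tracee, Owner) ->
    receive
        %% Trace event emitted when Tracee receives a message.
        {trace, Tracee, 'receive', Msg} ->
            Owner ! {observed, {recv, Msg}},
            loop(Tracee, Owner);
        %% Trace event emitted when Tracee sends a message.
        {trace, Tracee, send, Msg, To} ->
            Owner ! {observed, {send, Msg, To}},
            loop(Tracee, Owner);
        %% Anything else is ignored in this sketch.
        _Other ->
            loop(Tracee, Owner)
    end.
```

Because the monitor only consumes trace events, it can be started or stopped without changing the code of the traced process.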

This brings me to Kenneth’s point, that tracing is a tool intended for debugging/profiling purposes. I agree, and in fact, RV might be considered as a flavour of debugging or profiling that is done at runtime. It differs (amongst other things) from debugging and profiling, in that monitors are the product of autogenerated code resulting from *formal* logical properties. From what I gather, debugging or profiling obtains trace events in a similar way to the one I’m using for monitoring. I also understand and agree with you Kenneth that, if a system process is being monitored by one of my monitors, then it cannot be profiled or debugged due to the one-tracer limit imposed by the EVM. 

Also, the reason I’m not using ‘dbg’ but ‘erlang:trace/3’ directly is that I want the full flexibility of tracing (I might require it in later stages of my research). Way back when I started the project I was not aware of the extent of the functionality ‘dbg’ offers, and so to play it safe (and after reading Francesco and Simon’s book), I decided to go for the tracing BIFs.

Finally, the reason I require different tracers (in my case, monitors) for different system processes (or groups) is that it makes the specification of correctness properties much more manageable. The gist of the idea is that it is far easier to specify a property over a restricted set of processes (e.g., just one process which exhibits *sequential* execution) than it is for a large number of processes, as then the property needs to account for all the possible interleavings of trace events exhibited by different processes. So in a sense, different monitors over different system components allow me to partition and view the otherwise whole trace of the system as a collection of separate traces for different components. Naturally, the monitors generated from smaller properties tend to be small and lightweight themselves, and are easier to work with. Moreover, this allows me to switch off certain monitors dynamically at runtime for system components that might not require monitoring anymore, while leaving others on. 

Since a system can be viewed as always starting from one root process, I attach (i.e., start tracing) a special root monitor to this system root process. The root monitor creates new monitors on the fly for certain child processes that are spawned by the root system process. Now, to collect trace events without loss, the root monitor is configured with ‘set_on_spawn’, meaning that new children of the root system process are automatically traced by the root monitor at first. To spawn a dedicated monitor ‘Mon_C’ for some child process ‘C’, the following is executed:

1. Root monitor ‘Mon_R’ is currently tracing the new child process ‘C’ (the ‘set_on_spawn’ flag was set on ‘Mon_R’);
2. The new monitor ‘Mon_C’ created for child process ‘C’ switches tracing *off* for ‘C’ (i.e., erlang:trace(Pid_C, false, ..)), so the (previous) monitor ‘Mon_R’ stops being the tracer of ‘C’;
3. New monitor ‘Mon_C’ switches tracing back *on* for child process ‘C’ and becomes its new tracer.

To minimise trace event loss between steps 2 and 3, I was thinking of suspending child process ‘C’ before step 2, and resuming it after step 3. This way, ‘C’ is at least blocked, and cannot spawn new processes itself or send messages. I cannot, however, prevent other processes from sending ‘C’ messages, meaning that there might be a chance of ‘receive’ events being lost in the space of time between steps 2 and 3. Therefore, my suggestion still does not eliminate the problem but merely mitigates it, as steps 2 and 3 do not happen atomically. I wonder whether such a BIF could be realisable, such that the ownership of tracing can be transferred atomically between tracers without incurring any loss of trace events (between monitors ‘Mon_R’ and ‘Mon_C’ in my case).
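The suspend, detach, attach, resume sequence described above could be sketched as follows. The module name `handover` and the function `take_over/2` are hypothetical; the sketch assumes that `Flags` covers every trace flag currently set on the child (so that switching them off removes the old tracer), and it does not close the window in which ‘receive’ events can be lost.

```erlang
-module(handover).
-export([take_over/2]).

%% Hypothetical helper, meant to be called from the NEW tracer (Mon_C).
%% PidC is the child currently traced by another tracer (Mon_R).
%% Assumption: Flags lists all trace flags currently enabled on PidC.
take_over(PidC, Flags) ->
    %% Block C so that it cannot send messages or spawn processes
    %% while no tracer is attached. Debug-only BIF, as noted in the thread.
    true = erlang:suspend_process(PidC),
    try
        %% Step 2: switch tracing off, detaching the previous tracer.
        1 = erlang:trace(PidC, false, Flags),
        %% Step 3: switch tracing on; the caller becomes the new tracer.
        1 = erlang:trace(PidC, true, Flags)
    after
        %% Always resume C, even if one of the trace calls failed.
        true = erlang:resume_process(PidC)
    end,
    ok.
```

Steps 2 and 3 are still two separate calls, so messages delivered to ‘C’ by other processes in between are not guaranteed to produce trace events for either tracer.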

FYI, much of the work I’ve discussed has already been published in an earlier paper of ours, which can be found here: http://staff.um.edu.mt/afra1/papers/sefm17.pdf. If you’re interested please let me know.

Many thanks for your help!
Duncan




On 02 Oct 2019, at 09:11, Kenneth Lundin <[hidden email]> wrote:

As a follow-up on Rickard's answer, I think it would be interesting if you could explain why you want different tracers per process.
If we know what problem you want to solve we can most probably come up with better suggestions.

I also recommend that you use tracing via the dbg module, which is intended to be a more user-friendly API towards tracing. The trace BIFs might give some more detailed control, but dbg has support for most use cases and makes it easier to do the right thing; at least that is the intention.

Also worth mentioning is that the tracing mechanisms are really not intended to be used to achieve functionality which is part of the application; they are intended to be used temporarily for debugging/profiling purposes. Since there is only one tracer at a time, if tracing is used as part of the "ordinary" implementation of an application, there will be conflicts as soon as any tracing or profiling is needed, and the intended functionality of the application will then probably be broken.

/Kenneth, Erlang/OTP Ericsson


Re: Suspending Erlang Processes

Kenneth Lundin-5
Hi Duncan,

My initial thought was: why do you need many tracers to achieve what you want?
What do you say about the approach of having one global tracer which acts as a dispatcher to all your monitors?
As all trace messages have information about the process from which the trace event originates, you can do a mapping between the message and the monitor and then distribute the message to that monitor, which could act in the same way as if it was the tracer for a specific process or process group.
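A minimal sketch of such a dispatcher is shown below. The module name `trace_dispatcher`, its functions, and the `{register, _, _}` message are hypothetical names for this example; routing relies only on the fact that the second element of a trace message is the pid of the traced process.

```erlang
-module(trace_dispatcher).
-export([start/0, register_monitor/3]).

%% Starts the single global tracer/dispatcher with an empty routing table.
start() ->
    spawn(fun() -> loop(#{}) end).

%% Hypothetical API: route all trace events of TraceePid to MonitorPid.
register_monitor(Dispatcher, TraceePid, MonitorPid) ->
    Dispatcher ! {register, TraceePid, MonitorPid},
    ok.

loop(Routes) ->
    receive
        {register, Tracee, Monitor} ->
            loop(Routes#{Tracee => Monitor});
        Event when is_tuple(Event), tuple_size(Event) >= 3,
                   element(1, Event) =:= trace ->
            %% The second element of a trace message is the traced pid.
            Tracee = element(2, Event),
            case Routes of
                #{Tracee := Monitor} -> Monitor ! Event;
                _ -> ok  % no monitor registered: the event is dropped
            end,
            loop(Routes);
        _Other ->
            loop(Routes)
    end.
```

The dispatcher could be made the tracer of the root process with something like `erlang:trace(RootPid, true, [set_on_spawn, procs, send, 'receive', {tracer, Dispatcher}])`, after which each monitor sees the same message shapes it would see as a dedicated tracer.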

/Regards Kenneth


Re: Suspending Erlang Processes

Duncan Paul Attard
Hi Kenneth,

Thanks for your reply. 

I wanted different tracers since I would like to target specific processes, and not just all of them. I had initially thought about your approach. However, I’m afraid that for my case, having a central tracer that would distribute the different trace events to different monitors (according to the originating PID) could mean that I have to collect trace events even for processes that I don’t need to monitor (I would just filter these out, but this is extra processing).

My idea was to keep tracing to a minimum for the sake of performance by tracing processes selectively (and dynamically at runtime). A second shortcoming I see with this approach is that potentially, the central tracer might at times experience high loads due to the trace events that could collect in its mailbox while it is busy routing or filtering trace events: this in turn could keep recipient monitors waiting longer than necessary just to receive a handful of trace events. I suspect that this might also impact memory consumption. What is more, in the case where the central tracer fails, all events would be lost. 

Separate tracers could mitigate these two issues, since they are less likely to create a hotspot, and independent tracers may fail without hampering the progress of other tracers/monitors.

With reference to my previous question, “I wonder whether such a BIF could be realisable, such that the ownership of tracing can be transferred atomically between tracers without incurring any loss of trace events”, would such a BIF be possible to implement in the foreseeable future?


Best,
Duncan


On 17 Oct 2019, at 09:48, Kenneth Lundin <[hidden email]> wrote:

Hi Duncan,

My initial thought was that why do you need many tracers to achieve what you want. 
What do you say about the approach of having 1 global tracer which act as a dispatcher to all your monitors.
As all trace messages have information about the process from which the trace event originates you can do a mapping between
the message and the monitor and then distribute the message to that monitor which could act in the same way as if it was the tracer for a specific process or process group.

/Regards Kenneth

On Thu, Oct 17, 2019 at 9:01 AM Duncan Paul Attard <[hidden email]> wrote:
Hi Kenneth, Rickard,

I was wondering whether you have any suggestions regarding this please.


All the best,

Duncan


On 03 Oct 2019, at 11:27, Duncan Paul Attard <[hidden email]> wrote:

Kenneth, Rickard,

Let me give you a bit of context. 

I’m working on a runtime verification (RV) tool that focusses on components systems in an asynchronous setting. I’ve chosen Erlang because it nicely models this setting and also facilitates certain aspects in the development of said tool. Very briefly, in RV, the concept is that of instrumenting the system with other processes (called monitors in the RV community, but have nothing to do with Erlang monitors) that analyse the parts of the system (e.g., one process or a group of them, which I will refer to as a "system component") to detect and flag the infringement of some property specified over the component.

These properties (which are written using a high-level logic such as Linear Temporal Logic, Hennessy-Milner Logic, etc.), define things like “Process P cannot send message M to Q when such and such condition arises” or “Process P must exit when a particular message M is sent to it”, etc. A monitor, or rather, the monitor source code, is synthesised from a property and “attached” to the component to be monitored. The following is more or less the general workflow:

1. A property is written in a text file using one of the logics mentioned;
2. The property is parsed and compiled to generate the monitor (in Erlang source code, in my particular case);
3. The monitor is spawned as a process that analyses a system component of interest as this executes.

The monitor needs to somehow acquire the runtime events emitted by processes, and this it does via the built-in Erlang tracing (i.e., the monitor is itself a tracer process). The important thing to note is that the monitors, despite being processes themselves, may be considered as a meta-layer over the system, and therefore, do not technically form part of the “ordinary” implementation of the system. This means that monitors can be introduced or removed from the system as needed, and merely function as a second layer that strives to observe the system with *minimal* interference. 

This brings me to Kenneth’s point, that tracing is a tool intended for debugging/profiling purposes. I agree, and in fact, RV might be considered as a flavour of debugging or profiling that is done at runtime. It differs (amongst other things) from debugging and profiling, in that monitors are the product of autogenerated code resulting from *formal* logical properties. From what I gather, debugging or profiling obtains trace events in a similar way to the one I’m using for monitoring. I also understand and agree with you Kenneth that, if a system process is being monitored by one of my monitors, then it cannot be profiled or debugged due to the one-tracer limit imposed by the EVM. 

Also, the reason I’m not using ‘dbg' but 'erlang:trace/3’ directly is that I want the full flexibility of tracing (I might require it in later stages of my research). Way back when I started the project I was not aware of the extent of the functionality ‘dbg’ offers, and so to play it safe (and after reading Francesco and Simon’s book), I decided to go for the tracing BIFs.

Finally, the reason I require different tracers (in my case, monitors) for different system processes (or groups) is that it makes the specification of correctness properties much more manageable. The gist of the idea is that it is far easier to specify a property over a restricted set of processes (e.g., just one process which exhibits *sequential* execution) than it is for a large number of processes, as then the property needs to account for all the possible interleavings of trace events exhibited by different processes. So in a sense, different monitors over different system components allow me to partition and view the otherwise whole trace of the system as a collection of separate traces for different components. Naturally, the monitors generated from smaller properties tend to be small and lightweight themselves, and are easier to work with. Moreover, this allows me to switch off certain monitors dynamically at runtime for system components that might not require monitoring anymore, while leaving others on. 

Since a system can be viewed as always starting from one root process, I attach (i.e., start tracing) a special root monitor to this system root process. The root monitor creates new monitors on the fly for certain child processes that are spawned by the root system process. Now, to collect trace events without loss, the root monitor is configured with ’set_on_spawn’, meaning that new children of the root system process are automatically traced by the root monitor at first. To spawn a dedicated monitor ‘Mon_C' for some child process ‘C’, the following is executed:

1. Root monitor ‘Mon_R’ is currently tracing the new child process ‘C’ (the ‘set_on_spawn’ flag was set on ‘Mon_R’);
2. The new monitor ‘Mon_C’ created for child process ‘C’ switches tracing *off* for ‘C’ (i.e., erlang:trace(Pid_C, false, ..)), so the (previous) monitor ‘Mon_R’ stops being the tracer of ‘C’;
3. New monitor ‘Mon_C’ switches tracing back *on* for child process ‘C’ and becomes its new tracer.

To minimise trace event loss between steps 2 and 3, I was thinking of suspending child process ‘C’ before step 2, and resuming it after step 3. This way, ‘C’ is at least blocked, and cannot spawn new processes itself or send messages. I cannot, however, prevent other processes from sending messages to ‘C’, meaning that ‘receive’ events might still be lost in the window between steps 2 and 3. Therefore, my suggestion does not eliminate the problem but merely mitigates it, as steps 2 and 3 do not happen atomically. I wonder whether a BIF could be provided that transfers the ownership of tracing atomically between tracers, without incurring any loss of trace events (between monitors ‘Mon_R’ and ‘Mon_C’ in my case).
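
A minimal sketch of the suspend, switch and resume sequence described above, as it might run inside ‘Mon_C’. The module name ‘handover’ and the function ‘take_over/1’ are made up for illustration, and the trace flags are simply the ones from my original question. It does not close the window between the two trace calls; it only keeps ‘C’ itself from running during it.

```erlang
-module(handover).
-export([take_over/1]).

%% Trace flags taken from the original question in this thread.
-define(FLAGS, [set_on_spawn, procs, send, 'receive']).

%% Hypothetical helper, meant to be called by the new monitor Mon_C.
%% PidC is assumed to be currently traced by Mon_R.
%% On return, the calling process is the tracer of PidC.
take_over(PidC) ->
    %% Blocks until C is actually suspended; raises badarg if C is dead.
    true = erlang:suspend_process(PidC),
    try
        %% Step 2: clear every trace flag on C, which detaches Mon_R.
        _ = erlang:trace(PidC, false, [all]),
        %% Step 3: trace C again; the caller is the default tracer.
        _ = erlang:trace(PidC, true, ?FLAGS),
        ok
    after
        %% Always resume C, even if one of the trace calls fails.
        erlang:resume_process(PidC)
    end.
```

‘Mon_C’ would call handover:take_over(Pid_C) once, right after it is spawned. Messages sent to ‘C’ by other processes between the two erlang:trace/3 calls can still go untraced.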

FYI, much of the work I’ve discussed has already been published in an earlier paper of ours, which can be found here: http://staff.um.edu.mt/afra1/papers/sefm17.pdf. If you’re interested please let me know.

Many thanks for your help!
Duncan




On 02 Oct 2019, at 09:11, Kenneth Lundin <[hidden email]> wrote:

As a follow-up to Rickard's answer, I think it would be interesting if you could explain why you want different tracers per process.
If we know what problem you want to solve, we can most probably come up with better suggestions.

I also recommend that you use tracing via the dbg module which is intended to be a more user friendly API towards tracing. The trace BIFs might give some more detailed control but dbg has support for most use cases and makes it easier to do the right thing, at least that is the intention.
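
As an illustration only, the same trace flags could be set through dbg roughly as follows. The module name ‘dbg_sketch’, the function ‘start/2’ and the Collector pid are hypothetical; the handler fun does nothing more than forward each trace event to Collector.

```erlang
-module(dbg_sketch).
-export([start/2]).

%% Hypothetical helper: trace Pid via dbg with the flags from this thread.
%% Collector is an assumed pid that wants to receive the raw trace events.
start(Pid, Collector) ->
    %% The handler is called as Handler(TraceEvent, State) and returns the new State.
    Handler = fun(Event, State) -> Collector ! Event, State end,
    %% Start the dbg tracer process with our handler.
    {ok, _Tracer} = dbg:tracer(process, {Handler, undefined}),
    %% dbg:p/2 accepts the same flag names as erlang:trace/3.
    {ok, _Matched} = dbg:p(Pid, [set_on_spawn, procs, send, 'receive']),
    ok.
```

Tracing started this way can be switched off again with dbg:stop().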

Also worth mentioning is that the tracing mechanisms are really not intended to be used to implement functionality that is part of the application; they are intended to be used temporarily for debugging/profiling purposes. Since there is only one tracer at a time, if tracing is used as part of the "ordinary" implementation of an application, there will be conflicts as soon as any tracing or profiling is needed, and the intended functionality of the application will then probably be broken.

/Kenneth, Erlang/OTP Ericsson

On Tue, Oct 1, 2019 at 10:07 PM Rickard Green <[hidden email]> wrote:


On Mon, Sep 30, 2019 at 1:57 PM Duncan Paul Attard <[hidden email]> wrote:
>
> I am tracing an Erlang process, say, `P` by invoking the BIF `erlang:trace(Pid_P, true, [set_on_spawn, procs, send, 'receive'])` from some process. As per the Erlang docs, the latter process becomes the tracer for `P`, which I shall call `Trc_P`.
>
> Suppose now, that process `P` spawns a new process `Q`. Since the flag `set_on_spawn` was specified in the call to `erlang:trace/3` above, `Q` will automatically be traced by `Trc_P` as well.
>
> ---
>
> I want to spawn a **new** tracer, `Trc_Q`, and transfer the ownership of tracing `Q` to it, so that the resulting configuration will be that of process `P` being traced by tracer `Trc_P`, `Q` by `Trc_Q`.
>

Unfortunately I do not have any ideas on how to accomplish this.

> However, Erlang permits **at most** one tracer per process, so I cannot achieve said configuration by invoking `erlang:trace(Pid_Q, true, ..)` from `Trc_Q`. The only way possible is to do it in two steps:
>
> 1. Tracer `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
> 2. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`.
>
> In the time span between steps **1.** and **2.** above, it might be possible that trace events by process `Q` are **lost** because at that moment, there is no tracer attached. One way of mitigating this is to perform the following:
>
> 1. Suspend process `Q` by calling `erlang:suspend_process(Pid_Q)` from `Trc_Q` (note that as per Erlang docs, `Trc_Q` remains blocked until `Q` is eventually suspended by the VM);
> 2. `Trc_Q` calls `erlang:trace(Pid_Q, false, ..)` to stop `Trc_P` from tracing `Q`;
> 3. `Trc_Q` calls `erlang:trace(Pid_Q, true, ..)` again to start tracing `Q`;
> 4. Finally, `Trc_Q` calls `erlang:resume_process(Pid_Q)` so that `Q` can continue executing.
>
> From what I was able to find out, while `Q` is suspended, messages sent to it are queued, and when resumed, `Trc_Q` receives the `{trace, Pid_Q, receive, Msg}` trace events accordingly without any loss.
>

This is not a feature, it is a bug (introduced in erts 10.0, OTP 21.0) that will be fixed. The trace message should have been delivered even though the receiver was suspended.

You cannot even rely on this behavior while this bug is present. If you (or any process in the system) send the suspended process a non-message signal (monitor, demonitor, link, unlink, exit, process_info, ...), the bug will be bypassed and the trace message will be delivered.

> However, I am hesitant to use suspend/resume, since the Erlang docs explicitly say that these are to be used for *debugging purposes only*.

Mission accomplished! :-)

> Any idea as to why this is the case?
>

The language was designed with other communication primitives intended for use. Suspend/Resume was explicitly introduced for debugging purposes only, and not for usage by ordinary Erlang programs. They will most likely not disappear, but debug functionality in general is not treated as carefully by us at OTP as other ordinary functionality with regard to compatibility, etc. For example, we removed the automatic deadlock prevention in suspend_process() that existed prior to erts 10.0 for performance reasons.

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions