|
|
I've got a gen_server:call that -- very occasionally -- suffers from a
timeout. Obviously this means that my gen_server is already busy doing
something else.
Is there a way that I can instrument gen_server so that it will log
something if it takes too long to return from a callback? That is: my
handle_call is timing out because, presumably, there's another
handle_call (or handle_info, etc.) that's blocking. I'd like *that* to
be logged if it takes longer than, say, 200 milliseconds to return.
Or, I guess, if the message queue length is excessive at any point.
Caveat: I've got ~20,000 of these gen_server processes, and this only
happens intermittently in production, so I'm thinking that tracing
*isn't* the right answer here.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
|
|
On Thu, Sep 19, 2019 at 1:33 PM Roger Lipscombe < [hidden email]> wrote:
>
> I've got a gen_server:call that -- very occasionally -- suffers from a
> timeout. Obviously this means that my gen_server is already busy doing
> something else.
>
> Is there a way that I can instrument gen_server so that it will log
> something if it takes too long to return from a callback? That is: my
> handle_call is timing out because, presumably, there's another
> handle_call (or handle_info, etc.) that's blocking. I'd like *that* to
> be logged if it takes longer than, say, 200 milliseconds to return.
>
> Or, I guess, if the message queue length is excessive at any point.
>
> Caveat: I've got ~20,000 of these gen_server processes, and this only
> happens intermittently in production, so I'm thinking that tracing
> *isn't* the right answer here.
Idea from an ignorant (of built-in features) point of view: call handler
begins by sends a `{call_started, ID, Other_Interesting_Info}` to a
bookkeeping process, then, before replying, `{call_completed, ID}`.
Bookkeeper thus always knows the current in-progress calls and the stats
for completed ones, so it can now log whichever ones violated your
ostensible SLA. This sounds awfully similar to just posting timer
start/end messages to StatsD or some such, but is more flexible.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
|
|
On Thu, Sep 19, 2019 at 7:33 PM Roger Lipscombe <[hidden email]> wrote:
Is there a way that I can instrument gen_server so that it will log
something if it takes too long to return from a callback? That is: my
handle_call is timing out because, presumably, there's another
handle_call (or handle_info, etc.) that's blocking. I'd like *that* to
be logged if it takes longer than, say, 200 milliseconds to return.
The calling process is terminated by means of an exit signal, by calling `exit(timeout)`. You can capture this with an exception handler and then use erlang:process_info/2 to capture the gen_servers state when this happens (or you can try sys:get_state if it does return at some point). Essentially you turn your calling process into the logger. This works insofar there are relatively few of these errors, and it can explain what the called process is sitting and doing while this happens.
One thing to be aware of is that if you capture the timeout, then the real reply message might arrive late in your mailbox. You will have to handle this accordingly, should you opt to make the process continue.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
|
|
Copy paste from our video_frame.erl:
send_frame(Stream, #video_frame{content = Content, codec = Codec, flavor = Flavor, dts = DTS} = Frame, SourceTag) when is_pid(Stream) -> Frame1 = case SourceTag of undefined -> Frame; _ -> Frame#video_frame{source = SourceTag} end, try gen_server:call(Stream, Frame1) catch exit:{normal, _} -> {error, stopped}; exit:{noproc, _} -> {error, stopped}; exit:{shutdown, _} -> {error, stopped}; exit:{timeout, _}:Stack -> Dict = case process_info(Stream, dictionary) of {dictionary, Dict_} -> Dict_; undefined -> [] end, RemoteStack = process_info(Stream, current_stacktrace), Name = case proplists:get_value(name, Dict, <<"dead stream">>) of N when is_binary(N) -> N; N when is_atom(N) -> N; N -> iolist_to_binary(io_lib:format("~p", [N])) end, Status = proplists:get_value(status, Dict), case Status of S when S == starting_transcoder -> ok; _ -> events:error("failed to send frame ~p,~p,~p,~p, source_tag=~p to ~s (~p) in status ~p\n~p\nremote stack:\n~p", [ Content, Codec, Flavor, round(DTS), Frame1#video_frame.source, Name, Stream, Status, Stack, RemoteStack]) end, % [io:format("~10.. s: ~p~n", [K,V]) || {K,V} <- process_info(Stream)] {error, timeout} end.
this code is helping us for years. You should remove video specific stuff.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
|
|
Er, so I neglected to mention that the gen_server is remote (same
host, different node), so process_info doesn't work :(
On Fri, 20 Sep 2019 at 01:45, Max Lapshin < [hidden email]> wrote:
>
> Copy paste from our video_frame.erl:
>
>
> send_frame(Stream, #video_frame{content = Content, codec = Codec, flavor = Flavor, dts = DTS} = Frame, SourceTag) when is_pid(Stream) ->
> Frame1 = case SourceTag of
> undefined -> Frame;
> _ -> Frame#video_frame{source = SourceTag}
> end,
> try gen_server:call(Stream, Frame1)
> catch
> exit:{normal, _} ->
> {error, stopped};
> exit:{noproc, _} ->
> {error, stopped};
> exit:{shutdown, _} ->
> {error, stopped};
> exit:{timeout, _}:Stack ->
> Dict = case process_info(Stream, dictionary) of
> {dictionary, Dict_} -> Dict_;
> undefined -> []
> end,
> RemoteStack = process_info(Stream, current_stacktrace),
> Name = case proplists:get_value(name, Dict, <<"dead stream">>) of
> N when is_binary(N) -> N;
> N when is_atom(N) -> N;
> N -> iolist_to_binary(io_lib:format("~p", [N]))
> end,
> Status = proplists:get_value(status, Dict),
> case Status of
> S when S == starting_transcoder -> ok;
> _ -> events:error("failed to send frame ~p,~p,~p,~p, source_tag=~p to ~s (~p) in status ~p\n~p\nremote stack:\n~p", [
> Content, Codec, Flavor, round(DTS), Frame1#video_frame.source,
> Name, Stream, Status, Stack, RemoteStack])
> end,
> % [io:format("~10.. s: ~p~n", [K,V]) || {K,V} <- process_info(Stream)]
> {error, timeout}
> end.
>
>
>
> this code is helping us for years. You should remove video specific stuff.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
|
|
You can use rpc for that:
rpc:call(node(RPid), erlang, process_info, [RPid, [dictionary, current_stacktrace]]).
On Fri, Sep 20, 2019 at 2:57 PM Roger Lipscombe < [hidden email]> wrote: Er, so I neglected to mention that the gen_server is remote (same
host, different node), so process_info doesn't work :(
On Fri, 20 Sep 2019 at 01:45, Max Lapshin <[hidden email]> wrote:
>
> Copy paste from our video_frame.erl:
>
>
> send_frame(Stream, #video_frame{content = Content, codec = Codec, flavor = Flavor, dts = DTS} = Frame, SourceTag) when is_pid(Stream) ->
> Frame1 = case SourceTag of
> undefined -> Frame;
> _ -> Frame#video_frame{source = SourceTag}
> end,
> try gen_server:call(Stream, Frame1)
> catch
> exit:{normal, _} ->
> {error, stopped};
> exit:{noproc, _} ->
> {error, stopped};
> exit:{shutdown, _} ->
> {error, stopped};
> exit:{timeout, _}:Stack ->
> Dict = case process_info(Stream, dictionary) of
> {dictionary, Dict_} -> Dict_;
> undefined -> []
> end,
> RemoteStack = process_info(Stream, current_stacktrace),
> Name = case proplists:get_value(name, Dict, <<"dead stream">>) of
> N when is_binary(N) -> N;
> N when is_atom(N) -> N;
> N -> iolist_to_binary(io_lib:format("~p", [N]))
> end,
> Status = proplists:get_value(status, Dict),
> case Status of
> S when S == starting_transcoder -> ok;
> _ -> events:error("failed to send frame ~p,~p,~p,~p, source_tag=~p to ~s (~p) in status ~p\n~p\nremote stack:\n~p", [
> Content, Codec, Flavor, round(DTS), Frame1#video_frame.source,
> Name, Stream, Status, Stack, RemoteStack])
> end,
> % [io:format("~10.. s: ~p~n", [K,V]) || {K,V} <- process_info(Stream)]
> {error, timeout}
> end.
>
>
>
> this code is helping us for years. You should remove video specific stuff.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
--
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
|
|
In the end, I used 'dbg'; here's the relevant snippet:
AssertElapsed =
fun(Pid, Call, Req, Then) ->
Now = erlang:monotonic_time(),
ElapsedMicros = erlang:convert_time_unit(Now - Then, native,
microsecond),
if
ElapsedMicros > 5000 ->
io:format("~s ~p(~p) took ~B us\n",
[lists:flatten(ec_date:format("c",
calendar:universal_time())), Call, Req, ElapsedMicros]);
true ->
ok
end,
{message_queue_len, QueueLen} = erlang:process_info(Pid,
message_queue_len),
if
QueueLen > 5 ->
io:format("~s ~p\n",
[lists:flatten(ec_date:format("c", calendar:universal_time())),
erlang:process_info(Pid, messages)]);
true ->
ok
end
end.
%% TState :: map(Fun -> [{Time, Arg0}]).
TState0 = #{}.
TFun =
fun(_Msg = {trace, _Pid, call, {Mod, Fun, [Arg0 | _]}}, TState) ->
% push an entry timestamp.
Call = {erlang:monotonic_time(), Arg0},
maps:update_with(Fun, fun(Calls) -> [Call | Calls] end, [Call], TState);
(_Msg = {trace, _Pid, return_from, {Mod, Fun, _Arity},
_Result}, TState) ->
% pop entry timestamp.
[{Then, Arg0} | Calls] = maps:get(Fun, TState),
AssertElapsed(Pid, Fun, Arg0, Then),
maps:update(Fun, Calls, TState);
(Msg, TState) ->
io:format("~p\n", [Msg]),
TState
end.
dbg:start().
dbg:tracer(process, {TFun, TState0}).
dbg:tpl(Mod, handle_call, '_', dbg:fun2ms(fun(_) -> return_trace() end)).
dbg:tpl(Mod, handle_info, '_', dbg:fun2ms(fun(_) -> return_trace() end)).
dbg:p(Pid, c).
In my particular case, handle_info was occasionally taking ~30
seconds, depending on the input, meaning that if a gen_server:call
occurred immediately afterwards, there was a good chance it would time
out.
Aside: I *always* forget that maps:update_with/4 doesn't apply Fun to Init.
On Thu, 19 Sep 2019 at 18:33, Roger Lipscombe < [hidden email]> wrote:
>
> I've got a gen_server:call that -- very occasionally -- suffers from a
> timeout. Obviously this means that my gen_server is already busy doing
> something else.
>
> Is there a way that I can instrument gen_server so that it will log
> something if it takes too long to return from a callback? That is: my
> handle_call is timing out because, presumably, there's another
> handle_call (or handle_info, etc.) that's blocking. I'd like *that* to
> be logged if it takes longer than, say, 200 milliseconds to return.
>
> Or, I guess, if the message queue length is excessive at any point.
>
> Caveat: I've got ~20,000 of these gen_server processes, and this only
> happens intermittently in production, so I'm thinking that tracing
> *isn't* the right answer here.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
|
|