Hang in ssl_connection:call/2 with ranch

Roger Lipscombe-2
(OTP-21.0)

I'm looking at a deadlock on one of our servers (fortunately only in
staging bring-up, so it's not bothering anyone yet).

I've got a ranch protocol handler blocked in ranch:accept_ack, waiting
for a 'shoot' message. That's never being sent because the
ranch_conns_sup process is blocked in ssl:controlling_process ->
ssl_connection:new_user -> ssl_connection:call ->
gen_statem:call_dirty -> gen:do_call.

This is *not* happening on any of our other servers. If I restart the
node, it happens again when a client connects.

It's also (afaict) only affecting two of the configured ranch
listeners; the other two appear to be fine.

I've got a crash dump: all I can see is the ranch_conns_sup process is
blocking in gen_statem:call_dirty but the receiving process appears
(afaict) to be happily sitting in gen_statem:loop_receive, with
message_queue_len = 0, so I don't know why the call's not completing.

Any ideas? What else can I look at, assuming it continues to happen?

(OTP-21.0, ranch 1.3.2)
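
For anyone seeing the same symptom on a live node rather than in a crash dump, the two processes described above can be inspected directly. This is a minimal sketch, not something from ranch or OTP: the module name hang_inspect and the function inspect/1 are hypothetical, and Pid is assumed to be either the ranch_conns_sup pid or the pid of the TLS connection process it is calling.

```erlang
-module(hang_inspect).
-export([inspect/1]).

%% Hypothetical helper (not part of ranch or OTP).
%% Returns a snapshot of a process that appears to be stuck:
%% where it is currently executing, how many messages are queued,
%% its scheduling status, and which processes link to or monitor it.
%% Returns 'undefined' if the process is no longer alive.
inspect(Pid) when is_pid(Pid) ->
    erlang:process_info(Pid, [current_stacktrace,
                              message_queue_len,
                              status,
                              links,
                              monitored_by]).
```

Calling hang_inspect:inspect/1 on both the caller and the process it is calling shows whether the callee is being monitored by the caller at all, which is one way to check that the call went to the process you think it did.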
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Hang in ssl_connection:call/2 with ranch

Roger Lipscombe-2
I found the underlying problem: the cert/key files are busted: <<"\n">>.

The failure mode is ... surprising, though.
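
Files like that can be caught before the listener starts. A minimal sketch, assuming the cert and key are PEM files on disk; the module name pem_check and the no_pem_entries atom are made up for illustration. It relies on public_key:pem_decode/1 returning an empty list when the input holds no PEM entries, which should be the case for a file containing only a newline.

```erlang
-module(pem_check).
-export([check/1]).

%% Hypothetical pre-flight check (not part of ranch or ssl).
%% Returns ok if the file at Path contains at least one PEM entry,
%% {error, no_pem_entries} if it can be read but holds none,
%% or {error, Reason} if the file cannot be read.
check(Path) ->
    case file:read_file(Path) of
        {ok, Bin} ->
            case public_key:pem_decode(Bin) of
                [] -> {error, no_pem_entries};
                [_ | _] -> ok
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

This only confirms that something PEM-shaped is present; it does not verify that the certificate and key match or are valid.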

On Fri, 26 Apr 2019 at 21:35, Roger Lipscombe <[hidden email]> wrote:
> I'm looking at a deadlock on one of our servers (fortunately only in
> staging bring-up, so it's not bothering anyone yet).
> [...]

Re: Hang in ssl_connection:call/2 with ranch

Loïc Hoguin-3
Hey,

It was solved in 21.1: https://bugs.erlang.org/browse/ERL-664

Cheers,

On 29/04/2019 14:40, Roger Lipscombe wrote:

> I found the underlying problem: the cert/key files are busted: <<"\n">>.
>
> The failure mode is ... surprising, though.

--
Loïc Hoguin
https://ninenines.eu

Re: Hang in ssl_connection:call/2 with ranch

Roger Lipscombe-2
Thanks. I'll get us upgraded.

On Mon, 29 Apr 2019 at 13:47, Loïc Hoguin <[hidden email]> wrote:

>
> Hey,
>
> It was solved in 21.1: https://bugs.erlang.org/browse/ERL-664
>
> Cheers,