Abandoned (ranch) connection processes?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Abandoned (ranch) connection processes?

Roger Lipscombe-2
I'm investigating a problem where I've run out of ranch connections
(I've got max_connections set to 20000).

I'm using a gen_server with {active, once} handling, with ranch_ssl.
As far as I can tell the sequence goes like this:

1. ranch calls my_protocol:start_link.
2. my_protocol:start_link calls proc_lib:spawn_link.
3. my_protocol:init calls ranch:accept_ack.
4. my_protocol:init calls Transport:setopts(Socket, [{active, once}]).
I'm *not* verifying the result.
5. my_protocol:init calls gen_server:enter_loop.

The client, apparently, never sends any data. The socket is, by the
time I examine it, closed (I get {error, close} when I prod it with
ssl:send). It doesn't appear in inet:i().

I never see an {ssl_closed, ...} message in my protocol handler.

It's too early in my application protocol to turn on application level
keep-alives, but I've got TCP keepalive enabled (afaict), and it
doesn't address this problem, apparently.

The only thing I can think of is that the socket is being closed
between ranch:accept_ack and Transport:setopts, and Erlang's not
sending the ssl_closed message. Does this sound likely? How do I deal
with this?

Erlang/OTP-19.3.6.4, incidentally.

Cheers,
Roger.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Abandoned (ranch) connection processes?

Dmitry Kolesnikov-2
Hello,

On 15 Feb 2018, at 13.08, Roger Lipscombe <[hidden email]> wrote:

The only thing I can think of is that the socket is being closed
between ranch:accept_ack and Transport:setopts, and Erlang's not
sending the ssl_closed message. Does this sound likely? How do I deal
with this?

No, it does not sound likely! The bug is either at ranch or your code. 

I think you should try to verify the result of each socket operation before going further on. I am referring here to your statement: I'm *not* verifying the result.


Best Regards, 
Dmitry

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Abandoned (ranch) connection processes?

Roger Lipscombe-2
OK. Let me rephrase that:

- {active, once} obviously has something in place to handle data
arriving and closed sockets *in between* calls to {active, once} --
i.e. it'll be {active, false} for a brief interval. I last looked at
this code in 17.x (before the gen_statem refactoring), so I'm not sure
where it lives now.
- does it deal correctly with closed sockets that close before the
*first* call to {active, once}? In other words: can I expect an
ssl_closed message in this case? Is there something special about the
first call?

On 15 February 2018 at 11:15, Dmitry Kolesnikov <[hidden email]> wrote:

> Hello,
>
> On 15 Feb 2018, at 13.08, Roger Lipscombe <[hidden email]> wrote:
>
> The only thing I can think of is that the socket is being closed
> between ranch:accept_ack and Transport:setopts, and Erlang's not
> sending the ssl_closed message. Does this sound likely? How do I deal
> with this?
>
>
> No, it does not sound likely! The bug is either at ranch or your code.
>
> I think you should try to verify the result of each socket operation before
> going further on. I am referring here to your statement: I'm *not* verifying
> the result.
>
>
> Best Regards,
> Dmitry
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Abandoned (ranch) connection processes?

Roger Lipscombe-2
Related: this particular set of servers was suffering previously from
the bug fixed in
https://github.com/erlang/otp/commit/a5434a323afb1972195aa5f55b4894595df2c24f.

We have a *lot* of battery-powered, Wifi-connected devices. It only
takes a small fraction of a percent of them having a bad day to
exercise all kinds of error conditions.

Is it possible that there's another path through here that's not
cleaning up properly?

On 15 February 2018 at 11:23, Roger Lipscombe <[hidden email]> wrote:

> OK. Let me rephrase that:
>
> - {active, once} obviously has something in place to handle data
> arriving and closed sockets *in between* calls to {active, once} --
> i.e. it'll be {active, false} for a brief interval. I last looked at
> this code in 17.x (before the gen_statem refactoring), so I'm not sure
> where it lives now.
> - does it deal correctly with closed sockets that close before the
> *first* call to {active, once}? In other words: can I expect an
> ssl_closed message in this case? Is there something special about the
> first call?
>
> On 15 February 2018 at 11:15, Dmitry Kolesnikov <[hidden email]> wrote:
>> Hello,
>>
>> On 15 Feb 2018, at 13.08, Roger Lipscombe <[hidden email]> wrote:
>>
>> The only thing I can think of is that the socket is being closed
>> between ranch:accept_ack and Transport:setopts, and Erlang's not
>> sending the ssl_closed message. Does this sound likely? How do I deal
>> with this?
>>
>>
>> No, it does not sound likely! The bug is either at ranch or your code.
>>
>> I think you should try to verify the result of each socket operation before
>> going further on. I am referring here to your statement: I'm *not* verifying
>> the result.
>>
>>
>> Best Regards,
>> Dmitry
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Abandoned (ranch) connection processes?

Fred Hebert-2
In reply to this post by Roger Lipscombe-2
On 02/15, Roger Lipscombe wrote:

>OK. Let me rephrase that:
>
>- {active, once} obviously has something in place to handle data
>arriving and closed sockets *in between* calls to {active, once} --
>i.e. it'll be {active, false} for a brief interval. I last looked at
>this code in 17.x (before the gen_statem refactoring), so I'm not sure
>where it lives now.
>- does it deal correctly with closed sockets that close before the
>*first* call to {active, once}? In other words: can I expect an
>ssl_closed message in this case? Is there something special about the
>first call?
>

Nothing like a quick experiment to show:

    5> {ok, S} = ssl:connect("google.com", 443, [{active, false}]).
    {ok,{sslsocket,{gen_tcp,#Port<0.953>,tls_connection,
                            undefined},
                   <0.91.0>}}
    6> ssl:close(S).
    ok
    7> ssl:setopts(S, [{active,once}]).
    {error,closed}
    8> flush().
    ok

It looks like you do have to check the result of setopts to find that
the connections has failed early.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Abandoned (ranch) connection processes?

Dmitry Kolesnikov-2
In reply to this post by Roger Lipscombe-2
Hello,

As far, I’ve understood from you explantation... We have a following “sequence” of operations when accepting socket and upgrading a connection to ssl:

A. ssl:transport_accept(...)
B. ssl:ssl_accept(...)
C. ssl:setopts(…, [{active, once}])

Before (C) socket in passive mode. No one guarantees that {ssl_closed, …} is delivered to you. Therefore, you need to check error codes of each socket operation and react in correspond manner. E.g. if connection is dropped between B and C then return you an error.  

— CLIP — 

When an SSL socket is in active mode (the default), data from the socket is delivered to the owner of the socket in the form of messages:

{ssl, Socket, Data}

{ssl_closed, Socket}

{ssl_error, Socket, Reason}

— CLIP — 
 
Best Regards, 
Dmitry


On 15 Feb 2018, at 13.23, Roger Lipscombe <[hidden email]> wrote:

OK. Let me rephrase that:

- {active, once} obviously has something in place to handle data
arriving and closed sockets *in between* calls to {active, once} --
i.e. it'll be {active, false} for a brief interval. I last looked at
this code in 17.x (before the gen_statem refactoring), so I'm not sure
where it lives now.
- does it deal correctly with closed sockets that close before the
*first* call to {active, once}? In other words: can I expect an
ssl_closed message in this case? Is there something special about the
first call?

On 15 February 2018 at 11:15, Dmitry Kolesnikov <[hidden email]> wrote:
Hello,

On 15 Feb 2018, at 13.08, Roger Lipscombe <[hidden email]> wrote:

The only thing I can think of is that the socket is being closed
between ranch:accept_ack and Transport:setopts, and Erlang's not
sending the ssl_closed message. Does this sound likely? How do I deal
with this?


No, it does not sound likely! The bug is either at ranch or your code.

I think you should try to verify the result of each socket operation before
going further on. I am referring here to your statement: I'm *not* verifying
the result.


Best Regards,
Dmitry


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Abandoned (ranch) connection processes?

zxq9-2
In reply to this post by Fred Hebert-2
On 2018年2月15日木曜日 6時56分23秒 JST Fred Hebert wrote:
> It looks like you do have to check the result of setopts to find that
> the connections has failed early.

Yet another example where making every line an assertion (if not an assignment) has helped me discover this exact situation in the past.

  ok = ssl:setopts(Socket, [{active, once}]),

Doesn't let things progress along quietly.

-Craig
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Abandoned (ranch) connection processes?

Loïc Hoguin-3
In reply to this post by Fred Hebert-2
On 02/15/2018 12:56 PM, Fred Hebert wrote:

> On 02/15, Roger Lipscombe wrote:
>> OK. Let me rephrase that:
>>
>> - {active, once} obviously has something in place to handle data
>> arriving and closed sockets *in between* calls to {active, once} --
>> i.e. it'll be {active, false} for a brief interval. I last looked at
>> this code in 17.x (before the gen_statem refactoring), so I'm not sure
>> where it lives now.
>> - does it deal correctly with closed sockets that close before the
>> *first* call to {active, once}? In other words: can I expect an
>> ssl_closed message in this case? Is there something special about the
>> first call?
>>
>
> Nothing like a quick experiment to show:
>
>     5> {ok, S} = ssl:connect("google.com", 443, [{active, false}]).
>     {ok,{sslsocket,{gen_tcp,#Port<0.953>,tls_connection,
>                             undefined},
>                    <0.91.0>}}
>     6> ssl:close(S).
>     ok
>     7> ssl:setopts(S, [{active,once}]).
>     {error,closed}
>     8> flush().
>     ok
>
> It looks like you do have to check the result of setopts to find that
> the connections has failed early.

This isn't the same. For example if I do this with gen_tcp I get an
{error,einval} indicating my code is wrong.

12> {ok, S} = gen_tcp:connect("google.com", 80, [{active, false}]).
{ok,#Port<0.474>}
13> gen_tcp:close(S).
ok
14> inet:setopts(S, [{active, once}]).
{error,einval}
15> flush().
ok

If instead of closing myself I wait for the remote endpoint to close the
connection, I will not get an error from setopts:

17> {ok, S} = gen_tcp:connect("google.com", 80, [{active, false}]).
{ok,#Port<0.475>}
18> inet:setopts(S, [{active, once}]).
ok
19> flush().
ok
20> gen_tcp:send(S, "Hello\r\n").
{error,closed}

This contrasts with the behavior from ssl which does tell you the socket
is closed earlier, but doesn't seem to be something consistent across
both transports.

So if your code can use both transports and are not sending any data
upon opening the connection you will need to have a timeout of some kind
to detect that the socket got closed.

Cheers,

--
Loïc Hoguin
https://ninenines.eu
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Abandoned (ranch) connection processes?

Roger Lipscombe-2
In reply to this post by Fred Hebert-2
It's a server, not a client, which might make some difference. So
here's my repro:

    Port = 12090.
    CertFile = CertPath ++ "server.pem".
    KeyFile = CertPath ++ "server.key".

    {ok, _} = application:ensure_all_started(ssl).

    LOpts = [binary, {active, false}, {packet, raw}, {reuseaddr,
true}, {nodelay, true},
                  {certfile, CertFile},
                  {keyfile, KeyFile}].

    {ok, LSocket} = ssl:listen(Port, LOpts).
    {ok, CSocket} = ssl:transport_accept(LSocket, infinity).
    ok = ssl:ssl_accept(CSocket, infinity).

    % At this point:
    % openssl s_client -connect localhost:12090 -CAfile $CAFile

    ok = ssl:setopts(CSocket, [{active, once}]).
    flush().

If I kill openssl (with Ctrl+C) before the setopts call, it returns
ok, and I get the ssl_closed message.


On 15 February 2018 at 11:56, Fred Hebert <[hidden email]> wrote:

> On 02/15, Roger Lipscombe wrote:
>>
>> OK. Let me rephrase that:
>>
>> - {active, once} obviously has something in place to handle data
>> arriving and closed sockets *in between* calls to {active, once} --
>> i.e. it'll be {active, false} for a brief interval. I last looked at
>> this code in 17.x (before the gen_statem refactoring), so I'm not sure
>> where it lives now.
>> - does it deal correctly with closed sockets that close before the
>> *first* call to {active, once}? In other words: can I expect an
>> ssl_closed message in this case? Is there something special about the
>> first call?
>>
>
> Nothing like a quick experiment to show:
>
>    5> {ok, S} = ssl:connect("google.com", 443, [{active, false}]).
>    {ok,{sslsocket,{gen_tcp,#Port<0.953>,tls_connection,
>                            undefined},
>                   <0.91.0>}}
>    6> ssl:close(S).
>    ok
>    7> ssl:setopts(S, [{active,once}]).
>    {error,closed}
>    8> flush().
>    ok
>
> It looks like you do have to check the result of setopts to find that the
> connections has failed early.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Abandoned (ranch) connection processes?

Roger Lipscombe-2
In reply to this post by zxq9-2
On 15 February 2018 at 12:04,  <[hidden email]> wrote:
> Yet another example where making every line an assertion (if not an assignment) has helped me discover this exact situation in the past.
>
>   ok = ssl:setopts(Socket, [{active, once}]),
>
> Doesn't let things progress along quietly.

...except it doesn't work. It returns 'ok' even if the *peer* closed
the socket. See my later repro.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Abandoned (ranch) connection processes?

Roger Lipscombe-2
In reply to this post by Roger Lipscombe-2
OK. I did some spelunking and it seems to be strongly correlated with
the appearance of the following in my logs:

SSL: {connection,{alert,2,20,{"tls_record.erl",488},undefined}}:
ssl_connection.erl:861:Fatal error: unexpected message

If I find the "abandoned" ranch protocol processes, and the associated
SSL pid, and then look for it in my log files (with log_alerts
enabled), I get one of the above messages for every one I've looked at
so far.

On 15 February 2018 at 11:55, Roger Lipscombe <[hidden email]> wrote:

> Related: this particular set of servers was suffering previously from
> the bug fixed in
> https://github.com/erlang/otp/commit/a5434a323afb1972195aa5f55b4894595df2c24f.
>
> We have a *lot* of battery-powered, Wifi-connected devices. It only
> takes a small fraction of a percent of them having a bad day to
> exercise all kinds of error conditions.
>
> Is it possible that there's another path through here that's not
> cleaning up properly?
>
> On 15 February 2018 at 11:23, Roger Lipscombe <[hidden email]> wrote:
>> OK. Let me rephrase that:
>>
>> - {active, once} obviously has something in place to handle data
>> arriving and closed sockets *in between* calls to {active, once} --
>> i.e. it'll be {active, false} for a brief interval. I last looked at
>> this code in 17.x (before the gen_statem refactoring), so I'm not sure
>> where it lives now.
>> - does it deal correctly with closed sockets that close before the
>> *first* call to {active, once}? In other words: can I expect an
>> ssl_closed message in this case? Is there something special about the
>> first call?
>>
>> On 15 February 2018 at 11:15, Dmitry Kolesnikov <[hidden email]> wrote:
>>> Hello,
>>>
>>> On 15 Feb 2018, at 13.08, Roger Lipscombe <[hidden email]> wrote:
>>>
>>> The only thing I can think of is that the socket is being closed
>>> between ranch:accept_ack and Transport:setopts, and Erlang's not
>>> sending the ssl_closed message. Does this sound likely? How do I deal
>>> with this?
>>>
>>>
>>> No, it does not sound likely! The bug is either at ranch or your code.
>>>
>>> I think you should try to verify the result of each socket operation before
>>> going further on. I am referring here to your statement: I'm *not* verifying
>>> the result.
>>>
>>>
>>> Best Regards,
>>> Dmitry
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions