Request for next net_kernel release: new switch "net_setuptime_millis"


Reto Kramer-2
I found the source of the 7s ping/net_connect delay I was seeing when
pinging (or multi_call'ing) a node whose connection had timed out
(see net_kernel excerpt below).

I'm glad it's configurable (thanks for the foresight, whoever did it!),
however I need it smaller than 1s, which is the current minimum I can
set.

Is there any chance that in the next release of the net_kernel we could
add a new (equally undocumented, i.e. unsupported) switch:
net_setuptime_millis?  I suggest the net_setuptime switch dominates if
present, to maintain backwards compatibility.

Funnily enough, when the node in question (say c) was never started in the
first place, a multi_call to [a,b,c] does not suffer from the
connection setup delay; it's only because a connection (that timed out)
had been set up initially that I see the kernel setuptime timeout/delay.

Perhaps someone could educate me here - it seems that perhaps not all
the data structures associated with the timed out connection are
cleaned up (i.e. behavior is different from when the connection has
never been created in the first place).

Thanks,
- Reto

PS: in the meantime, I'll resort to removing the failed node from the
nodeset I multicall to, and then re-add it using a multicast based
discovery protocol (the latter I have anyway).  It would be nice if my
app could be naive about all that and just blindly multi_call to a node
set.
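
A minimal sketch of the workaround described in the PS, assuming that
membership in nodes() is an acceptable stand-in for the multicast
discovery protocol (which is not shown in this thread). The module and
function names are hypothetical; only nodes/0 and rpc:multicall/5 are
standard calls.

```erlang
-module(live_multicall).
-export([live_nodes/1, multicall_live/5]).

%% Keep only the nodes we are currently connected to (plus the local node).
%% Assumption: "connected" is a good enough proxy for "worth calling".
live_nodes(Nodes) ->
    Connected = [node() | nodes()],
    [N || N <- Nodes, lists:member(N, Connected)].

%% multicall to the live subset only; nodes that were skipped are
%% reported together with the bad nodes returned by rpc:multicall/5.
multicall_live(Nodes, M, F, A, Timeout) ->
    Live = live_nodes(Nodes),
    Skipped = Nodes -- Live,
    {Replies, BadNodes} = rpc:multicall(Live, M, F, A, Timeout),
    {Replies, BadNodes ++ Skipped}.
```

Note that a suspended node may still appear in nodes() until its
connection has actually been dropped, so this only avoids the setup
delay for nodes whose connection has already timed out.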

--------- net_kernel: ---------
[...]
%% Default connection setup timeout in milliseconds.
%% This timeout is set for every distributed action during
%% the connection setup.
-define(SETUPTIME, 7000).
[...]
connecttime() ->
     case application:get_env(kernel, net_setuptime) of
        {ok, Time} when integer(Time), Time > 0, Time < 120 ->
            Time * 1000;
        _ ->
            ?SETUPTIME
     end.
[...]
--------- net_kernel: ---------

______________________
An engineer can do for a dime what any fool can do for a dollar.
   -- unknown


Patrik Nyblom
Hi,

Well, the reason for the 7s delay is simply that a suspended erlang node
is still in some sense "alive". The listen socket is still listening and
it's still registered with its local epmd. On the other hand, if the
node is completely halted, epmd has it unregistered and connection
attempts fail instantly. Here is what happens for a suspended node (the
suspended node is named b, and a connects to it):

a issues a request for b to epmd on host2. The request gets
answered immediately; the answer is a port number.

a connects to the given port (which succeeds as TCP/IP and the
socket interface works that way).

a sends the initial handshake string

a waits the stipulated timeout for the answer.

-> pang after 7 s

If the network cable is cut to host2, we would instead get:

a tries to connect to epmd on host2. The connection attempt fails
after the stipulated timeout.

-> pang after 7 s

On the other hand, if the node has never been alive:

a issues a request to epmd which immediately answers that there is
no such node.

-> pang after a few millis.

So, it's not that the distribution in node a keeps any data; it's simply
that we have a timeout situation when a node is suspended, regardless of
whether we have known the node before or not.
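
The three cases above can be told apart by timing a ping. The following
is only a sketch: the module and function names are hypothetical, while
timer:tc/1 and net_adm:ping/1 are standard calls. The elapsed time
depends on the network and on the configured setup time, so no
particular figure is promised here.

```erlang
-module(ping_timing).
-export([timed_ping/1]).

%% Ping Node and report how long the attempt took.
%% Returns {pong | pang, ElapsedMilliseconds}.
timed_ping(Node) ->
    %% timer:tc/1 returns {Microseconds, ReturnValueOfFun}.
    {Micros, Result} = timer:tc(fun() -> net_adm:ping(Node) end),
    {Result, Micros div 1000}.
```

A pang that arrives after roughly the setup time points at a suspended
node or an unreachable host; a pang after a few milliseconds points at a
node that is not registered with epmd.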

TCP/IP is unfortunately not a protocol made for realtime applications...
That's where all this long timeout hassle comes from. A better network
protocol would make things much more fun for Erlang, no ticking, no
timing out etc... A reliable protocol that monitors the links... a
protocol designed not for web browsing but for telecom... Sigh...

Timeouts shorter than 1 s could be useful though, I agree. I'll add the
possibility to define fractions of seconds to the configuration.

Cheers,
/Patrik






Sean Hinde-2

On 9 Jan 2004, at 15:03, Patrik Nyblom wrote:
> TCP/IP is unfortunately not a protocol made for realtime
> applications... That's where all this long timeout hassle comes from. A
> better network protocol would make things much more fun for Erlang, no
> ticking, no timing out etc... A reliable protocol that monitors the
> links... a protocol designed not for web browsing but for telecom...
> Sigh...  Timeouts shorter than 1 s could be useful though, I agree.
> I'll add the possibility to define fractions of seconds to the
> configuration.

sctp is supported natively in the 2.6.x Linux kernels. Yay!

Sean




Michał Ptaszek
Hi Sean,

Are you using sctp in Erlang-node communication?


thanks,
Eduardo Figoli
INSwitch Solutions






Bruce Fitzsimons-2
In reply to this post by Sean Hinde-2
Sean Hinde wrote:

> On 9 Jan 2004, at 15:03, Patrik Nyblom wrote:
>
>> TCP/IP is unfortunately not a protocol made for realtime
>> applications... That's where all this long timeout hassle comes from.
>> A better network protocol would make things much more fun for Erlang,
>> no ticking, no timing out etc... A reliable protocol that monitors
>> the links... a protocol designed not for web browsing but for
>> telecom... Sigh...  Timeouts shorter than 1 s could be useful though,
>> I agree. I'll add the possibility to define fractions of seconds to
>> the configuration.
>
>
> sctp is supported natively in the 2.6.x Linux kernels. Yay!
>
It's also rumoured to be in Solaris 10, although I haven't yet downloaded
or installed the beta to check. Double yay. Hopefully with the new
(err...old) poll/select type interface.

/Bruce







Patrik Nyblom
In reply to this post by Reto Kramer-2
Hi again,

Here's a patch to make it possible to reduce the setuptime by using
fractions of seconds (like 'erl -sname x -kernel net_setuptime 0.25').

Please observe that the result might be that, on a slow network, perfectly
reachable nodes give connection failures if this parameter is set too low.

It all depends on your network, so use with care :-)
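
For reference, the same setting can be given in a configuration file
instead of on the command line. This is a sketch only: it assumes a
net_kernel that includes the patch below (or a later release that
accepts fractional values), and the file name sys.config is merely a
common convention, not something stated in this thread.

```erlang
%% sys.config (hypothetical file name), loaded with: erl -sname x -config sys
%% Sets the connection setup timeout to 0.25 s, as in the
%% command-line example above.
[{kernel, [{net_setuptime, 0.25}]}].
```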

-------------------------------------------------
*** net_kernel.erl.orig Mon Jan 12 09:56:49 2004
--- net_kernel.erl      Mon Jan 12 11:02:47 2004
***************
*** 1336,1343 ****
 
  connecttime() ->
      case application:get_env(kernel, net_setuptime) of
!       {ok, Time} when integer(Time), Time > 0, Time < 120 ->
            Time * 1000;
        _ ->
            ?SETUPTIME
      end.
--- 1336,1345 ----
 
  connecttime() ->
      case application:get_env(kernel, net_setuptime) of
!       {ok, Time} when is_integer(Time), Time > 0, Time < 120 ->
            Time * 1000;
+       {ok, Time} when is_float(Time), Time > 0, Time < 120 ->
+           round(Time * 1000);
        _ ->
            ?SETUPTIME
      end.
-----------------------------------------------------------------------
The code is committed for R9C-1.
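
To see which timeouts the patched guards produce, the same clauses can
be lifted into a standalone function that takes the configured value as
an argument. This is an illustrative sketch, not the net_kernel code
itself: the module name and to_millis/1 are hypothetical, and only the
guards and the 7000 ms default are taken from the patch and the excerpt
quoted earlier in the thread.

```erlang
-module(setuptime_sketch).
-export([to_millis/1]).

%% Default connection setup timeout in milliseconds (from net_kernel).
-define(SETUPTIME, 7000).

%% Takes what application:get_env(kernel, net_setuptime) would return
%% and converts it to milliseconds using the patched guards.
to_millis({ok, Time}) when is_integer(Time), Time > 0, Time < 120 ->
    Time * 1000;
to_millis({ok, Time}) when is_float(Time), Time > 0, Time < 120 ->
    round(Time * 1000);
to_millis(_) ->
    %% Unset, non-numeric, or out-of-range values fall back to the default.
    ?SETUPTIME.
```

With these guards, {ok, 0.25} gives 250 and {ok, 2} gives 2000, while
undefined, {ok, 0} and {ok, 120} all fall back to 7000.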

Cheers,
/Patrik
