net_kernel:start() "the name seems to be in use by another Erlang node"

Mark Sebald
I am using net_kernel:start() to start multiple nodes of the same application on the same host. Each time I start a node, I don't know how many other nodes are already running, so my idea was to add an index number to the node name and, if start() returns {already_started, pid()}, increment the index and call start() again.

Instead, when I try to start a second node with the same name, I get this:

14:14:29.425 [info] Protocol 'inet_tcp': the name my_node01@my_host seems to be in use by another Erlang node
14:14:29.425 [error] CRASH REPORT Process <0.109.0> with 0 neighbours exited with reason: {error,badarg} in gen_server:init_it/6 line 349

and an error tuple is returned.

This looks like an Erlang bug, since the info log message suggests the situation is well in hand, but then we get a crash. Is there a better way to start multiple nodes? Are there any conditions under which start() will return "already_started"?

I found this issue from 2013, https://github.com/elixir-lang/elixir/issues/1707, where an Elixir user reports the same problem and Jose Valim responds:

"I did more research on this one and unfortunately there is nothing we can do. The error comes from within the vm, before Elixir kicks in. I will open up a discussion with the OTP team. ..."

In the end, I match the case where start() returns an error and then try starting the next node in my series. This works, but it seems rather ugly. I am on OTP 20.x.
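A minimal sketch of the retry approach described above, written as a plain module. It assumes short node names (shortnames), that epmd is already running, and the OTP 20 form of net_kernel:start/1 that takes [Name, NameType]. The module indexed_node and its functions are hypothetical names for illustration. Every {error, _} other than already_started is treated as "name taken", so if epmd is down the loop simply runs through all indexes, and each failed attempt still produces the info report seen above.

```erlang
-module(indexed_node).
-export([start/2, node_name/2]).

%% Build an atom such as my_node01 from a base name and an index.
%% The index is zero-padded to two digits, so only 1..99 is accepted.
node_name(Base, Index)
  when is_list(Base), is_integer(Index), Index >= 1, Index =< 99 ->
    list_to_atom(lists:flatten(io_lib:format("~s~2..0B", [Base, Index]))).

%% Try Base01, Base02, ... up to MaxIndex until net_kernel:start/1 succeeds.
start(Base, MaxIndex) when is_integer(MaxIndex), MaxIndex >= 1, MaxIndex =< 99 ->
    try_index(Base, 1, MaxIndex).

try_index(_Base, Index, MaxIndex) when Index > MaxIndex ->
    {error, no_free_name};
try_index(Base, Index, MaxIndex) ->
    Name = node_name(Base, Index),
    case net_kernel:start([Name, shortnames]) of
        {ok, Pid} ->
            {ok, Name, Pid};
        {error, {already_started, Pid}} ->
            %% This VM is already distributed; another name will not help.
            {error, {already_started, Pid}};
        {error, _Reason} ->
            %% Usually the name is registered by another node, but the same
            %% error shape also covers "epmd is not running".
            try_index(Base, Index + 1, MaxIndex)
    end.
```

For example, indexed_node:start("my_node", 20) would try my_node01 through my_node20 and return {ok, Name, Pid} for the first name that could be registered.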

Mark

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: net_kernel:start() "the name seems to be in use by another Erlang node"

zxq9-2
On Sunday, 28 January 2018 at 15:51:07 JST, Mark Sebald wrote:

> I am using "net_kernel:start()", to start multiple nodes of the same
> application, on the same host. Each time I start a node, I don't know how
> many other nodes were already started, so my thinking was to add an index
> number to the node name, and if start() returns: "{already_started, pid()}"
> I will increment the index on the node name, and call start() again.
>
> Instead, when I try to start a second node of the same name I get this:
>
> 14:14:29.425 [info] Protocol 'inet_tcp': the name my_node01@my_host seems
> to be in use by another Erlang node
> 14:14:29.425 [error] CRASH REPORT Process <0.109.0> with 0 neighbours
> exited with reason: {error,badarg} in gen_server:init_it/6 line 349
>
> and an error tuple is returned
>
> This seems like an Erlang bug, since the info log message indicates that
> the situation is well in hand, but then we get a crash.  Is there a better
> way to start multiple nodes?  Are there any conditions where start() will
> return "already_started"?

I've not messed with this in a while, but I do recall being able to do dynamic
start names like this. I don't have a sample of the code that did it for me,
though.

In the shell, however, the following worked.

Node 1:

  Eshell V9.2  (abort with ^G)
  1> net_kernel:start([foo, longnames]).
  {ok,<0.66.0>}
  ([hidden email])2>


Node 2:

  Eshell V9.2  (abort with ^G)
  1> Start =
  1>   fun(Name1, Name2) ->
  1>     case net_kernel:start([Name1, longnames]) of
  1>       {ok, Pid} ->
  1>         {ok, Pid};
  1>       {error, Reason} ->
  1>         ok = io:format("Failed with ~tp~nTrying ~tp~n", [Reason, Name2]),
  1>         net_kernel:start([Name2, longnames])
  1>     end
  1>   end.
  #Fun<erl_eval.12.99386804>
  2> Start(foo, bar).
  Failed with {{shutdown,
                   {failed_to_start_child,net_kernel,{'EXIT',nodistribution}}},
               {child,undefined,net_sup_dynamic,
                   {erl_distribution,start_link,[[foo,longnames],false]},
                   permanent,1000,supervisor,
                   [erl_distribution]}}
  Trying bar
 
  =INFO REPORT==== 29-Jan-2018::17:48:11 ===
  Protocol 'inet_tcp': the name [hidden email] seems to be in use by another Erlang node
  {ok,<0.71.0>}
  ([hidden email])3>


The info message from SASL still pops up, but nothing crashes; the call
just returns the error tuple as expected. I'm not sure what else could
be going on in your case. I know there are a few tricky EPMD issues, but
I never seem to run into them.

-Craig

Re: net_kernel:start() "the name seems to be in use by another Erlang node"

arif-2
In reply to this post by Mark Sebald
Hi,

1. if you use longnames, you must _use_ a long name
2. epmd must be running

Eshell V9.2  (abort with ^G)
1> net_kernel:start([foo,longnames]).

=INFO REPORT==== 29-Jan-2018::12:11:46 ===
Can't set long node name!
Please check your configuration
{error,
    {{shutdown,
         {failed_to_start_child,net_kernel,{'EXIT',nodistribution}}},
     {child,undefined,net_sup_dynamic,
         {erl_distribution,start_link,[[foo,longnames],false]},
         permanent,1000,supervisor,
         [erl_distribution]}}}
2> net_kernel:start(['foo@192.168.56.101',longnames]).

=INFO REPORT==== 29-Jan-2018::12:12:07 ===
Protocol 'inet_tcp': register/listen error: econnrefused
{error,
    {{shutdown,
         {failed_to_start_child,net_kernel,{'EXIT',nodistribution}}},
     {child,undefined,net_sup_dynamic,
         {erl_distribution,start_link,
             [['foo@192.168.56.101',longnames],false]},
         permanent,1000,supervisor,
         [erl_distribution]}}}

From epmd documentation:
The daemon is started automatically by command erl(1) if the node is to
be distributed and no running instance is present.

Since we didn't start the node that way, epmd was not started
automatically. In fact:
me@edmund ~ $ ps -ef | grep epmd
me      2439  2422  0 12:12 pts/2    00:00:00 grep --colour=auto epmd

If you start a distributed node and stop it immediately, epmd will
already have been started, and it stays running:
me@edmund ~ $ erl -sname foo
Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:2:2] [ds:2:2:10]
[async-threads:10] [hipe] [kernel-poll:false]

Eshell V9.2  (abort with ^G)
(foo@edmund)1>
(foo@edmund)1>
User switch command
 --> q
me@edmund ~ $ ps -ef | grep epmd
me      2454  1521  0 12:19 ?        00:00:00 /usr/local/lib/erlang/erts-9.2/bin/epmd -daemon
me      2524  2422  0 12:20 pts/2    00:00:00 grep --colour=auto epmd
me@edmund ~ $

So now you can attempt starting the node again:
3> net_kernel:start(['foo@192.168.56.101',longnames]).
{ok,<0.73.0>}
(foo@192.168.56.101)4>
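Once epmd is running, it can also be asked which names are already registered, which removes some of the trial and error. The sketch below uses net_adm:names/0, which returns {ok, [{Name, Port}]} or {error, Reason} (typically {error, address} when epmd is not running). The module epmd_names and its functions are hypothetical, and it assumes the same two-digit naming scheme as my_node01. The check is still racy, since another node can register the chosen name before net_kernel:start/1 runs, so the caller should keep handling {error, _}.

```erlang
-module(epmd_names).
-export([epmd_running/0, registered/0, first_free/3]).

%% true if the local epmd answers, false otherwise.
epmd_running() ->
    case net_adm:names() of
        {ok, _} -> true;
        {error, _} -> false
    end.

%% Names (strings, without @host) currently registered with the local epmd.
registered() ->
    case net_adm:names() of
        {ok, Names} -> {ok, [Name || {Name, _Port} <- Names]};
        Error -> Error
    end.

%% Pure helper: first index in 1..Max whose zero-padded name
%% (e.g. "my_node01") is not in the Taken list.
first_free(Base, Taken, Max) when is_integer(Max), Max =< 99 ->
    Candidates = [{I, lists:flatten(io_lib:format("~s~2..0B", [Base, I]))}
                  || I <- lists:seq(1, Max)],
    case [I || {I, Name} <- Candidates, not lists:member(Name, Taken)] of
        [First | _] -> {ok, First};
        [] -> none
    end.
```

For example, with my_node01 and my_node02 registered, first_free("my_node", Taken, 99) returns {ok, 3}, where Taken comes from registered().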

Also, net_kernel:start/1 does not return {already_started, pid()}; it
returns {error, Reason}, and Reason _could_ be {already_started, pid()}.
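To make the distinction concrete, here is a hypothetical helper (start_result:classify/1, not part of OTP) that sorts the results of net_kernel:start/1 into the cases discussed in this thread. It only pattern-matches on the shapes shown in the transcripts; the name-in-use case arrives as the generic {error, Reason} with the long shutdown/nodistribution reason.

```erlang
-module(start_result).
-export([classify/1]).

%% Map a net_kernel:start/1 result onto the cases discussed above.
classify({ok, Pid}) when is_pid(Pid) ->
    {started, Pid};
classify({error, {already_started, Pid}}) when is_pid(Pid) ->
    %% Only returned when this node is already distributed.
    {already_distributed, Pid};
classify({error, Reason}) ->
    %% Name in use, epmd not running, bad long name, ...
    {failed, Reason}.
```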

If you now try to start another node in the same way with the same name,
you will get the error you are getting:

1> net_kernel:start(['foo@192.168.56.101',longnames]).

=INFO REPORT==== 29-Jan-2018::13:45:13 ===
Protocol 'inet_tcp': the name foo@192.168.56.101 seems to be in use by
another Erlang node
{error,
    {{shutdown,
         {failed_to_start_child,net_kernel,{'EXIT',nodistribution}}},
     {child,undefined,net_sup_dynamic,
         {erl_distribution,start_link,
             [['foo@192.168.56.101',longnames],false]},
         permanent,1000,supervisor,
         [erl_distribution]}}}
2>

BR/Arif

Re: net_kernel:start() "the name seems to be in use by another Erlang node"

arif-2
Sorry, I forgot to add that you will get the error {already_started, pid()}
if you call net_kernel:start/1 on a node that is already started:

(foo@192.168.56.101)4> net_kernel:start(['foo@192.168.56.101',longnames]).
{error,{already_started,<0.73.0>}}
(foo@192.168.56.101)5>
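Since {already_started, Pid} only comes back when the calling node is already distributed, one way to avoid it is to check first with erlang:is_alive/0, which returns true once the local node is alive. A minimal sketch, assuming short names; the module ensure_dist is hypothetical:

```erlang
-module(ensure_dist).
-export([ensure/1]).

%% Start distribution under Name unless this VM is already distributed.
%% Returns {ok, NodeName} or the {error, Reason} from net_kernel:start/1.
ensure(Name) when is_atom(Name) ->
    case erlang:is_alive() of
        true ->
            %% Already distributed: net_kernel:start/1 would only return
            %% {error, {already_started, Pid}} here.
            {ok, node()};
        false ->
            case net_kernel:start([Name, shortnames]) of
                {ok, _Pid} -> {ok, node()};
                {error, Reason} -> {error, Reason}
            end
    end.
```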
 
BR/Arif
