Seeing inet:start_timer badarg crash

Vince Foley
Hello there, I'm seeing some crashes in my logs since updating to OTP 21 that I can't quite track down...

The exit value looks like this:

{
  :badarg,
  [
    {:erlang, :start_timer, [:inet, #PID<0.17713.363>, :inet], []},
    {:inet, :start_timer, 1, [file: 'inet.erl', line: 1763]},
    {:inet, :getaddr, 3, [file: 'inet.erl', line: 591]},
    {:inet_tcp_dist, :do_setup, 7, [file: 'inet_tcp_dist.erl', line: 289]}
  ]
}

It looks like something is going wrong in the distributed node connection system. The result that I can see from this is that the nodes are no longer able to connect to each other.

I am using a mechanism other than the standard `epmd` to discover the other nodes, so I'm wondering if a detail changed that I have to adapt to. I'm using `-epmd_module`. The mechanism I'm using is described in this article:


That might be related but I don't quite see how. Does anyone have any ideas what might be happening here?

Thanks!

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Seeing inet:start_timer badarg crash

Dominic Morneau
Hi Vince,

That’s a bug introduced in:

It happens when using an epmd_module that doesn’t implement the optional address_please callback.

You can work around it by implementing address_please inside your EpmdClient module exactly like this:
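
The snippet from the original message did not survive in this archive. As a minimal sketch only, assuming the callback simply mirrors what the default resolver in `erl_epmd` does (resolve the host for the requested address family), it might look like the following. The module name `EpmdClient` comes from the message; the body is an assumption, and the other callbacks of a real `-epmd_module` are left out.

```elixir
defmodule EpmdClient do
  # Hypothetical stand-in for the custom module passed via -epmd_module.
  # Only the optional address_please/3 callback is sketched here; the
  # module's other callbacks are omitted.

  # name:           node name part (before the @), a charlist (unused here)
  # host:           host part (after the @), a charlist
  # address_family: :inet or :inet6
  #
  # :inet.getaddr/2 takes (host, family) and returns {:ok, ip_tuple}
  # or {:error, reason}.
  def address_please(_name, host, address_family) do
    :inet.getaddr(host, address_family)
  end
end
```

For example, `EpmdClient.address_please(~c"app", ~c"127.0.0.1", :inet)` returns `{:ok, {127, 0, 0, 1}}`.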

Then inet_tcp_dist will pick up your implementation rather than the default case, which is broken because it passes an address family as the last argument of inet:getaddr/3, and that argument is a timeout, not an address family:
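
To connect this to the first frame of the stack trace, here is a small sketch that reproduces just the failing call. It assumes nothing beyond the arguments shown in the trace: an address family atom (`:inet`) ends up where a timeout is expected and is handed to `:erlang.start_timer/3` as the time value.

```elixir
# :erlang.start_timer(time, dest, msg) expects a non-negative integer as
# its time argument. The trace shows it being called with the atom :inet
# in that position, which raises badarg (an ArgumentError in Elixir).
result =
  try do
    :erlang.start_timer(:inet, self(), :inet)
  rescue
    ArgumentError -> :badarg
  end

IO.inspect(result)
# prints :badarg
```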

Dominic

On Fri, Oct 12, 2018 at 5:58 AM Vince Foley <[hidden email]> wrote:
Hello there, I'm seeing some crashes in my logs since updating to OTP 21 that I can't quite track down...

The exit value looks like this:

{
  :badarg,
  [
    {:erlang, :start_timer, [:inet, #PID<0.17713.363>, :inet], []},
    {:inet, :start_timer, 1, [file: 'inet.erl', line: 1763]},
    {:inet, :getaddr, 3, [file: 'inet.erl', line: 591]},
    {:inet_tcp_dist, :do_setup, 7, [file: 'inet_tcp_dist.erl', line: 289]}
  ]
}

It looks like something is going wrong in the distributed node connection system. The result that I can see from this is that the nodes are no longer able to connect to each other.

I am using a mechanism other than the standard `epmd` to discover the other nodes, so I'm wondering if a detail changed that I have to adapt to. I'm using `-epmd_module`. The mechanism I'm using is described in this article:


That might be related but I don't quite see how. Does anyone have any ideas what might be happening here?

Thanks!

Re: Seeing inet:start_timer badarg crash

Vince Foley
This is excellent info, thanks so much!

On Fri, Oct 12, 2018 at 6:39 PM Dominic Morneau <[hidden email]> wrote:
Hi Vince,

That’s a bug introduced in:

It happens when using an epmd_module that doesn’t implement the optional address_please callback.

You can work around it by implementing address_please inside your EpmdClient module exactly like this:

Then inet_tcp_dist will pick up your implementation rather than the default case, which is broken because inet.getaddr/3’s last argument is not an address family, it’s a timeout:

Dominic

Re: Seeing inet:start_timer badarg crash

Vince Foley
I'll use the workaround, but I went ahead and took a shot at fixing the bug in Erlang/OTP here:


On Fri, Oct 12, 2018 at 7:54 PM Vince Foley <[hidden email]> wrote:
This is excellent info, thanks so much!
