Strange inet_gethost_native:gethostbyname behavior

David Welton
Hi,

I ssh'ed into a system to see what was causing connection problems.

Calling inet_gethost_native:gethostbyname("google.com", inet) kept failing
with {error, try_again}.

Started up a new Erlang shell and tried the same thing.  It worked fine.

Killed the external process it was using, inet_gethost, and retried,
and it started working.

This is an older Erlang (erts-6.4.1.2), but going by the git logs,
inet_gethost hasn't seen a lot of changes lately.

Any ideas 1) what may cause this and 2) how to get around it?  The
system is stuck, unable to do DNS lookups, which is problematic, to say
the least.

Thanks
--
David N. Welton

http://www.welton.it/davidw/

http://www.dedasys.com/
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Strange inet_gethost_native:gethostbyname behavior

Raimo Niskanen
On Tue, Jan 23, 2018 at 07:02:22PM -0800, David Welton wrote:

> Hi,
>
> I ssh'ed into a system to see what was causing connection problems.
>
> inet_gethost_native:gethostbyname("google.com", inet). kept failing
> with {error, try_again}.
>
> Started up a new Erlang shell and tried the same thing.  It worked fine.
>
> Killed the external process it was using, inet_gethost and retried,
> and it started working.
>
> This is an older Erlang: erts-6.4.1.2 but inet_gethost hasn't seen a
> lot of changes lately, going on the git logs.
>
> Any ideas what 1) may cause this and 2) how to get around it?  The
> system is stuck not being able to do DNS lookups, which is
> problematic, to say the least.


This sounds like a job for the API call
    inet_gethost_native:control(soft_restart)

Sometimes libc in a native resolver process caches DNS replies for
too long, for example after a network reconfiguration. This call
restarts all the native resolver processes the node uses.
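A minimal sketch of applying that workaround automatically instead of by hand. The module name dns_recover and the retry-once policy are hypothetical, and it assumes the soft restart has taken effect by the time control/1 returns. The resolve and restart steps are passed in as funs so the retry logic can be exercised without touching real DNS.

```erlang
-module(dns_recover).
-export([lookup/1, lookup/3]).

%% Hypothetical helper: resolve Name (IPv4) through the native resolver.
%% On {error, try_again}, soft-restart the native resolver processes
%% once and retry.
lookup(Name) ->
    lookup(Name,
           fun(N) -> inet_gethost_native:gethostbyname(N, inet) end,
           fun() -> inet_gethost_native:control(soft_restart) end).

%% Same policy with the resolve and restart steps injected as funs.
%% Only try_again triggers a restart; any other result is returned as is.
lookup(Name, Resolve, Restart) ->
    case Resolve(Name) of
        {error, try_again} ->
            _ = Restart(),
            %% Single retry: whatever comes back now is final.
            Resolve(Name);
        Other ->
            Other
    end.
```

From the shell, dns_recover:lookup("google.com") would then behave like the original call, except that a try_again triggers one soft restart and one retry. Retrying only once keeps a genuinely broken DNS setup from turning into a restart loop.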

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

Re: Strange inet_gethost_native:gethostbyname behavior

David Welton
Hi,

Thanks...

>> inet_gethost_native:gethostbyname("google.com", inet). kept failing
>> with {error, try_again}.

>> Any ideas what 1) may cause this and 2) how to get around it?  The
>> system is stuck not being able to do DNS lookups, which is
>> problematic, to say the least.

> It happens that libc in a native resolver process caches DNS replies for
> too long, maybe after some network reconfiguration, and this call
> restarts all native resolver processes the node uses.

It seems odd, though, that things are unable to recover on their own.
I wonder what might cause that.

Re: Strange inet_gethost_native:gethostbyname behavior

Raimo Niskanen
On Thu, Jan 25, 2018 at 09:59:16AM -0800, David Welton wrote:

> Hi,
>
> Thanks...
>
> >> inet_gethost_native:gethostbyname("google.com", inet). kept failing
> >> with {error, try_again}.
>
> >> Any ideas what 1) may cause this and 2) how to get around it?  The
> >> system is stuck not being able to do DNS lookups, which is
> >> problematic, to say the least.
>
> > It happens that libc in a native resolver process caches DNS replies for
> > too long, maybe after some network reconfiguration, and this call
> > restarts all native resolver processes the node uses.
>
> Seems odd, though, that things are unable to recover.  I wonder what
> might cause that.

One thing that has caused this in the past is reconfiguring the
machine's DNS resolver.  The Erlang node may then hold cached negative
responses with a long lifetime, even though the DNS reconfiguration was
intended to fix exactly those negative responses.

And libc cannot do better than honor the lifetime of the negative
responses it received.  So the situation will probably recover on its
own, but it might take a week...
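One way to check whether a failure like this lives inside the native resolver process rather than in DNS itself is to compare its answer with that of the pure-Erlang resolver inet_res, which does not go through libc. This is a hypothetical diagnostic: the module dns_diag and its result atoms are made up, and it assumes inet_res sees the machine's current nameserver configuration. Since the two resolvers do not consult exactly the same sources, a mismatch is a hint, not proof.

```erlang
-module(dns_diag).
-export([diagnose/1, classify/2]).

%% Hypothetical diagnostic: ask the native resolver (libc, via the
%% inet_gethost port program) and the pure-Erlang inet_res for the same
%% name, then compare the two results.
diagnose(Name) ->
    classify(inet_gethost_native:gethostbyname(Name, inet),
             inet_res:gethostbyname(Name, inet)).

%% Native resolver succeeds: nothing to fix.
classify({ok, _}, _) ->
    native_ok;
%% Native fails while inet_res succeeds: the failure may be state held
%% in the native resolver process (e.g. a cached negative answer), so
%% inet_gethost_native:control(soft_restart) is worth trying.
classify({error, _}, {ok, _}) ->
    native_stale_suspected;
%% Both fail: DNS or the network itself is the more likely culprit.
classify({error, _}, {error, _}) ->
    both_failing.
```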

