two epmds running

Anthony Shipman
Sometimes it happens that I discover two epmd processes running. One of
them is in a tight loop consuming 100% of CPU time. My guess is that the
second one is started automatically because the first one is no longer
responding. Is this a known bug in epmd?

--
Anthony Shipman                    Mamas don't let your babies
[hidden email]                   grow up to be outsourced.

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:[hidden email]

Re: two epmds running

Bob Ippolito
I think we have seen this before. One of them is probably violently
logging "epmd: epmd: error in accept" as well. We have only seen this
on boot-up of a machine, probably due to several Erlang VMs trying to
start up at the same time. We don't currently have a solution for this
issue (mostly because we don't know the root cause yet).

I am not sure we get two of them; it might be just one in our case.

-bob

Re: two epmds running

Garrett Smith
I haven't seen two running, but I've seen none running, which is a
real bummer. I've written a monitor process (probably gen_fsm based)
that keeps an eye on epmd and starts it and reinitializes it when it
goes away. A properly functioning epmd is important enough that you
might consider something similar to ensure that, in your case, the
rogue process is dealt with (killed?).

I suppose that's somewhat flippant -- to say write your own monitor
for this, but losing epmd is like losing your network and people go to
great lengths to keep networks up.
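
For anyone wanting to try the monitor idea, here is a minimal sketch of the
health check such a monitor could run. It is not the gen_fsm monitor
mentioned above, and it is written in Python rather than Erlang; the
function name epmd_names is made up for illustration. It assumes epmd is on
its default port 4369 and sends the NAMES_REQ request (code 110) from the
distribution protocol, so an epmd that accepts connections but never
answers is also treated as down.

```python
import socket
import struct

EPMD_PORT = 4369   # default epmd port (assumed; may be overridden on your system)
NAMES_REQ = 110    # epmd request code for "list registered node names"


def epmd_names(host="127.0.0.1", port=EPMD_PORT, timeout=2.0):
    """Return epmd's name listing as text, or None if epmd does not answer.

    Hypothetical helper for illustration only.
    """
    # An epmd request is a 2-byte big-endian length followed by the request body.
    request = struct.pack(">HB", 1, NAMES_REQ)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:          # epmd closes the connection after replying
                    break
                chunks.append(data)
    except OSError:                   # refused, unreachable, or timed out
        return None
    reply = b"".join(chunks)
    if len(reply) < 4:                # reply starts with a 4-byte epmd port number
        return None
    return reply[4:].decode("ascii", errors="replace")
```

A monitor would call this periodically and restart epmd (killing a stuck
one first) whenever it returns None.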

Garrett

Re: two epmds running

Bob Ippolito
Yeah, absolutely, it needs to be killed when it's in that state. It eats
up a lot of CPU, spews endless crap to syslog, and breaks Erlang
distribution on that node. We haven't seen it often enough to feel too
much pain yet but it's something on our roadmap to try and reproduce
and fix or work around it.

When we kill it we also bring down all of the applications on that
node, which sucks because we can't shut them down cleanly since doing
that (at least by the means that our tools know how) depends on epmd
being up. Fortunately we have only seen this happen just after a
reboot.

-bob

Re: two epmds running

Joseph Wayne Norton

We have faced the same behavior described by Bob. The problem occurs only
when rebooting a server that has two or more Erlang virtual machines
started by init. When it happens, epmd's error logging can easily consume
a significant amount of disk space in the /var/log directory. We do not
know how to trigger the problem directly.


Re: two epmds running

Nicolas Charpentier
Hi,
If the problem only occurs when two nodes start at the same time, you
can start epmd before any nodes.
If you are running Linux, you can add an init script that starts epmd and
ensure that the other init scripts run after it.
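
As a rough sketch of that ordering, the step that runs before any node
could look like the following. It is Python rather than an actual init
script, and the names port_open and ensure_epmd are hypothetical. It
assumes epmd is on the PATH, uses the default port 4369, and is started
with "epmd -daemon"; adjust all three for your system.

```python
import socket
import subprocess
import time


def port_open(host, port, timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_epmd(start_cmd=("epmd", "-daemon"), host="127.0.0.1",
                port=4369, wait=5.0):
    """Start epmd once, before any node, and wait until its port is open.

    Returns True when epmd is reachable, False if it did not come up in time.
    """
    if port_open(host, port):
        return True                        # already running, start nothing
    subprocess.run(list(start_cmd), check=True)
    deadline = time.monotonic() + wait
    while time.monotonic() < deadline:
        if port_open(host, port):
            return True
        time.sleep(0.1)
    return False
```

The Erlang nodes would then be launched only after ensure_epmd() returns
True, so that no node tries to start its own epmd.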

Nicolas

Re: two epmds running

Richard Andrews
In reply to this post by Joseph Wayne Norton
I've seen it too and I think it goes like this. It only seems to happen
on SMP machines which have a lot of grunt:

1. Two Erlang nodes start and require epmd.
2. Both start epmd.
3. The two epmd instances start and check for a running epmd (and find none).
4. One of the epmd instances is able to bind and listen on the epmd port.
5. The other fails but believes that it must be able to claim the port.
   It doesn't seem to check for another program now listening on the port
   and goes around a busy loop trying to bind+listen.
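
To illustrate the race, here is a small Python sketch (not epmd's actual
code, which is written in C) in which two "instances" try to claim the same
port. The helper name try_claim is hypothetical. The loser gets EADDRINUSE
from bind, and the sketch treats that as "another instance is already
serving" and gives up instead of retrying in a loop.

```python
import errno
import socket


def try_claim(port, host="127.0.0.1"):
    """Try to bind+listen on port.

    Returns the listening socket, or None if the port is already taken.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(5)
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None       # someone else won the race: stop, do not retry
        raise


first = try_claim(0)                   # port 0: let the OS pick a free port
port = first.getsockname()[1]
second = try_claim(port)               # second instance, same port
print("second instance got the port:", second is not None)
first.close()
```

In a real daemon the losing instance would simply exit at that point,
since the winner is already listening on the port.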

