process priority


process priority

Martin Koroudjiev
Hi,

we have implemented a logging mechanism that calls gen_event:notify,
which returns immediately, but nevertheless, when performing extensive
logging, some gen_server calls time out. I came to the conclusion that
the Erlang scheduler is giving more time to the logging process, and
thus other processes never get a chance to handle their message queues.

Is this conclusion plausible, or should we look for the cause of the
timeouts somewhere else? And if *yes*, is it possible to give the
logging process a lower priority?

Thanks,

Martin
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: process priority

Ulf Wiger

I assume the logging process writes to disk?

Are you running with a thread pool? If there is a lot of disk I/O, this can cause the emulator loop to block, unless there are enough I/O threads available.

I'm guessing the logging process has 'normal' priority right now? It is possible to set the priority of a gen_event process, e.g. by installing a handler (which gets to execute in the process context, and is thus able to call process_flag(priority, P)). However, running a process at 'low' priority is generally not a good idea. It could cause the logging process to fall too far behind and eventually run out of memory, or it could even, in some cases, lead to priority inversion, if the number of runnable normal-priority processes is significantly greater than the number of low-priority processes.
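As a rough illustration (the module name prio_h and its installation below are hypothetical, not from the thread), such a priority-setting handler could look like this:

```erlang
%% Hypothetical gen_event handler whose only job is to change the
%% priority of the event manager process it is installed in.
%% init/1 runs in the gen_event process's own context, so
%% process_flag/2 affects that process.
-module(prio_h).
-behaviour(gen_event).

-export([init/1, handle_event/2, handle_call/2,
         handle_info/2, terminate/2, code_change/3]).

init(Prio) when Prio =:= low; Prio =:= normal; Prio =:= high ->
    process_flag(priority, Prio),
    {ok, undefined}.

handle_event(_Event, State) -> {ok, State}.
handle_call(_Request, State) -> {ok, ok, State}.
handle_info(_Info, State) -> {ok, State}.
terminate(_Arg, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

Installing it with gen_event:add_handler(MyLogger, prio_h, low) would then lower the manager's priority -- subject to the caveats above.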

The runtime system does have one "feature", where it punishes the sender with extra reductions if the receiver's message queue is large. This could contribute to giving the logger process relatively more time than the other processes.

Fundamentally, though, you have to ensure that you don't produce more log output than your I/O subsystem can manage.

BR,
Ulf W


Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com




Re: process priority

Martin Koroudjiev
Yes, the logging process writes to disk.
I am not sure what a thread pool is, but in our app only the logger
does I/O operations. Can it block for more than 5 seconds, causing the
other processes to time out?



Re: process priority

mazenharake
My 2 cents.

Generally, writing straight to disk is a bad thing. You should have a
table (ets/mnesia) where you write your log lines and a process that
periodically flushes it to disk. Depending on your system load this
will help immensely.

Only in very rare cases would you tweak the process priorities, and if
you do, you should consider whether your solution is "wrong" rather
than treating the priority as the bad guy.

GL HF

/M



Re: process priority

Ulf Wiger
In reply to this post by Martin Koroudjiev

The thread pool is controlled by the +A flag, as in e.g. 'erl +A 256 …'

From the 'erts' manual:

"+A size
Sets the number of threads in async thread pool, valid range is 0-1024. Default is 0."

In systems with a lot of disk IO, if your shell tends to become unresponsive, or some processes experience mysterious timeouts, it's a good idea to look at the thread pool. If you have too few threads, IO operations will run directly from the scheduler thread and block that scheduler.
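For instance (the pool size 128 is purely illustrative):

```erlang
%% Start the node with an async thread pool:
%%   erl +A 128
%% then confirm from the shell that the pool is in place:
erlang:system_info(thread_pool_size).
%% A result of 0 means file I/O runs directly on scheduler threads.
```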

BR,
Ulf W



Re: process priority

Tim Watson-5
In reply to this post by mazenharake
> My 2 cents.
>
> Generally, writing straight to disk is a bad thing. You should have a
> table (ets/mnesia) where you write your log lines and a process that
> periodically flushes it to disk. Depending on your system load this
> will help immensely.

You could also consider disk_log, which offers a halfway solution to
some of these issues, writing to disk but not necessarily flushing
immediately. Ulf's comments about tweaking the thread pool size are
also important, although it's worth bearing in mind that this can
affect linked-in drivers that use async threads as well.
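A minimal sketch of the disk_log route (the log name, file name and sizes are made up for illustration):

```erlang
%% Open a wrap log of 5 x 10 MB files; 'external' format means we
%% write plain bytes rather than Erlang terms.
{ok, Log} = disk_log:open([{name, app_log},
                           {file, "app.log"},
                           {format, external},
                           {type, wrap},
                           {size, {10 * 1024 * 1024, 5}}]),
%% blog/2 writes bytes; disk_log buffers internally, so this does
%% not necessarily hit the disk immediately.
ok = disk_log:blog(Log, <<"something happened\n">>),
%% Force outstanding data to disk when you really need it there.
ok = disk_log:sync(Log),
ok = disk_log:close(Log).
```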

Re: process priority

Jachym Holecek
In reply to this post by mazenharake
# Mazen Harake 2011-07-04:
> My 2 cents.
>
> Generally, writing straight to disk is a bad thing. You should have a
> table (ets/mnesia) where you write your log lines and a process that
> periodically flushes it to disk. Depending on your system load this
> will help immensely.

The other way around, actually! :) Pass log items to a gen_server in
synchronous calls; that gives you end-to-end flow control. Open the file
in raw mode and leverage the delayed_write flag. Make sure your
gen_server has minimal processing overhead. Any kind of per-item
processing in the server is a clear no-go; do all formatting in the
caller's context. It helps to do iolist_to_binary/1 at some carefully
chosen call sites -- but you won't need that; following the above
principles will give you a fast-enough solution (a sustained load of
>50k messages per second -- iolists about 100 B in size -- is no
problem; I've seen something like 70k peak with our logging library).
Of course, you don't want one logger process for the whole system;
instead, make it easy for each application to open as many as it needs
for its audit logs, event logs, trace logs and such.
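A minimal sketch of that approach (the module name raw_logger and the buffer parameters are hypothetical; the thread only describes the principles):

```erlang
%% Callers format in their own context and make a synchronous call;
%% the server does nothing but write. The synchronous call is what
%% gives end-to-end flow control: fast producers are throttled by
%% the server's write speed.
-module(raw_logger).
-behaviour(gen_server).

-export([start_link/1, log/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link(File) ->
    gen_server:start_link(?MODULE, File, []).

log(Logger, IoData) ->                     %% IoData is pre-formatted
    gen_server:call(Logger, {log, IoData}).

init(File) ->
    %% raw + delayed_write: writes are buffered (here 64 kB / 2 s,
    %% both values illustrative) and bypass the file server process.
    {ok, Fd} = file:open(File, [raw, append,
                                {delayed_write, 64 * 1024, 2000}]),
    {ok, Fd}.

handle_call({log, IoData}, _From, Fd) ->
    ok = file:write(Fd, IoData),           %% no per-item processing
    {reply, ok, Fd}.

handle_cast(_Msg, Fd) -> {noreply, Fd}.
handle_info(_Info, Fd) -> {noreply, Fd}.
terminate(_Reason, Fd) -> file:close(Fd).
code_change(_OldVsn, Fd, _Extra) -> {ok, Fd}.
```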

Using ETS is of course doable, but the locking overhead is about the
same as or higher than with plain message passing, you lose all flow
control, thus risking memory explosion (I have a nice graph handy
demonstrating that, but probably can't publish it), and you copy every
message twice instead of just once. (I don't have measurements on the
locking and copying; it's my recollection from reading the relevant
bits of ERTS -- I could be wrong.)

> In very rare cases you would tweak the process priorities and if you
> do you should consider if your solution is "wrong" rather than the
> priority being the bad guy.

I agree with this.

BR,
        -- Jachym

Re: process priority

mazenharake
Writing to mnesia/dets requires no locking, nothing mentionable
anyway, since the (internal) messages are serialized just like your
gen_server's messages will be serialized (also, use dirty operations).
This creates end-to-end flow control, actually even more so, because
you won't need the extra process (your logging process) in between the
write.

Perhaps your small 100 B messages work so well with delayed_write
because you can write many of them to memory before they are flushed
to a file, thus not hogging the disk, but the bigger your messages,
the larger your in-memory buffer needs to be to avoid this. Delayed
write does of course work well, but my experience says that buffering
writes up in tables can help avoid disk thrashing when messages are
large (or volume is higher). I don't remember exactly how much
throughput we had (and I don't want to guess, since that would be mere
speculation without hard data), but it helped immensely.

So I guess the OP now has two suggestions, which of course isn't bad ;)

One should also keep in mind, though, that different situations may
have different needs; it would be interesting to see how the two would
measure up.

/M


Re: process priority

Jachym Holecek
# Mazen Harake 2011-07-05:
> Writing to mnesia/dets requires no locking, nothing mentionable
> anyway, since the messages (internal ones) are serialized just like
> your gen_server will have its messages serialized (also, use
> dirty-operations).

I was talking about VM level locks -- but like I said, I don't
know if the impact is measurable here.

> This creates end to end flow control, actually even more so because
> you won't need the extra process (your logging process) in between
> the write.

By end-to-end I mean a feedback between message producers and their
consumer (or consumers) -- I don't see how you get such behaviour
with table-based approach. What prevents producers from generating
messages faster than consumer/s can read them?

> Perhaps your small 100B messages work so well with delayed_write
> because you can write many of them to memory before they are flushed
> to a file, thus not hogging the disk, but the bigger the messages,
> the larger your in-memory buffer needs to be to avoid this.

Sure, delayed_write parameters are configurable in the library I have
in mind. It's really more about avoiding OS overhead for many writes;
the disk itself just has to be fast enough to handle the load -- if
it's not, every buffer will overrun eventually. It's also a matter of
how much delay you are willing to tolerate between enqueuing a message
and seeing it on disk, and how many messages you are willing to lose
in a catastrophic VM crash.

> Delayed write does of course work well but I have experience that says
> that writing and buffering it up in tables can be helpful to avoid
> disk thrashing when messages are large (or higher volume). I don't
> remember exactly how much throughput we had (and I don't want to guess
> since it will be mere speculation without having hard data) but it
> helped immensely.
>
> So I guess OP now have 2 suggestions which of course isn't bad ;)

Certainly. :-)

> One should also keep in mind though that different situation may have
> different needs, would be interesting to see how they would measure
> up.

Sure -- you can't get persistent queues with gen_server-based approach
for instance; it's designed & optimized for relatively small messages
arriving at very high rates.

If you can recall some details about your workload (average message size,
were they iolists/binaries, if iolists how complex were they, how did
flusher process work roughly -- this sort of thing) I could probably
measure the two approaches in various situations (different message
sizes and producer concurrency levels) over the weekend and share the
results (but not the code, sorry, proprietary stuff).

The overall lesson I've learnt from this is that gen_server calls are
dirt-cheap, with a bit of care here and there.

BR,
        -- Jachym

Re: process priority

mazenharake
On 5 July 2011 18:07, Jachym Holecek <[hidden email]> wrote:
> # Mazen Harake 2011-07-05:
>> Writing to mnesia/dets requires no locking, nothing mentionable
>> anyway, since the messages (internal ones) are serialized just like
>> your gen_server will have its messages serialized (also, use
>> dirty-operations).
>
> I was talking about VM level locks -- but like I said, I don't
> know if the impact is measurable here.
>

Not sure what you mean by this? The two scenarios (whether logging
goes to a gen_server or to a table) are the same. I don't think locking
is measurable in this instance.

>
>> This creates end to end flow control, actually even more so because
>> you won't need the extra process (your logging process) in between
>> the write.
>
> By end-to-end I mean a feedback between message producers and their
> consumer (or consumers) -- I don't see how you get such behaviour
> with table-based approach. What prevents producers from generating
> messages faster than consumer/s can read them?
>

My mistake, I misunderstood your use of "flow control", which
translated wrongly in my mind :). The inherent flow control is: the
more processes try to write to a log table, the slower it gets, BUT it
only gets slower, it is not stopped (due to I/O hogging). That is the
idea behind using a table.

>> Perhaps your small 100B messages work so well with delayed_write
>> because you can write many of them to memory before they are flushed
>> to a file, thus not hogging the disk, but the bigger the messages,
>> the larger your in-memory buffer needs to be to avoid this.
>
> Sure, delayed_write parameters are configurable in the library I have
> in mind. It's really more about avoiding OS overhead for many writes;
> the disk itself just has to be fast enough to handle the load -- if
> it's not, every buffer will overrun eventually. It's also a matter of
> how much delay you are willing to tolerate between enqueuing a message
> and seeing it on disk, and how many messages you are willing to lose
> in a catastrophic VM crash.
>

The point is to use the disk as little as possible. If you have a fast
disk it will perform better overall, of course, but as you said, the OS
overhead of many writes is what you try to avoid. So buffering up and
writing everything at once is the better way to do it. Whether this is
done using delayed_write or by buffering in tables is another question.
At least we agree on this :)

>> Delayed write does of course work well but I have experience that says
>> that writing and buffering it up in tables can be helpful to avoid
>> disk thrashing when messages are large (or higher volume). I don't
>> remember exactly how much throughput we had (and I don't want to guess
>> since it will be mere speculation without having hard data) but it
>> helped immensely.
>>
>> So I guess OP now have 2 suggestions which of course isn't bad ;)
>
> Certainly. :-)
>
>> One should also keep in mind though that different situation may have
>> different needs, would be interesting to see how they would measure
>> up.
>
> Sure -- you can't get persistent queues with gen_server-based approach
> for instance; it's designed & optimized for relatively small messages
> arriving at very high rates.
>

This makes me curious whether you thought I meant that the tables I
propose would be disk copies. My suggestion was in-memory tables only;
making them disk-based would make it slower but persistent, as you say.

> If you can recall some details about your workload (average message size,
> were they iolists/binaries, if iolists how complex were they, how did
> flusher process work roughly -- this sort of thing) I could probably
> measure the two approaches in various situations (different message
> sizes and producer concurrency levels) over the weekend and share the
> results (but not the code, sorry, proprietary stuff).

IIRC:
Message size: around 1 KB +/- 200 B, maybe iolists (not certain)
Messages per second: don't remember... dare I guess around 10k? (Don't
ask me to bet my money on it ;))
Flusher: just like delayed_write but using the table as the buffer,
i.e. read the table size every X second(s); if size > N -> flush, else
if X' seconds have passed -> flush; then immediately check the size
again.
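A rough sketch of such a flusher (module name, thresholds and key scheme are all hypothetical, since the thread only describes the idea):

```erlang
%% Producers insert into a public ETS table; one process drains it
%% every IntervalMs and writes the batch in a single file:write/2.
-module(tab_flusher).
-export([start/2, log/2]).

log(Tab, IoData) ->
    %% Monotonic keys keep the ordered_set in insertion order.
    ets:insert(Tab, {erlang:unique_integer([monotonic]), IoData}).

start(File, IntervalMs) ->
    Tab = ets:new(log_buf, [public, ordered_set,
                            {write_concurrency, true}]),
    {ok, Fd} = file:open(File, [raw, append]),
    spawn_link(fun() -> loop(Tab, Fd, IntervalMs) end),
    Tab.

loop(Tab, Fd, IntervalMs) ->
    timer:sleep(IntervalMs),
    case ets:tab2list(Tab) of
        [] ->
            ok;
        Entries ->
            %% Caveat: inserts landing between tab2list/1 and
            %% delete_all_objects/1 are lost; a real implementation
            %% would delete only the keys it has read.
            ets:delete_all_objects(Tab),
            ok = file:write(Fd, [IoData || {_Key, IoData} <- Entries])
    end,
    loop(Tab, Fd, IntervalMs).
```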

Would be interesting to see how it performs.

/M