gen_server locked for some time

Roberto Ostinelli-5
All,
I have a gen_server that becomes busy at periodic intervals, sometimes for over 10 seconds, while writing bulk incoming data. This gen_server also receives smaller individual data updates.

I could offload the bulk writing routine to separate processes, but the smaller individual data updates would then be processed before the bulk processing is over, producing an incorrect scenario where smaller, more recent data gets overwritten by the bulk processing.

I'm trying to work out how to prevent all the gen_server calls made during the bulk update from timing out.

Any ideas of best practices?

Thank you,
r.

Re: gen_server locked for some time

Guilherme Andrade
Hello Roberto,

If copying the data to a second process is not (too) costly, you can do just that: have a second process responsible for writing the data - the I/O-bound component - while the original one acts as a coordinator, directing the second one asynchronously but always aware of what it is doing.
This way the coordinator remains available to handle incoming calls (unless seriously overloaded) and, if need be, can reject incoming requests preemptively for back pressure, either immediately or after a timeout measured on its own clock (rather than the caller's), by replying with `gen_server:reply/2` instead of `{reply, Reply, State}`.

Alternatively, you can switch the problem around and use some sort of broker to match write requests with the writer process - this has the advantage of making it trivial to scale to multiple writer processes, unless there are hard serialization constraints.
With this approach, in the simplest setup, a single broker process never blocks: it forwards write requests to a writer process that has explicitly declared itself available (which it only does between write requests), and it can manage timeouts as well. The `sbroker` library [1], although no longer maintained, is a true wonder for implementing this sort of pattern.

[1]: https://github.com/fishcakez/sbroker
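
To make the coordinator idea concrete, here is a minimal sketch (the module, state shape and the `apply_update/1`/`write_bulk/1` helpers are all hypothetical, not from this thread): the coordinator answers small updates immediately, hands the slow write to a linked helper process, and defers the bulk caller's reply via `gen_server:reply/2`:

```erlang
-module(coordinator).
-behaviour(gen_server).

-export([start_link/0, store/1, bulk/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

store(Update) -> gen_server:call(?MODULE, {store, Update}).
bulk(Data)    -> gen_server:call(?MODULE, {bulk, Data}, infinity).

init([]) -> {ok, #{bulk_caller => none}}.

%% Small updates are served immediately, even while a bulk write runs.
handle_call({store, Update}, _From, State) ->
    {reply, apply_update(Update), State};
%% Back pressure: reject a second bulk request while one is in flight.
handle_call({bulk, _Data}, _From, State = #{bulk_caller := Caller})
  when Caller =/= none ->
    {reply, {error, busy}, State};
%% Hand the slow write to a linked helper and do NOT reply yet.
handle_call({bulk, Data}, From, State) ->
    Self = self(),
    spawn_link(fun() -> Self ! {bulk_done, write_bulk(Data)} end),
    {noreply, State#{bulk_caller := From}}.

handle_cast(_Msg, State) -> {noreply, State}.

%% The helper finished: only now reply to the original bulk caller.
handle_info({bulk_done, Result}, State = #{bulk_caller := From}) ->
    gen_server:reply(From, Result),
    {noreply, State#{bulk_caller := none}}.

%% Hypothetical placeholders for the real work.
apply_update(_Update) -> ok.
write_bulk(_Data) -> ok.
```

The key point is the third `handle_call/3` clause returning `{noreply, ...}`: the bulk caller stays blocked, but the coordinator itself keeps serving everything else.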

--
Guilherme

Re: gen_server locked for some time

Guilherme Andrade
(I now see you actually mentioned using separate processes, but my suggestion should still apply, depending on constraints.)

--
Guilherme

Re: gen_server locked for some time

Mikael Karlsson-7
In reply to this post by Roberto Ostinelli-5
Hi Roberto,

For the smaller data updates, gen_server:cast, which returns immediately, is an option if you do not need to check any reply values.
Increasing the timeout with gen_server:call/3 is another, if your clients can accept being blocked for the whole duration of the bulk update. In that case you could also consider using gen_statem instead, whose call function has a default timeout of infinity.

I guess the above are not possible? If so, when offloading the bulk write to a separate process, you could add some state handling to your gen_server (or gen_statem) so that the smaller updates go into an internal queue during the bulk write, and their processing is postponed until the write is finished; a rough sketch follows. gen_statem also has postpone support (which will still block your clients) as well as insertion of internal events, but I am not sure how useful they are in your case.
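
A rough sketch of the internal-queue idea in plain gen_server terms (all names, the state shape and the `do_update/1`/`do_bulk/1` helpers are hypothetical): updates that arrive during an offloaded bulk write are buffered and replayed afterwards, in arrival order, so the bulk result can never overwrite newer data:

```erlang
-module(buffered_store).
-behaviour(gen_server).

-export([start_link/0, update/1, bulk/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

update(U)  -> gen_server:cast(?MODULE, {update, U}).
bulk(Data) -> gen_server:cast(?MODULE, {bulk, Data}).

init([]) -> {ok, #{bulk_running => false, queue => queue:new()}}.

handle_call(_Req, _From, State) -> {reply, ok, State}.

handle_cast({update, U}, State = #{bulk_running := true, queue := Q}) ->
    %% A bulk write is in progress: buffer instead of applying now.
    {noreply, State#{queue := queue:in(U, Q)}};
handle_cast({update, U}, State) ->
    ok = do_update(U),
    {noreply, State};
handle_cast({bulk, Data}, State) ->
    %% Offload the slow write; the server itself never blocks.
    Self = self(),
    spawn_link(fun() -> ok = do_bulk(Data), Self ! bulk_finished end),
    {noreply, State#{bulk_running := true}}.

handle_info(bulk_finished, State = #{queue := Q}) ->
    %% Replay the buffered updates after the bulk data, in arrival
    %% order, so newer individual updates win.
    lists:foreach(fun do_update/1, queue:to_list(Q)),
    {noreply, State#{bulk_running := false, queue := queue:new()}}.

%% Hypothetical placeholders for the real work.
do_update(_U)  -> ok.
do_bulk(_Data) -> ok.
```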

Mikael


Re: gen_server locked for some time

Mikael Pettersson-5
In reply to this post by Roberto Ostinelli-5

If there is more logic in the gen_server for incoming data, have it offload all writes to the separate process, using that process's message queue as a buffer. Otherwise, make the sends to the gen_server asynchronous. A sketch of such a writer follows.
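
A minimal sketch of that dedicated writer (all names hypothetical): a bare process whose mailbox is the buffer, so writes are performed strictly in arrival order:

```erlang
-module(writer).
-export([start_link/0, write/2]).

start_link() ->
    {ok, spawn_link(fun loop/0)}.

%% Asynchronous hand-off: the caller never blocks on the slow I/O.
write(Writer, Data) ->
    Writer ! {write, Data},
    ok.

%% The mailbox buffers and serializes: a bulk write queued before a
%% small update can never overwrite it.
loop() ->
    receive
        {write, Data} ->
            ok = do_write(Data),    % the slow part; hypothetical helper
            loop()
    end.

do_write(_Data) -> ok.              % placeholder for real file/db I/O
```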

Re: gen_server locked for some time

Roberto Ostinelli-5
Thank you all for the suggestions, I will investigate the options and profile!

Best,
r.


Re: gen_server locked for some time

Max Lapshin-2
In Flussonic we use the concept of fast and slow servers.

A fast gen_server must never get blocked, so it is always safe to gen_server:call it. A slow one may get blocked on disk I/O or some other job.

A good pattern is to have the slow server ask the fast one for work.

The fast server can call the slow one, but very carefully: check first whether it is safe to make that call; see the sketch below.
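
One cheap way to implement that check (a sketch, not Flussonic's actual code): peek at the slow server's mailbox length first and skip the call if it is already backed up:

```erlang
-module(safe_call).
-export([call/4]).

%% Only call the slow server if its mailbox is short enough that the
%% call is unlikely to block for long.
call(SlowPid, Request, MaxQueue, Timeout) ->
    case erlang:process_info(SlowPid, message_queue_len) of
        {message_queue_len, N} when N =< MaxQueue ->
            gen_server:call(SlowPid, Request, Timeout);
        {message_queue_len, _TooLong} ->
            {error, busy};
        undefined ->
            {error, noproc}         % the slow server is dead
    end.
```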

Re: gen_server locked for some time

Jesper Louis Andersen-2
In reply to this post by Roberto Ostinelli-5
Another path is to make the bulk write cooperative within the same process: write in small chunks and go back into the gen_server loop between chunks (see the sketch just below). You keep making progress, but need no separate process.
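
A sketch of that cooperative variant, as callback clauses inside an existing gen_server (the chunk size, the assumption that the bulk payload is a binary, and the `write_chunk/1` helper are all invented for illustration): after each chunk the server casts to itself and returns to its loop, so messages that arrived in the meantime are served before the next chunk:

```erlang
%% Entry point: start chunking the bulk payload.
handle_cast({bulk, Data}, State) when is_binary(Data) ->
    handle_cast({bulk_chunk, Data}, State);
%% Write one 64 KiB chunk, then requeue the rest behind whatever
%% other messages are already waiting in the mailbox.
handle_cast({bulk_chunk, <<Chunk:65536/binary, Rest/binary>>}, State) ->
    ok = write_chunk(Chunk),                 % hypothetical helper
    gen_server:cast(self(), {bulk_chunk, Rest}),
    {noreply, State};
%% Fewer than 64 KiB left: write the tail and we are done.
handle_cast({bulk_chunk, LastChunk}, State) ->
    ok = write_chunk(LastChunk),
    {noreply, State}.
```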

Another useful variant is to have two processes, but with the split skewed. You prepare iodata() in the main process and send it to the other process as a message; the message will be fairly small, since large binaries are transferred by reference. The queue in the other process acts as a linearizing write buffer, so ordering doesn't get messed up. The bulk write call has now moved out of the main process, which is free to do other processing in between. You might even want a protocol between the two processes to exert some kind of flow control on the system (sketched below). Note, however, that the balance between the processes is uneven: one is the intelligent orchestrator, the other is the worker taking the block on the bulk operation.
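
A sketch of the skewed split with simple flow control (the ack protocol and all names are invented for illustration): the orchestrator caps how many writes are outstanding, and the writer acks each one:

```erlang
-module(skewed_writer).
-export([start_link/1, maybe_send/3]).

start_link(Path) ->
    {ok, Fd} = file:open(Path, [append, raw, binary]),
    {ok, spawn_link(fun() -> writer_loop(Fd) end)}.

%% Orchestrator side: refuse the hand-off if the writer is too far
%% behind. The caller decrements its in-flight count on 'write_ack'.
maybe_send(Writer, IoData, InFlight) when InFlight < 8 ->
    Writer ! {write, self(), IoData},
    {ok, InFlight + 1};
maybe_send(_Writer, _IoData, InFlight) ->
    {error, {busy, InFlight}}.

%% Writer side: the mailbox is a linearizing buffer, so writes happen
%% in exactly the order they were sent.
writer_loop(Fd) ->
    receive
        {write, From, IoData} ->
            ok = file:write(Fd, IoData),  % big binaries arrived by reference
            From ! write_ack,
            writer_loop(Fd)
    end.
```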

Another thing is to improve the observability of the system. Start measuring the lag time of the gen_server and plot it in a histogram. Measure the amount of data written per bulk message. This gives you real data to work with. The point is: if you experience blocking in some part of your system, there is most likely some traffic/request pattern that triggers it. Understand that pattern; it often stands in for some important user behavior you didn't think about. Anticipating future uses of the system allows you to be proactive about latency problems.

It is sometimes better to gate the problem by limiting what a user/caller/request is allowed to do. As an example, you can reject large requests to the system and demand that they happen cooperatively between a client and a server. This slows the client down, because it has to wait for a server response before it can issue the next request. If the internet is in between, you have just injected an artificial RTT plus server processing time between calls, implicitly slowing the client down.


--
J.

Re: gen_server locked for some time

Roberto Ostinelli-5
Thanks for the tips, Max and Jesper.
In those solutions, though, how do you guarantee the order of the calls? My main issue is to ensure that the slow process does not overwrite more recent but faster data chunks. Do you pile them up in a queue in the received order and process them after that?
