I'm in a situation where the most straightforward way to solve the task would be to run a third-party binary for each active user.
Each of those binaries talks to the outside world and maintains its own state, which is of great importance to the system. Without their binary, a user is basically blocked.
The scope of the task is to:
- start the third-party program for all users during application launch
- start the third-party program on the fly for new users
- continuously poll their state (JSON-RPC API) and report the updated state if it changed since the last poll
- restart individual apps when they crash
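The polling and change-reporting items in the list above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the module name poll_worker, PollFun (which would wrap the actual JSON-RPC request) and ReportFun are all hypothetical, and one such process per external program is assumed. Restarting on crash is not shown; it could be left to a supervisor.

```erlang
-module(poll_worker).
-behaviour(gen_server).

-export([start_link/3]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-record(st, {poll_fun, report_fun, interval, last}).

%% PollFun   :: fun(() -> term())      hypothetical; would wrap the JSON-RPC call
%% ReportFun :: fun((term()) -> any()) called only when the polled state changed
start_link(PollFun, ReportFun, IntervalMs) ->
    gen_server:start_link(?MODULE, {PollFun, ReportFun, IntervalMs}, []).

init({PollFun, ReportFun, IntervalMs}) ->
    self() ! poll,
    %% A fresh reference never equals a polled value, so the first
    %% poll result is always reported.
    {ok, #st{poll_fun = PollFun, report_fun = ReportFun,
             interval = IntervalMs, last = make_ref()}}.

handle_info(poll, St = #st{poll_fun = Poll, report_fun = Report, last = Last}) ->
    New = Poll(),
    case New =:= Last of
        true  -> ok;             % nothing changed since the last poll
        false -> Report(New)     % state changed, pass it on
    end,
    erlang:send_after(St#st.interval, self(), poll),
    {noreply, St#st{last = New}};
handle_info(_Other, St) ->
    {noreply, St}.

handle_call(_Req, _From, St) -> {reply, ok, St}.
handle_cast(_Msg, St) -> {noreply, St}.
```

Something like poll_worker:start_link(PollFun, ReportFun, 5000) would then poll every five seconds and call ReportFun only when the result differs from the previous one.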
The idea is to start as many apps as the node can carry, and I'm not sure what the best way to tame this beast is.
I will surely use Erlexec and will probably create a gen_server that keeps a list of the processes in its state.
Then there will be a pool of workers that will go and poll the fleet once in a while.
That's the basic scenario I can think of, but maybe there are people who have had similar problems and can suggest a better approach or advise on how to avoid bottlenecks.
Apart from the architecture, there is a resource-monitoring problem: not overloading the node. Memory utilization is not an issue, but I/O is. I wonder what the right marker would be for when to stop spawning new processes. Logically, I should somehow measure I/O utilization during polling and act accordingly.
With a large enough pool of processes, polling should be going on all the time :)
On 2018-07-09 16:33, Yevhenii Kurtov wrote:
> - continuously poll their state (JSON RPC API) and report updated state
> if it changed since last poll
Be careful with polling -> if each poll job takes 10ms of processing
time, then in a perfect world (and the world is not perfect) the system
can only handle 100 polls per second per core before simply running out
of CPU time. If most of your targets are not regularly updating, then
it's a real burn, as targets with updates will have to wait until they
get on the CPU while targets with nothing to report get in the way.
The usual result is lag-under-load: polling tasks are not done
back-to-back but spread out over time (e.g. every N seconds), and
changes are relayed with a delay that increases in proportion to the
number of polling targets.
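The arithmetic above can be written down as a small back-of-envelope helper. This is only a sketch: the module and function names are made up for illustration, and it assumes the "perfect world" case where polling is the only thing using the CPU.

```erlang
-module(poll_budget).
-export([max_targets_per_core/2, full_cycle_ms/3]).

%% How many targets a single core can poll within one interval when
%% each poll costs PollCostMs of CPU time and nothing else runs.
max_targets_per_core(PollCostMs, IntervalMs) ->
    IntervalMs div PollCostMs.

%% Time (rounded up, in ms) to get through every target once. A change
%% on a target can go unnoticed for up to this long, and the figure
%% grows in proportion to the number of targets.
full_cycle_ms(Targets, PollCostMs, Cores) ->
    (Targets * PollCostMs + Cores - 1) div Cores.
```

With 10ms per poll and a one-second interval, max_targets_per_core(10, 1000) gives 100, and full_cycle_ms(1000, 10, 4) gives 2500, i.e. 1000 targets on 4 cores cannot be polled more often than every 2.5 seconds even in the ideal case.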
And of course, the world is not perfect. The Erlang VM needs time on the
CPU, whatever else your application does also needs CPU time, the OS
itself and whatever other things are running will as well (e.g. your
external processes). The BEAM will help somewhat with its ability to
interleave processing between the various polling processes, but there
are limits that polling brings with it.
If you need to provide low-latency updates to even a moderate number of
requests, polling will likely become your bottleneck. If at all
possible, avoid polling and move to push-on-updates as close to your
source of truth as you can.
> Then there will be a pool of workers that will go and poll a fleet once
> in a while.
IME it is usually better to have one process per external exec for such
tasks. The reason for this is that it allows concurrency of the polling
with minimal fuss. If you serialize the polling in a single process,
then the Nth poll target needs to wait until the N-1 polling jobs before it
have been done. If you fire off a bunch of poll requests and wait for
them to come back async, you have to write all the bookkeeping to keep
track of which request goes with which poll target. Usually it is easier
(and often faster, IME) to allow the Erlang schedulers to rotate through
the set of processes doing polling, with each process handling one poll