Custom timer module

ludovic@demblans.com
Hello,

I'm looking for advice about implementing a timer queue in Erlang or Elixir.

Basically, what I need is a queue where I can write a timestamp, a key, and a value. When the local time reaches the timestamp, I want to handle the key/value (either by receiving a message with the key and fetching the value, or by receiving the key/value pair directly).
The timers must be cancelable: the value itself is very small, but handling it triggers heavy computation. I also want to be able to fetch a key from outside of the process at any time and run the computation (and then cancel the scheduled one).

This is a subset of the `timer` module. But I need to persist the data to disk and reload on application start.

I will have a tiny number of entries: around 1000.

At the moment I have an ordered_set with `{Timestamp, UserKey} = Key` as keys. I look up the first key and schedule a message to self with `erlang:start_timer(max(TimeStamp, 1), self(), {run_deletion, Key})`, which returns a ref that I keep in state so I can ignore timer messages carrying an older ref. Everything runs inside a gen_server that loads the table from disk after init. I also have to write the table to disk after each update, because it is very important that no event is lost.
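The scheme above can be sketched roughly as follows; module, function, and message names are illustrative (not from any real codebase), and the heavy computation is stubbed out:

```erlang
%% Rough sketch of the described pattern (all names are illustrative).
-module(timer_queue).
-behaviour(gen_server).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

init(File) ->
    Tab = load_table(File),                       % reload persisted entries
    {ok, arm_next(#{tab => Tab, file => File, tref => undefined})}.

%% Arm a timer for the earliest entry; keep the ref so stale fires are ignored.
arm_next(#{tab := Tab} = State) ->
    case ets:first(Tab) of
        '$end_of_table' ->
            State#{tref := undefined};
        {Ts, _User} = Key ->
            Delay = max(Ts - erlang:system_time(millisecond), 1),
            State#{tref := erlang:start_timer(Delay, self(), {run_deletion, Key})}
    end.

handle_info({timeout, TRef, {run_deletion, Key}}, #{tref := TRef} = State) ->
    {noreply, arm_next(run_and_delete(Key, State))};
handle_info({timeout, _StaleRef, _}, State) ->
    {noreply, State}.                             % older timer ref: ignore

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

load_table(File) ->
    case ets:file2tab(File) of
        {ok, Tab}  -> Tab;
        {error, _} -> ets:new(timer_queue_tab, [ordered_set])
    end.

run_and_delete(Key, #{tab := Tab, file := File} = State) ->
    [{Key, Value}] = ets:lookup(Tab, Key),
    spawn(fun() -> heavy_computation(Key, Value) end),  % off the queue process
    ets:delete(Tab, Key),
    ok = ets:tab2file(Tab, File),                 % persist after each change
    State.

heavy_computation(_Key, _Value) -> ok.            % stub for the real work
```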

I'm about to implement cancellation but the code becomes messy as I do both table management and data management in the same module.

I wonder if there are implementations of this pattern in the community? I could use a priority queue, but it seems to me that existing implementations use a fixed list of priorities, not arbitrary priorities (like a timestamp).
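For what it's worth, Erlang's term ordering already gives arbitrary priorities for free: any ordered structure keyed on `{Timestamp, UserKey}` behaves as a priority queue. A minimal illustration with gb_trees:

```erlang
%% A gb_trees tree keyed on {Timestamp, UserKey} is a priority queue
%% with arbitrary (term-ordered) priorities.
Q0 = gb_trees:empty(),
Q1 = gb_trees:insert({1000, a}, val_a, Q0),
Q2 = gb_trees:insert({500, b}, val_b, Q1),
%% take_smallest/1 pops the entry with the earliest timestamp:
{{500, b}, val_b, _Q3} = gb_trees:take_smallest(Q2).
```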

Thank you

Re: Custom timer module

Ulf Wiger
FWIW, the KVDB project [1] implemented both priority queues and timers (with quite flexible scheduling options), with persistence.

An example, from a 2013 Erlang Factory presentation [2]:

12> kvdb_cron:create_crontab(db, timers).
ok
13> kvdb_cron:add(db, timers, "{in 3 secs; 3 times}", [], kvdb_cron, testf, []).
{ok,{q_key,<<>>,105729091563262,105729094536167}}
14>
CRON!! {{{2013,3,19},{21,38,14}},658320}
CRON!! {{{2013,3,19},{21,38,17}},655700}
CRON!! {{{2013,3,19},{21,38,20}},642523}

I haven't touched the code in a long time, but if there is sufficient interest, I could be persuaded to lend support. :)

BR,
Ulf W



Re: Custom timer module

ludovic@demblans.com
Hi Ulf, thank you very much.
I'm not sure I want to add a full application as a dependency at this time, but I will definitely look at the code; this will be a good read!

-- lud


Re: Custom timer module

Serge Aleynikov
In reply to this post by ludovic@demblans.com
There's a project, https://github.com/erlware/erlcron, that implements jobs with a timer. It presently maintains persistence through a crontab config file, but it looks like you need something more advanced, so you could take a look at the code (the core of it is in a single module, ecrn_agent) and perhaps add a persistence layer.

Alternatively, switching your existing implementation to store records in a mnesia disk table would likely be the easiest approach.
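A sketch of what the mnesia variant might look like; the record name, table options, and function names here are illustrative:

```erlang
%% Illustrative mnesia setup: a disc_copies table keyed on {Timestamp, UserKey}.
-record(entry, {key, value}).

init_store() ->
    %% create_schema/1 returns an error (harmlessly) if the schema exists
    mnesia:create_schema([node()]),
    ok = mnesia:start(),
    mnesia:create_table(entry,
        [{attributes, record_info(fields, entry)},
         {disc_copies, [node()]},
         {type, ordered_set}]),
    ok = mnesia:wait_for_tables([entry], 5000).

add(Ts, UserKey, Value) ->
    mnesia:dirty_write(#entry{key = {Ts, UserKey}, value = Value}).
```

With disc_copies, every write is logged to disk by mnesia itself, which removes the need to dump the whole table after each update.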

Best,

Serge


Re: Custom timer module

ludovic@demblans.com
Hi, thank you, mnesia could actually be a good fit. I would have to ensure that the schema is created only once (so it cannot be part of the Docker container initialization), but that would be better than using tab2file after every write!

I currently have a simple implementation that works well. The only problem (really not a problem per se) is that it is just a narrow subset of the timer module. It is based on an ETS ordered_set used as a data structure, with a peek function that either returns a timed-out entry or the delay until the next entry times out, and a gen_server that just handles `timeout` information, runs the next task, and/or returns a timeout according to the delay.
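The peek function described can be sketched like this, assuming keys of the form `{Timestamp, UserKey}` with millisecond timestamps:

```erlang
%% Either return a due entry or the delay (ms) until the next one is due.
peek(Tab, Now) ->
    case ets:first(Tab) of
        '$end_of_table' ->
            empty;
        {Ts, _User} = Key when Ts =< Now ->
            [{Key, Value}] = ets:lookup(Tab, Key),
            {expired, Key, Value};
        {Ts, _User} ->
            {delay, Ts - Now}
    end.
```

The owning gen_server can then return `{noreply, State, Delay}` and be woken by a `timeout` message when the next entry is due.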

As for cron-like applications, I found that they do not fit well, because we sometimes have to pop entries from the table and run them regardless of their actual scheduled time.

Thanks again


Re: Custom timer module

Jesper Louis Andersen
On Mon, Jan 20, 2020 at 11:13 AM [hidden email] <[hidden email]> wrote:
Hi, thank you, mnesia could actually be a good fit. I would have to ensure that the schema is created only once (so not being part of the docker container initialization) but that would be better than using tab2file after every write !


The route I usually take is to create the schema and produce a FALLBACK.BUP file. You then attach persistent storage to your docker container and arrange that the storage starts out with the FALLBACK.BUP. On first startup, this file unpacks itself into your pristine mnesia schema. And from then on, you have persistent storage. It is usually far easier to handle than trying to figure out if there is a schema and create it dynamically. There is a rough overview in the GraphQL tutorial I wrote a couple of years back:

https://shopgun.github.io/graphql-erlang-tutorial/#_mnesia (scroll up a bit and also read "Mnesia initialization").
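As an assumed sketch of that flow (paths and names are illustrative): create the schema and tables once at image-build time, take a backup, and install it as the fallback that mnesia restores on its next start:

```erlang
%% One-off, at image-build time (illustrative paths):
ok = mnesia:create_schema([node()]),
ok = mnesia:start(),
%% ... create your disc_copies tables here ...
ok = mnesia:backup("/data/pristine.bup"),
%% Installing the backup as a fallback makes mnesia restore it
%% the next time it starts.
ok = mnesia:install_fallback("/data/pristine.bup").
```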

As for your problem:

* Think about system_time vs monotonic time. You probably want to track system_time for jobs, since monotonic time is not persistence-safe, but this creates trouble with leap seconds.
* Track UTC to avoid daylight saving time.
* system_time requires you to handle NTP time jumps. At the least, monitor and log them. Think about the time warp mode you want for the Erlang node.
* At 1000 entries, full scans to cancel things aren't a problem. If they ever become one, store both {TS, K, V} and {K, TS} in the table; then you can cancel K quickly.
* One process is authoritative for the table. This guarantees serialization.
* Spawn processes to handle K/V pairs. There are at most 1000 of them.
* ordered_set is the way to go. When you wake up at time Now, find every record with TS < Now, up to a limit (25, say). Handle those and remove them. Cycling 25 at a time limits a spawn spike if many timers suddenly go off at the same time, which is especially important if your work is resource-intensive. You can use, e.g., monitors to track the worker count.
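The batched wake-up in the last point can be sketched with ets:select/3, which caps the number of matches returned; the `{{Ts, UserKey}, Val}` layout assumed here follows the original poster's table:

```erlang
%% Fetch up to 25 entries whose timestamp is in the past.
due_entries(Tab, Now) ->
    MatchSpec = [{{{'$1', '$2'}, '$3'},        % object is {{Ts, UserKey}, Val}
                  [{'<', '$1', {const, Now}}], % guard: Ts < Now
                  ['$_']}],                    % return the whole object
    case ets:select(Tab, MatchSpec, 25) of
        '$end_of_table'          -> [];
        {Matches, _Continuation} -> Matches
    end.
```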



Re: Custom timer module

Roger Lipscombe
On Tue, 21 Jan 2020 at 13:45, Jesper Louis Andersen <[hidden email]> wrote:
> * Track UTC to avoid daylight saving time.

Unless your timestamps are *intended* to be in the user's local time. This is a problem with (e.g.) remote management/deployment solutions: the user actually *wants* their OS upgrade to be scheduled in their timezone, according to the daylight saving rules for then, not now.

Re: Custom timer module

Jesper Louis Andersen
On Tue, Jan 21, 2020 at 3:57 PM Roger Lipscombe <[hidden email]> wrote:
Unless your timestamps are *intended* to be in user's local time. This
is a problem with (e.g.) remote management/deployment solutions: the
user actually *wants* their OS upgrade to be scheduled in their
timezone, according to the daylight saving rules for then, not now.

That is a separate problem in my book.

Your system will have an internal representation, which you should treat as abstract/unknown. And there is an external representation which you use to enter and extract points-in-time. If you want an OS upgrade to happen at 02:00am ET, you then provide that to your marshal function, and it converts that to your internal representation. If you want this point-in-time in, say, CET, you then extract it from the internal representation into a string where CET is the timezone.

You can choose any internal representation and reference: {Mega, Secs, Micro}, nanoseconds, a Gregorian basepoint, a 1/1 1970 basepoint, floating point, a record, ... as long as you can marshal/unmarshal.

However, some internal representations are going to be easier to work with. UNIX uses seconds since 1/1 1970, and keeping that at UTC makes things a tad easier in the long run as well. Consider what happens if the machine roams. Now you can do:

$ TZ='EST' /bin/date
Tue 21 Jan 2020 10:32:40 AM EST
$ TZ='CET' /bin/date
Tue 21 Jan 2020 04:32:57 PM CET

This is also useful if multiple people are using the machine, but from different time zones. It is the same internal representation, but our presentation changes and switches based on where we are located. If you choose an internal representation that moves with the timezone and you have a roaming system, then you have to rewrite all timestamps on timezone changes. Or you have to keep a non-UTC timezone as the base, but this is not common. Non-UTC timezones are also a hassle if you ever need to combine log files from multiple data center locations, each running with its own timezone.

My experience is that it is often easier to do this conversion in the border of your system and then work with a UTC-reference internally. Otherwise, you end up with the need to do conversions on demand in your code. This is certainly doable in languages with type systems, as the type system can protect you against incorrect use. But in a uni-typed world where everything is a term, you usually want a simple representation.
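In Erlang/OTP 21 and later, for instance, that border conversion is a library call; the internal value stays fixed while the presentation varies:

```erlang
%% One internal representation: seconds since the UNIX epoch, UTC-referenced.
T = erlang:system_time(second),
%% Two presentations of the same instant, produced only at the border:
Utc = calendar:system_time_to_rfc3339(T, [{unit, second}, {offset, "Z"}]),
Cet = calendar:system_time_to_rfc3339(T, [{unit, second}, {offset, "+01:00"}]).
```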




--
J.

Re: Custom timer module

ludovic@demblans.com
In reply to this post by Jesper Louis Andersen-2
Thank you very much for all this advice, I'll see if I can apply it to my code!

Cheers
