Quantcast

Best way to implement a simple cache

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Best way to implement a simple cache

Jan Vincent Liwanag
Hi guys,

I wanted your take on how to implement a simple small key-value cache,  
maybe holding around a few hundred tuples. The thing is, I don't want  
the cache to be consuming all my memory so that entries in the cache  
expires if it isn't read for some time or some maximum size is met.

Jan Vincent Liwanag
[hidden email]





________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Best way to implement a simple cache

Jachym Holecek-3
Hello,

# Jan Vincent 2009-11-10:
> I wanted your take on how to implement a simple small key-value
> cache, maybe holding around a few hundred tuples. The thing is, I
> don't want the cache to be consuming all my memory so that entries
> in the cache expires if it isn't read for some time or some maximum
> size is met.

An ETS table owned by a gen_server that runs periodic cleanup on it?

        -- Jachym

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Best way to implement a simple cache

Garrett Smith-5
On Tue, Nov 10, 2009 at 11:25 AM, Jachym Holecek <[hidden email]> wrote:

> Hello,
>
> # Jan Vincent 2009-11-10:
>> I wanted your take on how to implement a simple small key-value
>> cache, maybe holding around a few hundred tuples. The thing is, I
>> don't want the cache to be consuming all my memory so that entries
>> in the cache expires if it isn't read for some time or some maximum
>> size is met.
>
> An ETS table owned by a gen_server that runs periodic cleanup on it?
>

I find myself writing purpose built gen_servers that maintain the
cache their state for this sort of thing. To 'expire' items in the
cache, you could run another process as a timer that calls an expire
method on the gen_server. There are more moving parts here, but
they're decoupled and avoid using ETS for what's a pretty simple
caching requirement.

Garrett

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Best way to implement a simple cache

Jan Vincent Liwanag
Why wouldn't an ETS be good for this?

On Nov 11, 2009, at 1:45 AM, Garrett Smith wrote:

> On Tue, Nov 10, 2009 at 11:25 AM, Jachym Holecek <[hidden email]> wrote:
>> Hello,
>>
>> # Jan Vincent 2009-11-10:
>>> I wanted your take on how to implement a simple small key-value
>>> cache, maybe holding around a few hundred tuples. The thing is, I
>>> don't want the cache to be consuming all my memory so that entries
>>> in the cache expires if it isn't read for some time or some maximum
>>> size is met.
>>
>> An ETS table owned by a gen_server that runs periodic cleanup on it?
>>
>
> I find myself writing purpose built gen_servers that maintain the
> cache their state for this sort of thing. To 'expire' items in the
> cache, you could run another process as a timer that calls an expire
> method on the gen_server. There are more moving parts here, but
> they're decoupled and avoid using ETS for what's a pretty simple
> caching requirement.
>
> Garrett

Jan Vincent Liwanag
[hidden email]





________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Best way to implement a simple cache

Garrett Smith-5
> On Nov 11, 2009, at 1:45 AM, Garrett Smith wrote:
>
>> On Tue, Nov 10, 2009 at 11:25 AM, Jachym Holecek <[hidden email]> wrote:
>>> Hello,
>>>
>>> # Jan Vincent 2009-11-10:
>>>> I wanted your take on how to implement a simple small key-value
>>>> cache, maybe holding around a few hundred tuples. The thing is, I
>>>> don't want the cache to be consuming all my memory so that entries
>>>> in the cache expires if it isn't read for some time or some maximum
>>>> size is met.
>>>
>>> An ETS table owned by a gen_server that runs periodic cleanup on it?
>>>
>>
>> I find myself writing purpose built gen_servers that maintain the
>> cache their state for this sort of thing. To 'expire' items in the
>> cache, you could run another process as a timer that calls an expire
>> method on the gen_server. There are more moving parts here, but
>> they're decoupled and avoid using ETS for what's a pretty simple
>> caching requirement.

On Wed, Nov 11, 2009 at 9:41 AM, Jan Vincent <[hidden email]> wrote:
> Why wouldn't an ETS be good for this?

I'm sure it'd be fine for it :)

I personally like to avoid using the "global" structures for data
management whenever I can get by with a process. I like being able to
look at a function header and know that I'm seeing everything that
could effect the result.

That said, I'm just not experienced with the various patterns in
Erlang to have strong opinions either way. Just my inclination.

Garrett

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Best way to implement a simple cache

Michael McDaniel-4
On Wed, Nov 11, 2009 at 10:23:01AM -0600, Garrett Smith wrote:

> > On Nov 11, 2009, at 1:45 AM, Garrett Smith wrote:
> >
> >> On Tue, Nov 10, 2009 at 11:25 AM, Jachym Holecek <[hidden email]> wrote:
> >>> Hello,
> >>>
> >>> # Jan Vincent 2009-11-10:
> >>>> I wanted your take on how to implement a simple small key-value
> >>>> cache, maybe holding around a few hundred tuples. The thing is, I
> >>>> don't want the cache to be consuming all my memory so that entries
> >>>> in the cache expires if it isn't read for some time or some maximum
> >>>> size is met.
> >>>
> >>> An ETS table owned by a gen_server that runs periodic cleanup on it?
> >>>
> >>
> >> I find myself writing purpose built gen_servers that maintain the
> >> cache their state for this sort of thing. To 'expire' items in the
> >> cache, you could run another process as a timer that calls an expire
> >> method on the gen_server. There are more moving parts here, but
> >> they're decoupled and avoid using ETS for what's a pretty simple
> >> caching requirement.
>
> On Wed, Nov 11, 2009 at 9:41 AM, Jan Vincent <[hidden email]> wrote:
> > Why wouldn't an ETS be good for this?
>
> I'm sure it'd be fine for it :)
>
> I personally like to avoid using the "global" structures for data
> management whenever I can get by with a process. I like being able to
> look at a function header and know that I'm seeing everything that
> could effect the result.
>
> That said, I'm just not experienced with the various patterns in
> Erlang to have strong opinions either way. Just my inclination.
>
> Garrett
>
________________________________________________________________


 ets can be private to a particular process ;



-module(session).

-export( [start/2, loop/2] ).


start( Name, Timeout ) -> spawn( fun() -> init( Name, Timeout ) end)  .
 

init( Name, Timeout )  ->
   register( Name, self() ) ,
   Tbl = Name ,

   ets:new( Tbl, [named_table, private] ) ,
   ets:insert( Tbl, {start, httpd_util:rfc1123_date()} ) ,

   ?MODULE:loop( Name, Timeout )

.%init/1



loop( Name, Timeout ) ->

   Tbl = Name ,
   [{start, Start}] = ets:lookup( Tbl, start) ,

   receive
      {From, _Msg} ->  
         io:fwrite("hello ~p, I started at: ~p~n", [From, Start]) ,

         loop( Name, Timeout )

      after Timeout ->
          io:fwrite("tired now, started at: ~p~n", [Start])

   end
.%loop/2


1> c(session).
2> session:start(fu, 10000).  
<0.42.0>
3> fu ! {self(), hello}.
hello <0.35.0>, I started at: "Wed, 11 Nov 2009 17:11:11 GMT"
{<0.35.0>,hello}
4> ets:lookup(fu, start).
** exception error: bad argument
     in function  ets:lookup/2
        called as ets:lookup(fu,start)
5>
tired now, started at: "Wed, 11 Nov 2009 17:11:11 GMT"



 $ erl  -man  ets


search for private



--
Michael McDaniel
Portland, Oregon, USA
http://trip.autosys.us


________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Best way to implement a simple cache

Ngoc Dao
In reply to this post by Jan Vincent Liwanag
See "Why not use ETS to do this?" part of Cherly:
http://github.com/cliffmoon/cherly


On Thu, Nov 12, 2009 at 12:41 AM, Jan Vincent <[hidden email]> wrote:
> Why wouldn't an ETS be good for this?

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Best way to implement a simple cache

Thijs Terlouw
> > Why wouldn't an ETS be good for this?

I think ETS is better for 'get' operations than going through a
managing process:

A process-approach suffers from sequential access. This might not be
so important since lookups can be very fast, but worth keeping in
mind. A public/protected ETS table with one owning manager-process
doesn't suffer from this problem. If you want to do LRU-eviction
strategy or something like that, you could let your 'set' operations
still go through the gen_server (and evict if required) and 'get'
operations go directly to the ETS table.

Or use Cherly as noted above :)




________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Re: Best way to implement a simple cache

Scott Lystig Fritchie
Thijs <[hidden email]> wrote:

> A process-approach suffers from sequential access. This might not be
> so important since lookups can be very fast, but worth keeping in
> mind. A public/protected ETS table with one owning manager-process
> doesn't suffer from this problem.

... except that public/protected ETS table access isn't free, either.
The VM has to play games with controlling multi-scheduler access to
ETS's underlying hash table/binary tree.  I haven't measured with an
R13B release, but with R12B-5 the cost was definitely non-zero and
significant enough for us to change implementation in one case.

-Scott

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Best way to implement a simple cache

Thijs Terlouw
On Nov 13, 8:36 am, Scott Lystig Fritchie <[hidden email]>
wrote:
> ... except that public/protected ETS table access isn't free, either.
> The VM has to play games with controlling multi-scheduler access to
> ETS's underlying hash table/binary tree.  I haven't measured with an
> R13B release, but with R12B-5 the cost was definitely non-zero and
> significant enough for us to change implementation in one case.

You are right, but in my experience a public ETS table significantly
outperforms process-mediated access. I'm curious how did you optimize
this further?
I found that accessing the ETS table is quite cheap (not zero indeed),
but the cost mainly depends on the size of the data you are accessing.
I have small values for each key and then access is really very fast,
without the complexities of the process (and queueing messages in it's
mailbox).



________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Re: Best way to implement a simple cache

Matt Stancliff

On Nov 12, 2009, at 5:17 PM, Thijs wrote:

> On Nov 13, 8:36 am, Scott Lystig Fritchie <[hidden email]>
> wrote:
>> ... except that public/protected ETS table access isn't free, either.
>> The VM has to play games with controlling multi-scheduler access to
>> ETS's underlying hash table/binary tree.  I haven't measured with an
>
> You are right, but in my experience a public ETS table significantly
> outperforms process-mediated access. I'm curious how did you optimize
> this further?

   Let's solve this with numbers!

Test                    ||  Multiplier  || Percent Slowdown || Reads  
Per Second
=
=
=
=
=
=
=
========================================================================
ets plain                       1.00x           0%                  
553,384.3
ets public                      1.01x           1%                  
550,329.3
ets public named table          1.10x          10%                  
502,504.7
mnesia ram copies               1.87x          87%                  
296,011.2
process based cache             9.86x         886%                    
56,141.6
BDB driver                     56.37x       5,537%                    
9,816.3

While I have my benchmarking hat on, let's check function calls too:
Call Method             ||  Multiplier  || Percent Slowdown || Calls  
Per Second
=
=
=
=
=
=
=
========================================================================
apply(?MODULE, foo, [])        1.00x            0%                  
96,089,747.8
?MODULE:foo()                  1.01x            1%                  
94,809,477.4
foo()                          1.09x            9%                  
88,485,052.9
fun() -> ok end()              3.15x          215%                  
30,489,431.6
apply(foo, [])                 3.50x          250%                  
27,481,044.5
Module:Fun()                   5.22x          422%                  
18,392,358.4

The slowest method of function invocation still yields 18+ million  
calls per second.


Notes:
   • Tests were run on my ancient (almost four year old) iMac (Core  
Duo 2GHz):
Erlang R13B02 (erts-5.7.3) [source] [smp:2:2] [rq:2] [async-threads:0]  
[kernel-poll:false]
   • No scheduler binding to CPUs.
   • For the function call benchmark: foo() -> ok.  Module = ?MODULE,  
Fun = foo.
   • The "process based cache" is my fleshed out production version of  
the duomark process cache.

Disclaimer: The results vary in absolute numbers for multiple runs of  
the benchmark,
but results retain their relative positions in the rankings each time.
I'll get around to making a statistically valid erlang benchmark suite  
one day.


-Matt
--
Matt Stancliff                    San Jose, CA
AIM: seijimr              iPhone: 678-591-9337
"The best way to predict the future is to invent it." --Alan Kay


________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Best way to implement a simple cache

Thijs Terlouw
On Nov 13, 1:10 pm, Matt Stancliff <[hidden email]> wrote:
>    Let's solve this with numbers!

Is this a sequential or parallel benchmark? I suspect it is sequential
access?

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

RE: Re: Best way to implement a simple cache

dmercer
In reply to this post by Matt Stancliff
Matt Stancliff wrote:

> While I have my benchmarking hat on, let's check function calls too:
> Call Method             ||  Multiplier  || Percent Slowdown || Calls
> Per Second
> =
> =
> =
> =
> =
> =
> =
> ========================================================================
> apply(?MODULE, foo, [])        1.00x            0%
> 96,089,747.8
> ?MODULE:foo()                  1.01x            1%
> 94,809,477.4
> foo()                          1.09x            9%
> 88,485,052.9
> fun() -> ok end()              3.15x          215%
> 30,489,431.6
> apply(foo, [])                 3.50x          250%
> 27,481,044.5
> Module:Fun()                   5.22x          422%
> 18,392,358.4

Is this a surprise to anyone else?  I myself would have expected foo() to be
the fastest, since it knows exactly which function to call, whereas the two
that rank above it have to look for the most recent version of foo/0.
Anyone have an explanation for this?



________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Best way to implement a simple cache

ricardo_bcl
This post has NOT been accepted by the mailing list yet.
In reply to this post by Jan Vincent Liwanag
I decided to implement a very simple cache mechanism using ETS:

https://github.com/ricardobcl/ETScache

For now, it only controls the number of elements in cache, because it's faster and it did what I wanted, but can be easily modified to prune it terms of bytes.

Jan Vincent Liwanag wrote
Hi guys,

I wanted your take on how to implement a simple small key-value cache,  
maybe holding around a few hundred tuples. The thing is, I don't want  
the cache to be consuming all my memory so that entries in the cache  
expires if it isn't read for some time or some maximum size is met.

Jan Vincent Liwanag
[hidden email]





________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org
Loading...