Retrieving "semi-constant" data from a function versus Mnesia


Retrieving "semi-constant" data from a function versus Mnesia

Peter Johansson
Hi to all fellow Erlang users out there!

I'm currently working on a web-application prototype powered by Erlang,
and am for the moment preoccupied with choosing a suitable implementation
for in-memory storage of shared / "top-level" configuration-data terms.

These configuration-data terms hold ejson structures (typically 5 KB - 15 KB
in size) and will be consulted by the majority of the request-related
processes, based on cookie association & parameters.

Since this configuration data will undergo updates relatively rarely but
still be read by almost every request process, I consider its in-memory
storage implementation highly significant for process efficiency over time
& under shifting load situations.

In the Mnesia case the configuration terms have to be retrieved by means of
transactions that copy table records into the different process heaps; those
in-memory copy operations will obviously cause some overhead in the
environment during peak-like situations.
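Concretely, the kind of retrieval I have in mind looks roughly like this
(table & record names are just placeholders, and I assume exactly one record
per key):

    -record(app_config, {key, ejson}).

    read_config(Key) ->
        %% mnesia:transaction/1 copies the matching record - and with it
        %% the whole ejson term - onto the calling process's heap
        {atomic, [#app_config{ejson = EJson}]} =
            mnesia:transaction(fun() -> mnesia:read(app_config, Key) end),
        EJson.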

The other case (the function case) is to template the updated ejson
structures (as the sole "return" structures) into dedicated function
definitions, each definition held in its own module, and then recompile &
reload those modules programmatically via special update functions. The
up-to-date configuration data can then be retrieved by ordinary processes as
simple function calls returning fixed data.

I assume/expect these ejson / "return" structures to be stored in the
constant pools of the modules when they are loaded into memory.
In that case, binding the result of such a function call to a variable
should just create a reference pointing to the fixed structure in the
module's constant pool.

Retrieving the configuration data in this latter manner must be
significantly more efficient compared to transactions from Mnesia,
considering both the sizes of the data structures & the frequency with
which they will be consulted/read.

Is this assumption of mine correct, or have I missed/overlooked something
in my assessment of the situation?


Sending my best regards to you erlangers reading this! / Peter


Re: Retrieving "semi-constant" data from a function versus Mnesia

Motiejus Jakštys
On Sat, May 9, 2015 at 2:35 PM, Peter Johansson <[hidden email]> wrote:
> I'm working on a web-application prototype powered by Erlang, and am for
> the moment preoccupied with choosing a suitable implementation for
> in-memory storage of shared / "top-level" configuration-data terms.

This is a great question! However, the answer might not be what you
really expect.

The usual approach suggested by the Erlang community is:
1. Build the simplest thing possible. It will be sufficient for you in 95%
of the cases. Most likely, your bottleneck will be somewhere else,
...
2. ... unless you figure out it's not, by measuring it. Measure.
3. After (1) and (2) you know your bottleneck, and have a much better
idea of how performance can be improved. Also, once you have the
measurements and the infrastructure, it's much easier to just try
multiple approaches and pick the fastest one.

For me the simplest solution to your problem is to keep the
configuration where it belongs: the application environment (retrieve it
with application:get_env/3).
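A minimal sketch, assuming an application named my_app and a key named
ejson_config (both names are illustrative):

    %% in sys.config (or set at run time with application:set_env/3):
    [{my_app, [{ejson_config, [{<<"feature_x">>, true}]}]}].

    %% in request-handling code; the third argument is the default:
    Config = application:get_env(my_app, ejson_config, []),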

--
Motiejus Jakštys

Re: Retrieving "semi-constant" data from a function versus Mnesia

Jay Nelson-2
In reply to this post by Peter Johansson
I did an experiment recently with app.config vs https://github.com/jaynel/vbisect
vs a compiled behaviour implementation of configuration. I was worried that
ets locks and row contention would penalize app.config as the number of
workers accessing the same config scaled up. I have only tried it on my small
4-core laptop so far; I expect different results on a 16-core or larger server.
(And all my apprehension may be misplaced, as the app.config ets table
is almost always exclusively read-only.)

In theory the code-compiled constants would allow the most distributed
access, followed by the binary-encoded dictionary (vbisect), because a
single large binary lives off-heap in the shared binary area and is passed
around by reference, and lastly app.config. It turns out the compiled
version was the slowest, for a reason I had completely overlooked and was
quite surprised by: a dynamic Module:config_fun/N call is, according to the
documentation, about 6x slower than a fixed module:config_fun/N call.
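(For clarity, the two call shapes being compared, with illustrative names:)

    %% fixed, fully qualified call - the destination is known at load time:
    T1 = app_config:timeout(),

    %% call through a variable module name - the destination is looked up
    %% in the export table at run time on every call, hence the extra cost:
    Mod = app_config,
    T2 = Mod:timeout(),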

This was enough of an impact to cut the number of transactions to
Cassandra in half! (I was testing with https://github.com/duomark/elysium
which dynamically manages distributed sockets with stochastic decay
and 10x or more config accesses per query.) When testing app.config
vs the binary there is a slight trend toward ets slowdown as workers scale
up, but it is not very significant, and I would want to see the impact on a
bigger machine with more cores.

My guess is that mochiglobal would outperform my compiled-code approach, IF
you have a constant module name. I was using a behaviour so that I could
swap configs and therefore had to use Module:fn(…) calls everywhere.
This slowdown would affect any behaviour (e.g., gen_server, gen_fsm)
you implement if it sits in a hotspot, as the dynamic call mechanism is
fundamental to how behaviours are implemented.

jay


Re: Retrieving "semi-constant" data from a function versus Mnesia

Joe Armstrong-2
In reply to this post by Peter Johansson
How large is the total data?

If it's small you could define this in a module and not use a database
or process at all.

-module(global_data).
-export([data_one/0]).

data_one() ->
    ....

Then dynamically change and recompile the module. Any call to
global_data:data_one() will pick up the value from the "latest version".
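A sketch of how the update/recompile step might be done programmatically
(this assumes the configuration contains only literal terms - no pids, refs
or funs - so it survives ~p formatting):

    update_config(Config) ->
        Forms = ["-module(global_data).",
                 "-export([data_one/0]).",
                 lists:flatten(io_lib:format("data_one() -> ~p.", [Config]))],
        Parsed = [begin
                      {ok, Tokens, _} = erl_scan:string(Src),
                      {ok, Form} = erl_parse:parse_form(Tokens),
                      Form
                  end || Src <- Forms],
        {ok, global_data, Bin} = compile:forms(Parsed),
        code:purge(global_data),                      %% drop any old code
        {module, global_data} =
            code:load_binary(global_data, "global_data.erl", Bin),
        ok.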

/Joe


Re: Retrieving "semi-constant" data from a function versus Mnesia

Michael Truog
In reply to this post by Jay Nelson-2
On 05/10/2015 01:04 PM, Jay Nelson wrote:
> I did an experiment recently with app.config vs https://github.com/jaynel/vbisect
> vs a compiled behaviour implementation of configuration.

You might want to try https://github.com/knutin/bisect too.



Re: Retrieving "semi-constant" data from a function versus Mnesia

Benoit Chesneau-2
In reply to this post by Joe Armstrong-2


On Sun, May 10, 2015 at 10:23 PM Joe Armstrong <[hidden email]> wrote:
> How large is the total data?
>
> If it's small you could define this in a module and not use a database
> or process at all.

How small should it be? What if the compiled beam is about 888 kilobytes?

- benoit


Re: Retrieving "semi-constant" data from a function versus Mnesia

Joe Armstrong-2
On Sun, May 10, 2015 at 10:32 PM, Benoit Chesneau <[hidden email]> wrote:

>
>
> On Sun, May 10, 2015 at 10:23 PM Joe Armstrong <[hidden email]> wrote:
>>
>> How large is the total data?
>>
>> If it's small you could define this in a module and not use a database
>> or process at all.
>
>
> How small should it be? What if the compiled beam is about 888 kilobytes?

It depends :-)

There are many factors involved:

   a) How much memory do you have
   b) How often do you want to change the configuration data
   c) How quickly do you want the change to take effect

Assume

  a) is a smallish number of GBytes (normal these days)
  b) is "relatively rarely" (I don't know what this means - (aside - this is
      why I always ask people to provide numbers in their questions))
     Say once a day
  c) is a few seconds

Then it should be perfectly doable. Making a module containing all the
config data and recompiling when necessary is certainly the fastest way to
access the data - this has very fast reads but very slow writes - you could
think of a module as a database optimized for reads but not writes :-)

/Joe




Re: Retrieving "semi-constant" data from a function versus Mnesia

Jay Nelson-2
In reply to this post by Peter Johansson
(Edited previous posts for relevance, including only Joe’s comments
interspersed with my responses.)
> If it's small you could define this in a module and not use a database
> or process at all.

> -module(global_data).
> -export(...)

> data_one() ->
>    ....

> Then dynamically change and recompile the module. Any call
> global_data:data_one() will pick up the value from the "latest version"

> /Joe
This is exactly the approach I was taking, but I assumed that there would be
various implementations so I used a behaviour to identify the specific config:
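(Roughly along these lines - the behaviour, callback and module names here
are only illustrative, and the two modules are shown together:)

    %% the behaviour fixes the shape of a config module:
    -module(config_api).
    -callback timeout() -> pos_integer().

    %% one concrete implementation:
    -module(my_app_config).
    -behaviour(config_api).
    -export([timeout/0]).
    timeout() -> 5000.

    %% call sites only learn which module to use at run time:
    get_timeout(ConfigModule) ->
        ConfigModule:timeout().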


The data set is quite small as it is normal app configuration data.
I implemented in the most straightforward obvious way:


It turns out the Config_Module:fn(…) call is the performance killer. This
alternative approach is much more performant if you are ever faced with a
similar situation (example from some code I was browsing recently):


> Making a module containing all the config data
> and recompiling when necessary is certainly the fastest way
> to access the data -this has very fast reads but very slow writes - you could think of a
> module as a database optimized for reads but not writes :-)
So I thought as well, but you *must* call it with a compile-time constant
module name or you lose the performance benefit. I mostly wanted to guarantee
distributed, lock-free code access for many concurrent readers to avoid
contention, but it proved the slowest of the 3 approaches when using a behaviour.

jay



Re: Retrieving "semi-constant" data from a function versus Mnesia

Peter Johansson
In reply to this post by Joe Armstrong-2
Hi Joe !

Thank you very much indeed for a clarifying answer!

This is my first ever message exchange with you, by the way :-)
(though I have been aware of Erlang, you, and some of the other
community-active users for a few years now).

And also... thank you even more for contributing a language that is
marvelously flexible & simple to use in real-life prototyping / "testing out"
situations (it even beats Python in this regard, if you ask me).


It struck me yesterday that I actually have two other, more "generic"
Erlang questions as well, which happen to fit naturally under this
question thread, so I put them here too.

1:
Is fixed data, defined & compiled inside functions, always stored in and
referenced from the constant pool of the particular module, regardless of
which Erlang term types/structures it holds, or are there special situations
where such fixed data is partly or fully copied onto the heap/stack of the
calling process?

2:
In the current web-application project I work on (implemented on top of Yaws)
I have the following type of function-call construct (to achieve server-side
method dispatching):

Variable_module_name:fixed_function_name(),

According to the efficiency guide in the Erlang documentation this type of
call construct is about six times more "expensive" than a call with a fully
fixed name.

In what sense is it more expensive? Is it about the time lag between the point
when the VM encounters this call construct and the point when the functional
content (the prepared subroutines) can actually be executed?


Once again... thank you very much for contributing this language to the
programmer community.
Sending my best regards!

Peter, Lund, Sverige


Re: Retrieving "semi-constant" data from a function versus Mnesia

Hynek Vychodil
In reply to this post by Jay Nelson-2
What prevents you from using

generic_config:data_one(Config, ...)

instead of

Config_Module:data_one(...) ?

It is just one more level of indirection, and it can be a viable approach when
write operations are very rare.
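A minimal sketch of that indirection (module/function/key names are
illustrative):

    -module(generic_config).
    -export([data_one/1]).

    %% the caller passes the configuration term (or a handle to it)
    %% explicitly, so every call is a statically resolved, fixed-name call:
    data_one(Config) ->
        proplists:get_value(data_one, Config).

The price is that the Config term has to be threaded through the call chain.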

Hynek


Re: Retrieving "semi-constant" data from a function versus Mnesia

Richard A. O'Keefe-2
In reply to this post by Peter Johansson
> 2:
> In the current web-application project I work on (implemented on top of
> Yaws) I have the following type of function-call construct (to achieve
> server-side method dispatching):
>
> Variable_module_name:fixed_function_name(),
>
> According to the efficiency guide in the Erlang documentation this type of
> call construct is about six times more "expensive" than a call with a
> fully fixed name.
>
> In what sense is it more expensive?

Time.

> Is it about the time lag between the point when the VM encounters this
> call construct and the point when the functional content (the prepared
> subroutines) can actually be executed?

If you compile a little example using erlc +"'S'" and then poke
around in beam_emu.c, you'll see that the dynamic function call
takes a couple of C function calls to find the place to go.
One of them is or was erts_find_export_entry(module, function, arity),
surprise surprise, which looks the triple up in a hash table, so it's
fairly clear where the time is going.

Another approach would be to do something like this:
if module m exports fn, generate
    ' apply f'(X1, ..., Xn, m) -> m:f(X1, ..., Xn);
and translate a call M:f(E1, ..., En) as
    ' apply f'(E1, ..., En, M).
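For instance, if f/2 were exported from modules m1 and m2, the generated
dispatcher would get one clause per exporting module (sketch):

    ' apply f'(X1, X2, m1) -> m1:f(X1, X2);
    ' apply f'(X1, X2, m2) -> m2:f(X1, X2).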

Looking at ...:...(...) calls in an Erlang/OTP release,
I found
    17.5% ?MODULE:function  -- ?MODULE &c
    81.6%  module:function
     0.0%  module:Function  -- some but very few
     0.8%  Module:function
     0.0%  Module:Function  -- some but very few
These figures are a bit dodgy but wouldn't be too far wrong.
In any case they're static figures, not dynamic.

While I think this kind of call _could_ be faster, I suspect
that they are fast _enough_ given their rarity and the other
things going on in a program.



Re: Retrieving "semi-constant" data from a function versus Mnesia

Joe Armstrong-2
In reply to this post by Peter Johansson
On Wed, May 13, 2015 at 1:00 AM, Peter Johansson <[hidden email]> wrote:
> Hi Joe !
>
> Thank you very much indeed for a clarifying answer!
>
> This is my first ever message exchange with you, by the way :-)

Welcome to the Erlang mailing list, the place where we try to solve all the
world's programming problems :-)

>
> And also ... Thank you way ...way more for your contribution of/to a
> language being absolutely marvelously flexible & simplistic to use in real
> life prototyping/"testing out"-situations ( it even beats Python in this
> regard if you ask me ).
>
>
> It struck me yesterday that I actually have two other more "generic"
> Erlang-question as well .....but which happen to fit in completely naturally
> & closely under this current question-thread by their nature .....so I put
> them here too.
>
> 1:
> Are fixed data, defined & compiled inside functions, always stored inside &
> referenced from the constant-pool of the particular modules regardless of
> what Erlang term-types/structures they holds ....or does it exist special
> situations when such fixed data becomes partly or fully copied into the
> heap/stack of the calling process ?

Answer 1)

   You're not supposed to know. Your expectation should be that
fixed data will be compiled and implemented as efficiently as possible.
If this is not the case and your expectation is not met you should
complain loudly.

aside: this has happened on several occasions. If your expectations are
not met then tell us. Inefficient handling of fixed data is a bug.

Even if I told you the answer - whatever you observe today might
not be true in a year's time.

It seems crazy to me to optimise code on the basis of how it
performs *today* and expect those optimisations to still hold in a few
years' time. The hardware will change.

Optimisation should be the very last thing you do.

You should

    - write clean short easy to understand code

      (the goal is that *you* can understand your own code in a year's time)

     <aside>

     How many of you can easily understand your own undocumented
     code a few months after you stopped working on it?

      I could have a book-long rant about this here, but the mail would
      take a year or so to compose.

      A quick 'ask-around' of my colleagues revealed that very few of them
     can  easily understand their own code a few months after they stop
     working on it.

     When you work on something it's in your cerebral cache - you write
     no documentation because it's "so obvious that it needs no explanation".

     You stop working on it - flush the cache - and the next time you see it
     you have to rebuild the cache, which takes a long time. Worse, somebody
     else takes over and they have no cache to rebuild.

     I think it really takes years of practise to get to the point where you
     can write code, store it, and be reasonably confident that you will
     understand it in ten to twenty years' time.

     40 years ago I wrote and distributed some code. A month
afterwards I got a bug report. I'd stopped working on the program.
There was no documentation and the code was *completely*
incomprehensible. A total rewrite with a far simpler program and
documentation was the result.

    You don't want to know how many projects I've seen that have ground into
the mud of incomprehensibility and been cancelled due to overwhelming
complexity.

    Programming is all about *understanding*

    Once you understand things you *can* write efficient code - but when you
    *do* understand you won't want to.

    But I digress ...

</aside>


    - measure, measure, measure
    - optimize if *absolutely* necessary

If you want your program to go a thousand times faster wait 10 years. Do
nothing and wait. This is by far the easiest way to optimize a program.

<aside> I stuck an SSD in my old macbook and doubled the memory - now it
whizzes along - I can now throw away all my failed attempts to make
indexing software etc. go faster ...

It's hardware changes that make software go faster (assuming
you've already found decent algorithms) - whoever can first program
the million-core NOC wins. We have tens to hundreds of billions
of computers sitting in idle loops - doing nothing for 99% of the time -
and we still talk about making *this* computer that I have right now in
front of me go faster.

<aside>
   For a computation to take place, data and computation power
   must meet at the same place in time and space.
   This is why the "cloud" is popular - we'll send all our data and
   computations to the same place (the "cloud") - problem solved.
   Only it's not. Why move GBytes of data to the computation
   when we should move the computation to the data?
   We need to figure out how to easily move computations
    and use all these computers that are sitting around doing nothing
   rather than optimising any single computer ...
 </aside>

 Bit off topic, but I was talking about optimization, and I do feel we
optimize the wrong things ...

 The only thing I want to optimize is the time taken to write a program,
 not the execution speed of that program (come back Prolog, all is forgiven :-)

  ...

</aside>


This is why having a small clean code base wins in the long run. Erlang
(today) is millions of times faster than the mid-1980s version -
this speed-up has *not* come from software optimisations but from hardware
changes.

Resisting the desire to optimize requires saintly dedication - you can
optimize if and only if your program becomes clearer, shorter and more
beautiful.

Answer 2)

Measure measure measure

Answer 3)

Wait ten years

Answer 4)

Buy/Borrow a faster machine

Answer 5)

Yes(ish) - it is my understanding that fixed data is special-cased to keep
it off the stack and heap (or at least it should be)


>
> 2:
> In the current web-application project I work on (implemented on top of
> Yaws) I have the following type of function-call construct (to achieve
> server-side method dispatching):
>
> Variable_module_name:fixed_function_name(),
>
> According to the efficiency guide in the Erlang documentation this type of
> call construct is about six times more "expensive" than a call with a
> fully fixed name.
>
> In what sense is it more expensive? Is it about the time lag between the
> point when the VM encounters this call construct and the point when the
> functional content (the prepared subroutines) can actually be executed?

measure ^ 3 (again)

Cheers

/Joe


Re: Retrieving "semi-constant" data from a function versus Mnesia

Jay Nelson-2
In reply to this post by Hynek Vychodil

> On May 13, 2015, at 12:07 AM, Hynek Vychodil <[hidden email]> wrote:
>
> What prevents you using
>
> generic_config:data_one(Config, ...)
>
> instead of
>
> Config_Module:data_one(...)
>
> It is just one more step of indirection. It can be a viable way in the case when write operations are very rare.
>
> Hynek
>

I was writing an OSS library to be reused by others. I employed
erlang’s behaviour mechanism as the “most principled” approach
to genericity. With erlang.mk or rebar the library is expected to be
fetched and used without modification during the build process.
To make it performant with this approach, I would have to document
that the user must mirror the example default and use a reserved,
pre-defined constant module name (hoping there is no conflict
with another module already defined in their application).

I will probably benchmark this one day, but using vbisect is a more
reasonable approach and so far the traditional app.config has not
proven to be a big penalty (maybe 10% slowdown, maybe nothing
significant, I don’t have definitive production numbers).

/aside

Another reply had mentioned knutin/bisect… Bisect requires
fixed-length keys and values, so while it may be faster (I would expect
it’s possible, but I don’t know), it has larger memory usage, and when
including things like URLs as values, that memory waste could be vast
compared to other values like integers. Vbisect is Kresten Krab’s
re-implementation of the bisect idea with variable-sized keys and values.
He wanted a generic technique for storing data in Hanoi DB, his Erlang
implementation of LevelDB, a log-merge KV store. Performance testing and
PropEr tests give results similar to ets with 10K or so keys in a single
dictionary, although it will slow down with more data since it is a sorted
tree of key/value pairs. My fork adds more structure to the code, PropEr
testing and some additional API calls to mirror dict and friends in Erlang,
so it can be used as a drop-in alternative.

jay


Re: Retrieving "semi-constant" data from a function versus Mnesia

Jay Nelson-2
In reply to this post by Hynek Vychodil

> On May 13, 2015, at 12:07 AM, Hynek Vychodil <[hidden email]> wrote:
>
> What prevents you from using generic_config:data_one(Config, ...)
> instead of Config_Module:data_one(...)?

You’ve inspired me to use a macro and a build hook to generate
a single-line include file with the user-provided module name.
Hopefully I can make it simple enough for 3rd-party users to
customize.
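Something along these lines, with a generated header and macro name as
placeholders:

    %% include/config_module.hrl - written by the build hook:
    -define(CONFIG_MODULE, my_app_config).

    %% call sites -include the header; the macro expands to a fixed module
    %% name, so this compiles to an ordinary statically resolved call:
    get_timeout() ->
        ?CONFIG_MODULE:timeout().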

Thanks!
jay


Re: Retrieving "semi-constant" data from a function versus Mnesia

Jay Nelson-2
In reply to this post by Peter Johansson
ROK wrote:

> One of them is or was erts_find_export_entry(module, function, arity),
> surprise surprise, which looks the triple up in a hash table, so it’s
> fairly clear where the time is going.

It’s funny because I recall from the CLOS "Art of the Metaobject Protocol”
that this was a high-speed solution to the method dispatch problem. The
memoization was bragging rights and what made it feasible.

> While I think this kind of call _could_ be faster, I suspect
> that they are fast _enough_ given their rarity and the other
> things going on in a program.

The OTP code base is not representative of application developers and their
expected frequent use of behaviours. These always use some form
of Module:function(…).  Or maybe the few places in the OTP code base
are invoked frequently from a gen_* during normal operation (as you
said, there is a difference between static and dynamic analysis). But as
you also say, if it were really a _practical_ problem the complaints would
have produced optimization patches by now.

jay


Re: Retrieving "semi-constant" data from a function versus Mnesia

Richard A. O'Keefe-2

On 14/05/2015, at 10:58 am, Jay Nelson <[hidden email]> wrote:

> ROK wrote:
>
>> One of them is or was erts_find_export_entry(module, function, arity),
>> surprise surprise, which looks the triple up in a hash table, so it’s
>> fairly clear where the time is going.
>
> It’s funny because I recall from the CLOS "Art of the Metaobject Protocol”
> that this was a high-speed solution to the method dispatch problem. The
> memoization was bragging rights and what made it feasible.

Memoization in dynamic function calling is old technology.
As I recall, Smalltalk-80 used a cache for dynamic dispatch,
and later implementations used distributed "polymorphic inline
caches".

Simple inline caching would turn

    Module:func(E1, ..., En)
into
    static atomic { mod, ptr } cache;
    atomic {
        if (Module == cache.mod) {
            p = cache.ptr;
        } else {
            p = lookup(Module, func, n);
            cache.mod = Module;
            cache.ptr = p;
        }
    }
    (*p)(E1, ..., En);

One form of polymorphic inline caching would yield

    static atomic { mod, ptr } cache[N];
    q = &cache[hash(Module)%N];
    atomic {
        if (Module == q->mod) {
            p = q->ptr;
        } else {
            p = lookup(Module, func, n);
            q->mod = Module;
            q->ptr = p;
        }
    }
    (*p)(E1, ..., En);

This is typically implemented so that the cache can grow.

Doing this in a lock-free way is a bit tricky, but possible.

My actual point was that since a dynamic call in Erlang
involves at least two C function calls in addition to the
intended Erlang call, 6 times slower than a direct call
is not unreasonable.

Oh, that's emulated.  I don't know what HiPE does with
dynamic function calls, but it *probably* makes normal
calls faster without touching dynamic ones much.


Re: Retrieving "semi-constant" data from a function versus Mnesia

Benoit Chesneau-2
In reply to this post by Joe Armstrong-2



I see :) Thanks for the answer!

I'm already using this technique to speed up access to Unicode data in my IDNA module [1], but I've always wondered if there is a more efficient approach, since the compiled beam is quite big, though less than 1 MB. In terms of speed it's better than using ETS, though I don't have the exact figures right now.

- benoit


 
