The databases are loaded into memory (mostly) as-is; reference-counted binaries are shared with the
application callers via ETS tables, and the original binary search tree is used to look up addresses.
The data for each entry is decoded on the fly upon a successful lookup.
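That pattern could be sketched roughly as follows. This is a hypothetical illustration, not egeoip's actual code: the module name, table name, and `search/2` stub are made up; only the ETS pattern is the point.

```erlang
%% Hypothetical sketch of the described architecture: the database binary
%% is stored once in a public ETS table; each caller fetches it (cheaply,
%% since for a large reference-counted binary only the small header is
%% copied) and walks the encoded search tree directly, decoding the
%% matching entry on the fly.
-module(geodb_sketch).
-export([load/1, lookup/1]).

load(Path) ->
    {ok, Bin} = file:read_file(Path),
    ets:new(geodb, [named_table, public, set, {read_concurrency, true}]),
    ets:insert(geodb, {db, Bin}).

lookup(Addr) ->
    [{db, Bin}] = ets:lookup(geodb, db),
    %% search/2 is a placeholder for the binary-search-tree walk over Bin.
    search(Addr, Bin).

search(_Addr, _Bin) ->
    not_implemented.
```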
As far as I know, though, ETS data resides in a separate chunk of memory. Is the idea that ETS will only contain a reference to the original shared binary, but not the binary itself?
Have you checked / measured that it really works that way, and isn't copying the whole database to / from ETS on each lookup?
I didn't benchmark it. I looked around on the web for past
discussions on the matter and found a previous thread on this
particular problem, in which it was stated that only a 6-word overhead
would be incurred for every lookup.
As we're talking about blobs
amounting to potentially dozens of megabytes, I felt very comfortable
with such an overhead. I took those statements as likely true,
as the people involved seemed to know what they were talking
about, but I'd be the first to restructure the current architecture if
shown a better way.
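The 6-word figure matches how the runtime treats large (> 64 byte) binaries: they live off-heap and are reference-counted, so copying the term in and out of ETS copies only the small header. A rough way to check it yourself is the sketch below; note that `erts_debug:size/1` is an internal, unsupported BIF, so treat its output as indicative only.

```erlang
%% Rough check of per-lookup copying cost, run in an Erlang shell.
%% Assumption: erts_debug:size/1 (internal BIF) reports a term's size
%% in words, excluding the off-heap payload of a refc binary.
Tab = ets:new(t, [set]),
Blob = binary:copy(<<0>>, 32 * 1024 * 1024),   %% 32 MB blob
true = ets:insert(Tab, {db, Blob}),
[{db, Copy}] = ets:lookup(Tab, db),
%% If the payload were copied, this would be millions of words;
%% for a shared refc binary it is just a handful.
Words = erts_debug:size(Copy),
Bytes = binary:referenced_byte_size(Copy),     %% still the full 32 MB
io:format("lookup copied ~p words; payload is ~p bytes~n",
          [Words, Bytes]).
```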
At what request rates do you get performance problems with egeoip? We have never encountered them.
I have never reached the limit either, but it's a pattern that has often led me to performance issues in other code. At this point it has become ingrained in me that, if I can avoid it at negligible cost to complexity / maintainability / extensibility, then that's what I'll do.
Lest someone consider it premature optimization, I'd rather see it as being considerate of my future self six months from now :-). But that's all very subjective, of course.