The databases are loaded into memory (mostly) as-is; reference counted binaries are shared with the
application callers using ETS tables, and the original binary search tree is used to lookup addresses.
The data for each entry is decoded on the fly upon successful lookups.
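To make the "search tree walked directly over the binary" idea concrete, here is a minimal sketch. It is not the locus implementation and not the exact MaxMind DB layout: the module name `tree_walk_sketch`, the fixed 32-bit record size, and the `Record - NodeCount - 1` data index are all assumptions chosen for illustration. The point is only that the tree can be walked one address bit at a time with binary pattern matching, without first decoding it into Erlang terms.

```erlang
-module(tree_walk_sketch).
-export([lookup/3]).

%% Hypothetical, simplified layout: Tree is a flat binary of NodeCount nodes,
%% each node being two 32-bit big-endian records (left child, right child).
%%   Record <  NodeCount  -> index of the next node
%%   Record =:= NodeCount -> no entry for this address
%%   Record >  NodeCount  -> reference to a data entry
lookup(Tree, NodeCount, <<_:32>> = IPv4) ->
    walk(Tree, NodeCount, 0, IPv4).

%% Reached the "not found" marker.
walk(_Tree, NodeCount, NodeCount, _Bits) ->
    not_found;
%% Reached a data reference; return a zero-based, illustrative index.
walk(_Tree, NodeCount, Record, _Bits) when Record > NodeCount ->
    {ok, Record - NodeCount - 1};
%% Ran out of address bits while still inside the tree.
walk(_Tree, _NodeCount, _Node, <<>>) ->
    not_found;
%% Follow the left (0) or right (1) record of the current node.
walk(Tree, NodeCount, Node, <<Bit:1, Rest/bitstring>>) ->
    Offset = Node * 8 + Bit * 4,
    <<_:Offset/binary, Next:32, _/binary>> = Tree,
    walk(Tree, NodeCount, Next, Rest).
```

With a two-node tree such as `<<1:32, 2:32, 3:32, 4:32>>` and `NodeCount = 2`, addresses whose first two bits are `00` or `01` resolve to data entries and everything starting with a `1` bit is reported as not found.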
As far as I know, ETS data resides in a separate chunk of memory. Is the idea that ETS will only contain references to the original shared binary, but not the binary itself?
Have you checked or measured that it really works that way, and that it isn't copying the whole database to or from ETS on each lookup?
I didn't benchmark it. I looked around on the web for past
discussions on the matter and found a previous thread discussing this
particular problem, in which it was stated that a 6-word overhead would
be incurred for every lookup.
As we're talking about blobs
amounting to potentially dozens of megabytes, I felt very comfortable
with such an overhead. I took those statements as likely being the
truth, as the people involved seemed to know what they were talking
about, but I'd be the first to restructure the current architecture if
shown a better way.
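For anyone who wants to check this rather than take it on trust, a rough measurement along these lines might help. It is only a sketch: the module name `ets_refc_sketch`, the table name and the 16 MB stand-in blob are made up for illustration, and `erts_debug:flat_size/1` is a debugging helper rather than a documented API. The idea is to compare the words ETS reports for the table, and the heap size of the term a lookup hands back, against the size of the blob itself.

```erlang
-module(ets_refc_sketch).
-export([run/0]).

run() ->
    Tab = ets:new(db_cache, [set, protected, {read_concurrency, true}]),
    %% Stand-in for a database blob. Binaries larger than 64 bytes are
    %% kept off-heap and reference counted.
    Blob = binary:copy(<<0>>, 16 * 1024 * 1024),
    true = ets:insert(Tab, {database, Blob}),
    %% ets:info/2 reports table memory in words, so convert to bytes.
    WordSize = erlang:system_info(wordsize),
    TableBytes = ets:info(Tab, memory) * WordSize,
    [{database, Fetched}] = ets:lookup(Tab, database),
    %% Words the looked-up term occupies on the calling process heap,
    %% not counting off-heap binary data.
    HeapWords = erts_debug:flat_size(Fetched),
    true = ets:delete(Tab),
    #{blob_bytes => byte_size(Blob),
      table_bytes => TableBytes,
      fetched_heap_words => HeapWords}.
```

If only a reference is stored and copied, `table_bytes` should come out far smaller than `blob_bytes`, and `fetched_heap_words` should be a handful of words rather than anything proportional to the blob. The exact numbers depend on the OTP release and word size.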
At what request rates do you get performance problems with egeoip? We have never run into them.
I have never reached the limit either, but it's a pattern that has very often led me to performance issues in other code. At this point it has become ingrained in me that, if I can avoid it at negligible cost to complexity / maintainability / extensibility, then that's what I'll do.
Lest someone consider it premature optimization, I'd rather see it as being considerate of my future self 6 months from now :-). But that's all very subjective, of course.
Locus 1.1.1 was released today. For those who don't have the existing thread handy, it's a library for looking up the geolocation / ASN of IP addresses, using MaxMind GeoLite2.
Added:
- OTP 18, 19.0, 19.1 and 19.2 support (version 1.0.x required 19.3 or higher)
- ability to consult database metadata, source and version through `:get_info`
- ability to subscribe to database loader events
- ability to specify connect, download start and idle download timeouts
- ability to turn off caching
Documentation was moved to HexDocs and test coverage was substantially increased.
Re: [ANN] locus: Geolocation and ASN lookup of IP addresses
Locus 1.6.0 was released today.
Added:
- new API method for validating loaded databases (locus:analyze/1)
- new command line tool supporting database validation
Changed:
- safety of database HTTPS downloads was substantially improved by now rejecting expired certificates, mismatched hostnames, self-signed certificates or unknown certificate authorities
- test coverage using MaxMind's test data was greatly extended
- database decoder was thoroughly optimized
- documentation was mildly improved
Fixed:
- misguided rejection of UTF-8 strings with non-printable (but valid) codepoints
- unnecessarily strict refusal to load 2.x database formats succeeding 2.0
- infinite recursion in maliciously crafted databases due to circular paths