I've implemented a simple form of rate limiting in this little project (module http2smtp_rate). The module is probably not what you need, but the idea behind it could be used as a starting point. The basic idea is the following:
Every time a request comes in, the request process makes a gen_server call to this server (you could also spawn one limiter process per source IP address for sharding). The call increments and returns a counter specific to that IP address. If the counter exceeds the limit, the request is terminated immediately. The server has an internal timer that fires once per desired interval and resets all counters.
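The scheme described above can be sketched as a small gen_server. To be clear, the module and function names below are illustrative, not the actual http2smtp_rate API:

```erlang
-module(ip_throttle).
-behaviour(gen_server).

-export([start_link/2, allow/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Limit = max requests per IntervalMs milliseconds.
start_link(Limit, IntervalMs) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, {Limit, IntervalMs}, []).

%% Called from the request process; returns ok | {error, throttled}.
allow(Ip) ->
    gen_server:call(?MODULE, {allow, Ip}).

init({Limit, IntervalMs}) ->
    erlang:send_after(IntervalMs, self(), reset),
    {ok, #{limit => Limit, interval => IntervalMs, counters => #{}}}.

%% Increment the per-IP counter and reject once it exceeds the limit.
handle_call({allow, Ip}, _From, #{limit := Limit, counters := C} = State) ->
    N = maps:get(Ip, C, 0) + 1,
    Reply = case N > Limit of
                true  -> {error, throttled};
                false -> ok
            end,
    {reply, Reply, State#{counters := C#{Ip => N}}};
handle_call(_Other, _From, State) ->
    {reply, ignored, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The internal timer fires once per interval and resets all counters.
handle_info(reset, #{interval := IntervalMs} = State) ->
    erlang:send_after(IntervalMs, self(), reset),
    {noreply, State#{counters := #{}}};
handle_info(_Other, State) ->
    {noreply, State}.
```

Note that every request serializes through this one process; that is fine for moderate load, and sharding by IP (as mentioned) removes the bottleneck.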
Global throttling might be challenging: you would need to run a proxy and define throttling policies there. I would discard this option unless you are talking about a large service.
Let's consider local throttling policies instead, where each API node maintains its counters independently of the others.
It was not clear whether you want to reject requests or put them into a queue, so that customers are still served but their latency grows.
There are a few easy solutions that do not require any Erlang:
* Linux traffic shaping
If you are keen to implement this feature in Erlang, then please consider the circuit breaker pattern. You can create a fuse per IP address or subnetwork to reject traffic.
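To make the fuse idea concrete, here is a deliberately tiny, self-contained sketch of a per-key circuit breaker over ETS. All module and function names are made up; in production you would use the fuse library itself, which handles resets, alarms, and races properly:

```erlang
-module(mini_fuse).
-export([new/0, melt/4, ask/2]).

%% Toy circuit breaker: each "melt" bumps a per-key counter; after
%% Limit melts the fuse blows for CooldownMs, during which ask/2
%% answers 'blown'. If you melt once per request, this doubles as a
%% crude rate limiter per IP address or subnetwork.

new() ->
    ets:new(mini_fuse, [set, public, {write_concurrency, true}]).

melt(Tab, Key, Limit, CooldownMs) ->
    N = ets:update_counter(Tab, Key, {2, 1}, {Key, 0, 0}),
    case N >= Limit of
        true ->
            %% Blow the fuse: zero the counter, remember when it heals.
            Until = erlang:monotonic_time(millisecond) + CooldownMs,
            true = ets:update_element(Tab, Key, [{2, 0}, {3, Until}]),
            blown;
        false ->
            ok
    end.

ask(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{Key, _N, Until}] ->
            case erlang:monotonic_time(millisecond) < Until of
                true  -> blown;   %% still cooling down: reject
                false -> ok
            end;
        [] ->
            ok
    end.
```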
> On 21 Sep 2018, at 9.40, Frank Muller <[hidden email]> wrote:
> Hi guys
> We have an HTTP web service based on Cowboy which is receiving more and more traffic.
> We would like to implement a throttling mechanism per IP address to limit how many requests/sec the same IP address can make.
> Is there any library out there for that?
> Or any (simple yet efficient) algorithm to implement?
> Ideas & feedbacks are very welcome.
> Thanks in advance.
Last time I had to do something similar I went with Cloudflare. It's been a few years now and it's still the cheaper option with no management overhead. If delegating this functionality to a SaaS is an option for you, it's worth considering.
The tricky question is about how much overload you might be expecting. The thing is that accepting an HTTP request and parsing it in order to know where it goes is a relatively costly operation (compared to just paying $2, getting a few thousand AWS instances, and throwing pre-built garbage queries at a server somewhere).
If what you're trying to do is absorb and track HTTP traffic to put a cap on even costlier operations at the back-end (or on servers upstream) such as costly processing or DB queries, then you don't necessarily need to be super fast; you just need to be cheaper than the big costly operations you're trying to protect against.
There are various strategies there depending on how slow you can afford to be, many of which have been described here:
* using a gen_server to track and count (riskier on large servers, as it may represent a single point of contention)
* Jobs, which can be augmented with a lot of probes
* locks taken and handled with ETS
* circuit breakers that look at various metrics and force input to stall when the system becomes unstable
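The ETS option deserves a sketch, since it avoids the single point of contention of the gen_server approach: each request bumps a counter in a per-key, per-second bucket with the atomic ets:update_counter/4, so request processes never serialize through one process. All names here are illustrative:

```erlang
-module(ets_limiter).
-export([new/0, allow/3, allow/4]).

new() ->
    ets:new(ets_limiter, [set, public, {write_concurrency, true}]).

%% allow/3: at most Limit requests per Key per wall-clock second.
allow(Tab, Key, Limit) ->
    allow(Tab, Key, Limit, erlang:monotonic_time(second)).

allow(Tab, Key, Limit, Sec) ->
    Bucket = {Key, Sec},
    %% Atomic increment; inserts {Bucket, 0} first if the key is new.
    case ets:update_counter(Tab, Bucket, {2, 1}, {Bucket, 0}) of
        N when N > Limit -> {error, throttled};
        _                -> ok
    end.

%% Old buckets should be swept periodically (e.g. with
%% ets:select_delete/2 from a timer), or the table grows without bound.
```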
And at Heroku, we also developed a locking library (also covered in the blog post above) named canal_lock (https://github.com/heroku/canal_lock), which was aimed at lock management for backend services where the number of available nodes changes and adapts over time. Still, if you can afford it, circuit breakers or frameworks like jobs do a good job there, since they can adapt with a lot of local-only information (see also approaches such as https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease for self-regulation, as well as CoDel and other algorithms).
But all of this won't necessarily go far if what you have is overload traffic that is too costly to handle directly, in the sense of "parsing all the HTTP requests is already too much CPU". Then you need to move to cheaper means: track connections or per-domain traffic through SNI data rather than plain HTTP requests, and impose limits there. Turn on options such as TCP_DEFER_ACCEPT, which tells the kernel not to return you an 'accepted' socket until it has data waiting on the line; this saves on context switching and prevents idle connections from going up to your app (but won't do much if you're using TLS or the PROXY protocol, since those expect to send info down the line right away).
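For reference, TCP_DEFER_ACCEPT can be set from Erlang as a raw inet option. The sketch below assumes a Cowboy 2 listener and Linux constants (protocol level IPPROTO_TCP = 6, option TCP_DEFER_ACCEPT = 9, value = seconds to wait for data); handler and listener names are hypothetical, and you should double-check the option numbers and transport-option shape against your kernel and Ranch version:

```erlang
%% Linux-only: hand sockets to acceptors only once data has arrived.
Dispatch = cowboy_router:compile([{'_', [{"/", my_handler, []}]}]),
{ok, _} = cowboy:start_clear(my_listener,
    [{port, 8080},
     {raw, 6, 9, <<30:32/native>>}],   %% TCP_DEFER_ACCEPT, up to 30s
    #{env => #{dispatch => Dispatch}}).
```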
When that won't work, you'll need to go down to the OS-level. You can tweak the size of ACK and SYN queues for indiscriminate filtering, or use packet filtering rules where the kernel itself can do quick inspection and dropping of data you cannot accept based on some data they contain. If you're at that level though, you may need a second NIC that is privately accessible just in order to be able to go set and modify these rules while the public NIC is under stress. If that doesn't work, then you need to look at a broader scope in terms of what your data center can do, but I figure this gets far from what you needed here.
In any case, the critical bit is that your system can be modeled as a queuing system; if you want to be able to take control over its stability, you must have the ability to handle validation and expulsion of overflow tasks faster than they can arrive. If you can't do that, you will not control what it is that you drop in terms of overflow. Any point of asynchrony in your system can act as an open valve, where a task is reported to go fast when it actually goes to sit in a growing queue somewhere downstream; so you have to be careful and make sure that your control and filtering at the edge reflect your actual bottlenecks and loads down the line.
It attempts to detect outliers among consumers of resources and throttle them (in your use case, these could be IP addresses performing an unusually large number of requests). It can also enforce a collective rate limit.