snit (SNI Termination Library) to replace Nginx


snit (SNI Termination Library) to replace Nginx

Frank Muller
Hi guys

Anyone familiar with snit?
https://github.com/heroku/snit

We’re facing a performance issue with Nginx used for TLS termination.
Nginx sits in front of our two Erlang webapps, both running on the same machine and both based on Cowboy 2.7.0.

The problem:
[1] accessing the two webapps directly (plain HTTP) is fast enough for us, and Cowboy is doing just great.
[2] accessing either app through Nginx (HTTPS) is 3x-5x slower than [1]

We selected Nginx because it hides our apps and lets us serve both on port 443 (default HTTPS).

Our Nginx config is pretty simple, tuned for SSL/TLS.
______________________________________________
server {
        listen  443 ssl;

        server_name  app1.acme.com; # the 2nd webapp is running on: app2.acme.com

        ssl on;
        ssl_certificate         /etc/nginx/certs/app1/crt.pem;
        ssl_certificate_key /etc/nginx/certs/app1/key.pem;
        ssl_dhparam         /etc/nginx/certs/app1/dh.pem;

        ssl_protocols       TLSv1.2;

        ssl_prefer_server_ciphers on;

        ssl_ecdh_curve secp384r1;

        ssl_session_cache shared:SSL:50m;
        ssl_session_timeout  1d;
        ssl_session_tickets off;

        ssl_stapling on;
        ssl_stapling_verify on;

        resolver 8.8.8.8 8.8.4.4 valid=300s;
        resolver_timeout 5s;

        ssl_buffer_size 8k; 

        keepalive_timeout 0;


        client_max_body_size 0;
        client_body_buffer_size 4m;
        client_header_timeout  300;
        client_body_timeout    300;
        client_header_buffer_size    1k;
        large_client_header_buffers  4 4k;

        location = /favicon.ico {
           access_log off;
           return 204;
        }

        location / {
           send_timeout           5;

           proxy_http_version 1.1;
           proxy_buffering off;
           proxy_request_buffering off;
           proxy_ignore_headers "Cache-Control" "Expires";
           proxy_max_temp_file_size 30m;
           proxy_connect_timeout 300;
           proxy_read_timeout 300;
           proxy_send_timeout 300;
           proxy_intercept_errors off;

           proxy_set_header        X-Real-IP       $remote_addr;
           proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

           proxy_pass http://127.0.0.1:2222; # the 2nd webapp has: proxy_pass http://127.0.0.1:3333;
        }
}
______________________________________________

Can snit be used to replace Nginx?
Help and suggestions appreciated.

Best
/Frank

Re: snit (SNI Termination Library) to replace Nginx

Mikael Karlsson
Hi,
Did you try with proxy_buffering set to on, and/or changing the proxy_buffer_size?
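In case it helps, those two knobs would look something like this in your location block (the sizes are just illustrative, not a recommendation):

```nginx
location / {
    # Re-enable response buffering so nginx can drain the upstream
    # quickly, and enlarge the buffers (sizes here are illustrative).
    proxy_buffering on;
    proxy_buffer_size 16k;
    proxy_buffers 8 32k;

    proxy_pass http://127.0.0.1:2222;
}
```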
Regards, Mikael


On Sat, 9 Nov 2019 at 00:14, Frank Muller <[hidden email]> wrote:
> Anyone familiar with snit?
> https://github.com/heroku/snit
> [...]

Re: snit (SNI Termination Library) to replace Nginx

Frank Muller
Hi Mikael

We mainly upload large files (20 MB to 100 MB) to our two webapps behind Nginx.

And yes, we tried those two options, but they don’t help in this situation.

In our case we completely disable buffering in Nginx (a feature introduced in Nginx 1.7.3, taken from its fork Tengine at Taobao:
https://tengine.taobao.org). Disabling buffering was a big win, but it is still much slower than direct HTTP access via Cowboy.

How did we find out that Nginx was the culprit?
Simply by testing with another TLS termination proxy, Hitch (from Varnish). Hitch is 1.5x-2x slower than Cowboy. Unfortunately it only supports one upstream backend server at a time, so we can’t serve our two webapps on port 443 with it. Another constraint is that our two webapps have to run on the same host (a customer requirement).

Finally, the system is not even under load: at most 10 file uploads per hour.

Forgot to mention our setup:

1. Erlang 22.1.6
2. Linux kernel 4.15.0-66 / Ubuntu LTS 18.10 x86_64
3. Physical machine: 32 GB of RAM, 8-core Intel Xeon CPU E3-1270 v6 @ 3.80GHz
4. Nginx 1.14.0
5. sysctl tuned by our engineers to handle fast TCP connections, with a generous open-files limit (ulimit -n: 200000)
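For anyone wanting to sanity-check the same limits on their own Linux box (these commands just read the current values; they don't reflect our tuning):

```shell
# Inspect the limits mentioned above on a Linux host.
ulimit -n                                     # per-process open-files limit
cat /proc/sys/net/core/somaxconn              # listen() backlog cap
cat /proc/sys/net/ipv4/tcp_congestion_control # congestion control in use
```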

/Frank

On Sat, 9 Nov 2019 at 03:58, Mikael Karlsson <[hidden email]> wrote:
Hi,
Did you try with proxy_buffering set to on, and/or changing the proxy_buffer_size?
Regards, Mikael



Re: snit (SNI Termination Library) to replace Nginx

Dave Cottlehuber
On Sat, 9 Nov 2019, at 07:22, Frank Muller wrote:
> We mainly upload large files (20mB to 100mB) to our two webapps behind Nginx.
> ssl_prefer_server_ciphers on;
> ssl_ecdh_curve
> secp384r1

TLDR:

- use TLS 1.3 if you can - most of the decisions have been made for you
- ensure your cipher choice is hardware accelerated in your openssl
- look at actual network traffic to see if there are any issues there
- no easy answers; benchmark your setup
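For the first point: assuming nginx 1.13+ built against OpenSSL 1.1.1, enabling TLS 1.3 alongside 1.2 is a one-line change to the config posted above:

```nginx
# Offer TLS 1.3 where clients support it, falling back to 1.2.
ssl_protocols TLSv1.2 TLSv1.3;
```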

I hope this helps point you in the right direction.

"SSL/TLS accounts for less than 1% of the CPU load, less than 10 KB of
memory per connection and less than 2% of network overhead."

-- https://www.imperialviolet.org/2010/06/25/overclocking-ssl.html

It should be possible to transfer traffic over TLS at rates significantly
faster than what you're reporting. However, I would be surprised if nginx
itself were the problem, given Netflix can saturate their pipes with
nginx, admittedly with a lot of tweaking [1], [2] and a custom FreeBSD build.

I would first look to see if you can restrict your ciphers to provide better
performance for your hardware, and highly recommend capturing data with
tcpdump & wireshark to do some network level analysis. This will vary a lot
depending on what control you have over client TLS capabilities [3], and
if you have OpenSSL 1.1.x available, and perhaps http2 on clients also.
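As a starting point for the cipher restriction, `openssl ciphers` shows what a candidate cipher string actually expands to on your build (the string below is just an example, not a recommendation):

```shell
# Expand an example restrictive cipher string to see exactly which
# suites it allows (name, protocol, key exchange, cipher, MAC).
# 'openssl speed -evp aes-128-gcm' then benchmarks the primitive itself.
openssl ciphers -v 'ECDHE+AESGCM:ECDHE+CHACHA20'
```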

Intel's notes from 2016 [4] show a noticeable difference between algorithms
so you need to benchmark your load on your hardware.

Personally, for TLS termination I prefer haproxy [5], but hitch, nginx,
snit, and haproxy should all be able to achieve similar results. [6] is
interesting, but haproxy 2.x handles multiple processes itself.

You can use https://www.ssllabs.com/ssltest/analyze.html or
https://github.com/drwetter/testssl.sh to help validate protocol choices.

Useful references:

- https://istlsfastyet.com/
- https://hpbn.co/transport-layer-security-tls/
- https://www.haproxy.com/knowledge-base/ssl/
- https://www.feistyduck.com/books/bulletproof-ssl-and-tls/ (get the ebook
direct, as Amazon seems to have an out-of-date version)

[0]: https://www.imperialviolet.org/2010/06/25/overclocking-ssl.html
[1]: https://openconnect.netflix.com/publications/asiabsd_2015_tls.pdf
[2]: https://openconnect.netflix.com/publications/asiabsd_tls_improved.pdf
[3]: https://wiki.mozilla.org/Security/Server_Side_TLS#Recommended_Ciphersuite
[4]: https://software.intel.com/en-us/articles/accelerating-ssl-load-balancers-with-intel-xeon-e5-v4-processors
[5]: https://www.haproxy.com/blog/haproxy-ssl-termination/
[6]: https://www.freecodecamp.org/news/how-we-fine-tuned-haproxy-to-achieve-2-000-000-concurrent-ssl-connections-d017e61a4d27/
[7]: https://www.ssllabs.com/ssltest/analyze.html
[8]: https://github.com/drwetter/testssl.sh

Re: snit (SNI Termination Library) to replace Nginx

Fred Hebert
In reply to this post by Frank Muller
On 11/09, Frank Muller wrote:
>Anyone familiar with snit?
>https://github.com/heroku/snit
>

I'm one of the people who wrote it.

>We’re facing a performance issue with Nginx used as TLS Termination.
>Nginx is in front of our two Erlang webapps. Both running on the same
>machine, and both based on Cowboy 2.7.0.
>
>The problem:
>[1] directly accessing the two webapps (plain HTTP) is fast enough for us,
>and Cowboy is doing just great.
>[2] accessing any of the two apps with Nginx (HTTPS) is 3x-5x slower than
>in [1]
>

Chances are you have some tuning issues regarding TLS.

If you nevertheless decide to benchmark snit and have it replace nginx,
be aware that snit only handles TLS termination with SNI, and is not a
general proxy; it was in fact a component that was used along with a
router that was built on top of vegur (https://github.com/heroku/vegur)
at Heroku.

As such, it wouldn't replace what nginx does for you. If you decide to
use snit, I would recommend using it to front the nginx instances you
would have anyway, to see if it can terminate TLS faster. But nginx does
other things, such as request buffering, and offers forms of overload
protection your app would no longer have without it (or another actual
proxy server).

Another thing you can do if you find that snit gives you good
performance is look with tcpdump or wireshark and see what TLS options,
extensions, ciphersuites, and elliptic curves are being chosen. Most of
the heavy cryptographic lifting is done by underlying C libraries, and
until you get similar priorities chosen by both servers, the comparison
will not be equitable.
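One quick way to start that comparison from the client side is Python's stdlib `ssl` module: list what your client would offer, then handshake with each server and see what was actually chosen (the host in the commented call is the hypothetical one from this thread):

```python
import socket
import ssl

def negotiated(host: str, port: int = 443):
    """Handshake with host:port and report what was actually negotiated."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # tls.cipher() is (cipher_name, protocol_version, secret_bits)
            return tls.version(), tls.cipher()

# What our client side would offer, before any server is involved:
offered = [c["name"] for c in ssl.create_default_context().get_ciphers()]
print(len(offered), "ciphersuites offered, e.g.", offered[0])

# e.g. negotiated("app1.acme.com")  # hypothetical host from this thread
```

Running `negotiated()` against both nginx and the snit endpoint, and diffing the results, tells you whether the two are even doing comparable cryptographic work before you compare throughput.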

If the settings are the same, then you are starting to compare apples
with apples and the higher-level code may be making a difference.

Regards,
Fred.