I've written a small program to demonstrate this:
---BEGIN PROGRAM---
-module(ghbm_bug_test).
-compile(export_all).

run(PortCount,Url) ->
    run(2000,PortCount,Url).

run(Port,PortCount,Url) ->
    case Port < PortCount of
        true ->
            spawn(fun() -> loop(Port,Url) end),
            run(Port + 1,PortCount,Url);
        false ->
            ok
    end.

loop(Port,Url) ->
    case Port of
        2003 -> io:format("Doing connect\n");
        _ -> ok
    end,
    Sock = gen_tcp:connect(Url, 8080, [], 5000),
    case Port of
        2003 -> io:format("Did connect\n");
        _ -> ok
    end,
    case (catch gen_tcp:close(Sock)) of _ -> ok end,
    loop(Port,Url).
---END PROGRAM---

This program takes a lot of time to connect when connecting via
run(50000,"127.0.0.1") and yet connects quickly when I do run(50000,
{127,0,0,1}). I am running it with a max number of processes of
1000000 (set via +P).

After checking the states of processes I see that many of them spend
time in:
3> i(0,200,0).
[{current_function,{inet_gethost_native,getit,2}},

What are my possibilities for better DNS? I know that I can use erlang
dns instead of native dns, this solves the problem for 127.0.0.1, but
when I try a real address (i.e. run(50000,"some.server.of.mine.com"))
connections are very slow with both native and erlang DNS.

Any advice on a solution?
________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:[hidden email]
On 8/2/11 12:50 AM, ori brost wrote:
> After checking the states of processes I see that many of them spend
> time in:
> 3> i(0,200,0).
> [{current_function,{inet_gethost_native,getit,2}},
>
> What are my possibilities for better DNS? I know that I can use erlang
> dns instead of native dns, this solves the problem for 127.0.0.1, but
> when I try a real address (i.e. run(50000,"some.server.of.mine.com"))
> connections are very slow with both native and erlang DNS.
>
> Any advice on a solution?

You are doing a DNS lookup for every connect even though the
destination host does not change for the whole PortCount. Did you
consider doing the DNS lookup once, before entering the loop, and
caching the result? That would give you a single hostname lookup
instead of one per connection (O(1) instead of O(n)).

BR,
Ivan Ostres
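A minimal sketch of the suggestion above, assuming a single address for the host is enough: the name is resolved once with inet:getaddr/2, and the resulting address tuple is handed to every worker, so gen_tcp:connect/4 never calls the resolver. The module name resolve_once and the helpers resolve/1, spawn_workers/3 and connect_loop/1 are made up for illustration; port 8080, the 5000 ms timeout and the starting value 2000 are taken from the original program.

```erlang
-module(resolve_once).
-export([run/2, resolve/1]).

%% Resolve the host name a single time (IPv4 only in this sketch).
resolve(Host) ->
    inet:getaddr(Host, inet).

%% Same shape as the original run/2, but workers get an address tuple.
run(PortCount, Host) ->
    {ok, Addr} = resolve(Host),
    spawn_workers(2000, PortCount, Addr).

spawn_workers(Port, PortCount, Addr) when Port < PortCount ->
    spawn(fun() -> connect_loop(Addr) end),
    spawn_workers(Port + 1, PortCount, Addr);
spawn_workers(_Port, _PortCount, _Addr) ->
    ok.

%% Connect and close forever; no DNS lookup happens in this loop.
connect_loop(Addr) ->
    case gen_tcp:connect(Addr, 8080, [], 5000) of
        {ok, Sock} -> gen_tcp:close(Sock);
        {error, _Reason} -> ok
    end,
    connect_loop(Addr).
```

If the name is served by a load balancer that rotates addresses in DNS, resolving once pins all workers to one address, so this only fits when that is acceptable.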
In reply to this post by ori brost
On 02-08 01:50, ori brost wrote:
> I've written a small program to demonstrate this:
>
> ---BEGIN PROGRAM---
> -module(ghbm_bug_test).
> -compile(export_all).
>
> run(PortCount,Url) ->
>     run(2000,PortCount,Url).
>
> run(Port,PortCount,Url) ->
>     case Port < PortCount of
>         true ->
>             spawn(fun() -> loop(Port,Url) end),
>             run(Port + 1,PortCount,Url);
>         false ->
>             ok
>     end.
>
> loop(Port,Url) ->
>     case Port of
>         2003 -> io:format("Doing connect\n");
>         _ -> ok
>     end,
>     Sock = gen_tcp:connect(Url, 8080, [], 5000),
>     case Port of
>         2003 -> io:format("Did connect\n");
>         _ -> ok
>     end,
>     case (catch gen_tcp:close(Sock)) of _ -> ok end,
>     loop(Port,Url).
> ---END PROGRAM---
>
> This program takes a lot of time to connect when connecting via
> run(50000,"127.0.0.1") and yet connects quickly when I do run(50000,
> {127,0,0,1}). I am running it with a max number of processes of
> 1000000 (set via +P).
>
> After checking the states of processes I see that many of them spend
> time in:
> 3> i(0,200,0).
> [{current_function,{inet_gethost_native,getit,2}},
>
> What are my possibilities for better DNS? I know that I can use erlang
> dns instead of native dns, this solves the problem for 127.0.0.1, but
> when I try a real address (i.e. run(50000,"some.server.of.mine.com"))
> connections are very slow with both native and erlang DNS.
>
> Any advice on a solution?

I will give some tips. Please follow them in order: the simplest
tricks come first and should solve most of the problems.

Make sure the '127.0.0.1 localhost' entry is in /etc/hosts, and that
the 'hosts:' entry in /etc/nsswitch.conf isn't too complicated
(preferably something like 'hosts: dns' should be all there is).
Also make sure there is no "domain", "search" or "options" in
/etc/resolv.conf (possibly "options inet6" can be accepted), and that
/etc/gai.conf is unchanged (in case you are using IPv6).

If needed, use NSCD (or better unscd, which has an improved cache,
multithreading and fault tolerance) in front of libc.
Tune its cache size if needed.
(libc by itself does not cache answers, but 127.0.0.1 in /etc/hosts
(which is cached, AFAIK, until the next change) should be fast.)

If you are going to perform lots of DNS lookups of remote/Internet
hosts, I suggest you install a recursive DNS server with a cache
(there are lots of good servers) on the local machine (or at least on
the local network), and set 'nameserver 127.0.0.1' or 'nameserver ::1'
in /etc/resolv.conf.

If this is still too slow, you can skip the libc/nscd layer by making
Erlang connect directly to the nameserver (preferably on the same
machine) instead of using the native resolver.

See "ERTS User's Guide, Inet configuration":
http://www.erlang.org/doc/apps/erts/inet_cfg.html

The current configuration can be retrieved with inet:get_rc().

You will basically need to put, for example:

    clear_hosts.
    clear_ns.
    clear_search.
    {resolv_conf, "/dev/null"}.  % or any empty file, or just a separate resolv.conf for erlang
    {hosts_file, "/dev/null"}.   % or any empty file, or some file with the few needed entries
    {nameserver, {127,0,0,1}}.   %% or {nameserver, {127,0,0,1}, 53}.; you can add more nameservers for fault tolerance
    {lookup, [dns]}.
    {cache_size, 10000}.

into some file (./somewhere/erl_inetrc), and do something like

    export ERL_INETRC=./somewhere/erl_inetrc

before starting the Erlang virtual machine.

(Actually, even just {lookup, [file]}. should suffice, as Erlang will
autodetect the nameserver from /etc/resolv.conf and the entries from
/etc/hosts, and will even monitor these files for changes at run time.)

The Erlang DNS resolver performs caching. It helps a lot, but the cache
is small by default. You can increase its size by adding
{cache_size, 10000}. to erl_inetrc.
(AFAIK, if Erlang uses the native glibc resolver, it does not use its
own cache, and leaves this to nscd/unscd or other mechanisms in glibc
or the servers used. The cache is only used when Erlang itself connects
to the DNS server, and not via glibc/nscd.)

Unfortunately this cache does not cache negative answers, and probably
also does not properly respect the TTL values of RRs.
And this is the only real problem with it, as it works very nicely and
is well designed.

This should help, and it is very simple to configure, in 15 minutes.
So let's try. If this is still not sufficient, then you have a bug
somewhere, or you have a really strange kind of workload. Take a look
below for further tips.

If this is still too slow, add some kind of Erlang application-side
cache on top of it. An ETS table with some garbage collection, or some
kind of LRU cache, will probably be sufficient. There are many possible
and/or existing solutions for caching in Erlang.

And lastly: if you want even more speed, you can run a whole DNS server
INSIDE Erlang (written in Erlang, of course), which will perform the
whole recursive resolution by itself, from the root servers down to the
destination, and cache everything. (You can also prefill the cache with
the known servers for all top-level domains, as they change rarely,
there are only about 5000 of them, and it will help a lot. You can use
the same trick for a locally installed DNS server/cache as well.)
Actually it will not be a server, just a fully recursive resolver,
because glibc and Erlang aren't fully recursive.

Unfortunately it will be slightly complex, and I am not aware of any
DNS server in Erlang (beyond my own project, which is in alpha stage
and not useful for anything). I also do not think it will provide a
significant speedup for any practical workload over the tips already
mentioned (a caching server on the local machine and DNS lookup by
Erlang itself). It can possibly save some memory (which is good for
caches) and slightly improve latency (by eliminating network traffic
over loopback, and context switches). But we are already talking about
microseconds here, and that does not matter when the actual network
traffic is in the millisecond range (unless, of course, you are
performing a send-only UDP DDoS). It also does not help much, as the
Erlang DNS resolver already performs caching.
Regards,
Witek

--
Witold Baryluk
JID: witold.baryluk // jabster.pl
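The "ETS table with some garbage collection" idea from the post above could be sketched roughly as follows. This is only an illustration under stated assumptions: the module name dns_ets_cache, the functions start/0 and lookup/1, and the fixed 60-second lifetime are all invented here, and the lifetime is not the DNS TTL of the record. Expired entries are overwritten on the next lookup rather than swept, and negative answers are not cached.

```erlang
-module(dns_ets_cache).
-export([start/0, lookup/1]).

-define(TABLE, dns_ets_cache).
-define(TTL_SECONDS, 60).  % assumed fixed lifetime, not the record's DNS TTL

%% Create the table once, from a long-lived process: the table is
%% deleted when its owner terminates.
start() ->
    ets:new(?TABLE, [named_table, public, set, {read_concurrency, true}]),
    ok.

%% Return a cached address if it is still fresh, otherwise resolve
%% the name and store the answer.
lookup(Host) ->
    Now = erlang:monotonic_time(second),
    case ets:lookup(?TABLE, Host) of
        [{Host, Cached, Expires}] when Expires > Now ->
            {ok, Cached};
        _ ->
            case inet:getaddr(Host, inet) of
                {ok, Addr} ->
                    ets:insert(?TABLE, {Host, Addr, Now + ?TTL_SECONDS}),
                    {ok, Addr};
                {error, _Reason} = Error ->
                    Error
            end
    end.
```

Note that many processes calling lookup/1 for the same uncached name at the same moment will all miss and all query the resolver; this sketch does nothing to prevent that.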
On 02-08 17:56, Witold Baryluk wrote:
> On 02-08 01:50, ori brost wrote:
> > I've written a small program to demonstrate this:
> >
> > What are my possibilities for better DNS? I know that I can use erlang
> > dns instead of native dns, this solves the problem for 127.0.0.1, but
> > when I try a real address (i.e. run(50000,"some.server.of.mine.com"))
> > connections are very slow with both native and erlang DNS.
> >
> > Any advice on a solution?
>
> Erlang dns resolver performs caching, it helps a lot, but have small cache by default.
> You can increase its size by adding {cache_size, 10000}. to erl_inetrc.
> (AFAIK if erlang uses native glibc resolver, it do not uses own cache,
> and leaves this to nscd/unscd or other mechanisms in glibc or used servers.
> It is only used when erlang by itself connects to dns server,
> and not via glibc/nscd).
> ...
....
> If this will be still too slow, add Erlang application side cache to this
> of some kind.
> ETS table with some garbage collection or some kind of LRU cache,
> will be probably sufficient. There are many possible and/or existing
> solutions for caching in erlang.

Last issues to know.

You can of course run into some scalability issues with the Erlang
cache or your own, especially if you are performing lots of requests
for the same name from lots of different processes, just like your
example (the question is how representative it is). I'm not sure the
Erlang resolver detects that and performs only one request while the
rest wait for the answer to become available in the cache (as it will
be). Without this, the second call will discover that there is no entry
in the cache and start its own request to the nameserver. Similarly for
the next calls. We can call it a race condition. This can drastically
increase network usage.

For example this (with the Erlang DNS resolver):

    [ spawn(fun() -> inet_res:gethostbyname("www.gazeta.pl"), ok end) || X <- lists:seq(1,100) ], ok.

will lead to over 100 requests to the local nameserver
(I have {lookup, [dns]}. and {nameserver, {127,0,0,1}}.).
(How bad it will be actually depends on whether the nameserver is on
the same machine, and on whether the server/cache software it runs is
clever enough not to make the same mistake as Erlang does.)

If you run the same code above one more time, it will generate 0
requests. (This is of course because responses, even from a local
server, especially if the local server hasn't cached them yet, come
with considerable latency, and all the inet_res:gethostbyname calls are
executed before any answer arrives, so no Erlang cache entries are
populated yet to stop them from executing an actual request.)

This can be solved by fixing Erlang, or by fixing your application-side
cache, because most often such queries originate from some structure of
your queries and connections. Then simply have two levels of caches:
the main one being the Erlang DNS cache, and the second being lots of
small caches, each dedicated to some group of processes which perform
related tasks, for example because they crawl the same subdomain or
retrieve all content (images, scripts) from a single webpage. After all
processes in a group are done (you have downloaded the page and all its
images and scripts), you can discard the cache (a simple dict or
gb_trees will probably suffice) by terminating the cache process.
This is a much more scalable solution.

You can also enter some dark areas like the scalability of ETS, or
system limits on source UDP ports and sockets. This is beyond the scope
of this problem, and is well explained in many other places.

That's all. :)

--
Witold Baryluk
JID: witold.baryluk // jabster.pl
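One way to sketch the per-group cache process described above is a small gen_server that lets only the first caller for a name trigger a lookup, while later callers for the same name wait for that answer. Everything here is an assumption for illustration: the module name dns_coalesce, the registered name, the use of a map instead of dict or gb_trees, and inet:getaddr/2 in place of inet_res:gethostbyname/1. Entries never expire; the cache is discarded by stopping the process, as the post suggests.

```erlang
-module(dns_coalesce).
-behaviour(gen_server).
-export([start_link/0, lookup/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

lookup(Host) ->
    gen_server:call(?MODULE, {lookup, Host}, infinity).

%% State: Host => {done, Addr} | {pending, [WaitingCaller]}
init([]) ->
    {ok, #{}}.

handle_call({lookup, Host}, From, State) ->
    case maps:find(Host, State) of
        {ok, {done, Addr}} ->
            {reply, {ok, Addr}, State};
        {ok, {pending, Waiters}} ->
            %% A lookup is already in flight: just queue this caller.
            {noreply, State#{Host := {pending, [From | Waiters]}}};
        error ->
            %% First caller: resolve in a helper so the server stays responsive.
            Server = self(),
            spawn_link(fun() ->
                Server ! {resolved, Host, inet:getaddr(Host, inet)}
            end),
            {noreply, State#{Host => {pending, [From]}}}
    end.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info({resolved, Host, Result}, State) ->
    {pending, Waiters} = maps:get(Host, State),
    lists:foreach(fun(W) -> gen_server:reply(W, Result) end, Waiters),
    NewState = case Result of
        {ok, Addr} -> State#{Host := {done, Addr}};
        {error, _Reason} -> maps:remove(Host, State)  % failures are not cached
    end,
    {noreply, NewState};
handle_info(_Other, State) ->
    {noreply, State}.
```

With this in place, 100 processes calling dns_coalesce:lookup("www.gazeta.pl") at once should cause a single inet:getaddr/2 call from the cache process, instead of one resolver call per process.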
In reply to this post by Ivan Ostres
On Feb 8, 10:51 am, Ivan Ostres <[hidden email]> wrote:
> You are doing DNS lookup for every port connect even though destination
> host does not change for whole PortCount. Did you consider doing DNS
> lookup before entering loop and cache the result? This would get you
> O(1) time for hostname lookup instead of O(n).

The destination host is a load balancer. I hope this makes things
clearer.

> BR,
> Ivan Ostres
In reply to this post by Witold Baryluk
Witold posted an excellent summary. Just one more tip:
Having a DNS server faster than 10000 queries/second is not easy.
(See the benchmarks of BIND, nsd/unbound, djbdns, and others for the
details.)

Kenji Rikitake
