I have a problem with ssl_esock in R12B4 (Linux 32 bit). Symptoms:
1. esock consumes 100% CPU.
2. poll() spins constantly with events=POLLIN|POLLRDNORM, revents=POLLIN|POLLRDNORM for the affected SSL fd.
3. strace shows no other syscalls between the polls.
4. netstat shows the TCP Rx queue growing for the socket.
5. No data messages are received at the Erlang process that owns the socket (a corollary of 3).

I don't yet have a test case to trigger it, but it seems to occur after the remote SSL peer sends a moderately sized block of data (e.g. 2kB).

Google didn't turn up anything that looked like what I'm seeing, and I can't find anything in more recent OTP changelogs. Does anyone know of this bug, and is there a patch anywhere?

-- Rich
________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org
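Symptoms 2 to 4 are what a level-triggered poll() produces when nothing drains the fd: every call reports POLLIN again straight away. Below is a minimal sketch of that behaviour. It uses a pipe as a stand-in for the TCP socket, and `sketch_count_ready_polls` is a hypothetical name for illustration only, not part of esock.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: put one unread byte into a pipe, then call poll()
 * `rounds` times WITHOUT reading it. Returns how many of those calls
 * reported POLLIN, or -1 if the pipe could not be set up. */
int sketch_count_ready_polls(int rounds)
{
    int fds[2];
    int ready = 0;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };

        /* Timeout 0: return immediately. The byte is never consumed,
         * so the fd stays readable on every iteration. */
        if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN))
            ready++;
    }

    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

Because the pending byte is never read, each poll() call in the loop finds the fd readable. A caller that polls in a tight loop without reading would therefore spin in the way the strace output suggests.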
On Wed, Aug 5, 2009 at 11:38 AM, Richard Andrews<[hidden email]> wrote:
> I have a problem with ssl_esock in R12B4 (Linux 32 bit). Symptoms:
> 1. esock consumes 100% CPU usage
> 2. poll() spinning constantly with events=POLLIN|POLLRDNORM,
>    revents=POLLIN|POLLRDNORM for the affected SSL fd
> 3. No other syscalls between polls in strace
> 4. netstat shows the TCP Rx queue growing for the socket
> 5. No data messages received at the socket owning erlang process
>    (corollary of 3.)
>
> I don't yet have a test case to trigger it but it seems to occur after
> the remote SSL peer sends a moderate sized block of data (eg. 2kB).
>
> Google didn't turn up anything that looked like what I'm seeing and I
> can't find anything in more recent OTP changelogs. Does anyone know of
> this bug and if there is a patch anywhere?

I have analysed this bug. There is a fault in the interaction between esock_openssl.c and esock.c.

The problem is triggered by bad SSL data arriving over the TCP socket. The trigger is a misbehaving remote peer, but the local program then fails catastrophically, which is not acceptable.

The openssl library correctly reports SSL_ERROR_SSL, but there is no way to propagate this back up to the main loop. A return value < 0 is taken to be a blocking artefact and is ignored, on the assumption that a future read will rectify it. In this case, however, the SSL error is fatal and unrecoverable: calls to SSL_read() return -1 without reading from the fd. The calling code ignores this and goes around the loop again, calling poll(), which returns immediately because there is still unread data in the TCP Rx queue, and so on.

I think cp->eof or cp->bp needs to be set in response to SSL_ERROR_SSL so that the socket can be cleaned up gracefully. The code comments don't give enough guidance about which of the two I should set for this case.

I'm hoping that someone familiar with the code can help me develop a patch which works the "right" way. Otherwise I'll just wing it.
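To make the idea concrete, here is a rough sketch of the distinction the read path would need to draw. It is not the esock code: `struct sketch_conn`, its `failed` flag, `sketch_classify_read` and the `SKETCH_*` names are all hypothetical, and the sketch does not settle whether the real fix belongs in cp->eof or cp->bp. The SSL_ERROR_* values are repeated from <openssl/ssl.h> only so the fragment compiles without the OpenSSL headers.

```c
#include <stdbool.h>

/* Values as defined in <openssl/ssl.h>; repeated here only so the sketch
 * builds standalone. With the real header included these are skipped. */
#ifndef SSL_ERROR_NONE
#define SSL_ERROR_NONE        0
#define SSL_ERROR_SSL         1
#define SSL_ERROR_WANT_READ   2
#define SSL_ERROR_WANT_WRITE  3
#define SSL_ERROR_SYSCALL     5
#define SSL_ERROR_ZERO_RETURN 6
#endif

/* Hypothetical connection state, NOT the real esock connection struct. */
struct sketch_conn {
    bool eof;    /* peer shut the SSL session down cleanly */
    bool failed; /* unrecoverable error: stop polling this fd */
};

enum sketch_action { SKETCH_DELIVER, SKETCH_RETRY, SKETCH_CLOSE };

/* ret is the return value of SSL_read(); ssl_err is what
 * SSL_get_error(ssl, ret) reported for that call. */
enum sketch_action sketch_classify_read(struct sketch_conn *cp,
                                        int ret, int ssl_err)
{
    if (ret > 0)
        return SKETCH_DELIVER;      /* got application data */

    switch (ssl_err) {
    case SSL_ERROR_WANT_READ:
    case SSL_ERROR_WANT_WRITE:
        return SKETCH_RETRY;        /* genuine blocking artefact */
    case SSL_ERROR_ZERO_RETURN:
        cp->eof = true;             /* clean shutdown by the peer */
        return SKETCH_CLOSE;
    default:
        /* SSL_ERROR_SSL, SSL_ERROR_SYSCALL or anything unexpected:
         * retrying cannot help, so mark the connection for cleanup. */
        cp->failed = true;
        return SKETCH_CLOSE;
    }
}
```

The point is only that SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE are the cases worth going back to poll() for. Anything else has to end with the fd being taken out of the poll set, otherwise the loop described above never terminates.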
-- Rich
