The missing UDP packets

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

The missing UDP packets

Alexander Petrovsky-2
Hello!

The first of all, my erlang version OTP-21.3 (without any minor patches)

I've got the strange floating problem with UDP to receive, which I couldn't debug by myself. Let me describe briefly what I have: There is UDP socket which used for send packets to many peers and receives responses from them (rate ~1pps):

send request: SIP options
receive a response: 100
receive a response: 200

In erlang it's implemented like gen_server which receives messages, the sockets options are: binary, {active, once}.

Every time, when the message is received from the mailbox, code try to do some job, after that try to handle message bursts by calling in the loop (by example 5 times):
gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active, once} back.

So, here I'm observing floating strange behavior, from time to time the second!! and always the second!! response is absent. I see this UDP missing packets in tcpdump, I don't see any drops in UDP stack at all.

When I'm trying to add some logs into my erlang code, the problem disappears, when I'm run strace to recvfrom syscall the problem disappears... I't try to debug VM with gdb and make some breakpoints on sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears (((

I can't find the problem root cause, is it in my code or is it in VM? Maybe someone could help me with that? 

--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991

Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Paul Peregud-2
AFAIK UDP packet can be dropped by anyone. By intermediate nodes and
by your kernel. I would just assume that they can be dropped by your
BEAM VM too. I also don't think that "it's always the second packet"
behavior is strange. Play around with kernel buffers sizes and you
will likely observe some change.

Assumption that UDP packet that was sent will be delivered is wrong.

On Tue, Feb 4, 2020 at 8:32 AM Alexander Petrovsky <[hidden email]> wrote:

>
> Hello!
>
> The first of all, my erlang version OTP-21.3 (without any minor patches)
>
> I've got the strange floating problem with UDP to receive, which I couldn't debug by myself. Let me describe briefly what I have: There is UDP socket which used for send packets to many peers and receives responses from them (rate ~1pps):
>
> send request: SIP options
> receive a response: 100
> receive a response: 200
>
> In erlang it's implemented like gen_server which receives messages, the sockets options are: binary, {active, once}.
>
> Every time, when the message is received from the mailbox, code try to do some job, after that try to handle message bursts by calling in the loop (by example 5 times):
> gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active, once} back.
>
> So, here I'm observing floating strange behavior, from time to time the second!! and always the second!! response is absent. I see this UDP missing packets in tcpdump, I don't see any drops in UDP stack at all.
>
> When I'm trying to add some logs into my erlang code, the problem disappears, when I'm run strace to recvfrom syscall the problem disappears... I't try to debug VM with gdb and make some breakpoints on sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears (((
>
> I can't find the problem root cause, is it in my code or is it in VM? Maybe someone could help me with that?
>
> --
> Петровский Александр / Alexander Petrovsky,
>
> Skype: askjuise
> Phone: +7 931 9877991
>


--
Best regards,
Paul Peregud
+48602112091
Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Alexander Petrovsky-2
Paul, thank you for your quick response, it seems like I'm was not accurate enough in my previous letter )

I do not wonder about drops, I know that drops could occur in buffers in network interface, in kernel buffers and also in buffers in VM. In my case there are not drops in kernel space accordingly to netstat -Sss -anumpcat /proc/net/snmp | grep Udp, cat /proc/net/udp; also it could be the bug in my code, which I can't find by some reason ))

Please, pay attention that after I'm trying to debug this behavior with strace/gdb the problem has gone. I'm interesting, how could I find/observer is there any drops in VM buffers or so... ?

вт, 4 февр. 2020 г. в 16:07, Paul Peregud <[hidden email]>:
AFAIK UDP packet can be dropped by anyone. By intermediate nodes and
by your kernel. I would just assume that they can be dropped by your
BEAM VM too. I also don't think that "it's always the second packet"
behavior is strange. Play around with kernel buffers sizes and you
will likely observe some change.

Assumption that UDP packet that was sent will be delivered is wrong.

On Tue, Feb 4, 2020 at 8:32 AM Alexander Petrovsky <[hidden email]> wrote:
>
> Hello!
>
> The first of all, my erlang version OTP-21.3 (without any minor patches)
>
> I've got the strange floating problem with UDP to receive, which I couldn't debug by myself. Let me describe briefly what I have: There is UDP socket which used for send packets to many peers and receives responses from them (rate ~1pps):
>
> send request: SIP options
> receive a response: 100
> receive a response: 200
>
> In erlang it's implemented like gen_server which receives messages, the sockets options are: binary, {active, once}.
>
> Every time, when the message is received from the mailbox, code try to do some job, after that try to handle message bursts by calling in the loop (by example 5 times):
> gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active, once} back.
>
> So, here I'm observing floating strange behavior, from time to time the second!! and always the second!! response is absent. I see this UDP missing packets in tcpdump, I don't see any drops in UDP stack at all.
>
> When I'm trying to add some logs into my erlang code, the problem disappears, when I'm run strace to recvfrom syscall the problem disappears... I't try to debug VM with gdb and make some breakpoints on sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears (((
>
> I can't find the problem root cause, is it in my code or is it in VM? Maybe someone could help me with that?
>
> --
> Петровский Александр / Alexander Petrovsky,
>
> Skype: askjuise
> Phone: +7 931 9877991
>


--
Best regards,
Paul Peregud
+48602112091


--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991

Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Andreas Schultz-3
Am Di., 4. Feb. 2020 um 14:35 Uhr schrieb Alexander Petrovsky <[hidden email]>:
Paul, thank you for your quick response, it seems like I'm was not accurate enough in my previous letter )

I do not wonder about drops, I know that drops could occur in buffers in network interface, in kernel buffers and also in buffers in VM. In my case there are not drops in kernel space accordingly to netstat -Sss -anumpcat /proc/net/snmp | grep Udp, cat /proc/net/udp; also it could be the bug in my code, which I can't find by some reason ))

Please, pay attention that after I'm trying to debug this behavior with strace/gdb the problem has gone. I'm interesting, how could I find/observer is there any drops in VM buffers or so... ?

One thing you could try is to use systemtap instead of strace/gdb to monitor the UDP socket. It has much less impact on runtime behavior and might provide a clue as to whether the data is actually read from socket.

Also, have you tried using the new socket module instead of gen_udp?

Andreas


вт, 4 февр. 2020 г. в 16:07, Paul Peregud <[hidden email]>:
AFAIK UDP packet can be dropped by anyone. By intermediate nodes and
by your kernel. I would just assume that they can be dropped by your
BEAM VM too. I also don't think that "it's always the second packet"
behavior is strange. Play around with kernel buffers sizes and you
will likely observe some change.

Assumption that UDP packet that was sent will be delivered is wrong.

On Tue, Feb 4, 2020 at 8:32 AM Alexander Petrovsky <[hidden email]> wrote:
>
> Hello!
>
> The first of all, my erlang version OTP-21.3 (without any minor patches)
>
> I've got the strange floating problem with UDP to receive, which I couldn't debug by myself. Let me describe briefly what I have: There is UDP socket which used for send packets to many peers and receives responses from them (rate ~1pps):
>
> send request: SIP options
> receive a response: 100
> receive a response: 200
>
> In erlang it's implemented like gen_server which receives messages, the sockets options are: binary, {active, once}.
>
> Every time, when the message is received from the mailbox, code try to do some job, after that try to handle message bursts by calling in the loop (by example 5 times):
> gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active, once} back.
>
> So, here I'm observing floating strange behavior, from time to time the second!! and always the second!! response is absent. I see this UDP missing packets in tcpdump, I don't see any drops in UDP stack at all.
>
> When I'm trying to add some logs into my erlang code, the problem disappears, when I'm run strace to recvfrom syscall the problem disappears... I't try to debug VM with gdb and make some breakpoints on sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears (((
>
> I can't find the problem root cause, is it in my code or is it in VM? Maybe someone could help me with that?
>
> --
> Петровский Александр / Alexander Petrovsky,
>
> Skype: askjuise
> Phone: +7 931 9877991
>


--
Best regards,
Paul Peregud
+48602112091


--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991



--

Andreas Schultz

-- 

Principal Engineer

t: +49 391 819099-224

------------------------------- enabling your networks -----------------------------

Travelping GmbH 
Roentgenstraße 13
39108 Magdeburg
Germany

t: +49 391 819099-0
f: +49 391 819099-299

e: [hidden email]
w: https://www.travelping.com/

Company registration: Amtsgericht Stendal 
Geschaeftsfuehrer: Holger Winkelmann
Reg. No.: HRB 10578
VAT ID: DE236673780
Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Valentin Micic-6
In reply to this post by Alexander Petrovsky-2

On 04 Feb 2020, at 10:31, Alexander Petrovsky <[hidden email]> wrote:

Hello!

The first of all, my erlang version OTP-21.3 (without any minor patches)

I've got the strange floating problem with UDP to receive, which I couldn't debug by myself. Let me describe briefly what I have: There is UDP socket which used for send packets to many peers and receives responses from them (rate ~1pps):

send request: SIP options
receive a response: 100
receive a response: 200

In erlang it's implemented like gen_server which receives messages, the sockets options are: binary, {active, once}.

Every time, when the message is received from the mailbox, code try to do some job, after that try to handle message bursts by calling in the loop (by example 5 times):
gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active, once} back.

So, here I'm observing floating strange behavior, from time to time the second!! and always the second!! response is absent. I see this UDP missing packets in tcpdump, I don't see any drops in UDP stack at all.

When I'm trying to add some logs into my erlang code, the problem disappears, when I'm run strace to recvfrom syscall the problem disappears... I't try to debug VM with gdb and make some breakpoints on sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears (((

I can't find the problem root cause, is it in my code or is it in VM? Maybe someone could help me with that? 

--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991


So, I suppose this is related to SIP communication, and I’ll go out on a limb here…

Considering that SIP message, when transported using UDP, must fit into a single UDP message, then, your statement that second and only second response packet is missing, may imply that you are sending response using two UDP packets.

(A) AFAIK, SIP over UDP does not have any continuation, hence, must fit into a single packet.
If you (by any chance) send SIP message over multiple UDP packets, this shouldl not work — saying this just in case.

(B) OTOH, if tcpdump reveals UDP fragmentation (where a single UDP datagram is fragmented over multiple IP packets), then maybe your UDP buffer it to small to assemble the whole message, in which case message might be truncated, and your SIP parser may see it as the part of the message (second!! and always second!!), that’s gone missing.

(C) Finally, when your server processing is too slow for the size of the buffer, then, there’s a chance that receive buffer may get full, in which case it would drop the packet that cannot be added to the receive buffer.

However, judging from your “second!! and always second!!” statement, I’d say that (A) or (B), are more likely, barring something completely different (e.g some other bug, perhaps ;-))

Kind regards

V/


.


Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Alexander Petrovsky-2
In reply to this post by Andreas Schultz-3


вт, 4 февр. 2020 г. в 16:46, Andreas Schultz <[hidden email]>:
Am Di., 4. Feb. 2020 um 14:35 Uhr schrieb Alexander Petrovsky <[hidden email]>:
Paul, thank you for your quick response, it seems like I'm was not accurate enough in my previous letter )

I do not wonder about drops, I know that drops could occur in buffers in network interface, in kernel buffers and also in buffers in VM. In my case there are not drops in kernel space accordingly to netstat -Sss -anumpcat /proc/net/snmp | grep Udp, cat /proc/net/udp; also it could be the bug in my code, which I can't find by some reason ))

Please, pay attention that after I'm trying to debug this behavior with strace/gdb the problem has gone. I'm interesting, how could I find/observer is there any drops in VM buffers or so... ?

One thing you could try is to use systemtap instead of strace/gdb to monitor the UDP socket. It has much less impact on runtime behavior and might provide a clue as to whether the data is actually read from socket.

Also, have you tried using the new socket module instead of gen_udp?

Andreas


вт, 4 февр. 2020 г. в 16:07, Paul Peregud <[hidden email]>:
AFAIK UDP packet can be dropped by anyone. By intermediate nodes and
by your kernel. I would just assume that they can be dropped by your
BEAM VM too. I also don't think that "it's always the second packet"
behavior is strange. Play around with kernel buffers sizes and you
will likely observe some change.

Assumption that UDP packet that was sent will be delivered is wrong.

On Tue, Feb 4, 2020 at 8:32 AM Alexander Petrovsky <[hidden email]> wrote:
>
> Hello!
>
> The first of all, my erlang version OTP-21.3 (without any minor patches)
>
> I've got the strange floating problem with UDP to receive, which I couldn't debug by myself. Let me describe briefly what I have: There is UDP socket which used for send packets to many peers and receives responses from them (rate ~1pps):
>
> send request: SIP options
> receive a response: 100
> receive a response: 200
>
> In erlang it's implemented like gen_server which receives messages, the sockets options are: binary, {active, once}.
>
> Every time, when the message is received from the mailbox, code try to do some job, after that try to handle message bursts by calling in the loop (by example 5 times):
> gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active, once} back.
>
> So, here I'm observing floating strange behavior, from time to time the second!! and always the second!! response is absent. I see this UDP missing packets in tcpdump, I don't see any drops in UDP stack at all.
>
> When I'm trying to add some logs into my erlang code, the problem disappears, when I'm run strace to recvfrom syscall the problem disappears... I't try to debug VM with gdb and make some breakpoints on sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears (((
>
> I can't find the problem root cause, is it in my code or is it in VM? Maybe someone could help me with that?
>
> --
> Петровский Александр / Alexander Petrovsky,
>
> Skype: askjuise
> Phone: +7 931 9877991
>


--
Best regards,
Paul Peregud
+48602112091


--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991



--

Andreas Schultz

-- 

Principal Engineer

t: +49 391 819099-224

------------------------------- enabling your networks -----------------------------

Travelping GmbH 
Roentgenstraße 13
39108 Magdeburg
Germany

t: +49 391 819099-0
f: +49 391 819099-299

e: [hidden email]
w: https://www.travelping.com/

Company registration: Amtsgericht Stendal 
Geschaeftsfuehrer: Holger Winkelmann
Reg. No.: HRB 10578
VAT ID: DE236673780


> One thing you could try is to use systemtap instead of strace/gdb to monitor the UDP socket. It has much less impact on runtime behavior and might provide a clue as to whether the data is actually read from the socket.

Good point about systemtap, I will try it, thanks...

> Also, have you tried using the new socket module instead of gen_udp?

Nope, and in my mind it's still in beta and guesses it's doest for production.



--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991

Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Alexander Petrovsky-2
In reply to this post by Valentin Micic-6


вт, 4 февр. 2020 г. в 16:52, Valentin Micic <[hidden email]>:

On 04 Feb 2020, at 10:31, Alexander Petrovsky <[hidden email]> wrote:

Hello!

The first of all, my erlang version OTP-21.3 (without any minor patches)

I've got the strange floating problem with UDP to receive, which I couldn't debug by myself. Let me describe briefly what I have: There is UDP socket which used for send packets to many peers and receives responses from them (rate ~1pps):

send request: SIP options
receive a response: 100
receive a response: 200

In erlang it's implemented like gen_server which receives messages, the sockets options are: binary, {active, once}.

Every time, when the message is received from the mailbox, code try to do some job, after that try to handle message bursts by calling in the loop (by example 5 times):
gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active, once} back.

So, here I'm observing floating strange behavior, from time to time the second!! and always the second!! response is absent. I see this UDP missing packets in tcpdump, I don't see any drops in UDP stack at all.

When I'm trying to add some logs into my erlang code, the problem disappears, when I'm run strace to recvfrom syscall the problem disappears... I't try to debug VM with gdb and make some breakpoints on sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears (((

I can't find the problem root cause, is it in my code or is it in VM? Maybe someone could help me with that? 

--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991


So, I suppose this is related to SIP communication, and I’ll go out on a limb here…

Considering that SIP message, when transported using UDP, must fit into a single UDP message, then, your statement that second and only second response packet is missing, may imply that you are sending response using two UDP packets.

(A) AFAIK, SIP over UDP does not have any continuation, hence, must fit into a single packet.
If you (by any chance) send SIP message over multiple UDP packets, this shouldl not work — saying this just in case.

(B) OTOH, if tcpdump reveals UDP fragmentation (where a single UDP datagram is fragmented over multiple IP packets), then maybe your UDP buffer it to small to assemble the whole message, in which case message might be truncated, and your SIP parser may see it as the part of the message (second!! and always second!!), that’s gone missing.

(C) Finally, when your server processing is too slow for the size of the buffer, then, there’s a chance that receive buffer may get full, in which case it would drop the packet that cannot be added to the receive buffer.

However, judging from your “second!! and always second!!” statement, I’d say that (A) or (B), are more likely, barring something completely different (e.g some other bug, perhaps ;-))

Kind regards

V/


.



Accordingly to tcpdump the size on packets from 300 to 400 bytes... also, I've double-checked, there are no fragmented packets at all, my ss -anump for beam shows:

skmem:(r0,rb16384,t0,tb212992,f4096,w0,o0,bl0,d0)

where:
 - rb16384 - rx buffer;
 - tb212992 - tx buffer;
 - d0 - drop counter.

So, I guess it's not an (A) nor (B). My internal erlang buffer {buffer, Size} (https://erlang.org/doc/man/inet.html) is 8192. It's could be (C), but the incoming packet rate is really low. Also, it's not a SIP parser for messages received over UDP, due to I don't see and packets in my receive process...

--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991

Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Valentin Micic-6


On 04 Feb 2020, at 16:53, Alexander Petrovsky <[hidden email]> wrote:



вт, 4 февр. 2020 г. в 16:52, Valentin Micic <[hidden email]>:

On 04 Feb 2020, at 10:31, Alexander Petrovsky <[hidden email]> wrote:

Hello!

The first of all, my erlang version OTP-21.3 (without any minor patches)

I've got the strange floating problem with UDP to receive, which I couldn't debug by myself. Let me describe briefly what I have: There is UDP socket which used for send packets to many peers and receives responses from them (rate ~1pps):

send request: SIP options
receive a response: 100
receive a response: 200

In erlang it's implemented like gen_server which receives messages, the sockets options are: binary, {active, once}.

Every time, when the message is received from the mailbox, code try to do some job, after that try to handle message bursts by calling in the loop (by example 5 times):
gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active, once} back.

So, here I'm observing floating strange behavior, from time to time the second!! and always the second!! response is absent. I see this UDP missing packets in tcpdump, I don't see any drops in UDP stack at all.

When I'm trying to add some logs into my erlang code, the problem disappears, when I'm run strace to recvfrom syscall the problem disappears... I't try to debug VM with gdb and make some breakpoints on sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears (((

I can't find the problem root cause, is it in my code or is it in VM? Maybe someone could help me with that? 

--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991


So, I suppose this is related to SIP communication, and I’ll go out on a limb here…

Considering that SIP message, when transported using UDP, must fit into a single UDP message, then, your statement that second and only second response packet is missing, may imply that you are sending response using two UDP packets.

(A) AFAIK, SIP over UDP does not have any continuation, hence, must fit into a single packet.
If you (by any chance) send SIP message over multiple UDP packets, this shouldl not work — saying this just in case.

(B) OTOH, if tcpdump reveals UDP fragmentation (where a single UDP datagram is fragmented over multiple IP packets), then maybe your UDP buffer it to small to assemble the whole message, in which case message might be truncated, and your SIP parser may see it as the part of the message (second!! and always second!!), that’s gone missing.

(C) Finally, when your server processing is too slow for the size of the buffer, then, there’s a chance that receive buffer may get full, in which case it would drop the packet that cannot be added to the receive buffer.

However, judging from your “second!! and always second!!” statement, I’d say that (A) or (B), are more likely, barring something completely different (e.g some other bug, perhaps ;-))

Kind regards

V/


.



Accordingly to tcpdump the size on packets from 300 to 400 bytes... also, I've double-checked, there are no fragmented packets at all, my ss -anump for beam shows:

skmem:(r0,rb16384,t0,tb212992,f4096,w0,o0,bl0,d0)

where:
 - rb16384 - rx buffer;
 - tb212992 - tx buffer;
 - d0 - drop counter.

So, I guess it's not an (A) nor (B). My internal erlang buffer {buffer, Size} (https://erlang.org/doc/man/inet.html) is 8192. It's could be (C), but the incoming packet rate is really low. Also, it's not a SIP parser for messages received over UDP, due to I don't see and packets in my receive process...

--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991


We (and by that I include myself as well) have been using {active, once}, followed by gen_udp:recv/3 construct quite often, and I do not recall ever loosing a packet when rates were less than, say,  80,000 UDP messages/second — provided that the server processing  is fast enough (but that's a different conversation).

Transition between {active, once} and gen_udp:recv/3 will *never* loose any packets, as the packet will be in the UDP buffer until recalled by gen_udp:recv or another {active, once} 

Maybe you should  consider a situation when the second packet is *not* in the buffer yet (e.g. enroute).  In that case gen_udp:recv(sock, 0, 0 ) will fail with {error, timeout}, and you’ll pick up the “second” UDP message only next time when you call {active once}.

So, why don’t you go and change this timeout value to something other than zero, say, 1000 (one second) just to see if the behaviour changes in any way. Then you could reduce it to something smaller… say 10, or even 5 ms.


V/

Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Alexander Petrovsky-2


вт, 4 февр. 2020 г. в 21:07, Valentin Micic <[hidden email]>:


On 04 Feb 2020, at 16:53, Alexander Petrovsky <[hidden email]> wrote:



вт, 4 февр. 2020 г. в 16:52, Valentin Micic <[hidden email]>:

On 04 Feb 2020, at 10:31, Alexander Petrovsky <[hidden email]> wrote:

Hello!

The first of all, my erlang version OTP-21.3 (without any minor patches)

I've got the strange floating problem with UDP to receive, which I couldn't debug by myself. Let me describe briefly what I have: There is UDP socket which used for send packets to many peers and receives responses from them (rate ~1pps):

send request: SIP options
receive a response: 100
receive a response: 200

In erlang it's implemented like gen_server which receives messages, the sockets options are: binary, {active, once}.

Every time, when the message is received from the mailbox, code try to do some job, after that try to handle message bursts by calling in the loop (by example 5 times):
gen_udp:recv(Socket, 0, 0), bursts and after that set the socket option {active, once} back.

So, here I'm observing floating strange behavior, from time to time the second!! and always the second!! response is absent. I see this UDP missing packets in tcpdump, I don't see any drops in UDP stack at all.

When I'm trying to add some logs into my erlang code, the problem disappears, when I'm run strace to recvfrom syscall the problem disappears... I't try to debug VM with gdb and make some breakpoints on sock_recvfrom in inet_drv.c (it's OTP-21.3), but the problem disappears (((

I can't find the problem root cause, is it in my code or is it in VM? Maybe someone could help me with that? 

--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991


So, I suppose this is related to SIP communication, and I’ll go out on a limb here…

Considering that SIP message, when transported using UDP, must fit into a single UDP message, then, your statement that second and only second response packet is missing, may imply that you are sending response using two UDP packets.

(A) AFAIK, SIP over UDP does not have any continuation, hence, must fit into a single packet.
If you (by any chance) send SIP message over multiple UDP packets, this shouldl not work — saying this just in case.

(B) OTOH, if tcpdump reveals UDP fragmentation (where a single UDP datagram is fragmented over multiple IP packets), then maybe your UDP buffer it to small to assemble the whole message, in which case message might be truncated, and your SIP parser may see it as the part of the message (second!! and always second!!), that’s gone missing.

(C) Finally, when your server processing is too slow for the size of the buffer, then, there’s a chance that receive buffer may get full, in which case it would drop the packet that cannot be added to the receive buffer.

However, judging from your “second!! and always second!!” statement, I’d say that (A) or (B), are more likely, barring something completely different (e.g some other bug, perhaps ;-))

Kind regards

V/


.



Accordingly to tcpdump the size on packets from 300 to 400 bytes... also, I've double-checked, there are no fragmented packets at all, my ss -anump for beam shows:

skmem:(r0,rb16384,t0,tb212992,f4096,w0,o0,bl0,d0)

where:
 - rb16384 - rx buffer;
 - tb212992 - tx buffer;
 - d0 - drop counter.

So, I guess it's not an (A) nor (B). My internal erlang buffer {buffer, Size} (https://erlang.org/doc/man/inet.html) is 8192. It's could be (C), but the incoming packet rate is really low. Also, it's not a SIP parser for messages received over UDP, due to I don't see and packets in my receive process...

--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991


We (and by that I include myself as well) have been using {active, once}, followed by gen_udp:recv/3 construct quite often, and I do not recall ever loosing a packet when rates were less than, say,  80,000 UDP messages/second — provided that the server processing  is fast enough (but that's a different conversation).

Transition between {active, once} and gen_udp:recv/3 will *never* loose any packets, as the packet will be in the UDP buffer until recalled by gen_udp:recv or another {active, once} 

Maybe you should  consider a situation when the second packet is *not* in the buffer yet (e.g. enroute).  In that case gen_udp:recv(sock, 0, 0 ) will fail with {error, timeout}, and you’ll pick up the “second” UDP message only next time when you call {active once}.

So, why don’t you go and change this timeout value to something other than zero, say, 1000 (one second) just to see if the behaviour changes in any way. Then you could reduce it to something smaller… say 10, or even 5 ms.


V/


Using systemtap, thanks to Andreas Schultz for a good idea, I find out, that all UDP packets are read by erlang VM from Linux kernel buffers without any losses.

Maybe you should  consider a situation when the second packet is *not* in the buffer yet (e.g. enroute).  In that case gen_udp:recv(sock, 0, 0 ) will fail with {error, timeout}, and you’ll pick up the “second” UDP message only next time when you call {active once}.

Yeah, it seems like you are right, the problem in pair {active, once} and gen_udp:recv(sock, 0, 0). After packet delivered and read from the mailbox, gen_udp:recv(sock, 0, 0) is called (socket state is {active, false}) and it will fail with {error, timeout}. It's doesn't matter how many times I'll call gen_udp:recv(sock, 0, 0) one time, one million times, or one million times with 1-second interval - the result always will be {error, timeout}. But if I'll call gen_udp:recv(sock, 0, 1) (timeout is 1ms by example) the result will be {error, timeout} too, but! after that, each of N calls of gen_udp:recv(sock, 0, 0) (pay attention, the timeout is 0ms) will gives UDP packet.

So, what I found next: I create UDP socket and set the {active, once} on it, after the message is delivered, just call gen_udp:recv(sock, 0, 0) by hand several times and see what happens in systemtap and in gdb.

a) the data evenly comes into UDP socket from network...
b) socket in state {active, once} and systemtap shows me that VM thread read data from the socket (into the internal buffer) with recvfrom syscall "in the loop";
c) socket in state {active, false} and  systemtap shows me that VM thread read data from the socket (into the internal buffer) with recvfrom syscall "in the loop";
d) call gen_udp:recv(sock, 0, 0) several times by hand, got {error, timeout}systemtap still shows me that VM thread read data from the socket (into the internal buffer) with recvfrom syscall;
e) call gen_udp:recv(sock, 0, 0) with gdb breakpoint on https://github.com/erlang/otp/blob/OTP-21.3/erts/emulator/drivers/common/inet_drv.c#L12528, I see that call VM try to read from socket (in my mind it's strange, there is data in the internal buffer...), recvfrom syscall return -1, the erlang code got {error, timeout};
f) systemtap still shows me that VM thread read data from the socket (into the internal buffer) with recvfrom syscall "in the loop";

g) call gen_udp:recv(sock, 0, 1) with gdb breakpoint once again, the recvfrom syscall return -1, but in this case VM calls https://github.com/erlang/otp/blob/OTP-21.3/erts/emulator/drivers/common/inet_drv.c#L12533 and thread don't read data from the socket "in the loop" anymore. Next N calls gen_udp:recv(sock, 0, 0) will return data, and each will call recvfrom syscall. At this point, just for note, if incoming rate it more than reading rate, I see UDP packet drops in the kernel buffers...
h) when N+1...N+M call of gen_udp:recv(sock, 0, 0) returns {error, timeout}, VM thread start read data from the socket "in the loop" once again...

So, this behaviour (strange in my mind) shed the light a little bit on my missing UDP packets... I'm still wondering, why gen_udp:recv(sock, 0, 1) moves socket into "manual control", why gen_udp:recv(sock, 0, 1) don't read the data from the internal buffer?

--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991

Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Alexander Petrovsky-2
From: Valentin Micic

I’ve run a couple of tests on my own, and managed to reproduce your “second! and only second!!” packet problem.

It happens under following circumstances:

1) Socket (Rx) is instructed to provide a single packet as it arrives ( as per {active, once} );


2) After receiving this packet, a call to gen_udp:recv( Rx, 0, 0 ) is executed before another packet has ARRIVED to the kernel buffer;

Then the following diagram illustrates the problem:

Note:
    - Tx is sending socket;
    - Rx is receiving socket;

Please note that sequence diagrams below are not completely accurate, but it is reasonably illustrative of what may have happened.



The next sequence diagram depicts a situation when a non-zero value for gen_udp:recv/3, e.g. gen_udp:recv( Rx, 0, 10 ):




Thus, it appears that gen_udp:recv( Socket, Length, Timeout ) is broken when Timeout is set to 0.
For all positive non-zero values for Timeout, it behaves properly (or as intuitively expected).

Therefore, this should either be fixed, or it should be documented in order to indicate that 0 must *not* be used as a value for the Timeout in gen_udp:recv/3 call.

Aside, I fail to see a benefit of using Timeout of value 0 when combined with {active, once}. The only time when zero value may appear to be useful is when server does need to process other messages, e.g. messages not originated by the socket, which would be the case when gen_server is used to support the socket. In my view (which is supported by substantial empirical evidence), gen_server is not the most optimal behaviour to use when implementing support for sockets.

Kind regards

V/


So, seems like there is some buggy behavior in the erlang VM for reading UDP messages between switching from {active, once} to gen_udp:recv(socket, 0, 0)? Lukas, Raimo could you help with that?

--
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Phone: +7 931 9877991

Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Raimo Niskanen-11
I can reproduce it on a month old 'master'.
We'll take a look at it...

Thank you for a pinpointed bug report!
/ Raimo


On Fri, Feb 07, 2020 at 04:50:54PM +0300, Alexander Petrovsky wrote:

> >
> > From: Valentin Micic
> >
> > I’ve run a couple of tests on my own, and managed to reproduce your
> > “second! and only second!!” packet problem.
> >
> > It happens under following circumstances:
> >
> > 1) Socket (Rx) is instructed to provide a single packet as it arrives ( as
> > per {active, once} );
> >
> >
> > 2) After receiving this packet, a call to gen_udp:recv( Rx, 0, 0 ) is
> > executed before another packet has ARRIVED to the kernel buffer;
> >
> > Then the following diagram illustrates the problem:
> >
> > Note:
> >     - Tx is sending socket;
> >     - Rx is receiving socket;
> >
> > *Please note that sequence diagrams below are not completely accurate, but
> > it is reasonably illustrative of what may have happened.*
> >
> >
> >
> > The next sequence diagram depicts a situation when a non-zero value for
> > gen_udp:recv/3, e.g. gen_udp:recv( Rx, 0, *10* ):
> >
> >
> >
> >
> > Thus, it appears that gen_udp:recv( *Socket, Length*, *Timeout* ) is
> > broken when *Timeout* is set to *0*.
> > For all positive non-zero values for Timeout, it behaves properly (or as
> > intuitively expected).
> >
> > Therefore, this should either be fixed, or it should be documented in
> > order to indicate that 0 must *not* be used as a value for the Timeout in
> > gen_udp:recv/3 call.
> >
> > Aside, I fail to see a benefit of using Timeout of value 0 when combined
> > with {active, once}. The only time when zero value may appear to be useful
> > is when server does need to process other messages, e.g. messages not
> > originated by the socket, which would be the case when gen_server is used
> > to support the socket. In my view (which is supported by substantial
> > empirical evidence), gen_server is not the most optimal behaviour to use
> > when implementing support for sockets.
> >
> > Kind regards
> >
> > V/
> >
> >
> So, seems like there is some buggy behavior in the erlang VM for reading
> UDP messages between switching from {active, once} to gen_udp:recv(socket,
> 0, 0)? Lukas, Raimo could you help with that?
>
> --
> Петровский Александр / Alexander Petrovsky,
>
> Skype: askjuise
> Phone: +7 931 9877991




--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Raimo Niskanen-11
On Mon, Feb 10, 2020 at 04:23:37PM +0100, Raimo Niskanen wrote:
> I can reproduce it on a month old 'master'.
> We'll take a look at it...
>
> Thank you for a pinpointed bug report!
> / Raimo

I have a patch that fixes the problem:

@@ -12609,12 +12609,8 @@ static ErlDrvSSizeT packet_inet_ctl(ErlDrvData e,
unsigned int cmd, char* buf,
  return ctl_error(EALREADY, rbuf, rsize);
 
     if (packet_inet_input(udesc, desc->event) == 0) {
- if (timeout == 0)
-    async_error_am(desc, am_timeout);
- else {
-    if (timeout != INET_INFINITY)
- driver_set_timer(desc->port, timeout);
- }
+                if (timeout != INET_INFINITY)
+                    driver_set_timer(desc->port, timeout);
     }
     return ctl_reply(INET_REP_OK, tbuf, 2, rbuf, rsize);
  }

It should be possible to apply to any version.  The code lines I removed
have been around since we entered GitHub.

The shortcut for timeout == 0 was flawed.  Right before this an async event
was enqueued, which is not cleared when taking this short way out, and the
packet_inet_input function had put the socket in the poll set, which the
shortcut also ignored.  So when the UDP packet arrives it is received and
delivery to the caller fails since there is no longer any caller - the caller
was already told {error, timeout}.  The packet gets wasted.

This fix removes the shortcut for timeout == 0.  It would be possible to
instead fix the shortcut, which could improve latency slightly.  With this
fix code sets a timer for timeout == 0, which immediately times out
and then all internal state gets updated as it should.  The same code path
is used for all timeouts including 0.

Again, thank you for the bug report and pinpointing.

I will now create an internal ticket, write a test case, and prepare
for a patch...  We would prefer to just include this fix in the upcoming
OTP 22.3.

Best regards
/ Raimo Niskanen


>
>
> On Fri, Feb 07, 2020 at 04:50:54PM +0300, Alexander Petrovsky wrote:
> > >
> > > From: Valentin Micic
> > >
> > > I’ve run a couple of tests on my own, and managed to reproduce your
> > > “second! and only second!!” packet problem.
> > >
> > > It happens under following circumstances:
> > >
> > > 1) Socket (Rx) is instructed to provide a single packet as it arrives ( as
> > > per {active, once} );
> > >
> > >
> > > 2) After receiving this packet, a call to gen_udp:recv( Rx, 0, 0 ) is
> > > executed before another packet has ARRIVED to the kernel buffer;
> > >
> > > Then the following diagram illustrates the problem:
> > >
> > > Note:
> > >     - Tx is sending socket;
> > >     - Rx is receiving socket;
> > >
> > > *Please note that sequence diagrams below are not completely accurate, but
> > > it is reasonably illustrative of what may have happened.*
> > >
> > >
> > >
> > > The next sequence diagram depicts a situation when a non-zero value for
> > > gen_udp:recv/3, e.g. gen_udp:recv( Rx, 0, *10* ):
> > >
> > >
> > >
> > >
> > > Thus, it appears that gen_udp:recv( *Socket, Length*, *Timeout* ) is
> > > broken when *Timeout* is set to *0*.
> > > For all positive non-zero values for Timeout, it behaves properly (or as
> > > intuitively expected).
> > >
> > > Therefore, this should either be fixed, or it should be documented in
> > > order to indicate that 0 must *not* be used as a value for the Timeout in
> > > gen_udp:recv/3 call.
> > >
> > > Aside, I fail to see a benefit of using Timeout of value 0 when combined
> > > with {active, once}. The only time when zero value may appear to be useful
> > > is when server does need to process other messages, e.g. messages not
> > > originated by the socket, which would be the case when gen_server is used
> > > to support the socket. In my view (which is supported by substantial
> > > empirical evidence), gen_server is not the most optimal behaviour to use
> > > when implementing support for sockets.
> > >
> > > Kind regards
> > >
> > > V/
> > >
> > >
> > So, seems like there is some buggy behavior in the erlang VM for reading
> > UDP messages between switching from {active, once} to gen_udp:recv(socket,
> > 0, 0)? Lukas, Raimo could you help with that?
> >
> > --
> > Петровский Александр / Alexander Petrovsky,
> >
> > Skype: askjuise
> > Phone: +7 931 9877991
>
>
>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Raimo Niskanen-11
On Wed, Feb 12, 2020 at 02:31:10PM +0300, Alexander Petrovsky wrote:
> Awesome, thanks a lot :)
>
> I'll test this fix and in case of some problems, I will let you know!

Great!

I would apprecite if you could verify also if the fix works,
so I can merge it into ourt maint and master.

/ Raimo


>
> вт, 11 февр. 2020 г. в 21:01, Valentin Micic <[hidden email]>:
>
> > Thank you Raimo, much obliged.
> > I am sure this will make Alexander very happy :-)
> >
> > V/
> >
> > > On 11 Feb 2020, at 17:21, Raimo Niskanen <
> > [hidden email]> wrote:
> > >
> > > On Mon, Feb 10, 2020 at 04:23:37PM +0100, Raimo Niskanen wrote:
> > >> I can reproduce it on a month old 'master'.
> > >> We'll take a look at it...
> > >>
> > >> Thank you for a pinpointed bug report!
> > >> / Raimo
> > >
> > > I have a patch that fixes the problem:
> > >
> > > @@ -12609,12 +12609,8 @@ static ErlDrvSSizeT packet_inet_ctl(ErlDrvData
> > e,
> > > unsigned int cmd, char* buf,
> > >               return ctl_error(EALREADY, rbuf, rsize);
> > >
> > >           if (packet_inet_input(udesc, desc->event) == 0) {
> > > -             if (timeout == 0)
> > > -                 async_error_am(desc, am_timeout);
> > > -             else {
> > > -                 if (timeout != INET_INFINITY)
> > > -                     driver_set_timer(desc->port, timeout);
> > > -             }
> > > +                if (timeout != INET_INFINITY)
> > > +                    driver_set_timer(desc->port, timeout);
> > >           }
> > >           return ctl_reply(INET_REP_OK, tbuf, 2, rbuf, rsize);
> > >       }
> > >
> > > It should be possible to apply to any version.  The code lines I removed
> > > have been around since we entered GitHub.
> > >
> > > The shortcut for timeout == 0 was flawed.  Right before this an async
> > event
> > > was enqueued, which is not cleared when taking this short way out, and
> > the
> > > packet_inet_input function had put the socket in the poll set, which the
> > > shortcut also ignored.  So when the UDP packet arrives it is received and
> > > delivery to the caller fails since there is no longer any caller - the
> > caller
> > > was already told {error, timeout}.  The packet gets wasted.
> > >
> > > This fix removes the shortcut for timeout == 0.  It would be possible to
> > > instead fix the shortcut, which could improve latency slightly.  With
> > this
> > > fix code sets a timer for timeout == 0, which immediately times out
> > > and then all internal state gets updated as it should.  The same code
> > path
> > > is used for all timeouts including 0.
> > >
> > > Again, thank you for the bug report and pinpointing.
> > >
> > > I will now create an internal ticket, write a test case, and prepare
> > > for a patch...  We would prefer to just include this fix in the upcoming
> > > OTP 22.3.
> > >
> > > Best regards
> > > / Raimo Niskanen
> > >
> > >
> > >>
> > >>
> > >> On Fri, Feb 07, 2020 at 04:50:54PM +0300, Alexander Petrovsky wrote:
> > >>>>
> > >>>> From: Valentin Micic
> > >>>>
> > >>>> I’ve run a couple of tests on my own, and managed to reproduce your
> > >>>> “second! and only second!!” packet problem.
> > >>>>
> > >>>> It happens under following circumstances:
> > >>>>
> > >>>> 1) Socket (Rx) is instructed to provide a single packet as it arrives
> > ( as
> > >>>> per {active, once} );
> > >>>>
> > >>>>
> > >>>> 2) After receiving this packet, a call to gen_udp:recv( Rx, 0, 0 ) is
> > >>>> executed before another packet has ARRIVED to the kernel buffer;
> > >>>>
> > >>>> Then the following diagram illustrates the problem:
> > >>>>
> > >>>> Note:
> > >>>>    - Tx is sending socket;
> > >>>>    - Rx is receiving socket;
> > >>>>
> > >>>> *Please note that sequence diagrams below are not completely
> > accurate, but
> > >>>> it is reasonably illustrative of what may have happened.*
> > >>>>
> > >>>>
> > >>>>
> > >>>> The next sequence diagram depicts a situation when a non-zero value
> > for
> > >>>> gen_udp:recv/3, e.g. gen_udp:recv( Rx, 0, *10* ):
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> Thus, it appears that gen_udp:recv( *Socket, Length*, *Timeout* ) is
> > >>>> broken when *Timeout* is set to *0*.
> > >>>> For all positive non-zero values for Timeout, it behaves properly (or
> > as
> > >>>> intuitively expected).
> > >>>>
> > >>>> Therefore, this should either be fixed, or it should be documented in
> > >>>> order to indicate that 0 must *not* be used as a value for the
> > Timeout in
> > >>>> gen_udp:recv/3 call.
> > >>>>
> > >>>> Aside, I fail to see a benefit of using Timeout of value 0 when
> > combined
> > >>>> with {active, once}. The only time when zero value may appear to be
> > useful
> > >>>> is when server does need to process other messages, e.g. messages not
> > >>>> originated by the socket, which would be the case when gen_server is
> > used
> > >>>> to support the socket. In my view (which is supported by substantial
> > >>>> empirical evidence), gen_server is not the most optimal behaviour to
> > use
> > >>>> when implementing support for sockets.
> > >>>>
> > >>>> Kind regards
> > >>>>
> > >>>> V/
> > >>>>
> > >>>>
> > >>> So, seems like there is some buggy behavior in the erlang VM for
> > reading
> > >>> UDP messages between switching from {active, once} to
> > gen_udp:recv(socket,
> > >>> 0, 0)? Lukas, Raimo could you help with that?
> > >>>
> > >>> --
> > >>> Петровский Александр / Alexander Petrovsky,
> > >>>
> > >>> Skype: askjuise
> > >>> Phone: +7 931 9877991
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > >
> > > --
> > >
> > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >
> >
>
> --
> Alexander Petrovsky

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Alexander Petrovsky-2
Hello!

We have tested this fix in the idle state when there is not so much traffic on the UDP socket and also under the load. In both cases, the problem has disappeared, so I guess we could say that the fix works correctly, thanks!

ср, 12 февр. 2020 г. в 17:13, Raimo Niskanen <[hidden email]>:
On Wed, Feb 12, 2020 at 02:31:10PM +0300, Alexander Petrovsky wrote:
> Awesome, thanks a lot :)
>
> I'll test this fix and in case of some problems, I will let you know!

Great!

I would apprecite if you could verify also if the fix works,
so I can merge it into ourt maint and master.

/ Raimo


>
> вт, 11 февр. 2020 г. в 21:01, Valentin Micic <[hidden email]>:
>
> > Thank you Raimo, much obliged.
> > I am sure this will make Alexander very happy :-)
> >
> > V/
> >
> > > On 11 Feb 2020, at 17:21, Raimo Niskanen <
> > [hidden email]> wrote:
> > >
> > > On Mon, Feb 10, 2020 at 04:23:37PM +0100, Raimo Niskanen wrote:
> > >> I can reproduce it on a month old 'master'.
> > >> We'll take a look at it...
> > >>
> > >> Thank you for a pinpointed bug report!
> > >> / Raimo
> > >
> > > I have a patch that fixes the problem:
> > >
> > > @@ -12609,12 +12609,8 @@ static ErlDrvSSizeT packet_inet_ctl(ErlDrvData
> > e,
> > > unsigned int cmd, char* buf,
> > >               return ctl_error(EALREADY, rbuf, rsize);
> > >
> > >           if (packet_inet_input(udesc, desc->event) == 0) {
> > > -             if (timeout == 0)
> > > -                 async_error_am(desc, am_timeout);
> > > -             else {
> > > -                 if (timeout != INET_INFINITY)
> > > -                     driver_set_timer(desc->port, timeout);
> > > -             }
> > > +                if (timeout != INET_INFINITY)
> > > +                    driver_set_timer(desc->port, timeout);
> > >           }
> > >           return ctl_reply(INET_REP_OK, tbuf, 2, rbuf, rsize);
> > >       }
> > >
> > > It should be possible to apply to any version.  The code lines I removed
> > > have been around since we entered GitHub.
> > >
> > > The shortcut for timeout == 0 was flawed.  Right before this an async
> > event
> > > was enqueued, which is not cleared when taking this short way out, and
> > the
> > > packet_inet_input function had put the socket in the poll set, which the
> > > shortcut also ignored.  So when the UDP packet arrives it is received and
> > > delivery to the caller fails since there is no longer any caller - the
> > caller
> > > was already told {error, timeout}.  The packet gets wasted.
> > >
> > > This fix removes the shortcut for timeout == 0.  It would be possible to
> > > instead fix the shortcut, which could improve latency slightly.  With
> > this
> > > fix code sets a timer for timeout == 0, which immediately times out
> > > and then all internal state gets updated as it should.  The same code
> > path
> > > is used for all timeouts including 0.
> > >
> > > Again, thank you for the bug report and pinpointing.
> > >
> > > I will now create an internal ticket, write a test case, and prepare
> > > for a patch...  We would prefer to just include this fix in the upcoming
> > > OTP 22.3.
> > >
> > > Best regards
> > > / Raimo Niskanen
> > >
> > >
> > >>
> > >>
> > >> On Fri, Feb 07, 2020 at 04:50:54PM +0300, Alexander Petrovsky wrote:
> > >>>>
> > >>>> From: Valentin Micic
> > >>>>
> > >>>> I’ve run a couple of tests on my own, and managed to reproduce your
> > >>>> “second! and only second!!” packet problem.
> > >>>>
> > >>>> It happens under following circumstances:
> > >>>>
> > >>>> 1) Socket (Rx) is instructed to provide a single packet as it arrives
> > ( as
> > >>>> per {active, once} );
> > >>>>
> > >>>>
> > >>>> 2) After receiving this packet, a call to gen_udp:recv( Rx, 0, 0 ) is
> > >>>> executed before another packet has ARRIVED to the kernel buffer;
> > >>>>
> > >>>> Then the following diagram illustrates the problem:
> > >>>>
> > >>>> Note:
> > >>>>    - Tx is sending socket;
> > >>>>    - Rx is receiving socket;
> > >>>>
> > >>>> *Please note that sequence diagrams below are not completely
> > accurate, but
> > >>>> it is reasonably illustrative of what may have happened.*
> > >>>>
> > >>>>
> > >>>>
> > >>>> The next sequence diagram depicts a situation when a non-zero value
> > for
> > >>>> gen_udp:recv/3, e.g. gen_udp:recv( Rx, 0, *10* ):
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> Thus, it appears that gen_udp:recv( *Socket, Length*, *Timeout* ) is
> > >>>> broken when *Timeout* is set to *0*.
> > >>>> For all positive non-zero values for Timeout, it behaves properly (or
> > as
> > >>>> intuitively expected).
> > >>>>
> > >>>> Therefore, this should either be fixed, or it should be documented in
> > >>>> order to indicate that 0 must *not* be used as a value for the
> > Timeout in
> > >>>> gen_udp:recv/3 call.
> > >>>>
> > >>>> Aside, I fail to see a benefit of using Timeout of value 0 when
> > combined
> > >>>> with {active, once}. The only time when zero value may appear to be
> > useful
> > >>>> is when server does need to process other messages, e.g. messages not
> > >>>> originated by the socket, which would be the case when gen_server is
> > used
> > >>>> to support the socket. In my view (which is supported by substantial
> > >>>> empirical evidence), gen_server is not the most optimal behaviour to
> > use
> > >>>> when implementing support for sockets.
> > >>>>
> > >>>> Kind regards
> > >>>>
> > >>>> V/
> > >>>>
> > >>>>
> > >>> So, seems like there is some buggy behavior in the erlang VM for
> > reading
> > >>> UDP messages between switching from {active, once} to
> > gen_udp:recv(socket,
> > >>> 0, 0)? Lukas, Raimo could you help with that?
> > >>>
> > >>> --
> > >>> Петровский Александр / Alexander Petrovsky,
> > >>>
> > >>> Skype: askjuise
> > >>> Phone: +7 931 9877991
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > >
> > > --
> > >
> > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >
> >
>
> --
> Alexander Petrovsky

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB


--
Alexander Petrovsky
Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Raimo Niskanen-11
On Fri, Feb 14, 2020 at 04:23:15PM +0300, Alexander Petrovsky wrote:
> Hello!
>
> We have tested this fix in the idle state when there is not so much traffic
> on the UDP socket and also under the load. In both cases, the problem has
> disappeared, so I guess we could say that the fix works correctly, thanks!

Good to know.  Thank you for informing!
/ Raimo


>
> ср, 12 февр. 2020 г. в 17:13, Raimo Niskanen <
> [hidden email]>:
>
> > On Wed, Feb 12, 2020 at 02:31:10PM +0300, Alexander Petrovsky wrote:
> > > Awesome, thanks a lot :)
> > >
> > > I'll test this fix and in case of some problems, I will let you know!
> >
> > Great!
> >
> > I would apprecite if you could verify also if the fix works,
> > so I can merge it into ourt maint and master.
> >
> > / Raimo
> >
> >
> > >
> > > вт, 11 февр. 2020 г. в 21:01, Valentin Micic <[hidden email]>:
> > >
> > > > Thank you Raimo, much obliged.
> > > > I am sure this will make Alexander very happy :-)
> > > >
> > > > V/
> > > >
> > > > > On 11 Feb 2020, at 17:21, Raimo Niskanen <
> > > > [hidden email]> wrote:
> > > > >
> > > > > On Mon, Feb 10, 2020 at 04:23:37PM +0100, Raimo Niskanen wrote:
> > > > >> I can reproduce it on a month old 'master'.
> > > > >> We'll take a look at it...
> > > > >>
> > > > >> Thank you for a pinpointed bug report!
> > > > >> / Raimo
> > > > >
> > > > > I have a patch that fixes the problem:
> > > > >
> > > > > @@ -12609,12 +12609,8 @@ static ErlDrvSSizeT
> > packet_inet_ctl(ErlDrvData
> > > > e,
> > > > > unsigned int cmd, char* buf,
> > > > >               return ctl_error(EALREADY, rbuf, rsize);
> > > > >
> > > > >           if (packet_inet_input(udesc, desc->event) == 0) {
> > > > > -             if (timeout == 0)
> > > > > -                 async_error_am(desc, am_timeout);
> > > > > -             else {
> > > > > -                 if (timeout != INET_INFINITY)
> > > > > -                     driver_set_timer(desc->port, timeout);
> > > > > -             }
> > > > > +                if (timeout != INET_INFINITY)
> > > > > +                    driver_set_timer(desc->port, timeout);
> > > > >           }
> > > > >           return ctl_reply(INET_REP_OK, tbuf, 2, rbuf, rsize);
> > > > >       }
> > > > >
> > > > > It should be possible to apply to any version.  The code lines I
> > removed
> > > > > have been around since we entered GitHub.
> > > > >
> > > > > The shortcut for timeout == 0 was flawed.  Right before this an async
> > > > event
> > > > > was enqueued, which is not cleared when taking this short way out,
> > and
> > > > the
> > > > > packet_inet_input function had put the socket in the poll set, which
> > the
> > > > > shortcut also ignored.  So when the UDP packet arrives it is
> > received and
> > > > > delivery to the caller fails since there is no longer any caller -
> > the
> > > > caller
> > > > > was already told {error, timeout}.  The packet gets wasted.
> > > > >
> > > > > This fix removes the shortcut for timeout == 0.  It would be
> > possible to
> > > > > instead fix the shortcut, which could improve latency slightly.  With
> > > > this
> > > > > fix code sets a timer for timeout == 0, which immediately times out
> > > > > and then all internal state gets updated as it should.  The same code
> > > > path
> > > > > is used for all timeouts including 0.
> > > > >
> > > > > Again, thank you for the bug report and pinpointing.
> > > > >
> > > > > I will now create an internal ticket, write a test case, and prepare
> > > > > for a patch...  We would prefer to just include this fix in the
> > upcoming
> > > > > OTP 22.3.
> > > > >
> > > > > Best regards
> > > > > / Raimo Niskanen
> > > > >
> > > > >
> > > > >>
> > > > >>
> > > > >> On Fri, Feb 07, 2020 at 04:50:54PM +0300, Alexander Petrovsky wrote:
> > > > >>>>
> > > > >>>> From: Valentin Micic
> > > > >>>>
> > > > >>>> I’ve run a couple of tests on my own, and managed to reproduce
> > your
> > > > >>>> “second! and only second!!” packet problem.
> > > > >>>>
> > > > >>>> It happens under following circumstances:
> > > > >>>>
> > > > >>>> 1) Socket (Rx) is instructed to provide a single packet as it
> > arrives
> > > > ( as
> > > > >>>> per {active, once} );
> > > > >>>>
> > > > >>>>
> > > > >>>> 2) After receiving this packet, a call to gen_udp:recv( Rx, 0, 0
> > ) is
> > > > >>>> executed before another packet has ARRIVED to the kernel buffer;
> > > > >>>>
> > > > >>>> Then the following diagram illustrates the problem:
> > > > >>>>
> > > > >>>> Note:
> > > > >>>>    - Tx is sending socket;
> > > > >>>>    - Rx is receiving socket;
> > > > >>>>
> > > > >>>> *Please note that sequence diagrams below are not completely
> > > > accurate, but
> > > > >>>> it is reasonably illustrative of what may have happened.*
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> The next sequence diagram depicts a situation when a non-zero
> > value
> > > > for
> > > > >>>> gen_udp:recv/3, e.g. gen_udp:recv( Rx, 0, *10* ):
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Thus, it appears that gen_udp:recv( *Socket, Length*, *Timeout* )
> > is
> > > > >>>> broken when *Timeout* is set to *0*.
> > > > >>>> For all positive non-zero values for Timeout, it behaves properly
> > (or
> > > > as
> > > > >>>> intuitively expected).
> > > > >>>>
> > > > >>>> Therefore, this should either be fixed, or it should be
> > documented in
> > > > >>>> order to indicate that 0 must *not* be used as a value for the
> > > > Timeout in
> > > > >>>> gen_udp:recv/3 call.
> > > > >>>>
> > > > >>>> Aside, I fail to see a benefit of using Timeout of value 0 when
> > > > combined
> > > > >>>> with {active, once}. The only time when zero value may appear to
> > be
> > > > useful
> > > > >>>> is when server does need to process other messages, e.g. messages
> > not
> > > > >>>> originated by the socket, which would be the case when gen_server
> > is
> > > > used
> > > > >>>> to support the socket. In my view (which is supported by
> > substantial
> > > > >>>> empirical evidence), gen_server is not the most optimal behaviour
> > to
> > > > use
> > > > >>>> when implementing support for sockets.
> > > > >>>>
> > > > >>>> Kind regards
> > > > >>>>
> > > > >>>> V/
> > > > >>>>
> > > > >>>>
> > > > >>> So, seems like there is some buggy behavior in the erlang VM for
> > > > reading
> > > > >>> UDP messages between switching from {active, once} to
> > > > gen_udp:recv(socket,
> > > > >>> 0, 0)? Lukas, Raimo could you help with that?
> > > > >>>
> > > > >>> --
> > > > >>> Петровский Александр / Alexander Petrovsky,
> > > > >>>
> > > > >>> Skype: askjuise
> > > > >>> Phone: +7 931 9877991
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >>
> > > > >> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > > >
> > > > > --
> > > > >
> > > > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > >
> > > >
> > >
> > > --
> > > Alexander Petrovsky
> >
> > --
> >
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> >
>
>
> --
> Alexander Petrovsky

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
Reply | Threaded
Open this post in threaded view
|

Re: The missing UDP packets

Raimo Niskanen-11
The patch is now merged to 'maint' and 'master for the upcoming releases
23.0-rc1 and 22.3.

/ Raimo


On Fri, Feb 14, 2020 at 02:33:45PM +0100, Raimo Niskanen wrote:

> On Fri, Feb 14, 2020 at 04:23:15PM +0300, Alexander Petrovsky wrote:
> > Hello!
> >
> > We have tested this fix in the idle state when there is not so much traffic
> > on the UDP socket and also under the load. In both cases, the problem has
> > disappeared, so I guess we could say that the fix works correctly, thanks!
>
> Good to know.  Thank you for informing!
> / Raimo
>
>
> >
> > ср, 12 февр. 2020 г. в 17:13, Raimo Niskanen <
> > [hidden email]>:
> >
> > > On Wed, Feb 12, 2020 at 02:31:10PM +0300, Alexander Petrovsky wrote:
> > > > Awesome, thanks a lot :)
> > > >
> > > > I'll test this fix and in case of some problems, I will let you know!
> > >
> > > Great!
> > >
> > > I would apprecite if you could verify also if the fix works,
> > > so I can merge it into ourt maint and master.
> > >
> > > / Raimo
> > >
> > >
> > > >
> > > > вт, 11 февр. 2020 г. в 21:01, Valentin Micic <[hidden email]>:
> > > >
> > > > > Thank you Raimo, much obliged.
> > > > > I am sure this will make Alexander very happy :-)
> > > > >
> > > > > V/
> > > > >
> > > > > > On 11 Feb 2020, at 17:21, Raimo Niskanen <
> > > > > [hidden email]> wrote:
> > > > > >
> > > > > > On Mon, Feb 10, 2020 at 04:23:37PM +0100, Raimo Niskanen wrote:
> > > > > >> I can reproduce it on a month old 'master'.
> > > > > >> We'll take a look at it...
> > > > > >>
> > > > > >> Thank you for a pinpointed bug report!
> > > > > >> / Raimo
> > > > > >
> > > > > > I have a patch that fixes the problem:
> > > > > >
> > > > > > @@ -12609,12 +12609,8 @@ static ErlDrvSSizeT
> > > packet_inet_ctl(ErlDrvData
> > > > > e,
> > > > > > unsigned int cmd, char* buf,
> > > > > >               return ctl_error(EALREADY, rbuf, rsize);
> > > > > >
> > > > > >           if (packet_inet_input(udesc, desc->event) == 0) {
> > > > > > -             if (timeout == 0)
> > > > > > -                 async_error_am(desc, am_timeout);
> > > > > > -             else {
> > > > > > -                 if (timeout != INET_INFINITY)
> > > > > > -                     driver_set_timer(desc->port, timeout);
> > > > > > -             }
> > > > > > +                if (timeout != INET_INFINITY)
> > > > > > +                    driver_set_timer(desc->port, timeout);
> > > > > >           }
> > > > > >           return ctl_reply(INET_REP_OK, tbuf, 2, rbuf, rsize);
> > > > > >       }
> > > > > >
> > > > > > It should be possible to apply to any version.  The code lines I
> > > removed
> > > > > > have been around since we entered GitHub.
> > > > > >
> > > > > > The shortcut for timeout == 0 was flawed.  Right before this an async
> > > > > event
> > > > > > was enqueued, which is not cleared when taking this short way out,
> > > and
> > > > > the
> > > > > > packet_inet_input function had put the socket in the poll set, which
> > > the
> > > > > > shortcut also ignored.  So when the UDP packet arrives it is
> > > received and
> > > > > > delivery to the caller fails since there is no longer any caller -
> > > the
> > > > > caller
> > > > > > was already told {error, timeout}.  The packet gets wasted.
> > > > > >
> > > > > > This fix removes the shortcut for timeout == 0.  It would be
> > > possible to
> > > > > > instead fix the shortcut, which could improve latency slightly.  With
> > > > > this
> > > > > > fix code sets a timer for timeout == 0, which immediately times out
> > > > > > and then all internal state gets updated as it should.  The same code
> > > > > path
> > > > > > is used for all timeouts including 0.
> > > > > >
> > > > > > Again, thank you for the bug report and pinpointing.
> > > > > >
> > > > > > I will now create an internal ticket, write a test case, and prepare
> > > > > > for a patch...  We would prefer to just include this fix in the
> > > upcoming
> > > > > > OTP 22.3.
> > > > > >
> > > > > > Best regards
> > > > > > / Raimo Niskanen
> > > > > >
> > > > > >
> > > > > >>
> > > > > >>
> > > > > >> On Fri, Feb 07, 2020 at 04:50:54PM +0300, Alexander Petrovsky wrote:
> > > > > >>>>
> > > > > >>>> From: Valentin Micic
> > > > > >>>>
> > > > > >>>> I’ve run a couple of tests on my own, and managed to reproduce
> > > your
> > > > > >>>> “second! and only second!!” packet problem.
> > > > > >>>>
> > > > > >>>> It happens under following circumstances:
> > > > > >>>>
> > > > > >>>> 1) Socket (Rx) is instructed to provide a single packet as it
> > > arrives
> > > > > ( as
> > > > > >>>> per {active, once} );
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> 2) After receiving this packet, a call to gen_udp:recv( Rx, 0, 0
> > > ) is
> > > > > >>>> executed before another packet has ARRIVED to the kernel buffer;
> > > > > >>>>
> > > > > >>>> Then the following diagram illustrates the problem:
> > > > > >>>>
> > > > > >>>> Note:
> > > > > >>>>    - Tx is sending socket;
> > > > > >>>>    - Rx is receiving socket;
> > > > > >>>>
> > > > > >>>> *Please note that sequence diagrams below are not completely
> > > > > accurate, but
> > > > > >>>> it is reasonably illustrative of what may have happened.*
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> The next sequence diagram depicts a situation when a non-zero
> > > value
> > > > > for
> > > > > >>>> gen_udp:recv/3, e.g. gen_udp:recv( Rx, 0, *10* ):
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Thus, it appears that gen_udp:recv( *Socket, Length*, *Timeout* )
> > > is
> > > > > >>>> broken when *Timeout* is set to *0*.
> > > > > >>>> For all positive non-zero values for Timeout, it behaves properly
> > > (or
> > > > > as
> > > > > >>>> intuitively expected).
> > > > > >>>>
> > > > > >>>> Therefore, this should either be fixed, or it should be
> > > documented in
> > > > > >>>> order to indicate that 0 must *not* be used as a value for the
> > > > > Timeout in
> > > > > >>>> gen_udp:recv/3 call.
> > > > > >>>>
> > > > > >>>> Aside, I fail to see a benefit of using Timeout of value 0 when
> > > > > combined
> > > > > >>>> with {active, once}. The only time when zero value may appear to
> > > be
> > > > > useful
> > > > > >>>> is when server does need to process other messages, e.g. messages
> > > not
> > > > > >>>> originated by the socket, which would be the case when gen_server
> > > is
> > > > > used
> > > > > >>>> to support the socket. In my view (which is supported by
> > > substantial
> > > > > >>>> empirical evidence), gen_server is not the most optimal behaviour
> > > to
> > > > > use
> > > > > >>>> when implementing support for sockets.
> > > > > >>>>
> > > > > >>>> Kind regards
> > > > > >>>>
> > > > > >>>> V/
> > > > > >>>>
> > > > > >>>>
> > > > > >>> So, seems like there is some buggy behavior in the erlang VM for
> > > > > reading
> > > > > >>> UDP messages between switching from {active, once} to
> > > > > gen_udp:recv(socket,
> > > > > >>> 0, 0)? Lukas, Raimo could you help with that?
> > > > > >>>
> > > > > >>> --
> > > > > >>> Петровский Александр / Alexander Petrovsky,
> > > > > >>>
> > > > > >>> Skype: askjuise
> > > > > >>> Phone: +7 931 9877991
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >>
> > > > > >> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > > > >
> > > > > > --
> > > > > >
> > > > > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > > > >
> > > > >
> > > >
> > > > --
> > > > Alexander Petrovsky
> > >
> > > --
> > >
> > > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > >
> >
> >
> > --
> > Alexander Petrovsky
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB