[Dnsmasq-discuss] dnsmasq 2.86 seems to stop reading from one of its dns sockets after a period of time under load

Mon May 16 17:52:25 UTC 2022

What value did you use?

On my Ubuntu desktop, /proc/sys/net/core/wmem_default and wmem_max are 
both 212992 which is a fair few DNS replies.
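
For anyone wanting to compare values, the two limits can be read straight from procfs. This is a minimal sketch, assuming a Linux host; the helper names `read_sysctl_long` and `print_wmem_limits` are made up for illustration and are not part of dnsmasq.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read a single integer from a procfs/sysctl file.
   Returns the value, or -1 if the file cannot be opened or parsed. */
long read_sysctl_long(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;

    fclose(f);
    return value;
}

/* Print the system-wide default and maximum socket send buffer sizes. */
void print_wmem_limits(void)
{
    printf("wmem_default = %ld\n",
           read_sysctl_long("/proc/sys/net/core/wmem_default"));
    printf("wmem_max     = %ld\n",
           read_sysctl_long("/proc/sys/net/core/wmem_max"));
}
```

Note that the kernel charges more than the payload size against the send buffer for each queued packet, so the byte value is only a rough guide to how many replies fit.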

Simon.

On 16/05/2022 18:34, Tom Keddie wrote:
> Hi Simon,
> 
> Thanks for your response.  I don't have the detailed logs but it's a 
> noisy QA wireless environment where clients are coming and going a lot.  
> E.g. in syslog I could see instances where we would get a DHCP request 
> and then an L2 wireless disassociate message would appear immediately 
> afterwards; that response isn't going to be deliverable as unicast 
> (although for DHCP it might fall back to broadcast eventually).
> 
> As we know, DNS isn't logged in such a manner but you could see the same 
> scenario unfolding where we get a bunch of dns requests, the client 
> drops off immediately afterwards and the responses can't be delivered.  
> When there's a lot of requests or a lot of clients you can see how the 
> socket buffer would fill.
> 
> Increasing the socket buffers as I described below allowed the test to 
> run for the required 96 hours, without it we weren't making it past the 
> 48 hour mark.
> 
> A dynamic solution might work provided it was carefully bounded to prevent 
> DoS.  If you have something you'd like us to test I can probably arrange a 
> time slot, it's a busy setup that needs lots of hardware though.
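
A bounded "grow on EAGAIN" scheme of the kind discussed here could look roughly like the sketch below. It is only an illustration under stated assumptions: `next_sndbuf`, `send_grow_once` and the `cap` parameter are hypothetical names, nothing like this exists in dnsmasq 2.86, and `MSG_DONTWAIT` is assumed to be available (Linux/BSD).

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Next send buffer size to request: double the current size, but never
   go beyond `cap`. Returns `current` unchanged once the cap is reached. */
int next_sndbuf(int current, int cap)
{
    assert(current > 0 && cap > 0);
    if (current >= cap)
        return current;
    if (current > cap / 2)
        return cap;
    return current * 2;
}

/* Send a datagram without blocking. If the send buffer is full
   (EAGAIN/EWOULDBLOCK), try to enlarge SO_SNDBUF within `cap` and retry
   exactly once. Returns bytes sent, or -1 with errno set. */
ssize_t send_grow_once(int fd, const void *buf, size_t len,
                       const struct sockaddr *to, socklen_t tolen, int cap)
{
    ssize_t n = sendto(fd, buf, len, MSG_DONTWAIT, to, tolen);
    if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
        return n;

    int cur = 0;
    socklen_t optlen = sizeof(cur);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &cur, &optlen) < 0 || cur <= 0) {
        errno = EAGAIN;               /* report the original condition */
        return -1;
    }

    int want = next_sndbuf(cur, cap);
    if (want == cur) {                /* already at the cap: give up */
        errno = EAGAIN;
        return -1;
    }

    /* Linux doubles the requested value internally and clamps requests
       to wmem_max, so `cap` is an approximate bound, not an exact one. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want)) < 0) {
        errno = EAGAIN;
        return -1;
    }

    return sendto(fd, buf, len, MSG_DONTWAIT, to, tolen);
}
```

Retrying only once and refusing to grow past `cap` is what keeps a flood of undeliverable replies from turning into unbounded kernel memory use.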
> 
> Thanks,
> Tom Keddie
> 
> ps. this is a controlled environment (as much as you can control wifi), 
> there are no malicious actors nor intent in this scenario.  It's a soak 
> test with a large variety of clients all doing busy work like video 
> streaming etc.
> 
> 
> On Fri, May 13, 2022 at 12:48 PM Simon Kelley <simon at thekelleys.org.uk> wrote:
> 
>     On 10/05/2022 16:40, Tom Keddie via Dnsmasq-discuss wrote:
>      > Hi All,
>      >
>      >     I think you're saying that it's not surprising that dnsmasq is not
>      >     reading from the socket because the send queue is also full.
>      >
>      >
>      > As per this thread on netdev
>      > (https://lore.kernel.org/netdev/CABUuw65R3or9HeHsMT_isVx1f-7B6eCPPdr+bNR6f6wbKPnHOQ@mail.gmail.com/)
>      > it seems we were consuming the socket send buffer with pending packets
>      > waiting for ARP responses that were never coming.  This was causing
>      > failures sending to devices that were still live.
>      >
>      > As per that thread we increased the /proc/sys/net/core/wmem_default
>      > value so all sockets will have larger send buffers (the device has very
>      > few sockets in use). It might be useful to add dnsmasq config options to
>      > increase SO_SNDBUF on the dhcp and dns sockets to allow more granular
>      > control.
>      >
>      > Thanks, Tom Keddie
> 
>     So queries are being received, and answered, but the reply is being
>     dropped by the kernel because the send queue is full of replies to dead
>     hosts? If the hosts are dead, where are the queries coming from to
>     generate these blocked replies?
> 
>     It might be sensible to automatically increase the send queue length
>     when a packet send gets EAGAIN, at least the first time, but I'd like to
>     understand exactly what's going on first.
> 
> 
>     Simon.
>
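
The per-socket alternative raised earlier in the thread, setting SO_SNDBUF on individual sockets rather than raising wmem_default for everything, comes down to one `setsockopt()` call. The sketch below is an assumption-laden illustration: `set_send_buffer` is a hypothetical helper, and dnsmasq 2.86 has no config option that does this.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: request a send buffer of `bytes` on socket `fd`.
   Returns the size the kernel reports afterwards, or -1 on error. */
int set_send_buffer(int fd, int bytes)
{
    assert(fd >= 0 && bytes > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;

    /* Read the value back: on Linux the kernel doubles the request for
       bookkeeping overhead and limits the request to wmem_max. */
    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
        return -1;

    return actual;
}
```

Called as, say, `set_send_buffer(fd, 1 << 20)` on the DNS or DHCP socket right after it is created, this would leave every other socket on the device at the system default. On Linux an unprivileged process cannot exceed wmem_max this way, so that sysctl may still need raising.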