[Dnsmasq-discuss] dnsmasq 2.86 seems to stop reading from one of its dns sockets after a period of time under load
Simon Kelley
simon at thekelleys.org.uk
Mon May 16 17:52:25 UTC 2022
What value did you use?
On my Ubuntu desktop, /proc/sys/net/core/wmem_default and wmem_max are
both 212992 which is a fair few DNS replies.
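
To check the equivalent numbers on another machine, here is a minimal Python sketch. It is not dnsmasq code, and the helper name udp_sndbuf is made up for illustration. It opens a throwaway UDP socket, optionally requests a send buffer size via SO_SNDBUF, and reports what the kernel actually granted. On Linux, as far as I know, the kernel doubles the requested value for bookkeeping and caps the request at wmem_max, so the number read back will usually differ from the number asked for.

```python
import socket


def udp_sndbuf(requested=None):
    """Return SO_SNDBUF for a fresh UDP socket.

    If `requested` is given, ask the kernel for that size first; the
    value returned is whatever the kernel actually granted.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        if requested is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    default = udp_sndbuf()
    print("default SO_SNDBUF:", default)
    # Ask for four times the default; the kernel may clamp this.
    print("after requesting 4x:", udp_sndbuf(default * 4))
```

If the second figure comes back no larger than the first, the request was probably clamped by wmem_max.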
Simon.
On 16/05/2022 18:34, Tom Keddie wrote:
> Hi Simon,
>
> Thanks for your response. I don't have the detailed logs, but it's a
> noisy QA wireless environment where clients are coming and going a lot.
> E.g. in syslog I could see instances where we would get a DHCP request
> and then an L2 wireless disassociate message would appear immediately
> afterwards. That response isn't going to be deliverable as unicast
> (although for DHCP it might fall back to broadcast eventually).
>
> As we know, DNS isn't logged in such a manner, but you could see the same
> scenario unfolding: we get a bunch of DNS requests, the client drops off
> immediately afterwards, and the responses can't be delivered. When
> there are a lot of requests or a lot of clients, you can see how the
> socket buffer would fill.
>
> Increasing the socket buffers as I described below allowed the test to
> run for the required 96 hours; without it we weren't making it past the
> 48 hour mark.
>
> A dynamic solution might work provided it was carefully bounded to prevent
> DoS. If you have something you'd like us to test, I can probably arrange a
> time slot, although it's a busy setup that needs lots of hardware.
>
> Thanks,
> Tom Keddie
>
> P.S. This is a controlled environment (as much as you can control Wi-Fi);
> there are no malicious actors nor intent in this scenario. It's a soak
> test with a large variety of clients all doing busy work like video
> streaming etc.
>
>
> On Fri, May 13, 2022 at 12:48 PM Simon Kelley <simon at thekelleys.org.uk> wrote:
>
>
>
> On 10/05/2022 16:40, Tom Keddie via Dnsmasq-discuss wrote:
> > Hi All,
> >
> > I think you're saying that it's not surprising that dnsmasq is not
> > reading from the socket because the send queue is also full.
> >
> >
> > As per this thread on netdev
> > (https://lore.kernel.org/netdev/CABUuw65R3or9HeHsMT_isVx1f-7B6eCPPdr+bNR6f6wbKPnHOQ@mail.gmail.com/)
> > it seems we were consuming the socket send buffer with pending packets
> > waiting for ARP responses that were never coming. This was causing
> > failures sending to devices that were still live.
> >
> > As per that thread we increased the /proc/sys/net/core/wmem_default
> > value so all sockets will have larger send buffers (the device has very
> > few sockets in use). It might be useful to add dnsmasq config options to
> > increase SO_SNDBUF on the dhcp and dns sockets to allow more granular
> > control.
> >
> > Thanks, Tom Keddie
>
> So queries are being received, and answered, but the reply is being
> dropped by the kernel because the send queue is full of replies to dead
> hosts? If the hosts are dead, where are the queries coming from to
> generate these blocked replies?
>
> It might be sensible to automatically increase the send queue length
> when a packet send gets EAGAIN, at least the first time, but I'd like to
> understand exactly what's going on first.
>
>
> Simon.
>