[Dnsmasq-discuss] dnsmasq 2.86 seems to stop reading from one of its dns sockets after a period of time under load
Tom Keddie
tomk+dnsmasq at minim.com
Tue May 17 14:08:02 UTC 2022
> What value did you use?
I went brute force and used 1M. The default on this ARM-based device was
also 212992.
root@MH7601:~# cat /proc/sys/net/core/wmem_default
1048576
I agree that is a lot, but given that the ARP queue length is 101 entries,
that is a lot of packets (especially if that might mean 101 hosts - not sure
if the arp/neigh queue is per host or per request).
root@MH7601:~# cat /proc/sys/net/ipv4/neigh/default/unres_qlen
101
This is a very controlled environment, there are only about 30 sockets open
at any time. This approach won't suit most people but it saved me from
crafting a patch into openwrt.
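
For setups where raising the system-wide default is not acceptable, the
per-socket idea raised earlier in this thread (a larger SO_SNDBUF on just the
DNS and DHCP sockets, possibly grown once when a send hits EAGAIN) could be
sketched roughly as below. This is only an illustration in Python, not dnsmasq
code: dnsmasq is written in C, no such option is confirmed in this thread, and
the helper names set_send_buffer and send_growing_once and the byte counts are
assumptions made up for the example.

```python
import socket


def set_send_buffer(sock, requested_bytes):
    """Ask the kernel for a larger send buffer on this one socket.

    Returns the size the kernel reports afterwards. On Linux the request
    is capped by net.core.wmem_max and the reported value is usually
    larger than the request, so do not expect it to equal requested_bytes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def send_growing_once(sock, payload, addr, grown_bytes):
    """Send one datagram; if the send buffer is full, grow it once and retry.

    Hypothetical helper. Returns True if the datagram was handed to the
    kernel, False if it still could not be queued after growing.
    grown_bytes acts as a fixed ceiling, so a flood of undeliverable
    replies cannot keep inflating the buffer.
    """
    try:
        sock.sendto(payload, addr)
        return True
    except BlockingIOError:  # EAGAIN/EWOULDBLOCK on a non-blocking socket
        set_send_buffer(sock, grown_bytes)
        try:
            sock.sendto(payload, addr)
            return True
        except BlockingIOError:
            return False
```

The socket has to be non-blocking for the EAGAIN path to be reachable at all.
Growing only once, to a fixed size, is one simple way to keep the behaviour
bounded; it does nothing about the packets already stuck waiting for ARP.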
Thanks,
Tom
On Mon, May 16, 2022 at 10:52 AM Simon Kelley <simon at thekelleys.org.uk>
wrote:
> What value did you use?
>
> On my Ubuntu desktop, /proc/sys/net/core/wmem_default and wmem_max are
> both 212992 which is a fair few DNS replies.
>
>
> Simon.
>
>
> On 16/05/2022 18:34, Tom Keddie wrote:
> > Hi Simon,
> >
> > Thanks for your response. I don't have the detailed logs but it's a
> > noisy qa wireless environment where clients are coming and going a lot.
> > For example, in syslog I could see instances where we would get a DHCP
> > request and then an L2 wireless disassociate message would appear
> > immediately afterwards; that response isn't going to be deliverable as
> > unicast (although for DHCP it might fall back to broadcast eventually).
> >
> > As we know, DNS isn't logged in such a manner but you could see the same
> > scenario unfolding where we get a bunch of dns requests, the client
> > drops off immediately afterwards and the responses can't be delivered.
> > When there's a lot of requests or a lot of clients you can see how the
> > socket buffer would fill.
> >
> > Increasing the socket buffers as I described below allowed the test to
> > run for the required 96 hours; without it we weren't making it past the
> > 48 hour mark.
> >
> > A dynamic solution might work provided it was carefully bounded to prevent
> > DoS. If you have something you'd like us to test I can probably arrange a
> > time slot; it's a busy setup that needs lots of hardware though.
> >
> > Thanks,
> > Tom Keddie
> >
> > ps. this is a controlled environment (as much as you can control wifi),
> > there are no malicious actors nor intent in this scenario. It's a soak
> > test with a large variety of clients all doing busy work like video
> > streaming etc.
> >
> >
> > On Fri, May 13, 2022 at 12:48 PM Simon Kelley <simon at thekelleys.org.uk> wrote:
> >
> >
> >
> > On 10/05/2022 16:40, Tom Keddie via Dnsmasq-discuss wrote:
> > > Hi All,
> > >
> > > I think you're saying that it's not surprising that dnsmasq is not
> > > reading from the socket because the send queue is also full.
> > >
> > >
> > > As per this thread on netdev
> > > (https://lore.kernel.org/netdev/CABUuw65R3or9HeHsMT_isVx1f-7B6eCPPdr+bNR6f6wbKPnHOQ@mail.gmail.com/)
> > > it seems we were consuming the socket send buffer with pending packets
> > > waiting for ARP responses that were never coming. This was causing
> > > failures sending to devices that were still live.
> > >
> > > As per that thread we increased the /proc/sys/net/core/wmem_default
> > > value so all sockets will have larger send buffers (the device has very
> > > few sockets in use). It might be useful to add dnsmasq config options to
> > > increase SO_SNDBUF on the dhcp and dns sockets to allow more granular
> > > control.
> > >
> > > Thanks, Tom Keddie
> >
> > So queries are being received, and answered, but the reply is being
> > dropped by the kernel because the send queue is full of replies to dead
> > hosts? If the hosts are dead, where are the queries coming from to
> > generate these blocked replies?
> >
> > It might be sensible to automatically increase the send queue length
> > when a packet send gets EAGAIN, at least the first time, but I'd like to
> > understand exactly what's going on first.
> >
> >
> > Simon.
> >