[Dnsmasq-discuss] dnsmasq using 100% cpu on router

David Joslin davidj at nkcc.org.uk
Thu Apr 24 19:41:13 UTC 2014


Thanks for the reply, Simon.

DNSSEC isn't enabled.

I wonder if the pattern of the problem gives any clues...

As I said, on a normal day with around 40-50 clients on the network there
is no problem at all, with dnsmasq using barely 0-2% of the CPU.
When the problem occurred there were a little over 100 clients. Running top
showed dnsmasq using 100% cpu so I restarted dnsmasq and kept an eye on
top. For maybe 5 or 10 minutes there was no problem, with dnsmasq using
very little cpu. Then dnsmasq would start to peak at maybe 20-30% for a
couple of seconds before dropping back. Then it would start peaking at
higher and higher levels before dropping back. Eventually, after running
for maybe half an hour it would start peaking at over 90% and staying there
for longer before dropping back. At this point dns requests would become
very slow (and maybe time out). And then dnsmasq would hit 100% cpu and
would stay there. Dns requests would time out and only restarting dnsmasq
would fix the problem. The pattern would then start over again.
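
One way to record this ramp-up instead of watching top by hand is to sample
the process's CPU time at fixed intervals. The following is only a minimal
sketch, assuming a Linux box with /proc and a Python interpreter available
(which the router itself may not have). The function names, the 5-second
interval and the sample count are hypothetical choices, not anything
provided by dnsmasq or Tomato.

```python
import os
import time


def parse_stat_ticks(stat_text):
    """Return utime + stime (clock ticks) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may contain spaces,
    # so split only the part after the closing parenthesis.
    fields = stat_text[stat_text.rindex(")") + 2:].split()
    # fields[0] is the state (field 3), so utime (field 14) is fields[11]
    # and stime (field 15) is fields[12].
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s=100):
    """Percent of one CPU used between two samples of utime + stime."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def watch(pid, interval_s=5.0, samples=12):
    """Print one timestamped CPU reading per interval for the given pid."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_stat_ticks(f.read())
    for _ in range(samples):
        time.sleep(interval_s)
        with open(path) as f:
            after = parse_stat_ticks(f.read())
        pct = cpu_percent(before, after, interval_s, ticks_per_s)
        print(time.strftime("%H:%M:%S"), f"{pct:.1f}%")
        before = after
```

Calling watch() with the dnsmasq pid and redirecting the output to a file
would give a timeline of the peaks that can be lined up against the logs.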

I may be wrong but it doesn't seem that dnsmasq is hitting a bug that
suddenly causes it to loop and hog the cpu until it's killed. It seems to
gradually show more and more of the problem before it eventually hogs 100%
cpu and has to be killed.

If the problem was caused by dnsmasq being overloaded with requests, is it
likely or possible that 50 clients could put very little load on it but 100
clients could swamp it? Also, would the problem not show itself as soon as
dnsmasq was restarted rather than showing the gradual increase in peak
usage until it hits 100%?
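
A rough way to check the overload theory would be to turn query logging back
on (dnsmasq's log-queries option) and count queries per minute, per client
and per name. The sketch below assumes syslog-style lines of the form
"Apr 24 19:41:13 dnsmasq[1234]: query[A] example.com from 192.168.1.10".
The exact prefix depends on the syslog daemon in use, so the regular
expression is an assumption that may need adjusting, and the function name
is hypothetical.

```python
import re
from collections import Counter

# Timestamp truncated to the minute, then the dnsmasq query record.
# Assumed log layout; adjust the prefix to match the local syslog format.
QUERY_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d):\d\d\b.*?"
    r"dnsmasq\[\d+\]: query\[([^\]]+)\] (\S+) from (\S+)"
)


def summarize_queries(lines):
    """Count logged queries per minute, per client address and per name."""
    per_minute = Counter()
    per_client = Counter()
    per_name = Counter()
    for line in lines:
        m = QUERY_RE.match(line)
        if m is None:
            continue  # not a query line (e.g. forwarded/reply/cached)
        minute, _qtype, name, client = m.groups()
        per_minute[minute] += 1
        per_client[client] += 1
        per_name[name] += 1
    return per_minute, per_client, per_name
```

A steadily climbing per-minute count would point towards load, while one
name or one client dominating the totals would be more consistent with a
loop or a single misbehaving host.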

I hope this helps. Any thoughts on this pattern?

Cheers

David


On 24 April 2014 12:41, Simon Kelley <simon at thekelleys.org.uk> wrote:

> On 22/04/14 20:04, David Joslin wrote:
> > Hi
> >
> > I have an Asus rt-n16 router running the Shibby version of the Tomato
> > firmware which includes dnsmasq version 2.69test3. It's in use in a
> > building that frequently has 50+ users on a wireless network and dnsmasq
> > has performed extremely well with very little load on the router.
> >
> > However, we've recently run a couple of conferences in the building and
> > the number of people using the wireless network has been just over 100.
> > Several times there have been problems resolving addresses and when I've
> > looked at the router dnsmasq has been using 100% cpu. Restarting dnsmasq
> > temporarily fixes the problem but it occurs again maybe 20 minutes later.
> >
> > I've turned off logging, increased the cache-size and the maximum number
> > of dhcp leases (anything I could see that might be a problem with more
> > users) but this hasn't fixed the problem.
> >
> > I wondered if anyone has come across anything similar or has any
> > suggestions?
> >
>
> The first thing is to try and decide which of two possible scenarios is
> happening. The first is that you've triggered a bug in the code and
> dnsmasq is looping somewhere without ever getting back to the select()
> loop and doing actual work. The second is that it's getting so much work
> that it's running out of CPU to do it.
>
> In the first case, dnsmasq will stop working entirely. Is that
> consistent with "problems resolving addresses" or does it still
> partially work? Turning off logging is probably counter-productive here:
> the logs may have valuable clues.
>
>
> In the second case, DNSSEC is something to worry about. Do you have that
> turned on?
>
> Also, it's possible to arrive at configurations with DNS forwarding
> loops, where a DNS query gets sent upstream but somehow ends up back
> at the dnsmasq instance that originally forwarded it and then goes round
> in circles. It's quite difficult to do this without at least two dnsmasq
> instances, but it is possible.
>
> Finally, logging to a syslog daemon which does its own DNS lookups (to
> label logs from remote hosts) can create a collapse: dnsmasq will log
> several lines for each DNS query, and if each of those lines generates a
> new DNS query which has to be handled by dnsmasq, it all goes wrong very
> quickly.
>
>
> Cheers,
>
>
> Simon.