[Dnsmasq-discuss] dnsmasq using 100% cpu on router

Simon Kelley simon at thekelleys.org.uk
Thu Apr 24 19:49:52 UTC 2014


On 24/04/14 20:41, David Joslin wrote:
> Thanks for the reply, Simon.
> 
> DNSSEC isn't enabled.
> 
> I wonder if the pattern of the problem gives any clues...
> 
> As I said, on a normal day with around 40-50 clients on the network there
> is no problem at all with dnsmasq managing to use barely 0 - 2% of the CPU.
> When the problem occurred there were a little over 100 clients. Running top
> showed dnsmasq using 100% cpu so I restarted dnsmasq and kept an eye on
> top. For maybe 5 or 10 minutes there was no problem, with dnsmasq using
> very little cpu. Then dnsmasq would start to peak at maybe 20-30% for a
> couple of seconds before dropping back. Then it would start peaking at
> higher and higher levels before dropping back. Eventually, after running
> for maybe half an hour it would start peaking at over 90% and staying there
> for longer before dropping back. At this point dns requests would become
> very slow (and maybe time out). And then dnsmasq would hit 100% cpu and
> would stay there. Dns requests would time out and only restarting dnsmasq
> would fix the problem. The pattern would then start over again.
> 
> I may be wrong but it doesn't seem that dnsmasq is hitting a bug that
> suddenly causes it to loop and hog the cpu until it's killed. It seems to
> gradually show more and more of the problem before it eventually hogs 100%
> cpu and has to be killed.
> 
> If the problem was caused by dnsmasq being overloaded with requests, is it
> likely or possible that 50 clients could put very little load on it but 100
> clients could swamp it? Also, would the problem not show itself as soon as
> dnsmasq was restarted rather than showing the gradual increase in peak
> usage until it hits 100%?


Logs would help. The pattern doesn't look familiar, but if I had to
guess, I'd say that the problem is DHCP, not DNS. Every change to the
DHCP lease database causes the file storing it to be re-written, and I
suspect that's what's eating CPU, in disk wait.
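
One rough way to test that guess is to watch how often the lease file
actually changes while the load builds up. What follows is only a sketch
under assumptions: sample_file and count_changes are made-up helper names,
and /var/lib/misc/dnsmasq.leases is just the usual default location, so
your firmware may well keep the file somewhere else (see the
dhcp-leasefile setting).

```shell
# Sketch only; the function names and the example path are illustrative.

# Print one checksum of FILE per sample, SAMPLES times, INTERVAL seconds apart.
sample_file() {
    file=$1; interval=$2; samples=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        cksum < "$file" || return 1
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Count how many samples on stdin differ from the sample before them.
count_changes() {
    awk 'NR > 1 && $0 != prev { n++ } { prev = $0 } END { print n + 0 }'
}

# Example (assumed path): sample once a second for a minute.
#   sample_file /var/lib/misc/dnsmasq.leases 1 60 | count_changes
```

Several rewrites inside one interval are only counted once, so treat the
number as a lower bound. If it climbs along with the CPU peaks, that
would fit the DHCP theory.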

Version of dnsmasq in use would be useful, and a copy of your config (to
me privately, if you prefer.)

When dnsmasq is running at 100%, try running

strace -p <pid of dnsmasq process>

That will run until interrupted, printing the syscalls being made. You can
ctrl-c it after a short while, which will stop strace but not dnsmasq.
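
If the output scrolls past too fast to read, it can be saved with
strace's -o option and tallied afterwards. The following is a minimal
sketch, not a definitive recipe: summarise_trace is a made-up name,
/tmp/dnsmasq.trace is only an example path, and it assumes awk and sort
are available (copy the capture to another machine if the router lacks
them).

```shell
# Sketch only: summarise a saved strace capture by syscall name.
# Assumes the capture was made along the lines of
#   strace -o /tmp/dnsmasq.trace -p <pid of dnsmasq process>
# where /tmp/dnsmasq.trace is just an example path.
summarise_trace() {
    # Treat the text before the first "(" as the syscall name and skip
    # lines that do not look like a syscall (signal notes and the like).
    awk -F'(' '/^[a-z_0-9]+\(/ { count[$1]++ }
         END { for (name in count) print count[name], name }' "$@" |
        sort -k1,1nr -k2,2
}

# Example (assumed path):
#   summarise_trace /tmp/dnsmasq.trace
```

A capture dominated by writes would be consistent with the lease file
guess. A capture that stays almost empty while the CPU sits at 100%
would instead suggest dnsmasq is spinning without making syscalls.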


Cheers,


Simon

> 
> I hope this helps. Any thoughts on this pattern?
> 
> Cheers
> 
> David
> 
> 
> On 24 April 2014 12:41, Simon Kelley <simon at thekelleys.org.uk> wrote:
> 
>> On 22/04/14 20:04, David Joslin wrote:
>>> Hi
>>>
>>> I have an Asus rt-n16 router running the Shibby version of the Tomato
>>> firmware which includes dnsmasq version 2.69test3. It's in use in a
>>> building that frequently has 50+ users on a wireless network and dnsmasq
>>> has performed extremely well with very little load on the router.
>>>
>>> However, we've recently run a couple of conferences in the building and
>>> the number of people using the wireless network has been just over 100.
>>> Several times there have been problems resolving addresses and when I've
>>> looked at the router dnsmasq has been using 100% cpu. Restarting dnsmasq
>>> temporarily fixes the problem but it occurs again maybe 20 minutes later.
>>>
>>> I've turned off logging, increased the cache-size and the maximum number
>>> of dhcp leases (anything I could see that might be a problem with more
>>> users) but this hasn't fixed the problem.
>>>
>>> I wondered if anyone has come across anything similar or has any
>>> suggestions?
>>>
>>
>> The first thing is to try and decide which of two possible scenarios is
>> happening. The first is that you've triggered a bug in the code and
>> dnsmasq is looping somewhere without ever getting back to the select()
>> loop and doing actual work. The second is that it's getting so much work
>> that it's running out of CPU to do it.
>>
>> In the first case, dnsmasq will stop working entirely. Is that
>> consistent with "problems resolving addresses" or does it still
>> partially work? Turning off logging is probably counter-productive here;
>> the logs may have valuable clues.
>>
>>
>> In the second case, DNSSEC is something to worry about. Do you have that
>> turned on?
>>
>> Also, it's possible to arrive at configurations with DNS forwarding
>> loops where a DNS query gets sent upstream, but somehow ends up back
>> at the dnsmasq instance that originally forwarded it and then goes round
>> in circles. It's quite difficult to do this without at least two dnsmasq
>> instances, but it is possible.
>>
>> Finally, logging to a syslog daemon which does its own DNS lookups (to
>> label logs from remote hosts) can create a collapse: dnsmasq will log
>> several lines for each DNS query, and if each of those lines generates a
>> new DNS query which has to be handled by dnsmasq, it all goes wrong very
>> quickly.
>>
>>
>> Cheers,
>>
>>
>> Simon.
>>
>>
>>
>> _______________________________________________
>> Dnsmasq-discuss mailing list
>> Dnsmasq-discuss at lists.thekelleys.org.uk
>> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
>>
> 
> 
> 



