[Dnsmasq-discuss] dnsmasq using 100% cpu on router

Kevin Darbyshire-Bryant kevin at darbyshire-bryant.me.uk
Thu Apr 24 20:13:23 UTC 2014


On 24/04/2014 20:49, Simon Kelley wrote:
> On 24/04/14 20:41, David Joslin wrote:
>> Thanks for the reply, Simon.
>>
>> DNSSEC isn't enabled.
>>
>> I wonder if the pattern of the problem gives any clues...
>>
>> As I said, on a normal day with around 40-50 clients on the network there
>> is no problem at all with dnsmasq managing to use barely 0 - 2% of the CPU.
>> When the problem occurred there were a little over 100 clients. Running top
>> showed dnsmasq using 100% cpu so I restarted dnsmasq and kept an eye on
>> top. For maybe 5 or 10 minutes there was no problem, with dnsmasq using
>> very little cpu. Then dnsmasq would start to peak at maybe 20-30% for a
>> couple of seconds before dropping back. Then it would start peaking at
>> higher and higher levels before dropping back. Eventually, after running
>> for maybe half an hour it would start peaking at over 90% and staying there
>> for longer before dropping back. At this point dns requests would become
>> very slow (and maybe time out). And then dnsmasq would hit 100% cpu and
>> would stay there. Dns requests would time out and only restarting dnsmasq
>> would fix the problem. The pattern would then start over again.
>>
>> I may be wrong but it doesn't seem that dnsmasq is hitting a bug that
>> suddenly causes it to loop and hog the cpu until it's killed. It seems to
>> gradually show more and more of the problem before it eventually hogs 100%
>> cpu and has to be killed.
>>
>> If the problem was caused by dnsmasq being overloaded with requests, is it
>> likely or possible that 50 clients could put very little load on it but 100
>> clients could swamp it? Also, would the problem not show itself as soon as
>> dnsmasq was restarted rather than showing the gradual increase in peak
>> usage until it hits 100%?
>
> Logs would help. The pattern doesn't look familiar, but if I had to
> guess, I'd say that the problem is DHCP, not DNS. Every change to the
> DHCP lease database causes the file storing it to be re-written, and I
> suspect that's what's eating CPU, in disk wait.
>
> Version of dnsmasq in use would be useful, and a copy of your config (to
> me privately, if you prefer.)
>
> When dnsmasq is running at 100%, try running
>
> strace -p <pid of dnsmasq process>
>
> that will run forever, printing what syscalls are being made; you can
> ctrl-c it after a short while, which will stop strace, but not dnsmasq.
>
>
> Cheers,
>
>
> Simon
>
>

Chaps,

Please be aware that the dnsmasq included in tomato is not a clean
'pull' of Simon's release but includes some tweaks, mainly to the
lease writing code (where it outputs 'remaining leasetime' rather than
expiry time). There's also a 'helper' function that, upon receipt of
SIGUSR1 (or it may be SIGUSR2, I can't remember), dumps the leasefile in
a tomato-specific format so that it may be read & parsed into the 'dhcp
status' page.

Those changes were 'formalised' by me into IFDEF conditional compilation
flags when I first investigated updating dnsmasq from v2.61 to something
slightly newer which fixed the IPv6 RA flags.  The original changes by
Jon Zarate were identified and re-inserted after a few false starts.  I
am no 'C' coder!

My suggestion for a start is to upgrade to dnsmasq 2.70 rather than a
test release of 2.69.  Also try changing the location of the leasefile
to somewhere else, e.g. a USB stick, if your router supports it.
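
For the leasefile move, a line along these lines in dnsmasq's config
should do it. This is only a sketch: dhcp-leasefile is the stock dnsmasq
option, but the path is a made-up example, so substitute wherever your
router actually mounts the stick.

    # hypothetical mount point, adjust to suit your router
    dhcp-leasefile=/tmp/mnt/usb/dnsmasq.leases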

I've not encountered anything like this but then I don't have 100 clients.

Kevin

