[Dnsmasq-discuss] Mysterious dnsmasq 2.38 'hangs'

Simon Kelley simon at thekelleys.org.uk
Fri Feb 16 21:12:15 GMT 2007


Paul Chambers wrote:
> I have dnsmasq 2.38 installed on Fedora Core 5 from an RPM (official
> 'extras' repository). When initially started, dnsmasq works very well,
> big improvement over my previous 'bind+dhcpd' setup. But it only lasts a
> few hours. Eventually dnsmasq will cease answering queries or DHCP
> requests. Or perhaps it's answering them so slowly that the request
> times out.
> 
> I stopped running the daemon from an init script, and ran it by hand on
> a console with -d, to see what's happening. Same behavior, no log
> messages to indicate anything wrong, but interestingly, I'll usually get
> a log line emitted for each signal I send to it. For example, I tried
> sending 'sigusr1' to get a dump, and would sometimes get the first
> header line, followed by the second header line maybe ten seconds later.
> Other times I'd get several lines, then it'd stop. Hitting Ctrl-C always
> gets me to a shell prompt, but interestingly, if I immediately restart
> it with -d, I'll usually get one line logged to the console and it'll
> 'hang' again.
> 
> I've tried attaching to it with gdb a few times, and it always seems to
> be inside syslog() when I do so, even though there's little coming out
> on the console. That probably implies that the delay is actually
> occurring inside syslog? Might it have something to do with looking up
> the host's name to generate the syslog entry?

Are you logging remotely? There's a classic deadlock if you have
/etc/syslog.conf set to log to a remote machine given by name: dnsmasq
needs to log something, so it calls syslog(), which calls gethostbyname(),
which results in a query to dnsmasq, which won't answer the query
until it's finished logging the last one....
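
A quick way to check for that case is to scan the syslog configuration
for remote targets given by name. The following is only a rough sketch:
it assumes the classic "selector  @host" action syntax, and the function
name and sample text are made up for illustration.

```python
import ipaddress


def remote_targets_by_name(conf_text):
    """Return hostnames used as '@host' actions in syslog.conf-style text.

    Targets written as numeric IP addresses are ignored, because they
    need no name lookup and so cannot loop back into the DNS server.
    """
    names = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        fields = line.split()
        if len(fields) < 2:
            continue                      # no action field present
        action = fields[-1]
        if not action.startswith("@"):
            continue                      # local file, pipe, user, ...
        host = action.lstrip("@")
        try:
            ipaddress.ip_address(host)    # numeric target: harmless
        except ValueError:
            names.append(host)            # needs resolving: suspicious
    return names


if __name__ == "__main__":
    # Path taken from the discussion; adjust if your system differs.
    with open("/etc/syslog.conf") as conf:
        for name in remote_targets_by_name(conf.read()):
            print("remote log target given by name:", name)
```

If this prints anything, writing the log host as a numeric address
instead would be one way to take the name lookup out of the loop.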


> 
> I've also rebuilt it from source, and added extra logging here and
> there, without learning anything significant. I still get the 'hang'
> occurring.
> 
> I've had to switch back to bind for now.
> 
> Any suggestions?

This is not something I've seen before: DNS failing after a time is
usually down to one dodgy upstream nameserver, but if DHCP is quiet too,
that does sort-of imply that it's blocked in syslog() calls.

The suggestion to try an earlier version is good, but please drop back 
to 2.35. 2.36 and 2.37 have known problems that will just complicate 
matters.

Another way to get a handle on what's going on might be to run "dnsmasq 
-d" under strace. You'll generate a fair bit of output, but spooled to a 
file until the gremlin strikes, it could be very useful.
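
To dig through the resulting file, something like the sketch below
might help. It assumes the trace was taken with strace's -T option
(which appends the time spent in each call as "<seconds>") and written
to a file with -o; the file name dnsmasq.trace, the one-second
threshold and the function name are all just illustrative choices.

```python
import re

# strace -T appends the time spent in the call, e.g. "... = 20 <12.504310>"
_DURATION = re.compile(r"<(\d+\.\d+)>\s*$")
# First "name(" on the line; skips any pid or timestamp prefix.
_SYSCALL = re.compile(r"([A-Za-z_]\w*)\(")


def slow_syscalls(lines, threshold=1.0):
    """Return (syscall, seconds) for calls that took >= threshold seconds.

    Lines without a trailing duration (signals, unfinished calls) or
    without a recognisable "name(" are skipped.
    """
    slow = []
    for line in lines:
        duration = _DURATION.search(line)
        syscall = _SYSCALL.search(line)
        if not duration or not syscall:
            continue
        seconds = float(duration.group(1))
        if seconds >= threshold:
            slow.append((syscall.group(1), seconds))
    return slow


if __name__ == "__main__":
    # Hypothetical file name, e.g. from: strace -T -o dnsmasq.trace dnsmasq -d
    with open("dnsmasq.trace") as trace:
        for name, seconds in slow_syscalls(trace):
            print(f"{name} took {seconds:.3f}s")
```

If the process really is stuck in syslog(), I'd expect the slow calls
to be writes or sends on the log socket, but that is a guess until
there is a trace to look at.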


Cheers,

Simon.



