[Dnsmasq-discuss] CPU spinning bug, possibly related to SSHFP queries

Tore Anderson tore at fud.no
Thu Nov 28 07:38:13 GMT 2019


Hello,

I've noticed that Dnsmasq on my system sometimes enters a defective state where it starts spinning on the CPU. When it has entered this state, I need to send it SIGKILL to get rid of it - SIGTERM is ignored.

The version is current Git master (2.80-93-g6ebdc95).

I've enabled query logging and grabbed the final log lines of a few incidents (slightly anonymised):

Example 1:

forwarded git.i.example.org to 192.168.33.1
reply git.i.example.org is <CNAME>
reply git01-osl3.i.example.org is 10.22.3.196
reply git.i.example.org is <CNAME>
reply git01-osl3.i.example.org is 2001:db8:400:c:18:59ff:fe7a:73c4
query[type=44] git.i.example.org from 127.0.0.1
(CPU spin begins)

Example 2:

query[A] s2-a8-osl3.n.example.org from 127.0.0.1
forwarded s2-a8-osl3.n.example.org to 192.168.33.1
query[AAAA] s2-a8-osl3.n.example.org from 127.0.0.1
forwarded s2-a8-osl3.n.example.org to 192.168.33.1
reply s2-a8-osl3.n.example.org is <CNAME>
reply lo.s2-a8-osl3.n.example.org is 2001:db8:1::4:1
reply s2-a8-osl3.n.example.org is <CNAME>
reply lo.s2-a8-osl3.n.example.org is 192.168.63.11
query[type=44] s2-a8-osl3.n.example.org from 127.0.0.1
(CPU spin begins)

Example 3:

query[A] s1-a8-osl3.n.example.org from 127.0.0.1
forwarded s1-a8-osl3.n.example.org to 192.168.33.1
query[AAAA] s1-a8-osl3.n.example.org from 127.0.0.1
forwarded s1-a8-osl3.n.example.org to 192.168.33.1
reply s1-a8-osl3.n.example.org is <CNAME>
reply lo.s1-a8-osl3.n.example.org is 192.168.63.10
reply s1-a8-osl3.n.example.org is <CNAME>
reply lo.s1-a8-osl3.n.example.org is 2001:db8:1::4:0
query[type=44] s1-a8-osl3.n.example.org from 127.0.0.1
(CPU spin begins)

All of them end with an incoming query for SSHFP records (type 44), which I find highly suspect. The SSHFP requests come from the SSH client (due to VerifyHostKeyDNS being set in my ~/.ssh/config).

None of the hostnames in question have SSHFP records published, but that does not seem to matter, as the query does not appear to be forwarded upstream in the first place. When the bug does not occur, Dnsmasq logs that it forwards the query upstream, like so:

query[type=44] l1-a9-osl3.n.example.org from 127.0.0.1
forwarded l1-a9-osl3.n.example.org to 192.168.33.1
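
To pick out the suspicious queries from a longer log, something like the following might help. It is only a rough sketch: it assumes the log lines look like the excerpts above (possibly with a syslog prefix in front), and the function name unanswered_queries is made up for illustration. It matches by hostname only, so an A query followed by the "forwarded" line of a later AAAA query for the same name still counts as handled.

```python
import re

# Matches lines such as: query[type=44] git.i.example.org from 127.0.0.1
QUERY_RE = re.compile(r"query\[(?P<qtype>[^\]]+)\] (?P<name>\S+) from \S+")


def unanswered_queries(lines):
    """Return (qtype, name) for each query that has no later
    'forwarded <name>' or 'reply <name>' line in the log."""
    lines = list(lines)
    result = []
    for index, line in enumerate(lines):
        match = QUERY_RE.search(line)
        if match is None:
            continue
        name = match.group("name")
        # Only the two keywords visible in the excerpts are checked here;
        # extend the alternation if your log contains other answer lines.
        handled_re = re.compile(r"(?:forwarded|reply) " + re.escape(name) + r"\s")
        if not any(handled_re.search(later) for later in lines[index + 1:]):
            result.append((match.group("qtype"), name))
    return result
```

Fed the lines of one of the incidents above, this should report only the final type=44 query.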

Dnsmasq is invoked from NetworkManager, using the following command line:

/usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces --pid-file=/run/NetworkManager/dnsmasq.pid --listen-address=127.0.0.1 --cache-size=400 --clear-on-reload --conf-file=/dev/null --proxy-dnssec --enable-dbus=org.freedesktop.NetworkManager.dnsmasq --conf-dir=/etc/NetworkManager/dnsmasq.d

Additional configuration in /etc/NetworkManager/dnsmasq.d/dnssec.conf:

dnssec
conf-file = /usr/share/dnsmasq/trust-anchors.conf
log-queries

Finally, my environment contains RES_OPTIONS=edns0 in case that is relevant (this is required for SSH's VerifyHostKeyDNS feature to work correctly).

I cannot reliably reproduce the issue, but it happens regularly (several times a day) during normal usage; I use the SSH client quite frequently.
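
For anyone who wants to try triggering it without SSH, the query pattern from the logs (A and AAAA, then SSHFP, all with EDNS0) could be replayed with a small script along these lines. This is an untested sketch under several assumptions: the helper names build_query and probe are made up, the EDNS0 payload size of 1232 is an arbitrary choice rather than what the resolver library actually sends, and a timeout only hints that Dnsmasq has stopped answering, it does not prove the CPU spin.

```python
import socket
import struct

QTYPE_A, QTYPE_AAAA, QTYPE_SSHFP = 1, 28, 44


def build_query(name, qtype, query_id=0x1234, edns_payload=1232):
    """Build a DNS query packet with the RD flag set and an EDNS0 OPT record."""
    # Header: ID, flags (RD), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record)
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    # OPT pseudo-record: root name, TYPE 41, CLASS = UDP payload size,
    # TTL 0 (no extended flags, DO bit clear), RDLENGTH 0
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_payload, 0, 0)
    return header + question + opt


def probe(name, server="127.0.0.1", port=53, timeout=3.0):
    """Send A and AAAA queries, wait for both replies, then send an SSHFP query.

    Returns True if every query got some reply, False on a timeout.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, QTYPE_A, 0x1001), (server, port))
            sock.sendto(build_query(name, QTYPE_AAAA, 0x1002), (server, port))
            sock.recvfrom(4096)
            sock.recvfrom(4096)
            sock.sendto(build_query(name, QTYPE_SSHFP, 0x1003), (server, port))
            sock.recvfrom(4096)
        except socket.timeout:
            return False
    return True
```

Calling probe("git.i.example.org") in a loop against the local Dnsmasq instance and stopping at the first False would be one way to use it.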

I would be happy to provide additional debugging information, if given instructions on how to obtain it.

Tore


