[Dnsmasq-discuss] CPU spinning bug, possibly related to SSHFP queries

Geert Stappers geert.stappers at hendrikx-itc.nl
Fri Nov 29 08:31:32 GMT 2019


On 29-11-2019 07:51, Tore Anderson wrote:

> * Vladislav Grishenko
>
>> Can you try to capture dns exchange to dnsmasq (on lo interface) and from it (on your nic interface) both at the same time?
>> $ tcpdump -i lo port 53 -w /path/to/dns-lo.pcap
>> $ tcpdump -i <ifname> port 53 -w /path/to/dns-ext.pcap
>> Highly possible that trigger query (or reply) can't be logged in usual way, but will be captured by tcpdump.
>> Next, I'd like to take a look at them, will there be something after/near the last logged query before spinning starts.
>>
>> p.s. Caution, pcap files will contain all your dns traffic, sending it to mail list might be not a really good idea.
> Hi,
>
> PCAP attached. I used «tcpdump -i any», so it's a single file with the internal and external traffic interleaved.
>
> It is apparent that the initial SSHFP query is not forwarded upstream, and that the subsequent queries from the stub resolver (a retransmission of the SSHFP query plus some other unrelated queries) are neither logged nor forwarded.
>
> Here are the corresponding log lines from Dnsmasq:
>
> nov. 29 07:15:53.964856 sloth.fud.no dnsmasq[48069]: query[A] l1-g9-osl2.n.bitbit.net from 127.0.0.1
> nov. 29 07:15:53.965060 sloth.fud.no dnsmasq[48069]: forwarded l1-g9-osl2.n.bitbit.net to 87.238.33.1
> nov. 29 07:15:53.965155 sloth.fud.no dnsmasq[48069]: query[AAAA] l1-g9-osl2.n.bitbit.net from 127.0.0.1
> nov. 29 07:15:53.965273 sloth.fud.no dnsmasq[48069]: forwarded l1-g9-osl2.n.bitbit.net to 87.238.33.1
> nov. 29 07:15:54.039407 sloth.fud.no dnsmasq[48069]: reply l1-g9-osl2.n.bitbit.net is <CNAME>
> nov. 29 07:15:54.039461 sloth.fud.no dnsmasq[48069]: reply eth0.l1-g9-osl2.n.bitbit.net is 10.20.120.102
> nov. 29 07:15:54.039666 sloth.fud.no dnsmasq[48069]: reply l1-g9-osl2.n.bitbit.net is <CNAME>
> nov. 29 07:15:54.039700 sloth.fud.no dnsmasq[48069]: reply eth0.l1-g9-osl2.n.bitbit.net is NODATA-IPv6
> nov. 29 07:15:54.964042 sloth.fud.no dnsmasq[48069]: query[type=44] l1-g9-osl2.n.bitbit.net from 127.0.0.1
> (CPU starts spinning at this point, no further log lines appear)

The pcap file says / shows

07:15:53.964639 IP localhost.58896 > localhost.domain: 26883+ [1au] A?
l1-g9-osl2.n.bitbit.net. (52)
07:15:53.964696 IP localhost.58896 > localhost.domain: 9742+ [1au] AAAA?
l1-g9-osl2.n.bitbit.net. (52)
07:15:53.964955 IP 100.66.5.6.46462 > dns.i.bitbit.net.domain: 45004+
[1au] A? l1-g9-osl2.n.bitbit.net. (52)
07:15:53.965208 IP 100.66.5.6.46462 > dns.i.bitbit.net.domain: 42294+
[1au] AAAA? l1-g9-osl2.n.bitbit.net. (52)
07:15:54.039175 IP dns.i.bitbit.net.domain > 100.66.5.6.46462: 45004*
2/3/7 CNAME eth0.l1-g9-osl2.n.bitbit.net., A 10.20.120.102 (284)
07:15:54.039273 IP dns.i.bitbit.net.domain > 100.66.5.6.46462: 42294*
1/1/1 CNAME eth0.l1-g9-osl2.n.bitbit.net. (145)
07:15:54.039525 IP localhost.domain > localhost.58896: 26883* 2/3/7
CNAME eth0.l1-g9-osl2.n.bitbit.net., A 10.20.120.102 (284)
07:15:54.039735 IP localhost.domain > localhost.58896: 9742* 1/1/1 CNAME
eth0.l1-g9-osl2.n.bitbit.net. (145)
07:15:54.963758 IP localhost.49114 > localhost.domain: 35496+ [1au]
SSHFP? l1-g9-osl2.n.bitbit.net. (52)
07:15:58.440050 IP localhost.60721 > localhost.domain: 17459+ [1au] A?
mattermost.redpill-linpro.com. (58)
07:15:58.440064 IP localhost.60721 > localhost.domain: 44236+ [1au]
AAAA? mattermost.redpill-linpro.com. (58)
07:15:59.968059 IP localhost.49114 > localhost.domain: 35496+ [1au]
SSHFP? l1-g9-osl2.n.bitbit.net. (52)
07:16:02.394821 IP localhost.56180 > localhost.domain: 61011+ AAAA?
fedoraproject.org. (35)
07:16:03.445058 IP localhost.60721 > localhost.domain: 17459+ [1au] A?
mattermost.redpill-linpro.com. (58)
07:16:03.445071 IP localhost.60721 > localhost.domain: 44236+ [1au]
AAAA? mattermost.redpill-linpro.com. (58)
07:16:07.399014 IP localhost.56180 > localhost.domain: 61011+ AAAA?
fedoraproject.org. (35)

The 07:15:54.039273 packet is last where source host and destination
host differ.  After that packet are the remaining packets only between
localhost and localhost.

Looking closer reveals that the first two packets are also between
localhost.

Strange, but probably not the problem.



Screenshot

| $ host l1-g9-osl2.n.bitbit.net
| l1-g9-osl2.n.bitbit.net is an alias for eth0.l1-g9-osl2.n.bitbit.net.
| eth0.l1-g9-osl2.n.bitbit.net has address 10.20.120.102
| $ host -t unknowntype l1-g9-osl2.n.bitbit.net
| host: invalid type: unknowntype
|
| $ host -t sshfp l1-g9-osl2.n.bitbit.net
| l1-g9-osl2.n.bitbit.net is an alias for eth0.l1-g9-osl2.n.bitbit.net.
| $ dig -t sshfp  l1-g9-osl2.n.bitbit.net
|
| ; <<>> DiG 9.11.5-P4-5.1-Debian <<>> -t sshfp l1-g9-osl2.n.bitbit.net
| ;; global options: +cmd
| ;; Got answer:
| ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13365
| ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
|
| ;; OPT PSEUDOSECTION:
| ; EDNS: version: 0, flags:; udp: 4096
| ; COOKIE: 905e71c3631f8577e2d927615de0d63aaa0a6211eb0ec2e1 (good)
| ;; QUESTION SECTION:
| ;l1-g9-osl2.n.bitbit.net.    IN    SSHFP
|
| ;; ANSWER SECTION:
| l1-g9-osl2.n.bitbit.net. 86103    IN    CNAME   
eth0.l1-g9-osl2.n.bitbit.net.
|
| ;; AUTHORITY SECTION:
| n.bitbit.net.        3314    IN    SOA    ns-foo.i.bitbit.net.
hostmaster.redpill-linpro.com. 2018033152 10800 3600 2419200 3600
|
| ;; Query time: 0 msec
| ;; SERVER: 192.168.42.24#53(192.168.42.24)
| ;; WHEN: vr nov 29 09:26:34 CET 2019
| ;; MSG SIZE  rcvd: 173
|
| $


I think the problem is not getting a "sshfp response" upon a "sshfp query".



Regards

Geert Stappers







More information about the Dnsmasq-discuss mailing list