[Dnsmasq-discuss] Dnsmasq responses broken for Linux and Mac clients, but working on Windows and Android clients

Timo Sigurdsson public_timo.s at silentcreek.de
Wed Oct 19 21:45:10 BST 2016


Hi,

I have a weird issue with Dnsmasq which I think is related to DNSSEC, but I don't exactly understand why or what is happening and how to fix it.

I'm currently running Dnsmasq 2.76 on my router, which is powered by a fairly recent build of LEDE (r1792, Kernel 4.4.23). DNSSEC validation and dnssec-check-unsigned are both turned on.

Sometimes, the Linux and Mac clients in my network cannot resolve random domain names. But at the same time, resolution of the exact same names works on Windows clients as well as my Android devices - and even on the router itself. When I restart Dnsmasq everything works again.

For example, just now, my Debian machine could not resolve the domain security.debian.org. `nslookup security.debian.org` would show:
  ;; Truncated, retrying in TCP mode.
  Server:		192.168.123.1
  Address:	192.168.123.1#53

  ** server can't find security.debian.org: SERVFAIL

Similarly, `dig +dnssec security.debian.org` would show:
  ;; Truncated, retrying in TCP mode.

  ; <<>> DiG 9.9.5-9+deb8u7-Debian <<>> +dnssec security.debian.org
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 37369
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

  ;; OPT PSEUDOSECTION:
  ; EDNS: version: 0, flags: do; udp: 512
  ;; QUESTION SECTION:
  ;security.debian.org.		IN	A

  ;; Query time: 1167 msec
  ;; SERVER: 192.168.123.1#53(192.168.123.1)
  ;; WHEN: Wed Oct 19 21:23:49 CEST 2016
  ;; MSG SIZE  rcvd: 48

However, on Windows everything would work fine. And even *on the router itself*, a lookup of the domain gives a valid result:
  Name:      security.debian.org
  Address 1: 2001:a78:5::216:35ff:fe7f:be4f villa.debian.org
  Address 2: 2001:a78:5:1:216:35ff:fe7f:6ceb lobos.debian.org
  Address 3: 195.20.242.89 wieck.debian.org
  Address 4: 212.211.132.250 lobos.debian.org
  Address 5: 212.211.132.32 villa.debian.org

I also checked whether the time on the router is correct - and it is (constantly synced via NTP). What's more, as soon as I restart Dnsmasq (just the service, not the router itself), everything works again. But then some other domain name might fail. Two days ago I noticed that my Linux clients couldn't resolve www.kernel.org - after that I turned off dnssec-check-unsigned (but left DNSSEC enabled), which seems to have made the problem occur less often, but it still occurs from time to time and always with different domain names. It's also not specific to that Debian client, but occurs as well on an Ubuntu 16.04 laptop and on a MacBook running OS X 10.11. Is it possible that DNSSEC validation somehow makes the response from Dnsmasq too long for Linux and Mac clients?
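
One way to test the "response too long" theory the next time a name fails might be to query Dnsmasq over each transport separately, since the dig output above shows a truncated UDP reply followed by a SERVFAIL over TCP. The following is only a sketch: the function name `probe` and the `DIG` override are made up for illustration, and the buffer sizes are arbitrary choices, not values known to matter here.

```shell
# Hypothetical helper, not part of any package.
# DIG can be overridden for a dry run (e.g. DIG=echo prints the arguments).
probe() {
    name="$1"      # a name that currently fails, e.g. security.debian.org
    server="$2"    # the Dnsmasq instance, e.g. 192.168.123.1

    # 1. UDP with a small EDNS buffer; +ignore shows the truncated reply
    #    instead of silently retrying over TCP.
    "${DIG:-dig}" +dnssec +notcp +ignore +bufsize=512 "$name" "@$server"

    # 2. UDP with a larger EDNS buffer, so a bigger signed answer can fit.
    "${DIG:-dig}" +dnssec +notcp +bufsize=4096 "$name" "@$server"

    # 3. TCP only, the path taken after "Truncated, retrying in TCP mode".
    "${DIG:-dig}" +dnssec +tcp "$name" "@$server"

    # 4. Same query with the CD (checking disabled) bit set.
    "${DIG:-dig}" +dnssec +cd "$name" "@$server"
}
```

Running `probe security.debian.org 192.168.123.1` on an affected client and comparing the four status lines might show whether the failure depends on the transport, on the advertised buffer size, or on validation.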

Some more background: On the very same router, I had been running OpenWrt 15.05 for more than a year. I had DNSSEC and dnssec-check-unsigned enabled for all that time without any issues. About 2-3 weeks ago, I upgraded the router to LEDE, which obviously brings a newer Dnsmasq version and base system. Last week, I started noticing DNS issues - so that change might be related.

The issue also seems to be a bit hard to debug, because it occurs so randomly. I tried to enable logging of DNS queries just now to see if I could find the problem on the router side. But changing that setting caused Dnsmasq to restart, and then the names that had failed before resolved again. So I have to wait until it occurs the next time with some other domain name.

Does anybody have an idea what could be going on here?

Thank you very much!

Regards,

Timo


