[Dnsmasq-discuss] !strict-order and SERVFAIL

Thu Jan 6 21:52:48 GMT 2011

Alexander Clouter wrote:
> Hi,
> 
> * Simon Kelley <simon at thekelleys.org.uk> [2011-01-05 21:11:49+0000]:
>> What is supposed to happen in response to a SERVFAIL is that the query
>> gets sent, again, to all available servers. If all those servers in turn
>> return SERVFAIL then the error gets propagated back to the original
>> querier.
>>
>> I'm not quite sure what's going on in your packet capture: there seems
>> to be four possible upstream servers, so the query gets sent to all four
>> after the first one returns SERVFAIL. Not all the servers are replying,
>> which explains why nothing goes back to the original requestor.
>>
>> I have a suspicion that there may be a problem with the second round of
>> requests generating a third round, and so on, but I can't work out
>> exactly which server is which. Could you provide a list of IP addresses
>> for the various actors and some idea which upstream servers are configured?
>>
> The capture is probably looking complicated as I captured 'any' so it's 
> *all* there, sorry :-/
> 
> ---- resolv.conf ----
> nameserver 2a01:348:0:1::e:1
> nameserver 2a01:348:0:1::f:1
> nameserver 77.75.104.58
> nameserver 77.75.104.59
> ----
> 
> [LAN] <- eth0 - [router] - ppp0 -> [intertubes]
> 
> The router's interfaces look like so:
>  * lo: 2a01:348:45:0:311e:7a15:88ab:1f59/128
>  * ppp0: 77.75.106.34 and fe80::61c5:6995:3e02:cf0a/10
>  * eth0: 192.168.1.1 and 2a01:348:45:1000::/64 (uses ::0)
> 
> dnsmasq receives queries on eth0.
> 
> When the router speaks to the DNS servers, things look like:
> ----
> $ ip route get 2a01:348:0:1::e:1
> 2a01:348:0:1::e:1 via fe80::205:ff:fe60:2c1b dev ppp0  src 2a01:348:45:0:311e:7a15:88ab:1f59  metric 1024  mtu 1492 advmss 1432 hoplimit 4294967295
> 
> $ ip route get 77.75.104.58
> 77.75.104.58 dev ppp0  src 77.75.106.34
>     cache  mtu 1492 advmss 1452 hoplimit 64
> ----
> 
> The LAN uses 2a01:348:45:1000::/64 and 192.168.1.0/24.
> 

Ok, that makes sense. I'd not appreciated that dnsmasq was getting
queries to 2a01:348:45:1000:: as well as 192.168.1.1.

All is behaving as designed. After the first SERVFAIL response, retries
of the query are being exploded to all four upstream servers. For any
given query, only two are responding, so dnsmasq is not returning those
failures, in the hope that it might get a good answer from another
server. I'm happy that's OK.
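
The rule described above can be sketched as a toy model. This is not
dnsmasq's code; the function name reply_to_client and the dictionary
layout are assumptions made up for illustration. It only encodes the
decision as stated in this thread: pass on a good answer if one arrives,
pass on SERVFAIL only once every upstream server has returned it, and
otherwise keep waiting.

```python
from typing import Dict, Optional

SERVFAIL = "SERVFAIL"


def reply_to_client(replies: Dict[str, Optional[str]]) -> Optional[str]:
    """Decide what to send back to the original querier (toy model).

    `replies` maps each upstream server to the reply it gave for the
    retried query, or None if that server has not answered yet.
    Returns the reply to forward, or None to keep waiting.
    """
    answered = [r for r in replies.values() if r is not None]

    # Any answer other than SERVFAIL is forwarded straight away.
    for r in answered:
        if r != SERVFAIL:
            return r

    # The failure is propagated only when every server has failed.
    if len(answered) == len(replies):
        return SERVFAIL

    # Some servers are silent: hold the failure back and keep waiting.
    return None
```

With the four servers from the resolv.conf above, two SERVFAIL replies
and two silent servers give None, which matches the capture: nothing
goes back to the requestor.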

To make this work better, you could ensure that all four upstream
servers return SERVFAIL, or make them return NXDOMAIN instead. Another
option is to configure dnsmasq to return the NXDOMAIN directly with:

local=/0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.5.4.0.0.8.4.3.0.1.0.a.2.ip6.arpa/

(You probably need more leading zeros to get the correct netmask; I
didn't count them exactly.)
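
Rather than counting zeros by hand, the ip6.arpa name for a prefix can
be worked out mechanically. The following is a minimal sketch using
Python's standard ipaddress module; the helper name reverse_zone is made
up for illustration, and which prefix to pass in (the LAN /64 is used
here as an example) is an assumption that depends on which reverse
lookups should be answered locally.

```python
import ipaddress


def reverse_zone(prefix: str) -> str:
    """Return the ip6.arpa zone name covering an IPv6 prefix.

    Only prefixes on a 4-bit (nibble) boundary are handled, since each
    label in ip6.arpa represents one hex digit of the address.
    """
    net = ipaddress.ip_network(prefix, strict=True)
    if net.version != 6 or net.prefixlen % 4 != 0:
        raise ValueError("expects an IPv6 prefix on a nibble boundary")

    # Fully expanded address as 32 hex digits, colons removed.
    nibbles = net.network_address.exploded.replace(":", "")

    # Keep only the digits covered by the prefix, then reverse them.
    kept = nibbles[: net.prefixlen // 4]
    return ".".join(reversed(kept)) + ".ip6.arpa"


# The LAN prefix from the message above.
print(reverse_zone("2a01:348:45:1000::/64"))
# prints 0.0.0.1.5.4.0.0.8.4.3.0.1.0.a.2.ip6.arpa
```

The resulting name would go between the slashes of a local=/.../ line
like the one above.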

Cheers,

Simon.