[Dnsmasq-discuss] Query retried "out of nothing"

Wed Apr 7 20:33:38 UTC 2021

On 04/04/2021 16:54, Dominik wrote:
> Hey Simon,
> 
> I'm currently testing the tip of dnsmasq master and noticed the following:
> When running the test http://test-ipv6.com/ I do see some queries being
> retried seemingly without any indication.
> 
> Example from the log:
> 
> Apr  4 17:43:58 dnsmasq[3054422]: 467 192.168.2.224/49166 query[AAAA]
> ipv4.nop.hu from 192.168.2.224
> Apr  4 17:43:58 dnsmasq[3054422]: 467 192.168.2.224/49166 forwarded
> ipv4.nop.hu to 8.8.8.8
> [... many unrelated lines ...]
> Apr  4 17:43:58 dnsmasq[3054422]: 467 192.168.2.224/49166 forwarded
> ipv4.nop.hu to 8.8.8.8
> [... many unrelated lines ...]
> Apr  4 17:43:58 dnsmasq[3054422]: 467 192.168.2.224/49166 reply error is
> SERVFAIL
> 
> There was seemingly nothing triggering the second forwarding. A Wireshark
> recording revealed that the re-forwarding was triggered because of
> receiving a SERVFAIL but this was not logged.
> 
> Interestingly, when querying this domain alone, it works as expected and no
> re-submission is tried:
> 
> Apr  4 17:45:55 dnsmasq[3054422]: 483 192.168.2.224/40200 query[AAAA]
> ipv4.nop.hu from 192.168.2.224
> Apr  4 17:45:55 dnsmasq[3054422]: 483 192.168.2.224/40200 forwarded
> ipv4.nop.hu to 8.8.8.8
> Apr  4 17:45:55 dnsmasq[3054422]: 483 192.168.2.224/40200 reply error is
> SERVFAIL
> 
> So there seems to be an issue with the new retry mechanism behaving
> differently when under load and when handling a single query. Should the
> query be retried at all when upstream responded with SERVFAIL (I have only
> one server, 8.8.8.8, configured as upstream DNS resolver)?

Yes. SERVFAIL can be a transient state, so a retry is a sensible thing
to do. (Why it's re-used as the RCODE for a query which failed DNSSEC
validation, and may well not be at all transient, is another question.
That seems daft to me.)

The logic that decides whether a query is retried is a bit complex,
mainly to avoid retrying over and over again if the error is not
transient. It works by retrying once, but to all available servers.

My guess is that in your first example, the query was sent to a single
server, and then retried to all available servers (which is the same
thing in your config). In the second example, it had already been sent
to all available servers (again, just one) and so wasn't retried.

So, the behaviour makes sense, but is a little odd when there's only
one server, since the difference between sending to a single server
and sending to all servers is not visible.
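
As a rough illustration of the retry rule described above, here is a
minimal Python sketch. It is not dnsmasq's actual code: the names Query,
sent_to_all, forward and on_servfail are hypothetical, and it assumes
that the only state that matters is whether the query has already gone
to every configured server.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical per-query state; not a dnsmasq structure."""
    name: str
    sent_to_all: bool = False  # assumed flag: already sent to every server?
    log: list = field(default_factory=list)


def forward(query, servers, to_all):
    """Forward to one server or to all of them, and record what was done."""
    targets = servers if to_all else servers[:1]
    for server in targets:
        query.log.append(f"forwarded {query.name} to {server}")
    query.sent_to_all = query.sent_to_all or to_all


def on_servfail(query, servers):
    """Handle a SERVFAIL from upstream.

    Retry once, to all servers, unless the query has already been sent
    to all of them. Returns True if a retry was sent, False if the
    SERVFAIL is passed back to the client.
    """
    if query.sent_to_all:
        query.log.append("reply error is SERVFAIL")
        return False
    forward(query, servers, to_all=True)
    return True
```

With a single upstream such as ["8.8.8.8"], a query first forwarded with
to_all=False gets a second "forwarded" entry on the first SERVFAIL and
the error on the next one, while a query first forwarded with
to_all=True gets the error straight away. The two paths differ only in
the flag, which matches the two log excerpts.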

I don't think there's a bug here.

Simon.

> 
> 
> I'm happy to run additional tests or provide additional information, if
> required. I can also share the Wireshark recording and the full log if you
> cannot reproduce this. I prefer to share it off-list because the log may
> contain sensitive information.
> 
> Best,
> Dominik