<div>Simon,<br><br>The upstream server is authoritative for the initial domain (inside an organisation, I don’t think that’s unusual) and the incomplete (but perfectly valid, I agree) response is taken as complete. The upstream server does do recursion as well, but when that failed it just returned what it could (seems reasonable enough).<br><br>I’d have thought that the lack of an actual resolved A record (which is what was asked for) would mark the cache entry as incomplete at best.<br>This is pure gut, not a technically based statement.<br><br>And whilst I agree that the record was cached (and that that is probably technically correct) I can’t then explain why dnsmasq stopped using the cache when I restarted my program - with 45+ minutes of cache left, dnsmasq went back to the upstream server and got a complete answer.<br><br>Restarting dnsmasq obviously reset the cache, and everything recovered when I did that - but restarting other software shouldn’t have magically reset the cache, and yet it did.<br><br>(Un)Fortunately the second/third nameservers seem to be better behaved at the moment, so we haven’t seen the incomplete response in several days - kind of makes it harder to test though.<br><br>Cheers,<br><br>John</div><div><br><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 29 Mar 2019 at 22:43, Simon Kelley &lt;<a href="mailto:simon@thekelleys.org.uk" target="_blank">simon@thekelleys.org.uk</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 21/03/2019 11:01, John Robson wrote:<br>
> OK,<br>
> <br>
> Maybe this does reveal something about the caching...<br>
> Which might be expected behaviour, but I am not convinced it's useful...<br>
> <br>
> Overnight monitoring has shown that the upstream server does<br>
> occasionally send back an incomplete (but perfectly valid) CNAME only<br>
> response. Mostly I can justify the caching behaviour based on the TTLs<br>
> of the second CNAME or A record (the server is authoritative for the<br>
> first CNAME, so that's always at 3600).<br>
> <br>
> As a slight aside:<br>
> dnsmasq sends a query at 22:57:32.599, then again (new transaction id)<br>
> at 22:57:33.601, and at 22:57:36.601.<br>
> This last query gets a response in 0.1 seconds, both the others<br>
> eventually come in (incomplete) at 22:57:44.073<br>
> I am assuming that dnsmasq ignored these late arrivals (either due to a<br>
> default timeout, or just because a better answer has been received -<br>
> this would be comparable with behaviour when it queries multiple servers<br>
> to decide which is 'best').<br>
> In this case we are protected by the fact that the incomplete query<br>
> takes far longer than the complete one due to timeouts.<br>
> <br>
> Later though:<br>
> At 01:12:47 we are out of TTL, so send a request, and get an incomplete<br>
> response... The response only contains the first CNAME, which has a 3600<br>
> TTL.<br>
> <br>
> Then dnsmasq doesn't send another query for an hour - despite the fact<br>
> that it doesn't have a "good" answer.<br>
> In this case the query it sends after an hour gets incomplete response<br>
> again - not good.<br>
> Then I lost track because the container got moved to a different host -<br>
> but it looks like it was returning incomplete for several hours...<br>
> <br>
> <br>
> dnsmasq is otherwise well behaved - it is still responding to other<br>
> queries just fine, despite being hammered by more than 2k queries/second<br>
> <br>
> Two questions:<br>
> - Is it correct/wanted behaviour to cache an incomplete record like this?<br>
> I have no issue caching the CNAME, but should we keep trying to resolve<br>
> the CNAME to an A record?<br>
> <br>
> - Why/How does a restart of the querying program change the caching<br>
> behaviour of dnsmasq?<br>
> Because even if the program is restarted after just a few minutes it<br>
> immediately gets better data - my capture from yesterday shows that<br>
> despite the fact that the TTL had 2855 seconds (of the 3600 default)<br>
> left just two minutes before the first 'new process' request comes in,<br>
> that new request triggers an outbound query.<br>
> <br>
> <br>
> Cheers,<br>
> <br>
> John<br>
> <br>
<br>
What you're calling an "incomplete" answer is actually a perfectly<br>
good answer. Dnsmasq is entitled to infer that the target of the CNAME<br>
doesn't exist if it's not included in the answer, and keep that<br>
information in the cache for the TTL period.<br>
<br>
Note that this is _only_ true if the upstream server is a recursive<br>
server - as such it's expected to attempt to follow the CNAME and<br>
return as much of the chain as exists. If the upstream server is an<br>
authoritative server, that's not true - if the CNAME target is outside<br>
the domain(s) that the server is authoritative for, then the target will<br>
not be included. This is one reason why dnsmasq should only use<br>
recursive servers, and it will log an error if an upstream server is not<br>
recursive (ra flag not set). It's also the most common reason why people<br>
see the dnsmasq behaviour you're describing.<br>
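<br>
As a rough illustration of the ra check mentioned above, here is a minimal Python sketch that decodes the relevant header bits from a raw DNS reply. It is not dnsmasq's code: the helper names (header_flags, build_header) and the sample flag values are assumptions made for the example, and the bit masks follow the header layout in RFC 1035.<br>

```python
import struct

# Flag masks from the DNS header layout (RFC 1035, section 4.1.1).
QR = 0x8000  # message is a response
AA = 0x0400  # authoritative answer
RA = 0x0080  # recursion available


def header_flags(message: bytes) -> dict:
    """Decode the QR, AA and RA bits from a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # The first two 16-bit fields are the transaction id and the flags word.
    _txid, flags = struct.unpack("!HH", message[:4])
    return {
        "response": bool(flags & QR),
        "authoritative": bool(flags & AA),
        "recursion_available": bool(flags & RA),
    }


def build_header(txid: int, flags: int, ancount: int = 0) -> bytes:
    """Hypothetical helper: pack a bare 12-byte header for the example."""
    # Fields: id, flags, qdcount, ancount, nscount, arcount.
    return struct.pack("!HHHHHH", txid, flags, 1, ancount, 0, 0)


# Assumed sample replies: 0x8180 = QR|RD|RA, 0x8500 = QR|AA|RD (no RA).
recursive_reply = build_header(0x1234, 0x8180, ancount=2)
authoritative_reply = build_header(0x1234, 0x8500, ancount=1)

print(header_flags(recursive_reply))
print(header_flags(authoritative_reply))
```

A reply like the second one, with aa set and ra clear, comes from a server that may leave out a CNAME target it is not authoritative for, so a missing target in that reply says nothing about whether the target exists.<br>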
<br>
<br>
<br>
Cheers,<br>
<br>
Simon.<br>
<br>
<br>
</blockquote></div>
</div>