[Dnsmasq-discuss] Odd caching behaviour...
Simon Kelley
simon at thekelleys.org.uk
Fri Mar 29 22:08:23 GMT 2019
On 21/03/2019 11:01, John Robson wrote:
> OK,
>
> Maybe this does reveal something about the caching...
> Which might be expected behaviour, but I am not convinced it's useful...
>
> Overnight monitoring has shown that the upstream server does
> occasionally send back an incomplete (but perfectly valid) CNAME only
> response. Mostly I can justify the caching behaviour based on the TTLs
> of the second CNAME or A record (the server is authoritative for the
> first CNAME, so that's always at 3600).
>
> As a slight aside:
> dnsmasq sends a query at 22:57:32.599, then again (new transaction id)
> at 22:57:33.601, and at 22:57:36.601.
> This last query gets a response in 0.1 seconds, both the others
> eventually come in (incomplete) at 22:57:44.073
> I am assuming that dnsmasq ignored these late arrivals (either due to a
> default timeout, or just because a better answer has been received -
> this would be comparable with behaviour when it queries multiple servers
> to decide which is 'best').
> In this case we are protected by the fact that the incomplete query
> takes far longer than the complete one due to timeouts.
>
> Later though:
> At 01:12:47 we are out of TTL, so send a request, and get an incomplete
> response... The response only contains the first CNAME, which has a 3600
> TTL.
>
> Then dnsmasq doesn't send another query for an hour - despite the fact
> that it doesn't have a "good" answer.
> In this case the query it sends after an hour gets an incomplete
> response again - not good.
> Then I lost track because the container got moved to a different host -
> but it looks like it was returning incomplete for several hours...
>
>
> dnsmasq is otherwise well behaved - it is still responding to other
> queries just fine, despite being hammered by more than 2k queries/second
>
> Two questions:
> - Is it correct/wanted behaviour to cache an incomplete record like this?
> I have no issue caching the CNAME, but should we keep trying to resolve
> the CNAME to an A record?
>
> - Why/How does a restart of the querying program change the caching
> behaviour of dnsmasq?
> Because even if the program is restarted after just a few minutes it
> immediately gets better data - my capture from yesterday shows that
> despite the fact that the TTL had 2855 seconds (of the 3600 default)
> left just two minutes before the first 'new process' request comes in,
> that new request triggers an outbound query.
>
>
> Cheers,
>
> John
>
What you're calling an "incomplete" answer is actually a perfectly
good answer. Dnsmasq is entitled to infer that the target of the CNAME
doesn't exist if it's not included in the answer, and to keep that
information in the cache for the TTL period.

Note that this is _only_ true if the upstream server is a recursive
server - as such it's expected to attempt to follow the CNAME and
return as much of the chain as exists. If the upstream server is an
authoritative server, that's not true - if the CNAME target is outside
the domain(s) that the server is authoritative for, then the target will
not be included. This is one reason why dnsmasq should only use
recursive servers, and it will log an error if an upstream server is not
recursive (ra flag not set). It's also the most common reason why people
see the dnsmasq behaviour you're describing.
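
For anyone who wants to check the ra flag by hand, here is a minimal
Python sketch. It is not taken from dnsmasq; it only reads the flags
word of a raw DNS reply as laid out in RFC 1035. The function names and
the fixed transaction id are made up for illustration, and query_ra()
needs an upstream address that you supply yourself.

```python
import socket
import struct

RD_FLAG = 0x0100  # Recursion Desired, set by the client in the query
RA_FLAG = 0x0080  # Recursion Available, set by the server in the reply


def build_query(name, qtype=1, txid=0x1234):
    """Build a minimal DNS query (default qtype 1 = A, class IN) with RD set."""
    header = struct.pack("!HHHHHH", txid, RD_FLAG, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def recursion_available(response):
    """Return True if the RA bit is set in a raw DNS reply."""
    if len(response) < 12:
        raise ValueError("truncated DNS header")
    _txid, flags = struct.unpack("!HH", response[:4])
    return bool(flags & RA_FLAG)


def query_ra(server, name, timeout=3.0):
    """Send one UDP query to `server` and report whether it offers recursion."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        reply, _addr = sock.recvfrom(4096)
    return recursion_available(reply)
```

Calling query_ra() with the address of the upstream server and any name
it should be able to resolve returns False when the server does not
offer recursion, which is the case where a CNAME-only reply tells you
nothing about whether the target exists.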
Cheers,
Simon.