[Dnsmasq-discuss] Odd caching behaviour...

John Robson jrobson at zenoss.com
Sat Mar 30 08:41:43 GMT 2019


Simon,

The upstream server is authoritative for the initial domain (being inside
an organisation I don’t think that’s unusual) and the incomplete (but
perfectly valid, I agree) response is taken as complete. The upstream
server does do recursion as well, but when that failed it just returned
what it could (seems reasonable enough).

I’d have thought that the lack of an actual resolved A record (which is
what was asked for) would mark the cache entry as incomplete at best.
This is pure gut, not a technically based statement.
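
To make the "incomplete" distinction concrete, here is a minimal Python
sketch of the check described above: does an answer section follow the
CNAME chain all the way to an A record? This is an illustration only, not
dnsmasq's logic. The function name, the (owner, rtype, rdata) tuple layout
and the example hostnames are all hypothetical.

```python
def chain_is_complete(qname, answers):
    """Return True if following CNAMEs from qname reaches an A record.

    `answers` is a list of (owner, rtype, rdata) tuples. This layout is an
    assumption made for illustration, not dnsmasq's internal format.
    """
    # Map each CNAME owner to its target, compared case-insensitively.
    cnames = {
        owner.lower(): rdata.lower()
        for owner, rtype, rdata in answers
        if rtype == "CNAME"
    }
    # Names that have at least one A record in this answer.
    a_owners = {owner.lower() for owner, rtype, _ in answers if rtype == "A"}

    name = qname.lower()
    seen = set()
    while name not in a_owners:
        # Stop on a dangling CNAME target or on a CNAME loop.
        if name in seen or name not in cnames:
            return False
        seen.add(name)
        name = cnames[name]
    return True


# Hypothetical answers: a full chain, and the CNAME-only variant.
full = [
    ("app.example.com", "CNAME", "lb.example.net"),
    ("lb.example.net", "A", "192.0.2.10"),
]
cname_only = [("app.example.com", "CNAME", "lb.example.net")]
```

With these inputs, `chain_is_complete("app.example.com", full)` is true and
the CNAME-only answer is reported as not complete.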

And whilst I agree that the record was cached (and that that is probably
technically correct) I can’t then explain why dnsmasq stopped using the
cache when I restarted my program - with 45+ minutes of cache left, dnsmasq
went back to the upstream server and got a complete answer.

Restarting dnsmasq obviously reset the cache, and everything recovered when
I did that - but restarting other software shouldn’t have magically reset
the cache, and yet it did.
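
The expectation behind that complaint can be written down as a rough Python
sketch of a TTL cache. It is a simplified model, not dnsmasq code, and the
class and method names are hypothetical. An entry stored with a 3600 second
TTL should still be served 745 seconds later (2855 seconds remaining),
regardless of which client process asks.

```python
class TtlCache:
    """Toy TTL cache keyed by name; time is passed in explicitly."""

    def __init__(self):
        self._entries = {}  # name -> (value, absolute expiry time)

    def put(self, name, value, ttl, now):
        self._entries[name] = (value, now + ttl)

    def get(self, name, now):
        """Return (value, seconds_remaining), or None if absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires = entry
        if now >= expires:
            # Expired: drop the entry so the caller has to go upstream.
            del self._entries[name]
            return None
        return value, expires - now


cache = TtlCache()
# Hypothetical name and value standing in for the CNAME-only answer.
cache.put("app.example.com", "cname-only answer", ttl=3600, now=0)
hit = cache.get("app.example.com", now=745)
```

In this model nothing about the requesting client enters the lookup, so a
client restart on its own would not be expected to cause a cache miss.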

(Un)Fortunately the second/third nameservers seem to be better behaved
at the moment, so we haven’t seen the incomplete response in several
days - which makes it harder to test.

Cheers,

John



On Fri, 29 Mar 2019 at 22:43, Simon Kelley <simon at thekelleys.org.uk> wrote:

> On 21/03/2019 11:01, John Robson wrote:
> > OK,
> >
> > Maybe this does reveal something about the caching...
> > Which might be expected behaviour, but I am not convinced it's useful...
> >
> > Overnight monitoring has shown that the upstream server does
> > occasionally send back an incomplete (but perfectly valid) CNAME only
> > response.  Mostly I can justify the caching behaviour based on the TTLs
> > of the second CNAME or A record (the server is authoritative for the
> > first CNAME, so that's always at 3600).
> >
> > As a slight aside:
> > dnsmasq sends a query at 22:57:32.599, then again (new transaction id)
> > at 22:57:33.601, and at 22:57:36.601.
> > This last query gets a response in 0.1 seconds, both the others
> > eventually come in (incomplete) at 22:57:44.073
> > I am assuming that dnsmasq ignored these late arrivals (either due to a
> > default timeout, or just because a better answer has been received -
> > this would be comparable with behaviour when it queries multiple servers
> > to decide which is 'best').
> > In this case we are protected by the fact that the incomplete query
> > takes far longer than the complete one due to timeouts.
> >
> > Later though:
> > At 01:12:47 we are out of TTL, so send a request, and get an incomplete
> > response... The response only contains the first CNAME, which has a 3600
> > TTL.
> >
> > Then dnsmasq doesn't send another query for an hour - despite the fact
> > that it doesn't have a "good" answer.
> > In this case the query it sends after an hour gets incomplete response
> > again - not good.
> > Then I lost track because the container got moved to a different host -
> > but it looks like it was returning incomplete for several hours...
> >
> >
> > dnsmasq is otherwise well behaved - it is still responding to other
> > queries just fine, despite being hammered by more than 2k queries/second
> >
> > Two questions:
> >  - Is it correct/wanted behaviour to cache an incomplete record like
> > this?
> > I have no issue caching the CNAME, but should we keep trying to resolve
> > the CNAME to an A record?
> >
> >  - Why/How does a restart of the querying program change the caching
> > behaviour of dnsmasq?
> > Because even if the program is restarted after just a few minutes it
> > immediately gets better data - my capture from yesterday shows that
> > despite the fact that the TTL had 2855 seconds (of the 3600 default)
> > left just two minutes before the first 'new process' request comes in,
> > that new request triggers an outbound query.
> >
> >
> > Cheers,
> >
> > John
> >
>
> What you're calling an "incomplete" answer is actually a perfectly
> good answer. Dnsmasq is entitled to infer that the target of the CNAME
> doesn't exist if it's not included in the answer, and keep that
> information in the cache for the TTL period.
>
> Note that this is _only_ true if the upstream server is a recursive
> server - as such it's expected to attempt to follow the CNAME and
> return as much of the chain as exists. If the upstream server is an
> authoritative server, that's not true - if the CNAME target is outside
> the domain(s) that the server is authoritative for, then the target will
> not be included. This is one reason why dnsmasq should only use
> recursive servers, and it will log an error if an upstream server is not
> recursive (ra flag not set). It's also the most common reason why people
> see the dnsmasq behaviour you're describing.
>
>
>
> Cheers,
>
> Simon.
>
>

