[Dnsmasq-discuss] Odd caching behaviour...
John Robson
jrobson at zenoss.com
Thu Apr 4 17:28:57 BST 2019
Ok, thanks - that makes sense in terms of the 'incomplete' entry being
cached.
I might set up a couple of DNS servers to simulate this at some point - I'm
going to want a reproducible setup for our own testing as well... If I can,
I'll come back with that log...
Actually, maybe I already have it, let me check...
Nope - I have that up to the dnsmasq restart, not the software restart.
Cheers,
John
On Thu, 4 Apr 2019 at 16:27, Simon Kelley <simon at thekelleys.org.uk> wrote:
> On 30/03/2019 08:41, John Robson wrote:
> > Simon,
> >
> > The upstream server is authoritative for the initial domain (being
> > inside an organisation I don’t think that’s unusual) and the incomplete
> > (but perfectly valid, I agree) response is taken as complete. The
> > upstream server does do recursion as well, but when that failed it just
> > returned what it could (seems reasonable enough).
> >
> > I’d have thought that the lack of an actual resolved A record (which is
> > what was asked for) would mark the cache entry as incomplete at best.
> > This is pure gut, not a technically based statement.
>
> A CNAME reply with no record for the target of the CNAME, from a
> recursive server, establishes that the target doesn't exist. If it were
> otherwise, there would be large numbers of legitimate answers which are
> uncachable. Consider that there are many record types and the target of
> a CNAME will not exist for most record types.
>
> As a common example, an IPv6 enabled host will query for the AAAA record
> of something it wants to talk to. If the hostname is a CNAME, and the thing
> it wants to talk to doesn't have an AAAA record, then the reply will be
> a CNAME with no target. You really want to be able to cache that.
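
The caching argument above can be illustrated with a minimal sketch. This is
a toy model, not dnsmasq's actual cache: the class name TinyDnsCache, its
methods, the example hostnames and the choice of 3600 seconds for the
negative entry are all assumptions made for illustration. It only shows the
idea that "no record of this type" can be stored per (name, type) with a TTL,
so that a repeated AAAA lookup does not go upstream until the entry expires.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TinyDnsCache:
    """Toy cache keyed by (name, record type). Hypothetical, not dnsmasq code."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[Optional[str], float]] = {}

    def store(self, name: str, rtype: str, value: Optional[str], ttl: float) -> None:
        # value=None records "no data of this type" (a negative entry).
        self._entries[(name.lower(), rtype)] = (value, self._clock() + ttl)

    def lookup(self, name: str, rtype: str) -> Tuple[str, Optional[str]]:
        """Return ('hit', value), ('nodata', None) or ('miss', None)."""
        key = (name.lower(), rtype)
        entry = self._entries.get(key)
        if entry is None:
            return ("miss", None)
        value, expires = entry
        if self._clock() >= expires:
            # Expired entries are dropped, so the next query goes upstream.
            del self._entries[key]
            return ("miss", None)
        return ("nodata", None) if value is None else ("hit", value)


if __name__ == "__main__":
    cache = TinyDnsCache()
    # A reply containing only the CNAME: cache the CNAME itself, and the
    # absence of an AAAA record for its target (TTL value is an assumption).
    cache.store("www.example.com", "CNAME", "host.example.net", 3600)
    cache.store("host.example.net", "AAAA", None, 3600)
    print(cache.lookup("host.example.net", "AAAA"))
    print(cache.lookup("host.example.net", "A"))
```

In this model the negative entry answers repeat AAAA queries from the cache,
while a query for a type that was never asked about is still a miss.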
>
>
> >
> > And whilst I agree that the record was cached (and that that is probably
> > technically correct) I can’t then explain why dnsmasq stopped using the
> > cache when I restarted my program - with 45+ minutes of cache left,
> > dnsmasq went back to the upstream server and got a complete answer.
> >
> > Restarting dnsmasq obviously reset the cache, and everything recovered
> > when I did that - but restarting other software shouldn’t have magically
> > reset the cache, and yet it did.
>
>
> I can't explain that. If it's reproducible, run dnsmasq with
> --log-queries set and see exactly what's going on.
>
>
> >
> > (Un)Fortunately the second/third nameservers seem to be being better
> > behaved at the moment, so we haven’t seen the incomplete response in
> > several days - kind of makes it harder to test though.
>
> Not reproducible, then. That's a pity.
>
>
> Cheers,
>
> Simon.
>
> >
> > Cheers,
> >
> > John
> >
> >
> >
> > On Fri, 29 Mar 2019 at 22:43, Simon Kelley <simon at thekelleys.org.uk> wrote:
> >
> > On 21/03/2019 11:01, John Robson wrote:
> > > OK,
> > >
> > > Maybe this does reveal something about the caching...
> > > Which might be expected behaviour, but I am not convinced it's useful...
> > >
> > > Overnight monitoring has shown that the upstream server does
> > > occasionally send back an incomplete (but perfectly valid) CNAME only
> > > response. Mostly I can justify the caching behaviour based on the TTLs
> > > of the second CNAME or A record (the server is authoritative for the
> > > first CNAME, so that's always at 3600).
> > >
> > > As a slight aside:
> > > dnsmasq sends a query at 22:57:32.599, then again (new transaction id)
> > > at 22:57:33.601, and at 22:57:36.601.
> > > This last query gets a response in 0.1 seconds, both the others
> > > eventually come in (incomplete) at 22:57:44.073
> > > I am assuming that dnsmasq ignored these late arrivals (either due to a
> > > default timeout, or just because a better answer has been received -
> > > this would be comparable with behaviour when it queries multiple servers
> > > to decide which is 'best').
> > > In this case we are protected by the fact that the incomplete query
> > > takes far longer than the complete one due to timeouts.
> > >
> > > Later though:
> > > At 01:12:47 we are out of TTL, so send a request, and get an incomplete
> > > response... The response only contains the first CNAME, which has a 3600
> > > TTL.
> > >
> > > Then dnsmasq doesn't send another query for an hour - despite the fact
> > > that it doesn't have a "good" answer.
> > > In this case the query it sends after an hour gets incomplete response
> > > again - not good.
> > > Then I lost track because the container got moved to a different host -
> > > but it looks like it was returning incomplete for several hours...
> > >
> > >
> > > dnsmasq is otherwise well behaved - it is still responding to other
> > > queries just fine, despite being hammered by more than 2k queries/second
> > >
> > > Two questions:
> > > - Is it correct/wanted behaviour to cache an incomplete record like this?
> > > I have no issue caching the cname, but should we keep trying to resolve
> > > the cname to an a record?
> > >
> > > - Why/How does a restart of the querying program change the caching
> > > behaviour of dnsmasq?
> > > Because even if the program is restarted after just a few minutes it
> > > immediately gets better data - my capture from yesterday shows that
> > > despite the fact that the TTL had 2855 seconds (of the 3600 default)
> > > left just two minutes before the first 'new process' request comes in,
> > > that new request triggers an outbound query.
> > >
> > >
> > > Cheers,
> > >
> > > John
> > >
> >
> > What you're calling an "incomplete" answer is actually a perfectly
> > good answer. Dnsmasq is entitled to infer that the target of the CNAME
> > doesn't exist if it's not included in the answer, and keep that
> > information in the cache for the TTL period.
> >
> > Note that this is _only_ true if the upstream server is a recursive
> > server - as such it's expected to attempt to follow the CNAME and
> > return as much of the chain as exists. If the upstream server is an
> > authoritative server, that's not true - if the CNAME target is outside
> > the domain(s) that the server is authoritative for, then the target will
> > not be included. This is one reason why dnsmasq should only use
> > recursive servers, and it will log an error if an upstream server is not
> > recursive (ra flag not set). It's also the most common reason why people
> > see the dnsmasq behaviour you're describing.
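
For anyone wanting to check the "ra flag" on a captured reply, here is a
small sketch. It relies only on the standard DNS header layout from RFC 1035
(a 16-bit ID followed by a 16-bit flags word, with the "recursion available"
bit at mask 0x0080). The function name recursion_available is made up for
this example, and the two sample headers are hand-built, not taken from the
captures discussed in this thread.

```python
import struct

RA_MASK = 0x0080  # "recursion available" bit in the 16-bit flags word


def recursion_available(message: bytes) -> bool:
    """Return True if the RA bit is set in the header of a raw DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    # Header starts with the ID and the flags word, both big-endian 16-bit.
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & RA_MASK)


if __name__ == "__main__":
    # Hand-built headers: ID, flags, then four zeroed 16-bit section counts.
    recursive_reply = struct.pack("!HHHHHH", 0x1234, 0x8180, 0, 0, 0, 0)
    authoritative_only = struct.pack("!HHHHHH", 0x1234, 0x8400, 0, 0, 0, 0)
    print(recursion_available(recursive_reply))
    print(recursion_available(authoritative_only))
```

The first sample has QR, RD and RA set, as a typical recursive resolver reply
would. The second has QR and AA set but not RA, which is the shape of reply
an authoritative-only server would give.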
> >
> >
> >
> > Cheers,
> >
> > Simon.
> >
> >
>
>
--
John Robson, Sr. Customer Support Engineer, Zenoss
jrobson at zenoss.com