[Dnsmasq-discuss] Odd caching behaviour...

John Robson jrobson at zenoss.com
Thu Apr 4 17:28:57 BST 2019


Ok, thanks - that makes sense in terms of the 'incomplete' entry being
cached.
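
For anyone reading this in the archive, here is a toy model of the inference as I understand it from Simon's explanation. It is only a sketch: the ToyCache class, its method names and the example hostnames are all made up for illustration, and it is not how dnsmasq actually stores its cache. The assumption is that a CNAME-only reply from a recursive upstream is kept as "no such record for the target" until the CNAME's TTL runs out.

```python
import time


class ToyCache:
    """Toy model of caching a CNAME-only reply (hypothetical, not dnsmasq code)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.cnames = {}  # alias -> (target, expiry time)
        self.nodata = {}  # (target, record type) -> expiry time

    def store_reply(self, qname, qtype, cname_target, cname_ttl, addresses):
        """Record a reply consisting of one CNAME plus zero or more addresses."""
        now = self.clock()
        self.cnames[qname] = (cname_target, now + cname_ttl)
        if not addresses:
            # A recursive upstream returned the CNAME but nothing for its
            # target: infer that the target has no record of this type and
            # hold that inference for the CNAME's TTL.
            self.nodata[(cname_target, qtype)] = now + cname_ttl

    def lookup(self, qname, qtype):
        """Return 'nodata' if answerable from cache, else 'miss' (ask upstream)."""
        now = self.clock()
        entry = self.cnames.get(qname)
        if entry is None or entry[1] <= now:
            return "miss"
        expiry = self.nodata.get((entry[0], qtype))
        if expiry is not None and expiry > now:
            return "nodata"
        return "miss"
```

With a 3600 second CNAME TTL, a lookup 745 seconds after the CNAME-only reply (2855 seconds left) is still answered from the cache in this model, which is why the outbound query after the program restart is the surprising part.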

I might set up a couple of dns servers to simulate this at some point - I'm
going to want a reproducible setup for our own testing as well...  If I can
then I'll come back with that log...

Actually, maybe I already have it, let me check...
Nope - I have that up to the dnsmasq restart, not the software restart.
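
As a starting point for that reproducible setup, below is a minimal sketch of a stub upstream that answers every query with a single CNAME and no record for the target. It uses only the Python standard library. The function names, the listening port (5300) and the target name (host.example.net) are assumptions picked for illustration; it sets the RA flag so that it looks like a recursive server, handles UDP only, assumes one question per query and ignores EDNS.

```python
import socket
import struct


def encode_name(name):
    """Encode a dotted hostname as DNS wire-format labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def question_end(msg):
    """Return the offset just past the first question (name, type, class)."""
    pos = 12  # the DNS header is 12 bytes long
    while msg[pos] != 0:
        pos += 1 + msg[pos]
    return pos + 1 + 4  # root label, then QTYPE and QCLASS


def cname_only_reply(query, target, ttl=3600):
    """Build a reply holding one CNAME and no record for its target."""
    qid, qflags = struct.unpack("!HH", query[:4])
    # QR=1 (response), RD copied from the query, RA=1 (claims to recurse).
    flags = 0x8000 | (qflags & 0x0100) | 0x0080
    header = struct.pack("!HHHHHH", qid, flags, 1, 1, 0, 0)
    question = query[12:question_end(query)]
    rdata = encode_name(target)
    # 0xC00C is a compression pointer to the question name at offset 12.
    answer = b"\xc0\x0c" + struct.pack("!HHIH", 5, 1, ttl, len(rdata)) + rdata
    return header + question + answer


def serve(host="127.0.0.1", port=5300, target="host.example.net"):
    """Answer every UDP query with the CNAME-only reply, forever."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        query, peer = sock.recvfrom(512)
        sock.sendto(cname_only_reply(query, target), peer)
```

Calling serve() and then starting dnsmasq with --no-resolv --server=127.0.0.1#5300 --log-queries should, if I have the options right, let the CNAME-only case be triggered on demand rather than waiting for the real upstream to misbehave.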

Cheers,

John

On Thu, 4 Apr 2019 at 16:27, Simon Kelley <simon at thekelleys.org.uk> wrote:

> On 30/03/2019 08:41, John Robson wrote:
> > Simon,
> >
> > The upstream server is authoritative for the initial domain (being
> > inside an organisation I don’t think that’s unusual) and the incomplete
> > (but perfectly valid, I agree) response is taken as complete. The
> > upstream server does do recursion as well, but when that failed it just
> > returned what it could (seems reasonable enough).
> >
> > I’d have thought that the lack of an actual resolved A record (which is
> > what was asked for) would mark the cache entry as incomplete at best.
> > This is pure gut, not a technically based statement.
>
> A CNAME reply with no record for the target of the CNAME, from a
> recursive server, establishes that the target doesn't exist. If it were
> otherwise, there would be large numbers of legitimate answers which are
> uncachable. Consider that there are many record types and the target of
> a CNAME will not exist for most record types.
>
> As a common example, an IPv6 enabled host will query for the AAAA record
> of something it wants to talk to. If the hostname is a CNAME, and the thing
> it wants to talk to doesn't have an AAAA record, then the reply will be
> a CNAME with no target. You really want to be able to cache that.
>
>
> >
> > And whilst I agree that the record was cached (and that that is probably
> > technically correct) I can’t then explain why dnsmasq stopped using the
> > cache when I restarted my program - with 45+ minutes of cache left,
> > dnsmasq went back to the upstream server and got a complete answer.
> >
> > Restarting dnsmasq obviously reset the cache, and everything recovered
> > when I did that - but restarting other software shouldn’t have magically
> > reset the cache, and yet it did.
>
>
> I can't explain that. If it's reproducible, run dnsmasq with
> --log-queries set and see exactly what's going on.
>
>
> >
> > (Un)Fortunately the second/third nameservers seem to be being better
> > behaved at the moment, so we haven’t seen the incomplete response in
> > several days - kind of makes it harder to test though.
>
> Not reproducible, then. That's a pity.
>
>
> Cheers,
>
> Simon.
>
> >
> > Cheers,
> >
> > John
> >
> >
> >
> > On Fri, 29 Mar 2019 at 22:43, Simon Kelley <simon at thekelleys.org.uk> wrote:
> >
> >     On 21/03/2019 11:01, John Robson wrote:
> >     > OK,
> >     >
> >     > Maybe this does reveal something about the caching...
> >     > Which might be expected behaviour, but I am not convinced it's
> >     > useful...
> >     >
> >     > Overnight monitoring has shown that the upstream server does
> >     > occasionally send back an incomplete (but perfectly valid) CNAME
> >     > only response.  Mostly I can justify the caching behaviour based on
> >     > the TTLs of the second CNAME or A record (the server is
> >     > authoritative for the first CNAME, so that's always at 3600).
> >     >
> >     > As a slight aside:
> >     > dnsmasq sends a query at 22:57:32.599, then again (new transaction
> >     > id) at 22:57:33.601, and at 22:57:36.601.
> >     > This last query gets a response in 0.1 seconds, both the others
> >     > eventually come in (incomplete) at 22:57:44.073
> >     > I am assuming that dnsmasq ignored these late arrivals (either due
> >     > to a default timeout, or just because a better answer has been
> >     > received - this would be comparable with behaviour when it queries
> >     > multiple servers to decide which is 'best').
> >     > In this case we are protected by the fact that the incomplete query
> >     > takes far longer than the complete one due to timeouts.
> >     >
> >     > Later though:
> >     > At 01:12:47 we are out of TTL, so send a request, and get an
> >     > incomplete response... The response only contains the first CNAME,
> >     > which has a 3600 TTL.
> >     >
> >     > Then dnsmasq doesn't send another query for an hour - despite the
> >     > fact that it doesn't have a "good" answer.
> >     > In this case the query it sends after an hour gets incomplete
> >     > response again - not good.
> >     > Then I lost track because the container got moved to a different
> >     > host - but it looks like it was returning incomplete for several
> >     > hours...
> >     >
> >     >
> >     > dnsmasq is otherwise well behaved - it is still responding to other
> >     > queries just fine, despite being hammered by more than 2k
> >     > queries/second
> >     >
> >     > Two questions:
> >     >  - Is it correct/wanted behaviour to cache an incomplete record
> >     > like this?
> >     > I have no issue caching the cname, but should we keep trying to
> >     > resolve the cname to an a record?
> >     >
> >     >  - Why/How does a restart of the querying program change the
> >     > caching behaviour of dnsmasq?
> >     > Because even if the program is restarted after just a few minutes
> >     > it immediately gets better data - my capture from yesterday shows
> >     > that despite the fact that the TTL had 2855 seconds (of the 3600
> >     > default) left just two minutes before the first 'new process'
> >     > request comes in, that new request triggers an outbound query.
> >     >
> >     >
> >     > Cheers,
> >     >
> >     > John
> >     >
> >
> >     What you're calling an "incomplete" answer is actually a perfectly
> >     good answer. Dnsmasq is entitled to infer that the target of the
> >     CNAME doesn't exist if it's not included in the answer, and keep
> >     that information in the cache for the TTL period.
> >
> >     Note that this is _only_ true if the upstream server is a recursive
> >     server - as such it's expected to attempt to follow the CNAME and
> >     return as much of the chain as exists. If the upstream server is an
> >     authoritative server, that's not true - if the CNAME target is
> >     outside the domain(s) that the server is authoritative for, then the
> >     target will not be included. This is one reason why dnsmasq should
> >     only use recursive servers, and it will log an error if an upstream
> >     server is not recursive (ra flag not set). It's also the most common
> >     reason why people see the dnsmasq behaviour you're describing.
> >
> >
> >
> >     Cheers,
> >
> >     Simon.
> >
> >
> >     _______________________________________________
> >     Dnsmasq-discuss mailing list
> >     Dnsmasq-discuss at lists.thekelleys.org.uk
> >     <mailto:Dnsmasq-discuss at lists.thekelleys.org.uk>
> >     http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
> >
>
>

-- 

John Robson, Sr. Customer Support Engineer, Zenoss <https://www.zenoss.com/>
jrobson at zenoss.com
