[Dnsmasq-discuss] Odd caching behaviour...

Geert Stappers stappers at stappers.nl
Wed Mar 20 22:44:59 GMT 2019


On Wed, Mar 20, 2019 at 09:00:20PM +0000, John Robson wrote:
> Hi,
> 
> I have a library which I think has a bug, but this bug is affecting DNS
> queries, and bringing out some odd behaviour in dnsmasq...
> 
> Program is making a query to resolve an address (foo.bar.com)
> A normal query results in a CNAME (foo.bar.com.edgekey.net), which results
> in another CNAME (e1234.a.akamaiedge.net) which has an A record.
> 
> However every so often dnsmasq returns just the first CNAME.
> Note I haven't yet caught it in the act of that first truncated response.
> The only thing that makes sense to me is if the edgekey.net name servers
> didn't respond in good time... but....
> 
> However, the bug in the library then means it asks again, instantly, and
> again... and again....
> It manages over 100MB/minute of DNS requests, with dnsmasq answering them
> all from the cache (I see *no* external requests for that address).

Hey, that is the whole idea of DNS caching ...


> When I restart the program the very first query (identical query as before)
> gets a complete answer from dnsmasq.
> 
> What I can't understand is how that restart makes any difference to dnsmasq.
> Does dnsmasq have some sort of 'Oh hell the query load is insane I'm just
> extending the cache a bit to help' mode which it then escapes from as the
> program restarts?
> There are no external queries for this name during the period of insanity,
> but the first request after does get put to the external name servers.
> 
> I'm running an 'external interface only' capture to try and capture the
> initial error condition (which I very much doubt is a problem in dnsmasq),
> to see if that can shed some light on the issue.
> 
> 
> Thoughts? debug hints? laughter?
 

It seems to me that the application's first DNS request uses
"recursion", and that once it hits the bug the app switches to
"non-recursion". By "recursion" I mean: when the reply does not
contain an A record, send a follow-up query for the CNAME target.
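
To make that distinction concrete, here is a minimal Python sketch of
the client-side logic I have in mind. It is only an illustration, not
code from dnsmasq or from the library in question. The function name
follow_cnames, the (name, type, value) tuple layout and the address
192.0.2.10 (a documentation address) are all assumptions made up for
the example; the host names are the ones from the original mail.

```python
def follow_cnames(name, answers, max_hops=8):
    """Walk the CNAME chain for `name` inside one answer section.

    `answers` is a list of (owner_name, record_type, value) tuples.
    This layout is hypothetical, chosen only for the illustration.
    Returns (addresses, last_name). An empty address list means the
    chain stopped at `last_name`, so a well-behaved client would send
    a follow-up query for `last_name` instead of repeating the
    original question.
    """
    def norm(n):
        return n.lower().rstrip(".")

    current = norm(name)
    for _ in range(max_hops):  # bound the walk so CNAME loops terminate
        addresses = [v for (n, t, v) in answers
                     if norm(n) == current and t == "A"]
        if addresses:
            return addresses, current
        targets = [norm(v) for (n, t, v) in answers
                   if norm(n) == current and t == "CNAME"]
        if not targets:
            return [], current
        current = targets[0]
    return [], current


# Complete answer: two CNAMEs and a final A record.
full = [
    ("foo.bar.com", "CNAME", "foo.bar.com.edgekey.net"),
    ("foo.bar.com.edgekey.net", "CNAME", "e1234.a.akamaiedge.net"),
    ("e1234.a.akamaiedge.net", "A", "192.0.2.10"),  # assumed address
]
# Truncated answer: only the first CNAME, as described in the report.
truncated = full[:1]
```

With the truncated answer the function returns an empty address list
and "foo.bar.com.edgekey.net" as the name to ask for next. A client
that asks for foo.bar.com again instead will keep getting the same
cached CNAME.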

On debug hints: the suspected trigger of the bug is currently a DNS
server that doesn't respond in good time. So build a "chain" of DNS
servers in which you control the response time of one.
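
One link of such a chain could be a small UDP relay that holds every
query for a configurable time before passing it on. The following is a
rough sketch under stated assumptions, not a tested tool: the names
relay_once and serve, the listen address 127.0.0.1 port 5353, the
3 second delay and the upstream placeholder 192.0.2.53 are all
hypothetical, so substitute a resolver you actually use. It handles
one query at a time and UDP only.

```python
import socket
import time


def relay_once(listen_sock, upstream_addr, delay_s, upstream_timeout_s=5.0):
    """Relay a single UDP DNS query, holding it for `delay_s` first.

    Returns True if the upstream reply was passed back to the client,
    False if the upstream did not answer within `upstream_timeout_s`.
    """
    query, client = listen_sock.recvfrom(4096)
    time.sleep(delay_s)  # the knob: pretend to be a slow name server
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as upstream:
        upstream.settimeout(upstream_timeout_s)
        upstream.sendto(query, upstream_addr)
        try:
            reply, _ = upstream.recvfrom(4096)
        except socket.timeout:
            return False  # the client simply gets no answer
    listen_sock.sendto(reply, client)
    return True


def serve(listen_addr=("127.0.0.1", 5353),
          upstream_addr=("192.0.2.53", 53),  # placeholder, not a real resolver
          delay_s=3.0):
    """Run the delaying relay forever, one query at a time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(listen_addr)
    while True:
        relay_once(sock, upstream_addr, delay_s)

# To run it by hand: call serve() with your own upstream address.
```

A test dnsmasq instance can then be pointed at the relay, for example
with server=127.0.0.1#5353 in its configuration, and the delay varied
until the truncated answer shows up.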

Good luck with it.  And feel welcome to report back.


> Cheers,
> John

Groeten
Geert Stappers
-- 
Leven en laten leven


