[Dnsmasq-discuss] Dnsmasq stops caching for a while on receive of failed or retried lookup?

Mark Fermor, HolidayExtras.com mark.fermor at holidayextras.com
Tue Jun 12 12:21:43 BST 2018


Hello,

Running dnsmasq with these options:
/usr/sbin/dnsmasq -k --cache-size=50 --log-facility=- --user=nobody
--group=nobody --no-hosts --neg-ttl=60 --max-ttl=240 --max-cache-ttl=300

No local dnsmasq config file so that's literally all the config other than
defaults applied by dnsmasq

dnsmasq -v
Dnsmasq version 2.78  Copyright (c) 2000-2017 Simon Kelley
Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6
no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify

This is something running running in Kubernetes, they run as sidekick
containers to the main application container. I have multiple of the same
deployment running in the cluster, so they're all at the same versions and
receiving equal amounts of traffic via load balancing. They all talk to the
same endpoints outbound and do the same work load etc. I've sent sigusr1
signal to all of the pods individually (all pods have been running for
approx 48 hours bar pod4 which has been running less hours):

pod1
I0608 15:11:34.998127       1 nanny.go:116] dnsmasq[19]: time 1528470694
I0608 15:11:34.998169       1 nanny.go:116] dnsmasq[19]: cache size 50,
0/2267416 cache insertions re-used unexpired cache entries.
I0608 15:11:34.998175       1 nanny.go:116] dnsmasq[19]: queries forwarded
3218560, queries answered locally 3182486
I0608 15:11:34.998180       1 nanny.go:116] dnsmasq[19]: queries for
authoritative zones 0
I0608 15:11:34.998184       1 nanny.go:116] dnsmasq[19]: server
10.227.240.10#53: queries sent 3218560, retried or failed 16

pod2
I0608 15:11:35.909168       1 nanny.go:116] dnsmasq[18]: time 1528470695
I0608 15:11:35.909206       1 nanny.go:116] dnsmasq[18]: cache size 50,
0/197465 cache insertions re-used unexpired cache entries.
I0608 15:11:35.909211       1 nanny.go:116] dnsmasq[18]: queries forwarded
240843, queries answered locally 6159789
I0608 15:11:35.909216       1 nanny.go:116] dnsmasq[18]: queries for
authoritative zones 0
I0608 15:11:35.909219       1 nanny.go:116] dnsmasq[18]: server
10.227.240.10#53: queries sent 240843, retried or failed 4

pod3
I0608 15:11:36.948015       1 nanny.go:116] dnsmasq[20]: time 1528470696
I0608 15:11:36.948083       1 nanny.go:116] dnsmasq[20]: cache size 50,
0/63648 cache insertions re-used unexpired cache entries.
I0608 15:11:36.948138       1 nanny.go:116] dnsmasq[20]: queries forwarded
46004, queries answered locally 6347223
I0608 15:11:36.948188       1 nanny.go:116] dnsmasq[20]: queries for
authoritative zones 0
I0608 15:11:36.948219       1 nanny.go:116] dnsmasq[20]: server
10.227.240.10#53: queries sent 46004, retried or failed 1

pod4
I0608 15:11:38.032330       1 nanny.go:116] dnsmasq[24]: time 1528470698
I0608 15:11:38.032374       1 nanny.go:116] dnsmasq[24]: cache size 50,
0/1359727 cache insertions re-used unexpired cache entries.
I0608 15:11:38.032382       1 nanny.go:116] dnsmasq[24]: queries forwarded
1939395, queries answered locally 742411
I0608 15:11:38.032388       1 nanny.go:116] dnsmasq[24]: queries for
authoritative zones 0
I0608 15:11:38.032394       1 nanny.go:116] dnsmasq[24]: server
10.227.240.10#53: queries sent 1939395, retried or failed 7


The problem I notice, is some pods (pod1, pod2, pod4) are forwarding far
more requests than the other pods (an indication of what other is, would be
pod3). I'm not sure what's caused this seeing as the application is doing
the same across all pods. The only thing I notice here, is that pods 1/2/4
all have a number of "retried or failed", which the other pods don't.
Therefore I wonder if the reason that those pods have sent so many more
requests upstream instead of hitting the cache, is because of something to
do with a "retried or failed", which then stops the cache from working for
a decent period of time? I've not been able to find anything (google)
relating to this scenario but it's the only thing that makes sense to me
right now. I can accept a couple of failures for lookup here and there, but
one failure (if i'm onto something that is), seems to then cause no cache
hits for a large period of time?

Thanks and Best,
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/attachments/20180612/f3f636ac/attachment.html>


More information about the Dnsmasq-discuss mailing list