[Dnsmasq-discuss] [PATCH] Allow expired RRSIGs when stale caching is enabled
Dominik Derigs
dl6er at dl6er.de
Sat May 16 07:51:09 UTC 2026
Hi Simon,
Pi-hole enables use-stale-cache by default to reduce DNS latency for
clients - they get an instant stale answer while dnsmasq refreshes in
the background.
When this feature is combined with DNSSEC validation and a caching
upstream resolver like unbound (Pi-hole + local unbound is a very
popular combination), use-stale-cache creates a problematic interaction
with RRSIG timestamp checking:
1. Unbound caches a response with its RRSIGs at time T and validates
   them (RRSIGs are valid at that point)
2. Later, dnsmasq queries unbound; unbound returns the cached response
   whose RRSIG has since expired
3. validate_rrset() finds sig_expiration < curtime, skips all
   signatures, and returns BOGUS with EDE "signature expired" (7)
4. Every concurrent query for that domain fails until unbound refreshes
   its cache
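To make step 3 concrete, here is a minimal sketch of a strict RRSIG validity-window check. It is not the dnsmasq code: check_sig_window() and serial_gt() are hypothetical names used only for illustration, and the only assumption is that the timestamps are compared with 32-bit serial number arithmetic as RFC 4034 describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* RRSIG inception/expiration are 32-bit timestamps that may wrap, so they
   are compared with serial number arithmetic (RFC 1982): a is "after" b
   when the wrapped difference a - b lies in (0, 2^31). */
static bool serial_gt(uint32_t a, uint32_t b)
{
  return a != b && (uint32_t)(a - b) < 0x80000000u;
}

enum sig_time_status { SIG_TIME_OK, SIG_NOT_YET_VALID, SIG_EXPIRED };

/* Hypothetical helper: strict check of the signature validity window.
   A signature failing this check is skipped before any cryptographic
   verification is attempted. */
static enum sig_time_status check_sig_window(uint32_t now,
                                             uint32_t inception,
                                             uint32_t expiration)
{
  if (serial_gt(inception, now))
    return SIG_NOT_YET_VALID;   /* signature is from the future */
  if (serial_gt(now, expiration))
    return SIG_EXPIRED;         /* the case hit in step 3 */
  return SIG_TIME_OK;
}
```

With a check like this, a response that unbound served from its expired cache fails on the second branch no matter whether the signature itself is cryptographically sound.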
Popular resolvers like unbound enable serve-expired with no time limit
by default, so RRSIGs can be days past their expiration timestamp by the
time dnsmasq sees them. In various user-submitted logs (there is no
telemetry; users opt in to sending them so we can help them solve
issues more easily) we often saw bursts of dozens of BOGUS results
across many DNSSEC-signed
domains (slack.com, startpage.com, paypal.com, computer-bild.de, ...)
with the same EDE code 7, all clearing up after a few seconds once the
upstream cache refreshes. During the failure window the domains are
completely unreachable for clients.
The core tension is that use-stale-cache deliberately trades freshness
for availability on the data plane (is_expired() in cache.c serves stale
records past their TTL), but the RRSIG timestamp check in
validate_rrset() enforces strict freshness on the validation plane. When
a caching upstream sits between dnsmasq and the authoritative servers,
these two policies conflict - enforcing strict RRSIG freshness on top of
deliberately stale data is contradictory.
I was able to confirm this issue myself easily with such a local
unbound setup (serve-expired with no time limit). The key is to pick a
DNSSEC-protected website that you don't usually visit. Visit it once
(which puts it in unbound's cache), do other things for two days, then
visit it again. You will then observe what I described here. In my own
observations, the RRSIGs were between ~25,000 s (~7 h) and ~40,000 s
(~11 h) past their expiration. I first considered reusing dnsmasq's
stale timeout setting as the limit, but in practice this proved
insufficient, as external resolvers may give us RRSIGs that expired
much longer ago.
The attached patch resolves this intermittent error by skipping the
RRSIG expiration timestamp check when use-stale-cache is enabled
(cache_max_expiry != 0). The actual cryptographic signature check still
runs in full - only the timestamp gate is relaxed. When stale caching is
disabled (the default), behaviour is completely unchanged. Hence, this
patch should only make the dnsmasq DNSSEC implementation more
consistent, not weaker.
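For readers without the attachment at hand, the idea can be sketched roughly as follows. This is an illustration, not the patch itself: rrsig_time_acceptable() and serial_after() are hypothetical names, the stale_cache_enabled parameter stands in for the cache_max_expiry != 0 condition mentioned above, and keeping the inception check in place is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Serial number comparison (RFC 1982) for 32-bit RRSIG timestamps:
   a is "after" b when the wrapped difference a - b lies in (0, 2^31). */
static bool serial_after(uint32_t a, uint32_t b)
{
  return a != b && (uint32_t)(a - b) < 0x80000000u;
}

/* Hypothetical helper: decide whether a signature may proceed to the
   cryptographic check. stale_cache_enabled is an illustrative stand-in
   for "use-stale-cache is on" (cache_max_expiry != 0). */
static bool rrsig_time_acceptable(uint32_t now,
                                  uint32_t inception,
                                  uint32_t expiration,
                                  bool stale_cache_enabled)
{
  /* Assumption of this sketch: a signature that is not yet valid is
     still rejected in both modes. */
  if (serial_after(inception, now))
    return false;

  /* Only the expiration gate is relaxed, and only with stale caching. */
  if (!stale_cache_enabled && serial_after(now, expiration))
    return false;

  return true;
}
```

A signature accepted here still has to pass the full cryptographic verification afterwards; the helper only decides whether that verification is attempted.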
Cheers,
Dominik
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Allow-expired-RRSIGs-when-stale-caching-is-enabled.patch
Type: text/x-patch
Size: 2639 bytes
Desc: not available
URL: <http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/attachments/20260516/5a6aa13e/attachment.bin>