[Dnsmasq-discuss] [PATCH] Retry queries only after giving the upstream server some time to respond

Simon Kelley simon at thekelleys.org.uk
Tue Apr 6 22:18:54 UTC 2021

On 06/04/2021 19:49, Dominik Derigs wrote:
> Hey Simon,
> your patch surely makes sense.
> On Mon, 2021-04-05 at 21:38 +0100, Simon Kelley wrote:
>> Except that this all started because some clients don't retry from the
>> same ID/source port and treating them as a new query that can be
>> answered when the existing query for the same name completes fails
>> because that means dnsmasq never sees retries from this type of client,
>> and it relies on those retries to work in the face of packet loss.
>> https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q1/014697.html
> I see. The "misbehaving" clients out there (a) worked pre-2.83 and (b)
> we cannot rely on them being "fixed".
> I'm intentionally putting the keywords in quotes because of:
> On Mon, 2021-04-05 at 21:38 +0100, Simon Kelley wrote:
>> What's a "real" retry. I'm not sure there's an RFC that says it has to
>> be from the same source port and query-ID, [...]
> Too bad, I figured we could keep up with saving some bandwidth but I
> can perfectly live with the fact that you consider the timeout I
> suggested as a too-risky feature as in it could break user's systems in
> ways which are difficult to define.
> Unless we could get a retry feature baked into dnsmasq sometime in the
> future.

Hmm. The result of small-hours musing on this is the realisation that:

1) Most queries take less than a second to get an answer.
2) No client retries in less than a second.

So, if we see two queries for the same name within a second or so, they
can't be a query and a retry, but must be independent. Two queries for
the same name separated by more than a second are likely to be a query
and a retry, since if the first query had succeeded, the name would be
in the cache and the second query would never be forwarded.

That's not well explained, but the conclusion is that it's safe to
combine queries for the same name if they are only a second or two
apart. Combining queries a second or two apart wins the reduction in
upstream traffic, since repeated queries further apart than that are
suppressed by the cache. So, unless a query or answer is lost upstream,
limiting the combining to a second or two will be as effective as doing
it for longer periods.
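
A minimal sketch of that decision, in Python rather than dnsmasq's own C
and purely for illustration. The names Forwarder, handle_query,
handle_reply and COMBINE_WINDOW are hypothetical and do not correspond
to anything in the dnsmasq source. Restarting the window when a retry is
forwarded is also an assumption of the sketch.

```python
from dataclasses import dataclass, field

# Hypothetical window, in seconds, during which a repeated query for the
# same name is combined with the one already in flight.
COMBINE_WINDOW = 2


@dataclass
class Forwarder:
    cache: set = field(default_factory=set)         # names already answered
    in_flight: dict = field(default_factory=dict)   # name -> time forwarded

    def handle_query(self, name: str, now: float) -> str:
        if name in self.cache:
            # Repeats that arrive after the answer never go upstream.
            return "answer-from-cache"
        started = self.in_flight.get(name)
        if started is not None and now - started < COMBINE_WINDOW:
            # Too soon to be a retry: treat as an independent query and
            # answer it when the in-flight query completes.
            return "combine"
        # No query in flight, or the window has passed: this is probably
        # a retry after loss, so it goes upstream (assumption: the window
        # restarts from here).
        self.in_flight[name] = now
        return "forward"

    def handle_reply(self, name: str) -> None:
        self.in_flight.pop(name, None)
        self.cache.add(name)


f = Forwarder()
print(f.handle_query("example.com", 100.0))  # forward
print(f.handle_query("example.com", 100.4))  # combine
print(f.handle_query("example.com", 103.0))  # forward
f.handle_reply("example.com")
print(f.handle_query("example.com", 103.5))  # answer-from-cache
```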

Unfortunately, the time resolution of the standard dnsmasq code is one
second, so the choice of delay is either 0-1 seconds, or 1-2 seconds. I
think the latter is fine.
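
To illustrate why a one-second clock only offers those two choices, here
is a small Python sketch. It assumes timestamps are truncated to whole
seconds; combine and max_ticks are hypothetical names, not dnsmasq
identifiers. Allowing zero clock ticks between the two queries gives an
effective cutoff somewhere between 0 and 1 seconds, depending on where
in the second the first query arrived. Allowing one tick gives a cutoff
somewhere between 1 and 2 seconds.

```python
import math


def combine(first: float, second: float, max_ticks: int) -> bool:
    """Combine the second query with the first if at most max_ticks
    whole-second clock ticks separate them (hypothetical helper)."""
    return math.floor(second) - math.floor(first) <= max_ticks


# max_ticks=0: queries 0.1s apart are split if a tick falls between them.
print(combine(10.0, 10.9, 0))    # True  (0.9s apart, same second)
print(combine(10.95, 11.05, 0))  # False (0.1s apart, across a tick)

# max_ticks=1: anything under 1s apart is always combined, anything 2s
# or more apart never is, and the range in between depends on timing.
print(combine(10.95, 11.05, 1))  # True  (0.1s apart)
print(combine(10.0, 11.9, 1))    # True  (1.9s apart)
print(combine(10.9, 12.0, 1))    # False (1.1s apart)
```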

I think we should try something like your patch but remove the
configurability, and limit the time to 1-2 seconds.

