[Dnsmasq-discuss] Infinite loop in dnsmasq v2.86?

Simon Kelley simon at thekelleys.org.uk
Mon Jan 10 22:27:26 UTC 2022


On 10/01/2022 04:12, John Byrne via Dnsmasq-discuss wrote:
>>> 1) I guess you're using DNSSEC, is that correct?
> 
> Yes,

OK, that fits with what I've found, and may explain why I've not seen a
flood of reports about this. Relatively few people are doing DNSSEC
validation.
> 
>>> 2) How difficult is that to reproduce?
> 
> It happened twice in the last week. One interesting thing stood out:  
> all three were to the admanmedia.com domain. The
> messages from one look like this:
> 
> Jan  4 11:20:03 john daemon.info dnsmasq[1078]: query[A] cs.admanmedia.com from 192.168.1.76
> Jan  4 11:20:03 john daemon.info dnsmasq[1078]: forwarded cs.admanmedia.com to 8.8.8.8
> Jan  4 11:20:03 john daemon.info dnsmasq[1078]: dnssec-query[DS] admanmedia.com to 8.8.8.8
> Jan  4 11:20:03 john daemon.info dnsmasq[1078]: query[A] cs.admanmedia.com from 192.168.1.76
> 
> 
> Unfortunately, just doing two digs to the domain does not reproduce
> the problem. I wondered if it had something to do with the second
> request coming too soon, so I modified dig to send two requests, but
> that didn't help.
> 
> 

I think you're on the right lines here.

Andreas's backtrace in the Debian bug report puts the infinite loop at
line 337 of src/forward.c, which looks like this:

        while (forward->blocking_query)
            forward = forward->blocking_query;


That is a nice, tight infinite loop to analyse: it doesn't need many
data structures to make it loop, just a cycle in the
forward->blocking_query linked list.
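To see why a single cycle is enough to hang the daemon, here is a minimal C sketch. It assumes a cut-down struct frec that keeps only the blocking_query pointer (the real structure has many more fields), and it uses Floyd's tortoise-and-hare algorithm, swapped in purely for illustration, to decide whether the while loop quoted above would ever terminate for a given chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down frec: only the field the loop follows. */
struct frec {
  struct frec *blocking_query;
};

/* Floyd's cycle detection. Returns 1 if following blocking_query from
   start revisits a node (so the loop above would spin forever),
   0 if the chain ends in NULL (so the loop terminates). */
static int chain_has_cycle(const struct frec *start)
{
  const struct frec *slow = start, *fast = start;

  while (fast && fast->blocking_query)
    {
      slow = slow->blocking_query;                  /* one step */
      fast = fast->blocking_query->blocking_query;  /* two steps */
      if (slow == fast)
        return 1;
    }
  return 0;
}
```

For a chain a, b, c that ends in NULL it returns 0; pointing c.blocking_query back at b makes it return 1, which is exactly the state in which the dnsmasq loop never exits.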

This code is at line 363 in the git head, and it executes on a
repeated request, as you surmise. Specifically, it runs when the
answer to the A query has arrived but has not yet been returned to the
client, because it has not been validated. The validation is waiting
for the answer to a DS or DNSKEY query that supplies information needed
to do the validation. There's no point in sending the original query
upstream again, since we already have the answer to it; instead we
assume that the query on which validation is blocked has been lost
(either the query or its reply), and retry that. The loop chases down
the list of subsidiary queries to find the last one, which is not
blocked by another query and is therefore awaiting an answer, and
resends that query. This only happens when the original query has been
answered, is repeated by the client, and is awaiting DNSSEC records for
validation.
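The resend-on-repeat step described above could be sketched as follows, again with a hypothetical cut-down struct frec. The hop limit MAX_BLOCKING_DEPTH is an assumption added for illustration, not a dnsmasq constant: it turns a cyclic chain into a NULL return instead of an infinite loop, at the cost of choosing a bound that legitimate chains must never exceed.

```c
#include <assert.h>
#include <stddef.h>

struct frec {
  struct frec *blocking_query;  /* query whose answer this one is waiting for */
};

/* Hypothetical bound; chains longer than this are treated as broken. */
#define MAX_BLOCKING_DEPTH 20

/* Chase the chain of subsidiary queries to the last one, which is not
   blocked by anything and so is the one still awaiting an answer. That
   is the query to resend. Returns NULL if the chain is longer than
   MAX_BLOCKING_DEPTH, which this sketch takes to mean a cycle. */
static struct frec *query_to_resend(struct frec *forward)
{
  int hops = 0;

  while (forward->blocking_query)
    {
      if (++hops > MAX_BLOCKING_DEPTH)
        return NULL;
      forward = forward->blocking_query;
    }
  return forward;
}
```

A bound like this only hides the symptom; the real question is still how the cycle got into the list.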

So the question is: how can that linked list end up with a loop in it?
When a validation attempt returns needing more DS or DNSKEY data, the
DNSSEC code checks whether a query for that data is already in
progress (around line 855 of forward.c), and if it finds one,
piggy-backs the request onto it. The search compares hashes of the
query section of the query packet, which are stored in the struct frec.
That's the obvious place where a loop could form.
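The piggy-backing search, and the point at which it can close a loop, can be modelled in a few lines. This is a hypothetical sketch, not dnsmasq's code: the next list of in-flight queries, HASH_SIZE, lookup_by_hash(), depends_on() and piggyback() are names invented here. The guard in piggyback() refuses to make a query wait on one that already waits on it, directly or indirectly, which is one way of keeping the blocking_query chain acyclic whatever the cause of the bad match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_SIZE 32   /* assumed digest length, e.g. SHA-256 */

/* Hypothetical cut-down frec. */
struct frec {
  unsigned char hash[HASH_SIZE];  /* hash of the query section */
  struct frec *blocking_query;    /* query this one is waiting for */
  struct frec *next;              /* list of all in-flight queries */
};

/* Find an in-flight query whose question hash matches. */
static struct frec *lookup_by_hash(struct frec *list, const unsigned char *hash)
{
  for (; list; list = list->next)
    if (memcmp(list->hash, hash, HASH_SIZE) == 0)
      return list;
  return NULL;
}

/* Return 1 if target is from itself or lies further down its
   blocking_query chain. Assumes the chain is acyclic, which
   piggyback() below maintains. */
static int depends_on(const struct frec *from, const struct frec *target)
{
  for (; from; from = from->blocking_query)
    if (from == target)
      return 1;
  return 0;
}

/* forward needs DS/DNSKEY data whose query hash is `hash`. If a query
   for it is already in flight, make forward wait on it, unless that
   would create a cycle. Returns the query now waited on, or NULL
   (no match, or link refused; what the caller does next is not
   modelled here). */
static struct frec *piggyback(struct frec *list, struct frec *forward,
                              const unsigned char *hash)
{
  struct frec *found = lookup_by_hash(list, hash);

  if (!found || depends_on(found, forward))
    return NULL;
  forward->blocking_query = found;
  return found;
}
```

If a, b and c are in flight with a waiting on b and b waiting on c, a request from c whose hash matches a is refused, because linking it would produce exactly the a, b, c, a cycle that hangs the loop.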

I've come up with three ways that could form a loop.

1) The code which calculates the hash of a query doesn't handle a
malformed packet well. It just returns an undefined hash, which might be
the last hash that was successfully computed. This can lead to matching
a query further up the existing chain of dependencies, and so to a
loop.

2) A hash collision could do the same thing.

3) As the DNSSEC code traverses down the list of dependencies, if it
hits a loop, that will turn into a loop in this linked list.

Say the A record for cs.admanmedia.com depends on the DS record for
cs.admanmedia.com and that depends on the DNSKEY for cs.admanmedia.com
which then requires the DS record for cs.admanmedia.com. This should not
happen for properly signed records, but I'm certainly not sure that the
DNSSEC validation code will never do it given sufficiently broken input.
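Possibility 3) can be illustrated with the example above. The sketch is hypothetical (struct dep, its needed_by pointer and would_loop() are invented for illustration) and only shows the kind of check that would catch such a loop: before issuing a DS or DNSKEY query, walk back up the chain of queries waiting on the current one and refuse if the same name and type is already outstanding. Owner names are compared case-insensitively, as DNS requires; the type numbers are the standard ones.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Standard RR type numbers. */
#define T_A       1
#define T_DS     43
#define T_DNSKEY 48

/* Hypothetical record of one outstanding query in a validation chain. */
struct dep {
  const char *name;              /* owner name being queried */
  int qtype;                     /* T_A, T_DS or T_DNSKEY */
  const struct dep *needed_by;   /* query waiting on this one; NULL at the top */
};

/* DNS names compare case-insensitively (ASCII only here). */
static int names_equal(const char *a, const char *b)
{
  for (; *a && *b; a++, b++)
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return 0;
  return *a == *b;
}

/* Return 1 if (name, qtype) is already outstanding somewhere between
   tail and the original client query, i.e. asking for it again would
   close a loop. */
static int would_loop(const struct dep *tail, const char *name, int qtype)
{
  for (; tail; tail = tail->needed_by)
    if (tail->qtype == qtype && names_equal(tail->name, name))
      return 1;
  return 0;
}
```

With the chain from the example (A, then DS, then DNSKEY, all for cs.admanmedia.com), a second request for the DS record of cs.admanmedia.com is refused, while a DS query for admanmedia.com is allowed.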


I have a fix for 1), and a fix for 3) is quite trivial. I'm not sure 2)
is a real problem, but I think a fix is possible for that too.
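The failure mode in 1) is easy to reproduce in miniature. The sketch below assumes the hash lives in a static buffer that is only overwritten on success; that is an assumption about the shape of the bug, not dnsmasq's actual hashing code. 64-bit FNV-1a stands in for the real digest, and the question parser handles only uncompressed names. hash_question_buggy() hands back the previous digest for a malformed packet; hash_question_fixed() returns NULL instead, so a caller can never match a stale hash against an in-flight query.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN sizeof(uint64_t)

/* 64-bit FNV-1a, standing in for a real cryptographic digest. */
static uint64_t fnv1a(const unsigned char *p, size_t len)
{
  uint64_t h = 14695981039346656037ULL;

  while (len--)
    {
      h ^= *p++;
      h *= 1099511628211ULL;
    }
  return h;
}

/* Length of a well-formed question (uncompressed name, QTYPE, QCLASS)
   at the start of buf, or 0 if it is malformed or truncated. */
static size_t question_len(const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len && buf[i] != 0)
    {
      if (buf[i] & 0xc0)        /* compression pointer or reserved type */
        return 0;
      i += 1 + buf[i];          /* skip length byte and label */
    }
  if (i >= len)                 /* no terminating root label */
    return 0;
  i += 1 + 4;                   /* root label, QTYPE, QCLASS */
  return i <= len ? i : 0;
}

/* Buggy shape: the static digest is only written on success, so a
   malformed packet gets the previous question's hash. */
static const unsigned char *hash_question_buggy(const unsigned char *buf, size_t len)
{
  static unsigned char digest[DIGEST_LEN];
  size_t qlen = question_len(buf, len);

  if (qlen != 0)
    {
      uint64_t h = fnv1a(buf, qlen);
      memcpy(digest, &h, DIGEST_LEN);
    }
  return digest;
}

/* Fixed shape: report failure so no stale hash is ever compared. */
static const unsigned char *hash_question_fixed(const unsigned char *buf, size_t len)
{
  static unsigned char digest[DIGEST_LEN];
  size_t qlen = question_len(buf, len);
  uint64_t h;

  if (qlen == 0)
    return NULL;
  h = fnv1a(buf, qlen);
  memcpy(digest, &h, DIGEST_LEN);
  return digest;
}
```

Hashing a valid question and then a truncated one with the buggy version yields the same digest twice, which is how a malformed reply could be matched to a query further up the dependency chain.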


Later.....


Simon.


