[Dnsmasq-discuss] Client retries broken in 2.84

Simon Kelley simon at thekelleys.org.uk
Mon Feb 22 23:20:46 UTC 2021


On 22/02/2021 23:04, Nicholas Mu wrote:
> Hi Simon,
> 
> The commit fixes all the issues we were seeing. Thanks for getting the
> fix out so quickly. 

Excellent. I just pushed my tree, which has a further small update to
this. I've been dogfooding it here and all seems well.

> 
> I had one follow up. So now it seems that for all clients retries will
> use the same SP/QID. Would it be possible to have a way/config to vary
> SP on retries or are we stuck with a single SP due to the CVE? The
> reason we'd prefer varying SPs is mostly due to flow hashing. Say
> dnsmasq is configured with a single upstream nameserver. That means any
> retries will use the same 5-tuple and retries will follow the same
> network path. If some paths in the network have an outage then we are
> stuck on that path for all retries.  In general, we find better DNS
> availability when SP varies across retries and we can traverse different
> paths on the network. Wondering if you had any thoughts on this? 
> 

Rock and hard place. The CVE aims to avoid exactly what you want,
because the more different SP/QID combinations that are valid for a
given DNS query, the easier it is for an attacker to get an answer
accepted into the cache when spraying large numbers of random SP/QID
combos at the DNS server. Using different SP/QID on retries  allows the
attacker to send lots of identical queries/retries and therefore make
his life easier.

<thinks> I guess you could assign a new SP for retries, IF you stopped
accepting answers on the old one. There are interesting fairness
problems there:

client1 asks for example.com, forwarded upstream, reply about to return,
when client2 asks for example.com, that gets forwarded upstream, the
SP/QID used with client1 gets abandoned and client1 has to await
client2s reply. In the meantime client3 ask for example.com......


Cheers,

Simon.


> Thanks,
> Nick
> 
> On Wed, Feb 17, 2021 at 4:03 PM Simon Kelley <simon at thekelleys.org.uk
> <mailto:simon at thekelleys.org.uk>> wrote:
> 
>     On 16/02/2021 00:42, Nicholas Mu wrote:
>     > Hi, 
>     >
>     > I noticed a low level increase in DNS errors after upgrading to 2.84.
>     > After doing some packet diving, it seems that retries behave
>     differently
>     > in the new version. For my testing, I'm using dnspython but I believe
>     > this issue would affect any client that uses different source
>     ports and
>     > query ids for retries. As a result, dnspython will attempt retries for
>     > up to 30 seconds and will eventually timeout as only a single
>     packet is
>     > ever sent and retries are rendered ineffective. 
>     >
>     > On 2.82, multiple packets are sent as dnspython retries. Note the
>     > retries are using different source ports and query ids:
>     >
>     > |[ec2-user at ip-172-31-44-29 src]$ grep cell-1 /tmp/dnsmasq-2.82
>     > 19:59:03.826638 IP 172.31.44.29.44547 > 172.31.0.2.53: 51880+ NS?
>     > somedomain. (64)
>     > 19:59:05.928335 IP 172.31.44.29.33363 > 172.31.0.2.53: 41382+ NS?
>     > somedomain. (64)
>     > 19:59:08.130620 IP 172.31.44.29.21177 > 172.31.0.2.53: 36073+ NS?
>     > somedomain. (64)
>     > 19:59:10.532792 IP 172.31.44.29.57223 > 172.31.0.2.53: 50309+ NS?
>     > somedomain. (64)|
>     > |
>     > |
>     > |On 2.84, only a single packet is sent:|
>     > |
>     > |
>     > |[ec2-user at ip-172-31-44-29 src]$ grep cell-1 /tmp/dnsmasq-2.84
>     > 19:53:12.189849 IP 172.31.44.29.5335 > 172.31.0.2.53: 826+ NS?
>     > somedomain. (64)|
>     > |
>     > |
>     > I also tested using dig, nslookup, and host which all use the same
>     > source port and query id on retries. The behavior works as intended on
>     > both versions. I would suspect the following commit is responsible for
>     > this behavior change:
>     >
>     >       Handle multiple identical near simultaneous DNS queries better.
>     >       Previously, such queries would all be forwarded
>     >       independently. This is, in theory, inefficent but in practise
>     >       not a problem, _except_ that is means that an answer for any
>     >       of the forwarded queries will be accepted and cached.
>     >       An attacker can send a query multiple times, and for each
>     repeat,
>     >       another {port, ID} becomes capable of accepting the answer he is
>     >       sending in the blind, to random IDs and ports. The chance of a
>     >       succesful attack is therefore multiplied by the number of
>     repeats
>     >       of the query. The new behaviour detects repeated queries and
>     >       merely stores the clients sending repeats so that when the
>     >       first query completes, the answer can be sent to all the
>     >       clients who asked. Refer: CVE-2020-25686.
>     >
>     > Is this intended? Seems to me any clients with retry behavior
>     similar to
>     > dnspython are now broken. Clients will hang until their configured
>     > timeouts are reached on any single DNS failure.
>     >
>     > Thanks,
>     >
>     > Nick
>     >
> 
> 
>     Your analysis is spot-on.
> 
>     I think it's possible to satisfy both the security and robustness
>     requirements here.
> 
>     Pre 2.84, a retry for the same query with different query-id and/or
>     source port would be treated as an independent query and forwarded
>     again, with a new source-port and query-id. This gives the attacker the
>     ability to increase the attack surface for cache pollution by sending
>     many repeat queries.
> 
>     In 2.84 a repeat with the same SP/QID gets treated as it always has: as
>     a retry, and the query gets forwarded again, and this time to all
>     available servers.  The same query but with different SP/QID now gets
>     piggy-backed onto the existing query, as you noted.
> 
>     The solution here is for a repeat query with different SP/QID to trigger
>     the same retry behaviour as a repeat query with the same SP/QID, but
>     also to still piggy back the existing query, so that no new SP/QID
>     tuples are generated going upstream.
> 
>     I just pushed
>     http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=141a26f979b4bc959d8e866a295e24f8cf456920
>     which should implement this. Please test!
> 
> 
>     Cheers,
> 
>     Simon.
> 
> 
> 
> 
>     _______________________________________________
>     Dnsmasq-discuss mailing list
>     Dnsmasq-discuss at lists.thekelleys.org.uk
>     <mailto:Dnsmasq-discuss at lists.thekelleys.org.uk>
>     http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
> 




More information about the Dnsmasq-discuss mailing list