[Dnsmasq-discuss] Corrupted query causing FORMERR?
John Horne
john.horne at plymouth.ac.uk
Sun Aug 20 23:54:48 UTC 2023
On Sun, 2023-08-20 at 23:11 +0100, Simon Kelley wrote:
>
> On 17/08/2023 18:08, John Horne wrote:
> >
> > We have for some time had reports of intermittent DNS query failures. For
> > the servers concerned, a client on the server causes a query to be sent
> > (via resolv.conf) to 127.0.0.1 which is the dnsmasq process. If the query
> > is not in the cache, then it is forwarded to a DNS resolver server running
> > Unbound.
> >
> > I have been running a short script which runs 'dig' every 10 seconds on a
> > name.
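The test script itself is not shown in this message. A minimal sketch of that kind of polling loop, assuming `dig` is on the PATH, might look like the following. The names `parse_status` and `poll`, and the defaults for interval and count, are hypothetical and only for illustration.

```python
import re
import subprocess
import time

# dig prints the response code on its HEADER line, e.g. "status: FORMERR".
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def parse_status(dig_output):
    """Return the status name from dig's HEADER line, or None if absent."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None


def poll(name, server="127.0.0.1", interval=10, count=6):
    """Query `name` via `server` every `interval` seconds and print
    any result whose status is not NOERROR (including missing output)."""
    for _ in range(count):
        result = subprocess.run(
            ["dig", f"@{server}", name],
            capture_output=True,
            text=True,
        )
        status = parse_status(result.stdout)
        if status != "NOERROR":
            print(time.strftime("%Y-%m-%d %H:%M:%S"), name, status)
        time.sleep(interval)
```

Calling `poll("sauopprdwebsite1.blob.core.windows.net")` would then log only the failing lookups, which makes the intermittent FORMERR replies easier to line up against the Unbound log timestamps.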
> >
> > The Unbound server log shows entries such as:
> >
> > ============
> > Aug 16 21:08:48 unbound[1837198:1] query: 10.121.16.84 sauopprdwebsite1.blob.core.windows.net. A IN
> > Aug 16 21:08:48 unbound[1837198:1] reply: 10.121.16.84 - - - FORMERR - - -
> > ============
> >
> > The script/dig output shows
> >
> > ============
> > ; <<>> DiG 9.11.36-RedHat-9.11.36-8.el8_8.1 <<>> sauopprdwebsite1.blob.core.windows.net
> > ;; global options: +cmd
> > ;; Got answer:
> > ;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 59752
> > ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
> >
> > ;; OPT PSEUDOSECTION:
> > ; EDNS: version: 0, flags:; udp: 1232
> > ;; QUESTION SECTION:
> > ;sauopprdwebsite1.blob.core.windows.net. IN A
> >
> > ;; Query time: 1 msec
> > ;; SERVER: 127.0.0.1#53(127.0.0.1)
> > ;; WHEN: Wed Aug 16 21:08:48 BST 2023
> > ;; MSG SIZE rcvd: 67
> > ============
> >
> >
> > Running tcpdump (at a different time) showed that the query seemed to
> > become corrupted. It was still seen as a DNS query, but as can be seen it
> > contains part of a CNAME at the end, which in itself seems to be part of a
> > reply to the query. That is, the CNAME 'blob.dub08' is actually part of a
> > CNAME that is part of the reply for the DNS name
> > 'sauopprdwebsite1.blob.core.windows.net'.
> > Very odd!
> >
> > The tcpdump output was:
> >
> > ===========
> > 17:30:19.344158 IP (tos 0x0, ttl 64, id 53147, offset 0, flags [DF], proto UDP (17), length 107)
> > 10.121.16.84.38190 > 10.120.16.9.domain: [bad udp cksum 0x35b6 -> 0x96f9!] 32966+ [1au] A? sauopprdwebsite1.blob.core.windows.net. ar: sauopprdwebsite1.blob.core.windows.net. CNAME[|domain]
> > 0x0000: 4500 006b cf9b 4000 4011 3599 0a79 1054 E..k..@.@.5..y.T
> > 0x0010: 0a78 1009 952e 0035 0057 35b6 80c6 0120 .x.....5.W5.....
> > 0x0020: 0001 0000 0000 0001 1073 6175 6f70 7072 .........sauoppr
> > 0x0030: 6477 6562 7369 7465 3104 626c 6f62 0463 dwebsite1.blob.c
> > 0x0040: 6f72 6507 7769 6e64 6f77 7303 6e65 7400 ore.windows.net.
> > 0x0050: 0001 0001 c00c 0005 0001 0000 0012 002d ...............-
> > 0x0060: 0462 6c6f 620f 6475 6230 38 .blob.dub08
> >
> > 17:30:29.357617 IP (tos 0x0, ttl 64, id 54580, offset 0, flags [DF], proto UDP (17), length 107)
> > 10.121.16.84.37068 > 10.120.16.9.domain: [bad udp cksum 0x35b6 -> 0xfd34!] 62988+ [1au] A? sauopprdwebsite1.blob.core.windows.net. ar: . OPT UDPsize=4096 (79)
> > 0x0000: 4500 006b d534 4000 4011 3000 0a79 1054 E..k.4@.@.0..y.T
> > 0x0010: 0a78 1009 90cc 0035 0057 35b6 f60c 0120 .x.....5.W5.....
> > 0x0020: 0001 0000 0000 0001 1073 6175 6f70 7072 .........sauoppr
> > 0x0030: 6477 6562 7369 7465 3104 626c 6f62 0463 dwebsite1.blob.c
> > 0x0040: 6f72 6507 7769 6e64 6f77 7303 6e65 7400 ore.windows.net.
> > 0x0050: 0001 0001 0000 2910 0000 0000 0000 0c00 ......).........
> > 0x0060: 0a00 0820 2475 1b72 25a1 0e ....$u.r%..
> > ===========
> >
> > The second tcpdump query output above correctly shows the 'OPT' record, and
> > resolves the query with no problems.
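To make the difference between the two captures explicit, here is a small sketch that decodes the first additional record of each packet straight from the hex words in the dumps above. It assumes a 20-byte IPv4 header (the leading 0x45) followed by an 8-byte UDP header, and that the query carries no answer or authority records, as both headers show. The function names are hypothetical.

```python
import struct

# Hex words copied from the two tcpdump dumps above (IP and UDP headers included).
BAD_HEX = """
4500 006b cf9b 4000 4011 3599 0a79 1054 0a78 1009 952e 0035 0057 35b6 80c6 0120
0001 0000 0000 0001 1073 6175 6f70 7072 6477 6562 7369 7465 3104 626c 6f62 0463
6f72 6507 7769 6e64 6f77 7303 6e65 7400 0001 0001 c00c 0005 0001 0000 0012 002d
0462 6c6f 620f 6475 6230 38
"""
GOOD_HEX = """
4500 006b d534 4000 4011 3000 0a79 1054 0a78 1009 90cc 0035 0057 35b6 f60c 0120
0001 0000 0000 0001 1073 6175 6f70 7072 6477 6562 7369 7465 3104 626c 6f62 0463
6f72 6507 7769 6e64 6f77 7303 6e65 7400 0001 0001 0000 2910 0000 0000 0000 0c00
0a00 0820 2475 1b72 25a1 0e
"""

IP_UDP_HEADER_LEN = 20 + 8  # assumption: IPv4 without options, plus UDP


def skip_name(msg, pos):
    """Return the offset just past the domain name that starts at pos."""
    while True:
        length = msg[pos]
        if length == 0:                # root label ends the name
            return pos + 1
        if length & 0xC0 == 0xC0:      # 2-byte compression pointer ends the name
            return pos + 2
        pos += 1 + length


def first_additional_record(packet_hex):
    """Decode the fixed fields of the first record after the question section."""
    raw = bytes.fromhex("".join(packet_hex.split()))
    msg = raw[IP_UDP_HEADER_LEN:]
    qdcount = struct.unpack("!H", msg[4:6])[0]
    pos = 12                           # DNS header is 12 bytes
    for _ in range(qdcount):
        pos = skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    pos = skip_name(msg, pos)          # owner name of the additional record
    rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", msg[pos:pos + 10])
    return {
        "type": rtype,
        "class": rclass,
        "ttl": ttl,
        "rdlength": rdlength,
        "rdata_present": len(msg) - (pos + 10),
    }
```

Run against the two dumps, the good packet decodes as type 41 (OPT) with class 4096, matching the `UDPsize=4096` that tcpdump prints. The bad packet decodes as type 5 (CNAME) with a declared RDATA length of 45 bytes, of which only 11 are actually present, so a FORMERR from Unbound is unsurprising.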
> >
> >
> > So, for some reason we are seeing corrupted DNS queries coming from dnsmasq
> > to the Unbound server. Anyone any ideas as to what could be causing this or
> > what could be checked?
> >
> > For additional info, the client servers are typically running Rocky 8 Linux
> > in Azure with dnsmasq version 2.79. The Unbound server is a Rocky 9 Linux
> > server in Azure running Unbound version 1.16.
> > I have run the test script on Azure servers, a local VMware server and a
> > local physical server. The Azure servers show many FORMERR failures, the
> > VMware has only shown a few, and so far we have had none from the physical
> > server.
> >
> > Things tried include: disabling the 'edns0' option in the /etc/resolv.conf
> > file; setting the max UDP packet size to 1232; setting the max UDP packet
> > size to 512; using dnsmasq version 2.89. These all failed in that we still
> > received FORMERR replies.
> > The only option so far that has worked is to disable the use of dnsmasq via
> > 127.0.0.1, and let each server send queries direct to the Unbound server.
> > This has caused no FORMERR replies.
> >
> >
>
> To recap:
>
> Dnsmasq is somehow mangling a query before sending it to the upstream
> server.
>
So it seems.
> This behaviour is not consistent: it doesn't always happen.
>
Correct.
> I'm not clear if this is just for the particular query in the example,
> or happens for others.
>
It happens for others too. For example, www.prospects.ac.uk,
we-jobruntimedata-prod-su1.azure-automation.net, collector.newrelic.com.
> It seems to be associated with virtual servers.
>
So it seems.
>
> What would be useful is to get packet dumps of the relevant packets: the
> query going into dnsmasq, the query going upstream from dnsmasq to
> unbound, the reply from unbound and the reply from dnsmasq to the
> original requestor. Possibly the easiest way to do that would be use the
> packet-dump facility built in to dnsmasq
>
> --dumpfile=<path/to/file>
> --dumpmask=0x000F
>
> and send me the resulting file.
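For reference, the mask 0x000F appears to select exactly the four packet types listed above. The sketch below spells out the low four bits as I understand them from the dnsmasq man page; the bit meanings are an assumption to be checked against the man page for the version in use, and `describe_mask` is a hypothetical helper.

```python
# Assumed meanings of the low four --dumpmask bits (verify against the
# dnsmasq man page for the installed version).
DUMP_TYPES = {
    0x0001: "DNS queries from clients",
    0x0002: "DNS replies to clients",
    0x0004: "DNS queries to upstream",
    0x0008: "DNS replies from upstream",
}


def describe_mask(mask):
    """List the packet types selected by a --dumpmask value."""
    return [label for bit, label in DUMP_TYPES.items() if mask & bit]
```

With these assumptions, `describe_mask(0x000F)` returns all four entries, while a mask of 0x0004 alone would capture only the queries that dnsmasq sends upstream to Unbound.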
>
I'll see about getting this for you.
John.
--
John Horne | Senior Operations Analyst | Technology and Information Services
University of Plymouth | Drake Circus | Plymouth | Devon | PL4 8AA | UK