[Dnsmasq-discuss] Corrupted query causing FORMERR?

John Horne john.horne at plymouth.ac.uk
Thu Aug 17 17:08:28 UTC 2023


Hello,

We have for some time had reports of intermittent DNS query failures. For the
servers concerned, a client on the server causes a query to be sent (via
resolv.conf) to 127.0.0.1 which is the dnsmasq process. If the query is not in
the cache, then it is forwarded to a DNS resolver server running Unbound.

I have been running a short script which runs 'dig' every 10 seconds on a name.
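
For illustration only, such a test loop might look roughly like the sketch below. This is not the actual script used; the names parse_status and poll are hypothetical, and it assumes 'dig' is on the PATH and that dnsmasq is listening on 127.0.0.1.

```python
import re
import subprocess
import time

# Matches the status field of dig's ";; ->>HEADER<<- ..." line.
STATUS_RE = re.compile(r"->>HEADER<<-.*?status:\s*([A-Z]+)")


def parse_status(dig_output):
    """Return the response status (e.g. 'NOERROR', 'FORMERR') or None."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None


def poll(name, server="127.0.0.1", interval=10, count=None):
    """Query `name` every `interval` seconds and print any non-NOERROR status.

    Runs forever when count is None (hypothetical helper, assumes dig exists).
    """
    done = 0
    while count is None or done < count:
        result = subprocess.run(
            ["dig", "@" + server, name, "A"],
            capture_output=True,
            text=True,
        )
        status = parse_status(result.stdout)
        if status != "NOERROR":
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, name, status)
        done += 1
        time.sleep(interval)
```

Calling poll("sauopprdwebsite1.blob.core.windows.net") would then log a line for each failed lookup.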

The Unbound server shows entries such as:

============
Aug 16 21:08:48 unbound[1837198:1] query: 10.121.16.84
sauopprdwebsite1.blob.core.windows.net. A IN
Aug 16 21:08:48 unbound[1837198:1] reply: 10.121.16.84 - - - FORMERR - - -
============

The script/dig output shows:

============
; <<>> DiG 9.11.36-RedHat-9.11.36-8.el8_8.1 <<>>
sauopprdwebsite1.blob.core.windows.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 59752
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;sauopprdwebsite1.blob.core.windows.net.        IN A

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Aug 16 21:08:48 BST 2023
;; MSG SIZE  rcvd: 67
============


Running tcpdump (at a different time) showed that the query seemed to become
corrupted. It was still seen as a DNS query, but as can be seen below it
contains part of a CNAME record at the end, which itself seems to be part of a
reply to the query. That is, the name 'blob.dub08' is the start of a CNAME that
is part of the reply for the DNS name 'sauopprdwebsite1.blob.core.windows.net'.
Very odd!

The tcpdump output was:

===========
17:30:19.344158 IP (tos 0x0, ttl 64, id 53147, offset 0, flags [DF], proto UDP
(17), length 107)
10.121.16.84.38190 > 10.120.16.9.domain: [bad udp cksum 0x35b6 -> 0x96f9!]
32966+ [1au] A? sauopprdwebsite1.blob.core.windows.net. ar:
sauopprdwebsite1.blob.core.windows.net. CNAME[|domain]
0x0000: 4500 006b cf9b 4000 4011 3599 0a79 1054 E..k..@.@.5..y.T
0x0010: 0a78 1009 952e 0035 0057 35b6 80c6 0120 .x.....5.W5.....
0x0020: 0001 0000 0000 0001 1073 6175 6f70 7072 .........sauoppr
0x0030: 6477 6562 7369 7465 3104 626c 6f62 0463 dwebsite1.blob.c
0x0040: 6f72 6507 7769 6e64 6f77 7303 6e65 7400 ore.windows.net.
0x0050: 0001 0001 c00c 0005 0001 0000 0012 002d ...............-
0x0060: 0462 6c6f 620f 6475 6230 38 .blob.dub08

17:30:29.357617 IP (tos 0x0, ttl 64, id 54580, offset 0, flags [DF], proto UDP
(17), length 107)
10.121.16.84.37068 > 10.120.16.9.domain: [bad udp cksum 0x35b6 -> 0xfd34!]
62988+ [1au] A? sauopprdwebsite1.blob.core.windows.net. ar: . OPT UDPsize=4096
(79)
0x0000: 4500 006b d534 4000 4011 3000 0a79 1054 E..k.4@.@.0..y.T
0x0010: 0a78 1009 90cc 0035 0057 35b6 f60c 0120 .x.....5.W5.....
0x0020: 0001 0000 0000 0001 1073 6175 6f70 7072 .........sauoppr
0x0030: 6477 6562 7369 7465 3104 626c 6f62 0463 dwebsite1.blob.c
0x0040: 6f72 6507 7769 6e64 6f77 7303 6e65 7400 ore.windows.net.
0x0050: 0001 0001 0000 2910 0000 0000 0000 0c00 ......).........
0x0060: 0a00 0820 2475 1b72 25a1 0e ....$u.r%..
===========

The second tcpdump query output above correctly shows the 'OPT' record, and
resolves the query with no problems.
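
As a rough cross-check, the two DNS payloads can be decoded by hand. The sketch below copies the bytes following the 20-byte IP header and 8-byte UDP header from the two hex dumps above and reads the first record of the additional section. The helper names (skip_name, describe) are hypothetical, and the parser assumes the answer and authority sections are empty, which is the case for both captured queries.

```python
import struct

# DNS payloads (offset 28 onwards) copied from the two tcpdump hex dumps.
BAD = bytes.fromhex(
    "80c6 0120"
    "0001 0000 0000 0001 1073 6175 6f70 7072"
    "6477 6562 7369 7465 3104 626c 6f62 0463"
    "6f72 6507 7769 6e64 6f77 7303 6e65 7400"
    "0001 0001 c00c 0005 0001 0000 0012 002d"
    "0462 6c6f 620f 6475 6230 38"
)
GOOD = bytes.fromhex(
    "f60c 0120"
    "0001 0000 0000 0001 1073 6175 6f70 7072"
    "6477 6562 7369 7465 3104 626c 6f62 0463"
    "6f72 6507 7769 6e64 6f77 7303 6e65 7400"
    "0001 0001 0000 2910 0000 0000 0000 0c00"
    "0a00 0820 2475 1b72 25a1 0e"
)


def skip_name(buf, off):
    """Return the offset just past the (possibly compressed) name at off."""
    while True:
        length = buf[off]
        if length == 0:
            return off + 1          # root label terminates the name
        if (length & 0xC0) == 0xC0:
            return off + 2          # compression pointer is 2 bytes and ends the name
        off += 1 + length


def describe(payload):
    """Summarise the header and the first additional-section record."""
    ident, _flags, qd, _an, _ns, ar = struct.unpack_from("!6H", payload, 0)
    off = 12
    for _ in range(qd):
        off = skip_name(payload, off) + 4   # skip QNAME, QTYPE and QCLASS
    # Assumption: ANCOUNT and NSCOUNT are 0, so the next record is additional.
    off = skip_name(payload, off)
    rtype, _rclass, _ttl, rdlength = struct.unpack_from("!HHIH", payload, off)
    off += 10
    return {
        "id": ident,
        "arcount": ar,
        "type": rtype,                       # 5 = CNAME, 41 = OPT
        "rdlength": rdlength,
        "rdata_available": len(payload) - off,
    }


for label, payload in (("bad", BAD), ("good", GOOD)):
    print(label, len(payload), describe(payload))
```

Both payloads are 79 bytes long. In the good query the additional record is type 41 (OPT) with 12 bytes of RDATA, all present. In the bad query the same position holds a type 5 (CNAME) record that declares 45 bytes of RDATA while only 11 bytes remain in the packet, which would be consistent with Unbound answering FORMERR.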


So, for some reason we are seeing corrupted DNS queries coming from dnsmasq to
the Unbound server. Does anyone have any ideas as to what could be causing this,
or what could be checked?

For additional info, the client servers are typically running Rocky 8 Linux in
Azure with dnsmasq version 2.79. The Unbound server is a Rocky 9 Linux server
in Azure running Unbound version 1.16.
I have run the test script on Azure servers, a local VMware server and a local
physical server. The Azure servers show many FORMERR failures, the VMware server
has shown only a few, and so far we have had none from the physical server.

Things tried include: disabling the 'edns0' option in the /etc/resolv.conf
file; setting the max UDP packet size to 1232; setting the max UDP packet size
to 512; and using dnsmasq version 2.89. These all failed, in that we still
received FORMERR replies.
The only option so far that has worked is to disable the use of dnsmasq via
127.0.0.1, and let each server send queries direct to the Unbound server. This
has caused no FORMERR replies.



Thanks,

John.

--
John Horne | Senior Operations Analyst | Technology and Information Services
University of Plymouth | Drake Circus | Plymouth | Devon | PL4 8AA | UK