[Dnsmasq-discuss] DNSMASQ failing to return SRV records with loss of communication to a single DNS server
Warner, Andrew C [CTO]
Andrew.C.Warner at sprint.com
Tue Aug 14 18:05:52 BST 2018
Issue: We have SIP SRV records for a domain that can be served by two DNS servers in our environment. During testing we noticed that if one of the DNS servers is unreachable, SRV queries sent through dnsmasq time out.
This happens only when the query originates from outside the box where dnsmasq is running. That is, if we issue the SRV query from the dnsmasq server itself, the SRV records are returned; if we issue it from a client VM that is set to resolve queries against our dnsmasq host, the request times out.
Note: some of the information below, such as IP addresses and domain names, has been replaced with xx or changed for security reasons.
Dnsmasq.conf has the following entries, which forward requests for labdomain.net to 10.xx.xx.12 and 10.xx.xx.20:
server=/labdomain.net/10.xx.xx.12
server=/labdomain.net/10.xx.xx.20
The VM making the SRV queries is 10.xx.xx.99.
When we query for an SRV record against our dnsmasq server (10.xx.xx.5) with the unreachable DNS server (10.xx.xx.12) commented out, we receive a response to the SRV query:
#server=/labdomain.net/10.xx.xx.12
server=/labdomain.net/10.xx.xx.20
[labuser at f5-test ~]$ dig srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
;; Truncated, retrying in TCP mode.
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.62.rc1.el6_9.5 <<>> srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14584
;; flags: qr aa; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 5
;; QUESTION SECTION:
;_sip._udp.scscf.sprout.lp.labdomain.net. IN SRV
;; ANSWER SECTION:
_sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-05.labdomain.net.
_sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-01.labdomain.net.
_sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-02.labdomain.net.
_sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-03.labdomain.net.
_sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-04.labdomain.net.
;; ADDITIONAL SECTION:
ovpklp-viscscf-spn-05.labdomain.net. 43200 IN A 10.xx.xx.18
ovpklp-viscscf-spn-01.labdomain.net. 43200 IN A 10.xx.xx.14
ovpklp-viscscf-spn-02.labdomain.net. 43200 IN A 10.xx.xx.15
ovpklp-viscscf-spn-03.labdomain.net. 43200 IN A 10.xx.xx.16
ovpklp-viscscf-spn-04.labdomain.net. 43200 IN A 10.xx.xx.17
;; Query time: 2 msec
;; SERVER: 10.xx.xx.5#53(10.xx.xx.5)
;; WHEN: Mon Aug 13 16:34:40 2018
;; MSG SIZE rcvd: 528
When we run the same query against our dnsmasq server (10.xx.xx.5) with both the good and the unreachable DNS server configured, the SRV query times out, even though 10.xx.xx.20 is fully capable of answering it:
server=/labdomain.net/10.xx.xx.12 <-- not reachable
server=/labdomain.net/10.xx.xx.20
[labuser at f5-test ~]$ dig srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
;; Truncated, retrying in TCP mode.
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.62.rc1.el6_9.5 <<>> srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
;; global options: +cmd
;; connection timed out; no servers could be reached
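Because dig reports truncation and retries over TCP in both runs above, one way to narrow this down might be to test each server over UDP and TCP separately. The following is only a sketch: it builds the dig command lines without running them, the helper name `dig_commands` and the 3-second timeout are illustrative assumptions, and the `xx` placeholders from this post must be replaced with real octets first. The dig options used (`+ignore` to stay on UDP without the TCP retry, `+tcp`, `+time=`, `+tries=`) are standard dig query options.

```python
import shlex

# Placeholder addresses copied from the post; replace "xx" before running.
UPSTREAMS = ["10.xx.xx.12", "10.xx.xx.20"]
DNSMASQ = "10.xx.xx.5"
QNAME = "_sip._udp.scscf.sprout.lp.labdomain.net"


def dig_commands(servers, qname, qtype="SRV", timeout=3):
    """Return one dig argv per (server, transport) pair.

    "+ignore" keeps the query on UDP even if the answer is truncated;
    "+tcp" forces TCP from the start.
    """
    commands = []
    for server in servers:
        for transport in ("+ignore", "+tcp"):
            commands.append([
                "dig", transport, f"+time={timeout}", "+tries=1",
                qtype, qname, f"@{server}",
            ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be pasted on the client VM and on
    # the dnsmasq host; nothing is executed here.
    for argv in dig_commands(UPSTREAMS + [DNSMASQ], QNAME):
        print(shlex.join(argv))
```

Running the printed commands from both the client VM and the dnsmasq host would show whether the timeout is tied to the TCP retry or to one particular server.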
Dnsmasq logging shows:
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: forwarded _sip._udp.scscf.sprout.lp.labdomain.net to 10.xx.xx.12
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: forwarded _sip._udp.scscf.sprout.lp.labdomain.net to 10.xx.xx.20
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: nameserver 10.xx.xx.20 refused to do a recursive query
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5172]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:24 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5173]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:34 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5174]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
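In the log above, only the first query (PID 5161) is followed by "forwarded" lines; the later queries are logged under different PIDs (5172, 5173, 5174) with nothing after them. These may correspond to the child processes dnsmasq forks to handle TCP queries, but that is an assumption, not something the log states. A small sketch for grouping such log lines by PID follows; the function names are illustrative, and the set of "handled" keywords is an assumption about which dnsmasq log words indicate progress on a query.

```python
import re
from collections import defaultdict

# Excerpt copied from the post (addresses already masked).
LOG = """\
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: forwarded _sip._udp.scscf.sprout.lp.labdomain.net to 10.xx.xx.12
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: forwarded _sip._udp.scscf.sprout.lp.labdomain.net to 10.xx.xx.20
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: nameserver 10.xx.xx.20 refused to do a recursive query
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5172]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:24 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5173]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:34 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5174]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
"""

# Captures the PID and the first word of the message, e.g. "query[SRV]".
LINE_RE = re.compile(r"dnsmasq\[(\d+)\]: (\S+)")

# Assumed set of keywords that show a query was acted upon.
HANDLED = {"forwarded", "reply", "cached", "config"}


def events_by_pid(lines):
    """Map each dnsmasq PID to the ordered list of event keywords it logged."""
    by_pid = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            pid, event = match.groups()
            by_pid[pid].append(event)
    return dict(by_pid)


def unforwarded_pids(by_pid):
    """Return PIDs that logged a query but none of the HANDLED keywords."""
    return sorted(
        pid for pid, events in by_pid.items()
        if any(e.startswith("query[") for e in events)
        and not any(e in HANDLED for e in events)
    )


if __name__ == "__main__":
    print(unforwarded_pids(events_by_pid(LOG.splitlines())))
```

For the excerpt above this lists PIDs 5172, 5173 and 5174, which matches the three retries spaced about ten seconds apart.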
I could use some ideas on how to further troubleshoot this issue.
Andy Warner
Telecom Design Engineer
O: 406-752-3330 / M: 913-972-7521
andrew.c.warner at sprint.com