[Dnsmasq-discuss] DNSMASQ failing to return SRV records with loss of communication to a single DNS server

Warner, Andrew C [CTO] Andrew.C.Warner at sprint.com
Tue Aug 14 18:05:52 BST 2018


Subject: DNSMASQ failing to return SRV records with loss of communication to a single DNS server

Issue: We have SIP SRV records for a domain that can be served by two DNS servers in our environment. During testing we noticed that if one of the DNS servers is unreachable, a request for the SRV records via dnsmasq times out.


This only happens when the query originates from outside the box where dnsmasq is running. That is, if we issue the SRV query from the dnsmasq server itself, the SRV records are returned. If we issue the request from a client VM that is set to resolve queries against our dnsmasq host, the request times out.



Note: some of the information below, such as IP addresses and domain names, has been replaced with xxx for security reasons.



dnsmasq.conf has the following entries, which forward requests for labdomain.net to 10.xx.xx.12 and 10.xx.xx.20:

server=/labdomain.net/10.xx.xx.12
server=/labdomain.net/10.xx.xx.20



The VM making the SRV queries is 10.xx.xx.99.





When we query for an SRV record with 10.xx.xx.5 as our dnsmasq server, and the unreachable DNS server (10.xx.xx.12) commented out, we receive a response to the SRV query.

#server=/labdomain.net/10.xx.xx.12
server=/labdomain.net/10.xx.xx.20





[labuser@f5-test ~]$ dig srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
;; Truncated, retrying in TCP mode.

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.62.rc1.el6_9.5 <<>> srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14584
;; flags: qr aa; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 5

;; QUESTION SECTION:
;_sip._udp.scscf.sprout.lp.labdomain.net. IN SRV

;; ANSWER SECTION:
_sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-05.labdomain.net.
_sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-01.labdomain.net.
_sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-02.labdomain.net.
_sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-03.labdomain.net.
_sip._udp.scscf.sprout.lp.labdomain.net. 15 IN SRV 10 50 5054 ovpklp-viscscf-spn-04.labdomain.net.

;; ADDITIONAL SECTION:
ovpklp-viscscf-spn-05.labdomain.net. 43200 IN A 10.xx.xx.18
ovpklp-viscscf-spn-01.labdomain.net. 43200 IN A 10.xx.xx.14
ovpklp-viscscf-spn-02.labdomain.net. 43200 IN A 10.xx.xx.15
ovpklp-viscscf-spn-03.labdomain.net. 43200 IN A 10.xx.xx.16
ovpklp-viscscf-spn-04.labdomain.net. 43200 IN A 10.xx.xx.17

;; Query time: 2 msec
;; SERVER: 10.xx.xx.5#53(10.xx.xx.5)
;; WHEN: Mon Aug 13 16:34:40 2018
;; MSG SIZE  rcvd: 528





When we query for an SRV record with 10.xx.xx.5 as our dnsmasq server, and both the good and the unreachable DNS servers configured, the SRV query times out. In this case, 10.xx.xx.20 is fully capable of responding to the SRV query.

server=/labdomain.net/10.xx.xx.12        <-- not reachable
server=/labdomain.net/10.xx.xx.20



[labuser@f5-test ~]$ dig srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
;; Truncated, retrying in TCP mode.

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.62.rc1.el6_9.5 <<>> srv _sip._udp.scscf.sprout.lp.labdomain.net @10.xx.xx.5
;; global options: +cmd
;; connection timed out; no servers could be reached


Dnsmasq logging shows:

Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: forwarded _sip._udp.scscf.sprout.lp.labdomain.net to 10.xx.xx.12
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: forwarded _sip._udp.scscf.sprout.lp.labdomain.net to 10.xx.xx.20
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: nameserver 10.xx.xx.20 refused to do a recursive query
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5172]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:24 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5173]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:34 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5174]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
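
In the log above, the first query is logged by pid 5161 and is followed by "forwarded" lines, while the three later queries are logged by pids 5172, 5173 and 5174 with no "forwarded" line at all. dnsmasq handles each TCP connection in a forked child process, so those later pids are probably dig's TCP retries after the truncated UDP reply, but that reading is an assumption. A minimal Python sketch for pulling this pattern out of a larger log follows; the function names are made up for illustration, and the regular expression assumes the syslog layout shown above.

```python
import re
from collections import defaultdict

# Matches the syslog-style lines shown above: "... dnsmasq[PID]: message"
LINE_RE = re.compile(r"dnsmasq\[(\d+)\]: (.*)$")


def group_by_pid(lines):
    """Return {pid: [messages]} in the order the lines appear."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            groups[int(match.group(1))].append(match.group(2))
    return dict(groups)


def pids_without_forward(groups):
    """PIDs that logged a query but never logged a 'forwarded' line."""
    return sorted(
        pid
        for pid, msgs in groups.items()
        if any(m.startswith("query[") for m in msgs)
        and not any(m.startswith("forwarded ") for m in msgs)
    )


# The log excerpt from this message, verbatim.
SAMPLE = """\
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: forwarded _sip._udp.scscf.sprout.lp.labdomain.net to 10.xx.xx.12
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: forwarded _sip._udp.scscf.sprout.lp.labdomain.net to 10.xx.xx.20
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5161]: nameserver 10.xx.xx.20 refused to do a recursive query
Aug 14 16:22:14 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5172]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:24 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5173]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
Aug 14 16:22:34 vsmslp-az2-dev-dnsmasq1-mgt dnsmasq[5174]: query[SRV] _sip._udp.scscf.sprout.lp.labdomain.net from 10.xx.xx.99
""".splitlines()

if __name__ == "__main__":
    groups = group_by_pid(SAMPLE)
    for pid in pids_without_forward(groups):
        print(pid, groups[pid])
```

Run against the full dnsmasq log, this lists every process that accepted a query but never logged forwarding it, which may help narrow down where the request stalls.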


I could use some ideas on how to further troubleshoot this issue.
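
One check that might help narrow this down is to query each upstream server directly over TCP from the dnsmasq host. The successful reply above was 528 bytes and was retried over TCP after truncation, so the TCP path is the one that matters here. The sketch below uses only the Python standard library; build_query, recv_exact and probe_tcp are hypothetical helper names, not part of dnsmasq or dig, and the 3-second timeout is an arbitrary choice.

```python
import socket
import struct

SRV = 33  # DNS RR type number for SRV records


def build_query(name, qid=0x1234, qtype=SRV, recurse=True):
    """Build a minimal DNS query message (no EDNS) for `name`."""
    flags = 0x0100 if recurse else 0x0000  # RD (recursion desired) bit
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def recv_exact(sock, count):
    """Read exactly `count` bytes from a connected socket."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        buf += chunk
    return buf


def probe_tcp(server, name, timeout=3.0, recurse=True):
    """Send one SRV query over TCP; return (rcode, answer_count)."""
    query = build_query(name, recurse=recurse)
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        # DNS over TCP prefixes each message with a 2-byte length.
        sock.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        reply = recv_exact(sock, length)
    _qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return flags & 0x000F, ancount  # rcode 0 = NOERROR, 5 = REFUSED
```

Calling probe_tcp(upstream, "_sip._udp.scscf.sprout.lp.labdomain.net") once per configured upstream address would show how long a connection attempt to the unreachable server blocks and whether the reachable one answers over TCP. Since the log reports that 10.xx.xx.20 "refused to do a recursive query", it may also be worth comparing recurse=True with recurse=False against that server.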




Andy Warner
Telecom Design Engineer
O: 406-752-3330 / M: 913-972-7521
andrew.c.warner at sprint.com