[Dnsmasq-discuss] [PATCH] Re: Issues with dnsmasq under NM and domain redirection: REFUSED

Petr Menšík pemensik at redhat.com
Tue Oct 31 16:39:29 UTC 2023


I am still not sure what exactly causes this problem, but I have hit it
again. I am now sure it sometimes happens when I disconnect from my
Lenovo docking station and then reconnect to it.

One interesting thing I have found is that it gets unblocked by sending a
simple dig -4 @localhost +tcp fedoraproject.org query. The TCP path seems
to call enumerate_interfaces(0) on every query, which fixes the incorrect
ifindex and unblocks dnsmasq.

I am not sure why the check_servers(0) call from dbus.c does not fix this
reliably. It seems to me it should. It may just be delayed, or run too
soon. I think we can afford to enumerate interfaces on a fatal error,
which results in a REFUSED response anyway.
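
To illustrate the idea, here is a minimal standalone sketch. It is not the
attached patch and not dnsmasq code: struct upstream and refresh_ifindex()
are hypothetical names made up for this example. It only shows how a cached
ifindex can be re-resolved from the interface name with if_nametoindex()
once a bind attempt has failed.

```c
/* Sketch only: not the attached patch and not dnsmasq source code. */
#include <net/if.h>

/* Hypothetical, cut-down stand-in for a per-server record. */
struct upstream {
  char interface[IF_NAMESIZE]; /* interface name, e.g. "enp9s0u1" */
  unsigned int ifindex;        /* cached kernel index, may go stale */
};

/* Re-resolve the cached index from the interface name.
   Returns 1 if the cached value was stale and has been corrected,
   0 if it was already current, -1 if the interface no longer exists
   (the cached value is then left untouched). */
int refresh_ifindex(struct upstream *u)
{
  unsigned int current = if_nametoindex(u->interface);

  if (current == 0)
    return -1;
  if (current == u->ifindex)
    return 0;

  u->ifindex = current;
  return 1;
}
```

A caller could run this after a failed bind and retry once if it returns 1.
Whether that would be enough in dnsmasq itself is an assumption I have not
verified here.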

It runs with these parameters:

/usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts 
--bind-interfaces --pid-file=/run/NetworkManager/dnsmasq.pid 
--listen-address=127.0.0.1 --cache-size=400 --clear-on-reload 
--conf-file=/dev/null --proxy-dnssec 
--enable-dbus=org.freedesktop.NetworkManager.dnsmasq 
--conf-dir=/etc/NetworkManager/dnsmasq.d

But it seems to me local_bind would bind to the interface whether or not
--bind-interfaces or --bind-dynamic is present. So I think the
enumerate_interfaces(0) call should be unconditional in this case as well.

I have created bug #2247269 [1] to track this.

1. https://bugzilla.redhat.com/show_bug.cgi?id=2247269

On 16. 10. 23 15:02, Petr Menšík wrote:
> Hello everyone.
>
> Today I have returned to work, where I am running dnsmasq 2.89 on my
> Fedora 37 laptop. It is configured by NetworkManager through its
> dns=dnsmasq plugin. But when I returned today, I found that resolution
> of any name on our internal network was refused. I dug into what
> dnsmasq was doing. The problem is that it did not recover after a
> while, but kept failing.
>
> It was failing quite often in the local_bind call in random_sock().
> The errno returned was 99 (EADDRNOTAVAIL on Linux). I noticed that it
> had failed to pick up the changed ifindex of the interface it should
> be bound to.
>
> (gdb) bt
> #0  0x00007f53305e7020 in strerror () from /lib64/libc.so.6
> #1  0x00005557a3ec2c4b in random_sock (s=s at entry=0x5557a43fef50) at 
> /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:2511
> #2  0x00005557a3ec62f2 in allocate_rfd 
> (fdlp=fdlp at entry=0x5557a43f5280, serv=serv at entry=0x5557a43fef50)
>     at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:2607
> #3  0x00005557a3ec72dc in forward_query (udpfd=4, 
> udpaddr=0x7ffdb6bfbd30, dst_addr=0x7ffdb6bfbd00, dst_iface=0, 
> header=0x5557a43e03d0, plen=51,
>     limit=0x5557a43e0880 "", now=1697453089, forward=0x5557a43f5230, 
> ad_reqd=1, do_bit=0, fast_retry=0)
>     at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:498
> #4  0x00005557a3ed0ebd in receive_query (now=1697453089, 
> listen=0x5557a43e0cc0) at 
> /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:1869
> #5  check_dns_listeners (now=1697453089) at 
> /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/dnsmasq.c:1845
> #6  0x00005557a3eac9ef in main (argc=<optimized out>, argv=<optimized 
> out>) at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/dnsmasq.c:1266
>
> (gdb) p *$d->servers->next->next->next->next->next->next
> $8 = {flags = 800, domain_len = 14, domain = 0x5557a43f5eb0 
> "brq.redhat.com", next = 0x5557a43ffa10, serial = 6, arrayposn = 23,
>   last_server = -1, addr = {sa = {sa_family = 2, sa_data = 
> "\0005\n&\005\032\226\r\2170S\177\000"}, in = {sin_family = 2, 
> sin_port = 13568,
>       sin_addr = {s_addr = 436545034}, sin_zero = 
> "\226\r\2170S\177\000"}, in6 = {sin6_family = 2, sin6_port = 13568, 
> sin6_flowinfo = 436545034,
>       sin6_addr = {__in6_u = {__u6_addr8 = 
> "\226\r\2170S\177\000\0000\275\001\a\220\000\000", __u6_addr16 = 
> {3478, 12431, 32595, 0, 48432, 1793,
>             144, 0}, __u6_addr32 = {814681494, 32595, 117554480, 
> 144}}}, sin6_scope_id = 3446832640}}, source_addr = {sa = {sa_family = 2,
>       sa_data = "\000\000\000\000\000\000@\274\277\266\375\177\000"}, 
> in = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 0},
>       sin_zero = "@\274\277\266\375\177\000"}, in6 = {sin6_family = 2, 
> sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
>           __u6_addr8 = 
> "@\274\277\266\375\177\000\000@\274\277\266\375\177\000", __u6_addr16 
> = {48192, 46783, 32765, 0, 48192, 46783, 32765, 0},
>           __u6_addr32 = {3066018880, 32765, 3066018880, 32765}}}, 
> sin6_scope_id = 814672583}},
>   interface = "enp9s0u1\000\000\000\000\000\000\000\000", ifindex = 7, 
> sfd = 0x0, tcpfd = 0, edns_pktsz = 1232, pktsz_reduced = 0, queries = 
> 446,
>   failed_queries = 0, nxdomain_replies = 0, retrys = 4, query_latency 
> = 0, mma_latency = 0, forwardtime = 0, forwardcount = 0, uid = 
> 3867576473}
> (gdb) p *$d->servers->next->next->next->next->next->next->next
> $9 = {flags = 800, domain_len = 10, domain = 0x5557a43ff9f0 
> "redhat.com", next = 0x5557a43f5fb0, serial = 7, arrayposn = 25, 
> last_server = -1,
>   addr = {sa = {sa_family = 2, sa_data = 
> "\0005\n&\005\032\226\r\2170S\177\000"}, in = {sin_family = 2, 
> sin_port = 13568, sin_addr = {
>         s_addr = 436545034}, sin_zero = "\226\r\2170S\177\000"}, in6 = 
> {sin6_family = 2, sin6_port = 13568, sin6_flowinfo = 436545034,
>       sin6_addr = {__in6_u = {__u6_addr8 = 
> "\226\r\2170S\177\000\0000\275\001\a\220\000\000", __u6_addr16 = 
> {3478, 12431, 32595, 0, 48432, 1793,
>             144, 0}, __u6_addr32 = {814681494, 32595, 117554480, 
> 144}}}, sin6_scope_id = 3446832640}}, source_addr = {sa = {sa_family = 2,
>       sa_data = "\000\000\000\000\000\000@\274\277\266\375\177\000"}, 
> in = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 0},
>       sin_zero = "@\274\277\266\375\177\000"}, in6 = {sin6_family = 2, 
> sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
>           __u6_addr8 = 
> "@\274\277\266\375\177\000\000@\274\277\266\375\177\000", __u6_addr16 
> = {48192, 46783, 32765, 0, 48192, 46783, 32765, 0},
>           __u6_addr32 = {3066018880, 32765, 3066018880, 32765}}}, 
> sin6_scope_id = 814672583}},
>   interface = "enp9s0u1\000\000\000\000\000\000\000\000", ifindex = 6, 
> sfd = 0x0, tcpfd = 0, edns_pktsz = 1232, pktsz_reduced = 0,
>   queries = 6480, failed_queries = 0, nxdomain_replies = 0, retrys = 
> 134, query_latency = 34, mma_latency = 4414, forwardtime = 0,
>   forwardcount = 0, uid = 3578949556}
>
> $ ip a show dev enp9s0u1
> 7: enp9s0u1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel 
> state UP group default qlen 1000
>     link/ether 00:50:b6:b4:17:b2 brd ff:ff:ff:ff:ff:ff
>     inet 10.43.2.229/24 brd 10.43.2.255 scope global dynamic 
> noprefixroute enp9s0u1
>        valid_lft 56729sec preferred_lft 56729sec
>     inet6 2620:52:0:2b02:b3ba:7320:65f8:1fff/64 scope global dynamic 
> noprefixroute
>        valid_lft 2591999sec preferred_lft 604799sec
>     inet6 fe80::b2f:65c5:d743:524b/64 scope link noprefixroute
>        valid_lft forever preferred_lft forever
>
> The problem seems to be a wrong ifindex for the redhat.com domain,
> while for brq.redhat.com it was refreshed correctly. I am not sure how
> exactly that happened, but I think I have seen it a few times already.
> I am not sure about the exact steps required to reproduce this issue,
> but I think it is related to undocking from Thunderbolt and
> reconnecting again. Has anyone else seen similar behaviour?
>
> It seems to me a call to enumerate_interfaces(0) should have fixed
> this. I wonder whether it would make sense to call it explicitly after
> a local_bind failure. The journal has filled up with this error, so I
> no longer have details about the interface changes:
>
> journalctl -xeu NetworkManager | grep 'failed to bind server socket to 
> enp9s0u1' | wc -l
> 711
>
> Has a similar error been seen in the wild? Is there a fix for it that
> I have failed to find?
>
> Cheers,
> Petr
>
-- 
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Force-interface-enumeration-after-local_bind-failure.patch
Type: text/x-patch
Size: 1552 bytes
Desc: not available
URL: <http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/attachments/20231031/f0dad130/attachment.bin>

