[Dnsmasq-discuss] [PATCH] Re: Issues with dnsmasq under NM and domain redirection: REFUSED

Simon Kelley simon at thekelleys.org.uk
Mon Nov 27 23:51:46 UTC 2023



On 31/10/2023 16:39, Petr Menšík wrote:
> I am still not sure what exactly causes this problem, but I have hit it
> again. I am sure it happens at least sometimes when I disconnect from my
> Lenovo docking station and then reconnect to it.
> 
> Interestingly, I have found that it gets unblocked by sending a simple
> dig -4 @localhost +tcp fedoraproject.org query. A TCP query seems to
> call enumerate_interfaces(0) on every query, which fixes the incorrect
> ifindex and unblocks dnsmasq.
> 
> I am not sure why the check_servers(0) call from dbus.c does not fix
> this reliably. It seems to me it should; perhaps it is delayed, or runs
> too soon. I think we can afford to enumerate interfaces on a fatal
> error, which results in a REFUSED response anyway.
> 
> It runs with these parameters:
> 
> /usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts \
>   --bind-interfaces --pid-file=/run/NetworkManager/dnsmasq.pid \
>   --listen-address=127.0.0.1 --cache-size=400 --clear-on-reload \
>   --conf-file=/dev/null --proxy-dnssec \
>   --enable-dbus=org.freedesktop.NetworkManager.dnsmasq \
>   --conf-dir=/etc/NetworkManager/dnsmasq.d
> 
> But it seems to me that local_bind would bind to the interface whether
> --bind-interfaces or --bind-dynamic is present. So I think the
> enumerate_interfaces(0) call should not be conditional in this case
> either.

If that's sufficient to fix this bug, then I can't see a reason not to
make the change. The other way to fix it is to
s/--bind-interfaces/--bind-dynamic/. That may be a better fix, since
there are platforms which can't enumerate interfaces, and on those the
problem will still be there. At least if you set --bind-dynamic on such a
platform, dnsmasq will warn you as it falls back to bind-interfaces
behaviour.

Cheers,

Simon.
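A minimal C sketch of the retry idea discussed in this thread, assuming Linux/glibc: when binding an upstream socket fails with errno 99 (EADDRNOTAVAIL on x86_64 Linux), look the interface name up again with the POSIX if_nametoindex() call and refresh the cached index. The struct upstream type and refresh_after_bind_error() are hypothetical illustrations modelled on the interface/ifindex fields in the gdb dump; this is not dnsmasq's code.

```c
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <stdio.h>

/* Hypothetical, simplified record modelled on the `interface` and
 * `ifindex` fields in the gdb dump; this is NOT dnsmasq's struct server. */
struct upstream {
    char interface[IF_NAMESIZE];
    unsigned int ifindex;              /* cached kernel interface index */
};

/* Hypothetical helper: after a bind failure, re-resolve the interface
 * name and update a stale cached index.
 * Returns  1 if the cached index was stale and has been updated,
 *          0 if nothing changed (other errno, or the index is still valid),
 *         -1 if the interface no longer exists at all. */
int refresh_after_bind_error(struct upstream *srv, int err)
{
    assert(srv != NULL);

    /* Only the error seen in the report (errno 99) triggers a refresh. */
    if (err != EADDRNOTAVAIL)
        return 0;

    unsigned int current = if_nametoindex(srv->interface);
    if (current == 0)
        return -1;                     /* e.g. the dock is still unplugged */

    if (current == srv->ifindex)
        return 0;                      /* index is fine; cause lies elsewhere */

    fprintf(stderr, "%s: ifindex %u is stale, now %u\n",
            srv->interface, srv->ifindex, current);
    srv->ifindex = current;
    return 1;
}
```

Re-resolving one name this way only repairs the server entry that failed; the change proposed in this thread instead calls enumerate_interfaces(0), which the report found also corrects the stale ifindex.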

> 
> I have created bug #2247269 [1] to track this.
> 
> 1. https://bugzilla.redhat.com/show_bug.cgi?id=2247269
> 
> On 16. 10. 23 15:02, Petr Menšík wrote:
>> Hello everyone.
>>
>> Today I returned to work, where I run dnsmasq 2.89 on my Fedora 37
>> laptop. It is configured by NetworkManager through its dns=dnsmasq
>> plugin. When I got in, I found that our internal network refused to
>> resolve any name, so I dug into what dnsmasq was doing. The problem did
>> not fix itself after a while; it kept failing stubbornly.
>>
>> It was failing quite often on the local_bind() call in random_sock(),
>> with errno 99 (EADDRNOTAVAIL). I noticed that it had missed a change of
>> ifindex on the interface it should be bound to.
>>
>> (gdb) bt
>> #0  0x00007f53305e7020 in strerror () from /lib64/libc.so.6
>> #1  0x00005557a3ec2c4b in random_sock (s=s@entry=0x5557a43fef50) at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:2511
>> #2  0x00005557a3ec62f2 in allocate_rfd (fdlp=fdlp@entry=0x5557a43f5280, serv=serv@entry=0x5557a43fef50)
>>     at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:2607
>> #3  0x00005557a3ec72dc in forward_query (udpfd=4, udpaddr=0x7ffdb6bfbd30, dst_addr=0x7ffdb6bfbd00, dst_iface=0, header=0x5557a43e03d0, plen=51,
>>     limit=0x5557a43e0880 "", now=1697453089, forward=0x5557a43f5230, ad_reqd=1, do_bit=0, fast_retry=0)
>>     at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:498
>> #4  0x00005557a3ed0ebd in receive_query (now=1697453089, listen=0x5557a43e0cc0) at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:1869
>> #5  check_dns_listeners (now=1697453089) at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/dnsmasq.c:1845
>> #6  0x00005557a3eac9ef in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/dnsmasq.c:1266
>>
>> (gdb) p *$d->servers->next->next->next->next->next->next
>> $8 = {flags = 800, domain_len = 14, domain = 0x5557a43f5eb0 "brq.redhat.com", next = 0x5557a43ffa10, serial = 6, arrayposn = 23,
>>   last_server = -1, addr = {sa = {sa_family = 2, sa_data = "\0005\n&\005\032\226\r\2170S\177\000"}, in = {sin_family = 2, sin_port = 13568,
>>       sin_addr = {s_addr = 436545034}, sin_zero = "\226\r\2170S\177\000"}, in6 = {sin6_family = 2, sin6_port = 13568, sin6_flowinfo = 436545034,
>>       sin6_addr = {__in6_u = {__u6_addr8 = "\226\r\2170S\177\000\0000\275\001\a\220\000\000", __u6_addr16 = {3478, 12431, 32595, 0, 48432, 1793,
>>             144, 0}, __u6_addr32 = {814681494, 32595, 117554480, 144}}}, sin6_scope_id = 3446832640}}, source_addr = {sa = {sa_family = 2,
>>       sa_data = "\000\000\000\000\000\000@\274\277\266\375\177\000"}, in = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 0},
>>       sin_zero = "@\274\277\266\375\177\000"}, in6 = {sin6_family = 2, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
>>           __u6_addr8 = "@\274\277\266\375\177\000\000@\274\277\266\375\177\000", __u6_addr16 = {48192, 46783, 32765, 0, 48192, 46783, 32765, 0},
>>           __u6_addr32 = {3066018880, 32765, 3066018880, 32765}}}, sin6_scope_id = 814672583}},
>>   interface = "enp9s0u1\000\000\000\000\000\000\000\000", ifindex = 7, sfd = 0x0, tcpfd = 0, edns_pktsz = 1232, pktsz_reduced = 0, queries = 446,
>>   failed_queries = 0, nxdomain_replies = 0, retrys = 4, query_latency = 0, mma_latency = 0, forwardtime = 0, forwardcount = 0, uid = 3867576473}
>> (gdb) p *$d->servers->next->next->next->next->next->next->next
>> $9 = {flags = 800, domain_len = 10, domain = 0x5557a43ff9f0 "redhat.com", next = 0x5557a43f5fb0, serial = 7, arrayposn = 25, last_server = -1,
>>   addr = {sa = {sa_family = 2, sa_data = "\0005\n&\005\032\226\r\2170S\177\000"}, in = {sin_family = 2, sin_port = 13568, sin_addr = {
>>         s_addr = 436545034}, sin_zero = "\226\r\2170S\177\000"}, in6 = {sin6_family = 2, sin6_port = 13568, sin6_flowinfo = 436545034,
>>       sin6_addr = {__in6_u = {__u6_addr8 = "\226\r\2170S\177\000\0000\275\001\a\220\000\000", __u6_addr16 = {3478, 12431, 32595, 0, 48432, 1793,
>>             144, 0}, __u6_addr32 = {814681494, 32595, 117554480, 144}}}, sin6_scope_id = 3446832640}}, source_addr = {sa = {sa_family = 2,
>>       sa_data = "\000\000\000\000\000\000@\274\277\266\375\177\000"}, in = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 0},
>>       sin_zero = "@\274\277\266\375\177\000"}, in6 = {sin6_family = 2, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
>>           __u6_addr8 = "@\274\277\266\375\177\000\000@\274\277\266\375\177\000", __u6_addr16 = {48192, 46783, 32765, 0, 48192, 46783, 32765, 0},
>>           __u6_addr32 = {3066018880, 32765, 3066018880, 32765}}}, sin6_scope_id = 814672583}},
>>   interface = "enp9s0u1\000\000\000\000\000\000\000\000", ifindex = 6, sfd = 0x0, tcpfd = 0, edns_pktsz = 1232, pktsz_reduced = 0,
>>   queries = 6480, failed_queries = 0, nxdomain_replies = 0, retrys = 134, query_latency = 34, mma_latency = 4414, forwardtime = 0,
>>   forwardcount = 0, uid = 3578949556}
>>
>> $ ip a show dev enp9s0u1
>> 7: enp9s0u1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>>     link/ether 00:50:b6:b4:17:b2 brd ff:ff:ff:ff:ff:ff
>>     inet 10.43.2.229/24 brd 10.43.2.255 scope global dynamic noprefixroute enp9s0u1
>>        valid_lft 56729sec preferred_lft 56729sec
>>     inet6 2620:52:0:2b02:b3ba:7320:65f8:1fff/64 scope global dynamic noprefixroute
>>        valid_lft 2591999sec preferred_lft 604799sec
>>     inet6 fe80::b2f:65c5:d743:524b/64 scope link noprefixroute
>>        valid_lft forever preferred_lft forever
>>
>> The problem seems to be a wrong ifindex for the redhat.com domain,
>> while for brq.redhat.com it was refreshed correctly. I am not sure how
>> exactly that happened, but I think I have seen it a few times already.
>> I do not know the exact steps required to reproduce this issue, but I
>> think it is related to undocking from Thunderbolt and reconnecting
>> again. Has anyone else seen similar behaviour?
>>
>> It seems to me that a call to enumerate_interfaces(0) should have fixed
>> this. I wonder whether it would make sense to call it explicitly after
>> a local_bind failure. Because the journal has filled up, I no longer
>> have details about the interface changes:
>>
>> journalctl -xeu NetworkManager | grep 'failed to bind server socket to enp9s0u1' | wc -l
>> 711
>>
>> Has a similar error been seen in the wild? Is there a fix for it that I
>> have failed to find?
>>
>> Cheers,
>> Petr
>>
>>
>> _______________________________________________
>> Dnsmasq-discuss mailing list
>> Dnsmasq-discuss at lists.thekelleys.org.uk
>> https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
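A small C diagnostic sketch, assuming Linux/glibc: it compares a cached interface index against the kernel's live name-to-index table, obtained with the POSIX if_nameindex() call. This is one way to confirm a mismatch like the one between ifindex = 6 in the gdb dump and index 7 in the ip a output. The helpers index_from_table() and cached_index_is_stale() are hypothetical; this is not how dnsmasq itself enumerates interfaces.

```c
#include <assert.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>

/* Walk the kernel's current name -> index table (the same numbering that
 * `ip a` prints, e.g. "7: enp9s0u1") and return the index for `name`,
 * or 0 if no such interface exists. Optionally print every entry. */
unsigned int index_from_table(const char *name, int verbose)
{
    assert(name != NULL);

    struct if_nameindex *table = if_nameindex();
    unsigned int found = 0;

    if (table == NULL)
        return 0;                      /* enumeration failed */

    /* The array is terminated by an entry with if_index == 0. */
    for (struct if_nameindex *p = table; p->if_index != 0; p++) {
        if (verbose)
            printf("%u: %s\n", p->if_index, p->if_name);
        if (strcmp(p->if_name, name) == 0)
            found = p->if_index;
    }

    if_freenameindex(table);
    return found;
}

/* Hypothetical check: does a cached index still match the live table? */
int cached_index_is_stale(const char *name, unsigned int cached)
{
    return index_from_table(name, 0) != cached;
}
```

With the values in the report, the live table maps enp9s0u1 to index 7, so the brq.redhat.com entry (ifindex = 7) would pass this check while the redhat.com entry (ifindex = 6) would be flagged as stale.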


