<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;
mso-ligatures:standardcontextual;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hello everyone,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I have found an issue in dnsmasq v2.90 that is causing problems in our OpenStack environments. When our Neutron agents rewrite the configs and send a SIGHUP to trigger a reload, dnsmasq usually crashes with a SIGABRT signal. This only seems to happen in our busiest OpenStack regions, where VMs are coming and going constantly, causing dnsmasq to reload many times per minute. In other regions where no new VMs are being created, the reloads work fine with no crashes.<o:p></o:p></span></p>
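<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The rewrite-then-SIGHUP pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not the Neutron agent's actual code: the function names (read_pid, rewrite_and_reload), the injectable kill parameter, and the write-to-temp-file-then-rename step are all assumptions made for the sketch.<o:p></o:p></span></p>
<pre>
```python
import os
import signal
import tempfile


def read_pid(pid_file):
    """Return the PID stored in a dnsmasq-style pid file."""
    with open(pid_file) as f:
        return int(f.read().strip())


def rewrite_and_reload(path, lines, pid_file, kill=os.kill):
    """Rewrite one config file, then ask the daemon to reload it.

    Assumption: the file is replaced atomically via a temporary file in
    the same directory. The real agent may write it differently.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.writelines(line + "\n" for line in lines)
    os.replace(tmp_path, path)
    # SIGHUP is the signal that triggers the reload described above.
    kill(read_pid(pid_file), signal.SIGHUP)
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt">Calling rewrite_and_reload in a tight loop against a test instance (never a production one) might approximate the reload rate of a busy region. The kill parameter exists only so the function can be exercised without signalling a real process.<o:p></o:p></span></p>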
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I investigated in a very busy region where I see dozens of crashes per minute. That region only uses dnsmasq for DHCP; it is not receiving DNS queries. This is a production environment, but I rebuilt dnsmasq with debug symbols and managed to capture the crash with gdb. I tried it a few times and the crash always has the same stack trace.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">################################################################################<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Reading symbols from /usr/lib/debug/usr/sbin/dnsmasq-2.90-1.el9.x86_64.debug...<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Attaching to program: /usr/lib/debug/usr/sbin/dnsmasq-2.90-1.el9.x86_64.debug, process 3075598<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">&lt;snip loading messages&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[Thread debugging using libthread_db enabled]<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Using host libthread_db library "/lib64/libthread_db.so.1".<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">0x00007ff1c8c62ac7 in poll () from target:/lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">(gdb) c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Continuing.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Program received signal SIGHUP, Hangup.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">0x00007ff1c8c62ac7 in poll () from target:/lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">(gdb) c<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Continuing.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Program received signal SIGABRT, Aborted.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">0x00007ff1c8beca6c in __pthread_kill_implementation () from target:/lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">(gdb) where<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#0 0x00007ff1c8beca6c in __pthread_kill_implementation () from target:/lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#1 0x00007ff1c8b9f686 in raise () from target:/lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#2 0x00007ff1c8b89833 in abort () from target:/lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#3 0x00007ff1c8b8a170 in __libc_message.cold () from target:/lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#4 0x00007ff1c8bf6b17 in malloc_printerr () from target:/lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#5 0x00007ff1c8bf8800 in _int_free () from target:/lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#6 0x00007ff1c8bfae55 in free () from target:/lib64/libc.so.6<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#7 0x000055f6521e0c18 in dhcp_netid_free (nid=0x7ff1c8bfae55 &lt;free+85&gt;) at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:1333<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#8 dhcp_netid_list_free (netid=0x0) at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:1363<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#9 dhcp_config_free (config=0x55f652b51a60) at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:1381<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#10 0x000055f652b51930 in ?? ()<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#11 0x000055f6529eb1f8 in ?? ()<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#12 0x0000000000000fa4 in ?? ()<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#13 0x000055f6529eaf60 in ?? ()<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#14 0x000055f6529eaf60 in ?? ()<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#15 0x000055f6521f5259 in clear_dynamic_conf () at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:5777<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#16 reread_dhcp () at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:5818<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#17 clear_cache_and_reload (now=94516438056960) at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/dnsmasq.c:1742<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#18 0x4141414141414141 in ?? ()<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#19 0x0000000067ae1dbd in ?? ()<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">#20 0x0000000000000000 in ?? ()<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">(gdb)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">################################################################################<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The dnsmasq command line looks like this (lightly redacted):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">dnsmasq --no-hosts --no-resolv \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --pid-file=/var/lib/neutron/dhcp/xxx/pid \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --dhcp-hostsfile=/var/lib/neutron/dhcp/xxx/host \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --addn-hosts=/var/lib/neutron/dhcp/xxx/addn_hosts \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --dhcp-optsfile=/var/lib/neutron/dhcp/xxx/opts \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --dhcp-leasefile=/var/lib/neutron/dhcp/xxx/leases \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --dhcp-match=set:ipxe,175 \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --dhcp-userclass=set:ipxe6,iPXE \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --local-service \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --bind-dynamic \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --dhcp-range=set:subnet-yyy,10.1.1.0,static,255.255.248.0,86400s \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --dhcp-range=set:subnet-zzz,10.2.1.0,static,255.255.252.0,86400s \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --dhcp-option-force=option:mtu,1500 \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --dhcp-lease-max=3072 \<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> --conf-file=/etc/neutron/dnsmasq-neutron.conf<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The /etc/neutron/dnsmasq-neutron.conf file only sets these options (lightly redacted):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">dhcp-boot=smsboot\pxelinux.com,boothost,10.0.1.2<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">dhcp-option=option:ntp-server,10.0.0.1,10.0.1.1,10.0.2.1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The /var/lib/neutron/dhcp/xxx/host file contains between 800 and 3000 entries, depending on the time of day. Each one looks something like this (lightly redacted):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">fa:16:3e:3b:ad:b9,set:16a8f84b90f640f7a2c9a133d844985e,host-10-1-2-3,10.1.2.3<o:p></o:p></span></p>
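<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">For anyone trying to reproduce this at a similar scale, a host file in the layout shown above could be generated synthetically. The following is only a sketch: host_entry and synthetic_entries are hypothetical helper names, the addresses and tag passed in are made up, and it assumes the hostname is always "host-" followed by the IP address with dots replaced by dashes, as in the sample line.<o:p></o:p></span></p>
<pre>
```python
import ipaddress


def host_entry(mac, tag, ip):
    """Format one line in the layout shown above: MAC, set:TAG, hostname, IP."""
    hostname = "host-" + str(ip).replace(".", "-")
    return f"{mac},set:{tag},{hostname},{ip}"


def synthetic_entries(network, count, tag):
    """Yield up to `count` made-up entries for hosts in `network`.

    The fa:16:3e MAC prefix matches the sample line; the last three bytes
    are derived from the index, so every generated MAC is unique.
    """
    hosts = ipaddress.ip_network(network).hosts()
    for index, ip in zip(range(count), hosts):
        mac = "fa:16:3e:{:02x}:{:02x}:{:02x}".format(
            (index // 65536) % 256, (index // 256) % 256, index % 256
        )
        yield host_entry(mac, tag, ip)
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt">For example, list(synthetic_entries("10.1.0.0/21", 3000, "sometag")) would give 2046 lines, because a /21 network has only 2046 usable host addresses; a larger network would be needed to reach 3000.<o:p></o:p></span></p>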
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The /var/lib/neutron/dhcp/xxx/addn_hosts file contains between 800 and 3000 entries, depending on the time of day. Each one looks something like this (lightly redacted):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">10.1.9.9 np0006812233.subdomain.subdomain.mycorp.com. np0006812233<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The /var/lib/neutron/dhcp/xxx/opts file contains about 190 entries. The top of the file looks like this; the remaining entries follow the pattern of the last two lines, defining domain-name and domain-search values for additional subdomains (lightly redacted):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">tag:subnet-xxx,option:dns-server,10.0.0.10,10.0.0.11<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">tag:subnet-xxx,option:classless-static-route,10.1.1.0/22,0.0.0.0,169.254.169.254/32,10.2.1.30,0.0.0.0/0,10.2.1.1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">tag:subnet-xxx,249,10.1.1.0/22,0.0.0.0,169.254.169.254/32,10.2.1.30,0.0.0.0/0,10.2.1.1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">tag:subnet-xxx,option:router,10.2.1.1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">tag:subnet-yyy,option:dns-server,0.0.10,10.0.0.11<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">tag:subnet-yyy,option:classless-static-route,10.2.1.0/21,0.0.0.0,169.254.169.254/32,10.1.1.30,0.0.0.0/0,10.1.1.1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">tag:subnet-yyy,249,10.2.1.0/21,0.0.0.0,169.254.169.254/32,10.1.1.30,0.0.0.0/0,10.1.1.1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">tag:subnet-yyy,option:router,10.1.1.1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">tag:16a8f84b90f640f7a2c9a133d844985e,option:domain-name,subdomain.subdomain.mycorp.com<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">tag:16a8f84b90f640f7a2c9a133d844985e,option:domain-search,subdomain.subdomain.mycorp.com,subdomain.mycorp.com,mycorp.com<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">The /var/lib/neutron/dhcp/xxx/leases file contains between 800 and 3000 entries, depending on the time of day. Each one looks something like this (lightly redacted):<o:p></o:p></span></p>
<div>
<div>
<p class="MsoNormal"><span style="mso-ligatures:none">1739552375 </span><span style="font-size:11.0pt">fa:16:3e:3b:ad:b9
</span><span style="mso-ligatures:none">10.1.2.3 </span><span style="font-size:11.0pt">np0006812233
</span><span style="mso-ligatures:none">*<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-ligatures:none"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-ligatures:none">What can I do to help troubleshoot this? I know C but I’m not familiar with the dnsmasq code. Thanks in advance!<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-ligatures:none"> <o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-ligatures:none">-- Sam Clippinger <o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>