[Dnsmasq-discuss] crash on double free
Ferenc Wagner
wferi at niif.hu
Mon Sep 20 18:30:53 BST 2010
Simon Kelley <simon at thekelleys.org.uk> writes:
> Ferenc Wagner wrote:
>
>> Simon Kelley <simon at thekelleys.org.uk> writes:
>>
>>> On 15/09/10 12:07, Ferenc Wagner wrote:
>>>
>>>> However, I also got a different crash with the original binary. I hope
>>>> it's a different realisation of the same problem, can you confirm?
>>>
>>> I can't see any other reason for this problem, I'm pretty sure it's
>>> down to heap corruption from an earlier double-free.
>>
>> It's a rather narrow chance, as I was running under electric fence...
>
> It was late..... I'll try again :-)
>
> At the point of the crash, 0xb7184f8c had already been freed and
> therefore mapped out by efence. Hence when 0xb7184f8c gets dereferenced
> by memcpy, it segfaults. This is consistent with the known and fixed bug.
Yes, if the segfault comes from the first byte moved, not from a later
out-of-range one, caused by the bogus value of "len". Why, I've still
got the core, let's check...
(gdb) x/i $eip
0xb7599d5a <memcpy+26>: movsw %ds:(%esi),%es:(%edi)
(gdb) p/x $edi
$1 = 0xb7184f8c
You're right, it's the first access.
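The distinction being checked here (a fault on the very first byte touched, versus a fault somewhere later in the copy because "len" was bogus) can be sketched as a small C helper. This is only an illustration: classify_fault() and the fault_kind names are hypothetical and are not part of dnsmasq, efence or gdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a faulting address relative to a copy
   that starts at `start` and is supposed to touch `len` bytes. */
enum fault_kind { FAULT_UNRELATED, FAULT_FIRST_BYTE, FAULT_LATER_BYTE };

static enum fault_kind classify_fault(uintptr_t start, size_t len, uintptr_t fault)
{
    /* A zero-length copy touches nothing; addresses below the start
       cannot belong to a forward copy either. */
    if (len == 0 || fault < start)
        return FAULT_UNRELATED;

    /* Fault on the first byte: the buffer itself is inaccessible,
       which is what a use-after-free looks like under efence. */
    if (fault == start)
        return FAULT_FIRST_BYTE;

    /* Fault further into the range: points at an overrun, e.g. from an
       over-large length. */
    if (fault - start < len)
        return FAULT_LATER_BYTE;

    return FAULT_UNRELATED;
}
```

With the values from this core (address 0xb7184f8c, 14 bytes, fault at 0xb7184f8c) the helper reports FAULT_FIRST_BYTE, which matches the use-after-free reading rather than an overrun.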
> The value of "len" must be an optimisation artifact, there is no way that
> add_extradata_opt() could generate that value.
I've never seen such an artifact (pretty much nothing but a value being
optimized out), but my C is admittedly rusty.
(gdb) x $esp+0xc
0xbfda0e98: 0x0000000e
So memcpy() was called for 14 bytes, nothing like that crazy number.
>>>> I'm continuing testing the fix. It usually took me tens of minutes to
>>>> reproduce the crash, but with the change it already survived more than
>>>> an hour. Unfortunately, it isn't fully automatic (because of other bugs
>>>> in other software).
>>>
>>> To trigger this bug, there needs to be a dhcp-script, obviously. But
>>> also the rate of DHCP transactions needs to be fast enough and/or the
>>> script needs to be slow enough so that a second DHCP transaction
>>> happens on a lease before the first one has been sent to the
>>> DHCP-script. This is pretty rare, hence no-one has seen this bug, as
>>> far as I know, even though it has been lurking for some time (years).
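The ordering condition described above can be modelled with a toy C program. This is a sketch under stated assumptions, not dnsmasq's code: the lease and script_event structs, the ownership rule (a queued event borrows the lease's buffer pointer, and whoever finishes with the buffer releases it) and the tracking allocator are all hypothetical. Released blocks are quarantined instead of being handed back to free() immediately, so a second release is reported rather than corrupting the heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BLOCKS 8

/* Hypothetical tracking allocator: remembers which blocks are live. */
struct block { void *ptr; int live; };
static struct block blocks[MAX_BLOCKS];
static int nblocks;

static void *tracked_alloc(size_t n)
{
    assert(nblocks < MAX_BLOCKS);
    void *p = malloc(n);
    assert(p != NULL);
    blocks[nblocks].ptr = p;
    blocks[nblocks].live = 1;
    nblocks++;
    return p;
}

/* Returns 0 for a valid release, -1 for a double or unknown release.
   The real free() is deferred so addresses are never reused here. */
static int tracked_free(void *p)
{
    for (int i = 0; i < nblocks; i++) {
        if (blocks[i].ptr == p) {
            if (!blocks[i].live)
                return -1;          /* already released: double free */
            blocks[i].live = 0;
            return 0;
        }
    }
    return -1;
}

static void release_all(void)
{
    for (int i = 0; i < nblocks; i++)
        free(blocks[i].ptr);        /* each block reaches free() once */
    nblocks = 0;
}

struct lease { unsigned char *extradata; };
struct script_event { unsigned char *extradata; };  /* borrowed pointer */

/* Returns the number of invalid releases seen in one run of the model. */
static int simulate(int script_runs_before_second_transaction)
{
    int bad = 0;
    struct lease lease;
    struct script_event ev;

    /* Transaction 1: lease gets a buffer, an event for the script is queued. */
    lease.extradata = tracked_alloc(14);
    ev.extradata = lease.extradata;

    if (script_runs_before_second_transaction) {
        /* Script handled promptly: buffer released once, lease forgets it. */
        bad += tracked_free(ev.extradata) != 0;
        lease.extradata = NULL;
    }

    /* Transaction 2 on the same lease replaces the buffer. */
    if (lease.extradata != NULL)
        bad += tracked_free(lease.extradata) != 0;
    lease.extradata = tracked_alloc(14);

    if (!script_runs_before_second_transaction) {
        /* Slow script: the stale event releases the old buffer again. */
        bad += tracked_free(ev.extradata) != 0;
    }

    bad += tracked_free(lease.extradata) != 0;
    release_all();
    return bad;
}
```

In this model simulate(1) sees no invalid release, while simulate(0), where the second transaction arrives before the first event has been handled, sees exactly one. That is the shape of the rare ordering described above, not a statement about where the real bug sits in the source.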
>>
>> Well, this doesn't fully match my test setup, which contained a single
>> netbooted Linux continuously rebooting in Qemu. The exotic part is that
>> the PXE ROM used the network interface natively, while the Linux system
>> used it with an added 802.1q tag. So a single lease was ping-ponging between
>> two different subnets.
>
> How much work was your dhcp-script doing?
It's nothing but a call to an SGE utility to add the new host to a
hostlist. The script itself is nothing, and qconf shouldn't take long
either. Occasionally it encounters a DNS problem (unable to resolve
host, cf. other thread), but even that's fast, not some 5 sec timeout.
> By coincidence I had another report of this bug yesterday which
> triggered only when the DHCP transaction rate is high.
Lucky you! :)
--
Cheers,
Feri.