[Dnsmasq-discuss] Dnsmasq with Gigantic hosts file

Wed Jan 11 22:24:21 GMT 2012

2012/1/11 Simon Kelley <simon at thekelleys.org.uk>:
> On 11/01/12 18:44, Jan Seiffert wrote:
>> 2012/1/11 Simon Kelley <simon at thekelleys.org.uk>:
[snip]
>>> I've thought about this a bit more, and I think it's pretty easy to
>>> hash the addresses only during the process of reading hostsfiles
>>> with basically no extra resource: There is a pointer field in cache
>>> entries which is unused at that time, and could be used to hold an
>>> open-hash chain. All that's needed is an array of pointers, one per
>>> hash bucket, which can be freed once files are read.
>>>
>>
>> *mumble, mumble* While I envy your genius, and am intrigued by the
>> nifty trick, this is on the verge of ... insanity?
>>
> We eat insanity for breakfast round here.
>

Yummy!
*crunch crunch*
Can you pass me the milk, please?

>> Ok, ok, a reverse tree is "complicated", but would put the whole
>> reverse thing to rest, also during runtime.
>
> Tree is good, because it works for lookup too. Downside is extra memory
> use, and malloc/free during cache operation: dnsmasq at the moment is
> written so that it never calls malloc/free during DNS operations - saves
> memory fragmentation

It is written so that the tree grows and shrinks dynamically, but only to
the needed extent, and after that it would probably be rather static,
allocating a tree node here, freeing one there. I expect no massive malloc mania.
If this fragments your memory, then your allocator is bad...

But that's the beauty: the code can be switched on and off.
So operators on normal systems (installed from a distro) would turn it
on, and on a plastic-box router they would turn it off.

> , especially in MMU-less systems (Do they still exist?)
>

Sure, they do, but I guess they are fading into such low-cost market
segments that dnsmasq is out of the question.
(I heard those "can't do sh..." routers are not using Linux or *BSD
because it is too big, saving on 2 MByte of flash and 4 MByte of RAM)

>
>
>> I mean now Preston seems to be slowed down during read in, but i
>> guess later reverse lookups will also not be fast due to the sheer
>> number of unique IP/hosts.
>>
>
> Hash-during-read doesn't cost extra memory and doesn't do malloc, but it
> won't speed up reverse operations. This isn't normally a problem, even
> for gigantic hosts files, since the lookup cost is limited by the number
> of _reverse_ entries in the cache. For ad-blocking gigantic hosts files,
> almost none of the entries are reverse, so no problem.

Yes, that solved the problem last time, so I didn't take the patch
from back then any further...

> For Preston's
> workload, it may be more so, but reverse lookups are much less frequent
> than forward ones on most systems, so it may still be OK.
>

... but the underlying problem, that reverse lookups are implemented ...
suboptimally, remains.
Even if they are less frequent, Preston, with his use case (caching a
subnet because it is flaky),
is nearing the wall we predicted back then.
If I remember right, the main problem back then was that even a single
reverse lookup (say Windows SMB or other stuff) among a lot of forward
lookups could slow everything down dramatically, because dnsmasq is
synchronous (which is a good thing).
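
To make the hash-during-read idea quoted at the top of this mail concrete,
here is a minimal sketch in C. It is not dnsmasq's code: struct host_entry,
its spare field (standing in for the pointer field that is unused while
hosts files are read), the read_hash_* functions and the hash function are
all made up for illustration, and it assumes IPv4 addresses only. I also
assume the point of the hash is to find out quickly whether an address has
already been seen during the read.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cache entry; NOT dnsmasq's real struct. */
struct host_entry {
    uint32_t addr;             /* IPv4 address in host byte order */
    const char *name;
    struct host_entry *spare;  /* assumed to be unused while reading hosts files */
};

/* Temporary table: one chain head per bucket, nothing allocated per entry. */
struct read_hash {
    struct host_entry **buckets;
    size_t nbuckets;
};

/* Simple multiplicative hash; the constant is an arbitrary choice. */
static size_t bucket_of(const struct read_hash *h, uint32_t addr)
{
    uint32_t mixed = (uint32_t)(addr * UINT32_C(2654435761));
    return (size_t)(mixed >> 16) % h->nbuckets;
}

/* Allocate the bucket array once, before the files are read. */
int read_hash_init(struct read_hash *h, size_t nbuckets)
{
    assert(nbuckets > 0);
    h->buckets = calloc(nbuckets, sizeof *h->buckets);
    h->nbuckets = nbuckets;
    return h->buckets ? 0 : -1;
}

/*
 * If an entry with the same address is already chained, return it and
 * leave e alone. Otherwise link e into its bucket through the spare
 * pointer and return NULL.
 */
struct host_entry *read_hash_add(struct read_hash *h, struct host_entry *e)
{
    size_t b = bucket_of(h, e->addr);
    struct host_entry *p;

    for (p = h->buckets[b]; p != NULL; p = p->spare)
        if (p->addr == e->addr)
            return p;

    e->spare = h->buckets[b];
    h->buckets[b] = e;
    return NULL;
}

/*
 * Free the bucket array once reading is done. The spare pointers in the
 * entries are stale afterwards and must be reset before any other use.
 */
void read_hash_free(struct read_hash *h)
{
    free(h->buckets);
    h->buckets = NULL;
    h->nbuckets = 0;
}
```

The only allocation is the bucket array, done once before reading and
released once after, so nothing here calls malloc/free during DNS
operation. As said above, it does nothing for reverse lookups at runtime.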

> Simon.
>

Greetings
Jan

-- 
Remember to eat a healthy breakfast, for tonight we dine in hell!