[Dnsmasq-discuss] Dnsmasq with Gigantic hosts file

Tue Jan 30 13:33:12 GMT 2007

Jan 'RedBully' Seiffert wrote:

> While it is common that most of these entries will point to one address
> (be it 127.0.0.1 or a local LAN machine) and F_REVERSE not set for them,
> it's a pity it will still slow down reverse lookups...

Reverse lookups currently iterate over every cache entry, including (in
the case) the 700k from /etc/hosts which don't have F_REVERSE set. It
will ignore them, of course.

Actually the observation that most of these entries don't set F_REVERSE
is a valuable insight; it's also case that they will have the F_IMMORTAL
bit set.

This allows a very simple optimisation: just put the reverse entries at
the start of the hash-chain. Reverse lookups still need to check every
hash chain, but only the first few entries which have F_REVERSE set, all
the thousands of entries from /etc/hosts further down the list can just
be skipped.

The IMMORTAL bit allows a similar optimisation for garbage-collection
(the cache_scan_free() function.) IMMORTAL entries never need to be
garbage collected, so  by putting them at the _end_ of the hash chain,
iterating over every entry to find expired ones is no longer impacted by
having lots of /etc/hosts names.

I've not tested this yet, but I think the following patch should be enough.

--- dnsmasq-2.36/dnsmasq-2.36/src/cache.c       2006-12-30
13:44:08.000000000 +0000
+++ dnsmasq-2.37/dnsmasq-2.37/src/cache.c       2007-01-30
12:28:49.000000000 +0000
@@ -157,9 +157,24 @@

 static void cache_hash(struct crec *crecp)
 {
-  struct crec **bucket = hash_bucket(cache_get_name(crecp));
-  crecp->hash_next = *bucket;
-  *bucket = crecp;
+  /* maintain an invariant that all entries with F_REVERSE set
+     are at the start of the hash-chain  and all non-reverse
+     immortal entries are at the end of the hash-chain.
+     This allows reverse searches and garbage collection to be optimised */
+
+  struct crec **up = hash_bucket(cache_get_name(crecp));
+
+  if (!(crecp->flags & F_REVERSE))
+    {
+      while (*up && ((*up)->flags & F_REVERSE))
+       up = &((*up)->hash_next);
+
+      if (crecp->flags & F_IMMORTAL)
+       while (*up && (!(*up)->flags & F_IMMORTAL))
+         up = &((*up)->hash_next);
+    }
+  crecp->hash_next = *up;
+  *up = crecp;
 }

 static void cache_free(struct crec *crecp)
@@ -295,8 +310,13 @@
 #else
       int addrlen = INADDRSZ;
 #endif
+      /* take advantage of the fact that has chains have stuff in the
+        order <reverse>,<other>,<immortal> so that when we hit an entry
+        which isn't reverse and is immortal, we're done. */
       for (i = 0; i < hash_size; i++)
-       for (crecp = hash_table[i], up = &hash_table[i]; crecp; crecp =
crecp->hash_next)
+       for (crecp = hash_table[i], up = &hash_table[i];
+            crecp && ((crecp->flags & F_REVERSE) || !(crecp->flags &
F_IMMORTAL));
+            crecp = crecp->hash_next)
          if (is_expired(now, crecp))
            {
              *up = crecp->hash_next;
@@ -567,12 +587,16 @@
   else
     {
       /* first search, look for relevant entries and push to top of list
-        also free anything which has expired */
+        also free anything which has expired. All the reverse entries
are at the
+        start of the hash chain, so we can give up when we find the first
+        non-REVERSE one.  */
        int i;
        struct crec **up, **chainp = &ans;

-       for(i=0; i<hash_size; i++)
-        for (crecp = hash_table[i], up = &hash_table[i]; crecp; crecp =
crecp->hash_next)
+       for (i=0; i<hash_size; i++)
+        for (crecp = hash_table[i], up = &hash_table[i];
+             crecp && (crecp->flags & F_REVERSE);
+             crecp = crecp->hash_next)
           if (!is_expired(now, crecp))
             {
               if ((crecp->flags & F_REVERSE) &&

> 
> Maybe the following patch will help:
> Maintain a tree for reverse lookups.
> Patch is compile tested only, i don't know if this will work, I surely
> dropped the ball somewhere. Also because the cache logic 'as is' is
> quite complex IMHO (but that's maybe just my brain always needing a
> moment wrapping around hash tables when looking into the impl.).

That code is clearly in the spirit of the existing cache code, which
scares me whenever I look at it! I'll test mine first, just because it's
simpler and I (think I) understand it. If it's no good I'll work on
understanding yours and go with that.

Cheers,

Simon.