[Dnsmasq-discuss] Google's DNS servers

Mon Jan 18 19:54:33 GMT 2010

Simon Kelley wrote:
> There's a difference between doing this [with] a recursive server (Google) 
> and a cache (dnsmasq). ...to make this work, refreshes have to happen
> _after_ the TTL expires, plus a few seconds of slop to allow for 
> clock drift. That's OK as long as we take it into account.

Good point. The caching server could probably hang onto records for a 
few seconds after expiration to bridge the gap.
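To make the timing concrete, here is a minimal Python sketch of the rule as I read it: don't re-query until the upstream copy has surely expired as well (expiry plus slop), so the recursive server fetches a fresh answer instead of handing back its own copy with a nearly exhausted TTL, and keep serving the stale answer for a short grace period to cover the gap. CachedRecord, CLOCK_SLOP and GRACE_PERIOD are hypothetical names and values, not anything in dnsmasq.

```python
from dataclasses import dataclass

# Hypothetical values; neither comes from dnsmasq.
CLOCK_SLOP = 5.0      # seconds to wait past expiry before asking upstream
GRACE_PERIOD = 10.0   # seconds the cache keeps serving a just-expired record


@dataclass
class CachedRecord:
    name: str
    fetched_at: float  # local time the answer arrived
    ttl: float         # TTL in seconds as returned by the upstream server

    def expires_at(self) -> float:
        return self.fetched_at + self.ttl

    def refresh_due_at(self) -> float:
        # Re-query only after the upstream copy has expired too, plus slop
        # for clock drift between the two servers.
        return self.expires_at() + CLOCK_SLOP

    def servable(self, now: float) -> bool:
        # Keep answering from the stale copy until the refresh has had time
        # to complete (GRACE_PERIOD must exceed CLOCK_SLOP).
        return now < self.expires_at() + GRACE_PERIOD
```

With a record fetched at t=100 with a 60-second TTL, the refresh goes out at t=165 and the stale copy is served until t=170.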

> Records get evicted from the dnsmasq cache for two reasons, either 
> the TTL expires, or space is needed for another record, and the 
> least-recently-used record is dumped.

Do you currently update a timestamp on cached records when there is a 
cache hit?

> Clearly only the first of these should result in a refresh, and the act of
> doing a refresh shouldn't move the record back up the LRU order.

Right. And this could pose a problem for an external implementation, as 
there would be no way to refresh a record without impacting LRU.
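A toy Python model of the two eviction paths Simon describes may help show what "refresh without impacting LRU" means. It is not dnsmasq's actual cache code; TtlLruCache and its methods are made up for illustration. Client lookups move an entry to the most-recently-used end, while refresh() rewrites the answer in place and leaves the entry's position alone.

```python
from collections import OrderedDict


class TtlLruCache:
    """Toy cache: entries leave either by TTL expiry or by LRU eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # name -> (address, expires_at), ordered least- to most-recently used
        self.entries = OrderedDict()

    def lookup(self, name, now):
        entry = self.entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if now >= expires_at:
            # Reason 1: TTL expired. Only this path is a refresh candidate.
            del self.entries[name]
            return None
        self.entries.move_to_end(name)  # a client hit updates recency
        return address

    def insert(self, name, address, ttl, now):
        self.entries[name] = (address, now + ttl)
        self.entries.move_to_end(name)
        while len(self.entries) > self.capacity:
            # Reason 2: space is needed; drop the least-recently-used entry.
            self.entries.popitem(last=False)

    def refresh(self, name, address, ttl, now):
        # Assigning to an existing OrderedDict key keeps its position, so a
        # background refresh never makes a record look recently used.
        if name in self.entries:
            self.entries[name] = (address, now + ttl)
```

In this model a record that is refreshed but never looked up still drifts to the LRU end and is the first to go when space runs out.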

> The most valuable records to refresh are those with short TTLs and 
> those which are accessed often. One could combine this by marking a 
> record for refresh iff it gets accessed _and_ it has less than n seconds 
> before it expires. That would refresh records which are being accessed 
> regularly and those being accessed at all with short TTL values.

This sounds like a relatively low cost enhancement that could be 
beneficial, and perhaps even the area that offers the biggest gain, but 
it only addresses part of the problem covered by Google's approach.

(This would also be harder to implement as a prototype external to Dnsmasq.)
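Simon's marking rule is cheap to express. A sketch, assuming a hypothetical per-record flag and a 30-second window standing in for his "n seconds": a hit marks the record only if it is within the window of expiry, and the mark is consumed at expiry, so a refreshed record that is never hit again drops out at the following expiry. Entry, on_hit, on_expiry and REFRESH_WINDOW are illustrative names only.

```python
from dataclasses import dataclass

REFRESH_WINDOW = 30.0  # hypothetical value for "n seconds"


@dataclass
class Entry:
    expires_at: float
    marked: bool = False


def on_hit(entry: Entry, now: float, window: float = REFRESH_WINDOW) -> None:
    # Mark only when less than `window` seconds of TTL remain. A record whose
    # whole TTL is shorter than the window gets marked by any hit at all.
    if 0 < entry.expires_at - now < window:
        entry.marked = True


def on_expiry(entry: Entry) -> bool:
    # Refresh iff marked; clearing the mark means the record must be hit
    # again near its next expiry to earn another refresh.
    refresh = entry.marked
    entry.marked = False
    return refresh
```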

> This scheme means that if a record is refreshed and then not accessed 
> again, it will drop out the next time the TTL expires. That's important, 
> you don't want records to persist forever.

While records shouldn't persist forever, there may be benefits to taking 
a longer-term view of usage patterns than what you can obtain by simply 
tracking the LRU of what happens to be in the cache.

Dnsmasq's view of usage is going to be limited to how long it has been 
running, how much memory it has, and the rate at which records are being 
removed from the cache. A short-term analysis doesn't let you balance 
cache capacity between moderately frequently queried records and random 
one-off queries. A longer-term view may show your site regularly queries 
for 1000 domains, but some of those keep getting pushed out of the cache 
by the "long tail" of random queries.

But I think you're right that the biggest impact will be seen from 
records with a short TTL. With the current algorithm, if the end-clients 
request a record regularly, but at intervals longer than the TTL, every 
request will be delayed by an external lookup.
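A quick simulation of that case, with hypothetical numbers and a plain TTL cache with no refresh; count_misses is an illustrative helper, not part of dnsmasq:

```python
def count_misses(ttl: float, interval: float, requests: int) -> int:
    """One client asks for the same name every `interval` seconds against a
    plain TTL cache; return how many requests have to go upstream."""
    expires_at = float("-inf")
    misses = 0
    for i in range(requests):
        now = i * interval
        if now >= expires_at:      # nothing valid cached: external lookup
            misses += 1
            expires_at = now + ttl
    return misses
```

With a 60-second TTL, count_misses(60, 90, 10) gives 10, so every request waits on upstream, whereas count_misses(60, 30, 10) gives 5.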

Something to consider is that cache expiration is based on LRU or TTL, 
yet TTL is an artificial limit set externally, and may have no bearing 
on the optimal caching strategy for the usage pattern. Even your 
proposed idea of applying a single refresh cycle to short TTL records 
just extends this arbitrary limit.

For example, statistical analysis might show that a record has a very 
high probability of being reused in the next 30 seconds, medium in the 
next 20 minutes, low in the next 60 minutes, and very low in the next 4 
hours. This probably has no correlation to the record's TTL.

This would suggest a strategy where you might choose to automatically 
refresh all records as needed until they are at least 4 hours old, and 
reset the clock on hits. (I think the existing LRU algorithm would 
already take care of expiring the oldest records when memory ran low.)
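Expressed as a policy, that example might look like the sketch below. The 4-hour horizon is the hypothetical figure from the paragraph above, not a measured value, and KeepAlivePolicy is a made-up name; LRU eviction is assumed to keep running separately when memory is tight.

```python
class KeepAlivePolicy:
    """Refresh a record at TTL expiry if it was hit within `keep_alive`
    seconds; each hit resets the clock."""

    def __init__(self, keep_alive: float = 4 * 60 * 60):
        self.keep_alive = keep_alive
        self.last_hit = {}  # name -> time of most recent client hit

    def on_hit(self, name: str, now: float) -> None:
        self.last_hit[name] = now  # a hit resets the clock

    def on_expiry(self, name: str, now: float) -> bool:
        last = self.last_hit.get(name)
        if last is not None and now - last < self.keep_alive:
            return True  # still inside the horizon: refresh upstream
        self.last_hit.pop(name, None)  # horizon passed: let it expire
        return False
```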

> I see this as completely orthogonal to using persistent storage. You
> don't need PS to do predictive cache refresh, and you don't need cache
> refresh to do persistent storage.

Agreed.

> I continue to contend that persistent storage is a bad idea.

If you want a seeded cache at startup, you can't avoid some sort of 
persistent storage. However, this could be implemented external to 
Dnsmasq easily.
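For instance, an external helper could save a list of names and, at startup, simply look each one up so the answers land in the cache. This sketch assumes the host's resolver is pointed at the local dnsmasq, so that socket.gethostbyname goes through it; the state file and the function names save_names and warm_cache are hypothetical.

```python
import socket
from pathlib import Path
from typing import Callable, Iterable


def save_names(path: Path, names: Iterable[str]) -> None:
    # One name per line, deduplicated; a state file kept outside dnsmasq.
    path.write_text("".join(f"{n}\n" for n in sorted(set(names))))


def warm_cache(path: Path,
               resolve: Callable[[str], object] = socket.gethostbyname) -> int:
    """Look up every saved name so the local cache gets populated.
    Returns how many names resolved."""
    ok = 0
    for line in path.read_text().splitlines():
        name = line.strip()
        if not name:
            continue
        try:
            resolve(name)
            ok += 1
        except OSError:  # socket.gaierror is a subclass of OSError
            pass         # names that no longer resolve are skipped
    return ok
```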

The other use case, as I mentioned, is being able to take a longer term 
view in your usage analysis. This may prove not to be beneficial, or at 
least not beneficial enough to justify the extra complexity.

  -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/