[Dnsmasq-discuss] A (possibly bad) idea: failover in dnsmasq
v_cadet at yahoo.fr
Sat May 26 12:26:24 BST 2012
--- On Sat 26.5.12, Simon Kelley wrote :
> > What if there be a heartbeat link in dnsmasq through which the active
> > dnsmasq would stream changes (or the whole block of data) to the
> > passive instance along with keep-alive probes?
> That has attractions: Both dnsmasq instances could provide
> DNS service at all times, and whichever was "master" could
> provide DHCP, whilst the "slave" just keeps its database
> up-to-date. The main problem with this is the "split brain"
> scenario, where both instances are up, but they can't talk
> to each other because the network between them is
> partitioned. In that case both acting as masters for their
> half of the network is fine, the problem comes when
> connectivity returns and the lease databases have to be
Hmmm... a dnsmasq instance that had failed could request from its peer(s) all the changes that occurred since its last failure. Newer records overwrite older ones. Expired leases and records are removed [or overwritten according to the data block that was requested and received].
Since a machine with a lease sends its requests to only one dnsmasq instance, lease and record reconciliation should be rather straightforward IMHO: all records from all dnsmasq peers can be merged in decreasing order of expiry date.
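To make the merge rule a bit more concrete, here is a minimal sketch in Python. It is not dnsmasq code: the `Lease` record, its `updated` timestamp and the `merge_leases` function are hypothetical names invented for illustration, and it assumes each record carries a "last written" time so that "newer" can be decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    # Hypothetical record layout, not dnsmasq's lease file format.
    mac: str      # client hardware address, used as the merge key
    ip: str       # leased address
    expires: int  # absolute expiry time (seconds since epoch)
    updated: int  # time this record was last written (assumed field)


def merge_leases(local, remote, now):
    """Merge two lease lists: newer records win, expired ones are dropped."""
    merged = {}
    for lease in list(local) + list(remote):
        current = merged.get(lease.mac)
        # Newer records overwrite older ones; on a tie the first seen is kept.
        if current is None or lease.updated > current.updated:
            merged[lease.mac] = lease
    # Expired leases are removed from the result.
    live = [lease for lease in merged.values() if lease.expires > now]
    # Return the survivors in decreasing order of expiry date.
    return sorted(live, key=lambda lease: lease.expires, reverse=True)
```

Each peer would run the same merge over its own records plus the block it received, so all of them should converge on the same set as long as their clocks roughly agree (another assumption of this sketch).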
That also suggests each dnsmasq instance should maintain a "dirty" state flag until its database is completely in sync with the others.
What needs to be done, I guess, is that a "dirty" dnsmasq instance that regains its connection to its peers must immediately switch to non-authoritative mode and return to passive mode, handing over (or forwarding) its [live] DNS requests to the "master" instance. It should not answer any DHCP requests.
If the network connectivity is restored before the failed dnsmasq instance runs again, then the latter switches to "dirty" state and non-authoritative mode, syncing its database with its peers.
This implies that a non-master dnsmasq should still be able to receive DNS requests. There's a choice here: either reply directly or forward them to the new dnsmasq master. It could be a mix of both: directly answer the requests for records which the slave knows aren't yet replicated to the master, and forward the rest.
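The "mix of both" option could look roughly like the sketch below. Again this is only an illustration in Python: `route_dns_query` and the `unreplicated` set are hypothetical names, and it assumes the slave keeps track of which names it has not yet pushed to the master.

```python
def route_dns_query(name, unreplicated, is_master):
    """Decide whether this instance answers a query itself or forwards it.

    name         -- the queried host name
    unreplicated -- names this instance holds that the master lacks (assumed set)
    is_master    -- True if this instance is currently the master
    """
    if is_master:
        return "answer"
    if name in unreplicated:
        # Only this instance knows the record, so it has to reply directly.
        return "answer"
    # Everything else goes to the master, which has the authoritative data.
    return "forward"
```

For example, a slave holding an unsynced record for `laptop.lan` would answer that name itself and forward a query for `printer.lan`.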
The complete handshake protocol would require that a dnsmasq instance notify the requesting peer that the sync is complete, so that the peer can switch to "non-dirty and passive" state.
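The state changes described above can be sketched as a tiny state machine. This is a rough Python illustration under my own assumptions, not a protocol definition: the `State` names and the `Instance` methods are hypothetical, and it models only the recovering instance, not how the master is elected.

```python
from enum import Enum


class State(Enum):
    MASTER = "master"    # answers DHCP and DNS
    DIRTY = "dirty"      # back in contact, database not yet in sync
    PASSIVE = "passive"  # in sync, not answering DHCP


class Instance:
    def __init__(self, state=State.MASTER):
        self.state = state

    def on_peer_reconnected(self):
        # Contact regained after a failure or partition: become "dirty"
        # and stop acting as an authoritative DHCP server.
        self.state = State.DIRTY

    def on_sync_complete(self):
        # The peer has notified us that the sync is complete.
        # Only a dirty instance reacts; other states ignore the message.
        if self.state is State.DIRTY:
            self.state = State.PASSIVE

    def answers_dhcp(self):
        # No DHCP requests are answered while dirty or passive.
        return self.state is State.MASTER
```

A recovering instance would go MASTER (during the partition) to DIRTY (on reconnect) to PASSIVE (once told the sync is done).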
I haven't thought it through thoroughly; it's just a rough idea for the moment.