[Dnsmasq-discuss] DHCPv6 doesn't work on Linux interfaces enslaved to a VRF
Simon Kelley
simon at thekelleys.org.uk
Mon Oct 9 21:17:23 UTC 2023
On 09/10/2023 11:40, Luci Stanescu wrote:
> Hi Simon,
>
> Thank you for your response and your openness to this issue! My thoughts
> below, inline (and apologies for the rather long email).
>
>> On 9 Oct 2023, at 01:05, Simon Kelley <simon at thekelleys.org.uk> wrote:
>> 1) Even if this is a kernel bug, kernel bug fixes take a long time to
>> spread, so working around them in dnsmasq is a good thing to do, as
>> long as it doesn't leave us with long-term technical debt. This
>> wouldn't be the first time a kernel bug has been worked around.
>
> I agree, that's why I've opened this discussion here.
>
>> 2) https://docs.kernel.org/networking/vrf.html says:
>>
>> Applications that are to work within a VRF need to bind their socket
>> to the VRF device:
>> setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);
>> or to specify the output device using cmsg and IP_PKTINFO.
>>
>> Which kind of implies that this might not be a kernel bug, rather
>> we're just not doing what's required to work with VRF.
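[For reference, those two documented approaches translate to Python roughly
as follows. This is a minimal sketch, not dnsmasq code; 'myvrf' and 'veth1'
are the device names from the test setup later in this thread.]

import socket
import struct

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)

# Approach 1: bind the socket to the VRF device.
s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b'myvrf')

# Approach 2: pin the outgoing device per-packet with an IPV6_PKTINFO
# ancillary message (struct in6_pktinfo: source address + interface index).
pktinfo = struct.pack('@16sI',
                      socket.inet_pton(socket.AF_INET6, '::'),
                      socket.if_nametoindex('veth1'))
s.sendmsg([b'payload'],
          [(socket.IPPROTO_IPV6, socket.IPV6_PKTINFO, pktinfo)],
          0,
          ('ff02::1:2', 2000, 0, 0))  # scope_id 0: let PKTINFO choose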
>
> I'm not convinced this isn't a kernel bug. The VRF implementation has
> been developed in stages over several years. It is indeed the case that
> initially the sockets had to be bound to the VRF device or to specify it
> via IP_PKTINFO/IPV6_PKTINFO. But then came support for
> net.ipv4.*_l3mdev_accept sysctls (which confusingly also affect AF_INET6
> sockets) as well as a series of patches in 2018 that allowed specifying
> a VRF slave device for several operations. Before that series of
> patches, it certainly made sense for sin6_scope_id in msg_name for
> recvmsg() to be the VRF device (it had to be) – but I'm not convinced it
> shouldn't have been changed after the rules for connect() and sendmsg()
> were relaxed. The thing is, as it stands, the kernel code works well for
> everything except IPv6 link-local communication, so it wouldn't be
> surprising for this to be a simple oversight.
>
> I had tracked this down while trying to figure out what's going on here
> and detailed it a bit in the kernel bug report, which you can find here:
>
> https://lore.kernel.org/netdev/06798029-660D-454E-8628-3A9B9E1AF6F8@safebits.tech/T/#u
>
>> Setting the device to send to using IP_PKTINFO rather than relying on
>> the flowinfo field in the destination address would be quite possible,
>> and the above implies that it will work.
>
> Apologies for being pernickety, but it's the scope_id field which is
> relevant here, rather than flowinfo. And since we're talking AF_INET6,
> shouldn't it be IPV6_PKTINFO?
My bad. I meant scope_id. It was late :(
>
> I *think* it should work. I have been unable to find a situation where
> the scope received in the IPV6_PKTINFO cmsg to recvmsg() cannot be used
> to reliably send a response out the same interface (which I believe is
> exactly what DHCPv6 code will always want to do), but my word is
> certainly no guarantee. More about this towards the end of the email.
>
> However, it'll only work as long as you do NOT specify a scope in the
> destination of the sendmsg() call, or the scope specified is exactly the
> same as in the IPV6_PKTINFO ancillary message. Specifically, you cannot
> specify the VRF master device index. I've adapted my earlier scripts to
> test this and I've pasted them at the end of this email.
>
>> This brings us on to
>>
>> 3) IPv4. Does DHCPv4 work with VRF devices? It would be nice to test,
>> and fix any similar problems in the same patch. Interestingly, the
>> DHCPv4 code already sets the outgoing device via IP_PKTINFO (there
>> being no flowinfo field in an IPv4 sockaddr) so it stands a chance of
>> just working.
>
> DHCPv4 works just fine. My dnsmasq configuration uses 'interface' to
> specify the VRF slave interface (which in my case is a bridge) and
> DHCPv4 messages are sent out correctly.
>
>> Copying the interface index into the flowinfo of the destination or
>> setting IP_PKTINFO are both easy patches to make and try. The
>> difficult bit is being sure that they won't break existing installations.
>
> My tests seem to imply that leaving the received scope_id field (which
> is the VRF master device index) unchanged and setting IPV6_PKTINFO won't
> work. Three options seem to work:
> 1. Overwrite scope_id of source address from recvmsg() with the
> interface index from the received IPV6_PKTINFO.
> 2. When performing the sendmsg(), set the scope_id of the
> destination to 0 and add IPV6_PKTINFO with the empty address (since
> the received IPV6_PKTINFO specifies the multicast address and that won't
> do as a source) and the interface index from the received IPV6_PKTINFO.
> 3. If the socket is bound to an L3 interface (not the VRF master
> device), just set the scope_id in the destination to 0 and IPV6_PKTINFO
> is not required. I'm not sure this'll work for dnsmasq, but I thought of
> including it for the sake of completeness.
>
This is good information.
I've implemented option 1 here and it's currently running as dogfood on
my home network. There are no VRF interfaces there: this is a test
mainly to check that nothing breaks. So far, so good.
The patch I used is attached. It would be interesting to see if it
solves the problem for you.
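
[In terms of Luci's Python receiver below, option 1 amounts to something
like this sketch. The attached patch is, of course, C against dnsmasq's
DHCPv6 code, not this Python; the variable names are from the receiver
script.]

# Option 1 sketch: before replying, replace the scope_id that recvmsg()
# reported in msg_name (the VRF master's ifindex) with the interface
# index carried in the IPV6_PKTINFO ancillary message (the L3 slave).
source_address, source_port, source_flow, source_scope = source
if dest_scope is not None:
    source_scope = dest_scope
s.sendmsg([data], [], 0,
          (source_address, source_port, source_flow, source_scope))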
> I've adapted my scripts slightly to allow easier testing of the
> behaviour. The receiver socket now binds to the VRF device instead (you
> can even skip binding to any device and just set the
> net.ipv4.udp_l3mdev_accept sysctl to 1). The interface configuration is
> as before:
>
> ip link add myvrf type vrf table 42
> ip link set myvrf up
> ip link add veth1 type veth peer name veth2
> ip link set veth1 master myvrf up
> ip link set veth2 up
>
> The receiver now echoes back any data that it gets:
>
> import socket
> import struct
>
> s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
> s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_RECVPKTINFO, 1)
> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b'myvrf')
> s.bind(('', 2000, 0, 0))
> mreq = struct.pack('@16sI',
>                    socket.inet_pton(socket.AF_INET6, 'ff02::1:2'),
>                    socket.if_nametoindex('veth1'))
> s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)
>
> while True:
>     data, cmsg_list, flags, source = s.recvmsg(4096, 4096)
>     dest_scope = None
>     for level, type, cmsg_data in cmsg_list:
>         if level == socket.IPPROTO_IPV6 and type == socket.IPV6_PKTINFO:
>             dest_address, dest_scope = struct.unpack('@16sI', cmsg_data)
>             dest_address = socket.inet_ntop(socket.AF_INET6, dest_address)
>             print("PKTINFO destination {}%{}".format(
>                 dest_address, socket.if_indextoname(dest_scope)))
>     source_address, source_port, source_flow, source_scope = source
>     print("received message from {}%{}".format(
>         source_address, socket.if_indextoname(source_scope)))
>     ipinfo_cmsg = struct.pack('@16sI',
>                               socket.inet_pton(socket.AF_INET6, '::'),
>                               dest_scope or 0)
>     s.sendmsg([data],
>               [(socket.IPPROTO_IPV6, socket.IPV6_PKTINFO, ipinfo_cmsg)],
>               0,
>               (source_address, source_port, source_flow, source_scope))
>
> The sender now waits for a response:
>
> import socket
>
> s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
> dest = ('ff02::1:2', 2000, 0, socket.if_nametoindex('veth2'))
> s.sendto(b'foo', dest)
> data, source = s.recvfrom(4096)
> source_address, source_port, source_flow, source_scope = source
> print("Received response from '{}%{}': {}".format(
>     source_address, socket.if_indextoname(source_scope), data))
>
> If you run this example, the sendmsg() call from the receiver will fail
> with EINVAL. That's because the IPV6_PKTINFO ancillary message specifies
> the correct interface as the scope (the veth1 interface index), but
> sendmsg() specifies the VRF master device as the scope for the
> destination (as received from recvmsg()). You can get it to work by any
> of the following:
>     1. Specify 'dest_scope' in the last argument to sendmsg(), which
> corresponds to the sin6_scope_id field of the msg_name field. I guess
> you can even specify 'dest_scope or source_scope' to have a fallback in
> case (for whatever reason) recvmsg() doesn't get an IPV6_PKTINFO
> ancillary message. You can also drop the IPV6_PKTINFO ancillary message
> from the sendmsg() call here, since it's now redundant.
>     2. Specify '0' as the scope in the last argument to sendmsg()
> instead of source_scope, but now the IPV6_PKTINFO ancillary message in
> sendmsg() becomes mandatory (a sketch of this follows below).
>     3. Bind the socket to 'veth1' instead of 'myvrf', don't include an
> IPV6_PKTINFO ancillary message in sendmsg() and specify '0' as the scope
> in the last argument to sendmsg() instead of source_scope.
>
> These three changes correspond to my earlier three options, respectively.
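
[For concreteness, fix 2 applied to the receiver's reply would look roughly
like this. It is a sketch: the script's existing sendmsg() call with the
destination scope_id zeroed.]

# Fix 2: zero the destination scope_id; the IPV6_PKTINFO ancillary
# message (now mandatory) selects the outgoing slave interface.
ipinfo_cmsg = struct.pack('@16sI',
                          socket.inet_pton(socket.AF_INET6, '::'),
                          dest_scope or 0)
s.sendmsg([data],
          [(socket.IPPROTO_IPV6, socket.IPV6_PKTINFO, ipinfo_cmsg)],
          0,
          (source_address, source_port, source_flow, 0))  # scope_id 0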
>
>> The difficult bit is being sure that they won't break existing
>> installations.
>
> Indeed. I think this boils down to:
> 1. Knowing that DHCPv6 should always send a message on the same L3
> interface on which it received a request. I believe this holds true, but
> you'll definitely know more about the spec and real-world scenarios than
> I do.
> 2. Finding authoritative information that the interface index from
> IPV6_PKTINFO is always set to the L3 interface on which a datagram was
> received. The kernel mailing list might be a start? I'd certainly be happy
> to help think about and test various scenarios.
>
I'm happy that 1. is true. Please enquire about 2.
Cheers,
Simon.
> Cheers,
>
> Luci
>
>
> --
> Luci Stanescu
> Information Security Consultant
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vrf.patch
Type: text/x-patch
Size: 1267 bytes
Desc: not available
URL: <http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/attachments/20231009/024a0930/attachment.bin>