Add lpf IPoIB support #134

Open
wants to merge 4 commits into
base: master

Conversation

TimoRoth

@TimoRoth TimoRoth commented Feb 6, 2024

I'd like to replace good old dhcpd with something more modern, but we have an InfiniBand network, and thus need IPoIB support for DHCP4.

This is my current work in progress for adding support to the linux lpf side of things.
It's not quite complete, since I'd like to gather some feedback on the interest and the way of implementation first, so here is what I have so far.

I couldn't find any documentation on the IPoIB raw header format, but by looking at it, it seems to be just the 20 byte long hwaddr followed by the same 2 byte ethernet protocol type, and two bytes which are always zero.
Based on that, I ended up with this patch.
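
For illustration, the layout guessed above (20-byte hardware address, 2-byte protocol type, 2 zero bytes) could be parsed roughly like this. It is only a Python sketch under that assumption, not the code from the patch; `parse_ipoib_header` and the constant names are hypothetical.

```python
import struct

IPOIB_HWADDR_LEN = 20   # length of the IPoIB link-layer address
IPOIB_HEADER_LEN = IPOIB_HWADDR_LEN + 4  # address + protocol type + 2 zero bytes


def parse_ipoib_header(frame: bytes):
    """Split a raw frame into (hwaddr, ethertype, payload).

    Assumes the layout described in the comment above; this is an
    observation from captured frames, not a documented format.
    """
    if len(frame) < IPOIB_HEADER_LEN:
        raise ValueError("frame shorter than the assumed 24-byte header")
    hwaddr = frame[:IPOIB_HWADDR_LEN]
    # Network byte order: 16-bit protocol type, then 16 bits expected to be zero.
    ethertype, reserved = struct.unpack(
        "!HH", frame[IPOIB_HWADDR_LEN:IPOIB_HEADER_LEN]
    )
    if reserved != 0:
        raise ValueError("trailing two header bytes are expected to be zero")
    return hwaddr, ethertype, frame[IPOIB_HEADER_LEN:]
```

A frame carrying IPv4 would then yield an ethertype of 0x0800, the same value as on Ethernet.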

There is still one issue, which is that in case of a broadcast, InfiniBand does not necessarily have a generic broadcast address. Instead it has to be queried from the interface, which still needs to be added to the generic interface code.
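
As a rough idea of what querying the interface means, Linux exposes the link-layer broadcast address of each interface under sysfs. The sketch below reads it from there; it is an assumption-laden Python illustration (the function name and the `sysfs_root` parameter are hypothetical), and it is not how the patch itself obtains the address.

```python
from pathlib import Path


def read_link_broadcast(ifname: str, sysfs_root: str = "/sys/class/net") -> bytes:
    """Return the link-layer broadcast address of an interface as raw bytes.

    Reads <sysfs_root>/<ifname>/broadcast, which holds colon-separated
    hex octets. sysfs_root is a parameter only so the function can be
    pointed at a test directory.
    """
    text = (Path(sysfs_root) / ifname / "broadcast").read_text().strip()
    return bytes(int(octet, 16) for octet in text.split(":"))
```

On an IPoIB interface this should return 20 bytes rather than the 6 bytes of ff seen on Ethernet.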

@tomaszmrugalski
Member

Very interesting. There's rfc4390 that documents the IB flavor of DHCP.

Couple potential problems ahead:

  • ISC doesn't have any IB hardware (or experience with it), so ongoing testing would be an issue
  • my understanding is that all devices report MAC address of 0 and client-id is used to contain the actual identifier. So Kea's assumption that the MAC address is unique would not hold. I suspect many things could break in subtle and not so subtle ways.

None of the above are show-stoppers, just something to be cautious about.

@TimoRoth
Author

TimoRoth commented Feb 6, 2024

I think they have something like a MAC, it's just obscenely (20 bytes) long.
A device looks like this:

6: ibp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
    link/infiniband 00:00:01:9f:fe:80:00:00:00:00:00:00:e0:07:1b:ff:ff:70:d3:e0 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

With old ISC DHCPd, the entry for that node would look like this:

host nodeExample {
        hardware ethernet E0:07:1B:70:D3:E0;
        option dhcp-client-identifier=20:E0:07:1B:FF:FF:70:D3:E0;
        fixed-address 10.10.10.10;
        option host-name "nodeExample";
}

I'm not quite sure who came up with the translation back to the 6 byte MAC in dhcpd (it's a Mellanox Patch-Set to make it support Infiniband), but all the values are directly derived from one another.
The leading 0x20 in the client-id seems to be just the HTYPE_INFINIBAND.
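
To make the derivation concrete, here is a small Python sketch of how the values in the dhcpd entry follow from the 20-byte link address shown above. The function names are hypothetical, and the rule of dropping the ff:ff bytes is inferred from the example values, not from any specification.

```python
HTYPE_INFINIBAND = 32  # 0x20, the leading byte of the client-id above


def parse_hwaddr(text: str) -> bytes:
    """Turn 'aa:bb:cc' notation into raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def format_hwaddr(data: bytes) -> str:
    """Turn raw bytes back into lowercase 'aa:bb:cc' notation."""
    return ":".join(f"{b:02x}" for b in data)


def derive_identifiers(link_addr: bytes):
    """Return (client_id, short_mac) derived from a 20-byte IPoIB address."""
    if len(link_addr) != 20:
        raise ValueError("expected a 20-byte IPoIB link address")
    guid = link_addr[-8:]                          # last 8 bytes of the address
    client_id = bytes([HTYPE_INFINIBAND]) + guid   # 0x20 followed by those 8 bytes
    short_mac = guid[:3] + guid[5:]                # drop the ff:ff in the middle
    return client_id, short_mac


addr = parse_hwaddr("00:00:01:9f:fe:80:00:00:00:00:00:00:e0:07:1b:ff:ff:70:d3:e0")
client_id, short_mac = derive_identifiers(addr)
print(format_hwaddr(client_id))  # 20:e0:07:1b:ff:ff:70:d3:e0
print(format_hwaddr(short_mac))  # e0:07:1b:70:d3:e0
```

Both printed values match the `dhcp-client-identifier` and `hardware ethernet` lines of the host entry, apart from letter case.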

I have access to a bunch of hardware to test with, and especially their PXE boot environment is rather particular about doing things correctly.

I just read the RFC, and it states that the server does not know the client's hardware address.
While that's true at the pure DHCP protocol level, at least on Linux via raw packet sockets, the Kernel happily tells one the source hardware address. So a unicast-response is in fact possible.
But I'd imagine Kea would nonetheless respect the BROADCAST-Flag and broadcast back the reply, which works fine enough.

@TimoRoth TimoRoth force-pushed the ib_filtering branch 4 times, most recently from 0a03d53 to fb97d76 Compare February 8, 2024 20:52
@sempervictus

Thank you, neat.
Might make sense to qualify that the kernel's behavior on this is consistent w/ upstream implementation & ofed, along with it being a defined behavior that won't be patched away when it's discovered to be related to a security or performance concern.

@TimoRoth
Author

I haven't had a chance to give this a test yet, but in theory the code should be complete.

The Kernel behaviour with regard to the response address doesn't really matter at all, since the RFC states that clients must always request a broadcast response.
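
For reference, "requesting a broadcast response" means setting the broadcast bit in the flags field of the DHCPv4 header. A minimal Python sketch of checking that bit on a raw message follows; the function name is hypothetical, and this is not Kea's code.

```python
import struct

BOOTP_FLAGS_OFFSET = 10   # op, htype, hlen, hops (4) + xid (4) + secs (2)
BROADCAST_FLAG = 0x8000   # most significant bit of the 16-bit flags field


def wants_broadcast(dhcp_message: bytes) -> bool:
    """Return True if the client set the broadcast bit in the DHCPv4 header."""
    if len(dhcp_message) < BOOTP_FLAGS_OFFSET + 2:
        raise ValueError("message too short to contain the flags field")
    (flags,) = struct.unpack_from("!H", dhcp_message, BOOTP_FLAGS_OFFSET)
    return bool(flags & BROADCAST_FLAG)
```

A server that honours this bit sends its reply to the broadcast address and never needs the client's link-layer address for the response.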

@TimoRoth TimoRoth force-pushed the ib_filtering branch 5 times, most recently from 85bcbf8 to 235aaa0 Compare May 3, 2024 16:37
@TimoRoth TimoRoth force-pushed the ib_filtering branch 2 times, most recently from 83b2783 to 549afba Compare May 21, 2024 21:51
@TimoRoth TimoRoth changed the title WIP: Add lpf IPoIB support Add lpf IPoIB support May 21, 2024
@TimoRoth TimoRoth marked this pull request as ready for review May 21, 2024 22:30
@TimoRoth
Author

TimoRoth commented May 21, 2024

I have now finally had the chance to test this, and after fixing two small kinks (I had confused IFLA_BROADCAST with IFA_BROADCAST, and Pkt4 refused the long CHADDR), it's working just fine.

Here's a log of a node booting up via HTTP boot. First instance is the UEFI HTTP boot tool requesting an IP, and the second is dhclient in the initrd:

May 22 00:20:38 login01 kea-dhcp4[421812]: INFO  DHCP4_QUERY_LABEL received query: [hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448
May 22 00:20:38 login01 kea-dhcp4[421812]: INFO  EVAL_RESULT [hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448: Expression httpclients evaluated to true
May 22 00:20:38 login01 kea-dhcp4[421812]: INFO  DHCP4_PACKET_RECEIVED [hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448: DHCPDISCOVER (type 1) received from 0.0.0.0 to 255.255.255.255 on interface ibp1s0
May 22 00:20:38 login01 kea-dhcp4[421812]: INFO  DHCP4_LEASE_OFFER [hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448: lease 10.110.10.150 will be offered
May 22 00:20:38 login01 kea-dhcp4[421812]: INFO  DHCP4_PACKET_SEND [hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448: trying to send packet DHCPOFFER (type 2) from 10.110.10.250:67 to 255.255.255.255:68 on interface ibp1s0
May 22 00:20:41 login01 kea-dhcp4[421812]: INFO  DHCP4_QUERY_LABEL received query: [hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448
May 22 00:20:41 login01 kea-dhcp4[421812]: INFO  EVAL_RESULT [hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448: Expression httpclients evaluated to true
May 22 00:20:41 login01 kea-dhcp4[421812]: INFO  DHCP4_PACKET_RECEIVED [hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448: DHCPREQUEST (type 3) received from 0.0.0.0 to 255.255.255.255 on interface ibp1s0
May 22 00:20:41 login01 kea-dhcp4[421812]: INFO  DHCP4_LEASE_ALLOC [hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448: lease 10.110.10.150 has been allocated for 3600 seconds
May 22 00:20:41 login01 kea-dhcp4[421812]: INFO  DHCP4_PACKET_SEND [hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448: trying to send packet DHCPACK (type 5) from 10.110.10.250:67 to 255.255.255.255:68 on interface ibp1s0


May 22 00:22:10 login01 kea-dhcp4[421812]: INFO  DHCP4_QUERY_LABEL received query: [hwtype=32 e0:07:1b:ff:ff:70:14:d0], cid=[20:e0:07:1b:ff:ff:70:14:d0], tid=0xc4181770
May 22 00:22:10 login01 kea-dhcp4[421812]: INFO  DHCP4_PACKET_RECEIVED [hwtype=32 e0:07:1b:ff:ff:70:14:d0], cid=[20:e0:07:1b:ff:ff:70:14:d0], tid=0xc4181770: DHCPDISCOVER (type 1) received from 0.0.0.0 to 255.255.255.255 on interface ibp1s0
May 22 00:22:10 login01 kea-dhcp4[421812]: INFO  DHCP4_LEASE_OFFER [hwtype=32 e0:07:1b:ff:ff:70:14:d0], cid=[20:e0:07:1b:ff:ff:70:14:d0], tid=0xc4181770: lease 10.110.10.2 will be offered
May 22 00:22:10 login01 kea-dhcp4[421812]: INFO  DHCP4_PACKET_SEND [hwtype=32 e0:07:1b:ff:ff:70:14:d0], cid=[20:e0:07:1b:ff:ff:70:14:d0], tid=0xc4181770: trying to send packet DHCPOFFER (type 2) from 10.110.10.250:67 to 255.255.255.255:68 on interface ibp1s0
May 22 00:22:10 login01 kea-dhcp4[421812]: INFO  DHCP4_QUERY_LABEL received query: [hwtype=32 e0:07:1b:ff:ff:70:14:d0], cid=[20:e0:07:1b:ff:ff:70:14:d0], tid=0xc4181770
May 22 00:22:10 login01 kea-dhcp4[421812]: INFO  DHCP4_PACKET_RECEIVED [hwtype=32 e0:07:1b:ff:ff:70:14:d0], cid=[20:e0:07:1b:ff:ff:70:14:d0], tid=0xc4181770: DHCPREQUEST (type 3) received from 0.0.0.0 to 255.255.255.255 on interface ibp1s0
May 22 00:22:10 login01 kea-dhcp4[421812]: INFO  DHCP4_LEASE_ALLOC [hwtype=32 e0:07:1b:ff:ff:70:14:d0], cid=[20:e0:07:1b:ff:ff:70:14:d0], tid=0xc4181770: lease 10.110.10.2 has been allocated for 3600 seconds
May 22 00:22:10 login01 kea-dhcp4[421812]: INFO  DHCP4_PACKET_SEND [hwtype=32 e0:07:1b:ff:ff:70:14:d0], cid=[20:e0:07:1b:ff:ff:70:14:d0], tid=0xc4181770: trying to send packet DHCPACK (type 5) from 10.110.10.250:67 to 255.255.255.255:68 on interface ibp1s0

Writing reservations for this is super annoying, since the two clients can't agree on either hwaddr or cid.
But luckily the pre-boot environment does not need to get the correct IP + hostname assigned.
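
To show the mismatch side by side, here is a Python sketch that pulls the identifiers out of the query labels in the log above. The regular expression is written against the lines shown here only, and `short_mac` is a hypothetical helper for comparison, not something Kea does.

```python
import re

# Matches labels like: [hwtype=32 e0:07:...], cid=[20:e0:...]
LABEL_RE = re.compile(r"\[hwtype=(\d+) ([0-9a-f:]+)\], cid=\[([^\]]*)\]")


def parse_label(line: str):
    """Return (hwtype, hwaddr, cid) from a log line, or None if no label is found.

    cid is None when the label says 'no info'.
    """
    match = LABEL_RE.search(line)
    if match is None:
        return None
    hwtype = int(match.group(1))
    hwaddr = bytes.fromhex(match.group(2).replace(":", ""))
    cid_text = match.group(3)
    cid = None if cid_text == "no info" else bytes.fromhex(cid_text.replace(":", ""))
    return hwtype, hwaddr, cid


def short_mac(hwaddr: bytes) -> bytes:
    """Collapse an 8-byte address with ff:ff in the middle to 6 bytes."""
    if len(hwaddr) == 8 and hwaddr[3:5] == b"\xff\xff":
        return hwaddr[:3] + hwaddr[5:]
    return hwaddr


uefi = parse_label("[hwtype=1 e0:07:1b:70:14:d0], cid=[no info], tid=0x8dfd9448")
initrd = parse_label(
    "[hwtype=32 e0:07:1b:ff:ff:70:14:d0], cid=[20:e0:07:1b:ff:ff:70:14:d0], tid=0xc4181770"
)
print(uefi[0], initrd[0])                          # 1 32
print(short_mac(uefi[1]) == short_mac(initrd[1]))  # True
```

The two requests differ in hardware type, address length, and the presence of a client-id, so a single reservation keyed on either field cannot match both.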

@TimoRoth TimoRoth force-pushed the ib_filtering branch 2 times, most recently from 5db69a2 to fd98acb Compare May 22, 2024 00:46
@TimoRoth TimoRoth force-pushed the ib_filtering branch 2 times, most recently from cba7e01 to 1b17aa3 Compare June 23, 2024 15:14
@TimoRoth
Author

I've been using this on our network/cluster for a while now, and it's been running without issues.
So if there is anything further I should do to move forward with the PR, please let me know.

3 participants