Kea Interacting with Wired Networking

After a reboot of my new server, the Kea DHCP server did not respond to DHCPDISCOVERs or DHCPREQUESTs.

I had trouble noticing this because my laptop kept its DHCP lease for a while after I rebooted the new server. Close laptop, come back later, and it could not get a lease. I had to experience it a few times before I believed it.

Running systemctl restart kea-dhcp4 on the new server got me past the proximate problem: the restarted kea-dhcp4 gave out leases, and my laptop could join the IP network without hassle.

I figured out the root cause by looking at kea-dhcp4 logs:

sudo journalctl -u kea-dhcp4.service --since 2024-06-16T14:34:00 > kea.logs

In this command, “2024-06-16T14:34:00” is a time just before I rebooted the entire new server.

I could compare the sequence of log entries from boot to restart with the log entries from restart to the present. I could see kea-dhcp4 give out an IPv4 address to my laptop after I restarted it, but not between system boot and the kea-dhcp4 restart.
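Splitting the saved kea.logs into those two pieces can be done with a short POSIX sh function. This is a minimal sketch: the function name is made up, and it assumes that a systemd “Stopping” message (which journalctl -u shows alongside the service's own output) marks the restart, so check your own log for the exact line to split on.

```shell
#!/bin/sh
# Split a saved journal into two files at the first line matching PATTERN
# (an awk regular expression). Lines before the match go to BEFORE; the
# matching line and everything after it go to AFTER.
# Usage: split_at_first LOG PATTERN BEFORE AFTER
split_at_first() {
    # Create (or empty) both output files so they exist even if one stays empty.
    : > "$3" && : > "$4" || return 1
    awk -v pat="$2" -v before="$3" -v after="$4" '
        # Once a line matches the pattern, every later line belongs to AFTER.
        !found && $0 ~ pat { found = 1 }
        { if (found) print > after; else print > before }
    ' "$1"
}
```

For example, split_at_first kea.logs 'Stopping' boot.log after.log leaves the boot-to-restart entries in boot.log and the restart-to-present entries in after.log.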

Ignoring timestamps, the first unusual log entry in the time from system boot to the kea-dhcp4 restart was:

WARN  DHCPSRV_OPEN_SOCKET_FAIL failed to open socket: the interface enp4s0 is not running
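Spotting that entry by eye works, but comparing Kea's message IDs (the ALL_CAPS_WITH_UNDERSCORES token such as DHCPSRV_OPEN_SOCKET_FAIL) across the two time ranges does the same job mechanically. This is a hedged sketch: the function name is hypothetical, and recognizing an ID as “the first all-caps word containing an underscore” is a heuristic, not a documented Kea log format. The two input files could come from journalctl -u kea-dhcp4.service with --since and --until set around the restart.

```shell
#!/bin/sh
# Print Kea message IDs that appear in the FIRST log (boot to restart) but
# never in the SECOND log (restart to now), in first-seen order, once each.
# Usage: ids_only_in_first BOOT_LOG AFTER_LOG
ids_only_in_first() {
    awk '
        # Return the first ALL_CAPS_WITH_UNDERSCORES word on the line, or "".
        function msgid(    i) {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[A-Z][A-Z0-9]*(_[A-Z0-9]+)+$/) return $i
            return ""
        }
        # File read first: the restart-to-now log; remember every ID in it.
        FILENAME == ARGV[1] { id = msgid(); if (id != "") seen[id] = 1; next }
        # File read second: the boot-to-restart log; report IDs never seen after restart.
        { id = msgid(); if (id != "" && !(id in seen) && !printed[id]++) print id }
    ' "$2" "$1"
}
```

Messages that show up in both ranges (startup banners, config loading) drop out, which leaves the entries worth reading first.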

Why, according to kea-dhcp4, is interface enp4s0 not running? It’s certainly running now: ip -br link and ip -br address both show it UP. I also have to conclude that kea-dhcp4 does not try to open the socket again later.
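To see the interface the way kea-dhcp4 probably does at startup, here is a minimal POSIX sh sketch that checks, and optionally waits for, an interface to come up. It assumes Kea's “not running” corresponds to the kernel's IFF_RUNNING flag (an assumption based on the message's wording, not on Kea documentation) and approximates that by reading /sys/class/net/IFACE/operstate. The function names are hypothetical.

```shell
#!/bin/sh
# Assumption: "running" in Kea's message ~ kernel IFF_RUNNING, approximated
# here by the sysfs operstate of the interface reading "up".

# Succeed if an operstate string means the link is operational.
operstate_is_up() {
    [ "$1" = "up" ]
}

# Succeed if interface IFACE exists and its operstate is "up".
iface_is_up() {
    state=$(cat "/sys/class/net/$1/operstate" 2>/dev/null) || return 1
    operstate_is_up "$state"
}

# Poll IFACE once per second, at most TRIES times (default 30).
# Returns 0 as soon as it is up, 1 if it never comes up.
wait_for_iface() {
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        iface_is_up "$1" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

Running iface_is_up enp4s0 && echo up right after boot, before restarting kea-dhcp4, would show whether the interface was really down at that point; wait_for_iface enp4s0 30 is the same check with a timeout.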

Since I now have kea-dhcp4 running on my Dell R530, I double-checked the log entries on the Dell R530, confirming that enp4s0 “not running” indicates the real problem.


Here’s a diagram of my network, for purposes of this post:

[Diagram of the network for this post]

I can ssh to my new server by joining a WiFi network served by my production server, the Dell R530, then using the IPv4 address assigned to interface enp7s0 on the new server.

There’s some setup done at system boot time: it assigns a static IPv4 address (172.24.0.1) to interface enp4s0, turns on IPv4 forwarding, and uses iptables to do Network Address Translation (NAT) masquerading. This setup is almost certainly where enp4s0 “not running” comes from.
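The commands inside network.service aren't shown in this post, so the following is a hypothetical reconstruction of that kind of setup, not the actual unit. The /24 prefix on enp4s0 and enp7s0 as the upstream interface to masquerade out of are assumptions; the ip, sysctl, and iptables invocations are standard. Setting DRY_RUN=1 only prints what would run.

```shell
#!/bin/sh
# Hypothetical boot-time LAN setup. Assumptions (not from the original unit):
# a /24 prefix on the LAN side and enp7s0 as the upstream (NAT) interface.
LAN_IF=${LAN_IF:-enp4s0}
LAN_ADDR=${LAN_ADDR:-172.24.0.1/24}
WAN_IF=${WAN_IF:-enp7s0}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring the LAN interface up, give it its static address, enable IPv4
# forwarding, and masquerade traffic leaving through the upstream interface.
# Stops at the first step that fails.
setup_lan() {
    run ip link set dev "$LAN_IF" up || return
    run ip address add "$LAN_ADDR" dev "$LAN_IF" || return
    run sysctl -w net.ipv4.ip_forward=1 || return
    run iptables -t nat -A POSTROUTING -o "$WAN_IF" -j MASQUERADE
}
```

DRY_RUN=1 setup_lan previews the four commands; running setup_lan as root applies them. If kea-dhcp4 opens its sockets before the first two steps have run, enp4s0 is still down, which matches the warning above.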

My first thought was that the systemd unit that I wrote to do all the iptables work didn’t run at the correct time. I have only a hazy understanding of how systemd (PID 1) decides to order units (services, targets, devices) at system startup.

I have had this systemd unit (imaginatively called network.service) running on three different machines since sometime in 2018.

Here’s the [Unit] section that caused kea-dhcp4 to start before the enp4s0 network interface was ready:

[Unit]
Description=Wired Static IP Connectivity
Wants=network.target
Before=network-pre.target
BindsTo=sys-subsystem-net-devices-eno1.device
After=sys-subsystem-net-devices-eno1.device

I read some man pages and some system administration blogs on systemd unit files, especially on the Before=, After=, Wants=, and Requires= directives. I tried a few things, but when they didn’t work (kea-dhcp4 still started before enp4s0 was UP), I looked at /usr/lib/systemd/system/iptables.service for inspiration.

iptables.service has Before=network-pre.target, which really doesn’t match what the documentation says about that target. The documentation says something like this:

network-pre.target is used to order services before any network interfaces start to be configured.

I ended up with this in the /etc/systemd/system/network.service file:

[Unit]
Description=Wired Static IP Connectivity
Before=network-pre.target
Wants=network-pre.target
After=sys-subsystem-net-devices-enp4s0.device
Requires=sys-subsystem-net-devices-enp4s0.device
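A different way to express the same dependency, sketched here as an untested alternative rather than what this post settled on, is a drop-in for kea-dhcp4.service that orders it directly after network.service. The drop-in file name after-network.conf and the function name are hypothetical; Wants=, After=, and NAME.service.d/*.conf drop-in directories are standard systemd. The comment about After= assumes network.service is a Type=oneshot unit.

```shell
#!/bin/sh
# Write a systemd drop-in that makes kea-dhcp4.service start after
# network.service (the static-IP unit from this post).
# ROOT (optional) lets you write into a scratch directory first; with no
# argument the file goes under the real /etc.
write_kea_dropin() {
    root=${1:-}
    dir="$root/etc/systemd/system/kea-dhcp4.service.d"
    mkdir -p "$dir" || return 1
    # Wants= pulls network.service into the boot transaction; After= delays
    # kea-dhcp4 until network.service has finished starting (for a oneshot
    # unit, until its commands have run).
    cat > "$dir/after-network.conf" <<'EOF'
[Unit]
Wants=network.service
After=network.service
EOF
}
```

Run write_kea_dropin as root with no argument, then systemctl daemon-reload so systemd picks up the drop-in before the next boot.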

It looks to me like this problem occurs on my new server, but not on my production server, because the production server also has units that start pppd and PPPoE and do Path MTU clamping. My current theory is that these extra units change the start-up ordering on my production server, so dhcpd or kea-dhcp4 gets started later in the boot sequence.