See moby/moby#47728 for context.
I've just hit a case where the networkDB looked correct, but packets were still being routed to the wrong node.
Details:
Node S, running the source container, has IP 172.17.2.2.
Node T, running the target container, has IP 172.16.3.2.
The target container has the endpoint IP 192.168.74.111. On both nodes, the following command returns the same data:
docker run --net host dockereng/network-diagnostic:onlyclient -port 2000 -v -net ebbnwg5bhz9y9g40r8yum5bdt -t overlay 2>&1 | grep \\.111
time="2024-10-18T16:18:25Z" level=debug msg="key:c3d4202a63eac0339c97051eaf16464f21e505e63782c5944d5c6ab7a120fd59 value:{EndpointIP:192.168.74.111/24 EndpointMAC:02:42:c0:a8:4a:6f TunnelEndpointIP:172.16.3.2} owner:f05333e5c866"
-> TunnelEndpointIP is correct.
Still, I could verify via tcpdump that node S sends the packets to a different cluster node with IP 172.17.7.2. This shows that the network DB is not the whole truth.
I dug further. For the next bit, it's important to know that the MAC addresses of the container's interfaces are always 02:42 followed by the IP's octets in hex, so the MAC address of the target container is 02:42:c0:a8:4a:6f. Or just get it via nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz arp -n | grep \\.111
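The MAC derivation above can be sketched in Python (the 02:42 prefix and IP-to-hex mapping are as described; the helper name is mine):

```python
def overlay_mac(endpoint_ip: str) -> str:
    """Derive the overlay container MAC: 02:42 followed by the IPv4 octets in hex."""
    octets = [int(o) for o in endpoint_ip.split(".")]
    return "02:42:" + ":".join(f"{o:02x}" for o in octets)

print(overlay_mac("192.168.74.111"))  # 02:42:c0:a8:4a:6f
```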
Using nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | grep 6f I could verify that the forwarding database in fact has the wrong IP 172.17.7.2 as the destination:
# nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | grep 6f
02:42:c0:a8:4a:6f dev vxlan0 master br0
02:42:c0:a8:4a:6f dev vxlan0 dst 172.17.7.2 link-netnsid 0 self permanent
I manually fixed the fdb entry via:
# nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb delete 02:42:c0:a8:4a:6f dev vxlan0 dst 172.17.7.2
# nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb add 02:42:c0:a8:4a:6f dev vxlan0 dst 172.16.3.2 self permanent
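The manual fix above can be generated from the networkDB data; here is a small sketch that builds the same two `bridge fdb` commands (the helper name and parameters are mine, the command shapes are the ones used above):

```python
def fdb_fix_commands(netns: str, mac: str, wrong_dst: str, correct_dst: str) -> list[str]:
    """Build the delete/add commands that replace a stale VXLAN fdb destination."""
    prefix = f"nsenter --net={netns} bridge fdb"
    return [
        f"{prefix} delete {mac} dev vxlan0 dst {wrong_dst}",
        f"{prefix} add {mac} dev vxlan0 dst {correct_dst} self permanent",
    ]

for cmd in fdb_fix_commands("/var/run/docker/netns/1-ebbnwg5bhz",
                            "02:42:c0:a8:4a:6f", "172.17.7.2", "172.16.3.2"):
    print(cmd)
```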
This resulted in the packets now being received by the correct node T, but still not by the container running on it. A few minutes later, it started working without further changes...
On another node, I hit a problem that looked the same at first, but was very different: containers on that node couldn't reach the Docker Swarm service running on node T.
However, they could access the container of that service via its IP. Just the connection via the service VIP didn't work.
Checking the endpoint table showed nothing wrong:
docker run --net host dockereng/network-diagnostic:onlyclient -port 2000 -v -net ebbnwg5bhz9y9g40r8yum5bdt -t sd 2>&1 | grep mimir-lb
Resulted in the correct entry:
{Name:mimir-lb.1.o2ta5bj935uwseekyddngin78 ServiceName:mimir-lb ServiceID:yt4kbu3b59erq6rgmzbvxrd86 VirtualIP:192.168.74.42 EndpointIP:192.168.74.111 IngressPorts:[] Aliases:[] TaskAliases:[d5a90b876bde] ServiceDisabled:false}
But as in the other issue, the network DB might not be in sync with what is actually configured on the node, so I checked the actual mapping between the VIP and the container:
# nsenter --net=/var/run/docker/netns/lb_ebbnwg5bh iptables -t mangle -L -n -v --line-numbers | grep \\.42
4 7402 444K MARK all -- * * 0.0.0.0/0 192.168.74.42 MARK set 0x1db
Translating the mark 0x1db to decimal (475) and checking it in ipvsadm:
# nsenter --net=/var/run/docker/netns/lb_ebbnwg5bh ipvsadm -L -n
...
FWM 475 rr
-> 192.168.74.111:0 Masq 1 5 0
That also looks correct.
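The mark translation above is just a hex-to-decimal conversion; iptables prints the fwmark in hex, while ipvsadm lists its virtual services by the decimal value:

```python
# MARK value as printed by the mangle-table rule above
mark_hex = "0x1db"
# ipvsadm shows the same mark in decimal ("FWM 475 rr")
print(int(mark_hex, 16))  # 475
```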
thanks a lot.
I have the same issue as #1
I updated:
moby/moby#49908