Last active
July 15, 2025 10:20
Troubleshoot Docker Swarm Network issues
| See https://github.com/moby/moby/issues/47728 for context. | |
| I've just had a case where the networkDB looked correct, but the packets were still being routed to the wrong node. | |
| Details: | |
| Node S, running the source container, has IP 172.17.2.2 | |
| Node T, running the target container, has IP 172.16.3.2 | |
| The target container has the endpoint IP 192.168.74.111. | |
| On both nodes, the following command returns the same data: | |
| ``` | |
| docker run --net host dockereng/network-diagnostic:onlyclient -port 2000 -v -net ebbnwg5bhz9y9g40r8yum5bdt -t overlay 2>&1 | grep \\.111 | |
| time="2024-10-18T16:18:25Z" level=debug msg="key:c3d4202a63eac0339c97051eaf16464f21e505e63782c5944d5c6ab7a120fd59 value:{EndpointIP:192.168.74.111/24 EndpointMAC:02:42:c0:a8:4a:6f TunnelEndpointIP:172.16.3.2} owner:f05333e5c866" | |
| ``` | |
| -> TunnelEndpointIP is correct. | |
| Still, I could verify via tcpdump that node S tries to send the packets to a different cluster node with IP 172.17.7.2. | |
| This shows that the network DB is not the complete truth. | |
| I dug further. For the next bit, it's important to know that the MAC addresses of the containers' interfaces are always 02:42 followed by the four octets of the IP in hex, so the MAC address of the target container (192.168.74.111) is 02:42:c0:a8:4a:6f. Or just get it via `nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz arp -n | grep \\.111` | |
| Using `nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | grep 6f` I could verify that the forwarding database in fact has the wrong IP 172.17.7.2 as the destination: | |
| ``` | |
| # nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | grep 6f | |
| 02:42:c0:a8:4a:6f dev vxlan0 master br0 | |
| 02:42:c0:a8:4a:6f dev vxlan0 dst 172.17.7.2 link-netnsid 0 self permanent | |
| ``` | |
| I've manually fixed the fdb entry via | |
| ``` | |
| # nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb delete 02:42:c0:a8:4a:6f dev vxlan0 dst 172.17.7.2 | |
| # nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb add 02:42:c0:a8:4a:6f dev vxlan0 dst 172.16.3.2 self permanent | |
| ``` | |
| This resulted in the packets now being received by the correct node T, but still not by the container running on it. A few minutes later, it started working without further changes... |
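
The check described above can be sketched as a small script: derive the expected MAC from the endpoint IP, look up the VXLAN destination recorded for that MAC in the `bridge fdb show` output, and compare it with the TunnelEndpointIP from the network DB. The function names (`ip_to_mac`, `fdb_dst_for_mac`, `check_fdb`) are hypothetical helpers, not part of Docker or iproute2, and the sketch assumes the 02:42 plus hex-IP MAC convention and the fdb output format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of Docker or iproute2.

# Derive the expected container MAC from an IPv4 endpoint address,
# assuming the "02:42" + hex-encoded-octets convention described above.
ip_to_mac() {
  local IFS=.
  # Split the dotted quad into four positional parameters.
  set -- $1
  printf '02:42:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
}

# Read `bridge fdb show` output on stdin and print the address that
# follows the "dst" keyword on every line starting with the given MAC.
fdb_dst_for_mac() {
  awk -v mac="$1" '$1 == mac { for (i = 2; i < NF; i++) if ($i == "dst") print $(i + 1) }'
}

# Compare the fdb destination for an endpoint IP with the expected
# tunnel endpoint (the TunnelEndpointIP from the network DB).
# Reads `bridge fdb show` output on stdin; returns 1 on mismatch.
check_fdb() {
  local endpoint_ip=$1 expected_tunnel=$2 mac dst
  mac=$(ip_to_mac "$endpoint_ip")
  dst=$(fdb_dst_for_mac "$mac")
  if [ "$dst" = "$expected_tunnel" ]; then
    echo "OK $mac -> $dst"
  else
    echo "MISMATCH $mac -> ${dst:-none} (expected $expected_tunnel)"
    return 1
  fi
}
```

With the values from this case, it could be run as `nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | check_fdb 192.168.74.111 172.16.3.2`. It only reports a mismatch; correcting the entry is still the manual `bridge fdb delete` / `bridge fdb add` step shown above.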
Author
Awesome, happy that it helped you. Let's hope the underlying issue gets fixed soon. As far as I can tell, it's currently being actively worked on.
Yes, I just read moby/moby#50232, which looks like exactly what I did: a lot of service updates for debugging, then this issue appeared.
thanks a lot.
I have the same issue as #1
I updated:
moby/moby#49908