@dhilgarth
Last active July 15, 2025 10:20
Troubleshoot Docker Swarm Network issues
See https://github.com/moby/moby/issues/47728 for context.
I've just had a case where the networkDB looked correct, but packets were still being routed to the wrong node.
Details:
Node S, running the source container, has IP 172.17.2.2
Node T, running the target container, has IP 172.16.3.2
The target container has the endpoint IP 192.168.74.111.
On both nodes, the following command returns the same data:
```
docker run --net host dockereng/network-diagnostic:onlyclient -port 2000 -v -net ebbnwg5bhz9y9g40r8yum5bdt -t overlay 2>&1 | grep \\.111
time="2024-10-18T16:18:25Z" level=debug msg="key:c3d4202a63eac0339c97051eaf16464f21e505e63782c5944d5c6ab7a120fd59 value:{EndpointIP:192.168.74.111/24 EndpointMAC:02:42:c0:a8:4a:6f TunnelEndpointIP:172.16.3.2} owner:f05333e5c866"
```
-> TunnelEndpointIP is correct.
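To compare this value across nodes without reading the full debug line, the tunnel endpoint can be pulled out of the output. This is only a sketch: `tunnel_ip_for` is a hypothetical helper name, and it assumes the debug line keeps the `{EndpointIP:... TunnelEndpointIP:...}` layout shown above.

```shell
# Hypothetical helper: reads the diagnostic client's debug output on stdin
# and prints the TunnelEndpointIP recorded for one endpoint IP.
tunnel_ip_for() {
    # $1: endpoint IP without the prefix length, e.g. 192.168.74.111
    # The leading "{" keeps the match off the TunnelEndpointIP field itself.
    grep -F "{EndpointIP:$1/" |
        sed -n 's/.*TunnelEndpointIP:\([0-9.]*\).*/\1/p'
}
```

Usage would be to pipe the `docker run ... 2>&1` command from above into `tunnel_ip_for 192.168.74.111` on each node and compare the results.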
Still, I could verify via tcpdump that node S tries to send the packets to a different cluster node with IP 172.17.7.2.
This shows that the networkDB does not tell the whole story.
I dug further. For the next bit, it's important to know that the MAC address of a container's interface is always 02:42 followed by the four bytes of its IP address in hex, so the MAC address of the target container is 02:42:c0:a8:4a:6f. Or just get it via `nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz arp -n | grep \\.111`
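The IP-to-MAC rule can be checked with a few lines of shell. This is a minimal sketch of the rule as described in these notes, not something taken from Docker itself; `ip_to_mac` is a hypothetical helper name. For 192.168.74.111 the third octet 74 is 0x4a, which matches the EndpointMAC in the networkDB output.

```shell
# Hypothetical helper: derive the expected container MAC from an IPv4
# address, following the 02:42:<ip-in-hex> rule described in these notes.
ip_to_mac() {
    # Split the dotted quad on "." without touching the caller's IFS.
    local IFS=.
    set -- $1
    printf '02:42:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
}

ip_to_mac 192.168.74.111
```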
Using `nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | grep 6f` I could verify that the forwarding database in fact has the wrong IP 172.17.7.2 as the destination:
```
# nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | grep 6f
02:42:c0:a8:4a:6f dev vxlan0 master br0
02:42:c0:a8:4a:6f dev vxlan0 dst 172.17.7.2 link-netnsid 0 self permanent
```
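To make the mismatch easier to spot, the `dst` of a MAC can be extracted from the fdb listing and compared with the TunnelEndpointIP from the networkDB. A rough sketch, assuming the `bridge fdb show` output format shown above; `fdb_dst_for` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `bridge fdb show` output on stdin and prints
# the VXLAN destination recorded for one MAC address.
fdb_dst_for() {
    # $1: MAC address, e.g. 02:42:c0:a8:4a:6f
    # Entries without a "dst" field (the "master br0" line) print nothing.
    awk -v mac="$1" '$1 == mac {
        for (i = 2; i < NF; i++) if ($i == "dst") print $(i + 1)
    }'
}
```

Piping the `nsenter ... bridge fdb show` command into `fdb_dst_for 02:42:c0:a8:4a:6f` would print 172.17.7.2 for the listing above, where the networkDB says 172.16.3.2.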
I've manually fixed the fdb entry via
```
# nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb delete 02:42:c0:a8:4a:6f dev vxlan0 dst 172.17.7.2
# nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb add 02:42:c0:a8:4a:6f dev vxlan0 dst 172.16.3.2 self permanent
```
As a result, the packets were now received by the correct node T, but still not by the container running on it. A few minutes later, it started working without further changes...
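If the same repair has to be repeated for other entries, the two commands from the manual fix can be generated from their four variable parts. This sketch only prints the commands for review and does not run them; `print_fdb_fix` is a hypothetical helper name, and the `vxlan0` device name is taken from the output above.

```shell
# Hypothetical helper: print (not run) the delete/add pair used in the
# manual fix, so the commands can be reviewed before executing them.
print_fdb_fix() {
    # $1: netns path, $2: MAC, $3: wrong dst IP, $4: correct dst IP
    printf 'nsenter --net=%s bridge fdb delete %s dev vxlan0 dst %s\n' "$1" "$2" "$3"
    printf 'nsenter --net=%s bridge fdb add %s dev vxlan0 dst %s self permanent\n' "$1" "$2" "$4"
}

print_fdb_fix /var/run/docker/netns/1-ebbnwg5bhz 02:42:c0:a8:4a:6f 172.17.7.2 172.16.3.2
```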

hdep commented Jun 20, 2025

Thanks a lot.
I have the same issue as #1.
I updated:
moby/moby#49908

@dhilgarth
Author

Awesome, happy that it helped you. Let's hope the underlying issue is fixed soon. As far as I can tell, it's currently being actively worked on.


hdep commented Jun 20, 2025

Yes, I just read moby/moby#50232, which looks like exactly what I did: lots of service updates for debugging, then this issue appeared.
