Network Troubleshooting Toolkit

When a container cannot reach another service, it is tempting to start changing firewall rules or restarting everything. That usually makes things worse, because every change adds a new variable and can mask the original fault. Instead, follow a structured diagnosis path that narrows the problem down layer by layer.

The Troubleshooting Flowchart

flowchart TD
A["Service A cannot reach Service B"] --> B["1. Is Service B running?<br/>docker ps -a"]
B -->|"Not running"| C["Start/fix Service B"]
B -->|"Running"| D["2. Same network?<br/>docker network inspect"]
D -->|"Different networks"| E["Connect to same network"]
D -->|"Same network"| F["3. DNS resolves?<br/>docker exec nslookup"]
F -->|"Name not found"| G["Check container/service name"]
F -->|"Resolves"| H["4. Port open?<br/>docker exec nc -vz"]
H -->|"Connection refused"| I["Check app is listening<br/>on correct port"]
H -->|"Timeout"| J["Check firewall rules"]
H -->|"Connected"| K["5. Check app config<br/>(env vars, URLs)"]

style A fill:#ffebee,stroke:#c62828
style C fill:#e8f5e9,stroke:#2e7d32
style E fill:#e8f5e9,stroke:#2e7d32

Step-by-Step Diagnosis

Step 1: Is the Target Running?

docker ps -a
docker compose ps

If the target container is not running (or keeps restarting), fix that first: the network is not the problem. The output of docker logs for that container usually shows why it stopped.
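As a rough sketch of this first check, the status reported by `docker inspect -f '{{.State.Status}}'` can be mapped to a next action. The function names `state_hint` and `check_running` are hypothetical helpers, not part of Docker, and the hint texts are only suggestions.

```shell
#!/bin/sh
# Map a Docker container status string to a troubleshooting hint.
# The status values are the ones Docker reports in .State.Status.
state_hint() {
    case "$1" in
        running)     echo "running: move on to the network checks" ;;
        restarting)  echo "restarting: crash loop, read 'docker logs' first" ;;
        exited|dead) echo "$1: not running, check exit code and logs" ;;
        created)     echo "created: never started, run 'docker start'" ;;
        paused)      echo "paused: run 'docker unpause'" ;;
        *)           echo "unknown status '$1'" ;;
    esac
}

# Hypothetical wrapper: look up one container's status and explain it.
# Returns 0 only when the container is running.
check_running() {
    status=$(docker inspect -f '{{.State.Status}}' "$1" 2>/dev/null) \
        || { echo "no such container: $1"; return 1; }
    state_hint "$status"
    [ "$status" = "running" ]
}
```

Calling `check_running service-b` prints the hint and returns non-zero unless the container is running, so it can gate the later steps in a script.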

Step 2: Are They on the Same Network?

# Check what networks each container is on
docker inspect -f '{{json .NetworkSettings.Networks}}' service-a
docker inspect -f '{{json .NetworkSettings.Networks}}' service-b

# Or see all containers on a network
docker network inspect app-net

This is the #1 cause of connectivity failures. Containers that do not share a network cannot resolve each other's names, and they normally cannot reach each other by IP address either.
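The comparison can be automated. The sketch below lists the networks of each container and prints only the names they have in common. `container_networks` and `shared_networks` are hypothetical helper names, and `service-a` / `service-b` are the example containers used above.

```shell
#!/bin/sh
# List the networks a container is attached to, one name per line.
container_networks() {
    docker inspect \
        -f '{{range $name, $cfg := .NetworkSettings.Networks}}{{println $name}}{{end}}' \
        "$1"
}

# Print the network names present in both newline-separated lists ($1 and $2).
shared_networks() {
    printf '%s\n' "$1" | while IFS= read -r net; do
        [ -n "$net" ] || continue
        # -F fixed string, -x whole-line match, -q quiet
        if printf '%s\n' "$2" | grep -Fxq -- "$net"; then
            printf '%s\n' "$net"
        fi
    done
}
```

Usage: `shared_networks "$(container_networks service-a)" "$(container_networks service-b)"`. Empty output means the two containers share no network, which is the case to fix before looking at DNS or ports.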

Step 3: Does DNS Resolve?

docker exec -it service-a nslookup service-b
# or
docker exec -it service-a getent hosts service-b

If the name does not resolve:

  • Check the container/service name is correct (typos happen)
  • Confirm both containers are on the same user-defined network
  • Remember: the default bridge network does not resolve container names. Automatic DNS by name works only on user-defined networks
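The checks in this list can be sketched as two small helpers. `is_user_defined` treats Docker's predefined network names (`bridge`, `host`, `none`) as not user-defined, and `check_dns` wraps the `getent hosts` lookup shown above and relies on its exit status. Both function names are hypothetical, and the sketch assumes `getent` is available in the source container.

```shell
#!/bin/sh
# Docker's predefined networks; anything else is treated as user-defined.
is_user_defined() {
    case "$1" in
        bridge|host|none) return 1 ;;
        *)                return 0 ;;
    esac
}

# Resolve NAME from inside CONTAINER and report the outcome.
# getent exits non-zero when the name is not found.
check_dns() {
    container=$1
    name=$2
    if addr=$(docker exec "$container" getent hosts "$name"); then
        echo "resolved: $addr"
    else
        echo "not resolved: check the spelling of '$name' and the shared network"
        return 1
    fi
}
```

For example, `check_dns service-a service-b` prints the resolved address line on success and returns non-zero otherwise.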

Step 4: Is the Port Open?

docker exec -it service-a nc -vz service-b 5432

| Result | Meaning |
|---|---|
| open / succeeded | Port is reachable. Problem is in app config |
| Connection refused | Target is running but not listening on that port |
| Timeout | Firewall blocking, wrong IP, or service not ready |
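To use the result table in a script, the text printed by `nc` can be mapped to one of the three outcomes. This is a sketch only: the exact wording differs between `nc` implementations (OpenBSD netcat, BusyBox, Ncat), so the patterns below are assumptions that may need adjusting, and `classify_nc` is a hypothetical helper name.

```shell
#!/bin/sh
# Classify the combined stdout/stderr of an `nc -vz` probe.
# Failure patterns are tested first so that a hostname containing
# "open" cannot be mistaken for a successful connection.
classify_nc() {
    case "$1" in
        *refused*)                      echo "refused" ;;
        *"timed out"*|*timeout*)        echo "timeout" ;;
        *succeeded*|*open*|*Connected*) echo "reachable" ;;
        *)                              echo "unknown" ;;
    esac
}
```

A possible call is `classify_nc "$(docker exec service-a nc -vz -w 3 service-b 5432 2>&1)"`. The `-w 3` timeout is an addition here so that a silent drop does not hang the check.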

Step 5: Check Application Config

If the network path works but the app still cannot connect, the problem is usually in the application configuration:

  • Wrong hostname in environment variable
  • Wrong port number
  • Using localhost instead of the service name
  • Missing credentials
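The localhost mistake in particular is easy to spot mechanically. The sketch below filters `KEY=VALUE` lines and prints those whose value points at `localhost` or `127.0.0.1`, which inside a container means the container itself. `find_loopback_targets` is a hypothetical helper, and the regular expression is a heuristic rather than a full URL parser.

```shell
#!/bin/sh
# Read KEY=VALUE lines on stdin and print the ones whose value targets
# the loopback address, either bare (DB_HOST=localhost) or inside a URL
# (redis://127.0.0.1:6379). Prints nothing when no line matches.
find_loopback_targets() {
    grep -E '=(.*[/@])?(localhost|127\.0\.0\.1)([:/]|$)' || true
}
```

One way to feed it is `docker exec service-a env | find_loopback_targets`, assuming the image ships an `env` binary. Any line it prints is a candidate for replacing the host with the service name.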

Symptom Quick Reference

| Error Message | Most Likely Cause | First Command |
|---|---|---|
| Name or service not known | DNS: different networks or wrong name | docker network inspect <network> |
| Connection refused | Target app not listening | docker exec <source> nc -vz <target> <port> |
| Connection timed out | Firewall, routing, or service not ready | ss -tulpen on host |
| Intermittent failures | Startup race or DNS caching | Check startup order, add retries |
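For the intermittent case, "add retries" can be as simple as probing the port a few times before giving up. The following is a minimal sketch, assuming an `nc` that supports `-z` and `-w` is available wherever it runs (for example in an entrypoint script). `wait_for_port`, its default of 5 attempts, and its 2-second delay are illustrative choices.

```shell
#!/bin/sh
# Retry a TCP probe until it succeeds or the attempts run out.
# Usage: wait_for_port HOST PORT [ATTEMPTS] [DELAY_SECONDS]
wait_for_port() {
    host=$1
    port=$2
    attempts=${3:-5}
    delay=${4:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if nc -z -w 2 "$host" "$port" 2>/dev/null; then
            echo "up after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt is coming.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    echo "still unreachable after $attempts attempts"
    return 1
}
```

For example, `wait_for_port db 5432 10 1 && exec my-app` would delay startup until the database port accepts connections. Here `my-app` stands in for the real application command.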

Checking Host-Level Port Exposure

If the issue is accessing a container from outside the Docker host:

# What ports are published?
docker port my-container

# What is actually listening on the host?
ss -tulpen | grep LISTEN

# Is it bound to the right interface?
# 0.0.0.0:8080 = all interfaces
# 127.0.0.1:8080 = localhost only
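The interface check can be sketched as a small classifier over the address part of `docker port` or `ss` output. `bind_scope` and `published_scopes` are hypothetical helper names. The parsing assumes `docker port` prints lines shaped like `80/tcp -> 0.0.0.0:8080`.

```shell
#!/bin/sh
# Classify a listening address such as "0.0.0.0:8080" by who can reach it.
bind_scope() {
    case "$1" in
        0.0.0.0:*|"[::]":*|\*:*) echo "all interfaces" ;;
        127.*:*|"[::1]":*)       echo "localhost only" ;;
        *)                       echo "specific interface" ;;
    esac
}

# Annotate each published port of a container with its bind scope.
published_scopes() {
    docker port "$1" | while read -r cport arrow haddr; do
        echo "$cport -> $haddr ($(bind_scope "$haddr"))"
    done
}
```

A port reported as "localhost only" is reachable from the Docker host itself but not from other machines, which explains many "works on the server, fails from my laptop" reports.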

Using a Debug Container

If the failing container does not have networking tools installed, spin up a temporary debug container on the same network:

docker run --rm -it --network app-net alpine:3.20 sh
# Inside:
nslookup db
nc -vz db 5432
wget -qO- http://api:8080/health

This lets you test without modifying production containers.
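For one-off probes, the same idea can be wrapped in a helper that runs a single command in a throwaway container and exits. `debug_on` is a hypothetical name, and the sketch reuses the `alpine:3.20` image and `app-net` network from the example above.

```shell
#!/bin/sh
# Run one command in a temporary Alpine container attached to NETWORK.
# Usage: debug_on NETWORK COMMAND [ARGS...]
debug_on() {
    network=$1
    shift
    # --rm removes the container as soon as the command finishes.
    docker run --rm --network "$network" alpine:3.20 "$@"
}
```

For example, `debug_on app-net nc -vz db 5432` performs the port check from inside the network without an interactive shell, which makes it easy to call from scripts.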

Key Takeaways

  • Follow the order: container running → same network → DNS resolves → port open → app config. Do not skip steps.
  • Network membership is the #1 cause of DNS failures. Once the target is confirmed running, check it before investigating DNS itself.
  • Use docker exec or a debug container to test from inside the Docker network, not from the host.
  • Avoid changing firewall rules or restarting services until you have identified the failing layer.
  • Connection refused means the target is reachable but nothing is listening on that port. A timeout means nothing answered at all: packets are being dropped or sent to the wrong address.

What's Next