Network Troubleshooting
A systematic toolkit for diagnosing network issues — from "can't reach the server" to "why is this API slow."
The OSI Troubleshooting Approach
Start at Layer 1, work up. Most issues are at L3–L7 in cloud/dev contexts, but the discipline matters.
L7 — Application Is the app returning the right response?
L6 — Presentation Is TLS valid? Encoding correct?
L5 — Session Is the session established/maintained?
L4 — Transport Can I connect to the port? TCP handshake OK?
L3 — Network Can I reach the IP? Route correct?
L2 — Data Link Is the interface up? ARP resolving?
L1 — Physical Is the cable plugged in? Wi-Fi connected?Core Tools
ping — ICMP reachability
bashping google.com # continuous (Ctrl+C to stop)
ping -c 4 8.8.8.8 # 4 packets
ping -i 0.2 google.com # 200ms interval
ping6 google.com # IPv6
# Output:
64 bytes from 142.250.80.46: icmp_seq=1 ttl=117 time=12.3 ms
64 bytes from 142.250.80.46: icmp_seq=2 ttl=117 time=11.8 ms
# ttl=117 → started at 128, 11 hops
# time → round-trip latencyPing success ≠ port open. A host can respond to ICMP while blocking TCP 80. Ping fail ≠ host down. Many servers/firewalls drop ICMP.
traceroute / tracert — path discovery
bashtraceroute google.com # UDP (default on Linux)
traceroute -T google.com # TCP SYN (bypasses more firewalls)
traceroute -I google.com # ICMP
traceroute -p 443 google.com # specific port
tracert google.com # Windows
# Output:
1 192.168.1.1 (192.168.1.1) 1.245 ms 1.108 ms 0.989 ms
2 10.0.0.1 (10.0.0.1) 5.432 ms 5.211 ms 5.398 ms
3 * * * ← ICMP blocked at this hop (still forwarding)
4 72.14.209.1 8.123 ms 7.989 ms 8.245 ms
5 google.com 10.456 msInterpreting * * *: that hop doesn't respond to probe packets, but it may still be forwarding. If later hops respond, the * * * hop is just filtering — not a break.
High latency at one hop: check if it persists to subsequent hops. If later hops are fast, the problem hop is just deprioritizing ICMP responses (normal).
dig / nslookup — DNS
bash# dig (preferred on Linux/macOS)
dig google.com # A record
dig google.com AAAA # IPv6
dig google.com MX # mail servers
dig google.com NS # name servers
dig google.com TXT # text records (SPF, DKIM)
dig +short google.com # just the IP(s)
dig @8.8.8.8 google.com # use specific resolver
dig @1.1.1.1 google.com +short # Cloudflare resolver
dig +trace google.com # follow the full delegation chain
dig -x 142.250.80.46 # reverse DNS (PTR)
dig google.com +noall +answer # just the answer section
# nslookup (cross-platform, simpler)
nslookup google.com
nslookup google.com 8.8.8.8 # use 8.8.8.8 as resolver
nslookup -type=MX google.comDebugging DNS issues:
bash# 1. Check what resolver you're using
cat /etc/resolv.conf # Linux
scutil --dns # macOS
# 2. Compare local resolver vs public
dig @127.0.0.1 internal.company.com # local
dig @8.8.8.8 internal.company.com # public (shouldn't resolve)
# 3. Check TTL (how long until record refreshes)
dig +nocmd +noall +answer +ttlid google.com
# 3600 = cached for 1 hour
# 4. Flush DNS cache
sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder # macOS
sudo systemd-resolve --flush-caches # Linux (systemd)
ipconfig /flushdns # Windowscurl — HTTP testing
bash# Basic request
curl https://example.com
# Verbose (headers, TLS, timing)
curl -v https://example.com
curl -vvv https://example.com # even more verbose
# Headers only
curl -I https://example.com # HEAD request
curl -D - https://example.com # dump response headers
# Custom headers + body
curl -X POST https://api.example.com/data \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer token123' \
-d '{"key": "value"}'
# Follow redirects
curl -L https://example.com
# Timing breakdown
curl -o /dev/null -s -w "\
dns: %{time_namelookup}s\n\
tcp: %{time_connect}s\n\
tls: %{time_appconnect}s\n\
ttfb: %{time_starttransfer}s\n\
total: %{time_total}s\n\
http_code: %{http_code}\n" https://example.com
# DNS time: 0.020s
# TCP connect: 0.045s
# TLS handshake: 0.120s
# Time to first byte: 0.180s
# Total: 0.210s
# Test specific IP (bypass DNS)
curl --resolve example.com:443:1.2.3.4 https://example.com
# Test with HTTP/2
curl --http2 -v https://example.com
# Certificate info
curl -v --no-progress-meter https://example.com 2>&1 | grep -A20 'Certificate'
# Ignore TLS errors (NEVER in production)
curl -k https://self-signed.example.comnetstat / ss — connections & ports
bash# ss (modern, faster, Linux)
ss -tuln # listening ports (TCP+UDP, numeric)
ss -tnp # TCP connections with process names
ss -s # summary statistics
ss -tan state established # only established TCP
# netstat (classic, cross-platform)
netstat -an # all connections, numeric
netstat -tlnp # listening TCP + PIDs (Linux)
netstat -rn # routing table
netstat -s # protocol statistics
# macOS
netstat -anv # verbose
lsof -i :3000 # what's using port 3000
lsof -i tcp # all TCP connections
# Find what's using a port
ss -tlnp | grep :443
lsof -i :443
fuser 443/tcp # Linuxbash# Check if port is open on remote host
nc -zv google.com 443 # TCP connection test
nc -zvu google.com 53 # UDP
nc -w 3 -zv 10.0.0.5 22 # timeout after 3s
# Telnet alternative
telnet example.com 80
# Then type: GET / HTTP/1.0\n\ntcpdump — packet capture
bash# Capture all traffic on interface
tcpdump -i eth0
# Capture HTTP traffic
tcpdump -i eth0 port 80 -A # -A = print as ASCII
# Capture TCP on 443 to specific host
tcpdump -i eth0 tcp and port 443 and host 8.8.8.8
# Save to file for Wireshark
tcpdump -i eth0 -w capture.pcap
# Read from file
tcpdump -r capture.pcap
# Filter by network
tcpdump -i eth0 net 192.168.1.0/24
# DNS queries only
tcpdump -i eth0 port 53 -v
# Show packet contents in hex+ASCII
tcpdump -i eth0 -XX port 80 host example.com
# Common filters:
# host 1.2.3.4 — from/to this IP
# src 1.2.3.4 — from this IP only
# dst 1.2.3.4 — to this IP only
# port 80 — TCP or UDP port 80
# tcp — TCP only
# 'tcp[tcpflags] & tcp-syn != 0' — SYN packets onlynmap — port scanning
bash# Scan common ports
nmap google.com
# Scan specific ports
nmap -p 80,443,8080 google.com
# Scan range
nmap -p 1-1000 192.168.1.1
# Detect services/versions
nmap -sV 192.168.1.1
# OS detection
nmap -O 192.168.1.1
# Fast scan (top 100 ports)
nmap -F 192.168.1.0/24
# Stealth SYN scan
nmap -sS 192.168.1.1
# UDP scan (slower)
nmap -sU -p 53,67,161 192.168.1.1Only scan hosts you have permission to scan.
Wireshark — packet analysis (GUI)
Graphical packet capture/analysis. Open capture.pcap from tcpdump, or capture live.
Useful display filters:
http # all HTTP
http.request.method == "POST" # HTTP POSTs
dns # DNS
tcp.flags.syn == 1 # TCP SYN packets
ip.addr == 192.168.1.10 # packets to/from IP
tcp.port == 443 # TCP 443
ssl.handshake.type == 1 # TLS ClientHelloFollow TCP Stream: right-click a packet → Follow → TCP Stream. Shows entire conversation in human-readable form.
Systematic Troubleshooting Workflows
"I can't reach the server"
1. Can you ping the server?
YES → L3 reachable, issue is L4-L7
NO → check L1-L3
2. If NO ping:
a. ping your gateway (192.168.1.1)?
NO → local issue: cable, Wi-Fi, NIC
YES → routing issue between you and server
b. traceroute → where does it stop?
3. If YES ping but can't connect:
a. nc -zv server 443 → port open?
NO → firewall blocking, service not running, wrong port
YES → L4 OK, issue is L5-L7 (TLS, app)
b. curl -v https://server → TLS error? App error?
4. Check DNS separately:
dig +short server.example.com
If wrong IP → DNS issue (caching, wrong record)"The API is slow"
curl timing breakdown:
dns: 0.020s → normal (< 100ms ok)
tcp: 0.200s → HIGH (normally < 50ms local, < 150ms cross-region)
tls: 0.400s → ok-ish (50-200ms normal)
ttfb: 2.100s → HIGH ← server processing slow
total: 2.1s
Interpretation:
High DNS → resolver slow, consider 8.8.8.8 or local caching
High TCP → network congestion or geographic distance
High TLS → TLS 1.2 vs 1.3, OCSP stapling, session resumption
High TTFB → server slow: DB query, external API call, CPU bound"SSL certificate error"
bash# Check certificate details
openssl s_client -connect example.com:443 -servername example.com
# Check expiry
echo | openssl s_client -connect example.com:443 2>/dev/null | \
openssl x509 -noout -dates
# Check full chain
openssl s_client -connect example.com:443 -showcerts 2>/dev/null | \
openssl x509 -noout -text | grep -A2 "Subject:"
# Common issues:
# - Expired certificate
# - Wrong hostname (CN mismatch)
# - Incomplete chain (intermediate CA missing)
# - Self-signed cert not trusted
# - TLS version mismatch"Connection refused vs timeout"
Connection refused (ECONNREFUSED):
→ Host reachable, port closed
→ Service not running, or listening on wrong interface (127.0.0.1 vs 0.0.0.0)
→ Firewall rejecting (RST packet sent back)
Connection timeout:
→ Packet dropped, no response
→ Firewall silently dropping
→ Host unreachable / wrong IP
→ Routing issue (packet going to wrong place)
Distinguish: nc -w 3 -zv host port
→ "Connection refused" immediately = ECONNREFUSED
→ hangs for 3s then "timed out" = timeoutLinux Network Configuration
bash# View interfaces
ip addr show # or: ip a
ip link show
# View routes
ip route show # or: ip r
ip route get 8.8.8.8 # which route would be used?
# Add/delete IP
ip addr add 192.168.1.100/24 dev eth0
ip addr del 192.168.1.100/24 dev eth0
# Bring interface up/down
ip link set eth0 up
ip link set eth0 down
# Add/delete route
ip route add 10.0.0.0/8 via 192.168.1.254
ip route del 10.0.0.0/8
# DNS configuration
cat /etc/resolv.conf
# nameserver 8.8.8.8
# nameserver 1.1.1.1
# search internal.company.com
# /etc/hosts — override DNS locally
echo "127.0.0.1 myapp.local" >> /etc/hostsNode.js Network Debugging
javascript// Increase http agent concurrency (default: 5)
import http from 'http';
const agent = new http.Agent({ maxSockets: 50 });
// Debug DNS resolution
import dns from 'dns';
dns.resolve4('example.com', (err, addresses) => {
console.log(addresses);
});
// Log all requests (debugging)
import https from 'https';
const originalRequest = https.request;
https.request = function(...args) {
console.log('HTTPS request:', args[0]);
return originalRequest.apply(this, args);
};
// Check ETIMEDOUT vs ECONNREFUSED vs ENOTFOUND
fetch('https://example.com')
.catch(err => {
// err.cause.code:
// ENOTFOUND → DNS failed
// ECONNREFUSED → port closed
// ETIMEDOUT → no response
// ECONNRESET → connection dropped mid-way
console.error(err.cause?.code);
});Common Error Codes
| Code | Meaning | Likely Cause |
|---|---|---|
ECONNREFUSED |
Connection refused | Port closed, service down |
ETIMEDOUT |
Connection timed out | Firewall dropping, host unreachable |
ENOTFOUND |
DNS lookup failed | Wrong hostname, DNS down |
ECONNRESET |
Connection reset by peer | Server closed connection mid-transfer |
EHOSTUNREACH |
No route to host | Routing issue, interface down |
EADDRINUSE |
Address already in use | Port taken by another process |
EPIPE |
Broken pipe | Writing to closed socket |
SSL_ERROR_RX_RECORD_TOO_LONG |
SSL handshake failed | Connecting to HTTP port with HTTPS |
Quick Reference Cheatsheet
bash# Is host reachable?
ping -c 3 HOST
# Is port open?
nc -zv HOST PORT
# DNS lookup
dig +short HOST
# HTTP test with timing
curl -o /dev/null -sw "%{http_code} %{time_total}s\n" https://HOST
# Listening ports
ss -tlnp
# Route to destination
ip route get HOST_IP
# Trace path
traceroute HOST
# Capture HTTP traffic
tcpdump -i eth0 port 80 -A -n
# Certificate expiry
echo | openssl s_client -connect HOST:443 2>/dev/null | openssl x509 -noout -dates
# What's using port X?
lsof -i :PORT # macOS
ss -tlnp | grep PORT # Linux