There is a popular framing of TCP connection limits that goes something like this:
“There are only 65,536 ports, so a host can handle 65,536 connections.”
This is wrong on both counts — wrong about what’s limited, and wrong about what the limit means. The actual capacity ceiling has very little to do with how many port numbers exist. It has to do with kernel state: socket structures, file descriptors, conntrack entries, and the geometry of 4-tuple uniqueness.
When that state runs out, the host stops accepting new connections — not because it has no port numbers left, but because it has no memory of connections left. Two attack families weaponize this directly: TIME_WAIT exhaustion (a passive accumulation problem) and Sockstress (an active state-pinning attack). Both are conceptually simple. Both are still operationally relevant in 2026, on the systems where they shouldn’t be.
The 4-Tuple, Not the Port
The kernel identifies a TCP connection by a tuple of four values:
(source_ip, source_port, destination_ip, destination_port)
Two connections sharing three of those values can coexist as long as the fourth is different. This means a single client IP, connecting to a single server IP on a single port (say 10.0.0.5:443), is limited only by its own ephemeral port range — on Linux, typically:
$ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768 60999
That gives roughly 28,000 usable source ports per (src_ip, dst_ip, dst_port) combination. Add a second destination IP and you have another 28,000. Add another source IP and you have another 28,000.
The “65,536 port limit” is a property of the port number field width, not a property of how many connections a host can hold. A busy server with thousands of clients regularly holds millions of simultaneous TCP sockets without bumping into the port ceiling at all — because each connection is differentiated by the client tuple, not by anything on the server side.
The real limits sit elsewhere: socket table size, file descriptor caps, conntrack entry limits, and the kernel memory budget for struct sock allocations.
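Those limits are all inspectable on a running Linux host. A quick sketch, using standard procfs paths and sysctl keys (exact values vary by kernel and distro, and the conntrack key only exists when nf_conntrack is loaded):

$ cat /proc/sys/fs/file-max                 # system-wide file descriptor ceiling
$ ulimit -n                                 # per-process descriptor limit in this shell
$ sysctl net.ipv4.tcp_mem                   # kernel memory budget for TCP buffers, in pages
$ sysctl net.netfilter.nf_conntrack_max     # conntrack table capacity
$ ss -s                                     # live socket counts by state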
TIME_WAIT: Who Pays the Cost
TIME_WAIT is the state a TCP socket enters after actively closing a connection — that is, after sending the first FIN. The socket sits in TIME_WAIT for a period defined in Linux as TCP_TIMEWAIT_LEN, hardcoded in include/net/tcp.h:
#define TCP_TIMEWAIT_LEN (60*HZ)  /* how long to wait to destroy TIME-WAIT state, about 60 seconds */
That’s 60 seconds, and it’s not tunable via sysctl — you have to recompile the kernel to change it. RFC 793 specifies 2 * MSL (Maximum Segment Lifetime), which is theoretically up to 4 minutes; Windows defaults to 120 seconds, BSD variants are similar.
The purpose of TIME_WAIT is correctness: it prevents stale segments from a closed connection from being misinterpreted as belonging to a new connection that happens to reuse the same 4-tuple. It also ensures the final ACK reaches the peer before the socket is fully released.
The key fact: TIME_WAIT is incurred by the side that initiates the close. This matters enormously for who bears the cost.
Where TIME_WAIT Hurts
Consider a reverse proxy fronting a backend pool. If the proxy uses HTTP/1.0 or sends Connection: close on each request, the proxy is the active closer. Every short request leaves a TIME_WAIT socket on the proxy, occupying a (src_ip, src_port) slot toward that specific backend (dst_ip, dst_port).
If the proxy is hitting one backend on :8080 from one source IP, the math is:
28,000 ephemeral ports / 60s TIME_WAIT ≈ 466 new connections per second
Beyond that rate, connect() calls start failing with EADDRNOTAVAIL. Not because the backend is overloaded. Not because the network is saturated. Because the proxy’s own kernel has no source ports left that aren’t pinned in TIME_WAIT.
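You can watch the ceiling approach before connect() starts failing. A hedged example against the :8080 backend above (ss filter syntax; counts and ranges will differ on your system):

$ ss -tan state time-wait '( dport = :8080 )' | wc -l   # sockets pinned in TIME_WAIT toward the backend
$ sysctl net.ipv4.ip_local_port_range                   # the source-port pool they are draining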
TIME_WAIT as an Attack Surface
Pure TIME_WAIT exhaustion as a deliberate attack is awkward, for a structural reason: the side that initiates close pays the cost. An attacker opening and closing connections accumulates TIME_WAIT on themselves, not the victim.
The attack becomes meaningful only when the server initiates close, which happens in several real scenarios:
- HTTP/1.0 without keep-alive
- HTTP/1.1 with `Connection: close` headers
- Idle timeouts on the server side
- HTTPS configurations that close after each request for performance reasons
- Misconfigured WebSocket gateways closing on heartbeat failure
In these cases, an attacker pounding short requests at the server forces server-side TIME_WAIT accumulation. But — and this is critical — the limit is per-4-tuple, not global. The server only runs out of state for that specific (attacker_ip, server_ip, server_port) combination. Other clients are unaffected.
So pure TIME_WAIT exhaustion is weak as a DoS. In practice it is a capacity ceiling that operators of proxies, load balancers, and high-turnover services routinely hit during legitimate load.
The Deprecated Footgun: tcp_tw_recycle
For years, the most common “fix” recommended on Stack Overflow and operator blogs was:
net.ipv4.tcp_tw_recycle = 1
This setting accelerated TIME_WAIT cleanup using per-host timestamps. It worked beautifully in lab environments. It catastrophically broke production whenever clients sat behind NAT.
The mechanism: tcp_tw_recycle tracked the most recent TCP timestamp per source IP and rejected incoming SYNs with older timestamps as “duplicates from a TIME_WAIT connection.” When multiple clients shared a NAT public IP (every mobile network, every corporate gateway), their per-host timestamp clocks were independent. Whichever client had the lower clock would be silently blocked — packets dropped, connections refused, no logs.
The flag was removed entirely in Linux 4.12 (July 2017, commit 4396e46187). It does not exist in any modern kernel. If you see a guide still recommending it, that guide is badly out of date.
The correct setting is tcp_tw_reuse, which allows reuse of TIME_WAIT sockets for outgoing connections only, using TCP timestamps (RFC 6191) to disambiguate. It is safe with NAT because it operates on the outbound side.
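In practice that means a small persistent sysctl change. A minimal sketch, assuming a sysctl.d drop-in (the file name and the widened port range are illustrative; tcp_tw_reuse relies on TCP timestamps, which are on by default):

# /etc/sysctl.d/99-tw-reuse.conf (illustrative file name)
net.ipv4.tcp_tw_reuse = 1                  # reuse TIME_WAIT sockets for new outbound connections
net.ipv4.ip_local_port_range = 1024 65000  # widen the ephemeral pool while you're at it

Reload with sysctl --system (or reboot) to apply.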
Sockstress: The Real Attack
Sockstress was disclosed in 2008 by Robert E. Lee and Jack Louis of Outpost24. Unlike TIME_WAIT, it is a deliberate, asymmetric, and devastating attack — and the underlying mechanism has never been fully closed.
The mechanism, step by step:
- Attacker initiates a fully legitimate TCP handshake (SYN, SYN-ACK, ACK). Crucially, this passes SYN cookie defenses, because SYN cookies only protect the half-open queue.
- The connection reaches ESTABLISHED state.
- The attacker sends a TCP segment advertising window size 0. This is a legitimate flow-control signal meaning “I cannot accept data right now, hold off.”
- If there is data the server wants to send (typical for HTTP responses, banners, anything server-initiated), the server enters the TCP Persist Timer state. It cannot transmit, it cannot close, it must wait.
- The Persist Timer sends “zero window probes” at exponentially increasing intervals, asking “can I send yet?” The attacker keeps replying with window 0.
- On Linux, the connection is held open until `tcp_retries2` exhaustion — by default, up to ~15 minutes per connection.
The attacker now multiplies this across thousands of connections and source ports. Each held connection consumes:
- A `struct sock` allocation in kernel memory
- A file descriptor in the server process
- A conntrack table entry (if a stateful firewall sits in the path)
- An `sk_buff` queue for buffered send data
The asymmetry is brutal. The attacker holds essentially zero state per connection (it only needs to reply with window 0 occasionally). The server holds full per-connection kernel state for up to 15 minutes.
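From the defender's side, connections pinned in persist mode are visible with ss. A couple of hedged one-liners (timer output formatting varies across iproute2 versions):

$ ss -tano | grep -c 'timer:(persist'              # count connections stuck probing a zero window
$ ss -tano state established '( sport = :443 )'    # per-connection view: look for persist timers and a growing Send-Q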
Why Standard Defenses Don’t Apply
- SYN cookies: Useless. The handshake is real and completes normally.
- Connection rate limiting: Partially effective, but an attacker can pace the openings slowly.
- Per-IP connection limits (iptables `connlimit`): Effective if configured. Most production systems aren't.
- Application-layer timeouts: Don't apply. The TCP stack is stuck in Persist mode; the application never gets the chance to react.
The Linux-Specific Tuning Lever
Two sysctl knobs control how aggressively Linux gives up on stalled connections:
net.ipv4.tcp_retries2 = 15 # default, ~15 min for ESTABLISHED
net.ipv4.tcp_orphan_retries = 0 # special: 0 means use default (8 retries)
Lowering tcp_retries2 to 5 or 6 cuts the per-connection lifetime to 1–2 minutes. This reduces but does not eliminate Sockstress impact. It’s a tradeoff: legitimate clients on flaky networks may also be killed prematurely.
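If you do lower it, make the change persistent rather than a one-off sysctl -w. A sketch, assuming a sysctl.d drop-in (file name illustrative):

# /etc/sysctl.d/99-stalled-conns.conf (illustrative file name)
net.ipv4.tcp_retries2 = 6    # roughly 1-2 minutes before abandoning an unresponsive ESTABLISHED peer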
Where These Attacks Still Hit in 2026
Both attack families are well-known. Both have published mitigations. Neither is fully solved in practice, because the gap between “documented mitigation exists” and “mitigation is deployed everywhere” is enormous. The systems still vulnerable today fall into recognizable categories.
TIME_WAIT exhaustion is still hitting:
- High-churn reverse proxies and API gateways running default sysctls. Operators tune cache headers and TLS ciphers obsessively but leave `ip_local_port_range` at the defaults and forget `tcp_tw_reuse=1`. (A backend connection-pooling sketch follows this list.)
- Carrier-grade NAT (CGNAT) devices. When tens of thousands of subscribers share a public IP pool, the NAT box itself runs out of source ports toward popular destinations. This is not theoretical — it's a daily operational issue for mobile carriers.
- Kubernetes pods with default `net.ipv4.ip_local_port_range` doing frequent outbound calls to a small set of services. Service mesh sidecars (Envoy, Istio) make this worse by adding another connection layer.
- Database connection pools that don't pool. If your app opens a new connection per query against PostgreSQL on `:5432`, you will hit the 466/sec wall before you hit the database.
- Egress proxies in regulated environments where every outbound HTTPS goes through one or two corporate proxy IPs.
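For the proxy and gateway cases above, the structural fix is pooling: reuse a small set of long-lived backend connections instead of opening one per request. A minimal nginx sketch with illustrative addresses and pool size (the directives are standard nginx; the values are not a recommendation):

upstream backend {
    server 10.0.0.5:8080;               # illustrative backend address
    keepalive 64;                       # idle connections kept open toward the backend
}
server {
    location / {
        proxy_http_version 1.1;         # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection ""; # don't forward "Connection: close" to the backend
        proxy_pass http://backend;
    }
}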
Sockstress and its descendants still hit:
- Embedded device admin interfaces. Routers, printers, IP cameras, building management systems, industrial controllers. Their TCP stacks are often unmaintained for years. Per-IP connection limits? Almost never configured.
- SCADA / ICS / OT systems. Notoriously bad TCP stacks, often running on Windows CE or stripped Linux variants. A successful Sockstress against a PLC’s HMI interface can interrupt physical processes.
- Legacy enterprise applications behind perimeter firewalls but lacking internal `connlimit` rules. Once an attacker is inside (compromised laptop, malicious insider), Sockstress against an internal Oracle DB or SAP frontend is wide open.
- Network appliances with web management UIs. Firewalls, switches, load balancers — many vendors expose admin UIs with no per-IP socket caps. This is one of those uncomfortable cases where the security device itself is vulnerable to a 2008 attack.
- Misconfigured cloud workloads where the security team focused on L7 (WAF, rate limiting at the LB) but never tuned `nf_conntrack` limits or `somaxconn` on the actual EC2/GCE instances.
Modern L7 variants — same idea, different layer
The Sockstress pattern — complete a handshake, then pin state by not making progress — keeps recurring at higher protocol layers. The kernel-level problem is hard to close, so attackers moved the same trick into the application protocol, where it still works:
- Slowloris (2009): pin Apache worker threads by sending HTTP headers one byte at a time. Mitigated in Apache by `mod_reqtimeout` and the switch to the event MPM (example config after this list), but still effective against unmaintained Apache instances, embedded web servers, and many IoT admin panels.
- HTTP/2 Rapid Reset (CVE-2023-44487, October 2023): open HTTP/2 streams, immediately RST them, repeat. Forces the server to allocate stream state but never process anything. Patched broadly, but unpatched HTTP/2 implementations remain in production. Caused the largest DDoS observed at the time of disclosure (398 million RPS at Google).
- HTTP/2 CONTINUATION flood (April 2024, Bartek Nowotarski): send CONTINUATION frames without ending the header block. Server accumulates header state indefinitely. Multiple HTTP/2 libraries were vulnerable; patches are uneven across implementations.
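For the Slowloris case, the relevant Apache control is mod_reqtimeout. A hedged sketch using its documented directive (the values and the Debian-style path are illustrative):

# /etc/apache2/mods-available/reqtimeout.conf (path varies by distro)
# Drop clients that dribble headers or body slower than 500 bytes/second
RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500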
The thread connecting all of these is identical to Sockstress: the server commits resources on handshake/setup, the attacker withholds the progress signal that would let those resources be reclaimed. The protocol changes, the mechanism doesn’t.
Defense: What Actually Works
A defense stack that handles connection-state exhaustion needs layers, because no single setting catches all variants.
| Layer | Control | What it stops |
|---|---|---|
| Kernel | `net.ipv4.tcp_tw_reuse=1` | TIME_WAIT exhaustion (outbound) |
| Kernel | `net.ipv4.tcp_retries2=5` | Sockstress connection lifetime |
| Kernel | `net.core.somaxconn=4096+` | Accept-queue saturation |
| Kernel | `nf_conntrack_max` tuning | Conntrack table exhaustion |
| Firewall | `iptables -m connlimit --connlimit-above N` | Per-IP socket cap (kills Sockstress at source) |
| Firewall | `iptables -m hashlimit` | Rate-based new-connection limiting |
| App | Per-connection idle timeout | Slowloris and HTTP-layer slow attacks |
| App | HTTP/2 stream/frame limits | Rapid Reset, CONTINUATION flood |
| LB/Proxy | Connection pooling to backend | Avoids creating the TIME_WAIT problem in the first place |
| Ops | Monitor `ss -s` and conntrack usage | Detect exhaustion before it causes outages |
The most operationally important of these is connlimit. A simple rule like:
iptables -A INPUT -p tcp --syn --dport 443 -m connlimit --connlimit-above 100 -j REJECT --reject-with tcp-reset
caps any single source IP at 100 concurrent connections to port 443. This single rule defeats classical Sockstress entirely and costs nothing. It is missing from a startling number of production firewall configurations.
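The hashlimit row in the table covers the complementary case: a source that churns through many short connections rather than holding them open. A sketch with illustrative thresholds:

iptables -A INPUT -p tcp --syn --dport 443 \
  -m hashlimit --hashlimit-name https-new --hashlimit-mode srcip \
  --hashlimit-above 20/second --hashlimit-burst 100 -j DROP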
Conclusion
TCP connection-state exhaustion is older than most engineers reading this post and it still works in 2026, because the gap between known and deployed defenses is still wide. The reasons are unglamorous: default sysctls, unmaintained appliances, “we have a WAF so we’re fine” thinking, and the persistent belief that the limit is 65,536 ports.
The reframing that actually helps is this: TCP capacity is about state, not numbers. The 4-tuple is the unit of identity. The kernel state tables are the unit of cost. Every attack in this family — TIME_WAIT, Sockstress, Slowloris, Rapid Reset, CONTINUATION flood — is a variation on the same theme: find a way to make the server allocate state, then refuse to let it deallocate.
Knowing that, the defenses pick themselves: limit state per source, time state out aggressively, and monitor the tables that matter (ss -s, nf_conntrack counts) before they overflow.
Further Reading
- Outpost24, “Sockstress” original disclosure (2008)
- RFC 793 — Transmission Control Protocol
- RFC 6191 — Reducing the TIME-WAIT State Using TCP Timestamps
- RFC 7323 — TCP Extensions for High Performance
- CVE-2023-44487 — HTTP/2 Rapid Reset
- Bartek Nowotarski, HTTP/2 CONTINUATION Flood (April 2024)
- Linux kernel: `include/net/tcp.h`, `Documentation/networking/ip-sysctl.rst`
⚠️ Legal Disclaimer
This article is written for educational and defensive security research purposes only. The techniques described are publicly documented for the purpose of building more resilient systems. Unauthorized testing against systems you do not own or have explicit permission to assess is illegal.