DNS is one of the most-described protocols in the world and one of the least-understood at the resolver level. The popular framing is that DNS is “stateless UDP” — clients send queries, resolvers send answers, nothing is held between them. That description is wrong in the way it matters most for security: a recursive resolver carries a substantial amount of per-query state, and the tables that hold that state have surprisingly low default ceilings.

This post is a companion piece to TCP Connection State Exhaustion. That post argued the unit of TCP capacity is state, not port numbers. The same reframing applies to DNS, with a different state geometry. DNS exhaustion attacks aren’t recent — water torture has been operational since 2014, NXNS was published in 2020, TsuNAME in 2021, NRDelegation in 2023 — but their underlying mechanisms remain effective because the gap between documented mitigations and deployed mitigations is, as always, wide.


What State Does a DNS Resolver Actually Hold?

A recursive resolver is a stateful machine pretending to be a stateless service. When a client asks for www.example.com, the resolver typically does not have the answer cached. It must:

  1. Accept the client query and allocate a slot to remember the client is waiting
  2. Send queries to authoritative servers (root → TLD → example.com NS)
  3. Track each upstream query with a unique 16-bit transaction ID
  4. Possibly validate DNSSEC signatures, which requires fetching DS and DNSKEY records up the chain
  5. Cache the answer
  6. Return the answer to the original client and free the slot
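To make those steps concrete, here is a toy model of the pending-query table. It is a sketch under simplifying assumptions, not how BIND or Unbound structure their tables: `PendingTable`, `allocate`, and `complete` are hypothetical names, `capacity` stands in for a limit like recursive-clients, and entries are keyed by (upstream server, 16-bit transaction ID). Real resolvers also randomize the UDP source port and check the question section before accepting an answer.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field


@dataclass
class PendingTable:
    """Hypothetical bounded table of in-flight upstream queries."""

    capacity: int  # stands in for a limit such as recursive-clients
    # (upstream server, transaction ID) -> query name still waiting for an answer
    entries: dict[tuple[str, int], str] = field(default_factory=dict)

    def allocate(self, server: str, qname: str) -> int | None:
        """Reserve a slot and a random 16-bit transaction ID, or return None if full."""
        if len(self.entries) >= self.capacity:
            return None  # table full: a real resolver drops or refuses the query here
        for _ in range(16):  # a few tries to avoid an ID already in flight to this server
            txid = secrets.randbelow(65536)
            if (server, txid) not in self.entries:
                self.entries[(server, txid)] = qname
                return txid
        return None

    def complete(self, server: str, txid: int) -> str | None:
        """Free the slot when the matching answer arrives (or a timeout fires)."""
        return self.entries.pop((server, txid), None)


table = PendingTable(capacity=2)
first = table.allocate("192.0.2.53", "www.example.com")
table.allocate("192.0.2.53", "mail.example.com")
print(table.allocate("192.0.2.53", "ftp.example.com"))  # None: table is full
table.complete("192.0.2.53", first)  # answer arrived, slot freed
```

Whatever keeps a slot occupied (a slow upstream, a flood of unique names, a stalled delegation), the visible effect is the same: `allocate` starts returning None for every client.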

Every step here holds state for a measurable period. The state surface includes:

State category | Bounded by | Typical default
--- | --- | ---
Pending recursive queries | recursive-clients (BIND), num-queries-per-thread (Unbound) | 1000 (BIND), 1024 (Unbound)
Concurrent TCP DNS connections | tcp-clients (BIND), incoming-num-tcp (Unbound) | 150 (BIND), 10 per thread (Unbound)
Outstanding upstream queries | Transaction ID space per server | 65,536 (16-bit, shared)
Same-name concurrent fetches | clients-per-query, fetches-per-zone (BIND) | 10 (clients-per-query)
Cache slots | max-cache-size | Varies (often 90% of RAM or a fixed size in MB)
DNSSEC validation chain | Recursion depth limits | Often 7, or unbounded
EDNS0/ECS state | Per-client subnet entries | Implementation-defined

The numbers in the right column are what an attacker is trying to exhaust. They are not large. A default BIND9 resolver stops accepting DNS over TCP once 150 connections are held open. That’s not a typo.


UDP DNS — Why Sockstress Doesn’t Apply, but Something Worse Does

TCP exhaustion attacks like Sockstress work by completing a handshake and then refusing to make progress, pinning kernel struct sock state. UDP has no handshake. There is no per-flow struct sock. The TIME_WAIT/Sockstress family does not translate.

But UDP DNS has its own structural weakness: the resolver does not control the question. Anyone can ask anything, including questions the resolver cannot answer from cache. Every cache miss forces the resolver to allocate a pending-query slot and hold it until one of three things happens:

  • The authoritative answer arrives
  • A timeout fires (typically 10–30 seconds)
  • The slot is forcibly freed under pressure

The attacker’s job, then, is not to consume TCP state. It is to manufacture cache misses faster than the resolver can drain its pending-query table.

Water Torture (Pseudo-Random Subdomain Attack, PRSD)

This is the canonical UDP DNS state exhaustion attack. It was first seen at scale in 2014 against Chinese DNS infrastructure and has been a recurring component of large DDoS events since, notably the 2016 Mirai-driven attack on Dyn.

The attacker generates queries of the form:

ed7a9c3f.victim.com
b210cce8.victim.com
55f1a402.victim.com
...

Each prefix is random, usually 8 to 16 hex characters or random labels, while the targeted domain (victim.com) stays fixed. The attacker sends millions of these queries, sometimes with spoofed source IPs to evade per-client rate limiting, and often relays them through open recursive resolvers, which then perform the upstream lookups against victim.com on the attacker’s behalf.

The mechanics of damage:

  1. Resolver cache miss is guaranteed because every prefix is unique
  2. The resolver must consult the authoritative server for victim.com to determine that, yes, this subdomain doesn’t exist (NXDOMAIN response)
  3. recursive-clients slots fill up waiting for these upstream answers
  4. The authoritative server for victim.com is also flooded — secondary damage
  5. Once recursive-clients is full, legitimate queries are dropped

The defining property: the resolver cannot cache its way out. Negative caching (RFC 2308) helps for repeated queries on the same name, but each random prefix is a fresh name. The cache fills with millions of NXDOMAIN entries, eventually evicting useful records.
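A quick simulation shows why per-name negative caching cannot keep up. It assumes a hypothetical per-name NXDOMAIN cache with no TTL expiry and no size limit, and uses 8-hex-character random prefixes like the ones above; `simulate_negative_cache` is an illustrative name, not a resolver API.

```python
import random


def simulate_negative_cache(names: list[str]) -> tuple[int, int]:
    """Hypothetical helper: return (cache hits, cached NXDOMAIN entries) for a per-name cache."""
    cache: set[str] = set()
    hits = 0
    for name in names:
        if name in cache:
            hits += 1  # answered locally, no pending-query slot needed
        else:
            cache.add(name)  # miss: upstream lookup, then cache this one name only
    return hits, len(cache)


rng = random.Random(2014)  # fixed seed so the run is repeatable
random_prefixes = [f"{rng.getrandbits(32):08x}.victim.com" for _ in range(10_000)]
same_name = ["nosuch.victim.com"] * 10_000

print(simulate_negative_cache(same_name))        # (9999, 1)
print(simulate_negative_cache(random_prefixes))  # almost no hits, roughly 10,000 entries
```

The repeated name costs one upstream lookup; the random prefixes cost one lookup each and leave a cache entry behind for every one of them.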

Water torture is still operationally effective in 2026 against:

  • Recursive resolvers without Aggressive NSEC caching (RFC 8198) enabled
  • Authoritative servers without rate limiting
  • Resolvers exposing UDP/53 to the internet without DNS Cookies (RFC 7873)
  • Self-hosted Pi-hole, AdGuard Home, dnsmasq instances on consumer routers
  • Internal corporate resolvers that “we don’t worry about because they’re internal” (until an insider or compromised endpoint becomes the attacker)

Why Aggressive NSEC Caching Is the Real Defense

When DNSSEC-signed domains respond with NXDOMAIN, the response includes an NSEC or NSEC3 record that cryptographically proves the non-existence of a range of names. Without aggressive caching, the resolver caches only the specific name asked. With RFC 8198, the resolver caches the NSEC range itself and synthesizes future NXDOMAIN responses locally for any name in that proven-empty range.

For a water-torture attack against victim.com, if victim.com is DNSSEC-signed and the resolver implements RFC 8198, the first random subdomain query triggers a real upstream lookup. Every subsequent random prefix falling in the same NSEC range is answered from local synthesis — no upstream traffic, no pending-query slot, no exhaustion.
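The range check at the heart of RFC 8198 can be sketched as follows. This is a simplified illustration, not a resolver’s implementation: it applies the canonical name ordering from RFC 4034 (compare labels right to left, case-insensitively, as byte strings), ignores NSEC3 (which works on hashed names), and skips the wildcard-denial check a real resolver must also pass before synthesizing NXDOMAIN. The NSEC chain for victim.com and the function names are hypothetical.

```python
def canonical_key(name: str) -> tuple[bytes, ...]:
    """RFC 4034 canonical order: compare labels right to left, lowercased, as octets."""
    name = name.rstrip(".").lower()
    if not name:
        return ()  # the root name sorts first
    return tuple(label.encode("ascii") for label in reversed(name.split(".")))


def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
    """True if an NSEC record (owner -> next_name) proves qname does not exist."""
    o, n, q = canonical_key(owner), canonical_key(next_name), canonical_key(qname)
    if o < n:
        return o < q < n
    # The last NSEC in the chain wraps back to the zone apex (next_name):
    # it covers names after the owner that are still inside the zone.
    return q > o and q[: len(n)] == n


# Hypothetical cached NSEC chain for a signed victim.com holding three names.
cached_nsec = [
    ("victim.com", "mail.victim.com"),
    ("mail.victim.com", "www.victim.com"),
    ("www.victim.com", "victim.com"),
]


def can_synthesize_nxdomain(qname: str) -> bool:
    """Hypothetical helper: is qname inside any cached proven-empty range?"""
    return any(nsec_covers(owner, nxt, qname) for owner, nxt in cached_nsec)


for q in ["ed7a9c3f.victim.com", "b210cce8.victim.com", "www.victim.com"]:
    print(q, can_synthesize_nxdomain(q))
```

In this toy chain, three cached records cover every name under victim.com except the three that exist, so every random prefix in a water-torture run falls into a range the resolver already holds.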

This is the single most effective structural defense against water torture, and adoption has been slow because it requires both DNSSEC on the target domain and resolver support. BIND9 enables it by default since 9.12. Unbound supports it. dnsmasq does not.


NXNS Attack — Delegation as Amplification

NXNS was published at USENIX Security 2020 by Afek and Shafir of Tel Aviv University and Bremler-Barr of IDC Herzliya. It exploits how recursive resolvers handle NS records that lack glue (the A/AAAA records giving the nameservers’ IP addresses).

The mechanism:

  1. Attacker controls an authoritative server for attacker.com
  2. Victim resolver receives a query for something.attacker.com
  3. Attacker’s authoritative response delegates the query to many fake nameservers:
;; AUTHORITY SECTION
something.attacker.com.   NS  ns1.fake-target.net.
something.attacker.com.   NS  ns2.fake-target.net.
something.attacker.com.   NS  ns3.fake-target.net.
...
something.attacker.com.   NS  ns25.fake-target.net.
  4. None of these NS records have glue (no A/AAAA in the additional section)
  5. The resolver, to follow the delegation, must now resolve ns1.fake-target.net through ns25.fake-target.net independently — 25 new recursive resolutions triggered by 1 original query
  6. If fake-target.net is the actual victim, its authoritative server now receives a flood of NS lookups
  7. The resolver itself accumulates 25× pending state per attack query

The original paper measured amplification factors of up to 1620× against unpatched resolvers. The damage runs in both directions: the resolver’s state table fills, and the victim domain’s authoritative server is flooded.
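The fan-out arithmetic can be written down as a rough model. It is a deliberate simplification, not the paper’s measurement methodology: it assumes the resolver looks up both A and AAAA for every glueless NS name it follows, ignores root/TLD lookups and retries (so it understates real traffic), and models a vendor-style limit with a hypothetical `ns_fetch_cap` parameter.

```python
from __future__ import annotations


def nxns_upstream_queries(glueless_ns: int, qtypes_per_ns: int = 2,
                          ns_fetch_cap: int | None = None) -> int:
    """Hypothetical model: upstream queries triggered by one client query."""
    followed = glueless_ns if ns_fetch_cap is None else min(glueless_ns, ns_fetch_cap)
    # 1 query to the attacker's authoritative server, then A/AAAA per followed NS name
    return 1 + followed * qtypes_per_ns


print(nxns_upstream_queries(25))                  # 51 upstream queries, no cap
print(nxns_upstream_queries(25, ns_fetch_cap=5))  # 11 with a cap of 5 NS names
```

The cap shrinks the multiplier but never brings it back to 1: limits of this kind reduce amplification rather than eliminate it.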

What makes NXNS particularly nasty: it does not require source IP spoofing. The attacker just controls a legitimate-looking authoritative server. There is no network-level defense that distinguishes “legitimate delegation” from “weaponized delegation” — only response-content inspection.

Mitigations and Their Limits

After disclosure, major resolvers implemented caps:

  • BIND9 limited the number of fetches it performs while processing a referral (the CVE-2020-8616 fix); its existing fetches-per-zone and fetches-per-server limits also became more important to tune
  • Unbound added target-fetch-policy adjustments
  • PowerDNS Recursor added explicit NXNS protections
  • CVE-2020-12662 (Unbound), CVE-2020-12667 (Knot Resolver), CVE-2020-8616 (BIND) all reference NXNS variants

These caps work by limiting how many simultaneous fetches a resolver will initiate per upstream zone or server. They reduce amplification but do not eliminate it. Self-hosted resolvers running older versions, embedded DNS forwarders, and many corporate internal resolvers remain unpatched — particularly because resolver software updates are not part of the normal patching cycle on appliances.


TsuNAME — Cyclical Dependencies

Published in 2021 by Moura et al. TsuNAME exploits resolvers that don’t detect cyclical NS dependencies.

The setup:

example1.com    NS    ns.example2.com.
example2.com    NS    ns.example1.com.

Each domain’s NS points to the other. To resolve either, the resolver must first resolve the other, which requires first resolving the original. A naive resolver loops forever, generating amplifying upstream traffic.
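Detecting the loop amounts to a cycle search over the zone dependency graph. The sketch below is a simplified illustration, not any resolver’s actual algorithm: it assumes a hypothetical `delegations` map from zone to NS host names, treats in-bailiwick NS names as resolvable through glue, and returns the first cycle found by a depth-first search. `zone_of` and `find_delegation_cycle` are illustrative names.

```python
from __future__ import annotations


def zone_of(host: str, zones: set[str]) -> str | None:
    """Hypothetical helper: longest known zone that host is equal to or under."""
    labels = host.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return None


def find_delegation_cycle(delegations: dict[str, list[str]]) -> list[str] | None:
    """Return a cycle of zones whose NS names depend on each other, or None."""
    zones = set(delegations)
    deps: dict[str, list[str]] = {}
    for zone, ns_hosts in delegations.items():
        deps[zone] = []
        for ns in ns_hosts:
            parent = zone_of(ns, zones)
            # In-bailiwick NS names come with glue, so they are not a dependency.
            if parent is not None and parent != zone:
                deps[zone].append(parent)

    state: dict[str, str] = {}  # zone -> "visiting" or "done"
    path: list[str] = []

    def visit(zone: str) -> list[str] | None:
        state[zone] = "visiting"
        path.append(zone)
        for dep in deps[zone]:
            if state.get(dep) == "visiting":
                return path[path.index(dep):] + [dep]  # back edge: cycle found
            if dep not in state:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[zone] = "done"
        return None

    for zone in sorted(zones):
        if zone not in state:
            cycle = visit(zone)
            if cycle:
                return cycle
    return None


print(find_delegation_cycle({
    "example1.com": ["ns.example2.com."],
    "example2.com": ["ns.example1.com."],
}))  # ['example1.com', 'example2.com', 'example1.com']
```

A resolver does not need the whole graph up front; refusing to start a fetch that is already in progress for the same name and type breaks the same loop at run time.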

The attack does not require attacker-controlled domains: TsuNAME was discovered because real misconfigured domains were generating massive query volumes against authoritative infrastructure. The spikes were observed at the .nz authoritative servers and traced largely to Google Public DNS; after disclosure, Google Public DNS and Cisco OpenDNS both implemented loop detection.

The state exhaustion angle: each iteration of the loop consumes a pending-query slot. A resolver that hits a TsuNAME-style configuration without loop detection saturates its recursive-clients table in seconds.

Modern resolvers (BIND9 9.16+, Unbound 1.13+, PowerDNS Recursor 4.5+) now detect repeated identical fetches and abort. Older versions and many embedded resolvers do not.


NRDelegation — Non-Responsive Delegations

Presented at USENIX Security 2023 by Afek, Bremler-Barr, and Stajnrod, NRDelegation is a more recent variant in the delegation-attack family.

The core observation: when a resolver follows a delegation, it must contact the listed nameservers. If those nameservers are non-responsive (silently dropping packets), the resolver retries with progressively longer timeouts before giving up. During this entire waiting period, the resolver holds the pending query state.

The attacker:

  1. Sets up authoritative responses that delegate to nameservers under attacker control
  2. Configures those nameservers to silently drop incoming queries
  3. Forces victim resolvers into long retry/timeout cycles per delegation

This is subtler than NXNS — it doesn’t multiply queries, it stretches each query’s lifetime. The amplification is temporal. A resolver normally turning over a query in 100ms is now holding the slot for 10–30 seconds. The pending-query table fills with stalled lookups even at modest attack rates.
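The temporal amplification follows from Little’s law: the average number of slots in use equals arrival rate times hold time (L = λW). The sketch below applies it to a 1000-slot table (BIND’s default recursive-clients) with the hold times from the paragraph above; the function names are illustrative, and it assumes every attack query holds its slot for the full hold time.

```python
def slots_in_use(query_rate: float, hold_seconds: float) -> float:
    """Little's law: average occupancy L = arrival rate (lambda) * time held (W)."""
    return query_rate * hold_seconds


def saturating_rate(slots: int, hold_seconds: float) -> float:
    """Query rate at which a table of `slots` entries is, on average, completely full."""
    return slots / hold_seconds


for hold in (0.1, 10.0, 30.0):
    print(f"hold {hold:>4}s: {saturating_rate(1000, hold):,.0f} qps fills 1000 slots")
```

At a 100 ms turnaround the table absorbs about 10,000 queries per second; stretched to 10 seconds, about 100 queries per second of stalled lookups is enough to fill it.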

Mitigations are uneven. Resolver vendors shipped fixes after disclosure (for Unbound, tracked as CVE-2022-3204) that limit how much work a resolver spends on unresponsive delegations; some deployments adopted them, many did not.


TCP DNS — Where Sockstress Returns

UDP DNS responses cannot exceed certain practical sizes (1232 bytes is a common safe ceiling with EDNS0; the legacy limit without EDNS0 is 512 bytes). When a response is larger, as DNSSEC-signed answers and large TXT record sets often are, the server sets the TC (truncated) bit and the client retries over TCP/53. Zone transfers (AXFR) use TCP from the start.
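The fallback decision hinges on one header bit. This sketch reads the TC flag from a raw DNS message using the header layout in RFC 1035 section 4.1.1; `is_truncated` is a hypothetical helper name, and a client that sees the bit set would retry the same question over TCP/53.

```python
import struct

TC_MASK = 0x0200  # TC (truncated) bit in the 16-bit flags word that follows the ID


def is_truncated(message: bytes) -> bool:
    """Hypothetical helper: return True if a raw DNS message has the TC bit set."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    (flags,) = struct.unpack_from("!H", message, 2)  # bytes 2-3, network byte order
    return bool(flags & TC_MASK)


# Header-only responses: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
truncated = struct.pack("!6H", 0x1234, 0x8380, 1, 0, 0, 0)  # QR, TC, RD, RA set
complete = struct.pack("!6H", 0x1234, 0x8180, 1, 1, 0, 0)   # QR, RD, RA set
print(is_truncated(truncated), is_truncated(complete))      # True False
```

Every response that trips this bit becomes a TCP connection, so even a resolver that mostly serves UDP depends on its TCP slot limits.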

TCP DNS is full-blown TCP. Every connection holds:

  • A struct sock in the kernel
  • A file descriptor in the resolver process
  • A slot in the resolver’s TCP client table
  • A conntrack entry if a stateful firewall sits in the path

Every attack from the TCP exhaustion family applies directly:

Sockstress against TCP/53

Open 150 TCP connections to a BIND9 resolver, complete each handshake, and advertise a zero receive window. Default tcp-clients is 150, so the resolver can no longer accept new TCP DNS queries. AXFR breaks. Any response too large for UDP, such as a DNSSEC answer over the 1232-byte EDNS0 ceiling, becomes unservable. Some DoT-fallback configurations break.

Slowloris against TCP DNS

Send a TCP query length prefix (the 2-byte length field that precedes every DNS-over-TCP message) and then nothing else, or send the query one byte at a time. Many DNS server implementations don’t enforce per-connection read timeouts on partial queries, and nothing else reclaims those slots quickly: the kernel does not time out an idle established connection on its own, and TCP keepalive, where it is enabled at all, waits two hours before its first probe by default on Linux.
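The defense is a deadline on the whole message rather than an idle timer between bytes, so a one-byte-at-a-time sender cannot keep resetting it. A minimal sketch using Python’s asyncio streams, assuming a hypothetical 2-second per-message budget (`MESSAGE_DEADLINE`) and a server loop that closes the connection when the deadline fires; `read_dns_tcp_message` is an illustrative name.

```python
import asyncio
import struct

MESSAGE_DEADLINE = 2.0  # hypothetical budget (seconds) to receive one full query


async def read_dns_tcp_message(reader: asyncio.StreamReader,
                               deadline: float = MESSAGE_DEADLINE) -> bytes:
    """Read one length-prefixed DNS-over-TCP message or raise asyncio.TimeoutError."""

    async def read_framed() -> bytes:
        prefix = await reader.readexactly(2)  # 2-byte big-endian length field
        (length,) = struct.unpack("!H", prefix)
        return await reader.readexactly(length)

    # One deadline covers prefix and body, so trickling bytes does not extend it.
    return await asyncio.wait_for(read_framed(), timeout=deadline)


async def demo() -> None:
    stalled = asyncio.StreamReader()
    stalled.feed_data(b"\x00\x1c")  # length prefix for a 28-byte query, then silence
    try:
        await read_dns_tcp_message(stalled, deadline=0.1)
    except asyncio.TimeoutError:
        print("stalled sender dropped after 0.1 s")


asyncio.run(demo())
```

In a real server the handler would catch the timeout (and `asyncio.IncompleteReadError` for senders that disconnect mid-message), close the connection, and return the slot; a well-behaved client that pipelines several queries simply gets a fresh deadline per message.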

DoT (TCP/853) — Pure TCP, Full Attack Surface

DNS-over-TLS is TCP with TLS on top. Everything that works against HTTPS works here:

  • Sockstress on TCP/853
  • TLS handshake exhaustion (incomplete ClientHello floods)
  • Slow TLS handshake / TLS renegotiation abuse (where renegotiation is still supported)

Self-hosted DoT servers — Pi-hole-in-DoT-mode, AdGuard Home with DoT enabled, NextDNS Lite forks — frequently run with TCP defaults that make them softer than the HTTPS servers they sit beside.

DoH (TCP/443) — All HTTP/2 Attacks Apply

In most deployments, DNS-over-HTTPS is HTTP/2 over TLS over TCP. Every attack in the HTTP/2 family applies:

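  • HTTP/2 Rapid Reset (CVE-2023-44487, October 2023): open streams and immediately cancel them with RST_STREAM. Cloudflare, Google, and AWS reported record-scale Rapid Reset floods against their HTTP/2 front ends in the original attack window, and any DoH endpoint on an unpatched HTTP/2 stack inherits the same exposure.
  • HTTP/2 CONTINUATION Flood (Bartek Nowotarski, April 2024): send an unending run of CONTINUATION frames without ever setting END_HEADERS, so the server keeps buffering header state for the stream.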
  • TLS handshake exhaustion before HTTP/2 even starts.

DoH is in many ways the highest-state-cost DNS deployment imaginable. On a fresh connection, a single query that would have been a 100-byte UDP exchange becomes:

TCP handshake (state) →
TLS handshake (state, with key material) →
HTTP/2 stream allocation (state) →
HTTP request parsing (state) →
DNS resolution (state) →
HTTP response (state) →
TLS shutdown (state) →
TCP teardown (TIME_WAIT)

Every stage is exhaustible. DoH deployments often have higher state cost per query than the resolver they sit in front of.


A Comparison Worth Sitting With

The most striking observation when measuring DNS defaults against HTTP defaults:

System | Default concurrent client/connection cap | Notes
--- | --- | ---
BIND9 (recursive-clients) | 1000 | Pending recursive queries
BIND9 (tcp-clients) | 150 | TCP DNS sockets
Unbound (incoming-num-tcp, per thread) | 10 | Multiply by num-threads
dnsmasq (dns-forward-max) | 150 | Concurrent forwarded queries; common on Pi-hole and OpenWrt routers
Knot Resolver | Tunable, often 256 |
nginx (worker_connections, per worker) | 512 (1024 in the stock nginx.conf) | HTTP, for comparison
Apache (MaxRequestWorkers, event MPM) | 400 | HTTP, for comparison
HAProxy (maxconn) | 4000 | Default front-end limit

The DNS resolver defaults are systematically lower than HTTP server defaults. DNS gets less operational attention. Operators tune nginx exhaustively and leave their internal BIND running stock. The numerical gap is real and exploitable.


Defense — Layered, Mostly Known, Mostly Not Deployed

Layer | Control | What it addresses
--- | --- | ---
Resolver config | recursive-clients raised to 10k+ | Water torture headroom
Resolver config | tcp-clients raised, per-IP TCP limits | Sockstress on TCP/53 and 853
Resolver config | fetches-per-zone, fetches-per-server | NXNS amplification
Resolver feature | Aggressive NSEC caching (RFC 8198) | Water torture (DNSSEC zones)
Resolver feature | Cyclic delegation detection | TsuNAME
Resolver feature | Bounded retry on non-responsive NS | NRDelegation
Protocol | DNS Cookies (RFC 7873) | Spoofed-source water torture
Network | Per-IP UDP rate limiting | Water torture from non-spoofed sources
Network | Per-IP TCP connlimit on 53/853/443 | Sockstress, Slowloris
Authoritative | Response Rate Limiting (RRL) | Reflected amplification (different attack family, related infrastructure)
Operational | Monitor pending-query depth, TCP slot count | Detect exhaustion before outage
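The per-IP UDP rate limiting row can be sketched as a token bucket per source address. This is an illustrative sketch, not a production limiter: `PerSourceRateLimiter` and its parameters are hypothetical, the clock is injectable so the behavior is easy to test, and it assumes sources are not spoofed (which is why the table pairs it with DNS Cookies). The bucket map is itself per-source state, so a real deployment has to bound and expire it or the defense becomes a new exhaustion target.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Bucket:
    tokens: float
    last: float


class PerSourceRateLimiter:
    """Hypothetical token bucket per source IP: `rate` qps sustained, `burst` at once."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets: dict[str, Bucket] = {}  # unbounded here; bound and expire it in practice

    def allow(self, source_ip: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(source_ip)
        if bucket is None:
            bucket = self.buckets[source_ip] = Bucket(tokens=self.burst, last=now)
        # Refill in proportion to elapsed time, never beyond the burst size.
        bucket.tokens = min(self.burst, bucket.tokens + (now - bucket.last) * self.rate)
        bucket.last = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False  # over budget: drop or refuse the query


fake_now = [0.0]
limiter = PerSourceRateLimiter(rate=10, burst=5, clock=lambda: fake_now[0])
print([limiter.allow("198.51.100.7") for _ in range(6)])  # [True, True, True, True, True, False]
```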

The most undervalued of these is monitoring the pending-query depth. BIND exposes it via rndc status and the statistics channel. Unbound exposes it via unbound-control stats. Most operators don’t track it, so when the table fills, they discover the problem from user complaints rather than from the metric that would have predicted it.

A reasonable monitoring threshold: alert when pending recursive queries exceed 60% of recursive-clients for more than 60 seconds. That single alert catches most state-exhaustion attacks before they cause customer-visible failure.
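That alert rule is simple enough to sketch. The parser assumes the `recursive clients: current/soft-limit/limit` line that recent BIND 9 releases print in `rndc status` (check the format on your version); `parse_recursive_clients` and `SustainedThreshold` are hypothetical helpers, and the 60% / 60-second values are the ones suggested above.

```python
from __future__ import annotations

import re

# Assumed rndc status line format, e.g. "recursive clients: 700/900/1000"
RECURSIVE_CLIENTS = re.compile(r"recursive clients:\s*(\d+)/(\d+)/(\d+)")


def parse_recursive_clients(rndc_status: str) -> tuple[int, int]:
    """Hypothetical helper: return (current recursive clients, recursive-clients limit)."""
    match = RECURSIVE_CLIENTS.search(rndc_status)
    if match is None:
        raise ValueError("no 'recursive clients' line in rndc status output")
    current, _soft_limit, limit = (int(g) for g in match.groups())
    return current, limit


class SustainedThreshold:
    """Fire when usage stays above `ratio` of the limit for longer than `hold_seconds`."""

    def __init__(self, ratio: float = 0.6, hold_seconds: float = 60.0) -> None:
        self.ratio = ratio
        self.hold_seconds = hold_seconds
        self.breach_started: float | None = None

    def observe(self, now: float, current: int, limit: int) -> bool:
        if current > self.ratio * limit:
            if self.breach_started is None:
                self.breach_started = now  # first sample over the line
            return now - self.breach_started > self.hold_seconds
        self.breach_started = None  # dropped back under the line: reset the timer
        return False


alert = SustainedThreshold()
current, limit = parse_recursive_clients("recursive clients: 700/900/1000")
for t in (0, 30, 61):
    print(t, alert.observe(t, current, limit))  # fires only at t=61
```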


A Note on What This Isn’t

This post is about state exhaustion — attacks that fill resolver tables and cause denial of service through resource depletion. There is a separate, much-discussed DNS attack family called DNS amplification, which is a bandwidth attack using DNS as a reflector: spoof a source IP, send a small query, the resolver sends a large response to the victim. Amplification attacks don’t exhaust DNS state; they use DNS infrastructure to attack a third party with bandwidth.

The two families overlap operationally (Response Rate Limiting helps both) but are conceptually distinct. State exhaustion is about the resolver as victim. Amplification is about the resolver as weapon.


Conclusion

DNS state exhaustion attacks are not new and not glamorous, which is precisely why they continue to work. The default ceilings on resolver tables are lower than equivalent HTTP defaults. The protocol-specific attacks (water torture, NXNS, TsuNAME, NRDelegation) have well-documented mitigations that are unevenly deployed. The TCP-based variants (DoT, DoH) inherit every attack from the broader TCP exhaustion family covered in the previous post.

The framing that helps is identical to the TCP case: DNS capacity is about state, not packet rates. Every recursive query allocates state. Every TCP DNS connection allocates state. Every TLS handshake on DoT/DoH allocates state. The attacks in this family are all variations on the theme of forcing the resolver to allocate state and preventing it from being freed.

Knowing that, the defenses pick themselves: limit state per source, time state out aggressively, prefer DNSSEC + aggressive NSEC caching where available, and monitor the state tables before they fill.


Further Reading

  • Afek, Bremler-Barr, Shafir, “NXNSAttack: Recursive DNS Inefficiencies and Vulnerabilities”, USENIX Security 2020
  • Moura et al., “TsuNAME: exploiting misconfiguration and vulnerability to DDoS DNS”, IMC 2021
  • Afek, Bremler-Barr, Stajnrod, “NRDelegationAttack: Complexity DDoS attack on DNS Recursive Resolvers”, USENIX Security 2023
  • RFC 2308 — Negative Caching of DNS Queries
  • RFC 7873 — Domain Name System (DNS) Cookies
  • RFC 8198 — Aggressive Use of DNSSEC-Validated Cache
  • RFC 9156 — DNS Query Name Minimisation to Improve Privacy
  • TCP Connection State Exhaustion (companion post)

This article is written for educational and defensive security research purposes only. The DNS attack mechanisms described are publicly documented in academic literature, CVE advisories, and resolver vendor security bulletins. Unauthorized testing against DNS infrastructure you do not operate or have explicit permission to assess is illegal and operationally damaging — DNS exhaustion attacks cause collateral damage to legitimate users sharing the same resolver.