DNS is one of the most-described protocols in the world and one of the least-understood at the resolver level. The popular framing is that DNS is “stateless UDP” — clients send queries, resolvers send answers, nothing is held between them. That description is wrong in the way it matters most for security: a recursive resolver carries a substantial amount of per-query state, and the tables that hold that state have surprisingly low default ceilings.
This post is a companion piece to TCP Connection State Exhaustion. That post argued the unit of TCP capacity is state, not port numbers. The same reframing applies to DNS, with a different state geometry. DNS exhaustion attacks aren’t recent — water torture has been operational since 2014, NXNS was published in 2020, TsuNAME in 2021, NRDelegation in 2023 — but their underlying mechanisms remain effective because the gap between documented mitigations and deployed mitigations is, as always, wide.
What State Does a DNS Resolver Actually Hold?
A recursive resolver is a stateful machine pretending to be a stateless service. When a client asks www.example.com, the resolver typically does not have the answer cached. It must:
- Accept the client query and allocate a slot to remember the client is waiting
- Send queries to authoritative servers (root → TLD → example.com NS)
- Track each upstream query with a unique 16-bit transaction ID
- Possibly validate DNSSEC signatures, requiring fetching DS and DNSKEY records up the chain
- Cache the answer
- Return the answer to the original client and free the slot
Every step here holds state for a measurable period. The state surface includes:
| State category | Bounded by | Typical default |
|---|---|---|
| Pending recursive queries | recursive-clients (BIND), num-queries-per-thread (Unbound) | 1000 (BIND), 1024 (Unbound) |
| Concurrent TCP DNS connections | tcp-clients (BIND), incoming-num-tcp (Unbound) | 150 (BIND), 10/thread (Unbound) |
| Outstanding upstream queries | Transaction ID space per server | 65,536 (16-bit, shared) |
| Same-name concurrent fetches | clients-per-query, fetches-per-zone (BIND) | 10 (low) |
| Cache slots | max-cache-size | Varies (often 90% RAM or fixed MB) |
| DNSSEC validation chain | Recursion depth limits | Often 7 or unbounded |
| EDNS0/ECS state | Per-client subnet entries | Implementation-defined |
The numbers in the right column are what an attacker is trying to exhaust. They are not large. The TCP side of a default BIND9 resolver can be saturated by 150 sockets. That’s not a typo.
UDP DNS — Why Sockstress Doesn’t Apply, but Something Worse Does
TCP exhaustion attacks like Sockstress work by completing a handshake and then refusing to make progress, pinning kernel struct sock state. UDP has no handshake. There is no per-flow struct sock. The TIME_WAIT/Sockstress family does not translate.
But UDP DNS has its own structural weakness: the resolver does not control the question. Anyone can ask anything, including questions the resolver cannot answer from cache. Every cache miss forces the resolver to allocate a pending-query slot until either:
- The authoritative answer arrives
- A timeout fires (typically 10–30 seconds)
- The slot is forcibly freed under pressure
The attacker’s job, then, is not to consume TCP state. It is to manufacture cache misses faster than the resolver can drain its pending-query table.
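The race between arrivals and drain can be put in rough numbers with Little's law (average slots in use = arrival rate × mean hold time). The sketch below is illustrative arithmetic only, not a model of any particular resolver. The function names are invented for this post, and the inputs are the 1000-slot BIND default from the table above and the 10–30 second timeout range.

```python
def steady_state_occupancy(miss_rate_qps: float, hold_seconds: float) -> float:
    """Little's law: average slots in use = arrival rate x mean hold time."""
    return miss_rate_qps * hold_seconds


def rate_to_fill(slots: int, hold_seconds: float) -> float:
    """Cache-miss rate (queries/second) at which the table is full on average."""
    return slots / hold_seconds


if __name__ == "__main__":
    slots = 1000  # BIND recursive-clients default, per the table above
    for hold in (0.1, 10.0, 30.0):
        rate = rate_to_fill(slots, hold)
        print(f"hold={hold:>5}s -> table full at ~{rate:,.0f} misses/second")
```

With a 10-second hold, roughly 100 unanswered cache misses per second are enough to keep 1000 slots occupied. The same table absorbs about 10,000 misses per second when each one resolves in 100 ms.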
Water Torture (Random Subdomain Attack, PRSD)
This is the canonical UDP DNS state exhaustion attack. First seen at scale in 2014 against Chinese DNS infrastructure, and a recurring component of large DDoS events since (notably the 2016 Mirai-led attack on Dyn).
The attacker generates queries of the form:
ed7a9c3f.victim.com
b210cce8.victim.com
55f1a402.victim.com
...
Each prefix is random — usually 8 to 16 hex characters or random labels. The targeted domain (victim.com) is fixed. The attacker sends millions of these queries, typically with spoofed source IPs to evade rate limiting, often through open recursive resolvers used as reflectors.
The mechanics of damage:
- Resolver cache miss is guaranteed because every prefix is unique
- The resolver must consult the authoritative server for victim.com to determine that, yes, this subdomain doesn’t exist (NXDOMAIN response)
- recursive-clients slots fill up waiting for these upstream answers
- The authoritative server for victim.com is also flooded (secondary damage)
- Once recursive-clients is full, legitimate queries are dropped
The defining property: the resolver cannot cache its way out. Negative caching (RFC 2308) helps for repeated queries on the same name, but each random prefix is a fresh name. The cache fills with millions of NXDOMAIN entries, eventually evicting useful records.
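A toy simulation makes the "cannot cache its way out" point concrete. This is a minimal sketch, assuming an exact-match LRU cache and 16-hex-character random labels. The class and function names are hypothetical, and real resolver caches are far more elaborate.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny exact-match LRU cache keyed by query name."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, name: str) -> bool:
        """Return True on a hit; on a miss, store a negative entry."""
        if name in self.entries:
            self.entries.move_to_end(name)
            return True
        self.entries[name] = "NXDOMAIN"
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return False


def random_subdomain(rng: random.Random, zone: str = "victim.com") -> str:
    """Build a name with a random 16-hex-character label under the zone."""
    return f"{rng.getrandbits(64):016x}.{zone}"


def simulate(queries: int, capacity: int, seed: int = 1) -> tuple:
    """Return (cache hits during the flood, whether the useful record survived)."""
    rng = random.Random(seed)
    cache = LRUCache(capacity)
    cache.lookup("www.example.org")  # a useful record cached before the flood
    hits = sum(cache.lookup(random_subdomain(rng)) for _ in range(queries))
    return hits, "www.example.org" in cache.entries
```

Calling `simulate(10_000, 1_000)` should report no hits at all, and the record cached before the flood is gone: every query was a miss, and the misses themselves pushed out the useful entry.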
Water torture is still operationally effective in 2026 against:
- Recursive resolvers without Aggressive NSEC caching (RFC 8198) enabled
- Authoritative servers without rate limiting
- Resolvers exposing UDP/53 to the internet without DNS Cookies (RFC 7873)
- Self-hosted Pi-hole, AdGuard Home, dnsmasq instances on consumer routers
- Internal corporate resolvers that “we don’t worry about because they’re internal” (until an insider or compromised endpoint becomes the attacker)
Why Aggressive NSEC Caching Is the Real Defense
When DNSSEC-signed domains respond with NXDOMAIN, the response includes an NSEC or NSEC3 record that cryptographically proves the non-existence of a range of names. Without aggressive caching, the resolver caches only the specific name asked. With RFC 8198, the resolver caches the NSEC range itself and synthesizes future NXDOMAIN responses locally for any name in that proven-empty range.
For a water-torture attack against victim.com, if victim.com is DNSSEC-signed and the resolver implements RFC 8198, the first random subdomain query triggers a real upstream lookup. Every subsequent random prefix falling in the same NSEC range is answered from local synthesis — no upstream traffic, no pending-query slot, no exhaustion.
This is the single most effective structural defense against water torture, and adoption has been slow because it requires both DNSSEC on the target domain and resolver support. BIND9 enables it by default since 9.12. Unbound supports it. dnsmasq does not.
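The range check at the heart of RFC 8198 can be sketched in a few lines. This is a simplified illustration, not a validator: it assumes the NSEC records were already DNSSEC-validated, and it ignores wildcards, NSEC3 hashing, and TTL expiry. The class and method names are hypothetical.

```python
def canonical_key(name: str) -> tuple:
    """RFC 4034 canonical order: compare labels right to left, lowercased."""
    labels = name.rstrip(".").lower().split(".")
    return tuple(label.encode("ascii") for label in reversed(labels))


class NsecRangeCache:
    """Stores (owner, next) pairs from NSEC records and checks coverage."""

    def __init__(self) -> None:
        self.ranges = []

    def add(self, owner: str, next_name: str) -> None:
        self.ranges.append((canonical_key(owner), canonical_key(next_name)))

    def proves_nonexistence(self, qname: str) -> bool:
        key = canonical_key(qname)
        for owner, nxt in self.ranges:
            if owner < nxt:
                # Ordinary gap: the name sorts strictly between two real names.
                if owner < key < nxt:
                    return True
            elif key > owner and key[:len(nxt)] == nxt:
                # Last NSEC in the chain points back to the apex: anything
                # after the owner that is still inside the zone is covered.
                return True
        return False


if __name__ == "__main__":
    cache = NsecRangeCache()
    # One upstream answer proved: nothing exists between the apex and "mail".
    cache.add("victim.com.", "mail.victim.com.")
    for q in ("ed7a9c3f.victim.com", "b210cce8.victim.com", "55f1a402.victim.com"):
        print(q, cache.proves_nonexistence(q))
```

All three sample names from the attack sort between the apex and `mail.victim.com`, so a single cached range answers them without an upstream query or a pending-query slot.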
NXNS Attack — Delegation as Amplification
Published at USENIX Security 2020 by Afek and Shafir of Tel Aviv University and Bremler-Barr of IDC Herzliya. NXNS exploits how recursive resolvers handle NS records that lack glue (IP addresses for nameservers).
The mechanism:
- Attacker controls an authoritative server for attacker.com
- Victim resolver receives a query for something.attacker.com
- Attacker’s authoritative response delegates the query to many fake nameservers:
;; AUTHORITY SECTION
something.attacker.com. NS ns1.fake-target.net.
something.attacker.com. NS ns2.fake-target.net.
something.attacker.com. NS ns3.fake-target.net.
...
something.attacker.com. NS ns25.fake-target.net.
- None of these NS records have glue (no A/AAAA in the additional section)
- The resolver, to follow the delegation, must now resolve ns1.fake-target.net through ns25.fake-target.net independently: 25 new recursive resolutions triggered by 1 original query
- If fake-target.net is the actual victim, its authoritative server now receives a flood of NS lookups
- The resolver itself accumulates 25× pending state per attack query
The original paper measured amplification factors of up to 1620× against unpatched resolvers. The damage is bidirectional: the resolver’s state table fills, and the victim domain’s authoritative server is flooded.
What makes NXNS particularly nasty: it does not require source IP spoofing. The attacker just controls a legitimate-looking authoritative server. There is no network-level defense that distinguishes “legitimate delegation” from “weaponized delegation” — only response-content inspection.
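The fan-out in the steps above is simple to count. The sketch below is a hypothetical model of a referral (the `Referral` type and `naive_fanout` helper are invented for illustration) showing how many extra lookups an eager resolver launches for a glueless delegation.

```python
from dataclasses import dataclass, field


@dataclass
class Referral:
    """A delegation response: NS target names plus any glue addresses supplied."""
    ns_names: list
    glue: dict = field(default_factory=dict)

    def glueless(self) -> list:
        """NS names that arrived without an address and must be resolved."""
        return [ns for ns in self.ns_names if ns not in self.glue]


def naive_fanout(referral: Referral, record_types: int = 1) -> int:
    """Lookups triggered if every glueless NS name is resolved eagerly."""
    return len(referral.glueless()) * record_types


if __name__ == "__main__":
    attack = Referral([f"ns{i}.fake-target.net." for i in range(1, 26)])
    print(naive_fanout(attack))                  # one lookup per NS name
    print(naive_fanout(attack, record_types=2))  # A and AAAA for each name
```

One client query becomes 25 upstream resolutions, or 50 if the resolver asks for both A and AAAA records. A referral that carries glue for every nameserver triggers none.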
Mitigations and Their Limits
After disclosure, major resolvers implemented caps:
- BIND9 already had fetches-per-zone and fetches-per-server limits; the disclosure made them far more important to set
- Unbound added target-fetch-policy adjustments
- PowerDNS Recursor added explicit NXNS protections
- CVE-2020-12662 (Unbound), CVE-2020-12667 (Knot Resolver), CVE-2020-8616 (BIND) all reference NXNS variants
These caps work by limiting how many simultaneous fetches a resolver will initiate per upstream zone or server. They reduce amplification but do not eliminate it. Self-hosted resolvers running older versions, embedded DNS forwarders, and many corporate internal resolvers remain unpatched — particularly because resolver software updates are not part of the normal patching cycle on appliances.
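The idea behind a per-zone fetch cap can be sketched as a counter keyed by zone. This is a minimal illustration of the concept, not how BIND or any other resolver implements it. `ZoneFetchLimiter` and its methods are hypothetical names.

```python
from collections import defaultdict


class ZoneFetchLimiter:
    """Caps simultaneous outstanding fetches per upstream zone."""

    def __init__(self, max_per_zone: int) -> None:
        self.max_per_zone = max_per_zone
        self.in_flight = defaultdict(int)

    def try_start(self, zone: str) -> bool:
        """Reserve a fetch slot for the zone, or refuse if the cap is reached."""
        if self.in_flight[zone] >= self.max_per_zone:
            return False  # caller fails or drops the query instead of fetching
        self.in_flight[zone] += 1
        return True

    def finish(self, zone: str) -> None:
        """Release a slot when the upstream answer arrives or times out."""
        if self.in_flight[zone] > 0:
            self.in_flight[zone] -= 1


if __name__ == "__main__":
    limiter = ZoneFetchLimiter(max_per_zone=5)
    started = sum(limiter.try_start("fake-target.net") for _ in range(25))
    print(f"{started} of 25 fetches allowed")
```

With a cap of 5, a 25-name glueless referral starts only 5 fetches toward the target zone. The amplification shrinks but does not disappear, and fetches toward other zones are unaffected.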
TsuNAME — Cyclical Dependencies
Published in 2021 by Moura et al. TsuNAME exploits resolvers that don’t detect cyclical NS dependencies.
The setup:
example1.com NS ns.example2.com.
example2.com NS ns.example1.com.
Each domain’s NS points to the other. To resolve either, the resolver must first resolve the other, which requires first resolving the original. A naive resolver loops forever, generating amplifying upstream traffic.
The attack does not require attacker-controlled domains — TsuNAME was discovered because real misconfigured domains were generating massive query volumes against authoritative infrastructure. Google Public DNS and Cisco OpenDNS both observed and reported queries-per-second spikes traceable to cyclic dependencies. After disclosure, both implemented loop detection.
The state exhaustion angle: each iteration of the loop consumes a pending-query slot. A resolver that hits a TsuNAME-style configuration without loop detection saturates its recursive-clients table in seconds.
Modern resolvers (BIND9 9.16+, Unbound 1.13+, PowerDNS Recursor 4.5+) now detect repeated identical fetches and abort. Older versions and many embedded resolvers do not.
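Detecting the loop is a graph problem: treat each zone as a node and each out-of-zone nameserver as an edge to the zone that hosts it. The sketch below is one simple way to find a cycle with a depth-first walk. It is not the detection logic of any named resolver, and `zone_of` uses a crude "drop the leftmost label" rule purely for illustration.

```python
from typing import Optional


def zone_of(ns_name: str) -> str:
    """Crude parent-zone guess: drop the leftmost label (sketch assumption)."""
    return ns_name.rstrip(".").split(".", 1)[1]


def find_cycle(delegations: dict, start: str) -> Optional[list]:
    """Walk zone -> zones hosting its nameservers; return a cycle path if found."""
    path = []
    on_path = set()
    done = set()

    def visit(zone: str) -> Optional[list]:
        if zone in on_path:
            return path[path.index(zone):] + [zone]
        if zone in done:
            return None
        on_path.add(zone)
        path.append(zone)
        for ns in delegations.get(zone, []):
            dep = zone_of(ns)
            if dep != zone:  # in-zone nameservers rely on glue, not another zone
                found = visit(dep)
                if found:
                    return found
        path.pop()
        on_path.discard(zone)
        done.add(zone)
        return None

    return visit(start)


if __name__ == "__main__":
    cyclic = {
        "example1.com": ["ns.example2.com."],
        "example2.com": ["ns.example1.com."],
    }
    print(find_cycle(cyclic, "example1.com"))
```

For the two-zone setup above this returns `['example1.com', 'example2.com', 'example1.com']`. A resolver that tracks the zones already on its resolution path can abort at that point instead of issuing another fetch.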
NRDelegation — Non-Responsive Delegations
Presented at USENIX Security 2023 by Afek, Bremler-Barr, and Stajnrod. A more recent variant in the delegation-attack family.
The core observation: when a resolver follows a delegation, it must contact the listed nameservers. If those nameservers are non-responsive (silently dropping packets), the resolver retries with progressively longer timeouts before giving up. During this entire waiting period, the resolver holds the pending query state.
The attacker:
- Sets up authoritative responses that delegate to nameservers under attacker control
- Configures those nameservers to silently drop incoming queries
- Forces victim resolvers into long retry/timeout cycles per delegation
This is subtler than NXNS — it doesn’t multiply queries, it stretches each query’s lifetime. The amplification is temporal. A resolver normally turning over a query in 100ms is now holding the slot for 10–30 seconds. The pending-query table fills with stalled lookups even at modest attack rates.
Mitigations are uneven. The paper’s authors proposed bounded retry strategies and aggressive timeout reduction. Some resolvers adopted them; many did not.
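The effect of bounding retries is easy to quantify. The sketch below assumes a doubling backoff with a 0.8-second first timeout and five attempts. Those numbers are illustrative assumptions, not the defaults of any resolver, and both helper names are made up.

```python
def hold_time(first_timeout: float, attempts: int, backoff: float = 2.0) -> float:
    """Total seconds a slot stays pinned if every attempt times out."""
    return sum(first_timeout * backoff**i for i in range(attempts))


def bounded_attempts(first_timeout: float, budget: float, backoff: float = 2.0) -> int:
    """How many attempts fit inside a total time budget for one delegation."""
    attempts, elapsed, timeout = 0, 0.0, first_timeout
    while elapsed + timeout <= budget:
        elapsed += timeout
        timeout *= backoff
        attempts += 1
    return attempts


if __name__ == "__main__":
    unbounded = hold_time(0.8, attempts=5)
    allowed = bounded_attempts(0.8, budget=3.0)
    print(f"unbounded: {unbounded:.1f}s, bounded: {hold_time(0.8, allowed):.1f}s")
```

Under these assumptions an unbounded schedule pins the slot for about 24.8 seconds, while a 3-second budget allows two attempts and releases it after about 2.4 seconds, roughly a tenfold cut in state held per stalled delegation.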
TCP DNS — Where Sockstress Returns
UDP DNS responses cannot exceed certain practical sizes (1232 bytes is a common safe ceiling with EDNS0; the legacy limit was 512 bytes). When a response is larger (DNSSEC chains, large TXT records), the server sets the truncation (TC) bit and the client retries over TCP/53. AXFR zone transfers use TCP from the start.
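The signal that moves a client from UDP to TCP is a single bit in the 12-byte DNS header: the TC (truncation) flag. A minimal check looks like the sketch below; the helper name is hypothetical, and the sample headers are built by hand for the example.

```python
import struct

TC_FLAG = 0x0200  # truncation bit in the 16-bit flags field of the DNS header


def is_truncated(message: bytes) -> bool:
    """True if a DNS message has the TC bit set (client should retry over TCP)."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes")
    _txid, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


if __name__ == "__main__":
    # Header fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    truncated = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)  # QR|TC|RD|RA
    complete = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)   # QR|RD|RA
    print(is_truncated(truncated), is_truncated(complete))
```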
TCP DNS is full-blown TCP. Every connection holds:
- A struct sock in the kernel
- A file descriptor in the resolver process
- A slot in the resolver’s TCP client table
- A conntrack entry if a stateful firewall sits in the path
Every attack from the TCP exhaustion family applies directly:
Sockstress against TCP/53
Open 150 TCP connections to a BIND9 resolver, complete each handshake, and advertise a zero-byte receive window. Default tcp-clients is 150. The resolver can no longer accept any new TCP DNS query. AXFR breaks. Any DNSSEC response over 1232 bytes is unservable. Some DoT-fallback configurations break.
Slowloris against TCP DNS
Send a TCP query length prefix (the 2-byte length field that precedes DNS-over-TCP messages) and then nothing else. Or send the query one byte at a time. Many DNS server implementations don’t enforce per-connection read timeouts on partial queries. Slots stay open until the OS-level idle TCP timeout fires — by default, minutes.
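The fix on the server side is a single deadline that covers the whole framed message, so a peer cannot keep a slot alive by trickling bytes. Below is a minimal asyncio sketch of that idea. `read_dns_tcp_message` is a hypothetical helper, and the 2-second default is an assumption chosen for illustration, not a recommended value.

```python
import asyncio
import struct


async def read_dns_tcp_message(
    reader: asyncio.StreamReader, deadline: float = 2.0
) -> bytes:
    """Read one length-prefixed DNS message, or time out if the peer stalls."""

    async def _read() -> bytes:
        prefix = await reader.readexactly(2)      # 2-byte big-endian length
        (length,) = struct.unpack("!H", prefix)
        return await reader.readexactly(length)   # the DNS message itself

    # One deadline spans the prefix and the body, so a byte-at-a-time sender
    # cannot reset the timer with each partial write.
    return await asyncio.wait_for(_read(), timeout=deadline)
```

A caller would catch `asyncio.TimeoutError`, close the connection, and free the TCP client slot. Applying the timeout to the whole message rather than to each read is the design choice that defeats the one-byte-at-a-time variant.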
DoT (TCP/853) — Pure TCP, Full Attack Surface
DNS-over-TLS is TCP with TLS on top. Everything that works against HTTPS works here:
- Sockstress on TCP/853
- TLS handshake exhaustion (incomplete ClientHello floods)
- Slow TLS handshake / TLS renegotiation abuse (where renegotiation is still supported)
Self-hosted DoT servers — Pi-hole-in-DoT-mode, AdGuard Home with DoT enabled, NextDNS Lite forks — frequently run with TCP defaults that make them softer than the HTTPS servers they sit beside.
DoH (TCP/443) — All HTTP/2 Attacks Apply
DNS-over-HTTPS is HTTP/2 over TLS over TCP. Every attack in the HTTP/2 family applies:
- HTTP/2 Rapid Reset (CVE-2023-44487, October 2023): open many streams and immediately cancel each one with RST_STREAM. It was observed against Cloudflare and Google’s DoH endpoints during the original attack window.
- HTTP/2 CONTINUATION Flood (Bartek Nowotarski, April 2024) — send unending CONTINUATION frames, stream state accumulates.
- TLS handshake exhaustion before HTTP/2 even starts.
DoH is in many ways the highest-state-cost DNS deployment imaginable. A single query that would have been a 100-byte UDP exchange becomes:
TCP handshake (state) →
TLS handshake (state, with key material) →
HTTP/2 stream allocation (state) →
HTTP request parsing (state) →
DNS resolution (state) →
HTTP response (state) →
TLS shutdown (state) →
TCP teardown (TIME_WAIT)
Every stage is exhaustible. DoH deployments often have higher state cost per query than the resolver they sit in front of.
A Comparison Worth Sitting With
The most striking observation when measuring DNS defaults against HTTP defaults:
| System | Default concurrent client/connection cap | Notes |
|---|---|---|
| BIND9 (recursive-clients) | 1000 | UDP pending queries |
| BIND9 (tcp-clients) | 150 | TCP DNS sockets |
| Unbound (incoming-num-tcp per thread) | 10 | Multiply by num-threads |
| dnsmasq (dns-forward-max) | 150 | Configurable but rarely changed; common on Pi-hole, OpenWrt routers |
| Knot Resolver | Tunable, often 256 default | |
| nginx (worker_connections per worker) | 1024 | HTTP for comparison; value from the sample nginx.conf (compiled-in default is 512) |
| Apache (MaxRequestWorkers, event MPM) | 400 | HTTP for comparison |
| HAProxy (maxconn) | 4000 | Default front-end limit |
The DNS resolver defaults are systematically lower than HTTP server defaults. DNS gets less operational attention. Operators tune nginx exhaustively and leave their internal BIND running stock. The numerical gap is real and exploitable.
Defense — Layered, Mostly Known, Mostly Not Deployed
| Layer | Control | What it addresses |
|---|---|---|
| Resolver config | recursive-clients raised to 10k+ | Water torture headroom |
| Resolver config | tcp-clients raised, per-IP TCP limits | Sockstress on TCP/53 and 853 |
| Resolver config | fetches-per-zone, fetches-per-server | NXNS amplification |
| Resolver feature | Aggressive NSEC caching (RFC 8198) | Water torture (DNSSEC zones) |
| Resolver feature | Cyclic delegation detection | TsuNAME |
| Resolver feature | Bounded retry on non-responsive NS | NRDelegation |
| Protocol | DNS Cookies (RFC 7873) | Spoofed-source water torture |
| Network | Per-IP UDP rate limiting | Water torture from non-spoofed sources |
| Network | Per-IP TCP connlimit on 53/853/443 | Sockstress, Slowloris |
| Authoritative | Response Rate Limiting (RRL) | Reflected amplification (different attack family, related infrastructure) |
| Operational | Monitor pending-query depth, TCP slot count | Detect exhaustion before outage |
The most undervalued of these is monitoring the pending-query depth. BIND exposes it via rndc status and the statistics channel. Unbound exposes it via unbound-control stats. Most operators don’t track it, so when the table fills, they discover the problem from user complaints rather than from the metric that would have predicted it.
A reasonable monitoring threshold: alert when pending recursive queries exceed 60% of recursive-clients for more than 60 seconds. That single alert catches most state-exhaustion attacks before they cause customer-visible failure.
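That alert rule is small enough to sketch directly. The code below assumes the `recursive clients: current/soft/max` line format printed by `rndc status` in BIND 9 and leaves the collection loop (running the command, shipping the alert) to the reader. The parser and the `SustainedThresholdAlert` class are hypothetical names.

```python
import re
from typing import Optional

# Matches the "recursive clients: current/soft-limit/max" line of `rndc status`.
_RECURSIVE_CLIENTS = re.compile(r"recursive clients:\s*(\d+)/(\d+)/(\d+)")


def parse_recursive_clients(status_text: str) -> tuple:
    """Return (current, max) from `rndc status` output."""
    match = _RECURSIVE_CLIENTS.search(status_text)
    if match is None:
        raise ValueError("no 'recursive clients' line found")
    current, _soft, hard = (int(g) for g in match.groups())
    return current, hard


class SustainedThresholdAlert:
    """Fires once usage has stayed above `ratio` for more than `hold_seconds`."""

    def __init__(self, ratio: float = 0.6, hold_seconds: float = 60.0) -> None:
        self.ratio = ratio
        self.hold_seconds = hold_seconds
        self._above_since: Optional[float] = None

    def observe(self, current: int, maximum: int, now: float) -> bool:
        if current <= self.ratio * maximum:
            self._above_since = None  # back under the line: reset the timer
            return False
        if self._above_since is None:
            self._above_since = now
        return now - self._above_since > self.hold_seconds
```

Feeding it one sample every few seconds, with `now` taken from a monotonic clock, is enough. The reset on any sample below the threshold keeps short bursts from paging anyone.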
A Note on What This Isn’t
This post is about state exhaustion — attacks that fill resolver tables and cause denial of service through resource depletion. There is a separate, much-discussed DNS attack family called DNS amplification, which is a bandwidth attack using DNS as a reflector: spoof a source IP, send a small query, the resolver sends a large response to the victim. Amplification attacks don’t exhaust DNS state; they use DNS infrastructure to attack a third party with bandwidth.
The two families overlap operationally (Response Rate Limiting helps both) but are conceptually distinct. State exhaustion is about the resolver as victim. Amplification is about the resolver as weapon.
Conclusion
DNS state exhaustion attacks are not new and not glamorous, which is precisely why they continue to work. The default ceilings on resolver tables are lower than equivalent HTTP defaults. The protocol-specific attacks (water torture, NXNS, TsuNAME, NRDelegation) have well-documented mitigations that are unevenly deployed. The TCP-based variants (DoT, DoH) inherit every attack from the broader TCP exhaustion family covered in the previous post.
The framing that helps is identical to the TCP case: DNS capacity is about state, not packet rates. Every recursive query allocates state. Every TCP DNS connection allocates state. Every TLS handshake on DoT/DoH allocates state. The attacks in this family are all variations on the theme of forcing the resolver to allocate state and preventing it from being freed.
Knowing that, the defenses pick themselves: limit state per source, time state out aggressively, prefer DNSSEC + aggressive NSEC caching where available, and monitor the state tables before they fill.
Further Reading
- Afek, Bremler-Barr, Shafir, “NXNSAttack: Recursive DNS Inefficiencies and Vulnerabilities”, USENIX Security 2020
- Moura et al., “TsuNAME: exploiting misconfiguration and vulnerability to DDoS DNS”, IMC 2021
- Afek, Bremler-Barr, Stajnrod, “NRDelegationAttack: Complexity DDoS attack on DNS Recursive Resolvers”, USENIX Security 2023
- RFC 2308 — Negative Caching of DNS Queries
- RFC 7873 — Domain Name System (DNS) Cookies
- RFC 8198 — Aggressive Use of DNSSEC-Validated Cache
- RFC 9156 — DNS Query Name Minimisation to Improve Privacy
- TCP Connection State Exhaustion (companion post)
⚠️ Legal Disclaimer
This article is written for educational and defensive security research purposes only. The DNS attack mechanisms described are publicly documented in academic literature, CVE advisories, and resolver vendor security bulletins. Unauthorized testing against DNS infrastructure you do not operate or have explicit permission to assess is illegal and operationally damaging — DNS exhaustion attacks cause collateral damage to legitimate users sharing the same resolver.