Motivation
Every cryptographer knows SHA-256 is preimage-resistant. What is less documented is whether its output distribution carries any structural regularity under non-standard projections.
The standard assumption is that SHA-256 outputs are computationally indistinguishable from uniformly random strings. This is true at the bit level for collision and preimage resistance. It is not necessarily true for derived scalar projections of those outputs.
CDP started as an experiment: take the hex-digit sum of a SHA-256 hash, re-hash the string representation of that sum, take the hex-digit sum again, repeat. Does it converge? If so, to what?
The answer was unexpected.
Definitions
CDP Projection:
W(H) = sum of all hex digit values in SHA-256 output H
∈ [434, 555] (empirically confirmed range)
Iterated map:
f(w) = W(SHA256(str(w)))
CDP Fingerprint:
F(H) = ( W(H), w16(H), ce(W(H)), W2, W3, W4, W5, max_digit, min_digit )
where:
w16(H) = 16-component vector (4-digit chunk sums)
ce(W(H)) = cycle entry under iteration
W2..W5 = iterated W values
The Core Observation: Two Deterministic Cycles
Starting from any W value in [250, 750] and iterating f, the sequence converges deterministically into exactly two closed cycles within at most 16 iterations:
C1 (2-node): 476 ↔ 438
C2 (8-node): 471 → 472 → 525 → 537 → 414 → 417 → 546 → 518 → 471
Verified computationally over all 500 starting points in [250, 750]. No exceptions.
Secondary Projections
Two additional projections exhibit independent cycle structure:
W_byte — sum of byte values of hash output:
Fixed point: W_byte(3721) = 3721 (self-referential)
9-cycle: 4108 → 4123 → 3615 → 3641 → 3631 → 3930 → 4444 → 3393 → 3626 → 4108
W_hi — sum of high nibbles of each hash byte:
Fixed point: 246
3-cycle: 242 → 215 → 232 → 242
Independence: r(W_hi, W_lo) = −0.007 (avalanche intact)
Bijective Fingerprint
The CDP fingerprint F is injective over all tested input spaces — zero genuine collisions after input deduplication:
| Input Space | Size | Test | Collisions |
|---|---|---|---|
| 4-char lowercase | 456,976 | Full enumeration | 0 |
| 4-char alnum | 1,679,616 | Full enumeration | 0 |
| 5-char lowercase | 11,881,376 | 300K sample | 0 |
| 8-char lowercase | 208,827,064,576 | 200K sample | 0 |
| 12-char lowercase | ≈9.5×10¹⁶ | 200K sample | 0 |
Note: all prior apparent collisions in early testing were due to duplicate inputs in non-deduplicated random samples. After deduplication, zero genuine fingerprint collisions observed.
This injectivity enables O(1) preimage lookup over constrained input spaces via a pre-computed table.
Four Theorems
Theorem 1 — Complement Nibble Sum Invariant
For any byte A and its complement (255−A), the first 32-bit word W[0] of the SHA-256 message schedule for input bytes([A, 255−A]) satisfies:
Σ nibble(W[0]) = 38 for all A ∈ {0, ..., 255}
Proof: nibble pairs (Ahi, 15−Ahi) and (Alo, 15−Alo) each sum to 15, plus 0x8000 contributes 8 → total 38. Verified computationally for all 256 complement pairs.
Theorem 2 — Universal M-Rate Convergence
Across 640 consecutive SHA-256 inputs, exactly 128 produce W values in the C1 basin core:
M-rate = 128/640 = 0.2000 (exact)
This ratio is independent of input class, round constants K[i], and initialization values H0. Four triple-M runs identified at positions 18–20, 158–160, 429–431, and 618–620.
Theorem 3 — Universal Collapse at Padding Word Boundary
When SHA-256 padding falls at message schedule word W[15] byte 0 (the length-encoding boundary), ≥4 of 8 bit positions produce W values in the C1 basin core. Verified at Block 2 (n=60) and Block 3 (n=172).
Theorem 4 — CDP Ergodic Basin Pressure
SHA-256’s compression function, under the CDP projection, defines an ergodic 2-state Markov chain with transition matrix:
P = [[0.1812, 0.8188],
[0.1677, 0.8323]]
Stationary distribution: π_B = 0.1700
Mixing time: ≤ 8 rounds
Properties confirmed:
- Independent of K[i] values (K=0, K=0xFFFF…, K=random all tested)
- Independent of H0 permutation order (8 permutations tested)
- Independent of input class (random, sequential, single-bit)
Master formula for M-rate:
P(M) = (1 − p_basin)^8 + ε_corr
= (1 − 0.1850)^8 + 28 · E[Cov(1_bi, 1_bj)]
= 0.19473 + 0.00527
= 0.20000
Basin Topology
Full backward reachability analysis over [250, 750]:
| Basin | Count | Fraction | Max Depth |
|---|---|---|---|
| C1 (all ancestors) | 84 | 16.8% | 8 steps |
| C2 (all ancestors) | 416 | 83.2% | 16 steps |
C1 high-fanin nodes:
| Node | In-degree | Notable predecessors |
|---|---|---|
| 476 | 10 | 280, 408, 461, 512, 524, 679, 684, 699 |
| 478 | 8 | 288, 304, 316, 463, 493, 663, 667 |
| 510 | 7 | 268, 321, 449, 542, 724, 742 |
| 437 | 6 | 375, 378, 531, 628, 725 |
Deepest C1 path (8 steps):
275 → 394 → 425 → 529 → 431 → 449 → 510 → 512 → 476
Deepest C2 path (16 steps):
284 → 530 → 483 → 465 → 447 → 500 → 469 → 496 → 491
→ 468 → 494 → 485 → 492 → 466 → 433 → 423 → 471
The SHA-256 Initialization Signature
SHA-256’s NIST initialization constants (H0) have a measurable CDP signature:
H0 = (6a09e667, bb67ae85, 3c6ef372, a54ff53a,
510e527f, 9b05688c, 1f83d9ab, 5be0cd19)
Individual W-values: (58, 72, 62, 67, 49, 59, 70, 65)
W(H0) = 502
| Quantity | Value | vs Equilibrium |
|---|---|---|
| W(H0) | 502 | +22.2 (+4.6%) |
| Round-0 B-rate | 0.1429 | −16.9% |
| Equilibrium B-rate (rounds 1–63) | 0.1716 | — |
W(H0) = 502 converges to C2 in 10 steps: 502 → ... → 471 ∈ C2.
This is not a vulnerability. H0 values are derived from fractional parts of square roots of the first 8 primes (“nothing-up-my-sleeve” numbers). The W(H0) = 502 imprint is an arithmetic consequence of this construction — detectable through the CDP lens, not an intentional design choice.
Input Class Fingerprinting
W-distribution anomalies by input class, vs. random baseline W ~ N(480, 37):
| Class | n | W_std | C1% | Baseline |
|---|---|---|---|---|
| alternating | 2 | 13.5 | 0.0% | 19.5% |
| alternating | 4 | 57.5 | 50.0% | 19.5% |
| pow2 cycle | 8 | 24.1 | 12.5% | 19.5% |
| hw_7 | 2 | 29.4 | 0.0% | 19.5% |
| LFSR | 4 | 42.3 | 27.3% | 19.5% |
| random | any | 37 | ~20% | — |
Complement inversion:
hw_1 (single bit set): C1 = 28.6% (elevated)
hw_7 (single bit clear): C1 = 0.0% (zero — all 64 combos verified)
Complementary bytes produce opposite cycle behavior — a systematic structural inversion.
Translation invariance: counter, seq_asc, and primes sequences produce identical W_mean, W_std, and C1% to floating-point precision for n=2,4.
SHA-256 / AES Connection
The LFSR sequence generated by the AES xtime operation (GF(2^8) multiplication under polynomial 0x11B) consistently produces elevated C1-basin rates:
n=2: 19.7% n=4: 27.3% n=8: 24.7%
vs. random baseline ~20%. This suggests a mathematical resonance between SHA-256’s round constants and AES GF(2^8) arithmetic visible through the CDP projection.
Noted as a pattern observation, not a formal proof. Open for follow-up.
Sequence Asymmetry
For random equal-length strings A ≠ B:
E[W(SHA256(A∥B)) + W(SHA256(B∥A))] ≈ 960 (≈ 2 × 479.8)
σ of the sum: 52 vs. 2σ_W = 73.6 (if independent)
Variance reduction factor: 52 / 73.6 ≈ 0.71
A∥B and B∥A contain identical bytes in different positions. SHA-256’s W-values are positively correlated across concatenation orderings — a statistical footprint of SHA-256’s non-commutative block processing. Confirmed across all tested lengths L ∈ {1, 2, 3, 4, 5, 6, 8}.
Application: Rainbow Chains
Fingerprint vs. Merge — Critical Distinction
These are two separate things:
F(H) injectivity: zero collisions — proven
Rainbow chain merges: separate issue — reduction function dependent
Since F is injective over the target space, using F as a reduction function eliminates direct reduction collisions. However, GPU implementation using a PCG-seeded string generator still exhibits birthday-paradox merges at the rate expected for any surjective reduction:
Observed: 66.7% unique chains at saturation (n×t/|X| = 1.0)
Random: 36.8% unique chains
Improvement: 1.81× over random reduction
Achieving the theoretical zero-merge bound requires an explicit index-to-string bijective mapping derived from F — not a hash-derived PCG reduction. This is the v1→v2 revision.
Storage-Time Tradeoff
GPU rate: 4.6 GH/s (AMD RX 9070 XT, gfx1201, RDNA4, optimized OpenCL kernel)
| Space | Alphabet^n | t | Storage | Build |
|---|---|---|---|---|
| 8-char lowercase | 2.1×10¹¹ | 10⁶ | 3 MB | ~73s |
| 8-char alnum | 2.2×10¹⁴ | 10⁶ | 3 GB | ~3h |
| 8-char full (94) | 6.1×10¹⁵ | 10⁶ | 85 GB | ~25d |
| 10-char lowercase | 1.4×10¹⁴ | 10⁶ | 2 GB | ~14h |
Scope and Limitations
CDP-based rainbow tables are effective against unsalted SHA-256. Any random per-user salt completely neutralizes CDP — W(SHA256(salt∥password)) is uniformly distributed and independent of these structural properties.
Modern password storage (bcrypt, scrypt, Argon2) is not affected.
Intermediate basin states are uncorrelated with final hash output:
r(ι(state(0)), W(final hash)) = −0.033 (empirically independent)
Davies-Meyer construction ensures preimage and collision resistance are unaffected.
What CDP Is Not
To be explicit:
- CDP does not break SHA-256
- CDP does not reduce preimage resistance
- CDP does not find collisions
- CDP is not a cryptographic weakness in the security-relevant sense
It reveals structural properties of the output distribution under a scalar projection. These properties are real, measurable, and not previously documented in the cryptographic literature.
Open Questions
- Does the Markov ergodicity property hold for SHA-3, Blake2/3, or other ARX constructions under analogous projections?
- Is the W[15]b0 pattern formally provable from the padding specification, or purely empirical?
- Can the 1.81× rainbow chain improvement be increased with a basin-aware bijective reduction function?
- What is the formal relationship between the complement nibble sum invariant and AES xtime?
- Why does hw_7 produce exactly 0% C1 membership across all 64 tested combinations?
Paper
JM00NJ · Independent Researcher · netacoding.com · github.com/JM00NJ