Motivation

Every cryptographer knows SHA-256 is preimage-resistant. What is less documented is whether its output distribution carries any structural regularity under non-standard projections.

The standard assumption is that SHA-256 outputs are computationally indistinguishable from uniformly random strings. This is true at the bit level for collision and preimage resistance. It is not necessarily true for derived scalar projections of those outputs.

CDP started as an experiment: take the hex-digit sum of a SHA-256 hash, re-hash the string representation of that sum, take the hex-digit sum again, repeat. Does it converge? If so, to what?

The answer was unexpected.


Definitions

CDP Projection:

W(H) = sum of all hex digit values in SHA-256 output H
     ∈ [434, 555]  (empirically confirmed range)

Iterated map:

f(w) = W(SHA256(str(w)))

CDP Fingerprint:

F(H) = ( W(H), w16(H), ce(W(H)), W2, W3, W4, W5, max_digit, min_digit )

where:
  w16(H)    = 16-component vector (4-digit chunk sums)
  ce(W(H))  = cycle entry under iteration
  W2..W5    = iterated W values

The Core Observation: Two Deterministic Cycles

Starting from any W value in [250, 750] and iterating f, the sequence converges deterministically into exactly two closed cycles within at most 16 iterations:

C1 (2-node):  476 ↔ 438

C2 (8-node):  471 → 472 → 525 → 537 → 414 → 417 → 546 → 518 → 471

Verified computationally over all 500 starting points in [250, 750]. No exceptions.

Secondary Projections

Two additional projections exhibit independent cycle structure:

W_byte — sum of byte values of hash output:

Fixed point:  W_byte(3721) = 3721  (self-referential)
9-cycle:      4108 → 4123 → 3615 → 3641 → 3631 → 3930 → 4444 → 3393 → 3626 → 4108

W_hi — sum of high nibbles of each hash byte:

Fixed point:  246
3-cycle:      242 → 215 → 232 → 242
Independence: r(W_hi, W_lo) = −0.007  (avalanche intact)

Bijective Fingerprint

The CDP fingerprint F is injective over all tested input spaces — zero genuine collisions after input deduplication:

Input Space Size Test Collisions
4-char lowercase 456,976 Full enumeration 0
4-char alnum 1,679,616 Full enumeration 0
5-char lowercase 11,881,376 300K sample 0
8-char lowercase 208,827,064,576 200K sample 0
12-char lowercase ≈9.5×10¹⁶ 200K sample 0

Note: all prior apparent collisions in early testing were due to duplicate inputs in non-deduplicated random samples. After deduplication, zero genuine fingerprint collisions observed.

This injectivity enables O(1) preimage lookup over constrained input spaces via a pre-computed table.


Four Theorems

Theorem 1 — Complement Nibble Sum Invariant

For any byte A and its complement (255−A), the first 32-bit word W[0] of the SHA-256 message schedule for input bytes([A, 255−A]) satisfies:

Σ nibble(W[0]) = 38  for all A ∈ {0, ..., 255}

Proof: nibble pairs (Ahi, 15−Ahi) and (Alo, 15−Alo) each sum to 15, plus 0x8000 contributes 8 → total 38. Verified computationally for all 256 complement pairs.

Theorem 2 — Universal M-Rate Convergence

Across 640 consecutive SHA-256 inputs, exactly 128 produce W values in the C1 basin core:

M-rate = 128/640 = 0.2000  (exact)

This ratio is independent of input class, round constants K[i], and initialization values H0. Four triple-M runs identified at positions 18–20, 158–160, 429–431, and 618–620.

Theorem 3 — Universal Collapse at Padding Word Boundary

When SHA-256 padding falls at message schedule word W[15] byte 0 (the length-encoding boundary), ≥4 of 8 bit positions produce W values in the C1 basin core. Verified at Block 2 (n=60) and Block 3 (n=172).

Theorem 4 — CDP Ergodic Basin Pressure

SHA-256’s compression function, under the CDP projection, defines an ergodic 2-state Markov chain with transition matrix:

P = [[0.1812, 0.8188],
     [0.1677, 0.8323]]

Stationary distribution: π_B = 0.1700
Mixing time: ≤ 8 rounds

Properties confirmed:

  • Independent of K[i] values (K=0, K=0xFFFF…, K=random all tested)
  • Independent of H0 permutation order (8 permutations tested)
  • Independent of input class (random, sequential, single-bit)

Master formula for M-rate:

P(M) = (1 − p_basin)^8 + ε_corr
     = (1 − 0.1850)^8 + 28 · E[Cov(1_bi, 1_bj)]
     = 0.19473 + 0.00527
     = 0.20000

Basin Topology

Full backward reachability analysis over [250, 750]:

Basin Count Fraction Max Depth
C1 (all ancestors) 84 16.8% 8 steps
C2 (all ancestors) 416 83.2% 16 steps

C1 high-fanin nodes:

Node In-degree Notable predecessors
476 10 280, 408, 461, 512, 524, 679, 684, 699
478 8 288, 304, 316, 463, 493, 663, 667
510 7 268, 321, 449, 542, 724, 742
437 6 375, 378, 531, 628, 725

Deepest C1 path (8 steps):

275 → 394 → 425 → 529 → 431 → 449 → 510 → 512 → 476

Deepest C2 path (16 steps):

284 → 530 → 483 → 465 → 447 → 500 → 469 → 496 → 491
    → 468 → 494 → 485 → 492 → 466 → 433 → 423 → 471

The SHA-256 Initialization Signature

SHA-256’s NIST initialization constants (H0) have a measurable CDP signature:

H0 = (6a09e667, bb67ae85, 3c6ef372, a54ff53a,
      510e527f, 9b05688c, 1f83d9ab, 5be0cd19)

Individual W-values: (58, 72, 62, 67, 49, 59, 70, 65)
W(H0) = 502
Quantity Value vs Equilibrium
W(H0) 502 +22.2 (+4.6%)
Round-0 B-rate 0.1429 −16.9%
Equilibrium B-rate (rounds 1–63) 0.1716

W(H0) = 502 converges to C2 in 10 steps: 502 → ... → 471 ∈ C2.

This is not a vulnerability. H0 values are derived from fractional parts of square roots of the first 8 primes (“nothing-up-my-sleeve” numbers). The W(H0) = 502 imprint is an arithmetic consequence of this construction — detectable through the CDP lens, not an intentional design choice.


Input Class Fingerprinting

W-distribution anomalies by input class, vs. random baseline W ~ N(480, 37):

Class n W_std C1% Baseline
alternating 2 13.5 0.0% 19.5%
alternating 4 57.5 50.0% 19.5%
pow2 cycle 8 24.1 12.5% 19.5%
hw_7 2 29.4 0.0% 19.5%
LFSR 4 42.3 27.3% 19.5%
random any 37 ~20%

Complement inversion:

hw_1 (single bit set):    C1 = 28.6%  (elevated)
hw_7 (single bit clear):  C1 = 0.0%   (zero — all 64 combos verified)

Complementary bytes produce opposite cycle behavior — a systematic structural inversion.

Translation invariance: counter, seq_asc, and primes sequences produce identical W_mean, W_std, and C1% to floating-point precision for n=2,4.


SHA-256 / AES Connection

The LFSR sequence generated by the AES xtime operation (GF(2^8) multiplication under polynomial 0x11B) consistently produces elevated C1-basin rates:

n=2: 19.7%    n=4: 27.3%    n=8: 24.7%

vs. random baseline ~20%. This suggests a mathematical resonance between SHA-256’s round constants and AES GF(2^8) arithmetic visible through the CDP projection.

Noted as a pattern observation, not a formal proof. Open for follow-up.


Sequence Asymmetry

For random equal-length strings A ≠ B:

E[W(SHA256(A∥B)) + W(SHA256(B∥A))] ≈ 960  (≈ 2 × 479.8)

σ of the sum:  52  vs.  2σ_W = 73.6  (if independent)
Variance reduction factor: 52 / 73.6 ≈ 0.71

A∥B and B∥A contain identical bytes in different positions. SHA-256’s W-values are positively correlated across concatenation orderings — a statistical footprint of SHA-256’s non-commutative block processing. Confirmed across all tested lengths L ∈ {1, 2, 3, 4, 5, 6, 8}.


Application: Rainbow Chains

Fingerprint vs. Merge — Critical Distinction

These are two separate things:

F(H) injectivity:        zero collisions — proven
Rainbow chain merges:    separate issue — reduction function dependent

Since F is injective over the target space, using F as a reduction function eliminates direct reduction collisions. However, GPU implementation using a PCG-seeded string generator still exhibits birthday-paradox merges at the rate expected for any surjective reduction:

Observed:  66.7% unique chains at saturation (n×t/|X| = 1.0)
Random:    36.8% unique chains

Improvement: 1.81× over random reduction

Achieving the theoretical zero-merge bound requires an explicit index-to-string bijective mapping derived from F — not a hash-derived PCG reduction. This is the v1→v2 revision.

Storage-Time Tradeoff

GPU rate: 4.6 GH/s (AMD RX 9070 XT, gfx1201, RDNA4, optimized OpenCL kernel)

Space Alphabet^n t Storage Build
8-char lowercase 2.1×10¹¹ 10⁶ 3 MB ~73s
8-char alnum 2.2×10¹⁴ 10⁶ 3 GB ~3h
8-char full (94) 6.1×10¹⁵ 10⁶ 85 GB ~25d
10-char lowercase 1.4×10¹⁴ 10⁶ 2 GB ~14h

Scope and Limitations

CDP-based rainbow tables are effective against unsalted SHA-256. Any random per-user salt completely neutralizes CDP — W(SHA256(salt∥password)) is uniformly distributed and independent of these structural properties.

Modern password storage (bcrypt, scrypt, Argon2) is not affected.

Intermediate basin states are uncorrelated with final hash output:

r(ι(state(0)), W(final hash)) = −0.033  (empirically independent)

Davies-Meyer construction ensures preimage and collision resistance are unaffected.


What CDP Is Not

To be explicit:

  • CDP does not break SHA-256
  • CDP does not reduce preimage resistance
  • CDP does not find collisions
  • CDP is not a cryptographic weakness in the security-relevant sense

It reveals structural properties of the output distribution under a scalar projection. These properties are real, measurable, and not previously documented in the cryptographic literature.


Open Questions

  1. Does the Markov ergodicity property hold for SHA-3, Blake2/3, or other ARX constructions under analogous projections?
  2. Is the W[15]b0 pattern formally provable from the padding specification, or purely empirical?
  3. Can the 1.81× rainbow chain improvement be increased with a basin-aware bijective reduction function?
  4. What is the formal relationship between the complement nibble sum invariant and AES xtime?
  5. Why does hw_7 produce exactly 0% C1 membership across all 64 tested combinations?


Paper

DOI


JM00NJ · Independent Researcher · netacoding.com · github.com/JM00NJ