SHA-256 Output Distribution Analysis: Cyclic Digit-sum Projection (CDP) — Original Research | Netacoding | Cybersecurity, Assembly & Network Research

Motivation

SHA-256 is widely believed to be preimage-resistant. What is less documented is whether its output distribution shows any structural regularity under non-standard projections.

The standard assumption is that SHA-256 outputs are computationally indistinguishable from uniformly random strings. That assumption concerns the digest bits themselves and underpins collision and preimage resistance. It does not by itself describe how derived scalar projections of those outputs behave when the hash is iterated on them.

CDP started as an experiment: take the hex-digit sum of a SHA-256 hash, re-hash the string representation of that sum, take the hex-digit sum again, repeat. Does it converge? If so, to what?

The answer was unexpected.

Definitions

CDP Projection (W1 — baseline):

W1(H) = sum of all hex digit values in SHA-256 output H
       ∈ [343, 614]  (observed range over N=10,000 samples; the theoretical range is [0, 960])

Iterated map:

f(w) = W(SHA256(str(w)))

CDP Fingerprint (v4 — five projections):

F(H) = ( W1(H), W2(H), W3(H), W4(H), W5(H) )

W1 — digit_sum:      Σ hex_digit(H)
W2 — mod4_weighted:  Σ hex_digit(H[i]) × w[i mod 4],  w=[1,2,1,0]
W3 — word_sigma:     Σ digitsum(σ₁(H_j))  for j=0..7
W4 — t1t2:           digitsum(T1) + digitsum(T2)  from SHA-256 compression core
W5 — xor_fold:       digitsum(chunk0 ⊕ chunk1 ⊕ chunk2 ⊕ chunk3)

The Core Observation: Two Deterministic Cycles

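Starting from any W value in [250, 750] and iterating f, the sequence falls deterministically into one of exactly two closed cycles within at most 16 iterations. Because f maps a finite set of integers to itself, every orbit must eventually cycle; the empirical finding is that only two cycles occur and that they are reached this quickly: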

C1 (2-node):  476 ↔ 438

C2 (8-node):  471 → 472 → 525 → 537 → 414 → 417 → 546 → 518 → 471

Verified computationally over all 500 starting points in [250, 750]. No exceptions.
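The search above can be reproduced with a few lines of Python. This is a minimal sketch, not the author's script: it assumes str(w) means the decimal ASCII string of w, and the helper names (digit_sum, find_cycle, attractors) are hypothetical. It prints whatever cycles it finds so they can be compared against C1 and C2.

```python
import hashlib

def digit_sum(hex_str: str) -> int:
    """Sum of the hex digit values of a hex string (W1 when given a digest)."""
    return sum(int(c, 16) for c in hex_str)

def f(w: int) -> int:
    """One step of the iterated map. Assumption: str(w) is decimal ASCII."""
    digest = hashlib.sha256(str(w).encode("ascii")).hexdigest()
    return digit_sum(digest)

def find_cycle(start: int) -> tuple[int, ...]:
    """Follow the orbit of `start` until a value repeats, then return the
    cycle rotated so that it begins at its smallest element."""
    position = {}
    orbit = []
    w = start
    while w not in position:
        position[w] = len(orbit)
        orbit.append(w)
        w = f(w)
    cycle = orbit[position[w]:]
    k = cycle.index(min(cycle))
    return tuple(cycle[k:] + cycle[:k])

def attractors(lo: int = 250, hi: int = 750) -> dict[tuple[int, ...], int]:
    """Map each distinct cycle to the number of starting points reaching it."""
    counts = {}
    for start in range(lo, hi + 1):
        cycle = find_cycle(start)
        counts[cycle] = counts.get(cycle, 0) + 1
    return counts

for cycle, n in sorted(attractors().items(), key=lambda kv: -kv[1]):
    print(len(cycle), "nodes:", " -> ".join(map(str, cycle)), f"({n} starts)")
```

Normalising each cycle to start at its smallest element makes two orbits that enter the same cycle at different nodes compare equal.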

Secondary Projections

Two additional projections exhibit independent cycle structure:

W_byte — sum of byte values of hash output:

Fixed point:  f_byte(3721) = W_byte(SHA256(str(3721))) = 3721  (self-referential)
9-cycle:      4108 → 4123 → 3615 → 3641 → 3631 → 3930 → 4444 → 3393 → 3626 → 4108

W_hi — sum of high nibbles of each hash byte:

Fixed point:  246
3-cycle:      242 → 215 → 232 → 242
Independence: r(W_hi, W_lo) = −0.007, where W_lo is the sum of low nibbles  (avalanche intact)
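The two secondary projections are easy to compute from the raw digest bytes. The sketch below is illustrative only: the function names are hypothetical, W_lo is taken to be the low-nibble counterpart of W_hi, and the input set (decimal counters) is an assumption rather than the author's test set. It also relies on two identities that follow directly from the definitions: W1 = W_hi + W_lo and W_byte = 16·W_hi + W_lo.

```python
import hashlib

def w_byte(digest: bytes) -> int:
    """Sum of the 32 byte values of a digest."""
    return sum(digest)

def w_hi(digest: bytes) -> int:
    """Sum of the high nibble of each digest byte."""
    return sum(b >> 4 for b in digest)

def w_lo(digest: bytes) -> int:
    """Sum of the low nibble of each digest byte."""
    return sum(b & 0x0F for b in digest)

def pearson(xs, ys) -> float:
    """Plain Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical input set: the decimal strings "0", "1", ..., "1999".
digests = [hashlib.sha256(str(i).encode("ascii")).digest() for i in range(2000)]
his = [w_hi(d) for d in digests]
los = [w_lo(d) for d in digests]
print("r(W_hi, W_lo) =", round(pearson(his, los), 4))
```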

Five CDP Projections (v4)

v4 introduces four new projections alongside the baseline digit_sum. Each captures a different structural dimension of SHA-256’s output.

Projection Equations

W2 — mod4_weighted:

W2(H) = Σ int(h_i, 16) × w[i mod 4],   w = [1, 2, 1, 0]

Weight vector derived from K[i] mod-4 grouping structure.

W3 — word_sigma:

W3(H) = Σ digitsum(σ₁(H_j))   for j = 0..7
σ₁(x) = ROTR(x,17) ⊕ ROTR(x,19) ⊕ (x >> 10)

Applies SHA-256’s own message schedule σ₁ to output words.

W4 — t1t2:

W4(H) = digitsum(T1) + digitsum(T2)
T1 = Σ₁(e) + Ch(e,f,g)
T2 = Σ₀(a) + Maj(a,b,c)

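Uses the Σ₁/Ch and Σ₀/Maj terms of SHA-256's compression function. In the full round function, T1 additionally includes h, K[t], and W[t]; those terms are omitted in this projection.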

W5 — xor_fold:

W5(H) = digitsum(C0 ⊕ C1 ⊕ C2 ⊕ C3)

Folds the 256-bit output into 64 bits via XOR before projecting.
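A rough Python sketch of W2, W3, and W5 follows. Several details are assumptions not fixed by the text: the digit index i runs over the 64 hex digits from the left starting at 0, the output words H_j and the 64-bit chunks are read big-endian, and all function names are hypothetical. W4 is left out because the text does not say which state words supply a, b, c, e, f, g.

```python
import hashlib

MASK32 = 0xFFFFFFFF
WEIGHTS = (1, 2, 1, 0)  # w[i mod 4] from the W2 definition

def digitsum(x: int) -> int:
    """Sum of the hex digits of a non-negative integer."""
    s = 0
    while x:
        s += x & 0xF
        x >>= 4
    return s

def rotr(x: int, n: int) -> int:
    """32-bit rotate right."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def sigma1(x: int) -> int:
    """SHA-256 message-schedule function σ₁."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def w2(digest: bytes) -> int:
    """mod4_weighted: hex digits weighted by position modulo 4."""
    return sum(int(c, 16) * WEIGHTS[i % 4] for i, c in enumerate(digest.hex()))

def w3(digest: bytes) -> int:
    """word_sigma: digit sum of σ₁ applied to each 32-bit output word."""
    words = [int.from_bytes(digest[i:i + 4], "big") for i in range(0, 32, 4)]
    return sum(digitsum(sigma1(w)) for w in words)

def w5(digest: bytes) -> int:
    """xor_fold: XOR the four 64-bit chunks, then take the digit sum."""
    folded = 0
    for i in range(0, 32, 8):
        folded ^= int.from_bytes(digest[i:i + 8], "big")
    return digitsum(folded)

h = hashlib.sha256(b"example").digest()
print("W2 =", w2(h), " W3 =", w3(h), " W5 =", w5(h))
```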

Distribution Statistics (N=10,000)

Projection	Min	Max	Mean	Std	Attractors
W1 digit_sum	343	614	480.6	36.6	2 (C1, C2)
W2 mod4_weighted	318	646	480.4	45.3	3
W3 word_sigma	307	627	480.2	37.1	6
W4 t1t2	47	187	120.1	18.1	5
W5 xor_fold	52	191	119.5	18.2	2
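As a sanity check on the means and standard deviations above, one can compute what a weighted digit sum would give if every hex digit were independent and uniform on 0..15 (mean 7.5, variance 255/12 = 21.25). The sketch below does this for W1, W2, and a generic 16-digit sum; treating W4 and W5 as plain 16-digit sums is an assumption, and W3 is skipped.

```python
from math import sqrt

DIGIT_MEAN = 7.5                # mean of a uniform hex digit
DIGIT_VAR = (16 ** 2 - 1) / 12  # variance of a uniform hex digit = 21.25

def moments(weights):
    """Mean and standard deviation of Σ weight_i × digit_i for
    independent uniform hex digits."""
    mean = DIGIT_MEAN * sum(weights)
    std = sqrt(DIGIT_VAR * sum(w * w for w in weights))
    return mean, std

w1_model = moments([1] * 64)            # plain digit sum over 64 digits
w2_model = moments([1, 2, 1, 0] * 16)   # mod-4 weights repeated 16 times
w16_model = moments([1] * 16)           # any plain sum over 16 digits

for name, (mean, std) in [("W1", w1_model), ("W2", w2_model), ("16-digit", w16_model)]:
    print(f"{name}: mean = {mean:.1f}, std = {std:.2f}")
```

Under this model the W1 and W2 rows come out close to the measured values, which is what one would expect if the digest digits behave like uniform random digits.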

TVD Class Discrimination (N=2,500/class)

Pair	W1	W2	W3	W4	W5
short_pw vs pin	0.060	0.060	0.054	0.054	0.070
pin vs binary	0.056	0.076	0.046	0.052	0.060
long_pw vs binary	0.067	0.051	0.061	0.053	0.048

No single projection dominates all pairs — each captures a different structural dimension.
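For reference, total variation distance between two empirical W distributions can be computed as below. This is a sketch: the two input classes are hypothetical stand-ins (zero-padded 4-digit PINs and short "pw<i>" strings), not the author's datasets. Note that two samples drawn from the same distribution also give a non-zero TVD at finite N, so a same-class baseline helps when reading values of this size.

```python
import hashlib
from collections import Counter

def w1(data: bytes) -> int:
    """Hex digit sum of SHA-256(data)."""
    return sum(int(c, 16) for c in hashlib.sha256(data).hexdigest())

def tvd(sample_a, sample_b) -> float:
    """Total variation distance between two empirical distributions:
    half the L1 distance between their normalised histograms."""
    ca, cb = Counter(sample_a), Counter(sample_b)
    na, nb = len(sample_a), len(sample_b)
    support = ca.keys() | cb.keys()
    return 0.5 * sum(abs(ca[k] / na - cb[k] / nb) for k in support)

# Hypothetical classes, 2,500 inputs each.
pins = [w1(f"{i:04d}".encode("ascii")) for i in range(2500)]
short_pw = [w1(f"pw{i}".encode("ascii")) for i in range(2500)]
print("TVD(pin, short_pw) under W1:", round(tvd(pins, short_pw), 3))
```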

Bijective Fingerprint

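On every tested input space, the CDP fingerprint F produced zero genuine collisions after input deduplication. The result is empirical: F maps a 256-bit digest to five small integers, so by counting it cannot be injective (let alone bijective) over all possible digests.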

Input Space	Size	Test
4-char lowercase	456,976	Full enumeration
4-char alnum	1,679,616	Full enumeration
5-char lowercase	11,881,376	300K sample
8-char lowercase	208,827,064,576	200K sample
12-char lowercase	≈9.5×10¹⁶	200K sample

Compound Fingerprint Uniqueness (v4)

Fingerprint	Unique / N=10,000
W1 alone	235 (2.4%)
W2 alone	287 (2.9%)
W4 alone	123 (1.2%)
F = (W1,W2,W3,W4,W5)	9,958 (99.58%)

Four Theorems

Theorem 1 — Complement Nibble Sum Invariant

For the two-byte message formed by a byte A followed by its complement (255−A), the first 32-bit word W[0] of the SHA-256 message schedule satisfies:

Σ nibble(W[0]) = 38  for all A ∈ {0, ..., 255}

Proof: W[0] holds A, 255−A, and the first two padding bytes 0x80 0x00. The nibble pairs (Ahi, 15−Ahi) and (Alo, 15−Alo) each sum to 15, and 0x8000 contributes 8, so the total is 15 + 15 + 8 = 38.
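Theorem 1 can be checked exhaustively in a few lines. The sketch assumes the message is exactly the two bytes A and 255−A, so that SHA-256 padding places 0x80 0x00 in the remaining two bytes of W[0]; the function names are hypothetical.

```python
def first_schedule_word(a: int) -> int:
    """W[0] for the two-byte message (A, 255 - A): the message bytes
    followed by the first two padding bytes 0x80 0x00, read big-endian."""
    prefix = bytes([a, 255 - a]) + b"\x80\x00"
    return int.from_bytes(prefix, "big")

def nibble_sum(x: int) -> int:
    """Sum of the 4-bit nibbles of a non-negative integer."""
    s = 0
    while x:
        s += x & 0xF
        x >>= 4
    return s

results = {nibble_sum(first_schedule_word(a)) for a in range(256)}
print(results)  # {38}
```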

Theorem 2 — Universal M-Rate Convergence

M-rate = 128/640 = 0.2000  (exact)

Independent of input class, K[i], and H0. Four triple-M runs at positions 18–20, 158–160, 429–431, 618–620.

Theorem 3 — Universal Collapse at Padding Word Boundary

When padding falls at W[15] byte 0, ≥4 of 8 bit positions produce W ∈ C1 basin core. Verified at Block 2 (n=60) and Block 3 (n=172).

Theorem 4 — CDP Ergodic Basin Pressure

P = [[0.1812, 0.8188],
     [0.1677, 0.8323]]

π_B = 0.1700
Mixing time: ≤ 8 rounds
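The stationary value π_B follows directly from the transition matrix. The sketch below assumes the first row and column of P correspond to state B; it computes the two-state closed form and cross-checks it with eight steps of power iteration.

```python
P = [[0.1812, 0.8188],
     [0.1677, 0.8323]]

def stationary_two_state(p):
    """Stationary distribution of a 2-state chain: π₀ = p₁₀ / (p₀₁ + p₁₀)."""
    leave0 = p[0][1]
    enter0 = p[1][0]
    pi0 = enter0 / (leave0 + enter0)
    return pi0, 1.0 - pi0

def power_iterate(p, steps=8):
    """Push the distribution (1, 0) through the chain `steps` times."""
    dist = [1.0, 0.0]
    for _ in range(steps):
        dist = [dist[0] * p[0][0] + dist[1] * p[1][0],
                dist[0] * p[0][1] + dist[1] * p[1][1]]
    return dist

pi_b, pi_other = stationary_two_state(P)
second_eigenvalue = P[0][0] - P[1][0]  # governs how fast the chain mixes
print(f"pi_B = {pi_b:.4f}")
print(f"second eigenvalue = {second_eigenvalue:.4f}")
print("after 8 steps:", [round(x, 4) for x in power_iterate(P)])
```

The second eigenvalue is small (0.0135), so the distribution is already indistinguishable from stationary well within the stated 8 rounds.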

v4 confirmation: πB ∈ [0.177, 0.188] across K_real, K_flip, K_zeros, K_ones, K_random at N=10,000. Across these variants, K[i] has no measurable influence on basin pressure.

Message Schedule Analysis (v4)

Period-4 Mechanism

The message schedule expansion:

W[i] = σ₁(W[i-2]) + W[i-7] + σ₀(W[i-15]) + W[i-16]

The W[i-16] feedback term induces period-16 re-entry of input words. K[i] is not the source — confirmed via ablation (N=5,000): all K variants (zeros, ones, random, flip, shuffled) yield TVD < 0.06 and πB ∈ [0.177, 0.188].

W[16] = W[0] Re-entry at Round 17

Setting i=16:

W[16] = σ₁(W[14]) + W[9] + σ₀(W[1]) + W[0]

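W[0] (the original input's first 4 bytes) re-enters as an additive term at round 17. The identity held in 5,000/5,000 samples, as it must, since it is the schedule definition evaluated at i=16, with additions taken modulo 2^32. However, corr(W[0], W[16]) = 0.014: diffusion absorbs the informational contribution.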
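The schedule expansion and the i=16 identity can be sketched as follows. The σ₀ and σ₁ definitions are the standard SHA-256 ones, all additions are modulo 2^32, and the helper names (expand_schedule, block_from_words) are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    """32-bit rotate right."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def expand_schedule(block: bytes) -> list[int]:
    """Expand one 64-byte block into the 64-word message schedule."""
    if len(block) != 64:
        raise ValueError("a SHA-256 block is exactly 64 bytes")
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for i in range(16, 64):
        w.append((sigma1(w[i - 2]) + w[i - 7] + sigma0(w[i - 15]) + w[i - 16]) & MASK32)
    return w

def block_from_words(words) -> bytes:
    """Pack sixteen 32-bit words into a 64-byte block (big-endian)."""
    return b"".join(x.to_bytes(4, "big") for x in words)

# With only W[0] set, W[16] equals W[0]: the re-entry term in isolation.
w = expand_schedule(block_from_words([0xDEADBEEF] + [0] * 15))
print(hex(w[0]), hex(w[16]))
```

With all other input words zero, the σ terms vanish and W[16] reduces to W[0]; with real inputs the other three terms mix in, which is consistent with the near-zero correlation reported above.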

H0 Imprinting Ablation (v4)

SHA-256’s NIST initialization constants have a measurable CDP signature. Ablation test replacing H0 with all-zeros (N=5,000):

Round	W̄ (H0_real)	W̄ (H0_zeros)	Δ
1	486.81	135.32	+351.48
2	475.03	255.47	+219.56
3	464.30	375.56	+88.74
4	477.34	495.50	−18.16 ← sign reversal
5	480.17	480.48	−0.31
6–64	≈479.8	≈479.8	≈0

The sign reversal between Round 3 and Round 4 marks the precise boundary at which H0 imprinting is absorbed by diffusion.

W(H0) = 502  (+22.2 above equilibrium 479.8)
Round-0 B-rate: 0.1429  (−16.9% deficit)
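The W(H0) value is a direct digit sum over the eight standard SHA-256 initial hash words, which the following short check reproduces (the equilibrium value 479.8 is taken from the table above).

```python
# Standard SHA-256 initial hash values H0 (FIPS 180-4).
H0 = (0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19)

def digitsum(x: int) -> int:
    """Sum of the hex digits of a non-negative integer."""
    s = 0
    while x:
        s += x & 0xF
        x >>= 4
    return s

w_h0 = sum(digitsum(h) for h in H0)
print("W(H0) =", w_h0)  # 502
print("offset from equilibrium:", round(w_h0 - 479.8, 1))
```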

Round-by-Round TVD Diffusion Profile (v4)

Round-by-round CDP-TVD across four input classes (N=5,000/class):

Pair	Peak r	Peak TVD	TVD(r=3)	TVD(r=10)	Final TVD
short_pw vs pin	1	0.456	0.211	0.047	0.044
short_pw vs long_pw	1	0.366	0.133	0.044	0.031
long_pw vs binary	1	0.327	0.079	0.051	0.029
pin vs binary	1	0.276	0.106	0.035	0.045

Diffusion phases:

Phase 1 (Rounds 1–4): rapid decay, ≈10× reduction
Phase 2 (Rounds 5–64): plateau, TVD ∈ [0.03, 0.06]

Non-zero residual: TVD never fell below 0.02 in any tested round or pair. SHA-256’s 64-round diffusion minimizes but does not eliminate input class information under CDP.

Partial Inverse Invariance (v4)

Per-sample delta Δ = W(S7) − W(H) under Round-63 partial inverse (N=1,000):

Projection	Δ̄	σΔ	zeros/N	Verdict
W1 digit_sum	−59.33	48.84	2/1,000	systematic drift
W2 mod4	−59.23	62.41	4/1,000	systematic drift
W4 t1t2	+1.17	26.88	22/1,000	statistical
W5 xor_fold	−0.56	24.42	14/1,000	statistical

The −59.3 drift in W1/W2 is an arithmetic artifact of K[63] subtraction (digitsum(0xc67178f2) = 58). W4 and W5 show distributional insensitivity — statistical invariance, not algebraic identity.

Basin Topology

Full backward reachability analysis over [250, 750]:

Basin	Count	Fraction	Max Depth
C1 (all ancestors)	84	16.8%	8 steps
C2 (all ancestors)	416	83.2%	16 steps

Deepest C1 path (8 steps):

275 → 394 → 425 → 529 → 431 → 449 → 510 → 512 → 476

Deepest C2 path (16 steps):

284 → 530 → 483 → 465 → 447 → 500 → 469 → 496 → 491
    → 468 → 494 → 485 → 492 → 466 → 433 → 423 → 471
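Basin sizes and depths can be tabulated by walking each starting point forward until it lands on its cycle. This is a sketch under assumptions: str(w) is taken as decimal ASCII, function names are hypothetical, and the range endpoints are a guess, since the closed interval [250, 750] contains 501 integers while the table totals 500.

```python
import hashlib

def f(w: int) -> int:
    """Iterated map: digit sum of SHA-256 of the decimal string of w."""
    digest = hashlib.sha256(str(w).encode("ascii")).hexdigest()
    return sum(int(c, 16) for c in digest)

def cycle_nodes(start: int) -> frozenset:
    """Set of nodes on the cycle that the orbit of `start` ends in."""
    orbit = []
    w = start
    while w not in orbit:
        orbit.append(w)
        w = f(w)
    return frozenset(orbit[orbit.index(w):])

def basin_table(lo: int = 250, hi: int = 750) -> dict:
    """Map each cycle to (number of starts reaching it, maximum depth),
    where depth is the number of steps before the first cycle node."""
    table = {}
    for start in range(lo, hi + 1):
        cyc = cycle_nodes(start)
        depth, w = 0, start
        while w not in cyc:
            w = f(w)
            depth += 1
        count, max_depth = table.get(cyc, (0, 0))
        table[cyc] = (count + 1, max(max_depth, depth))
    return table

total = 0
for cyc, (count, max_depth) in basin_table().items():
    total += count
    print(f"{len(cyc)}-cycle {sorted(cyc)}: {count} starts, max depth {max_depth}")
print("total starts:", total)
```

Cycle nodes themselves are counted with depth 0, which matches the "all ancestors" wording only if the table includes them; that too is an assumption.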

Input Class Fingerprinting

W-distribution anomalies by input class, vs. random baseline W ~ N(480, 37):

Class	n	W_std	C1%	Baseline
alternating	2	13.5	0.0%	19.5%
alternating	4	57.5	50.0%	19.5%
pow2 cycle	8	24.1	12.5%	19.5%
hw_7	2	40.6	21.9%	19.5%
LFSR	4	42.3	27.3%	19.5%
random	any	37	~20%	—

Complement asymmetry (v3 corrected):

hw_1: C1 = 14.1%  (suppressed, −5.4%)
hw_7: C1 = 21.9%  (elevated,  +2.4%)

Translation invariance: counter, seq_asc, and primes sequences produce identical W_mean, W_std, and C1% for n=2,4.

SHA-256 / AES Connection

The LFSR sequence generated by AES xtime (multiplication by x, i.e. 0x02, in GF(2^8) modulo the polynomial 0x11B) produces C1-basin rates at or above the 19.5% baseline:

n=2: 19.7%    n=4: 27.3%    n=8: 24.7%

This suggests a possible resonance between SHA-256's round constants and AES GF(2^8) arithmetic under CDP. A formal proof remains open.
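For readers unfamiliar with xtime, here is a minimal sketch of the operation and of a byte sequence generated by repeated application. How the source turns this sequence into hash inputs for n=2, 4, 8 is not specified, so xtime_sequence and its seed are hypothetical.

```python
def xtime(b: int) -> int:
    """Multiply a GF(2^8) element by x (0x02), reducing modulo 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF

def xtime_sequence(seed: int, n: int) -> bytes:
    """First n bytes of the sequence seed, xtime(seed), xtime(xtime(seed)), ..."""
    out = []
    b = seed
    for _ in range(n):
        out.append(b)
        b = xtime(b)
    return bytes(out)

print(xtime_sequence(0x57, 4).hex())  # 57ae478e
```

The printed values match the worked xtime example in the AES specification (0x57 → 0xAE → 0x47 → 0x8E).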

Sequence Asymmetry

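σ(W(A∥B) + W(B∥A)) ≈ 52  vs.  √2·σ_W ≈ 52.0  (independent terms)  and  2σ_W = 73.6  (perfectly correlated terms)
Ratio to the correlated bound: 52 / 73.6 ≈ 0.71 ≈ 1/√2

The observed spread therefore matches what two independent W values with σ_W = 36.8 would give: W(A∥B) and W(B∥A) behave as uncorrelated under CDP. Confirmed across L ∈ {1, 2, 3, 4, 5, 6, 8}.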

Application: Rainbow Tables and Compound Index

42.55× Compound Lookup Index (v4)

Implementation note: The current GPU implementation at github.com/JM00NJ/SHA256-CDP uses digit_sum (W1) only; the compound index is not yet implemented. The 42.55× figure is a theoretical speedup estimated from bucket counts on an N=10,000 sample, not a measured runtime. The script will be updated to include all five projections in a future release.

Using all five projections as a compound lookup index:

c̄_base     = 10,000 / 235    = 42.55  (digit_sum alone)
c̄_compound = 10,000 / 10,000 = 1.00   (all 5 projections)

Speedup S = 42.55×

Combination	Unique buckets	Speedup
W1 alone (baseline)	235	1.00×
W1 + W2	6,968	29.65×
W2 + W3 + W4 (best 3-proj)	9,973	42.44×
All 5 projections	10,000	42.55×
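The bucket arithmetic behind the table can be sketched for the W1 versus W1 + W2 case. The input set ("pw0" ... "pw9999") is a hypothetical stand-in for the author's sample, function names are illustrative, and the W2 digit indexing (from the left, starting at 0) is an assumption. The speedup is computed the same way as in the table: unique compound buckets divided by unique baseline buckets.

```python
import hashlib
from collections import Counter

WEIGHTS = (1, 2, 1, 0)

def w1(hex_digest: str) -> int:
    """digit_sum projection."""
    return sum(int(c, 16) for c in hex_digest)

def w2(hex_digest: str) -> int:
    """mod4_weighted projection."""
    return sum(int(c, 16) * WEIGHTS[i % 4] for i, c in enumerate(hex_digest))

def bucket_stats(keys):
    """Return (number of distinct buckets, mean entries per bucket)."""
    buckets = Counter(keys)
    unique = len(buckets)
    return unique, len(keys) / unique

# Hypothetical sample of 10,000 inputs.
hashes = [hashlib.sha256(f"pw{i}".encode("ascii")).hexdigest() for i in range(10_000)]

base_unique, base_mean = bucket_stats([w1(h) for h in hashes])
comp_unique, comp_mean = bucket_stats([(w1(h), w2(h)) for h in hashes])

print(f"W1 alone: {base_unique} buckets, {base_mean:.2f} entries per bucket")
print(f"W1 + W2:  {comp_unique} buckets, {comp_mean:.2f} entries per bucket")
print(f"speedup:  {comp_unique / base_unique:.2f}x")
```

The figure is a reduction in candidates per lookup bucket; it says nothing about chain merges, which depend on the reduction function.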

Literature comparison:

Method	Speedup	Reference
Stepped rainbow tables	2.56×	Appl. Sci. 2025
CDP compound index	42.55×	This work
CDP / literature ratio	16.6×	—

Fingerprint vs. Merge — Critical Distinction

F(H) injectivity:        zero collisions observed on the tested input spaces (empirical)
Rainbow chain merges:    separate issue — reduction function dependent

GPU implementation still exhibits birthday-paradox merges:

Observed:  66.7% unique chains at saturation
Random:    36.8% unique chains
Improvement: 1.81× over random reduction

Storage-Time Tradeoff

GPU rate: 4.6 GH/s (AMD RX 9070 XT, gfx1201, RDNA4, OpenCL/Vulkan)

Implementation note: Build times are based on the GPU rate (4.6 GH/s), achieved on AMD RX 9070 XT (gfx1201, RDNA4) with the OpenCL ILP2 build kernel + Vulkan ACO query pipeline. Note: Windows 11 users may see reduced throughput due to AMD PAL-LLVM optimizer behavior on gfx1201 — use --mode scalar or switch to Linux for full performance.

Space	Alphabet^n	Storage	Build (GPU, 4.6 GH/s)
8-char lowercase	2.1×10¹¹	3 MB	~73s
8-char alnum	2.2×10¹⁴	3 GB	~3h
8-char full (94)	6.1×10¹⁵	85 GB	~25d
10-char lowercase	1.4×10¹⁴	2 GB	~14h
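The keyspace column can be recomputed directly, along with the time a single pass of one hash per candidate would take at the stated 4.6 GH/s. This is only a floor, sketched under the assumption that "alnum" here means 62 symbols (which matches 2.2×10¹⁴); the table's build times come from the source and include whatever chain-generation overhead the implementation has, so they need not match these numbers.

```python
RATE_HASHES_PER_SEC = 4.6e9  # GPU rate quoted above

# (alphabet size, length); the alnum alphabet size of 62 is an assumption.
SPACES = {
    "8-char lowercase": (26, 8),
    "8-char alnum": (62, 8),
    "8-char full (94)": (94, 8),
    "10-char lowercase": (26, 10),
}

def keyspace(alphabet: int, length: int) -> int:
    """Number of strings of the given length over the alphabet."""
    return alphabet ** length

def single_pass_seconds(size: int, rate: float = RATE_HASHES_PER_SEC) -> float:
    """Time to hash every candidate exactly once at the given rate."""
    return size / rate

for name, (alphabet, length) in SPACES.items():
    size = keyspace(alphabet, length)
    secs = single_pass_seconds(size)
    print(f"{name:18s} {size:.1e} candidates, {secs:,.0f} s ({secs / 3600:.1f} h)")
```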

Scope and Limitations

CDP-based rainbow tables are effective against unsalted SHA-256 only. Any random per-user salt completely neutralizes CDP.

Modern password storage (bcrypt, scrypt, Argon2) is not affected.

r(ι(state(0)), W(final hash)) = −0.033  (empirically independent)

None of these observations affect the preimage or collision resistance of SHA-256, whose compression function follows the Davies-Meyer construction.

What CDP Is Not

CDP does not break SHA-256
CDP does not reduce preimage resistance
CDP does not find collisions
CDP is not a cryptographic weakness in the security-relevant sense

It reveals structural properties of the output distribution under scalar projections. These properties are measurable and reproducible and, to the author's knowledge, not previously documented in the cryptographic literature.

Open Questions

Does the Markov ergodicity property hold for SHA-3, Blake2/3, or other ARX constructions?
Is the W[15]b0 collapse formally provable from the padding specification?
Can the 1.81× rainbow chain improvement be increased with a basin-aware bijective reduction?
What is the formal relationship between the complement nibble sum invariant and AES xtime?
What formal relationship connects Hamming weight of input bytes to SHA-256 output W-mean shift, and at what input length does the gradient collapse to baseline?
Why does r=58 show a negative πB anomaly (πB = 0.1449) despite W = 479.42 lying near the C1 center? (Basin avoidance phenomenon; mechanism unknown.)

Paper


CDP: Cyclic Digit-sum Projection — Structural Analysis of SHA-256 Output Distribution and Ergodic Basin Pressure Zenodo · July 2026 · v4 · DOI: 10.5281/zenodo.21611410


JM00NJ · Independent Researcher · netacoding.com · github.com/JM00NJ

