Introduction
A reverse shell is a fundamental technique in systems security where a compromised target initiates an outbound connection to an attacker-controlled machine, binding its standard input, output, and error streams to the remote connection. This allows the attacker to execute commands on the target system as if they were sitting at its terminal.
The difficulty rises sharply when you implement this entirely in x86-64 assembly without libc or any helper binaries. You must orchestrate a precise sequence of syscalls, manage memory yourself, and build every data structure by hand.
In this article, we’ll examine a complete TCP-based reverse shell implementation that leverages Position Independent Code (PIC) architecture—the same Stack Anchor technique discussed in our previous article on PIC fundamentals. This approach ensures our shellcode remains executable regardless of where it’s loaded in memory.
Architecture Overview
The reverse shell follows a linear syscall chain:
[Stack Setup] → [Socket Creation] → [Connect] → [dup2 Redirection] → [execve Shell]
Each step is a discrete syscall that builds upon the previous one. Let’s dissect each phase.
Phase 1: Stack Anchor & PIC Foundation
Before executing any syscall, we establish our memory sandbox using the Stack Anchor technique:
|
|
Why this matters:
- RSP Manipulation: We reserve a large contiguous region for read-write operations
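- 16-byte Alignment: The x86-64 System V ABI requires a 16-byte-aligned stack at function call boundaries. The kernel's syscall interface does not enforce this, but keeping RSP aligned is harmless and avoids surprises if the payload is ever combined with code that makes calls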
- RBP Anchoring: We fix RBP as a stable reference point for all relative addressing
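The reserve-and-align arithmetic can be checked outside of assembly. The following is a minimal Python sketch, assuming the region is reserved by subtracting a size from RSP and then masking off the low four bits; the function names and the sample addresses are hypothetical and are not taken from the original listing.

```python
def align_down(addr, alignment=16):
    """Round addr down to a multiple of alignment (must be a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Clearing the low bits is the same operation as `and rsp, -16` for alignment 16.
    return addr & ~(alignment - 1)


def reserve(rsp, size, alignment=16):
    """Model 'sub rsp, size' followed by an alignment mask (assumed sequence)."""
    return align_down(rsp - size, alignment)
```

For example, `reserve(0x7ffd12345678, 0x1000)` yields `0x7ffd12344670`: the stack pointer moves down by the requested size and then a further 8 bytes to reach a 16-byte boundary.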
Next, we copy the hardcoded network structure to our writable stack region:
|
|
Why copy to stack? The .text section is mapped R-X (readable, executable, not writable) by the kernel. Any template data we need to modify at runtime must be transferred to writable memory. The stack provides this with automatic cleanup when the process exits.
Phase 2: Socket Creation (SYS_socket = 41)
|
|
Register Breakdown (x86-64 syscall convention):
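- rax = Syscall number on entry; return value on exit (file descriptor, or a negative errno value on failure)
- rdi = First argument (address family)
- rsi = Second argument (socket type)
- rdx = Third argument (protocol)
The kernel returns a file descriptor (typically 3 or higher) in rax. We immediately save this in r12 because we'll reference it multiple times in subsequent syscalls.
Why r12? The syscall instruction itself overwrites rcx and r11, rax carries the return value, and rdi, rsi, and rdx are reloaded for every call. The kernel leaves r12 untouched and it is not used for argument passing, so a value stored there survives the whole chain. It is also callee-saved in the x86-64 SysV ABI, which would matter if the code ever called into functions.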
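The "file descriptor or negative error" convention for rax can be made concrete. Below is a small Python sketch, assuming the standard Linux rule that raw return values from -4095 to -1 encode an errno; the helper name `decode_syscall_return` is made up for illustration.

```python
import errno

MAX_ERRNO = 4095  # Linux reserves the range [-4095, -1] for error returns


def decode_syscall_return(rax):
    """Interpret a raw 64-bit rax value the way a syscall wrapper would."""
    # Reinterpret the unsigned 64-bit register value as a signed integer.
    signed = rax - (1 << 64) if rax >= (1 << 63) else rax
    if -MAX_ERRNO <= signed < 0:
        # Negative values in the reserved range are -errno.
        return ("error", errno.errorcode.get(-signed, str(-signed)))
    return ("ok", signed)
```

A socket call that returns 3 decodes to `("ok", 3)`, while a refused connection shows up in rax as the two's-complement encoding of `-ECONNREFUSED`.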
Phase 3: Establish Connection (SYS_connect = 42)
|
|
Data Structure Reference:
The sockaddr_in struct (16 bytes) contains:
Offset  Size  Field
0       2     sin_family (AF_INET = 2)
2       2     sin_port (4444 = 0x115c; stored in network byte order as bytes 11 5c, which reads as 0x5c11 when loaded as a little-endian word)
4       4     sin_addr (IPv4 address: 192.168.1.59)
8       8     sin_zero (padding, must be zero)
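The layout above can be reproduced byte for byte to see where the 0x5c11 constant comes from. This is a minimal Python sketch, assuming a little-endian x86-64 host; `pack_sockaddr_in` is a hypothetical helper, and the address and port are the example values from the table.

```python
import socket
import struct


def pack_sockaddr_in(ip, port):
    """Build the 16-byte sockaddr_in image as it sits in memory on x86-64."""
    return (
        struct.pack("<H", socket.AF_INET)  # sin_family: host (little-endian) order
        + struct.pack(">H", port)          # sin_port: network (big-endian) order
        + socket.inet_aton(ip)             # sin_addr: 4 bytes, network order
        + bytes(8)                         # sin_zero: padding
    )


blob = pack_sockaddr_in("192.168.1.59", 4444)
# Reading the port field back as a little-endian word gives the constant
# that appears in the assembly source.
port_word = struct.unpack("<H", blob[2:4])[0]
```

Here `blob[2:4]` is `b"\x11\x5c"` and `port_word` is `0x5c11`, which is why the port looks byte-swapped when written as a 16-bit immediate.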
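The connect syscall blocks until the connection succeeds or fails. On success it returns 0 in rax: the kernel has established a TCP connection to the IP:port stored in sockaddr_in (the attacker's listener), and our socket FD now represents the active connection. On failure, rax holds a negative errno value.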
Phase 4: Redirect File Descriptors (SYS_dup2 = 33)
This is the critical phase where we bind stdin, stdout, and stderr to the socket:
|
|
Three Iterations:
- dup2(socket_fd, 2) → fd 2 (stderr) now refers to the socket
- dup2(socket_fd, 1) → fd 1 (stdout) now refers to the socket
- dup2(socket_fd, 0) → fd 0 (stdin) now refers to the socket
Why this works: After dup2, descriptors 0, 1, and 2 are all aliases for the same socket. Any program that later runs with these descriptors sends its output to the socket and reads its input from it. The redirection is transparent to the program started by execve.
The loop handles all three redirections in minimal code. The jns (jump if not sign) instruction continues while rbx ≥ 0; when rbx becomes -1, the loop exits.
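Both ideas in this phase, the count-down loop and the effect of dup2, can be tried safely in Python. The sketch below is an illustration under assumptions: it uses a local socketpair instead of a TCP connection and a throwaway pipe descriptor in place of the real stdout, so nothing in the running process is actually redirected. All function names are hypothetical.

```python
import os
import socket


def countdown_fds(start=2):
    """Visit 2, 1, 0 and stop once the counter goes negative (the jns exit)."""
    fds = []
    counter = start
    while counter >= 0:
        fds.append(counter)  # a dup2(sock, counter) call would happen here
        counter -= 1
    return fds


def dup2_demo():
    """Show that after dup2(old, new), writes to `new` arrive on the socket."""
    left, right = socket.socketpair()
    pipe_r, stand_in = os.pipe()  # stand_in plays the role of fd 1
    try:
        # dup2 silently closes the pipe's write end and rebinds the number.
        os.dup2(left.fileno(), stand_in)
        os.write(stand_in, b"ping")
        return right.recv(4)
    finally:
        os.close(pipe_r)
        os.close(stand_in)
        left.close()
        right.close()
```

`countdown_fds()` returns `[2, 1, 0]`, matching the order of the three iterations, and `dup2_demo()` returns `b"ping"`, read from the peer end of the socket.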
Phase 5: Execute Shell (SYS_execve = 59)
|
|
Syscall Signature (execve):
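int execve(const char *pathname, char *const argv[], char *const envp[]);
rdi = pathname, rsi = argv, rdx = envp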
The Argv Array Construction:
We build a minimal argv on the stack:
[Stack Before]
...
[rsp + 16] → (higher addresses)
[rsp + 8] → argv[1] = NULL (array terminator)
[rsp] → argv[0] = pointer to "/bin//sh"
Then mov rsi, rsp makes rsi point to the start of this array. The kernel interprets this as:
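- argv[0] = the shell path
- argv[1] = NULL (signals end of array)
- No environment variables (rdx = 0)
Why minimal argv? /bin/sh needs no arguments: with none given, it reads commands from stdin. Because stdin here is a socket rather than a terminal, the shell runs non-interactively. It prints no prompt and has no job control, but it still executes each line it receives and writes the results back to the socket.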
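The argv layout, and the reason the path is spelled "/bin//sh", can be checked with a few lines of Python. This is a sketch under assumptions: it supposes the doubled slash exists to pad the path to exactly 8 bytes so it fits one 64-bit register (the kernel treats repeated slashes as one), and the pointer value used below is an arbitrary placeholder. Neither helper name comes from the original code.

```python
import struct

PATH = b"/bin//sh"  # 8 bytes; the doubled slash is padding


def path_as_qword(path):
    """Return the 64-bit little-endian value holding the 8 path bytes."""
    if len(path) != 8:
        raise ValueError("path must be exactly 8 bytes to fit one register")
    return struct.unpack("<Q", path)[0]


def build_argv(path_ptr):
    """Lay out argv as two 8-byte slots: [pointer to path, NULL]."""
    # Lowest address first, so the first slot is what rsi would point at.
    return struct.pack("<QQ", path_ptr, 0)
```

`path_as_qword(PATH)` evaluates to `0x68732f2f6e69622f`, and `build_argv(...)` produces 16 bytes whose second half is all zeros, the NULL terminator the kernel looks for.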
The Shell Loop: Why No Loop is Needed
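execve does not create a new process; it replaces the program image inside the current one. After it succeeds, /bin/sh runs with:
- The same open file descriptors, so stdin/stdout/stderr are still bound to the socket
- A fresh address space: our shellcode's memory is discarded and replaced by the shell's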
The /bin/sh binary contains its own event loop (typically implemented with read() syscalls):
|
|
Because stdin/stdout/stderr point to the socket, every interaction happens over the network connection without any additional coordination in our shellcode.
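Descriptor inheritance across exec can be observed with a harmless program. The following Python sketch assumes a Linux system with `echo` on the PATH; it hands one end of a local socketpair to the child as its stdout and reads the result from the other end. The function name is hypothetical, and `echo` is used here purely as a stand-in for any program that writes to stdout.

```python
import socket
import subprocess


def run_child_on_socket(argv):
    """Run argv with stdout bound to a socket; return what the peer receives."""
    parent_end, child_end = socket.socketpair()
    try:
        # Popen duplicates child_end onto fd 1 in the child before exec.
        proc = subprocess.Popen(argv, stdout=child_end.fileno())
        child_end.close()  # drop our copy so the peer sees EOF when the child exits
        chunks = []
        while True:
            data = parent_end.recv(4096)
            if not data:
                break
            chunks.append(data)
        proc.wait()
        return b"".join(chunks)
    finally:
        parent_end.close()
        child_end.close()
```

Calling `run_child_on_socket(["echo", "hello"])` returns `b"hello\n"`. The child program contains no networking code at all; it simply writes to fd 1, which is the point made above.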
Complete Code Reference
|
|
Context: When & Why PIC for Reverse Shells
Why This Implementation Uses PIC
This reverse shell employs Position Independent Code not because it's strictly necessary for a standalone binary, but because injected payloads depend on it.
In exploitation frameworks (Metasploit, Cobalt Strike), payload shellcode is position independent because:
- Target memory layout is unknown (ASLR)
- The payload must run from whatever address it is injected at
- The same bytes are reused across many deployments without relinking
Learning Perspective
This article uses PIC because it’s the correct modern pattern, even though simpler alternatives exist. When you learn systems programming, it’s better to master best practices from day one.
When You Wouldn’t Use PIC
For comparison, you’d skip PIC in these scenarios:
- Controlled lab environment: Known base addresses, ASLR disabled
- Embedded systems: Fixed memory layout (bootloader → kernel)
- Proof-of-Concept: Simple buffer overflow with hardcoded return address
- Educational debugging: Understanding one concept at a time
However, in real-world offensive security, PIC is the industry standard because you never control the target environment.
Bottom line: PIC is the pattern professional tools rely on, so this example is a learning foundation that scales beyond a toy.
Key Takeaways
- PIC Architecture: Every memory reference uses relative addressing ([rel ...] or [rbp + offset]). No hardcoded absolute addresses exist in the binary.
- Syscall Chaining: The shellcode is a linear sequence of syscalls, each building on the previous. No branches or loops except the dup2 redirection.
- File Descriptor Inheritance: Once stdin/stdout/stderr are bound to the socket via dup2, they remain bound after execve. The shell inherits these redirections automatically.
- Stack as Writable Storage: Read-only templates in .text are copied to the stack, allowing runtime modification while respecting kernel memory protections.
- Minimal Dependencies: The entire payload fits in a few hundred bytes and depends only on the Linux kernel. No libc, no helper functions.
Conclusion
Building a reverse shell in assembly reveals the elegant simplicity of POSIX syscalls. By understanding how socket I/O, process execution, and file descriptor redirection work at the syscall level, you gain deeper insight into how operating systems manage processes and resources.
The Stack Anchor technique ensures the payload remains portable, executable from any memory location—a critical requirement for reliable shellcode deployment in real-world scenarios.
Next steps: Experiment with different shellcode obfuscation techniques, encoding schemes, or polymorphic wrappers to evade security detection mechanisms while preserving the core syscall logic.
GitHub: https://github.com/JM00NJ/Sectionless-Craft/tree/main/Networking/reverse_shells
Disclaimer: This article is for educational purposes only. Reverse shells should only be used on systems you own or have explicit permission to test. Unauthorized access to computer systems is illegal.