Heap Grooming

Last updated: 2026-04-11
Related: Pool Internals, Heap Internals, Use After Free, Cve 2021 31969, Cve 2024 30085, Cve 2021 31956, Wnf Internals, Cve 2020 1350
Tags: user-mode, kernel-mode, pool

Summary

Heap grooming (also called heap feng shui) is the art of manipulating heap layout to deterministically place objects adjacent to one another. It is a prerequisite for reliable exploitation of most memory corruption bugs: without grooming, adjacency is left to chance and the exploit degrades into a probabilistic crash generator. This page documents grooming strategies for both the kernel pool and the user-mode heap.


Core Principle

Goal: when a vulnerable allocation occurs, ensure:
  [ATTACKER OBJECT][VULNERABLE OBJECT][ATTACKER OBJECT]
  or
  [VULNERABLE OBJECT][ATTACKER OBJECT]   ← overflow from left
  
So that: overflowing/corrupting vulnerable object hits controlled data.

Heap allocators are deterministic given the same sequence of alloc/free operations. Grooming exploits this determinism.


LFH Grooming (Modern — Both Kernel and User-Mode Segment Heap)

LFH is activated per-bucket after ≥18 allocations of that size. Within an LFH subsegment, all blocks are the same size and packed contiguously.

LFH Grooming Procedure

Phase 1: Activate LFH for target bucket
   → allocate ≥18 objects of target size
   
Phase 2: Fill the subsegment (spray)
   → allocate N more objects (fill current subsegment, force new subsegment creation)
   → N ≈ subsegment_size / block_size (typically 128-256 objects)
   
Phase 3: Create holes
   → free every-other object (or specific pattern)
   → creates alternating free/busy pattern: [F][B][F][B][F][B]...

Phase 4: Trigger vulnerable allocation
   → vulnerable object allocated into a free slot: [F][VULN][F][B][F]...
   
Phase 5: Fill holes with attacker objects
   → allocate same-size attacker objects to fill remaining free slots
   → layout: [ATK][VULN][ATK][B][ATK]...

Phase 6: Trigger overflow/corruption
   → VULN overflows/underflows → hits ATK data

Size Bucket Matching

LFH bucket for size s (NT Heap):

  • 1–1024 bytes: 8-byte granularity → bucket = ceil(s/8)
  • 1025–4096 bytes: 16-byte granularity
  • Larger sizes: progressively coarser granularity

For Segment Heap: similar bucketing, exact sizes differ.

Critical: vulnerable object and attacker object must be in the SAME LFH bucket. If they differ by even 1 byte and land in different buckets → layout not guaranteed.


VS (Variable Size) Segment Grooming

For allocations not in LFH (large or infrequent allocations):

Phase 1: Defragment
   → allocate many large objects to fill existing free chunks
   → creates a "clean slate" at end of segment

Phase 2: Spray
   → allocate N controlled objects of exact size
   → they land contiguously at end of current segment
   
Phase 3: Create gap
   → free one controlled object in the middle of the spray
   → creates a hole of exact right size
   
Phase 4: Trigger vulnerable alloc
   → vulnerable object should land in the hole
   
Phase 5: Verify (debug only)
   → read back attacker objects to confirm layout

Kernel Pool Grooming

Pool Grooming Objects (Pre-20H1)

Best grooming objects from user mode:

  • Named pipe write buffers (NonPaged Pool, tag NpFr): WriteFile to a blocking named pipe; the kernel allocation is write size + data-queue-entry header, so subtract the header from the target chunk size
  • NtSetValueKey (registry): write registry value → kernel allocates Paged Pool buffer of controlled size
  • PIPE_ATTRIBUTE: extremely flexible, Paged Pool, variable size, header-free body
  • Event objects (NtCreateEvent): NonPagedPool, fixed small size (~0x40)

Kernel Pool Grooming Procedure (Legacy)

1. Exhaust current free list for target size
   → allocate many spray objects to fill free list
2. Allocate vulnerable object
   → lands at end of fresh pool page
3. Allocate attacker object immediately after
   → layout: [VULN_OBJ][ATTACKER_OBJ]
4. Trigger overflow in VULN_OBJ
   → corrupt ATTACKER_OBJ header/body

Segment Heap Kernel Grooming (Post-20H1)

Must use LFH exploitation strategy (see above). Key difference:

  • No chunk headers between LFH blocks → overflow hits object data directly
  • Subsegment bitmap controls allocation → can be manipulated for double-alloc primitive

Pool type matters: grooming objects must be in the same pool as the victim (NonPaged vs Paged). Named pipe write buffers and event objects land in NonPaged Pool; registry values and PIPE_ATTRIBUTE bodies land in Paged Pool.


WNF_STATE_DATA + KeyedEvent Spray (Paged Pool — CVE-2024-26170 Pattern)

Used in the CimFS CVE-2024-26170 exploit to groom paged pool around an OOB-read victim allocation.

Objects

Object            Size         Pool        How to allocate
_WNF_STATE_DATA   0x880 bytes  Paged Pool  NtCreateWnfStateName() + NtUpdateWnfStateData() with 0x880-byte payload
KeyedEvent        0x680 bytes  Paged Pool  NtCreateKeyedEvent()

Strategy

Phase 1 — Fill pool pages with WNF spray:
  Allocate N × _WNF_STATE_DATA (0x880) objects
  Goal: force two specific objects onto the SAME page (contiguous)

Phase 2 — Create holes for victim:
  Free 1/4 of the WNF objects
  This creates evenly-spaced free blocks on the page

Phase 3 — Trigger victim allocation:
  The target allocation (from driver IOCTL processing) falls into a hole
  Now: [WNF spray][victim][WNF spray] on same page

Phase 4 — OOB read reaches spray:
  Unvalidated offset calculation reaches adjacent WNF_STATE_DATA
  Reads fake PFILE_OBJECT from WNF payload

Payload Structure in WNF Data

The 0x880-byte WNF payload contains a complete fake object chain:

[0x00] Fake FILE_OBJECT
         FileObject->DeviceObject → ptr to fake DEVICE_OBJECT
[0x??] Fake DEVICE_OBJECT
         DeviceObject->DriverObject → ptr to fake DRIVER_OBJECT
[0x??] Fake DRIVER_OBJECT
         DriverObject->MajorFunction[3] → gadget_address

When the OOB read dereferences this, IoGetRelatedDeviceObject() + IofCallDriver() invoke the gadget.

Why WNF_STATE_DATA Works

  • Large enough (0x880) to hold the full fake object chain within one allocation
  • Allocated in paged pool — matches the victim driver’s allocation pool
  • NtUpdateWnfStateData allows arbitrary payload content
  • Can spray hundreds of these cheaply

KeyedEvent Role

KeyedEvent objects (0x680) fill the remaining slots between WNF objects to:

  1. Prevent fragmentation from other background allocations
  2. Stabilize the page layout by saturating remaining free space
  3. Ensure the victim falls in the precise WNF-adjacent hole

See Cve 2024 26170 for the complete exploit using this pattern.


ALPC Handle Table Spray (Paged Pool 0x1000 — CVE-2024-30085 Pattern)

Used to place a 0x1000-byte paged pool object adjacent to a pool overflow victim, then leak its kernel pointer.

Object: _ALPC_HANDLE_TABLE

struct _ALPC_HANDLE_TABLE {
    struct _ALPC_HANDLE_ENTRY* Handles;   //0x0  ← kernel pointer we want to leak
    struct _EX_PUSH_LOCK Lock;            //0x8
    ULONGLONG TotalHandles;               //0x10
    ULONG Flags;                          //0x18
};
  • Initial size: 0x80 bytes
  • Each NtAlpcCreateResourceReserve call adds a _KALPC_RESERVE entry
  • When table is full, it’s reallocated at doubled size: 0x80 → 0x100 → 0x200 → 0x400 → 0x800 → 0x1000
  • At 0x1000, the table lands in the VS segment (paged pool), matching the overflow victim size

Growing to 0x1000

// Create ALPC server port
NtAlpcCreatePort(&hPort, &portObjAttr, &portAttr);
// Fill handle table from 0x80 to 0x1000 via repeated reserve allocation
for (int j = 0; j < 127; j++)
    NtAlpcCreateResourceReserve(hPort, 0, 0x28, &hResource);
// 127 entries fills 0x80→0x1000 through doublings

Spray 0x800 ALPC ports to fill 0x1000 holes

#define NUM_ALPC 0x800
HANDLE ports[NUM_ALPC];
CreateALPCPorts(ports, NUM_ALPC);                                // helper: NtAlpcCreatePort per port
AllocateALPCReserveHandles(ports, NUM_ALPC, reservesCount - 1);  // helper: grow each table to 0x1000

Leak via Corrupted WNF

After an overflow corrupts _WNF_STATE_DATA.DataSize by 8 bytes (0xff0 → 0xff8), the WNF reads 8 bytes past its data boundary into the adjacent ALPC handle table:

// The leaked value at WNFOutput[0xff0] is Handles pointer from _ALPC_HANDLE_TABLE
ALPC_leak = *((unsigned long long*)(WNFOutput + 0xff0));
// Walk: KALPC_RESERVE → ALPC_PORT → EPROCESS+0x18 → Token+0x4b8

See Cve 2024 30085 for complete flow.


WNF 0xC0 Paged Pool Spray (CVE-2021-31956 Pattern)

First published use of WNF structures as paged pool grooming objects (Alex Plaskett, NCC Group, 2021). The key insight: WNF allocations go to paged pool and are precisely size-controlled from user-mode — making them ideal for placing adjacent to paged-pool overflow victims.

Objects and Sizes

_WNF_NAME_INSTANCE:
  Size = 0xA8 bytes (fixed struct) + 0x10 pool header → chunk 0xB8 → rounds to 0xC0 LFH bucket

_WNF_STATE_DATA (variable body):
  ExAllocatePoolWithQuotaTag(PagedPool, dataLen + 0x10, 'WNF ')
  To land in 0xC0 bucket: dataLen = 0xA0 → chunk = 0xA0 + 0x10 (WNF header) + 0x10 (pool hdr) = 0xC0

Overflow victim (NtFE — NTFS EA):
  NtQueryEaFile(..., Length = firstEaBlockSize)
  → allocates output buffer = firstEaBlockSize bytes
  For 0xC0 bucket: Length = 0xB0 → 0xB0 + 0x10 pool header = 0xC0

All three objects — _WNF_STATE_DATA, _WNF_NAME_INSTANCE, and the NTFS EA output buffer — share the same 0xC0 LFH bucket in paged pool.

Allocation API

// Allocate _WNF_NAME_INSTANCE + _WNF_STATE_DATA (dataLen bytes body):
NtCreateWnfStateName(&state, WnfTemporaryStateName, WnfDataScopeMachine,
                     FALSE, 0, 0x1000, psd);
NtUpdateWnfStateData(&state, buf, dataLen, 0, 0, 0, 0);
// → ExAllocatePoolWithQuotaTag(PagedPool, dataLen + 0x10, 'WNF ')

Strategy

Phase 1 — Spray 0xC0 paged pool with WNF objects:
  Allocate N × WNF state names with dataLen=0xA0 (→ WNF_STATE_DATA 0xC0)
  AND N × WNF_NAME_INSTANCE (each 0xC0)
  Goal: fill the 0xC0 LFH subsegment

Phase 2 — Create holes:
  Free every-other WNF state name → alternating free/busy pattern

Phase 3 — Trigger victim allocation:
  NtQueryEaFile(Length = 0xB0) → NtFE victim allocated into a hole
  Layout: [WNF][NtFE victim][WNF]..

Phase 4 — Trigger overflow:
  NtQueryEaFile EA2 causes integer underflow → controlled memmove past victim
  Overflow of 0x10 bytes past victim end hits adjacent WNF object:
    If WNF_STATE_DATA: corrupts Header+AllocatedSize+DataSize+ChangeStamp at +0x0/+0x4/+0x8/+0xC
    If WNF_NAME_INSTANCE: corrupts Header+RunRef at +0x0/+0x8

Phase 5 — Detect corrupted WNF_STATE_DATA:
  NtQueryWnfStateData all state names; find one with unexpectedly large DataSize
  → identified corrupted WNF_STATE_DATA → OOB read/write relative to that object

Why WNF_STATE_DATA Corruption is Exploitable

Setting DataSize larger than the real allocation enables:

  • OOB read: NtQueryWnfStateData copies DataSize bytes from the WNF payload → reads past the end of the allocation into adjacent paged pool objects
  • OOB write: NtUpdateWnfStateData copies user data into the WNF payload up to DataSize bytes → writes into adjacent objects

From there, pivot to _WNF_NAME_INSTANCE corruption (overwrite StateData pointer → controlled AAR/AAW) or read CreatorProcess (+0x98) for an EPROCESS leak without any separate KASLR bypass.

See Wnf Internals for complete WNF structure reference and Cve 2021 31956 for full exploit chain.


Cross-Subsegment LFH → VS Overflow (CVE-2021-31969 Pattern)

Used when the overflow victim is a tiny LFH object (e.g., 0x20 bytes) but the target for corruption must be a VS-subsegment object (e.g., WNF at 0x1000 bytes). Standard LFH grooming cannot bridge the gap — this technique exhausts both LFH and VS allocator state to force them adjacent.

The Challenge

The HsRp allocation in CVE-2021-31969 is 0x20 bytes → LFH bucket. Under LFH, the victim chunk can only be adjacent to other 0x20-byte objects. WNF_STATE_DATA is 0x1000 bytes → VS segment. They cannot be adjacent under normal conditions.

Key Insight

LFH buckets (fixed-size, managed by LFH frontend) and VS subsegments (variable-size, managed by VS frontend) can be physically contiguous in pool memory when both the LFH backend and VS backend allocate new segments simultaneously.

Procedure

Step 1 — Exhaust all existing 0x20 LFH buckets:
  Spray _TERMINATION_PORT objects (NtRegisterThreadTerminatePort)
  → forces LFH backend to allocate a fresh 0x20 LFH segment
  → new LFH segment appears at current pool high water mark

Step 2 — Exhaust all existing VS subsegments:
  Spray _WNF_STATE_DATA (DataSize=0xff0 → 0x1000 alloc) + _TOKEN objects
  → forces VS allocator to create a new VS subsegment
  → new VS subsegment appears adjacent to (or near) the new LFH segment

Step 3 — Overflow past the LFH segment:
  Trigger overflow of up to ~4 pages of DWORD(0x1000) values
  If fewer than 4 LFH pages exist between victim and VS subsegment:
  overflow crosses the LFH boundary into the VS subsegment

Step 4 — WNF gets overwritten:
  DWORD(0x1000) values overwrite WNF_STATE_DATA.AllocatedSize = 0x1000
                                   WNF_STATE_DATA.DataSize = 0x1000
  → page-sized OOB read/write via NtQueryWnfStateData/NtUpdateWnfStateData

Spray Object: _TERMINATION_PORT (0x20 LFH filler)

struct _TERMINATION_PORT {
    struct _TERMINATION_PORT* Next;  //0x0
    VOID* Port;                      //0x8
};
// Allocated via NtRegisterThreadTerminatePort(alpcPortHandle)
// Freed when thread terminates — use a worker thread with synchronized exit

Reliability

This technique requires luck with layout — pool spray must exhaust both allocators simultaneously. Run sufficient spray volume (thousands of objects) to maximize the probability that new LFH and VS segments land adjacent. Expect <100% reliability; tune based on crash vs. success ratio.

See Cve 2021 31969 for full implementation.


Cross-Cache / Cross-Pool Grooming

When vulnerable object and attacker object must be in different pools/caches, grooming becomes harder:

  • Must cause pools to grow into the same page (uncommon but possible with careful timing)
  • Alternative: find an object in the same pool as vulnerable object that you can control

Measuring Grooming Success

Debug Approach

  1. Enable heap metadata tracing (ETW heap events)
  2. Allocate spray objects → record their addresses
  3. Trigger target allocation → check address
  4. Calculate relative distance from nearest spray object

Production Approach (Reliability Engineering)

  • Run exploit 100+ times in controlled environment
  • Measure % of runs where grooming succeeded
  • Tune phase 2 (spray count) and phase 3 (hole creation pattern) until >90% reliable

Grooming Anti-Patterns

Mistake                       Effect
Wrong LFH bucket              Objects in different subsegments → layout non-deterministic
Too few spray objects         Holes in other pages get filled first
Not activating LFH first      VS segment used → non-contiguous layout
Pool type mismatch            Attacker objects in Paged Pool, vulnerable in NonPaged → never adjacent
Not draining lookaside first  Lookaside list returns stale chunks before reaching target region

_WNF_NAME_INSTANCE Pre-Spray (Preventing LFH Pollution)

When spraying _WNF_STATE_DATA objects to fill a VS subsegment, a side effect is that each WNF state name also creates a companion _WNF_NAME_INSTANCE object (size ~0xD0). Without mitigation, these 0xD0 allocations cause the LFH frontend to create an additional 0xD0 LFH segment, which may fall between the intended LFH and VS subsegments and disrupt the cross-subsegment layout.

Solution: Pre-spray 0xD0 pool holes before the main spray:

for (UINT i = 0; i < 0x4000; i++)
    AllocateWnfObject(0xD0, &gStateName[i]);
for (UINT i = 0; i < 0x4000; i++)
    NtDeleteWnfStateName(&gStateName[i]);
// Now WNF_NAME_INSTANCES objects land in pre-existing holes,
// not in new LFH segments that would disrupt layout

Used in CVE-2022-22715 (Windows Dirty Pipe). Cross-reference: Wnf Internals.


VS Pool RBTree Repair (Post-OOB-Write Cleanup)

When a massive OOB write (e.g., CVE-2022-22715’s 0xFFFE-byte write) corrupts the VS subsegment’s Red-Black Tree metadata alongside object content, the Segment Heap allocator will BSOD on the next allocation or free because it dereferences the corrupted tree node pointers.

Each VS subsegment RBTree node layout:

+0x00  Left child pointer
+0x08  Right child pointer
+0x10  Parent pointer

Fix procedure (requires arbitrary R/W already established via PreviousMode flip):

  1. Compute VS pool manager address from known globals + pool chunk address:
    // CVE-2022-22715 formula:
    UINT64 pHpMgr = (globalHeap ^ pPoolChunkAddr ^ pPoolChunkValue ^ 0xA2E64EADA2E64EAD) - 0x100 + 0x290;
    
  2. Walk the RBTree from root (pHpMgr → left child), traversing toward the corrupted chunk’s address:
    // Navigate: if target < current → go left; if target > current → go right
    // When found: overwrite node with fake node (zero children, relink parent)
    WriteKernel(corruptNode + 0x00, 0);         // null left child
    WriteKernel(corruptNode + 0x08, 0);         // null right child
    WriteKernel(corruptNode + 0x10, parentPtr); // maintain parent link
    
  3. Traverse from both left and right subtrees of root to catch all paths to the corrupt node

Cost: Any VS chunks parented under the deleted node become unreachable (memory leak), but the system no longer crashes. Acceptable trade-off for a stable exploit.

Used in CVE-2022-22715. See also Cve 2022 22715.


Application-Specific Custom Allocator Grooming (WinDNS / CVE-2020-1350 Pattern)

When a target service maintains its own memory pool (rather than using the OS allocator directly), grooming must understand that allocator’s semantics. WinDNS (dns.exe) is the canonical example.

WinDNS Custom Allocator Properties

  • Buckets: 0x50, 0x68, 0x88, 0xA0 (singly linked LIFO freelists)
  • > 0xA0: delegates to HeapAlloc
  • Freed objects: pushed back onto bucket freelist, never returned to native heap
  • Chunk isolation: native heap chunks (0xFF0/0xFA0 sized) contain only WinDNS bucket buffers — no mixing with other application allocations
  • Chunk contiguity: spray order = allocation order within a new native chunk (LIFO within chunk)

This isolation means: if the overflow buffer is < 0xA0 bytes, the overflow walks into other WinDNS bucket buffers, never into OS heap metadata or unrelated data.

TTL as Heap Primitive

The DNS server’s record cache respects TTL values set by the responding DNS server. In CVE-2020-1350, the attacker is the responding DNS server:

TTL value                                    Effect
Long TTL                                     Record cached indefinitely → buffer stays allocated
Short TTL                                    Record freed after ~2 minutes (next cache cleanup cycle)
dwTTL = 0 and dwTimeStamp = 0 (fake record)  Record treated as already expired → freed on next DNS query for that name

The zero-TTL trick enables immediate controlled free without waiting 2 minutes — critical for the later staged exploitation.

Hole-Making Procedure (Avoid SEGFAULT)

1. Heap spray: send many subdomain queries → malicious DNS server responds with
   records (long TTL) → victim caches them → fills WinDNS bucket heap
2. Assign short TTL to exactly one subdomain (the "target" slot)
3. Wait ~2 minutes for WinDNS cleanup to free the target record buffer
4. Re-query the target subdomain with the malformed SIG payload
   → LIFO: the new allocation lands in the exact slot just freed
   → overflow stays within the already-mapped heap segment → no SEGFAULT

Grooming to Avoid Cache Tree Corruption

The DNS record cache is a binary tree. Tree nodes are 0x88-byte objects (Bucket 3). When a new 0xFF0 native heap chunk is allocated for Bucket 3, tree nodes end up adjacent to spray buffers, putting tree metadata in the overflow path.

Fix:

  1. Pre-exhaust the Bucket 3 freelist: allocate many 0x88-byte records, then expire them. They go back onto the freelist.
  2. Subsequent tree node allocations consume the freelist entries → no new native heap chunks needed → tree nodes never appear adjacent to the spray region.
  3. Spray enough 0xA0 records to fill an entire new native chunk — the overflow target and its neighbours are now exclusively attacker-controlled 0xA0 records.

Controlling Reallocation (Fake wRecordSize)

By overwriting RR_Record.wRecordSize in a fake record, the exploit forges which bucket size is reported when the buffer is “freed” via the zero-TTL trick. The LIFO allocator then serves the next allocation of that bucket size from the controlled slot:

Fake RR_Record.wRecordSize = 0x50
→ Free fake record (zero TTL query)
→ Buffer pushed back onto 0x50 bucket freelist
→ Next 0x50 allocation (e.g., DNS_Timeout object) lands in this slot

This is the mechanism for injecting arbitrary object types into the controlled heap region.

Key Contrast with Kernel Pool Grooming

Property           Kernel Pool (Segment Heap LFH)        WinDNS Custom Allocator
Allocation order   LFH: pseudo-random within subsegment  LIFO: fully deterministic
Free behavior      Returns to LFH bitmap                 Returns to bucket freelist (LIFO)
Bucket activation  ≥18 allocs                            Always active
External mixing    Other kernel objects share page       Isolated — only WinDNS objects in chunk
Heap primitive     Race/timing sensitive                 TTL (attacker-controlled DNS response)

See Cve 2020 1350 for the complete exploit chain.


Exploit Relevance

Grooming is the difference between a crash and a working exploit. Budget significant time for grooming — it is often 50%+ of exploit development work. Reliability below 70% in the lab means unreliable in production, due to ASLR and system load variation.


References

  • “Heap Feng Shui in Javascript” — Alexander Sotirov (pioneered the concept)
  • “Kernel Pool Exploitation” — Tarjei Mandt (Azimuth Security)
  • “Pool Party” — SafeBreach Labs (2023) — novel Windows thread pool grooming
  • “Exploiting Windows Kernel Pool” — Valentina Palmiotti, OffensiveCon 2021
  • Corelan heap exploitation tutorials — corelan.be