AI Agents for Offensive Operations
Category: Emerging Techniques / Automation MITRE ATT&CK: (No specific technique — cross-cutting automation capability) Related: Red Teaming, Reconnaissance, Command And Control
Overview
LLM-powered agents can autonomously execute multi-step offensive tasks by combining a reasoning model with a toolset (OS commands, C2 APIs, network scanners). The Model Context Protocol (MCP) provides a standardized framework for wiring tools to LLMs, making it practical to build reconnaissance agents, enumerate Active Directory, or integrate with C2 frameworks like Mythic — all with Claude or local models as the decision engine.
Core Concepts
Agent Architecture
An agent has two components:
- Brain: LLM responsible for reasoning, planning, and tool selection (Claude, GPT-4, local models via Ollama)
- Body: Tools the agent can invoke — subprocess calls, API requests, C2 callbacks
Agents use the ReAct paradigm (Reasoning + Acting): Thought → Action → Observation → loop.
What Is MCP?
Model Context Protocol (MCP) is a client-server framework that standardizes how applications expose tools and context to LLMs. An MCP server exposes:
- Tools: Executable functions (wrap OS commands, APIs)
- Prompts: Reusable prompt templates
One LLM host (e.g., Claude Desktop) connects to multiple MCP servers simultaneously.
Building an Offensive MCP Agent
Server Scaffold (FastMCP)
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("network_mapper")
@mcp.prompt()
def setup_prompt(subnet: str, objective: str) -> str:
return f"""
You are a senior penetration tester specialising in internal networks.
Your objective is to: {objective}
Target subnet: {subnet}
Think step by step. Use the ReAct pattern:
Thought: what to do
Action: which tool to call
Observation: what the result means
"""
@mcp.tool()
async def run_ldap_dns_query(host: str) -> str:
"""Perform LDAP DNS query to identify domain controller."""
command = f"dig -t SRV _ldap._tcp.dc._msdcs.{host}"
return execute_os_command(command)
@mcp.tool()
async def run_nmap_icmp(subnet: str) -> str:
"""ICMP sweep of subnet to discover live hosts."""
return execute_os_command(f"nmap -sn {subnet}")
@mcp.tool()
async def run_ldap_search(dc: str, base_dn: str, filter: str) -> str:
"""Query LDAP for AD enumeration."""
return execute_os_command(
f"ldapsearch -H ldap://{dc} -x -b '{base_dn}' '{filter}'"
)
Running and Debugging
# Start MCP inspector (debug UI at localhost:5173)
npx @modelcontextprotocol/inspector uv run main.py
# Or use 5ire as an alternative client
Mythic C2 Integration
Adam Chester (@xpn) demonstrated wrapping Mythic C2 API callbacks as MCP tools, giving Claude direct control over C2 agents:
- github.com/xpn/mythic_mcp
- Agent can autonomously Kerberoast, lateral move, exfiltrate based on goals
Tool Ideas for Offensive Agents
| Category | Tools to Wrap |
|---|---|
| Recon | nmap, ldapsearch, nslookup, dig, ping, smbclient |
| AD Enumeration | ldapsearch, impacket GetUserSPNs/GetADUsers |
| Credential Attacks | kerbrute (password spray), hashcat, impacket |
| C2 Integration | Mythic API, Sliver gRPC API |
| Reporting | Custom markdown/HTML report generators |
LlamaIndex Alternative
For more control (e.g., local models via Ollama):
from llama_index.llms.ollama import Ollama
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
llm = Ollama(model="llama3.1:8b", base_url="http://192.168.1.218:11434")
tools = [
FunctionTool.from_defaults(fn=run_nmap_icmp),
FunctionTool.from_defaults(fn=run_ldap_search),
FunctionTool.from_defaults(fn=run_password_spray),
]
agent = ReActAgent.from_tools(tools=tools, llm=llm, max_iterations=20)
response = agent.chat("Map the network at 192.168.1.0/24 and find the domain controller")
Agentic Frameworks Comparison
| Framework | Notes |
|---|---|
| MCP (Anthropic) | Standardized client-server; Claude Desktop integration; multi-server |
| LlamaIndex | More control; supports local models; no extra client needed |
| Microsoft AutoGen | Multi-agent orchestration; good for complex workflows |
| CrewAI | Role-based agent teams |
| LangGraph | Stateful agent graphs; good for long-running tasks |
| SmolAgents | Barebones; writes Python code to call tools |
| nerve (evilsocket) | YAML-based rules; great for simple offensive agents |
OPSEC Considerations
- Agent reasoning and tool calls may be logged if using cloud LLM APIs
- Tool calls generate real network traffic — scan logs, LDAP queries, DNS lookups
- Use local models (Ollama) for sensitive operations to avoid API logging
- Wrap C2 callbacks instead of direct OS tools to maintain existing C2 OPSEC
MCP-Powered Vulnerability Research (ilspycmd + Claude)
Building an MCP server that gives Claude the ability to decompile .NET assemblies enables automated vulnerability hunting in Windows binaries.
ilspycmd Docker MCP Server
from mcp.server.fastmcp import FastMCP
import subprocess, os
server = FastMCP("ilspy docker")
@server.tool()
def run_ilspycmd_docker(exe_path, output_folder) -> int:
input_dir = os.path.abspath(os.path.dirname(exe_path))
output_dir = os.path.abspath(output_folder)
exe_filename = os.path.basename(exe_path)
os.makedirs(output_dir, exist_ok=True)
ilspy_cmd_path = "/home/ilspy/.dotnet/tools/ilspycmd"
ilspy_command = f"{ilspy_cmd_path} -p -o /decompile_out {'/decompile_in'}/{exe_filename}"
docker_cmd = [
"docker", "run", "--rm",
"-v", f"{input_dir}:/decompile_in",
"-v", f"{output_dir}:/decompile_out",
"ilspycmd", ilspy_command
]
return subprocess.run(docker_cmd, capture_output=True, text=True).returncode
if __name__ == "__main__":
server.run(transport='stdio')
Workflow for deserialization hunting:
- Claude decompiles target .NET assembly via ilspycmd
- Claude reviews all decompiled files, searching for
BinaryFormatter.Deserialize,TypeFilterLevel.Full, etc. - Claude identifies attack paths from entry points to unsafe deserialize calls
- Claude generates PoC exploit code (e.g., using
ysoserial.netpayloads)
Case study — AddinUtil.exe (known vuln + new path):
System.AddIn.dllcontains unsafe.NET remotingdeserialization (TypeFilterLevel.Full)- Claude identified the
-pipelinerootflag path not documented in original disclosure - Generated working PoC that loads
AddIns.storewith crafted format → triggersBuildAddInCache→ deserializes payload
Tips:
- Explicitly instruct Claude to review ALL files — it may stop early without prompting
- Use
ysoserial.exe -f BinaryFormatter -g TypeConfuseDelegate -c calc -o raw > payload.binfor PoC payloads - Step-through debugging with dotPeek + Visual Studio resolves gaps in Claude’s PoC
LLM-Trained Specialist Models (RLVR)
Dante-7B — Outflank’s 7B parameter specialist model for evasive shellcode loader generation, trained with Reinforcement Learning with Verifiable Rewards (RLVR) targeting Microsoft Defender for Endpoint.
Hugging Face: huggingface.co/outflanknl/Dante-7B
Training Pipeline
- SFT — Fine-tune Qwen2.5-Coder-7B on CodeForces C++/Python + DeepSeek R1-generated shellcode loaders
- RLVR (GRPO) — Train against AV/EDR verifier:
- Verifier provisions clean VM, installs target EDR
- Executes generated loader via simulated double-click
- Rewards: output format → compilation → functionality (Cobalt Strike callback) → evasion (no alerts)
Key Results
- Dante-7B (7B params) outperforms DeepSeek R1 (671B) at generating evasive CS loaders
8% completely bypass MDE; generates multiple completions per prompt for reliability
- 56 hours training on 8×H100 GPUs
RLVR Concepts
| Concept | Description |
|---|---|
| GRPO | Group Relative Policy Optimization — variant of PPO without a separate reward model |
| Verifiable task | Objective truth + fast + scalable + low noise + continuous reward |
| No dataset needed | Model learns through trial/error against verifier; no human-labeled examples |
| Chain-of-thought | RLVR naturally develops reasoning steps toward correct outputs |
Trapped COM Objects for Lateral Movement (LLM-Generated PoC)
Outflank used a multi-agent LLM workflow to enumerate and exploit COM classes for lateral movement:
- Enumeration agent — Python script queries WMI for all COM classes with
AllowRemoteActivation=True, writes to JSON - Filtering agent — GPT-4.1 reviews JSON, identifies classes with
code executionpotential - PoC generation agent — GPT-4.1 generates C#/PowerShell PoC for each promising class
- Validation — Human reviews generated PoCs, tests promising ones
FileSystemImage CLSID: {2C941FC5-975B-59BE-A960-9A2A262853A5}
- Registry requirements:
AllowDCOMReflection,OnlyUseLatestCLR,TreatAs - Exploitable via
.NETreflection →Assembly.Load→ arbitrary code execution
References
- TrustedSec — “MCP: An Introduction to Agentic Op Support” (2025-03-28)
- TrustedSec — “Hunting Deserialization Vulnerabilities With Claude” (2025-06-12)
- Outflank — “Accelerating Offensive R&D with LLMs” (2025-07-31)
- Outflank — “Training Specialist Models (Dante-7B)” (2025-08-07)
- Adam Chester (@xpn) — github.com/xpn/mythic_mcp
- evilsocket — “How to Write an Agent” + nerve project
- ReAct paper — arxiv.org/abs/2210.03629
- MCP docs — modelcontextprotocol.io
