Security & Resilience
Skill 9: Agentic Security and Adversarial Resilience
The security foundation for the age of autonomous agents.
Overview
Skill 9 is a new and critical skill dedicated to the unique security challenges of agentic AI systems. As agents gain autonomy, access to powerful tools, and the ability to take actions in the real world, they become high-value targets for novel attack vectors that don't exist in traditional software systems.
This skill addresses the foundational discipline of securing agentic systems against threats like prompt injection, data poisoning, excessive agency, and insecure output handling—threats that are unique to the agentic paradigm and require a fundamentally different security mindset.
The Three Sub-Skills
| Sub-Skill | Focus Area | Key Concepts |
|---|---|---|
| 9.1 OWASP Top 10 for Agentic Apps | Understanding and mitigating critical threats | Prompt injection, excessive agency, data poisoning |
| 9.2 Guardrails and Safety Layers | Implementing defense-in-depth security | Input/output guardrails, action confirmation |
| 9.3 Adversarial Testing | Proactive vulnerability identification | Automated testing, red team exercises |
9.1 The OWASP Top 10 for Agentic Applications
Prompt Injection Attacks
- Core Threat: Malicious inputs that manipulate agent behavior by overriding system instructions
- Direct injection: Attacker provides malicious user input ("Ignore previous instructions...")
- Indirect injection: Malicious content hidden in retrieved documents or tool outputs
Defenses:
- Input sanitization and detection
- Instruction hierarchy (system prompts that cannot be overridden)
- Output filtering for behavior deviation
- Prompt shields (Microsoft Prompt Shields, Lakera Guard)
Insecure Output Handling
- Core Threat: Agents generate outputs containing sensitive information or malicious content
- Agents may leak PII, API keys, or internal data
- Generated code could be harmful if executed
Defenses:
- Output validation with sensitive pattern scanning
- PII detection and redaction
- Code sandboxing for generated code
- Content filtering
Excessive Agency
- Core Threat: Agents with too many permissions perform unintended or harmful actions
- An agent with excessive agency can cause significant damage if compromised
Defenses:
- Least privilege (minimum necessary permissions)
- Human-in-the-loop for high-risk actions
- Action confirmation before destructive operations
- Hard permission boundaries
Data Poisoning
- Core Threat: Attackers inject malicious data into knowledge bases or memory
- Targets RAG systems, vector databases, training data
Defenses:
- Data validation and source verification
- Anomaly detection
- Provenance tracking
- Cross-reference with trusted sources
9.2 Guardrails and Safety Layers
Input Guardrails
Purpose: Scan all inputs for malicious content before they enter the agent's context.
Technical Implementation:
- Pattern matching (regex-based detection)
- ML classifiers trained on adversarial inputs
- Semantic analysis for deviation detection
- Prompt shields (Lakera Guard, Microsoft Prompt Shields)
Frameworks:
- NeMo Guardrails (NVIDIA)
- Guardrails AI (Python validation)
- LangKit (LangChain security toolkit)
Output Guardrails
Purpose: Scan all agent outputs before execution or delivery.
Technical Implementation:
- PII detection (emails, SSNs, credit cards)
- Toxicity detection
- Policy enforcement
- Hallucination detection
Action Confirmation and Human-in-the-Loop
Purpose: Require explicit confirmation before high-risk actions.
Implementation:
- Risk scoring to classify actions
- Approval workflows for high-risk operations
- Break-glass emergency procedures
- Comprehensive audit trails
9.3 Adversarial Testing and Red Teaming
Automated Adversarial Testing
Purpose: Systematically probe agents for vulnerabilities.
Tools and Frameworks:
- Garak: LLM vulnerability scanner
- PyRIT: Microsoft's Python Risk Identification Toolkit
- Promptfoo: Red teaming platform for LLMs
- Fuzzing tools for edge case discovery
Testing Scenarios:
- Prompt injection attempts
- Jailbreaking (bypassing safety filters)
- PII extraction
- Excessive agency exploitation
Red Team Exercises
Purpose: Security experts attempt to compromise the system before attackers do.
Process:
- Reconnaissance: Understand capabilities and attack surface
- Initial access: Find compromise vectors
- Privilege escalation: Expand access
- Lateral movement: Compromise other systems
- Exfiltration: Extract sensitive data
- Report: Document findings and remediation
Transferable Competencies
Mastering Skill 9 requires proficiency in:
- Threat Modeling: Identifying attack surfaces and threat actors
- Adversarial AI: Understanding adversarial machine learning
- Security Engineering: Defense-in-depth architectures
- Red Teaming: Thinking like an attacker
- Incident Response: Detecting and responding to incidents
- Compliance: GDPR, HIPAA, SOC2 requirements
Common Pitfalls
- No input validation: Allowing untrusted input directly into agent context
- Excessive permissions: Granting more access than necessary
- No output filtering: Allowing agents to leak sensitive information
- Ignoring indirect injection: Only protecting against direct user input
- No human-in-the-loop: Autonomous high-risk actions
- Weak guardrails: Easily bypassed protections
- No adversarial testing: Deploying without security validation
- Trusting external data: Not validating RAG or tool data
- No incident response plan: Unprepared for security incidents
Key Technologies
Security Standards
- OWASP Top 10 for LLM Applications
- OWASP Top 10 for Agentic Applications 2026
- NIST AI Risk Management Framework
- ISO/IEC 42001
Guardrail Frameworks
- NeMo Guardrails (NVIDIA)
- Guardrails AI
- LangKit
- Lakera Guard
Adversarial Testing Tools
- Garak (vulnerability scanner)
- PyRIT (Microsoft)
- Promptfoo
The Bottom Line
Skill 9 is the security foundation for the age of autonomous agents. Agentic systems face unique threats that don't exist in traditional software—prompt injection, data poisoning, excessive agency—and require a fundamentally different security mindset.
By understanding the OWASP Top 10 for Agentic Applications, implementing robust guardrails and safety layers, and conducting continuous adversarial testing, organizations can build agentic systems that are not only powerful and autonomous but also secure, resilient, and trustworthy.
← Back to Nine Skills Framework | Back to Tool Engineering →