Security & Resilience

Security & Resilience

Skill 9: Agentic Security and Adversarial Resilience

The security foundation for the age of autonomous agents.


Overview

Skill 9 is a new and critical skill dedicated to the unique security challenges of agentic AI systems. As agents gain autonomy, access to powerful tools, and the ability to take actions in the real world, they become high-value targets for novel attack vectors that don't exist in traditional software systems.

This skill addresses the foundational discipline of securing agentic systems against threats like prompt injection, data poisoning, excessive agency, and insecure output handling—threats that are unique to the agentic paradigm and require a fundamentally different security mindset.


The Three Sub-Skills

Sub-Skill Focus Area Key Concepts
9.1 OWASP Top 10 for Agentic Apps Understanding and mitigating critical threats Prompt injection, excessive agency, data poisoning
9.2 Guardrails and Safety Layers Implementing defense-in-depth security Input/output guardrails, action confirmation
9.3 Adversarial Testing Proactive vulnerability identification Automated testing, red team exercises

9.1 The OWASP Top 10 for Agentic Applications

Prompt Injection Attacks

  • Core Threat: Malicious inputs that manipulate agent behavior by overriding system instructions
  • Direct injection: Attacker provides malicious user input ("Ignore previous instructions...")
  • Indirect injection: Malicious content hidden in retrieved documents or tool outputs

Defenses:

  • Input sanitization and detection
  • Instruction hierarchy (system prompts that cannot be overridden)
  • Output filtering for behavior deviation
  • Prompt shields (Microsoft Prompt Shields, Lakera Guard)

Insecure Output Handling

  • Core Threat: Agents generate outputs containing sensitive information or malicious content
  • Agents may leak PII, API keys, or internal data
  • Generated code could be harmful if executed

Defenses:

  • Output validation with sensitive pattern scanning
  • PII detection and redaction
  • Code sandboxing for generated code
  • Content filtering

Excessive Agency

  • Core Threat: Agents with too many permissions perform unintended or harmful actions
  • An agent with excessive agency can cause significant damage if compromised

Defenses:

  • Least privilege (minimum necessary permissions)
  • Human-in-the-loop for high-risk actions
  • Action confirmation before destructive operations
  • Hard permission boundaries

Data Poisoning

  • Core Threat: Attackers inject malicious data into knowledge bases or memory
  • Targets RAG systems, vector databases, training data

Defenses:

  • Data validation and source verification
  • Anomaly detection
  • Provenance tracking
  • Cross-reference with trusted sources

9.2 Guardrails and Safety Layers

Input Guardrails

Purpose: Scan all inputs for malicious content before they enter the agent's context.

Technical Implementation:

  • Pattern matching (regex-based detection)
  • ML classifiers trained on adversarial inputs
  • Semantic analysis for deviation detection
  • Prompt shields (Lakera Guard, Microsoft Prompt Shields)

Frameworks:

  • NeMo Guardrails (NVIDIA)
  • Guardrails AI (Python validation)
  • LangKit (LangChain security toolkit)

Output Guardrails

Purpose: Scan all agent outputs before execution or delivery.

Technical Implementation:

  • PII detection (emails, SSNs, credit cards)
  • Toxicity detection
  • Policy enforcement
  • Hallucination detection

Action Confirmation and Human-in-the-Loop

Purpose: Require explicit confirmation before high-risk actions.

Implementation:

  • Risk scoring to classify actions
  • Approval workflows for high-risk operations
  • Break-glass emergency procedures
  • Comprehensive audit trails

9.3 Adversarial Testing and Red Teaming

Automated Adversarial Testing

Purpose: Systematically probe agents for vulnerabilities.

Tools and Frameworks:

  • Garak: LLM vulnerability scanner
  • PyRIT: Microsoft's Python Risk Identification Toolkit
  • Promptfoo: Red teaming platform for LLMs
  • Fuzzing tools for edge case discovery

Testing Scenarios:

  • Prompt injection attempts
  • Jailbreaking (bypassing safety filters)
  • PII extraction
  • Excessive agency exploitation

Red Team Exercises

Purpose: Security experts attempt to compromise the system before attackers do.

Process:

  1. Reconnaissance: Understand capabilities and attack surface
  2. Initial access: Find compromise vectors
  3. Privilege escalation: Expand access
  4. Lateral movement: Compromise other systems
  5. Exfiltration: Extract sensitive data
  6. Report: Document findings and remediation

Transferable Competencies

Mastering Skill 9 requires proficiency in:

  • Threat Modeling: Identifying attack surfaces and threat actors
  • Adversarial AI: Understanding adversarial machine learning
  • Security Engineering: Defense-in-depth architectures
  • Red Teaming: Thinking like an attacker
  • Incident Response: Detecting and responding to incidents
  • Compliance: GDPR, HIPAA, SOC2 requirements

Common Pitfalls

  • No input validation: Allowing untrusted input directly into agent context
  • Excessive permissions: Granting more access than necessary
  • No output filtering: Allowing agents to leak sensitive information
  • Ignoring indirect injection: Only protecting against direct user input
  • No human-in-the-loop: Autonomous high-risk actions
  • Weak guardrails: Easily bypassed protections
  • No adversarial testing: Deploying without security validation
  • Trusting external data: Not validating RAG or tool data
  • No incident response plan: Unprepared for security incidents

Key Technologies

Security Standards

  • OWASP Top 10 for LLM Applications
  • OWASP Top 10 for Agentic Applications 2026
  • NIST AI Risk Management Framework
  • ISO/IEC 42001

Guardrail Frameworks

  • NeMo Guardrails (NVIDIA)
  • Guardrails AI
  • LangKit
  • Lakera Guard

Adversarial Testing Tools

  • Garak (vulnerability scanner)
  • PyRIT (Microsoft)
  • Promptfoo

The Bottom Line

Skill 9 is the security foundation for the age of autonomous agents. Agentic systems face unique threats that don't exist in traditional software—prompt injection, data poisoning, excessive agency—and require a fundamentally different security mindset.

By understanding the OWASP Top 10 for Agentic Applications, implementing robust guardrails and safety layers, and conducting continuous adversarial testing, organizations can build agentic systems that are not only powerful and autonomous but also secure, resilient, and trustworthy.


← Back to Nine Skills Framework | Back to Tool Engineering →