Tool Engineering

Semantic Capability and Tool Engineering

Skill 8 of 9 | Pillar III: Trust & Security

The interface design discipline for the age of agents—creating tools that are discoverable, understandable, and safe.

The Agent's User Interface

Here's a perspective shift that separates amateur agent builders from professionals: tools are the user interface for agents. Just as poorly designed UI frustrates human users, poorly designed tools frustrate agents—leading to errors, inefficiency, and failed tasks.

Skill 8 represents the critical discipline of designing, building, and managing the tools that agents use to interact with the world. Agents extend their capabilities through tools—functions, APIs, and services that allow them to perform actions beyond language generation. The quality of these tools is a primary determinant of agent performance.

Consider the difference: a tool with a vague description like "process data" gives the agent no guidance on when to use it or what to expect. A tool with a precise description—"Convert CSV file to JSON format, preserving column headers as keys"—tells the agent exactly what it does, when it's appropriate, and what the output will look like.

This skill addresses the art and science of creating tools that are discoverable (agents can find them), understandable (agents know when and how to use them), and safe (agents can use them without causing harm). Master these principles, and your agents will work with surgical precision. Ignore them, and you'll spend endless hours debugging why your agent chose the wrong tool or passed invalid parameters.

The Four Sub-Skills of Tool Engineering

Sub-Skill	Focus Area	Key Concepts
8.1 Function Calling and Tool Definitions	Designing clear schemas and robust error handling	JSON Schema, OpenAPI, structured errors
8.2 Dynamic Tool Discovery and Composition	Enabling agents to find and chain tools	Tool registries, semantic search, composition
8.3 Tool UX Design for Agents	Crafting descriptions for semantic usability	Clear descriptions, examples, semantic altitude
8.4 Agent Skills Standard and Progressive Disclosure	Modular, scalable tool packaging	SKILL.md format, progressive disclosure

8.1 Function Calling and Tool Definitions

Modern LLMs support function calling (also called tool use), where the model outputs a structured request to invoke a function with specific parameters. The quality of your tool definitions directly impacts how accurately agents use them.

Designing Clear Tool Schemas

Core Principle: Every tool must have a precise schema defining its inputs, outputs, and behavior.

A tool schema is typically defined using JSON Schema or OpenAPI. It specifies the function name, description, parameters (with types, constraints, and descriptions), and return value. Ambiguous or incomplete schemas lead to agent errors.

Example JSON Schema:

{
  "name": "get_weather",
  "description": "Get current weather for a location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "City name, e.g., 'San Francisco'"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Temperature unit"
      }
    },
    "required": ["location"]
  }
}

Best Practices:

Use descriptive parameter names that convey meaning
Provide examples in descriptions (e.g., "City name, e.g., 'San Francisco'")
Specify constraints using enums, ranges, and patterns
Mark required parameters explicitly
Document default values for optional parameters

Implementing Robust Error Handling

Core Principle: Tools must return structured error messages that agents can understand and act upon.

A tool that returns a cryptic error code or stack trace is useless to an agent. Error responses should be structured (JSON), include a human-readable message, an error code, and guidance on how to fix the problem.

Structured Error Response:

{
  "error": {
    "code": "MISSING_PARAMETER",
    "message": "Required parameter 'location' is missing",
    "parameter": "location",
    "suggestion": "Provide a city name, e.g., 'San Francisco'"
  }
}

Benefits: Enables agents to self-correct without human intervention, reduces retry loops, and improves user experience.

Function Calling Protocols

Different LLM providers have different function calling implementations:

OpenAI: Uses tools parameter with function type; model returns tool_calls in response
Anthropic: Uses tools parameter; model returns tool_use content blocks
Google: Uses function_declarations; model returns function_call

Implementation Considerations: Handle multi-turn function calling (agent calls tool, receives result, continues reasoning), parallel function calls (multiple tools invoked simultaneously), function call validation, and graceful error recovery.

8.2 Dynamic Tool Discovery and Composition

In large ecosystems, it's impossible to hardcode all tools into the agent's prompt. Agents must discover tools dynamically based on the task at hand.

Tool Registries and Catalogs

Core Principle: A searchable catalog of available tools, indexed by capability and domain.

A tool registry is a database of available tools with metadata: name, description, category, tags, version, schema. Agents query the registry to find tools that match their current needs.

Registry Metadata:

Name and description
Category (e.g., "data processing", "communication", "finance")
Tags (e.g., "weather", "API", "database", "file-system")
Version and deprecation status
Usage statistics and ratings
Authentication requirements

Implementation: Use a database (PostgreSQL, MongoDB) or specialized tool registry service. Implement search APIs with filtering by category, tags, and semantic similarity.

Semantic Search for Tools

Core Principle: Agents find tools by describing what they need in natural language.

Using embedding-based search, agents can query the tool registry with natural language (e.g., "I need to convert currency"). Tool descriptions are embedded into vectors, and similarity search finds the most relevant tools.

Technical Implementation:

Embed tool descriptions using an embedding model (OpenAI embeddings, sentence transformers)
Store embeddings in a vector database (Pinecone, Weaviate, Qdrant, Chroma)
At query time, embed the agent's query
Retrieve top-k similar tools based on cosine similarity
Present tools to agent for selection

Benefits: Intuitive tool discovery, handles synonyms and paraphrasing, scales to thousands of tools without overwhelming the agent's context.

Tool Composition and Chaining

Core Principle: Advanced agents chain multiple tools together to solve complex problems.

Tool composition means using the output of one tool as the input to another. For example: search_documents() → extract_entities() → lookup_entity_info(). Tools must be designed with composable interfaces (standardized input/output formats).

Design Patterns:

Pipeline: Sequential tool execution where each tool's output feeds the next
Fan-out/Fan-in: Parallel tool execution with result aggregation
Conditional: Tool selection based on intermediate results
Iterative: Repeated tool calls until a condition is met

Implementation Requirements: Define standard data formats (JSON), implement tool orchestration logic, handle errors in multi-step workflows, provide rollback mechanisms for failed chains.

8.3 Tool UX Design for Agents

The "user interface" for an agent is the tool description and schema. Poor UX leads to agent confusion and errors—just as poor UI leads to human frustration.

Writing Clear, Unambiguous Descriptions

Core Principle: Explain not just what the tool does, but when to use it and what the inputs/outputs mean.

A good tool description includes:

What: What does the tool do?
When: When should the agent use it?
Inputs: What do the parameters mean? What format?
Outputs: What will the tool return? What format?
Constraints: What are the limitations?

Example:

Tool: get_weather
Description: Retrieves current weather conditions for a specified location. 
Use this when the user asks about current weather, temperature, or conditions.
Input: location (string) - City name or "City, Country" format, e.g., "Paris, France"
Input: unit (string) - Temperature unit, either "celsius" or "fahrenheit". Defaults to celsius.
Output: JSON object with temperature, conditions, humidity, wind speed.
Limitations: Only provides current weather, not forecasts. Requires internet connection.

Providing Tool Examples

Core Principle: Include example invocations to guide the agent.

Examples are the fastest way for agents to learn how to use a tool. Provide multiple examples covering common use cases and edge cases.

Example Invocations:

Example 1: Get weather in Celsius
Input: {"location": "Tokyo", "unit": "celsius"}
Output: {"temperature": 18, "conditions": "Partly cloudy", "humidity": 65}

Example 2: Get weather in Fahrenheit
Input: {"location": "New York", "unit": "fahrenheit"}
Output: {"temperature": 72, "conditions": "Sunny", "humidity": 45}

Example 3: Location with country specification
Input: {"location": "London, UK", "unit": "celsius"}
Output: {"temperature": 12, "conditions": "Rainy", "humidity": 85}

Benefits: Accelerates agent learning, reduces trial-and-error, improves first-call success rate dramatically.

Optimizing Semantic Altitude

Core Principle: Balancing specificity and flexibility in tool descriptions.

Semantic altitude refers to the level of abstraction in a tool description. Too specific (low altitude) and the tool is inflexible; too general (high altitude) and the agent doesn't know when to use it.

Low Altitude (too specific): "Get weather in San Francisco in Fahrenheit"
Optimal Altitude: "Get current weather for any location in specified units"
High Altitude (too general): "Get information about something"

Best Practice: Start with medium altitude, adjust based on agent performance. Use A/B testing to find the optimal altitude for each tool.

8.4 Agent Skills Standard and Progressive Disclosure

As tool ecosystems grow, managing hundreds or thousands of tools becomes a challenge. The Agent Skills Standard provides a blueprint for packaging tools in a modular, scalable way.

The Agent Skills Standard

Core Principle: Modular skill definitions with progressive disclosure to prevent context overload.

Each skill is a directory containing:

SKILL.md: The interface—skill name, brief description, when to use it (always loaded)
Supporting resources: Detailed documentation, schemas, examples (loaded only when needed)

Directory Structure:

skills/
  weather/
    SKILL.md          # Interface (always loaded)
    detailed_docs.md  # Loaded on demand
    schema.json       # Loaded on demand
    examples/         # Loaded on demand
  currency_conversion/
    SKILL.md
    detailed_docs.md
    schema.json
    examples/

Progressive Disclosure

Core Principle: Load information just-in-time to preserve context window.

Progressive disclosure means agents initially see only skill names and brief descriptions (low context cost). Only when a skill is deemed relevant does the agent load the full documentation (high context cost).

The Flow:

Agent sees list of available skills (names and one-line descriptions)
Agent identifies relevant skills based on task
Agent loads full documentation for selected skills only
Agent uses loaded skills to complete task

Benefits: Reduces context window usage dramatically, scales to hundreds of skills without performance degradation, improves agent focus by filtering irrelevant tools.

Real-World Success and Failure Scenarios

Success: Enterprise Tool Ecosystem

Scenario: Financial services firm builds tool registry with 500+ tools.
Impact: Agents can dynamically discover and use tools, 10x capability expansion.
Implementation: Semantic search with embeddings, Agent Skills standard, progressive disclosure.
Outcome: Agents handle complex workflows without hardcoded tool lists.

Failure: Ambiguous Tool Schema

Scenario: Tool with vague parameter description causes agent to pass wrong data type.
Impact: 30% error rate, poor user experience, manual intervention required.
Root Cause: Schema said "date" without specifying format (ISO 8601? Unix timestamp? MM/DD/YYYY?).
Mitigation: Precise schema with format examples, input validation, structured error messages.

Success: Tool Composition for Complex Workflows

Scenario: Legal research agent chains document search, entity extraction, and case law lookup.
Impact: 5x faster research, comprehensive analysis, high accuracy.
Implementation: Composable tool interfaces, standardized JSON formats, orchestration logic.
Outcome: Agents solve complex problems without human intervention.

The Principle-Based Transformation

From Ad-Hoc Tool Integration...

Poorly documented APIs with no examples
Ambiguous function signatures that confuse agents
Cryptic error messages that provide no guidance
No tool discovery mechanism—hardcoded lists
No versioning—breaking changes break agents

To Systematic Tool Engineering...

Understanding tools as the agent's user interface
Mastering semantic usability and discoverability principles
Applying software engineering best practices to tool design
Building ecosystems of composable, reusable tools with progressive disclosure

Transferable Competencies

Mastering tool engineering builds expertise in:

API Design: RESTful design, OpenAPI, JSON Schema, versioning strategies
Interface Design: Usability principles, documentation, examples, error handling
Semantic Engineering: Natural language understanding, embedding-based search
Software Engineering: Modularity, composability, error handling, testing
Information Architecture: Categorization, tagging, metadata design

Common Pitfalls to Avoid

Ambiguous schemas: Vague parameter descriptions lead to agent errors
Poor error messages: Cryptic errors prevent agent self-correction
No examples: Agents struggle to learn tool usage without examples
Wrong semantic altitude: Too specific or too general descriptions
No tool discovery: Hardcoded tool lists don't scale
Non-composable tools: Incompatible input/output formats prevent chaining
Context overload: Loading all tool documentation upfront wastes context
No versioning: Breaking changes break existing agents
Ignoring usability: Treating tools as technical APIs rather than agent interfaces

Implementation Guidance

For Tool Engineers: Design clear JSON Schema specifications for all tools. Implement structured error handling with actionable messages. Write comprehensive tool descriptions with when-to-use guidance. Provide multiple examples covering common and edge cases. Test tools with actual agents to identify usability issues.

For Architects: Design tool registry architecture with semantic search capabilities. Define tool categorization and tagging standards. Plan for tool composition and orchestration. Adopt the Agent Skills standard for modular packaging. Establish tool quality and governance processes.

For Platform Teams: Build tool registry infrastructure with APIs. Implement embedding-based semantic search. Create tool development SDKs and templates. Monitor tool usage and performance metrics. Maintain tool documentation standards.

Looking Forward

The field is evolving toward:

AI-Generated Tools: LLMs generating tools from natural language descriptions
Self-Extending Agents: Agents that create new tools to fill capability gaps
Tool Learning: ML models that learn optimal tool usage patterns from experience
Universal Tool Standards: Industry-wide standards for tool definitions (OASF evolution)
Multi-Modal Tools: Tools that seamlessly handle text, images, audio, and video
Tool Marketplaces: Thriving ecosystems of third-party tools for agents

Next Skill: Security & Resilience — Protecting agents against adversarial threats.

Back to: The Nine Skills Framework | Learn

Subscribe to the Newsletter → for weekly insights on building production-ready AI systems.