An agent without tools is just a chatbot. Claude can write beautiful prose about how to query a database, but it can't actually query one unless you give it a tool to do so. That much is obvious. What's less obvious — and what took me longer to figure out than I'd like to admit — is that the hard part of tool design has almost nothing to do with code.
The hard part is communication.
Writing for a reader that doesn't read
When you build a tool for an agent, you're writing an interface. Not a visual interface for humans — a textual interface for a language model. The model sees three things: the tool's name, its description, and its parameter schema. That's it. If the model can't figure out when and how to use your tool from those three pieces of information, the implementation behind them doesn't matter. It could be the most elegant code you've ever written. The agent will still use it wrong, or not at all.
I didn't understand this at first. I wrote my early tools as if for a colleague — someone who'd read the surrounding code, understand the architecture, infer context from variable names. Models don't do that. They pattern-match on descriptions. The description is the interface.
Here's what this looks like in practice. I build tools for a tabletop RPG game master agent — it manages characters, locations, dice rolls, scene planning. When I first wrote the entity retrieval tools, I had a single tool called something like entity_ops with an operation parameter that accepted get, update, delete, query. Seemed clean from a code perspective. One handler, one routing function, nice and tidy.
The agent couldn't figure out when to use it. Or rather, it could — sometimes. It would call entity_ops with operation: "get" when it should have been querying, or it would pass query parameters to a get operation. The tool was clear to me because I knew what all the operations did. The model had to infer the right operation from a single description string that covered four different behaviors.
So I split it. get_entity, get_component, update_component, query_entities, add_component, remove_component. Six tools instead of one. Each does exactly one thing. Each has a description that says exactly what that thing is.
The agent immediately got better at using them.
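Roughly what that split looks like at the definition level — the descriptions here are paraphrased sketches rather than the exact strings, but the shape is the point: one name, one behavior, one sentence that says when to use it.

```typescript
// Before: one tool, four behaviors, one description string to cover them all.
const entityOps = {
  name: "entity_ops",
  description: "Perform operations on entities: get, update, delete, or query.",
  // ...plus a schema with an `operation` parameter the model has to guess at
};

// After: six tools, each with a single responsibility.
const entityTools = [
  { name: "get_entity", description: "Retrieve a single entity by its ID, including all of its components." },
  { name: "get_component", description: "Retrieve one component from an entity by component key." },
  { name: "update_component", description: "Update or add a component to an entity. Existing data is merged; input is validated against the component schema." },
  { name: "query_entities", description: "Search for entities matching the given filters." },
  { name: "add_component", description: "Add a new component to an entity. Fails if the component already exists." },
  { name: "remove_component", description: "Remove a component from an entity by component key." },
];
```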
The single-responsibility insight
This isn't a novel software engineering principle. Single responsibility has been around forever. But it hits differently when your consumer is a language model instead of a human developer.
A human developer can read documentation, look at type signatures, check examples, run the code and see what happens. They can hold a complex multi-purpose API in their head because they understand it structurally. A model encounters each tool invocation as a fresh decision: given the current conversation, which tool should I call, with what parameters? The simpler that decision, the more reliably it makes the right one.
A tool called roll_dice with a description that says "Roll dice with Shadowdark rules" — the model knows exactly when to reach for that. A tool called game_mechanics with a description that says "Various game mechanics operations" — that's useless. Not because it's any less capable, but because the model can't confidently decide when this is the right tool versus some other tool.
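For concreteness, this is the entire surface the model sees for that first tool, in roughly the shape Claude's tool-use API expects: a name, a description, and a JSON Schema for the input. The parameters here are illustrative, not my exact schema; everything behind them is invisible to the model.

```typescript
const rollDice = {
  name: "roll_dice",
  description: "Roll dice with Shadowdark rules.",
  input_schema: {
    type: "object",
    properties: {
      notation: {
        type: "string",
        description: 'Dice notation to roll (e.g., "1d20", "2d6+3").',
      },
      advantage: {
        type: "boolean",
        description: "Roll twice and keep the higher result.",
      },
    },
    required: ["notation"],
  },
};
```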
The trade-off is real, though. More tools means more options for the model to evaluate on every turn. There's a point where you have so many tools that the model starts getting confused by the sheer number of choices rather than by the ambiguity of any individual tool. I haven't found a hard limit, but somewhere around 15-20 tools per agent I start noticing degradation. The fix is scoping agents — not stuffing every capability into one agent, but having focused agents with focused tool sets.
Descriptions over implementation
This is the part that feels counterintuitive if you come from a software background. The implementation of a tool — the actual code that runs when it's called — is the easy part. The description is where you succeed or fail.
I've started writing tool descriptions the way I'd write documentation for someone who's never seen the codebase. Not terse, not clever. Explicit. When I write a tool for updating a component on an entity, the description says: "Update or add a component to an entity. If the component exists, new data is merged into it. Validates component against schema before saving." That's not a sentence I'd write in a comment for myself. It's a sentence written for a reader that needs to understand exactly what will happen before deciding whether to call this function.
Parameter descriptions matter just as much. entity_id described as just string is less useful than entity_id described as "The unique identifier of the entity to retrieve." And for parameters where the model might not know what values are valid, enumerating the options in the description changes everything. Instead of component_key: string, it's component_key described as 'Component key (e.g., "health", "clock", "abilities", "description")'. Those examples aren't for humans — they're for the model to pattern-match on.
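In schema form, those descriptions end up looking something like this — a sketch for get_component, with the structure simplified:

```typescript
const getComponentSchema = {
  type: "object",
  properties: {
    entity_id: {
      type: "string",
      // Not just "string": say what the string identifies.
      description: "The unique identifier of the entity to retrieve.",
    },
    component_key: {
      type: "string",
      // The example values give the model something concrete to pattern-match on.
      description:
        'Component key (e.g., "health", "clock", "abilities", "description").',
    },
  },
  required: ["entity_id", "component_key"],
};
```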
This reframe changed how I think about tool development entirely. I spend more time on descriptions than on handlers now. The handler is just code. The description is the thing that determines whether the handler ever gets called correctly.
Error messages as instructions
My first tools returned errors like { "error": "failed" } or at best { "error": error.message }. The raw error from whatever operation had crashed. Useful for debugging. Useless for an agent.
When a tool returns an error to Claude, the model has to decide what to do next. Retry? Try a different approach? Give up and tell the user? The error message is the only information it has to make that decision. "Failed" gives it nothing. "ECONNREFUSED" gives it something only if the model happens to know what that means in context.
What works: actionable error messages that tell the agent what went wrong and what to try. "entity_id must be a non-empty string" is better than "invalid input." "Component 'health' already exists on entity 'thorin'" is better than "duplicate key error." The agent can read those messages and adjust its next action. That's the point.
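The shift in code is small. A sketch of the difference — the suggested next step in the second message is illustrative, but it's the kind of thing the agent can actually act on:

```typescript
// Before: the raw failure, passed straight through.
return { error: "duplicate key error" };

// After: what went wrong, plus a sensible next move for the agent.
return {
  error:
    "Component 'health' already exists on entity 'thorin'. " +
    "Use update_component to modify it instead.",
};
```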
I've learned to think of error responses as instructions to the agent's next turn. Not "here's what broke" but "here's what you should do differently." It's the same communication problem as descriptions, just applied to failure cases.
Composition beats complexity
The instinct — especially early on — is to build sophisticated tools that handle complex workflows. A tool that fetches data, cleans it, analyzes it, generates a report. One tool call, everything handled.
This breaks for the same reason multi-purpose tools break: the model can't predict what will happen. And when something fails at step three of a five-step pipeline, the error message has to somehow communicate which step failed and what to do about it, which gets complicated fast.
What works better: small tools that compose. Let the agent be the orchestrator. It calls get_entity to check current state, then update_component to change what needs changing, then query_entities to verify the result. Three tool calls instead of one. But the agent understands each step, can handle errors at each step, and can adapt its approach mid-sequence.
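A typical sequence from the game master agent, reduced to its tool calls — parameter names beyond entity_id and component_key are illustrative:

```typescript
const turn = [
  // 1. Check current state before changing anything.
  { name: "get_entity", input: { entity_id: "thorin" } },
  // 2. Make the one change that's needed; new data is merged into the
  //    existing component and validated before saving.
  {
    name: "update_component",
    input: { entity_id: "thorin", component_key: "health", data: { current: 4 } },
  },
  // 3. Verify the result. If any step fails, the agent can adjust here
  //    instead of losing the whole pipeline.
  { name: "query_entities", input: { filter: { component_key: "health" } } },
];
```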
This feels like it should be slower or less efficient, and sometimes it is. But reliability matters more than speed for agent workflows. A three-step sequence that works every time beats a one-step black box that works 80% of the time and fails inscrutably the other 20%.
What this actually is
The deeper realization — the one that reframed everything for me — is that tool design for agents is closer to writing documentation than writing code. You're describing capabilities in natural language for a consumer that processes natural language. The code behind the description is important, obviously. But the description is the product. The code is the implementation detail.
This makes tool design a weirdly humanistic discipline for something so technical. You're doing empathy work — trying to anticipate how a language model will interpret your words, what it will find ambiguous, where it will make wrong assumptions. You're writing for a reader. The reader just happens to be a statistical model.
I don't think we've figured out the best practices for this yet. The tools I build today are significantly better than what I was building six months ago, and I expect the tools I build six months from now will make these look crude. The models are getting better at interpreting tools, which means some of the workarounds we use today will become unnecessary. But the core insight — that tool design is a communication problem — I don't think that changes. If anything, as agents take on more complex tasks with more tools, clear communication between the tool designer and the model becomes more important, not less.
The interesting question is what happens when models start designing their own tools. When the reader and the writer are the same entity, does the communication problem dissolve? Or does it just move somewhere else?