Working with an agent, properly

Most developers treat an agent like a search box with code output. Type in a vague description of what you want, accept what comes back, repeat. It works. Until it doesn't, and then it really doesn't.

The developers getting consistent results from agents are not better at prompting. They follow a workflow. Six phases, in order, every time. Here they are.

Phase 1: Before the first prompt

The most common mistake is starting with the agent. The spec comes first.

Not a Jira ticket. Not a bullet list. A real document: what are you building, why, who uses it, what does done look like, what are the edge cases, what are the non-negotiable constraints. Write this yourself, in plain language, before the agent touches anything.

Then use the agent to challenge it. Give it three roles: an Architect who maps the solution before a line of code exists, a Fact Checker who interrogates the Architect's assumptions, and a Devil's Advocate who tries to break the design. This produces a better outcome in fifteen minutes than most planning meetings manage in an hour.

Once the spec survives that, break it into a sequenced task list. One concrete task per prompt, ordered by dependency. This is your prompt plan. Addy Osmani describes the whole process as "waterfall in fifteen minutes": a structured planning phase that makes the coding phase dramatically smoother.
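Ordering tasks by dependency is a topological sort. Below is a minimal Python sketch. It assumes you record each task together with the tasks it depends on. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key lists the tasks that must be finished first.
TASKS = {
    "Define the Invoice data model": set(),
    "Add the invoice repository": {"Define the Invoice data model"},
    "Expose a create-invoice endpoint": {"Add the invoice repository"},
    "Add input validation": {"Expose a create-invoice endpoint"},
    "Write tests for create-invoice": {"Expose a create-invoice endpoint"},
}


def prompt_plan(deps: dict[str, set[str]]) -> list[str]:
    """Order tasks so that every task appears after all of its dependencies."""
    # static_order() raises graphlib.CycleError if two tasks depend on each other.
    return list(TopologicalSorter(deps).static_order())


def as_checklist(tasks: list[str]) -> str:
    """Render the ordered plan as a checklist, one prompt per line."""
    return "\n".join(f"- [ ] {task}" for task in tasks)


print(as_checklist(prompt_plan(TASKS)))
```

Each line becomes one prompt. If `static_order()` raises `CycleError`, the spec contains a circular dependency, and you want to resolve that before any code is written.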

The task list doubles as session state. When a session dies mid-work, the checked items tell you exactly where to resume. The plan is your recovery mechanism as much as your planning tool.
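Suppose the plan is kept as a markdown checklist, in a file such as `PLAN.md` (the file name is an assumption). After a session dies, a few lines of Python will find the point to resume from. A sketch:

```python
import re
from typing import Optional

# Matches checklist lines such as "- [ ] Add the repository" or "* [x] Done".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.+?)\s*$")


def next_task(plan_text: str) -> Optional[str]:
    """Return the first unchecked task, i.e. where to resume, or None if done."""
    for line in plan_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            return match.group(2)
    return None


PLAN = """\
- [x] Define the Invoice data model
- [x] Add the invoice repository
- [ ] Expose a create-invoice endpoint
- [ ] Write tests for create-invoice
"""

print(next_task(PLAN))  # Expose a create-invoice endpoint
```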

The prompt is not the spec. The spec is what you write before the prompt exists.

Phase 2: The context layer

Every major agent has a context file. Claude Code uses CLAUDE.md. Cursor uses .cursorrules, or rule files under .cursor/rules in newer versions. GitHub Copilot reads repository instructions from .github/copilot-instructions.md. Windsurf has its own rules file. The name varies. The purpose is identical: persistent context the agent reads before it does anything.

This file is not optional.

Put your stack, your commands, your conventions, and @-references to your architecture documents in it. Keep it compact: three sections, one page. A context file that runs to three pages burns tokens on instructions instead of work.
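A size check that runs in CI is one way to keep the file honest. Below is a minimal sketch. It assumes a markdown context file that uses `##` headings for its sections. The budgets are rough translations of "three sections, one page", not a standard, and the sample content is hypothetical.

```python
# Rough, illustrative budgets: "one page" approximated as 60 non-blank lines,
# and sections counted as level-two markdown headings ("## ...").
MAX_LINES = 60
MAX_SECTIONS = 3


def context_file_warnings(text: str) -> list[str]:
    """Return warnings when a context file outgrows the size budget."""
    lines = [line for line in text.splitlines() if line.strip()]
    sections = [line for line in lines if line.startswith("## ")]
    warnings = []
    if len(lines) > MAX_LINES:
        warnings.append(f"{len(lines)} non-blank lines (budget {MAX_LINES})")
    if len(sections) > MAX_SECTIONS:
        warnings.append(f"{len(sections)} sections (budget {MAX_SECTIONS})")
    return warnings


# Hypothetical context file: stack, commands, conventions, one @-reference.
SAMPLE = """\
## Stack
TypeScript, Next.js, PostgreSQL
## Commands
npm test, npm run lint
## Conventions
Server actions, not API routes. Typed return values on every function.
@docs/architecture.md
"""

print(context_file_warnings(SAMPLE))  # []
```

A CI job could fail the build whenever the list is non-empty. At that point you trim the file; you do not raise the budget.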

Go further with coding guidelines: explicit patterns, with examples. Not preferences. Patterns. Agents do not pick up vibes. They follow rules you have written down. If you want server actions instead of API routes, write that down. If you want typed return values on every function, write that down.

The Stack Overflow engineering blog made this point clearly in early 2026: guidelines for agents need to be more explicit and demonstrative than guidelines for humans, because agents do not accumulate tacit knowledge from months of reading the codebase. They start fresh every session.

If your agent supports reusable skills or custom instructions for recurring tasks, define them. The upfront investment pays back on every session that uses them.

Phase 3: Enforcement

Instructions in a context file are suggestions. The agent will follow most of them most of the time. That is not good enough for the things that matter.

Hooks are deterministic. A pre-tool hook fires before the agent executes anything. It inspects the payload, allows or blocks the action, and, if it blocks, feeds the reason back to the agent. A post-tool hook fires after a tool call completes; scoped to file writes, it can run your formatter, linter, and type checker automatically.
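A pre-tool hook is usually a small script. The sketch below assumes a convention along the lines of Claude Code's hooks: the proposed call arrives as JSON on stdin with a `tool_input.command` field, exit code 2 blocks the call, and stderr is returned to the agent. Check your own tool's hook documentation for the exact payload and exit codes. The deny-list patterns are illustrative.

```python
import json
import re
import sys

# Illustrative, not exhaustive: commands that must never run without a human.
BLOCKED = [
    (re.compile(r"\brm\s+-rf\b"), "recursive force-delete"),
    (re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE), "destructive SQL"),
    (re.compile(r"\bgit\s+push\b.*--force"), "force-push"),
]


def decide(payload: dict) -> tuple[bool, str]:
    """Inspect a proposed tool call and return (allowed, reason)."""
    tool_input = payload.get("tool_input") or {}
    command = str(tool_input.get("command", ""))
    for pattern, label in BLOCKED:
        if pattern.search(command):
            return False, f"Blocked {label}: {command!r}. Ask the user to run it."
    return True, ""


def handle(raw_stdin: str) -> int:
    """Hook entry point: returns the exit code the agent tool will see."""
    allowed, reason = decide(json.loads(raw_stdin))
    if allowed:
        return 0
    # Assumed convention: exit code 2 blocks the call, and stderr is fed back to the agent.
    print(reason, file=sys.stderr)
    return 2
```

As a standalone script, the file would end with `sys.exit(handle(sys.stdin.read()))`. A pattern list like this catches the obvious cases. It is a backstop for the non-negotiables, not a sandbox.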

The principle generalises beyond any single tool. Whatever agent you use: find the enforcement layer. Lint rules. CI checks. Pre-commit hooks. ADRs committed to the repository so the agent reads your architecture decisions as constraints rather than context. The discipline lives in the system, not in the request.

If the non-negotiables live in the system, you do not have to remember to include them in every prompt. You cannot forget something structural.

Phase 4: Running the session

Chunk small. One function, one feature, one fix per task. When you give an agent a large, vague task, you get a large, vague solution. When you give it a small, specific task, you get something you can actually review and understand.

Commit after each completed chunk. Small diffs are reviewable. Large diffs get skimmed.

Context is finite. Structural tools protect it better than behavioural workarounds. Keep your context below fifty percent capacity during active sessions. When you switch to an unrelated task, start a fresh session. Do not carry the ghost of the previous conversation into a new problem.
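The fifty-percent rule and the fresh-session rule together are simple enough to write down as a function. Below is a sketch. It assumes you can read token usage and window size from your tool. The window size in the example is hypothetical.

```python
def session_advice(used_tokens: int, window_tokens: int,
                   switching_task: bool = False, threshold: float = 0.5) -> str:
    """Decide whether to keep going or start a fresh session."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    if switching_task:
        # Unrelated work gets a clean context, whatever the usage.
        return "start a fresh session"
    if used_tokens / window_tokens >= threshold:
        # Over budget: record progress in the task list, then reset.
        return "update the plan, then start a fresh session"
    return "continue"


# Hypothetical numbers: a 200,000-token window.
print(session_advice(90_000, 200_000))                       # continue
print(session_advice(120_000, 200_000))                      # update the plan, then start a fresh session
print(session_advice(10_000, 200_000, switching_task=True))  # start a fresh session
```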

For parallel work, use git worktrees. Two agents on two branches, no shared state, no conflicts. Your tool may support this natively. If not, set the worktrees up yourself and run separate sessions. The pattern is the same either way.
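Setting up worktrees by hand takes one `git worktree add` per branch. Below is a Python sketch that does this for a list of branches. The branch names, the sibling-directory layout, and the `main` base branch are assumptions.

```python
import subprocess
from pathlib import Path


def worktree_commands(repo: Path, branches: list[str], base: str = "main") -> list[list[str]]:
    """Build one `git worktree add` command per branch, each in a sibling directory."""
    commands = []
    for branch in branches:
        # "feature/search" in /work/app becomes /work/app-feature-search
        path = repo.parent / f"{repo.name}-{branch.replace('/', '-')}"
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]
        )
    return commands


def create_worktrees(repo: Path, branches: list[str], base: str = "main") -> None:
    """Run the commands; check=True stops at the first git error."""
    for command in worktree_commands(repo, branches, base):
        subprocess.run(command, check=True)
```

Calling `create_worktrees(Path.cwd(), ["feature/search", "fix/login-timeout"])` gives you two directories on two new branches. Open one agent session in each. When a branch is merged, `git worktree remove <path>` cleans up.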

When the agent starts looping (re-reading files it already read, re-planning work it already planned), that is a context problem. Check the context window, not the prompt.
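If your tool shows you its tool calls, the re-reading symptom is easy to spot mechanically. Below is a sketch. It assumes the calls can be exported as (tool, target) pairs. The log format and the threshold are hypothetical.

```python
from collections import Counter


def repeated_reads(tool_calls: list[tuple[str, str]], limit: int = 2) -> dict[str, int]:
    """Return files read more than `limit` times: a hint the agent is looping."""
    reads = Counter(target for tool, target in tool_calls if tool == "read")
    return {path: count for path, count in reads.items() if count > limit}


# Hypothetical session log of (tool, target) pairs.
log = [
    ("read", "src/invoice.py"),
    ("read", "src/db.py"),
    ("edit", "src/invoice.py"),
    ("read", "src/invoice.py"),
    ("read", "src/invoice.py"),
]
print(repeated_reads(log))  # {'src/invoice.py': 3}
```

A hit here is a cue to check context usage and consider a fresh session. Rewording the prompt will not fix it.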

Phase 5: Trust limits

Before a session starts, decide what the agent is and is not allowed to do without your explicit sign-off. Write this in your context file so it applies every time.

The list typically covers: running migrations, modifying production configuration, deleting data, executing deployments, and anything touching shared infrastructure. The day Claude deleted my database was a lesson in what happens without a gate. The agent was not going rogue. It was being helpful. That is the problem.

Pin dependencies to commit hashes, not version tags. Tags are mutable references. Someone rewrote 700 of them across four packages in a single weekend. AI-generated dependency additions deserve particular scrutiny: the agent selects packages with confidence, and confidence is not the same as correctness.
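GitHub Actions workflows are one place to apply this mechanically, because a `uses:` reference can point either at a tag or at a full commit SHA. Below is a minimal checker. It assumes workflows are scanned as plain text with a regex rather than parsed as YAML. The SHA in the sample is a placeholder.

```python
import re

# Captures the reference after `uses:` in a workflow, e.g. actions/checkout@v4.
USES = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)", re.MULTILINE)
# A full 40-character commit SHA at the end of the reference.
FULL_SHA = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_yaml: str) -> list[str]:
    """Return action references pinned to a tag or branch instead of a commit SHA."""
    unpinned = []
    for ref in USES.findall(workflow_yaml):
        ref = ref.strip("'\"")
        if ref.startswith(("./", "docker://")):
            continue  # local actions and container images are out of scope here
        if not FULL_SHA.search(ref):
            unpinned.append(ref)
    return unpinned


WORKFLOW = """\
steps:
  - uses: actions/checkout@v4
  - uses: example-org/build-action@0123456789abcdef0123456789abcdef01234567
  - uses: ./.github/actions/local
"""

print(unpinned_actions(WORKFLOW))  # ['actions/checkout@v4']
```

The same idea applies to lockfiles and package manifests: flag any reference that can move without the hash changing.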

When the agent expresses uncertainty, stop and read carefully. That signal is worth more than the suggestion that follows it.

Phase 6: The review

This is where the actual work happens.

Read every diff like you wrote it. Not "does this look right." Every line, with attention. If you cannot explain the change to a colleague without saying "the agent did that," it is not done.

This is harder than it sounds. Reading code you did not write takes more skill than writing it. The instinct to skim is powerful when the output looks plausible. Resist it. Plausible is not correct.

Sonar's 2026 developer survey found that 96% of developers do not fully trust AI output, yet only 48% verify it before merging. That gap is where incidents live, and it compounds as the codebase grows harder to navigate.

For anything non-trivial, use a review agent. Not the same agent that wrote the code. A separate instance with a specific mandate: look for bugs, look for security issues, look for architecture violations. Multi-agent review is cheap. Post-incident retrospectives are not.
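The reviewer's mandate works best when it is written down rather than improvised. Below is a sketch of a brief builder. It assumes the reviewer is a fresh session that receives the diff as text. The wording of the checks and the `docs/adr/` path are hypothetical.

```python
import subprocess

# Illustrative mandate; the ADR path is a hypothetical project convention.
MANDATE = (
    "Look for bugs: logic errors, unhandled edge cases, off-by-one mistakes.",
    "Look for security issues: injection, leaked secrets, missing authorisation checks.",
    "Look for architecture violations against the decisions recorded in docs/adr/.",
)


def review_prompt(diff: str, mandate: tuple[str, ...] = MANDATE) -> str:
    """Build a narrow review brief for a fresh agent that did not write the code."""
    checks = "\n".join(f"{n}. {item}" for n, item in enumerate(mandate, start=1))
    return (
        "You are reviewing a change written by someone else. Do not rewrite it.\n"
        "Report each finding with file, line and severity, for these checks only:\n"
        f"{checks}\n\nDiff under review:\n{diff}"
    )


def staged_diff() -> str:
    """Return the staged changes, i.e. what is about to be committed."""
    result = subprocess.run(
        ["git", "diff", "--staged"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

`review_prompt(staged_diff())` produces the brief. How you hand it to a separate agent instance depends on your tool. What matters is the separation itself: the reviewer starts without the writer's assumptions.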

The agent handles the fast part. Generation is cheap. What you are paying for is the structure around it: the spec that gives it direction, the context layer that gives it memory, the enforcement that makes it reliable, the session discipline that keeps it focused, the trust limits that keep it safe, and the review that makes every line of output yours.

The workflow is not overhead on top of using an agent. It is what using an agent actually looks like.
