Every six months there’s a new agent framework. LangChain, LangGraph, Crew, AutoGen, Mastra — each promising to solve the coordination problem. Each shipping great demos.
None of them have survived long in the Cuecoder production stack.
This isn’t a criticism of the frameworks. They’re genuinely useful for prototyping. The problem is that production agents have constraints that frameworks tend to paper over: tight latency budgets, partial failure modes, observability requirements, and the need to debug a single trace six weeks after it ran.
The core loop
Here’s the agent loop that currently serves production traffic. It’s about 200 lines.

    class Agent:
        def __init__(self, model, tools, memory, max_steps=12):
            self.model = model
            self.tools = {t.name: t for t in tools}  # registry: tool name -> tool
            self.memory = memory
            self.max_steps = max_steps

        async def run(self, task: str, ctx: Context) -> AgentResult:
            messages = [{"role": "user", "content": task}]
            # Let the memory layer add whatever context is relevant to this task.
            messages = await self.memory.inject(messages, ctx)

            for step in range(self.max_steps):
                response = await self.model.call(
                    messages=messages,
                    tools=list(self.tools.values()),
                )

                # The model is done: record the run and return.
                if response.stop_reason == "end_turn":
                    await self.memory.record(ctx, messages, response)
                    return AgentResult(output=response.content, steps=step + 1)

                # The model asked for tools: run them, append the results, loop again.
                if response.stop_reason == "tool_use":
                    tool_results = await self._run_tools(response.tool_calls, ctx)
                    messages.append(response.as_message())
                    messages.extend(tool_results)
                    continue

                # Any other stop reason is treated as a bug rather than guessed at.
                raise UnexpectedStopReason(response.stop_reason)

            # The step budget ran out without a final answer.
            raise MaxStepsExceeded(task, self.max_steps)

That’s the whole loop. No graph nodes. No state machines. No orchestration DSL.
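The loop leans on a few names it doesn't define: Context, AgentResult, and the two exceptions. Here is a minimal sketch of what they might look like. The constructor arguments match how the loop calls them, but the fields on Context and the shared AgentError base class are assumptions for illustration, not the production definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Context:
    """Per-request context. These fields are hypothetical placeholders."""
    trace_id: str
    user_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


@dataclass
class AgentResult:
    """What run() returns: the final model output and how many steps it took."""
    output: Any
    steps: int


class AgentError(Exception):
    """Assumed base class so callers can catch every loop-level failure at once."""


class UnexpectedStopReason(AgentError):
    def __init__(self, stop_reason: str):
        super().__init__(f"unexpected stop_reason: {stop_reason!r}")
        self.stop_reason = stop_reason


class MaxStepsExceeded(AgentError):
    def __init__(self, task: str, max_steps: int):
        super().__init__(f"no final answer after {max_steps} steps")
        self.task = task
        self.max_steps = max_steps
```

Keeping the task and the step count on the exception means whoever catches it can log them without parsing a message string.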
Why not frameworks
The frameworks add complexity to handle things that, in practice, you need to handle yourself anyway:
- Error handling — framework abstractions leak. When a tool call fails at step 7 of 12, you need to know exactly what state the agent is in. Frameworks hide this.
- Observability — you need traces that you can read. Adding a framework layer means learning the framework’s tracing semantics on top of your own.
- Latency — every abstraction layer adds overhead. Not much. But compounding over thousands of requests, it shows up in p95.
What you actually need
The loop above works because the dependencies are explicit:
- A model client — something that calls the LLM and returns a structured response. Wrap the SDK; don’t use it directly.
- A tool registry — a dict. Tools are just async functions with a schema.
- A memory layer — context injection and result recording. This is where most of the interesting engineering is.
- A tracer — instrument the loop, not the tools. Every step gets a trace ID.
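Two of those pieces are small enough to sketch here: the tool registry and a tracer that wraps loop steps rather than tools. Everything named below (Tool, get_weather, StepTracer, the span fields) is hypothetical and only shows the shape under those assumptions.

```python
import asyncio
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    input_schema: dict                  # JSON Schema describing the arguments
    fn: Callable[..., Awaitable[Any]]   # the async function that does the work


async def get_weather(city: str) -> dict:
    # Hypothetical tool body; a real one would call an external service.
    return {"city": city, "forecast": "sunny"}


weather_tool = Tool(
    name="get_weather",
    description="Look up the forecast for a city.",
    input_schema={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    fn=get_weather,
)

# The registry is a plain dict keyed by tool name.
registry = {t.name: t for t in [weather_tool]}


class StepTracer:
    """Collects one span per loop step; the tools themselves stay uninstrumented."""

    def __init__(self):
        self.spans = []

    @contextmanager
    def step(self, run_id: str, index: int):
        span = {"trace_id": f"{run_id}:{index}", "step": index}
        start = time.perf_counter()
        try:
            yield span
        except Exception as exc:
            span["error"] = repr(exc)   # record the failure, then let it propagate
            raise
        finally:
            span["duration_ms"] = (time.perf_counter() - start) * 1000
            self.spans.append(span)


async def demo():
    tracer = StepTracer()
    run_id = uuid.uuid4().hex
    with tracer.step(run_id, 0) as span:
        result = await registry["get_weather"].fn(city="Oslo")
        span["tool"] = "get_weather"
    return result, tracer.spans
```

Because the span is opened in the loop, a step that fails still leaves a span behind with its error and duration attached.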
The hard part
The hard part isn’t the loop. It’s the tools and the memory.
Tools need to handle partial failure gracefully. A tool that returns an error should return a structured error that the model can reason about, not raise an exception that kills the agent.
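One way to get that behavior is to catch failures at the boundary where tools are executed and turn them into ordinary tool results. The sketch below is an assumption about what a helper like _run_tools could do, written as free functions. The ToolCall shape, the message fields ("tool_call_id", "is_error") and the error payload are hypothetical, not any particular SDK's format.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict


def _error(call: ToolCall, kind: str, message: str, retryable: bool) -> dict:
    # A structured payload the model can read and react to.
    payload = {"error": {"type": kind, "message": message, "retryable": retryable}}
    return {"role": "tool", "tool_call_id": call.id, "is_error": True,
            "content": json.dumps(payload)}


async def run_tool(call: ToolCall,
                   tools: Dict[str, Callable[..., Awaitable[Any]]],
                   timeout_s: float = 10.0) -> dict:
    """Run one tool call and always return a message instead of raising."""
    fn = tools.get(call.name)
    if fn is None:
        return _error(call, "unknown_tool", f"no tool named {call.name!r}", False)
    try:
        value = await asyncio.wait_for(fn(**call.arguments), timeout=timeout_s)
        content = json.dumps(value)
    except asyncio.TimeoutError:
        return _error(call, "timeout", f"timed out after {timeout_s}s", True)
    except Exception as exc:  # deliberately broad: one bad tool must not kill the run
        return _error(call, type(exc).__name__, str(exc), False)
    return {"role": "tool", "tool_call_id": call.id, "is_error": False,
            "content": content}


async def run_tools(calls: List[ToolCall], tools) -> List[dict]:
    # Calls run concurrently; each one already converts its own failures.
    return await asyncio.gather(*(run_tool(c, tools) for c in calls))
```

With this in place, a tool that raises at step 7 shows up in the transcript as an error result, and the model gets a chance to retry or change approach on step 8.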
Memory needs to be fast and relevant. Context windows are large but not infinite. The memory layer’s job is to decide what to inject — and what not to.
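As a rough illustration of "what to inject and what not to", here is a greedy selection under a token budget. It is a simplified, synchronous stand-in and not the production memory layer. The Memory type, the relevance score, the 0.5 cutoff and the four-characters-per-token estimate are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    text: str
    relevance: float   # e.g. similarity to the task; how it is scored is out of scope


def estimate_tokens(text: str) -> int:
    # Crude stand-in (about 4 characters per token); swap in a real tokenizer.
    return max(1, len(text) // 4)


def select_memories(candidates: List[Memory], budget_tokens: int,
                    min_relevance: float = 0.5) -> List[Memory]:
    """Greedily pick the most relevant memories that fit in the token budget."""
    chosen, used = [], 0
    for m in sorted(candidates, key=lambda m: m.relevance, reverse=True):
        if m.relevance < min_relevance:
            break      # sorted order: everything after this is less relevant still
        cost = estimate_tokens(m.text)
        if used + cost > budget_tokens:
            continue   # too big, but a smaller memory further down may still fit
        chosen.append(m)
        used += cost
    return chosen


def inject(messages: List[dict], memories: List[Memory]) -> List[dict]:
    """Prepend the selected memories as one context message; input is not mutated."""
    if not memories:
        return list(messages)
    block = "\n".join(f"- {m.text}" for m in memories)
    context = {"role": "user", "content": f"Relevant context:\n{block}"}
    return [context] + list(messages)
```

The relevance floor matters as much as the budget: a memory that fits but isn't relevant is still left out.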
Both of these are domain problems, not framework problems. No framework can solve them for you.
Build the loop yourself. It’s 200 lines. You’ll understand every line of it when something goes wrong at 3am.