Builder Log · April 23, 2026

The 3 AI agent frameworks I tested. Only one shipped.

Same agent. Same eval set. Three weekends. Here is the one that made it to production.

By Ackshaya Varshini

[Image: comparison of three AI agent frameworks: LangGraph, CrewAI, and the Claude Agent SDK. The Claude Agent SDK card is highlighted as shipped; the other two are marked not shipped.]

The Short Version

I built the same AI agent in three frameworks last month. Only one shipped.

I am building Alma solo. The matching agent has to read a mentor profile and draft a 3-line warm intro to a specific student. And it has to not read like ChatGPT. That last constraint is the hard one, and it is the reason the framework underneath matters.

The Eval Set

Before I touched any framework I wrote the eval. 80 test cases. Each case has a student profile and a mentor pool of 20 to 30. Output is a ranked mentor plus a 3-line intro note.

The metric is not BLEU score or any LLM-as-judge contraption. The metric is reply rate from the mentor after one week. Everything else is vanity.

Every framework got the same prompts, the same tools, the same Claude Sonnet 4.6 backing. The only thing that changed was the plumbing.
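The harness itself is nothing fancy. Here is a minimal sketch of the shape I used; the `EvalCase` fields and the `(student, pool) -> (mentor, intro)` agent signature are illustrative, not any framework's API:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    student: dict        # one student profile
    pool: list[dict]     # 20 to 30 mentor profiles

def run_eval(cases, agent):
    """Run every case through an agent callable: (student, pool) -> (mentor, intro)."""
    results = []
    for case in cases:
        mentor, intro = agent(case.student, case.pool)
        results.append({
            "mentor_id": mentor["id"],
            "intro": intro,
            # the product constraint: exactly 3 lines
            "is_3_lines": intro.count("\n") == 2,
        })
    return results
```

Swap the framework, keep the harness. That is what made a three-weekend comparison possible at all.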

LangGraph

Side-by-side code · rank mentors then draft a 3-line intro

from langgraph.graph import StateGraph, END
from typing import TypedDict

class MatchState(TypedDict):
    student: dict
    pool: list[dict]
    ranked: list[dict]
    intro: str

def rank(state: MatchState) -> MatchState:
    state["ranked"] = score_with_claude(state["student"], state["pool"])
    return state

def draft(state: MatchState) -> MatchState:
    state["intro"] = write_intro(state["student"], state["ranked"][0])
    return state

g = StateGraph(MatchState)
g.add_node("rank", rank)
g.add_node("draft", draft)
g.set_entry_point("rank")
g.add_edge("rank", "draft")
g.add_edge("draft", END)

app = g.compile()
out = app.invoke({"student": s, "pool": p, "ranked": [], "intro": ""})

What I liked. Explicit state. Clear transitions. Every node logs a trace you can read linearly. If you come from a systems background, LangGraph feels correct. You model your agent as a graph, not as a vibe.

What broke me. 412 lines for a 5-node graph. Schema definitions, edge conditions, state reducers. On a team, that structure pays off. Solo, it is a boilerplate tax I was paying in hours I did not have.
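To give a flavor of that tax: even a simple "retry if ranking came back empty" rule needs a router function plus a conditional-edge registration. A sketch (the router is mine; `add_conditional_edges` is LangGraph's API, shown in a comment):

```python
def route_after_rank(state: dict) -> str:
    """Router for a conditional edge: loop back if ranking produced nothing."""
    if not state.get("ranked"):
        return "rank"   # retry the ranking node
    return "draft"

# registered on the graph roughly like this:
# g.add_conditional_edges("rank", route_after_rank, {"rank": "rank", "draft": "draft"})
```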

Verdict. Good for teams shipping production agents with multiple engineers maintaining state contracts. Too heavy for a one-person shop.

CrewAI

Side-by-side code · same task, two role-based agents

from crewai import Agent, Task, Crew

ranker = Agent(
    role="Mentor Ranker",
    goal="Rank mentors by fit for a given student.",
    backstory="You know Alma's mentor pool inside out.",
)

writer = Agent(
    role="Intro Writer",
    goal="Draft a 3-line warm intro.",
    backstory="You write like a friend, not a recruiter.",
)

rank_task = Task(description="Rank pool for student.", agent=ranker)
intro_task = Task(
    description="Write a 3-line intro to the top pick.",
    agent=writer,
    context=[rank_task],
)

Crew(agents=[ranker, writer], tasks=[rank_task, intro_task]).kickoff()

What I liked. Multi-agent out of the box. I set up two role-based agents and the role prompts worked on the first try. The mental model is clean.

What broke me. I spent 4 hours reading traces to debug a silent handoff failure. The Intro Writer was getting an empty context from the Mentor Ranker and confidently writing an intro to nobody. No error. No warning. The abstraction hides exactly the thing you need to see when it breaks.

Verdict. A beautiful abstraction until you need to see inside. For prototypes it is delightful. For shipping, I need to see the raw messages.
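If I were doing it again, I would bolt a guard onto the handoff so an empty context fails loudly instead of silently. A sketch, assuming CrewAI's task callback receives an output object with a `.raw` string (check the version you are on):

```python
def assert_nonempty(output):
    """Fail loudly if a task hands an empty result to the next agent."""
    text = (getattr(output, "raw", "") or "").strip()
    if not text:
        raise ValueError("Mentor Ranker produced an empty context for Intro Writer")
    return output

# wired in as: rank_task = Task(..., callback=assert_nonempty)
```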

Claude Agent SDK

Side-by-side code · same task, one loop, MCP for the mentor pool

from claude_agent_sdk import query, ClaudeAgentOptions

options = ClaudeAgentOptions(
    system_prompt=(
        "Match the student to the best mentor from alma-db. "
        "Return a 3-line warm intro. No corporate voice."
    ),
    mcp_servers={
        "alma-db": {"command": "node", "args": ["./mcp-mentors.js"]},
    },
    allowed_tools=["mcp__alma-db__search_mentors"],
)

import asyncio

async def main() -> None:
    async for message in query(prompt=f"Student: {student}", options=options):
        print(message)

asyncio.run(main())

What I liked. The full loop reads in one file. Tool calls, memory, subagents, MCP servers, all in one place. I could hold the whole agent in my head on a Sunday morning.

What worked. When it broke at 11pm, I fixed it by 11:20. When I needed to wire in an MCP server to read mentor profiles from my Postgres, it was one config line. No new abstraction layer, no adapter to write.
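For the record, that config line is just another entry in the `mcp_servers` dict. A sketch of the shape; the `alma-notes` server and `mcp-notes.js` are hypothetical names, shown only to illustrate adding a second server:

```python
mcp_servers = {
    "alma-db": {"command": "node", "args": ["./mcp-mentors.js"]},
    # hypothetical second server: same one-line shape
    "alma-notes": {"command": "node", "args": ["./mcp-notes.js"]},
}
```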

Verdict. This is the stack I ship Alma's matching engine on today. The ratio of control to ceremony was exactly right for a solo founder.

The Eval Scores

Same 80 cases. Same Claude Sonnet 4.6 underneath. Reply rate is the share of mentors who replied within 7 days of the intro landing.

| Framework | Build time | Lines of code | p50 latency | Reply rate |
|---|---|---|---|---|
| Baseline (no agent) | n/a | n/a | 0.9s | 18% |
| LangGraph | 14 hrs | 412 | 1.8s | 23% |
| CrewAI | 6 hrs | 184 | 2.3s | 26% |
| Claude Agent SDK (shipped) | 3 hrs | 138 | 1.4s | 34% |

Note: numbers are from my own build log across April 2026. Your eval will look different. The point is to have one.
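The reply-rate column is computed the obvious way. A sketch over `(sent, replied)` timestamp pairs, with `None` standing in for no reply:

```python
from datetime import timedelta

def reply_rate(intros, window_days=7):
    """Fraction of intros that got a mentor reply within the window.

    intros: list of (sent_at, replied_at_or_None) datetime pairs.
    """
    window = timedelta(days=window_days)
    replied = sum(
        1 for sent, reply in intros
        if reply is not None and reply - sent <= window
    )
    return replied / len(intros) if intros else 0.0
```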

The Decision Framework

If you are looking at the same three, here is the cheat sheet.

| If you are | Pick | Because |
|---|---|---|
| A team of 3+ engineers | LangGraph | The state contracts and graph discipline pay back when more than one person touches the code. |
| Prototyping a multi-agent demo | CrewAI | Role-based agents ship a convincing prototype in hours. Do not put it in front of paying users. |
| A solo founder shipping to users | Claude Agent SDK | One file, one mental model. MCPs, memory, subagents all hang off the same loop. |

The Decision Flowchart

If the table is still too much, follow the arrows.

[Flowchart: start from building an AI agent, then branch on team size (3+ picks LangGraph, solo or duo moves on), then branch on whether users will actually use it (no, demo only, picks CrewAI; yes, ship, picks Claude Agent SDK).]

The 2am Test

“The framework you can debug at 2am is the framework you ship.”

This is the stack running Alma's matching engine today. Every student-to-mentor intro you see on Alma is drafted by an agent built on the Claude Agent SDK with an MCP server reading the mentor pool from Postgres.

If you are building alone, try the Claude Agent SDK this week. Wire up one tool, one MCP, one real task.

Then tell me what broke. I read every reply.

Related Resources

Follow me on LinkedIn for weekly builder logs from solo founder land


