The Short Version
I built the same AI agent in three frameworks last month. Only one shipped.
I am building Alma solo. The matching agent has to read a mentor profile and draft a 3-line warm intro to a specific student. And it has to not read like ChatGPT. That last constraint is the hard one, and it is the reason the framework underneath matters.
The Eval Set
Before I touched any framework I wrote the eval. 80 test cases. Each case has a student profile and a mentor pool of 20 to 30 profiles. Output is a ranked mentor plus a 3-line intro note.
The metric is not BLEU score or any LLM-as-judge contraption. The metric is reply rate from the mentor after one week. Everything else is vanity.
Every framework got the same prompts, the same tools, the same Claude Sonnet 4.6 backing. The only thing that changed was the plumbing.
LangGraph
Side-by-side code · rank mentors then draft a 3-line intro
```python
from langgraph.graph import StateGraph, END
from typing import TypedDict

class MatchState(TypedDict):
    student: dict
    pool: list[dict]
    ranked: list[dict]
    intro: str

def rank(state: MatchState) -> MatchState:
    # score_with_claude: helper that asks Claude to score the pool
    state["ranked"] = score_with_claude(state["student"], state["pool"])
    return state

def draft(state: MatchState) -> MatchState:
    # write_intro: helper that drafts the 3-line note for the top pick
    state["intro"] = write_intro(state["student"], state["ranked"][0])
    return state

g = StateGraph(MatchState)
g.add_node("rank", rank)
g.add_node("draft", draft)
g.set_entry_point("rank")
g.add_edge("rank", "draft")
g.add_edge("draft", END)
app = g.compile()

# s and p are the student and mentor pool, loaded elsewhere
out = app.invoke({"student": s, "pool": p, "ranked": [], "intro": ""})
```

What I liked. Explicit state. Clear transitions. Every node logs a trace you can read linearly. If you come from a systems background, LangGraph feels correct. You model your agent as a graph, not as a vibe.
What broke me. 412 lines for a 5-node graph. Schema definitions, edge conditions, state reducers. On a team, that structure pays off. Solo, it is a boilerplate tax I was paying in hours I did not have.
Verdict. Good for teams shipping production agents with multiple engineers maintaining state contracts. Too heavy for a one-person shop.
CrewAI
Side-by-side code · same task, two role-based agents
```python
from crewai import Agent, Task, Crew

ranker = Agent(
    role="Mentor Ranker",
    goal="Rank mentors by fit for a given student.",
    backstory="You know Alma's mentor pool inside out.",
)
writer = Agent(
    role="Intro Writer",
    goal="Draft a 3-line warm intro.",
    backstory="You write like a friend, not a recruiter.",
)

rank_task = Task(description="Rank pool for student.", agent=ranker)
intro_task = Task(
    description="Write a 3-line intro to the top pick.",
    agent=writer,
    context=[rank_task],  # hands the ranking output to the writer
)

Crew(agents=[ranker, writer], tasks=[rank_task, intro_task]).kickoff()
```

What I liked. Multi-agent out of the box. I set up two role-based agents and the role prompts worked on the first try. The mental model is clean.
What broke me. I spent 4 hours reading traces to debug a silent handoff failure. The Intro Writer was getting an empty context from the Mentor Ranker and confidently writing an intro to nobody. No error. No warning. The abstraction hides exactly the thing you need to see when it breaks.
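One way to catch that class of failure is a loud guard at the handoff boundary. Here is a framework-independent sketch of the idea; `guarded_handoff` and the `writer` callable are stand-ins I made up, not CrewAI API.

```python
def guarded_handoff(ranked: list, writer):
    """Refuse to hand an empty ranking to the intro writer.

    `writer` is any callable that drafts the intro for the top-ranked
    mentor. Both names are illustrative, not CrewAI API.
    """
    if not ranked:
        # The silent failure mode: empty context, confident output, no error.
        raise ValueError("rank step produced no mentors; refusing to write an intro to nobody")
    return writer(ranked[0])
```

Ten lines of paranoia at every boundary would have saved four hours of trace reading.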
Verdict. A beautiful abstraction until you need to see inside. For prototypes it is delightful. For shipping, I need to see the raw messages.
Claude Agent SDK
Side-by-side code · same task, one loop, MCP for the mentor pool
```python
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions

options = ClaudeAgentOptions(
    system_prompt=(
        "Match the student to the best mentor from alma-db. "
        "Return a 3-line warm intro. No corporate voice."
    ),
    mcp_servers={
        # MCP server exposing the mentor pool (backed by Postgres)
        "alma-db": {"command": "node", "args": ["./mcp-mentors.js"]},
    },
    allowed_tools=["mcp__alma-db__search_mentors"],
)

async def main():
    # query() streams messages; the async iteration has to live in a coroutine
    async for message in query(prompt=f"Student: {student}", options=options):
        print(message)

asyncio.run(main())
```

What I liked. The full loop reads in one file. Tool calls, memory, subagents, MCP servers, all in one place. I could hold the whole agent in my head on a Sunday morning.
What worked. When it broke at 11pm, I fixed it by 11:20. When I needed to wire in an MCP server to read mentor profiles from my Postgres, it was one config line. No new abstraction layer, no adapter to write.
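The real tool lives in `./mcp-mentors.js` against Postgres. To show the contract the agent sees, here is a pure-Python stand-in for `search_mentors`; the scoring logic is illustrative, not what the actual server does.

```python
def search_mentors(pool: list[dict], query: str, limit: int = 5) -> list[dict]:
    """Toy stand-in for the mcp__alma-db__search_mentors tool:
    case-insensitive substring match over profile fields, ordered by
    how many fields hit. The production version is Node + Postgres."""
    q = query.lower()
    scored = [(sum(q in str(v).lower() for v in m.values()), m) for m in pool]
    # Stable sort keeps pool order among equal scores; drop zero-hit profiles.
    return [m for hits, m in sorted(scored, key=lambda t: -t[0]) if hits][:limit]
```

Whatever backs the tool, the agent only ever sees "query in, ranked profiles out," which is what makes swapping the backend a config change.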
Verdict. This is the stack I ship Alma's matching engine on today. The ratio of control to ceremony was exactly right for a solo founder.
The Eval Scores
Same 80 cases. Same Claude Sonnet 4.6 underneath. Reply rate counts a mentor reply within 7 days of the intro landing.
| Framework | Build time | Lines of code | p50 latency | Reply rate |
|---|---|---|---|---|
| Baseline (no agent) | — | — | 0.9s | 18% |
| LangGraph | 14 hrs | 412 | 1.8s | 23% |
| CrewAI | 6 hrs | 184 | 2.3s | 26% |
| Claude Agent SDK (shipped) | 3 hrs | 138 | 1.4s | 34% |
Note: numbers are from my own build log across April 2026. Your eval will look different. The point is to have one.
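The metric itself is one line, assuming the build log reduces to one boolean per case. The 27/80 split below is reverse-engineered from the table's 34% row for illustration, not a logged number.

```python
def reply_rate(replies: list[bool]) -> float:
    """Fraction of eval cases where the mentor replied within 7 days."""
    return sum(replies) / len(replies)

# 27 replies across 80 cases lands on the SDK row's figure
print(f"{reply_rate([True] * 27 + [False] * 53):.0%}")  # 34%
```

If your metric does not fit in a function this small, it is probably a vanity metric.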
The Decision Framework
If you are looking at the same three, here is the cheat sheet.
| If you are | Pick | Because |
|---|---|---|
| A team of 3+ engineers | LangGraph | The state contracts and graph discipline pay back when more than one person touches the code. |
| Prototyping a multi-agent demo | CrewAI | Role-based agents ship a convincing prototype in hours. Do not put it in front of paying users. |
| A solo founder shipping to users | Claude Agent SDK | One file, one mental model. MCPs, memory, subagents all hang off the same loop. |
The Decision Flowchart
If the table is still too much, follow the arrows.
[Decision flowchart: the table above, rendered as a decision tree]
The 2am Test
“The framework you can debug at 2am is the framework you ship.”
This is the stack running Alma's matching engine today. Every student-to-mentor intro you see on Alma is drafted by an agent built on the Claude Agent SDK with an MCP server reading the mentor pool from Postgres.
If you are building alone, try the Claude Agent SDK this week. Wire up one tool, one MCP, one real task.
Then tell me what broke. I read every reply.
Related Resources
AI Agents for Mechanical Engineers
Agent frameworks, browser automation, local AI, and MCP connectors filtered through an engineering lens.
Technical Deep Dive · Using Claude to Debug Hydraulic Simulations
How structured prompts found a buried 10x coefficient error in 90 seconds.
Resource Post · F-1 to Founder: The 36-Month Playbook
OPT, STEM OPT, and the founder path, phase by phase, for international students.
Follow me on LinkedIn for weekly builder logs from solo founder land.