Orchestration layer: LangGraph Workflows, LangChain Agents, LangSmith Tracing

As the enrolment workflow grew from a one-shot task into a stateful loop that processes multiple events, branches on outcomes, and cycles until the queue is empty, Claude Desktop chat and Cowork, both designed for interactive, human-driven conversations and tasks, became the wrong tools for the job.

LangGraph lets the same underlying logic run as a fully automated backend agent: state flows explicitly through each node, conditional routing is declared in code, and no human has to prompt each step. That makes it a natural fit for a workflow that must run reliably on a server whenever new enrolment events arrive.

The graph has two main paths from _route_events:

Left (no events): straight to generate_dashboard → END

Right (has events): down the provisioning chain — process_next_event → grant_ghost_access → create_mcp_api_key → publish_and_ack — then _route_after_publish branches to either complete_event directly (done) or via draft_and_send_approval first (needs approval). Both converge at complete_event, which loops back via the dashed line to _route_events to process the next event in the queue.

From Claude Cowork to a Python Service

In previous articles, we ran the enrolment job directly inside Claude Cowork. That worked well for prototyping: describe the task in plain English, Claude figures out the tool calls, and you have something running end to end in minutes.

But Cowork has a ceiling. When a session finishes, what you have is a transcript rather than a structured record of which provisioning steps succeeded or failed per member.

For a signup flow that needs to run unattended at any hour, the lack of a structured record and the reliance on an interactive session are fundamental gaps. The new signup flow starts in similar fashion, shown below:

A new member provisioned in Ghost — name, email, labels, and newsletter subscription all set by the agent.

Moving to a standalone Python service with LangGraph closes those gaps. The MCP endpoints, the Ghost access grant, and the API key creation are the same, but they are now encoded as explicit graph nodes with typed state, deployable to AWS App Runner on a continuous polling loop, and traced in LangSmith for every run. Claude still does the one thing that benefits from language understanding: personalising the welcome email draft. Everything else is deterministic Python.

Why Deterministic, Not Reactive

The core design choice was to not give the LLM discretion over the provisioning sequence. "Grant Ghost access, create the MCP key, ack the event" is a fixed order. Letting a model reason about whether to skip a step is a failure mode, not a feature.

LangGraph's StateGraph encodes the sequence as nodes and edges. Routing is pure Python functions that check typed state. The LLM is invoked in exactly one node: reviewing the welcome email draft before it is stored for operator approval. If that call fails, the agent falls back to a template.
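The template fallback described above can be sketched as a small wrapper around the model call. This is a minimal illustration, not the service's actual code: `review_welcome_draft`, `FALLBACK_TEMPLATE`, and the `llm_review` callable are hypothetical names, and the real node presumably calls Claude through a client library rather than a plain function.

```python
from typing import Callable

# Hypothetical fallback text; the real template lives in the project's own files
FALLBACK_TEMPLATE = (
    "Hi {name},\n\n"
    "Welcome aboard. Your account is set up and ready to use.\n"
)


def review_welcome_draft(name: str, draft: str, llm_review: Callable[[str], str]) -> str:
    """Return the LLM-reviewed draft, or a plain template if the call fails."""
    try:
        reviewed = llm_review(draft)
        if not reviewed.strip():
            raise ValueError("empty LLM response")
        return reviewed
    except Exception:
        # Any failure (timeout, API error, empty output) degrades to the template
        return FALLBACK_TEMPLATE.format(name=name)


def failing_llm(draft: str) -> str:
    """Stand-in for a model call that times out."""
    raise TimeoutError("model unavailable")


result_ok = review_welcome_draft("Ada", "Hi Ada!", lambda d: d + " Glad you joined.")
result_fallback = review_welcome_draft("Ada", "Hi Ada!", failing_llm)
```

Because the fallback is handled inside the node, a model outage changes the wording of the draft but not the path the graph takes.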

The full graph definition — eight nodes, fixed edges, and three conditional routing functions. No LLM in the routing path.

The grant_ghost_access node: @traceable decorator, typed state in, partial state update out. Each node returns only the keys it changed.
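The "typed state in, partial update out" shape can be sketched as follows. To keep the example runnable without LangSmith installed, the `@traceable` decorator is omitted. The Ghost call is replaced by a hypothetical `create_member` callable, and the state fields are assumptions rather than the service's real schema.

```python
from typing import Callable, Dict, List, TypedDict


class EnrolmentState(TypedDict, total=False):
    # Field names are illustrative assumptions, not the service's actual schema
    current_event: Dict[str, str]
    ghost_member_id: str
    errors: List[str]


def make_grant_ghost_access(create_member: Callable[[Dict[str, str]], str]):
    """Build the node around a hypothetical Ghost client call."""

    def grant_ghost_access(state: EnrolmentState) -> dict:
        event = state["current_event"]
        member_id = create_member(event)
        # Return only the key this node changed; the rest of the state is untouched
        return {"ghost_member_id": member_id}

    return grant_ghost_access


# Usage with a fake client; the dict merge stands in for the framework's state update
node = make_grant_ghost_access(lambda event: "member-" + event["email"])
state: EnrolmentState = {"current_event": {"email": "ada@example.com"}, "errors": []}
update = node(state)
merged = {**state, **update}
```

Returning a partial update keeps each node's footprint visible: a trace for this node shows exactly one key written.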

The Approval Flow

The welcome email is never sent automatically. After drafting it, the agent saves it to a JSON file keyed by a UUID token and emails the operator a preview with Approve and Decline buttons. Clicking Approve hits GET /enrolment/approve/:token on the MCP server, which dispatches the email and deletes the file. The agent itself has already exited. The dispatch is an asynchronous human action.
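The file-backed approval store described above might look roughly like this. It is a sketch under assumptions: the function names, the `<token>.json` file layout, and the `send` callable are hypothetical, and the real `approval_store.py` and the MCP server's approve handler may be organised differently.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Callable, List


def save_draft(store_dir: Path, draft: dict) -> str:
    """Persist a draft as <token>.json and return the token for the approval link."""
    token = str(uuid.uuid4())
    store_dir.mkdir(parents=True, exist_ok=True)
    (store_dir / f"{token}.json").write_text(json.dumps(draft), encoding="utf-8")
    return token


def approve(store_dir: Path, token: str, send: Callable[[dict], None]) -> bool:
    """Dispatch the stored draft once, then delete it. Returns False if nothing was sent."""
    try:
        uuid.UUID(token)  # reject anything that does not parse as a UUID
    except ValueError:
        return False
    path = store_dir / f"{token}.json"
    if not path.exists():
        return False  # unknown token, or already approved
    draft = json.loads(path.read_text(encoding="utf-8"))
    send(draft)
    path.unlink()  # the token cannot be replayed after a successful send
    return True


# Usage with a temporary directory and a fake sender
sent: List[dict] = []
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    token = save_draft(store, {"to": "ada@example.com", "body": "Welcome!"})
    first = approve(store, token, sent.append)
    second = approve(store, token, sent.append)
    bad = approve(store, "../not-a-token", sent.append)
```

Deleting the file after dispatch makes the approval link single-use, which matters because the link is opened with a GET request and may be followed more than once.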

The operator approval email in Gmail — a full preview of the personalised welcome email with Approve & Send and Decline buttons.

Observability and Error Handling

Every graph node is decorated with @traceable and tagged by category. LangSmith picks up traces automatically via the LANGCHAIN_* environment variables, with no extra logging code. The LangSmith project is named inagentic-enrolment and is kept separate from other agent traces, so per-workflow alerting is straightforward.
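Since tracing is configured entirely through the environment, a startup check can catch a misconfigured deployment early. The sketch below assumes the service relies on `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, and `LANGCHAIN_PROJECT`; the article only says `LANGCHAIN_*`, so the exact set is an assumption, and `missing_tracing_vars` is a hypothetical helper.

```python
import os
from typing import List, Mapping

# Assumed set of variables; the article only refers to LANGCHAIN_* in general
EXPECTED_TRACING_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")


def missing_tracing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of expected tracing variables that are unset or empty."""
    return [name for name in EXPECTED_TRACING_VARS if not env.get(name)]


# Example environment using the project name mentioned in the text
example_env = {
    "LANGCHAIN_TRACING_V2": "true",
    "LANGCHAIN_API_KEY": "placeholder-key",
    "LANGCHAIN_PROJECT": "inagentic-enrolment",
}
complete = missing_tracing_vars(example_env)
incomplete = missing_tracing_vars({"LANGCHAIN_TRACING_V2": "true"})
```

Logging the result at startup is one way to notice that a deployment is running without traces, which would otherwise fail silently.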

Outbound email is sent via AWS SES, with CNAME records added to Route 53 to verify the sending domain. Both the operator approval email and the member welcome email route through SES.

Adding the SES verification CNAME to Route 53 — required before SES will send from the inagentic.ai domain.

The agent never retries. Ghost and MCP key creation already have exponential backoff in the provisioning MCP server. If those retries fail, the problem is external and retrying at the orchestration layer just delays the flag. Failures are logged to the event's error list and surfaced in the dashboard with enough detail for manual follow-up.
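The no-retry policy can be sketched as a wrapper that runs a provisioning step exactly once and records any failure on the event. The names `run_step` and `EventRecord` and the error-string format are hypothetical; the source only states that failures go into the event's error list.

```python
from typing import Callable, List, TypedDict


class EventRecord(TypedDict):
    # Illustrative shape; the real event carries more fields
    email: str
    errors: List[str]


def run_step(event: EventRecord, step_name: str, step: Callable[[EventRecord], None]) -> bool:
    """Run a step once. On failure, record it on the event instead of retrying."""
    try:
        step(event)
        return True
    except Exception as exc:
        # Keep enough detail for manual follow-up from the dashboard
        event["errors"].append(f"{step_name}: {type(exc).__name__}: {exc}")
        return False


calls = {"count": 0}


def failing_grant(event: EventRecord) -> None:
    """Stand-in for a Ghost call whose server-side retries are already exhausted."""
    calls["count"] += 1
    raise ConnectionError("Ghost API unreachable")


event: EventRecord = {"email": "ada@example.com", "errors": []}
ok = run_step(event, "grant_ghost_access", failing_grant)
```

The step is attempted a single time; the backoff lives one layer down in the provisioning MCP server, so the orchestration layer only has to report.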

Running It

One-shot mode is useful for testing. Loop mode polls indefinitely via agent_loop(), calling run_once() on each iteration and sleeping for 10 seconds when the queue is empty. Both use the same graph. The full code (agent.py, mcp_client.py, email_templates.py, approval_store.py) is in the enrolment-agent/ directory of the repo.

The agent_loop function — run_once() on each tick, 10s backoff when the queue is empty, 30s recovery sleep on unhandled exceptions.
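A loop with the timings given in the caption could be written along these lines. The 10 and 30 second values come from the text. The assumption that `run_once()` returns the number of events processed, and the injectable `sleep` and `max_ticks` parameters (added so the loop can be tested without waiting), are illustrative and may not match the real `agent_loop`.

```python
import time
from typing import Callable, List, Optional

EMPTY_QUEUE_SLEEP_S = 10      # backoff when the queue is empty
ERROR_RECOVERY_SLEEP_S = 30   # pause after an unhandled exception


def agent_loop(
    run_once: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
    max_ticks: Optional[int] = None,
) -> None:
    """Call run_once() repeatedly; max_ticks=None means poll indefinitely."""
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        ticks += 1
        try:
            processed = run_once()  # assumed to return the number of events handled
        except Exception as exc:
            print(f"run_once failed: {exc!r}")
            sleep(ERROR_RECOVERY_SLEEP_S)
            continue
        if processed == 0:
            sleep(EMPTY_QUEUE_SLEEP_S)


# Usage: three ticks with a scripted run_once and a recording sleep
script = iter([2, 0, RuntimeError("boom")])


def fake_run_once() -> int:
    outcome = next(script)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


sleeps: List[float] = []
agent_loop(fake_run_once, sleep=sleeps.append, max_ticks=3)
```

When a tick processes events the loop goes straight to the next tick without sleeping, so a backlog drains as fast as the graph can run.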

The Stack in Summary

The enrolment agent is built on three complementary tools from the LangChain ecosystem.

LangChain is a Python framework for building LLM-powered applications — it provides the abstractions for calling language models, chaining prompts, and integrating with external tools and APIs.

LangGraph extends LangChain with a graph-based orchestration layer: you define your workflow as a StateGraph, with each step as a named node, each transition as a fixed or conditional edge, and a typed state passed between them. This gives you deterministic control over execution order without giving up the ability to include LLM calls where they genuinely add value.

LangSmith is the observability layer that sits beneath both: every node execution, LLM call, and tool invocation is captured as a structured trace, viewable in a dashboard, filterable by project, and alertable on failure. For this workflow, it removed the need to build custom logging infrastructure.

Together they give the enrolment agent the properties that a Cowork session cannot: a fixed, auditable execution path; a deployment target that runs unattended on AWS App Runner; and a trace for every member provisioned, whether the run succeeds, partially fails, or hits an external error at 3am.