How Claude Code Workflows Actually Work
Anthropic recently shipped Dynamic Workflows in Claude Code. Describe a task, and instead of working through it turn by turn, Claude writes an orchestration script, fans out parallel subagents, and synthesizes the results in the background while your terminal stays responsive.
This post covers two things: what workflows are and how to use them, and what’s actually happening inside the binary when one runs.
What are workflows?
A dynamic workflow is a JavaScript script that orchestrates subagents. Claude writes the script for the task you describe, and a runtime runs it in the background.
The plan lives in the script, not in Claude’s context window. With regular subagents, Claude is the orchestrator — it decides turn by turn what to spawn next, and every intermediate result lands back in its context. That limits you to a handful of agents per turn and means context fills up fast.
With workflows, the script holds the loop, the branching, and the intermediate results. Claude’s context only gets the final answer.
Here’s how the two modes compare:
| | Subagents | Workflows |
|---|---|---|
| Who decides what runs next | Claude, turn by turn | The script |
| Where intermediate results live | Claude’s context window | Script variables |
| What’s repeatable | The worker definition | The orchestration itself |
| Scale | A few delegated tasks per turn | Dozens to hundreds of agents per run |
| Interruption | Restarts the turn | Resumable in the same session |
Good candidates: codebase-wide bug sweeps, 500-file migrations, research questions that need sources cross-checked against each other, plans worth drafting from several independent angles before committing.
How to trigger one
There are three ways to start a workflow.
Include the word workflow in your prompt. Claude writes the script and asks for approval before running.
Run a workflow to audit every API endpoint under src/routes/ for missing auth checks
Use /effort ultracode. This combines maximum reasoning effort with automatic workflow orchestration. With it on, Claude plans a workflow for every substantive task without you having to ask. One request can turn into several workflows in sequence: one to understand the code, one to make the change, one to verify it.
Run a saved workflow by name.
/deep-research What changed in the Node.js permission model between v20 and v22?
/deep-research is the built-in workflow. It fans out web searches across several angles, cross-checks sources, votes on each claim, and returns a cited report with the claims that didn’t survive cross-checking already removed.
Watching it run
Once a workflow starts, it runs in the background. Use /workflows to open the progress view:
- Each phase() call in the script gets its own group
- You can drill into any agent to see its prompt, recent tool calls, and result
- p pauses/resumes, x stops an agent or the whole run, r restarts an agent
- s saves the script as a named slash command for future reuse
That last point is worth pausing on. Once a workflow does what you wanted, press s and it becomes a / command available to anyone who clones the repo. It’s a decent way to accumulate reusable automation.
The script format
Workflow scripts are plain JavaScript. Every script exports a meta object and uses a small API in the body:
// Metadata the runtime reads before the body runs.
export const meta = {
  name: 'security-audit',
  description: 'Audit every endpoint for missing auth',
  phases: [
    { title: 'Scan' },
    { title: 'Fix' },
  ],
}

// Phase 1: a single agent returns a schema-validated list of findings.
phase('Scan')
const findings = await agent('find all unprotected endpoints in src/routes/', {
  schema: {
    type: 'array',
    items: {
      type: 'object',
      properties: { file: { type: 'string' }, route: { type: 'string' } },
      required: ['file', 'route'],
    },
  },
})

// Phase 2: fan out one fix agent per finding.
phase('Fix')
const results = await pipeline(
  findings,
  f => agent(`add auth middleware to the route at ${f.file}: ${f.route}`, {
    label: f.route,
    phase: 'Fix',
    isolation: 'worktree',
  })
)

// Count the fix agents that returned a truthy result.
log(`Fixed ${results.filter(Boolean).length} of ${findings.length} endpoints`)
The full API:
- agent(prompt, opts?): spawn a subagent; returns its result as a string or validated object
- parallel(fns[]): run an array of zero-arg functions concurrently and wait for all of them
- pipeline(items, ...stages): map items through stages; items run concurrently within each stage
- phase(name): label a progress section in /workflows
- log(message): emit a message to the progress view
- workflow(nameOrRef, args?): run a saved workflow as a sub-step
- budget: live token consumption (budget.total, budget.remaining(), budget.spent())
What’s actually happening inside
That’s the user-facing picture. Here’s what the binary is doing.
The script runs in a Node.js vm sandbox
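The workflow body doesn’t run as ordinary code in the host process. It runs inside an isolated JavaScript context created with the Node.js-compatible vm module, specifically vm.Script + createContext + runInNewContext. The Bun runtime embedded in the Claude binary ships the full vm module (Bun implements it on JavaScriptCore rather than V8, so this is not literally a V8 context).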
The sandbox’s global scope is a curated object containing the script API (agent, parallel, pipeline, etc.) and standard JS builtins (JSON, Math, Array, Promise). It does not contain require(), filesystem APIs, Date.now() (argless), or Math.random().
Banning Date.now() and Math.random() isn’t arbitrary strictness — it’s what makes resume work. If a script called Date.now() at line 3, re-running it from a checkpoint would produce a different value and potentially change which agents get spawned. Banning non-deterministic builtins means the runtime can guarantee that replaying the script from a checkpoint produces identical control flow.
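To make the curated-globals idea concrete, here is a minimal sketch using the real node:vm API. It is not the code inside the binary: makeSandbox, the stub log function, and the way Math.random is stripped are all assumptions made for illustration. It only shows that whatever object you hand to createContext decides what the script can see.

```javascript
const vm = require('node:vm');

// Hypothetical helper: build a context whose globals come from a curated object.
function makeSandbox(api) {
  // Copy every Math member except random, so scripts keep Math.max and friends.
  const safeMath = {};
  for (const name of Object.getOwnPropertyNames(Math)) {
    if (name !== 'random') safeMath[name] = Math[name];
  }
  // Properties on this object become globals inside the context and
  // shadow the context's own built-in Math.
  return vm.createContext({ ...api, Math: Object.freeze(safeMath) });
}

const logged = [];
const context = makeSandbox({ log: (message) => logged.push(message) });

// require is a module-scope binding in the host, so it is not visible here.
const script = new vm.Script(`
  log(typeof require);
  log(typeof Math.random);
  log(Math.max(2, 7));
`);
script.runInContext(context);
```

After this runs, logged holds the three values the script reported. One caveat worth keeping in mind: the Node.js documentation says the vm module is not a security mechanism, so a sketch like this limits what a script can reach by accident, not what hostile code can do.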
Why meta must be a pure literal
The runtime reads meta (name, description, phase list) before executing the script: to show you the confirmation prompt, to populate the /workflows view, and to validate that phase() calls in the body match the declared titles.
If meta allowed computed values, the runtime would have to execute the script just to learn its name, running code before the user approved anything. A pure literal means metadata can be extracted with a static parse.
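A rough sketch of what a "pure literal" check could look like. It assumes the script has already been parsed into an ESTree-shaped AST by some parser (acorn is one option; the post does not say which parser the runtime uses), and isPureLiteral is a hypothetical name. The nodes below are written by hand so the example runs without dependencies.

```javascript
// Returns true when an ESTree node is built only from literals, arrays,
// and plain objects. Anything that would need evaluation (calls, identifiers,
// spreads, computed keys, template interpolation) is rejected. For brevity it
// also rejects unary expressions such as -1.
function isPureLiteral(node) {
  switch (node.type) {
    case 'Literal':
      return true;
    case 'TemplateLiteral':
      return node.expressions.length === 0;
    case 'ArrayExpression':
      return node.elements.every((el) => el !== null && isPureLiteral(el));
    case 'ObjectExpression':
      return node.properties.every(
        (p) =>
          p.type === 'Property' &&
          p.kind === 'init' &&
          !p.computed &&
          !p.method &&
          !p.shorthand &&
          isPureLiteral(p.value),
      );
    default:
      return false;
  }
}

// Hand-written ESTree nodes standing in for parser output.
const lit = (value) => ({ type: 'Literal', value });
const prop = (name, value) => ({
  type: 'Property',
  kind: 'init',
  computed: false,
  method: false,
  shorthand: false,
  key: { type: 'Identifier', name },
  value,
});

// Equivalent to: { name: 'security-audit', phases: [{ title: 'Scan' }] }
const literalMeta = {
  type: 'ObjectExpression',
  properties: [
    prop('name', lit('security-audit')),
    prop('phases', {
      type: 'ArrayExpression',
      elements: [{ type: 'ObjectExpression', properties: [prop('title', lit('Scan'))] }],
    }),
  ],
};

// Equivalent to: { name: makeName() }
const computedMeta = {
  type: 'ObjectExpression',
  properties: [
    prop('name', {
      type: 'CallExpression',
      callee: { type: 'Identifier', name: 'makeName' },
      arguments: [],
    }),
  ],
};
```

Here isPureLiteral(literalMeta) accepts the first shape and isPureLiteral(computedMeta) rejects the second, without running any code from the script.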
Concurrency is CPU-aware
Each agent() call goes through a queue managed by a concurrency controller. The cap is:
Math.min(16, os.cpus().length - 2)
On a typical 8-core laptop, you get 6 concurrent agents. On a 20-core workstation, you hit the hard cap of 16. On a 4-core machine, you get 2. Passing 100 items to pipeline() is fine — they all eventually complete, just not all at once.
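The arithmetic above, plus one way a queue with that cap could behave, in a short sketch. concurrencyCap is the formula quoted in this section; createLimiter is a hypothetical stand-in for the concurrency controller, not its actual implementation. The formula as written yields zero or less on machines with two or fewer cores, and the post does not say how the runtime handles that, so the clamp to one slot is an assumption.

```javascript
const os = require('node:os');

// The cap formula quoted above.
function concurrencyCap(cpuCount) {
  return Math.min(16, cpuCount - 2);
}

// Hypothetical limiter: runs at most `cap` tasks at once, queues the rest.
function createLimiter(cap) {
  const limit = Math.max(1, cap); // assumption: never drop below one slot
  let active = 0;
  const waiting = [];

  const startNext = () => {
    if (active >= limit || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        startNext(); // a slot just freed up, so pull the next queued task
      });
  };

  // Returns a promise for the task's result, whenever it gets a slot.
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      startNext();
    });
}

// A limiter sized for the current machine.
const enqueue = createLimiter(concurrencyCap(os.cpus().length));
```

Calling enqueue(() => someAsyncWork()) a hundred times queues a hundred tasks, but only the capped number are in flight at any moment.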
pipeline() vs parallel()
Both run things concurrently, but they compose differently.
parallel() is a barrier: all functions start together, and nothing proceeds until the last one finishes.
pipeline() is a streaming fan-out: each item flows through all stages in sequence, and a stage-2 agent starts as soon as its stage-1 result is ready, not when the slowest stage-1 item finishes.
If you have 20 files and one takes 3x longer than the others, pipeline() doesn’t stall 19 verify-agents waiting for that one slow review. parallel() would. Use parallel() only when stage N genuinely needs all of stage N-1 together (deduplication, early-exit on zero count). Otherwise pipeline() is faster.
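The difference is easier to see with timing. The sketch below uses stand-in versions of parallel() and pipeline() written from the descriptions in this section; they are assumptions, not the runtime's code, and the review/verify "agents" are fake functions that only record events. The file names and delays are made up.

```javascript
// Stand-ins for the two helpers, written from the descriptions above.
const parallel = (fns) => Promise.all(fns.map((fn) => fn()));
const pipeline = (items, ...stages) =>
  Promise.all(
    items.map(async (item) => {
      let value = item;
      // Each item walks the stages on its own, without waiting for siblings.
      for (const stage of stages) value = await stage(value);
      return value;
    }),
  );

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fake agents: review takes file.ms to finish, verify is instant.
function makeStages(events) {
  const review = async (file) => {
    await sleep(file.ms);
    events.push(`review:${file.name}`);
    return file;
  };
  const verify = async (file) => {
    events.push(`verify:${file.name}`);
    return file.name;
  };
  return { review, verify };
}

const files = [
  { name: 'slow.js', ms: 40 },
  { name: 'fast.js', ms: 5 },
];

// Streaming: fast.js is verified while slow.js is still under review.
async function runStreaming() {
  const events = [];
  const { review, verify } = makeStages(events);
  await pipeline(files, review, verify);
  return events;
}

// Barrier: no verify starts until every review has finished.
async function runBarrier() {
  const events = [];
  const { review, verify } = makeStages(events);
  const reviewed = await parallel(files.map((f) => () => review(f)));
  await parallel(reviewed.map((f) => () => verify(f)));
  return events;
}
```

In the streaming run, verify:fast.js lands before review:slow.js. In the barrier run, both reviews finish before either verify begins.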
The Workflow tool and async delivery
From Claude’s perspective, workflows go through a built-in tool called Workflow. Its input takes scriptPath, name, resumeFromRunId, and args. The tool returns a run ID immediately and does not block until the workflow finishes.
When the run completes, the runtime injects a <task-notification> tag into Claude’s context. That’s how Claude knows to synthesize the final answer. The async separation is what keeps your session responsive during a long run.
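The return-now, notify-later shape can be sketched in a few lines. Everything here is hypothetical: createRunner, the wf_ counter IDs, the status strings, and the notify callback are illustrations of the pattern, not the Workflow tool's actual interface.

```javascript
// Hypothetical runner: start() hands back an ID at once and reports
// completion later through a callback, instead of blocking the caller.
function createRunner(notify) {
  let counter = 0;
  const status = new Map();

  return {
    start(work) {
      counter += 1;
      const runId = `wf_${counter}`;
      status.set(runId, 'running');

      // The work runs in the background; nothing here is awaited.
      Promise.resolve()
        .then(work)
        .then(
          (result) => {
            status.set(runId, 'completed');
            notify({ runId, status: 'completed', result });
          },
          (error) => {
            status.set(runId, 'failed');
            notify({ runId, status: 'failed', error: String(error) });
          },
        );

      return runId;
    },
    statusOf: (runId) => status.get(runId),
  };
}
```

The caller gets the run ID while the status is still running, and the notification arrives on its own once the work settles, which is the same separation that lets the session stay interactive.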
Resume: checkpoint map and determinism
Every agent() call that completes writes its result to an in-memory checkpoint map, keyed by (prompt, opts) serialized deterministically.
When you resume with resumeFromRunId: "wf_abc123", the runtime replays the script from the top. For each agent() call it checks the map — if the key matches, it returns the cached result instantly with no API call. If not, it runs the agent live.
You only pay for agents that didn’t finish before the interruption. The determinism requirement is what makes the key lookup reliable: the same script, replayed, hits the same agent() calls in the same order.
Resume only works within the same session. Exit Claude Code and the checkpoint map is gone.
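Here is a small sketch of the replay-with-cache idea. stableStringify and withCheckpoints are hypothetical names, and sorting object keys is just one way to serialize (prompt, opts) deterministically; the post does not say which scheme the runtime uses. The wrapped runAgent stands in for a live agent call.

```javascript
// Serialize a value with object keys sorted, so { a, b } and { b, a }
// produce the same string.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Wrap a live agent function so completed calls are served from the map.
function withCheckpoints(runAgent, checkpoints = new Map()) {
  return async function agent(prompt, opts) {
    const key = stableStringify([prompt, opts ?? null]);
    if (checkpoints.has(key)) return checkpoints.get(key); // replayed, no live call
    const result = await runAgent(prompt, opts);
    checkpoints.set(key, result); // only completed calls are recorded
    return result;
  };
}
```

Replaying a script against the same map skips every call that already finished and runs only the ones that did not. Because the map lives in memory, it disappears with the process, which matches the same-session limit described above.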
Putting it all together
When you type run a workflow to audit every auth endpoint:
- Claude writes a .js script with a meta export and a body using agent(), phase(), and pipeline().
- Claude calls the Workflow tool with the script path.
- The runtime parses meta statically and shows you the plan: phase titles, token warning, approve/deny.
- You approve. The runtime creates an isolated context via vm.createContext, injects the script API globals, and executes with runInNewContext.
- phase('Scan') fires, setting the progress label.
- agent(...) calls enqueue into the concurrency pool. The CPU-appropriate number run simultaneously, each as a full Claude subagent.
- Results land in the checkpoint map and resolve as Promise values in the script.
- pipeline() fans out the next stage per-item as stage-1 results arrive.
- The run completes. Claude receives <task-notification> and synthesizes the final answer.
The sandbox, concurrency controller, checkpoint map, and /workflows TUI are all compiled into the single Bun binary that ships as the claude executable.
The short version: the script is what makes this different. Moving the orchestration plan out of Claude’s context and into code is what enables scale, repeatability, and resume. Everything else — the vm sandbox, the CPU-aware concurrency cap, the determinism constraints — follows from that.
To try it: describe any large multi-file task and include the word workflow in your prompt, or run /deep-research for a quick example.