How Claude Code Workflows Actually Work

May 28, 2026

Anthropic recently shipped Dynamic Workflows in Claude Code. Describe a task, and instead of working through it turn by turn, Claude writes an orchestration script, fans out parallel subagents, and synthesizes the results in the background while your terminal stays responsive.

This post covers two things: what workflows are and how to use them, and what’s actually happening inside the binary when one runs.

What are workflows?

A dynamic workflow is a JavaScript script that orchestrates subagents. Claude writes the script for the task you describe, and a runtime runs it in the background.

The plan lives in the script, not in Claude’s context window. With regular subagents, Claude is the orchestrator — it decides turn by turn what to spawn next, and every intermediate result lands back in its context. That limits you to a handful of agents per turn and means context fills up fast.

With workflows, the script holds the loop, the branching, and the intermediate results. Claude’s context only gets the final answer.

Here’s how the two modes compare:

	Subagents	Workflows
Who decides what runs next	Claude, turn by turn	The script
Where intermediate results live	Claude’s context window	Script variables
What’s repeatable	The worker definition	The orchestration itself
Scale	A few delegated tasks per turn	Dozens to hundreds of agents per run
Interruption	Restarts the turn	Resumable in the same session

Good candidates: codebase-wide bug sweeps, 500-file migrations, research questions that need sources cross-checked against each other, plans worth drafting from several independent angles before committing.

How to trigger one

There are three ways to start a workflow.

Include the word workflow in your prompt. Claude writes the script and asks for approval before running.

Run a workflow to audit every API endpoint under src/routes/ for missing auth checks

Use /effort ultracode. This combines maximum reasoning effort with automatic workflow orchestration. With it on, Claude plans a workflow for every substantive task without you having to ask. One request can turn into several workflows in sequence: one to understand the code, one to make the change, one to verify it.

Run a saved workflow by name.

/deep-research What changed in the Node.js permission model between v20 and v22?

/deep-research is the built-in workflow. It fans out web searches across several angles, cross-checks sources, votes on each claim, and returns a cited report with the claims that didn’t survive cross-checking already removed.

Watching it run

Once a workflow starts, it runs in the background. Use /workflows to open the progress view:

Each phase() call in the script gets its own group
You can drill into any agent to see its prompt, recent tool calls, and result
p pauses/resumes, x stops an agent or the whole run, r restarts an agent
s saves the script as a named slash command for future reuse

That last point is worth pausing on. Once a workflow does what you wanted, press s and it becomes a / command available to anyone who clones the repo. It’s a decent way to accumulate reusable automation.

The script format

Workflow scripts are plain JavaScript. Every script exports a meta object and uses a small API in the body:

export const meta = {
  name: 'security-audit',
  description: 'Audit every endpoint for missing auth',
  phases: [
    { title: 'Scan' },
    { title: 'Fix' },
  ],
}

phase('Scan')
const findings = await agent('find all unprotected endpoints in src/routes/', {
  schema: {
    type: 'array',
    items: {
      type: 'object',
      properties: { file: { type: 'string' }, route: { type: 'string' } },
      required: ['file', 'route'],
    },
  },
})

phase('Fix')
const results = await pipeline(
  findings,
  f => agent(`add auth middleware to the route at ${f.file}: ${f.route}`, {
    label: f.route,
    phase: 'Fix',
    isolation: 'worktree',
  })
)

log(`Fixed ${results.filter(Boolean).length} of ${findings.length} endpoints`)

The full API:

agent(prompt, opts?) — spawn a subagent, returns its result as a string or validated object
parallel(fns[]) — run an array of zero-arg functions concurrently, waits for all
pipeline(items, ...stages) — map items through stages; items run concurrently within each stage
phase(name) — label a progress section in /workflows
log(message) — emit a message to the progress view
workflow(nameOrRef, args?) — run a saved workflow as a sub-step
budget — live token consumption (budget.total, budget.remaining(), budget.spent())

What’s actually happening inside

That’s the user-facing picture. Here’s what the binary is doing.

The script runs in a Node.js `vm` sandbox

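The workflow body doesn’t run as ordinary top-level code in the host process. It runs inside an isolated context created with the Node.js vm module, specifically vm.Script + createContext + runInNewContext. The Bun runtime embedded in the Claude binary ships the full vm module. Bun is built on JavaScriptCore rather than V8, so this is a context in the sense of the vm API, not literally a V8 context.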

The sandbox’s global scope is a curated object containing the script API (agent, parallel, pipeline, etc.) and standard JS builtins (JSON, Math, Array, Promise). It does not contain require(), filesystem APIs, Date.now(), or Math.random(); presumably the same goes for new Date() with no arguments, which reads the clock just like Date.now().

Banning Date.now() and Math.random() isn’t arbitrary strictness — it’s what makes resume work. If a script called Date.now() at line 3, re-running it from a checkpoint would produce a different value and potentially change which agents get spawned. Banning non-deterministic builtins means the runtime can guarantee that replaying the script from a checkpoint produces identical control flow.
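
To make the idea concrete, here is a minimal sketch of a curated-globals sandbox built on the standard node:vm API. It is an illustration under assumptions, not the code inside the binary: the log stand-in, the error messages, and the runInSandbox helper are all hypothetical, and it uses createContext with runInContext for simplicity.

```javascript
// Sketch only: an illustration of the idea, not Claude Code's actual sandbox.
const vm = require('node:vm');

// Hypothetical stand-in for the real script API.
const logged = [];
const sandboxGlobals = {
  log: (msg) => logged.push(String(msg)),
};

// Only the properties of this object become globals in the new context.
// require, process, and fs are simply never put there.
const context = vm.createContext(sandboxGlobals);

// Date and Math exist in every fresh context as language builtins, so the
// nondeterministic parts have to be disabled from inside the context.
new vm.Script(`
  Math.random = () => { throw new Error('Math.random() is not available in workflows'); };
  Date.now = () => { throw new Error('Date.now() is not available in workflows'); };
`).runInContext(context);

// Hypothetical helper: compile and run a snippet inside the same context.
function runInSandbox(source) {
  return new vm.Script(source).runInContext(context);
}

// Host capabilities are absent; deterministic builtins still work.
const probe = runInSandbox(`({
  hasRequire: typeof require !== 'undefined',
  hasProcess: typeof process !== 'undefined',
  max: Math.max(1, 2, 3),
})`);

// A banned builtin throws instead of returning a value that differs per run.
let banned = null;
try {
  runInSandbox('Date.now()');
} catch (err) {
  banned = err.message;
}

runInSandbox("log('hello from the sandbox')");
console.log(probe, banned, logged);
```

One caveat worth keeping in mind: the Node.js documentation states that the vm module is not a security mechanism. A context like this controls which names a script can see, which is enough for determinism, but it should not be read as a hardened boundary against hostile code.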

Why `meta` must be a pure literal

The runtime reads meta (name, description, phase list) before executing the script: to show you the confirmation prompt, to populate the /workflows view, and to validate that phase() calls in the body match the declared titles.

If meta allowed computed values, the runtime would have to execute the script just to learn its name, running code before the user approved anything. A pure literal means metadata can be extracted with a static parse.
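
As a rough illustration of what "extracted with a static parse" can mean, the sketch below reads a literal-only meta object without evaluating any of the script. The extractMeta function and its tiny grammar (strings, numbers, booleans, null, arrays, objects) are assumptions made for this example; the real runtime may well use a full JavaScript parser.

```javascript
// Sketch only: a tiny reader for a literal-only subset of JavaScript.
// It never evaluates the script, so nothing in the body can run.
function extractMeta(source) {
  const marker = /export\s+const\s+meta\s*=\s*/.exec(source);
  if (!marker) throw new Error('no `export const meta = ...` found');
  let pos = marker.index + marker[0].length;

  const fail = (what) => {
    throw new Error(`meta must be a pure literal: ${what} at offset ${pos}`);
  };
  const skipSpace = () => {
    while (/\s/.test(source[pos] ?? '')) pos++;
  };

  function parseString() {
    const quote = source[pos++];
    let out = '';
    while (pos < source.length && source[pos] !== quote) {
      if (source[pos] === '\\') pos++; // simplification: keep the escaped char verbatim
      out += source[pos++];
    }
    if (source[pos] !== quote) fail('unterminated string');
    pos++;
    return out;
  }

  function parseArray() {
    pos++; // consume '['
    const items = [];
    for (;;) {
      skipSpace();
      if (source[pos] === ']') { pos++; return items; }
      items.push(parseValue());
      skipSpace();
      if (source[pos] === ',') pos++;
      else if (source[pos] !== ']') fail('expected , or ]');
    }
  }

  function parseObject() {
    pos++; // consume '{'
    const obj = {};
    for (;;) {
      skipSpace();
      if (source[pos] === '}') { pos++; return obj; }
      const key = /^[A-Za-z_$][\w$]*/.exec(source.slice(pos));
      let name;
      if (key) { name = key[0]; pos += name.length; }
      else if (source[pos] === "'" || source[pos] === '"') name = parseString();
      else fail('expected property name');
      skipSpace();
      if (source[pos] !== ':') fail('expected :'); // rejects shorthand like { name }
      pos++;
      obj[name] = parseValue();
      skipSpace();
      if (source[pos] === ',') pos++;
      else if (source[pos] !== '}') fail('expected , or }');
    }
  }

  function parseValue() {
    skipSpace();
    const ch = source[pos];
    if (ch === '{') return parseObject();
    if (ch === '[') return parseArray();
    if (ch === "'" || ch === '"') return parseString();
    const num = /^-?\d+(\.\d+)?/.exec(source.slice(pos));
    if (num) { pos += num[0].length; return Number(num[0]); }
    const word = /^(true|false|null)\b/.exec(source.slice(pos));
    if (word) { pos += word[0].length; return JSON.parse(word[0]); }
    return fail('unexpected token'); // identifiers, calls, templates, spreads
  }

  return parseValue();
}

const script = `
export const meta = {
  name: 'security-audit',
  description: 'Audit every endpoint for missing auth',
  phases: [
    { title: 'Scan' },
    { title: 'Fix' },
  ],
}

phase('Scan')
throw new Error('the body must never run during extraction')
`;

const meta = extractMeta(script);
console.log(meta.name, meta.phases.map((p) => p.title));

// A computed value is rejected instead of being evaluated.
let rejected = false;
try {
  extractMeta('export const meta = { name: makeName() }');
} catch (err) {
  rejected = true;
}
```

The body of the sample script throws on its first real statement, yet extraction succeeds, which is the property the confirmation prompt depends on.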

Concurrency is CPU-aware

Each agent() call goes through a queue managed by a concurrency controller. The cap is:

Math.min(16, os.cpus().length - 2)

On a typical 8-core laptop, you get 6 concurrent agents. On a 20-core workstation, you hit the hard cap of 16. On a 4-core machine, you get 2. Passing 100 items to pipeline() is fine — they all eventually complete, just not all at once.
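
The arithmetic and the queueing behavior can be sketched in a few lines. The concurrencyCap function is the expression above with the core count passed in as a parameter; createLimiter and fakeAgent are hypothetical stand-ins for the real controller and for agent(). Note that the expression as written evaluates to zero or less on machines with two or fewer cores, and nothing here says whether the real runtime applies a floor.

```javascript
// Sketch only: the cap formula is from the post; the limiter is a
// hypothetical stand-in for the real concurrency controller.
function concurrencyCap(cpuCount) {
  return Math.min(16, cpuCount - 2);
}

// Runs async tasks with at most `limit` in flight; the rest wait in a queue.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];
  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => { active--; next(); });
  };
  return (task) => new Promise((resolve, reject) => {
    waiting.push({ task, resolve, reject });
    next();
  });
}

async function demo() {
  const limit = concurrencyCap(4); // 2 on a 4-core machine
  const run = createLimiter(limit);
  let inFlight = 0;
  let peak = 0;

  // Hypothetical agent: takes 5 ms and tracks how many run at once.
  const fakeAgent = (i) => async () => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 5));
    inFlight--;
    return i;
  };

  // Ten items are queued at once, but only `limit` execute simultaneously.
  const results = await Promise.all(
    Array.from({ length: 10 }, (_, i) => run(fakeAgent(i)))
  );
  return { limit, peak, results };
}

demo().then((summary) => console.log(summary));
```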

`pipeline()` vs `parallel()`

Both run things concurrently, but they compose differently.

parallel() is a barrier: all functions start together, and nothing proceeds until the last one finishes.

pipeline() is a streaming fan-out: each item flows through all stages in sequence, and a stage-2 agent starts as soon as its stage-1 result is ready, not when the slowest stage-1 item finishes.

If you have 20 files and one takes 3x longer than the others, pipeline() doesn’t stall 19 verify-agents waiting for that one slow review. parallel() would. Use parallel() only when stage N genuinely needs all of stage N-1 together, for example to deduplicate findings across items or to exit early when the total count is zero. Otherwise pipeline() is faster.
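
The difference shows up clearly in the order of events. The sketch below defines local stand-ins named parallel and pipeline that mimic the semantics described in this section (they are not the runtime's implementations), then runs a fast and a slow review through both and records what happens when. The review and verify helpers and the timings are made up for the demonstration.

```javascript
// Sketch only: stand-ins that mimic the semantics described in this section.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Barrier: resolves only when every function has finished.
const parallel = (fns) => Promise.all(fns.map((fn) => fn()));

// Streaming: each item walks through the stages on its own schedule.
const pipeline = (items, ...stages) =>
  Promise.all(items.map(async (item) => {
    let value = item;
    for (const stage of stages) value = await stage(value);
    return value;
  }));

async function compare() {
  const files = [{ name: 'fast', ms: 10 }, { name: 'slow', ms: 80 }];

  // Hypothetical stages that append to an event log as they go.
  const review = (events) => async (file) => {
    await sleep(file.ms);
    events.push(`reviewed ${file.name}`);
    return file;
  };
  const verify = (events) => async (file) => {
    events.push(`verify start ${file.name}`);
    await sleep(10);
    return file.name;
  };

  // pipeline(): verification of the fast file starts before the slow review ends.
  const streamed = [];
  await pipeline(files, review(streamed), verify(streamed));

  // parallel() per stage: no verification starts until every review is done.
  const barriered = [];
  const reviewed = await parallel(files.map((f) => () => review(barriered)(f)));
  await parallel(reviewed.map((f) => () => verify(barriered)(f)));

  return { streamed, barriered };
}

compare().then(({ streamed, barriered }) => {
  console.log('pipeline:', streamed);
  console.log('parallel:', barriered);
});
```

In the streamed log, "verify start fast" appears between the two reviews; in the barriered log, both reviews come first. With 20 items instead of two, that gap is the time the 19 fast items would spend idle.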

The `Workflow` tool and async delivery

From Claude’s perspective, workflows go through a built-in tool called Workflow. Its input takes scriptPath, name, resumeFromRunId, and args. The tool returns a run ID immediately instead of blocking until the workflow finishes.

When the run completes, the runtime injects a <task-notification> tag into Claude’s context. That’s how Claude knows to synthesize the final answer. The async separation is what keeps your session responsive during a long run.

Resume: checkpoint map and determinism

Every agent() call that completes writes its result to an in-memory checkpoint map, keyed by (prompt, opts) serialized deterministically.

When you resume with resumeFromRunId: "wf_abc123", the runtime replays the script from the top. For each agent() call it checks the map — if the key matches, it returns the cached result instantly with no API call. If not, it runs the agent live.

You only pay for agents that didn’t finish before the interruption. The determinism requirement is what makes the key lookup reliable: the same script, replayed, hits the same agent() calls in the same order.

Resume only works within the same session. Exit Claude Code and the checkpoint map is gone.
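
Here is a small model of that replay logic, offered as a sketch rather than a description of the real runtime. The names stableStringify, createRuntime, flaky, and healthy are all hypothetical, and the key format is an assumption; the point is only that a deterministic script plus a map keyed by (prompt, opts) is enough to skip finished work.

```javascript
// Sketch only: a hypothetical model of checkpoint-and-replay.

// Serialize with sorted keys so { a, b } and { b, a } produce the same key.
function stableStringify(value) {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (value && typeof value === 'object') {
    const body = Object.keys(value).sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`);
    return `{${body.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Wraps a live agent function with a checkpoint lookup.
function createRuntime(checkpoints, liveAgent) {
  let liveAttempts = 0;
  async function agent(prompt, opts = {}) {
    const key = stableStringify([prompt, opts]);
    if (checkpoints.has(key)) return checkpoints.get(key); // replayed, no live call
    liveAttempts++;
    const result = await liveAgent(prompt, opts);
    checkpoints.set(key, result); // only completed calls are recorded
    return result;
  }
  return { agent, liveAttempts: () => liveAttempts };
}

// The "workflow": the same deterministic script is run twice.
async function script(agent) {
  const files = await agent('list files', { label: 'scan' });
  const out = [];
  for (const f of files) out.push(await agent(`fix ${f}`, { label: f }));
  return out;
}

async function demo() {
  const checkpoints = new Map();

  // First run: the third live call fails, simulating an interruption.
  let calls = 0;
  const flaky = async (prompt) => {
    if (++calls === 3) throw new Error('interrupted');
    return prompt === 'list files' ? ['a.js', 'b.js', 'c.js'] : `done: ${prompt}`;
  };
  const first = createRuntime(checkpoints, flaky);
  await script(first.agent).catch(() => {});

  // Resume: replay from the top with the same map. This agent cannot answer
  // 'list files' correctly, which is fine because that call is served from cache.
  const healthy = async (prompt) => `done: ${prompt}`;
  const second = createRuntime(checkpoints, healthy);
  const results = await script(second.agent);

  return {
    firstAttempts: first.liveAttempts(),
    resumedAttempts: second.liveAttempts(),
    results,
  };
}

demo().then((summary) => console.log(summary));
```

If the script's control flow depended on the clock or a random number, the replay could issue a different prompt at the same position, miss the cache, and pay for the call again.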

Putting it all together

When you type “run a workflow to audit every auth endpoint”:

Claude writes a .js script with a meta export and a body using agent(), phase(), and pipeline().
Claude calls the Workflow tool with the script path.
The runtime parses meta statically and shows you the plan: phase titles, token warning, approve/deny.
You approve. The runtime creates an isolated context via vm.createContext, injects the script API globals, and executes with runInNewContext.
phase('Scan') fires, setting the progress label.
agent(...) calls enqueue into the concurrency pool. Up to the CPU-derived cap run simultaneously, each as a full Claude subagent.
Results land in the checkpoint map and resolve as Promise values in the script.
pipeline() fans out the next stage per-item as stage-1 results arrive.
The run completes. Claude receives <task-notification> and synthesizes the final answer.

The sandbox, concurrency controller, checkpoint map, and /workflows TUI are all compiled into the single Bun binary that ships as the claude executable.

The short version: the script is what makes this different. Moving the orchestration plan out of Claude’s context and into code is what enables scale, repeatability, and resume. Everything else — the vm sandbox, the CPU-aware concurrency cap, the determinism constraints — follows from that.

To try it: describe any large multi-file task and include the word workflow in your prompt, or run /deep-research for a quick example.