Field Notes · Claude Code

Claude Code just shipped Workflows. Here's what they actually are.

Opus 4.8 grabbed the headlines, but Workflows may be the more important drop. A 16-minute walkthrough, distilled: the mental model, the live demos, the token reality, and exactly when to reach for one.

TL;DR

A Workflow moves the orchestrator out of Claude's head and into a script.

  1. The shift — instead of Claude juggling sub-agents in its own context, a workflow.js script becomes the manager. State lives in variables, loops are deterministic, only final answers return to chat.
  2. Sub-agents, scaled — same isolated-context sub-agents you already had, now coordinated programmatically. 16 concurrent, up to 1,000 per run.
  3. It is expensive — one vitamin-C deep-research run burned 3 million tokens across 105 agents. Not your everyday tool.
  4. Model tiering — set a different model per phase: Haiku for fan-out, Sonnet for scoring, Opus for synthesis.
  5. When to use it — reach for a Workflow when work fans out across many similar items or needs deterministic, resumable loops. Otherwise use a skill or just chat.

01 · The drop

The quietly bigger announcement

Workflows officially shipped alongside Opus 4.8 — and the creator's take is that Workflows, not the new model, are the most valuable part of the announcement. The whole feature rests on one prerequisite concept: the sub-agent. So the video starts there.

02 · The problem

Context is a container that fills from the top down

A normal Claude Code session accumulates everything (tool calls, MCP responses, long reasoning, file reads) until the context window is bloated. Even a 1M-token ceiling fills with junk, and compaction is lossy by design: earlier detail gets summarized, not recovered.

The core problem. Work expands to fill the context window, no matter how big the ceiling. (00:48)

Sub-agents are the existing fix: spawn a fresh Claude Code session with its own isolated context to do one task, and return only the answer. A 60,000-token job done in a sub-agent returns ~500 tokens to your main session — no bloat, no compaction.
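The saving is easy to see as arithmetic. The sketch below uses only the two figures quoted above (a 60,000-token job, roughly 500 tokens returned); the function name and the ten-job scenario are illustrative assumptions.

```javascript
// Illustrative accounting only. The two constants come from the text above;
// everything else is a made-up scenario.
const JOB_TOKENS = 60000;   // work done inside the sub-agent's own context
const ANSWER_TOKENS = 500;  // approximate size of the answer handed back

// Tokens added to the MAIN session for `jobs` tasks, inline vs. delegated.
function mainContextGrowth(jobs, delegated) {
  return jobs * (delegated ? ANSWER_TOKENS : JOB_TOKENS);
}

const inline = mainContextGrowth(10, false);
const delegated = mainContextGrowth(10, true);
console.log({ inline, delegated, ratio: inline / delegated });
```

Note what this does not show: the 60,000 tokens per job are still spent inside each sub-agent. Only the main session's share shrinks.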

03 · The shift

The script becomes the manager

Sub-agents alone don't solve everything. When Claude itself is the orchestrator, it has to hold intermediate state, make routing decisions turn by turn, and track every agent — which breaks down at scale. The Workflow's insight: move the manager into a script.

The whole idea in one frame. The sub-agents are the same; the orchestrator changed. (02:58)

"We no longer have this overburdened main context window. We have a workflow.js script that holds the state inside variables. It has deterministic loops, and only the final answers return to our main context." (Mansel Scheffel)
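To make "the script is the manager" concrete, here is a minimal sketch in plain JavaScript. It does not use the real Workflows API, which the video summary does not show: `runSubAgent` is a hypothetical stub standing in for a sub-agent call, and `workflow` is just an ordinary async function.

```javascript
// Hypothetical stand-in for a sub-agent call. The real API is not shown in
// the source, so this stub simply returns a short answer string.
async function runSubAgent(task) {
  return `summary of ${task}`;
}

// The "manager" is ordinary code: state in variables, a deterministic loop,
// and a single final value handed back to the caller.
async function workflow(tasks) {
  const results = [];                      // intermediate state lives here
  for (const task of tasks) {              // same order on every run
    results.push(await runSubAgent(task));
  }
  return results.join("\n");               // only this reaches the main context
}

workflow(["auth module", "billing module"]).then(console.log);
```

The point of the shape, under these assumptions, is that nothing in the loop depends on a model remembering where it is. The variable `results` does that job.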

04 · Under the hood

The runtime, the journal, and the hard limits

At runtime, workflow.js runs as a separate process that loads the JavaScript, spawns sub-agents, and tracks everything in a journal — which is what makes pause/resume work (completed agents return cached results).
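The journal idea can be sketched with a plain Map. This is an assumption-heavy illustration: the real journal format is not described in the source, and `createRuntime`, `runAgent`, and the agent ids are all hypothetical names.

```javascript
// Sketch of journal-backed resume. The journal is a Map keyed by agent id.
function createRuntime() {
  const journal = new Map();
  let liveRuns = 0; // how many times an agent actually executed

  // Returns the cached result if this id already completed; otherwise runs
  // the work and records the result in the journal.
  async function runAgent(id, work) {
    if (journal.has(id)) return journal.get(id);
    liveRuns += 1;
    const result = await work();
    journal.set(id, result);
    return result;
  }
  return { runAgent, liveRuns: () => liveRuns };
}

async function workflow(rt) {
  const a = await rt.runAgent("scope", async () => "5 angles");
  const b = await rt.runAgent("search-1", async () => "12 urls");
  return [a, b];
}

async function demo() {
  const rt = createRuntime();
  await workflow(rt);              // first run executes both agents
  const out = await workflow(rt);  // "resumed" run replays from the journal
  return { out, liveRuns: rt.liveRuns() };
}

demo().then(console.log);
```

Running the same workflow twice against one runtime executes each agent once; the second pass is served entirely from the journal.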

Runtime + journal + the three hard limits, all in one slide. (03:22)

No FS / shell from the script

The script can't touch the filesystem or shell directly — the agents do that.

16 concurrent agents max

The rest queue and start as slots free up.

1,000 agents total per run

A runaway-loop backstop. You can still field a massive swarm.
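The two numeric limits behave like a capped worker pool. The sketch below is a generic pool pattern, not the runtime's actual scheduler; `runPool` and the fake jobs are hypothetical, and only the 16 and 1,000 figures come from the text.

```javascript
const MAX_CONCURRENT = 16;  // from the text: at most 16 agents run at once
const MAX_TOTAL = 1000;     // from the text: at most 1,000 agents per run

// Runs `jobs` (an array of async functions) under a concurrency cap.
async function runPool(jobs, limit = MAX_CONCURRENT, maxTotal = MAX_TOTAL) {
  if (jobs.length > maxTotal) {
    throw new Error(`run would spawn ${jobs.length} agents; cap is ${maxTotal}`);
  }
  const results = new Array(jobs.length);
  let next = 0;    // index of the next queued job
  let active = 0;  // jobs currently running
  let peak = 0;    // highest concurrency observed

  // Each worker pulls the next queued job as soon as its slot frees up.
  async function worker() {
    while (next < jobs.length) {
      const i = next++;
      active += 1;
      peak = Math.max(peak, active);
      results[i] = await jobs[i]();
      active -= 1;
    }
  }
  const workers = Array.from({ length: Math.min(limit, jobs.length) }, () => worker());
  await Promise.all(workers);
  return { results, peak };
}

// 40 fake agents, each resolving after a short delay.
const fakeJobs = Array.from({ length: 40 }, (_, i) => () =>
  new Promise((resolve) => setTimeout(() => resolve(i), 5)));

runPool(fakeJobs).then(({ results, peak }) =>
  console.log(results.length, "agents done, peak concurrency", peak));
```

With 40 queued jobs, 16 start immediately and the remaining 24 wait for a free slot, which is the queueing behavior described above.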

Setup gotcha: Workflows only run in the Claude desktop app or the IDE terminal, not the VS Code extension. No mid-run user input; only agent permission prompts can pause a run.

05 · Live demo

Deep research — five phases, and a sobering token bill

The built-in deep-research skill is a Workflow. Asked to research the benefits of vitamin C, it runs five phases:

  1. Scope: Break the question into 5 search angles.
  2. Search: 5 parallel web searches, one per angle.
  3. Fetch: Dedupe URLs, pull the top ~15 sources, extract falsifiable claims.
  4. Verify: Adversarial 3-vote fact-checking on each claim; a claim needs 2 of 3 refutes to be killed. This is where the agents explode.
  5. Synthesize: Merge duplicates, rank by confidence, write a cited report.

The five phases, with the background-tasks panel tracking live agent count and tokens. (05:18)

The Fetch phase fanned out: each agent ~33k tokens. This is how the bill adds up. (06:18)
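The Verify rule (a claim is killed when 2 of its 3 verifiers refute it) reduces to a small vote count. In this sketch the vote labels and sample claims are hypothetical; only the 2-of-3 threshold comes from the summary.

```javascript
// A claim is dropped when at least 2 of its 3 verifier votes are "refute".
function survives(votes) {
  const refutes = votes.filter((v) => v === "refute").length;
  return refutes < 2;
}

// Made-up claims with made-up votes, for illustration only.
const claims = [
  { text: "claim A", votes: ["support", "support", "refute"] },
  { text: "claim B", votes: ["refute", "unsure", "refute"] },
];

const kept = claims.filter((c) => survives(c.votes)).map((c) => c.text);
console.log(kept); // [ 'claim A' ]
```

Because every claim needs three independent votes, the agent count in this phase is three times the number of claims, which is why it dominates the run.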

The numbers are the headline. One minute in: 27 agents, 682k tokens. By the end: 105 agents, 3 million tokens, 15 minutes — for a vitamin C summary.

Where the 105 agents went. Verify is "the big one": 25 top claims × 3 independent verifiers. (13:05)

"Just because we've isolated this to separate sub-agents doesn't mean we're magically going to get perfect usage. We are still using this much." (Mansel Scheffel)

When it's worth it: Not for everyday research. Reserve it for one genuinely hard, specific problem, such as interpreting confusing blood work or deep competitor research, where fanning out from every angle actually pays for itself.
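A quick back-of-the-envelope check on those figures, using only the numbers quoted in this section. The 3 million total is a rounded figure, so the per-agent average is approximate.

```javascript
// Figures quoted in the text above.
const totalAgents = 105;
const totalTokens = 3000000;
const claimsVerified = 25;
const votesPerClaim = 3;

const verifyAgents = claimsVerified * votesPerClaim;             // 75
const verifyShare = verifyAgents / totalAgents;                  // about 0.71
const avgTokensPerAgent = Math.round(totalTokens / totalAgents); // 28571

console.log({ verifyAgents, verifyShare: verifyShare.toFixed(2), avgTokensPerAgent });
```

So roughly seven in ten agents belong to the Verify phase, and the average agent costs somewhere near 29k tokens, in line with the ~33k per Fetch agent mentioned earlier.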

06 · Beyond code

It's not just for developers — the patterns generalize

Asked to show off, Claude built a "startup forge" Workflow that maps cleanly onto the reusable orchestration primitives:

The primitives in practice: pipeline() with no barrier, schema-validated judges, parallel() adversarial stress-test, single-agent synthesis. (06:48)

pipeline()

Ideas get judged the moment each is generated — no barrier waiting for all four.

structured output

Judges return validated schema (novelty, market, feasibility) — zero text parsing.

parallel() + verify

The top idea is attacked by 3 skeptics, each with a distinct lens, hunting for a fatal flaw.

synthesize

One agent writes an honest investor pitch that must confront every objection.
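The "no barrier" behavior attributed to pipeline() can be illustrated with plain Promises. The summary does not show the signatures of pipeline() or parallel(), so this sketch avoids them entirely; `generateIdea`, `judge`, and the fixed scores are hypothetical stubs. The log records the order in which things happen.

```javascript
// Hypothetical stub: idea i becomes available after 10 * (i + 1) ms.
const generateIdea = (i, log) =>
  new Promise((resolve) =>
    setTimeout(() => {
      log.push(`generated idea-${i}`);
      resolve(`idea-${i}`);
    }, 10 * (i + 1)));

// Hypothetical judge returning a fixed, already-structured score object.
async function judge(idea, log) {
  log.push(`judged ${idea}`);
  return { idea, novelty: 3, market: 3, feasibility: 3 };
}

// Barrier style: wait for every idea, then judge them all.
async function withBarrier(n, log) {
  const ideas = await Promise.all(
    Array.from({ length: n }, (_, i) => generateIdea(i, log)));
  return Promise.all(ideas.map((idea) => judge(idea, log)));
}

// Pipeline style: each idea flows into its judge as soon as it exists.
async function withoutBarrier(n, log) {
  return Promise.all(
    Array.from({ length: n }, (_, i) =>
      generateIdea(i, log).then((idea) => judge(idea, log))));
}

(async () => {
  const a = [];
  const b = [];
  await withBarrier(2, a);
  await withoutBarrier(2, b);
  console.log("barrier:   ", a.join(" -> "));
  console.log("no barrier:", b.join(" -> "));
})();
```

In the barrier version no judging starts until the slowest idea has arrived; in the pipeline version the first idea is judged while later ones are still being generated.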

Other fits the creator names: code reviews, PR trawling, audits, lead-gen fan-out (replacing brittle skill-chaining), and codebase-wide bug sweeps.

07 · Model tiering

Different model per phase

You don't have to run everything on Opus. The model is set per agent() call — and you can tier them by task difficulty. The "brand foundry" demo does exactly this:

Haiku for high-volume generation, Sonnet for critique, Opus for synthesis, plus the cost-warning permission prompt. (09:48)

"The more clarity that we provide the system up front, the better the output down the line. Come to it with a very specific request." (Mansel Scheffel)
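As a rough illustration of tiering, the sketch below maps phases to tiers and passes the tier on each call. Everything about the call shape is an assumption: the summary only says the model is set per agent() call, so the `{ model }` option, the stub agent, the phase names, and the tier labels (plain strings, not real model IDs) are all hypothetical.

```javascript
// Tier per phase, following the Haiku / Sonnet / Opus split described above.
const MODEL_BY_PHASE = {
  generate: "haiku",    // high-volume fan-out
  critique: "sonnet",   // scoring and critique
  synthesize: "opus",   // final synthesis
};

// Hypothetical stub agent that records which tier handled each call.
function makeStubAgent(calls) {
  return async function agent(prompt, { model }) {
    calls.push(model);
    return `[${model}] ${prompt}`;
  };
}

// A made-up three-phase flow: many cheap calls, fewer mid-tier, one top-tier.
async function brandFoundry(brief, agent) {
  const names = await Promise.all(
    [1, 2, 3].map((n) =>
      agent(`name option ${n} for ${brief}`, { model: MODEL_BY_PHASE.generate })));
  const critiques = await Promise.all(
    names.map((name) => agent(`critique: ${name}`, { model: MODEL_BY_PHASE.critique })));
  return agent(`pick a winner from ${critiques.length} critiques`,
    { model: MODEL_BY_PHASE.synthesize });
}

const calls = [];
brandFoundry("a coffee brand", makeStubAgent(calls)).then((winner) =>
  console.log(winner, calls));
```

Keeping the mapping in one object means the cost profile of a run can be changed in a single place, for example by downgrading the critique phase while testing.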

08 · Control

Three levels of control — and every off switch

You're not handing over the keys blindly. There are three levels of intervention, from loosest to tightest:

Level 1: steer in plain English. Level 2: inspect the generated .js before approving (Ctrl+G). Level 3: edit the file like any code. (13:50)

By default a Workflow runs with edit-accept permissions — it won't bypass permissions unless you tell it to. And there are four ways to trigger one, three ways to switch it off.

The full control surface. Note: Max and Team plans have this on by default; on Pro plans and for organizations it is off by default (for budget-protection reasons). (14:42)

09 · The verdict

When to actually reach for a Workflow

Anthropic's own criteria, verbatim: Workflows are for "more agents than one conversation can coordinate," with orchestration "codified as a script you can read and rerun." (15:18)

Use a Workflow when…

Work fans out across many similar items · you want deterministic loops · you need resumability mid-run · the orchestration itself should be repeatable.

Use a skill or just chat when…

Claude's turn-by-turn judgment is the value · a single conversation handles the scope · you want repeatable instructions, not orchestration · one-off tasks.

The creator's bottom line: skills for the daily, repeatable business work; Workflows for the specific, fan-out-heavy jobs — bug sweeps, large migrations, cross-checked research. Just remember it's still in research preview, and the token bill is real.