From Userland Hack to SDK Native: The CMS Migration Blueprint
The complete architecture for migrating CMS Multi-Session from an external CLI orchestrator to SDK-native IterationEngine — replacing shell scripts with query() calls while keeping every advantage that made CMS work.
Knowledge Comic Series — Maximum technical depth, minimum words.
Why Migrate
CMS works. 50+ iterations, zero context rot, checkpoint recovery. So why touch it?
CMS Today:
Shell script → spawns Claude CLI process → parses stdout → writes checkpoint
├── Depends on CLI flags that can change any release
├── Parses <report> tags from unstructured text output
├── Cannot integrate with SDK's type system
└── Cannot be unit tested without a running Claude process
The Claude Agent SDK gives us query() — a single function call that creates a fresh conversation. Same isolation. But now it’s a Python function, not a subprocess. Testable. Typed. Version-pinned.
The goal: Replace the transport, keep the architecture.
The Core Insight
CMS’s power comes from one architectural decision:
The orchestrator is not an LLM.
A shell script loops. An LLM iterates. The shell script never fills its context window because it doesn’t have one.
The SDK migration must preserve this property:
WRONG approach:
Orchestrator = long-running query() session
→ Accumulates results from sub-queries
→ Same context rot problem
→ You just rebuilt Agent Teams Lead
RIGHT approach:
Orchestrator = Python class (IterationEngine)
→ Calls query() per iteration (no session resume)
→ Each query() gets a fresh context
→ IterationEngine has zero token footprint
→ You preserved CMS's core advantage
The key line:
result = await query(prompt=iteration_prompt, session_id=None)
# ^^^^^^^^^^^^
# No resume. Fresh context. Every time.
Architecture: Before and After
CMS External (Current)
═══════════════════════
┌─────────────────┐
│ cms-iterate.sh │ ← Shell script (not an LLM)
│ │
│ for i in 1..N: │
│ claude --print │─────→ [Claude CLI Process]
│ --prompt ... │ │
│ │ │ stdout
│ parse output │←──────────┘
│ write ckpt │───→ checkpoint.json
│ done │
└─────────────────┘
SDK Native (Target)
═══════════════════
┌──────────────────────┐
│ IterationEngine │ ← Python class (not an LLM)
│ │
│ for i in 1..N: │
│ await query( │─────→ [SDK query() call]
│ prompt=..., │ │
│ session_id=None │ │ ResultMessage
│ ) │←──────────┘
│ parse report │
│ checkpoint.save() │───→ checkpoint.json
│ done │
└──────────────────────┘
Same loop. Same checkpoint. Same isolation. Different transport.
What Changes
| Layer | CMS External | SDK Native |
|---|---|---|
| Orchestrator | Shell script | IterationEngine Python class |
| Spawn | claude --print --prompt | await query(prompt, session_id=None) |
| Output parsing | Regex on stdout <report>...</report> | IterationReport.parse(result.result) |
| Prompt template | .cms-iterate/prompts/iterator-1.md | prompt_loader.get("iterator-system", **vars) |
| Configuration | Hardcoded in shell | IterationConfig dataclass + defaults.yaml |
| Query options | CLI flags | orchestrator.create_fresh_query() |
| Error handling | Process crash = next iteration | try/except with typed errors |
| Testing | Integration only (needs running CLI) | Unit testable (mock query()) |
What Does NOT Change
| Component | Why |
|---|---|
| .cms-iterate/ directory structure | Checkpoint, reports, prompts, backups — all same paths |
| checkpoint.json format (v1.1.0) | Backward compatible — old checkpoints load into new engine |
| Decision logic | Same 4 stop conditions, same continue rules |
| External CMS Skills | Kept for now, deprecated later — engine runs alongside |
| Existing orchestrator methods | chat(), run(), run_skill() untouched |
| Agent definitions | All .claude/agents/ untouched |
The Four Design Decisions
Decision 1: Public API for Fresh Queries
The engine needs to call query() without accumulating context. This requires a clean public method on the orchestrator — not reaching into private internals.
WRONG:
engine calls orchestrator._build_options(session_id=None)
└── Private method. Breaks when orchestrator refactors internals.
RIGHT:
engine calls orchestrator.create_fresh_query(prompt, max_turns=30)
└── Public contract. Engine doesn't know how options are built.
Why it matters: The orchestrator owns the SDK client, API keys, model selection, and tool configuration. The engine shouldn’t duplicate any of that. It should say “give me a fresh conversation with this prompt” and get back a result.
One method. Clean boundary. The engine never imports the SDK directly.
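A minimal sketch of what that boundary could look like. This is an illustration, not the SDK's actual API: `_query_fn` stands in for the real `query()` call, and the constructor signature is hypothetical — the real orchestrator also owns API keys, model selection, and tool configuration.

```python
import asyncio
from typing import Any, Awaitable, Callable


class Orchestrator:
    """Sketch only: `_query_fn` stands in for the SDK's query();
    the real orchestrator builds options internally."""

    def __init__(self, query_fn: Callable[..., Awaitable[Any]]):
        self._query_fn = query_fn

    async def create_fresh_query(self, prompt: str, max_turns: int = 30) -> Any:
        # Public contract: always a fresh conversation, never a resume.
        return await self._query_fn(
            prompt=prompt, session_id=None, max_turns=max_turns
        )


# The engine only ever touches this one public method:
calls: list = []


async def stub_query(**kwargs):
    calls.append(kwargs)
    return "ResultMessage"


result = asyncio.run(Orchestrator(stub_query).create_fresh_query("run task A"))
```

Because the engine depends only on this one method, the same stub pattern makes it unit testable with no running Claude process.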
Decision 2: Parallel Execution from Day One
CMS already runs independent tasks in parallel:
CMS parallel (current):
cms-iterate.sh spawns 3 CLI processes simultaneously
└── Process 1: Task A (no dependencies)
└── Process 2: Task B (no dependencies)
└── Process 3: Task C (no dependencies)
Wait for all → continue with Task D (depends on A, B)
The SDK migration must preserve this. Not as a v2 feature. As a v1 parameter.
SDK parallel (target):
engine.start(request, parallel=True)
└── asyncio.gather(
query(task_A_prompt, session_id=None),
query(task_B_prompt, session_id=None),
query(task_C_prompt, session_id=None),
)
Wait for all → continue with Task D
Config interface:
iteration:
parallel: false # v1 default: sequential
max_parallel_queries: 3 # When parallel=true
Even if v1 ships with parallel=false, the parameter exists. No API signature change for v2.
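The fan-out above could be sketched with `asyncio.gather` plus a semaphore to honor `max_parallel_queries`. The helper names and `fake_query` stand-in are assumptions for illustration; only the fresh-context call shape (`session_id=None`) comes from the document.

```python
import asyncio


async def run_parallel(prompts, query_fn, max_parallel_queries=3):
    """Run independent tasks concurrently, each with a fresh context
    (session_id=None), capped at max_parallel_queries in flight."""
    sem = asyncio.Semaphore(max_parallel_queries)

    async def one(prompt):
        async with sem:
            return await query_fn(prompt=prompt, session_id=None)

    # gather() preserves input order, so results line up with prompts.
    return await asyncio.gather(*(one(p) for p in prompts))


# Stand-in for the SDK's query():
async def fake_query(prompt, session_id=None):
    await asyncio.sleep(0)
    return f"done: {prompt}"


results = asyncio.run(run_parallel(["Task A", "Task B", "Task C"], fake_query))
```

With `parallel=false` the engine would simply await each query in sequence; the semaphore only matters once the flag flips.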
Decision 3: Evolve Hook Points
CMS’s Self-Evolving Loop is its biggest differentiator over Agent Teams:
Iteration fails → extract learning → evolve skills → retry with better tools
The engine must have hook points for this, even if v1 doesn’t implement the full loop.
_run_loop pseudocode:
for iteration in range(max):
report = await _run_single_iteration(checkpoint)
if report.status == "completed":
_apply_report(checkpoint, report)
checkpoint.recovery.failure_count = 0 ← reset on success
elif report.status == "failed":
checkpoint.recovery.failure_count += 1
if config.enable_evolving: ← HOOK POINT
await _evolve(checkpoint, report) ← extract + evolve
if failure_count >= threshold:
break
checkpoint.save()
v1 implementation of _evolve(): pass. Four characters. But the hook is in the loop, the config flag exists, and the IterationReport already captures the failure context needed for learning.
v2 plugs in: experience-extractor agent → skill-evolver agent → retry with evolved skills. Zero changes to the loop structure.
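The failure branch of that loop could be sketched like this. `EvolveHook` and `on_failure` are hypothetical names for illustration; the `enable_evolving` flag and failure-threshold behavior are from the pseudocode above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class IterationConfig:
    enable_evolving: bool = False   # v1 default: hook present but dormant
    failure_threshold: int = 3


class EvolveHook:
    """Failure path only; on_failure returns True when the loop should stop."""

    def __init__(self, config: IterationConfig):
        self.config = config
        self.evolve_calls = []

    async def _evolve(self, checkpoint, report):
        # v1: effectively `pass`. v2 plugs in experience-extractor ->
        # skill-evolver here with zero changes to the loop structure.
        self.evolve_calls.append(report)

    async def on_failure(self, checkpoint, report):
        checkpoint["failure_count"] += 1
        if self.config.enable_evolving:          # <- the hook point
            await self._evolve(checkpoint, report)
        return checkpoint["failure_count"] >= self.config.failure_threshold


hook = EvolveHook(IterationConfig(enable_evolving=True, failure_threshold=2))
cp = {"failure_count": 0}
first_stop = asyncio.run(hook.on_failure(cp, "report-1"))
second_stop = asyncio.run(hook.on_failure(cp, "report-2"))
```

The key design property: whether `_evolve` is a no-op or a full extract-and-evolve pipeline, the loop around it never changes.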
Decision 4: Defensive Error Handling
CMS processes are naturally isolated — a crashed CLI process doesn’t take down the shell script. SDK query() calls run in the same Python process. An unhandled exception kills the engine.
Three error tiers:
Tier 1 — Expected failures (rate limit, timeout):
→ Return IterationReport(status="failed", errors=[...])
→ Engine continues to next iteration or stops based on threshold
→ Checkpoint saved before and after
Tier 2 — Query anomalies (no <report> tag in output):
→ Return IterationReport(status="partial", raw_output=text)
→ Engine logs warning, attempts next iteration
→ Human can inspect raw output in reports/
Tier 3 — Infrastructure failure (SDK crash, network down):
→ Save checkpoint immediately
→ Re-raise exception (engine stops)
→ Resume picks up from last saved checkpoint
Critical invariant: Checkpoint is always saved before any operation that might fail. The engine never loses more than one iteration’s work.
_run_single_iteration:
checkpoint.save() ← Save BEFORE query (crash protection)
result = await query(...) ← This might fail
report = parse(result) ← This might fail
checkpoint.save() ← Save AFTER success
return report
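The three tiers plus the save-before/save-after invariant could be sketched together. The exception classes and the `saved` list (standing in for disk writes) are illustrative assumptions; the tier behavior follows the description above.

```python
import asyncio


class QueryRateLimited(Exception):
    pass  # Tier 1 stand-in (rate limit, timeout)


class SDKCrash(Exception):
    pass  # Tier 3 stand-in (SDK crash, network down)


async def run_single_iteration(query_fn, checkpoint, saved):
    """Checkpoint saved before the risky query, and again only after
    full success. `saved` stands in for checkpoint.save() to disk."""
    saved.append(dict(checkpoint))               # save BEFORE query
    try:
        text = await query_fn()
    except QueryRateLimited as exc:              # Tier 1: expected failure
        return {"status": "failed", "errors": [str(exc)]}
    except SDKCrash:                             # Tier 3: save, then re-raise
        saved.append(dict(checkpoint))
        raise
    if "<report>" not in text:                   # Tier 2: query anomaly
        return {"status": "partial", "raw_output": text}
    checkpoint["current_iteration"] += 1
    saved.append(dict(checkpoint))               # save AFTER success
    return {"status": "completed"}


# Exercising the three non-crash outcomes with stand-in query functions:
saved, cp = [], {"current_iteration": 0}


async def ok():
    return "<report>{}</report>"


async def limited():
    raise QueryRateLimited("rate limited")


async def anomaly():
    return "output without report tags"


completed = asyncio.run(run_single_iteration(ok, cp, saved))
failed = asyncio.run(run_single_iteration(limited, cp, saved))
partial = asyncio.run(run_single_iteration(anomaly, cp, saved))
```

Note that even the Tier 1 and Tier 2 paths saved the checkpoint before the query ran, so a later crash can never lose more than one iteration's work.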
Data Models
Checkpoint (v1.1.0 Compatible)
Checkpoint
├── version: "1.1.0"
├── iteration_type: "auto-cycle" | "auto-explore" | "custom"
├── request: str
├── current_iteration: int
├── max_iterations: int
├── status: "running" | "completed" | "failed" | "stopped"
│
├── original_context
│ ├── goal: str
│ └── acceptance_criteria_file: str
│
├── context_summary
│ ├── current: str
│ ├── key_decisions: list[str]
│ ├── blockers: list[str]
│ └── next_action: str
│
├── completed_items: list[dict]
├── pending_items: list[dict]
├── history: list[dict] ← One entry per iteration
│
├── progress
│ ├── percent: int
│ └── estimated_remaining: int
│
└── recovery
├── last_successful_iteration: int
└── failure_count: int
Serialization contract: Checkpoint.to_dict() output is byte-identical to what the CMS shell script writes. Old checkpoints load with Checkpoint.from_file(). New checkpoints remain readable by old CMS.
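A sketch of the roundtrip contract on a subset of the fields above. The `from_dict` helper is an assumption (the document names `from_file`/`to_dict`); the point is that serialize-then-deserialize must be lossless.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Recovery:
    last_successful_iteration: int = 0
    failure_count: int = 0


@dataclass
class Checkpoint:
    """Subset of the v1.1.0 schema for illustration."""
    version: str = "1.1.0"
    request: str = ""
    current_iteration: int = 0
    max_iterations: int = 10
    status: str = "running"
    recovery: Recovery = field(default_factory=Recovery)

    def to_dict(self) -> dict:
        return asdict(self)  # recursively converts nested dataclasses

    @classmethod
    def from_dict(cls, d: dict) -> "Checkpoint":
        d = dict(d)
        d["recovery"] = Recovery(**d.get("recovery", {}))
        return cls(**d)


# Roundtrip through JSON must be lossless so old checkpoints keep loading:
cp = Checkpoint(request="Refactor auth", current_iteration=3)
restored = Checkpoint.from_dict(json.loads(json.dumps(cp.to_dict())))
```

A byte-compare of `to_dict()` output against a real CMS checkpoint (Phase 4 below) is what actually proves the contract holds.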
IterationReport
IterationReport
├── task_id: str
├── iteration: int
├── status: "completed" | "partial" | "failed" | "blocked"
│
├── iteration_result
│ ├── action_taken: str
│ ├── files_changed: list[str]
│ ├── tests_passed: bool
│ └── errors: list[str]
│
├── checkpoint_update
│ ├── completed_items: list[dict]
│ ├── pending_items: list[dict]
│ ├── progress_percent: int
│ └── context_summary: str
│
└── continue_decision
├── should_continue: bool
└── reason: str
Parsed from: <report>JSON</report> tags in query() output — same format CMS already uses.
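Extracting that payload could look like the following sketch. The regex and helper name are assumptions; the `<report>JSON</report>` envelope is the format the document describes.

```python
import json
import re
from typing import Optional

# DOTALL lets the JSON payload span multiple lines.
REPORT_RE = re.compile(r"<report>\s*(\{.*\})\s*</report>", re.DOTALL)


def parse_report(text: str) -> Optional[dict]:
    """Pull the JSON payload out of <report>...</report> in query() output.
    Returns None when no tag is found (the Tier-2 'partial' case)."""
    match = REPORT_RE.search(text)
    if match is None:
        return None
    return json.loads(match.group(1))


payload = parse_report(
    'preamble text <report>{"status": "completed", "iteration": 4}</report> trailer'
)
```

Malformed JSON inside the tags would raise `json.JSONDecodeError`, which the engine would treat like a Tier-2 anomaly rather than crash on.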
Engine Public API
IterationEngine
├── start(request, type, max_iterations, ...) → Checkpoint
│ Creates fresh checkpoint, runs loop
│
├── resume() → Checkpoint
│ Loads checkpoint from disk, continues from last iteration
│
├── stop() → None
│ Sets flag, loop exits after current iteration completes
│
└── status() → Checkpoint
Reads checkpoint from disk (no side effects)
Exposed as 3 MCP tools: iteration_start, iteration_resume, iteration_status
Also exposed as orchestrator method: orchestrator.run_iterations(request, ...)
Two entry points, same engine. MCP tools for Claude-driven workflows. Orchestrator method for programmatic use.
Decision Logic (Unchanged from CMS)
CONTINUE if ALL true:
├── pending_items is not empty
├── current_iteration < max_iterations
├── recovery.failure_count < failure_threshold (default: 3)
└── stop() was not called
STOP if ANY true:
├── pending_items is empty → status: "completed"
├── current_iteration >= max_iterations → status: "stopped"
├── recovery.failure_count >= threshold → status: "failed"
└── stop() was called → status: "stopped"
Same four conditions. Same behavior. Now testable without running Claude.
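As a pure function, the four conditions become trivially unit testable. The function name and tuple return are illustrative assumptions; the conditions and resulting statuses are exactly those listed above.

```python
def should_continue(pending_items, current_iteration, max_iterations,
                    failure_count, stop_requested, failure_threshold=3):
    """The four CMS stop conditions, checked in order.
    Returns (continue?, status)."""
    if not pending_items:
        return False, "completed"
    if current_iteration >= max_iterations:
        return False, "stopped"
    if failure_count >= failure_threshold:
        return False, "failed"
    if stop_requested:
        return False, "stopped"
    return True, "running"
```

Each stop condition gets its own one-line test, with no Claude process anywhere in sight.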
File Map
NEW FILES:
src/core/iteration_engine.py ~400 lines ← Engine + data models
config/prompts/iterator-system.md ~250 lines ← Ported from CMS prompt
tests/core/test_iteration_engine.py ~300 lines ← Unit + integration tests
MODIFIED FILES:
src/core/orchestrator.py +30 lines ← run_iterations() + create_fresh_query()
src/tools/self_dev_tools.py +90 lines ← 3 MCP tools
config/defaults.yaml +6 lines ← iteration config section
UNTOUCHED:
.cms-iterate/ ← Same directory, same format
.claude/agents/ ← All agent definitions
.claude/skills/cms-* ← Kept as fallback
src/core/orchestrator.py (rest) ← chat(), run(), run_skill()
Total new code: ~1,070 lines. Total modified: ~126 lines. Risk surface: 2 existing files get small additions.
Implementation Order
Phase 1 — Foundation (no integration, pure unit testable)
1. iteration_engine.py — Checkpoint + IterationReport data models only
2. test_iteration_engine.py — Data model tests + backward compat with real checkpoint
3. Run: pytest tests/core/test_iteration_engine.py -v
Phase 2 — Engine Logic (mock query, still no real integration)
4. iteration_engine.py — IterationEngine class with decision logic
5. iterator-system.md — Port prompt template
6. test_iteration_engine.py — Decision logic + mocked iteration tests
7. Run: pytest tests/core/test_iteration_engine.py -v
Phase 3 — Integration (wire into orchestrator)
8. orchestrator.py — create_fresh_query() + run_iterations()
9. self_dev_tools.py — 3 MCP tools
10. defaults.yaml — iteration config
11. Run: pytest tests/core/ -v (full suite, no regression)
Phase 4 — Smoke Test
12. Load real .cms-iterate/checkpoint.json → verify all fields
13. Checkpoint roundtrip: from_file() → to_dict() → byte compare
14. IterationReport parse real CMS report text
Each phase is independently shippable. Phase 1 alone provides typed checkpoint handling. Phase 2 adds testable decision logic. Phase 3 connects everything. Phase 4 validates backward compatibility.
Migration Timeline
Week 1: Phase 1-2 (data models + engine logic)
└── CMS external still runs all production workloads
└── Engine exists but isn't wired in
Week 2: Phase 3-4 (integration + smoke test)
└── Both paths available: orchestrator.run_iterations() AND /cms skill
└── Run same task through both, compare checkpoints
Week 3: Shadow mode
└── Engine runs alongside CMS for real tasks
└── Compare checkpoint outputs
└── If divergence: investigate, fix engine, keep CMS as source of truth
Week 4: Switch
└── Default to engine
└── CMS skills marked deprecated (not removed)
└── Remove CMS skills in v2 after 1 month with no issues
What This Enables
Once the engine is SDK-native, three things become possible that CMS couldn’t do:
1. Programmatic Iteration from Any Python Code
# Before: only through Claude CLI / Skill invocation
# After:
engine = IterationEngine(orchestrator, config)
checkpoint = await engine.start("Refactor auth module to use JWT")
print(f"Completed in {checkpoint.current_iteration} iterations")
Iteration loops become a library call. Tests, CI pipelines, external tools can all trigger them.
2. Typed Error Handling
# Before: parse stdout text for error patterns
# After:
if report.status == "failed":
for error in report.iteration_result.errors:
logger.error(f"Iteration {report.iteration}: {error}")
No more regex. No more “did the CLI output something unexpected?” Structured data all the way down.
3. Composable with Agent Teams
# The endgame: Agent Teams teammate that runs iterations
Task(
subagent_type="general-purpose",
team_name="dev-team",
prompt="Use iteration_start tool to run 10 iterations on the auth module"
)
An Agent Teams teammate can invoke the IterationEngine through MCP tools. CMS inside Agent Teams. Best of both architectures.
The Principle
Build userland solutions with clean interfaces. When the platform catches up, your migration is a transport swap, not a rewrite.
CMS → IterationEngine is 1,070 lines of new code and 126 lines of modifications. The checkpoint format doesn’t change. The directory structure doesn’t change. The decision logic doesn’t change.
Because CMS was built with the right abstractions from the start.
Further Reading
- We Built Agent Teams Before Agent Teams Existed — Why CMS and Agent Teams converged on the same architecture
- Agent Teams: Mesh Topology and Five Ways to Die — Communication topology and failure modes
- Multi-Agent Architecture: Parallel Execution Patterns — Sub-agent fundamentals