Symphony SPEC · Draft v1
Language-agnostic service specification

Orchestrate coding agents
like a long-running daemon.

Symphony continuously reads work from an issue tracker, creates an isolated workspace for each issue, and runs a coding-agent session inside it: repeatable, bounded, and observable.

Tracker: Linear-compatible · Runtime: Codex app-server · Policy: WORKFLOW.md (in-repo) · State: in-memory + tracker

  • 8 main components, from Loader to Logging
  • 5 orchestration states per issue
  • 11 run attempt phases in the lifecycle
  • 3 safety invariants on workspaces

§1 · Problem statement

Four operational problems
that turn into a single daemon.

Instead of manual scripts and ad-hoc shells, Symphony treats issue execution as a repeatable, bounded, in-repo workflow.

01

Repeatable workflow

Issue execution becomes a daemon loop, not a manual script run.

02

Isolated execution

Per-issue workspaces: agent commands run only inside their own directory.

03

In-repo policy

WORKFLOW.md versions the prompt and runtime settings alongside the code.

04

Observability

Enough visibility to operate and debug many concurrent agent runs.

§3 · System overview

Eight components, six layers,
one authoritative orchestrator.

The orchestrator owns the runtime state; everything else feeds or executes work on its behalf.

Orchestrator · single authority

Owns the poll tick and the in-memory runtime state, and decides which issues to dispatch, retry, stop, or release.

  • Serializes state mutations through one authority
  • Tracks session metrics and retry queue state
  • Reconciles active runs every tick before dispatching

Runtime state: running map · claimed set · retry_attempts · codex_totals
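
The runtime state named above could be held in one structure that only the orchestrator mutates. This is a minimal sketch, not the spec's data model: the field names `running`, `claimed`, `retry_attempts`, and `codex_totals` come from the text, while `RunningEntry`, the token counter keys, and the `claim` / `release` helpers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RunningEntry:
    """Hypothetical record for one live worker."""
    issue_id: str
    attempt: int


@dataclass
class OrchestratorState:
    # Field names follow the spec text; the value types are assumptions.
    running: dict[str, RunningEntry] = field(default_factory=dict)
    claimed: set[str] = field(default_factory=set)
    retry_attempts: dict[str, int] = field(default_factory=dict)
    codex_totals: dict[str, int] = field(
        default_factory=lambda: {"input_tokens": 0, "output_tokens": 0}
    )

    def claim(self, issue_id: str) -> bool:
        """Reserve an issue; return False if it is already claimed."""
        if issue_id in self.claimed:
            return False
        self.claimed.add(issue_id)
        return True

    def release(self, issue_id: str) -> None:
        """Drop every trace of the issue from the runtime state."""
        self.claimed.discard(issue_id)
        self.running.pop(issue_id, None)
        self.retry_attempts.pop(issue_id, None)
```

Routing every mutation through methods on a single object is one way to get the "one authority" property: a second claim on the same issue simply returns False.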

  • L1 · Policy: Repo-defined WORKFLOW.md prompt body and team rules for ticket handling, validation, and handoff.
  • L2 · Configuration: Typed getters that parse front matter and apply defaults, env tokens, and path normalization.
  • L3 · Coordination: The orchestrator: polling loop, eligibility checks, concurrency, retries, reconciliation.
  • L4 · Execution: Workspace + agent subprocess: filesystem lifecycle, prep, coding-agent protocol.
  • L5 · Integration: Linear adapter: API calls and normalization for tracker data.
  • L6 · Observability: Logs and an OPTIONAL status surface so operators can see what the orchestrator and agents are doing.

§7 · Orchestration state machine

Five claim states.
Distinct from tracker states.

The tracker has its own states (Todo, In Progress, etc.). Symphony tracks a separate claim state so that it never dispatches the same issue twice.

  • Unclaimed: no run, no retry
  • Claimed: reserved · prevents duplicate dispatch
  • Running: worker task alive
  • RetryQueued: timer pending
  • Released: terminal / not-active / missing

A successful worker exit does not mean the issue is done. The orchestrator still schedules a ~1 s continuation retry to re-check whether the issue is still active.
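
To make the five claim states concrete, here is a small sketch that derives the state of an issue from three collections of issue ids. The state names come from the list above; the `claim_state` and `can_dispatch` helpers, their parameters, and the precedence order are assumptions for illustration only.

```python
from enum import Enum


class ClaimState(Enum):
    UNCLAIMED = "Unclaimed"        # no run, no retry
    CLAIMED = "Claimed"            # reserved, prevents duplicate dispatch
    RUNNING = "Running"            # worker task alive
    RETRY_QUEUED = "RetryQueued"   # retry timer pending
    RELEASED = "Released"          # issue terminal, not active, or missing


def claim_state(issue_id, claimed, running, retry_timers):
    """Derive the claim state from three hypothetical collections of issue ids."""
    if issue_id in running:
        return ClaimState.RUNNING
    if issue_id in retry_timers:
        return ClaimState.RETRY_QUEUED
    if issue_id in claimed:
        return ClaimState.CLAIMED
    return ClaimState.UNCLAIMED


def can_dispatch(state):
    """Assumption: the poll tick only dispatches issues nobody holds a claim on."""
    return state in (ClaimState.UNCLAIMED, ClaimState.RELEASED)
```

Because the claim state is independent of the tracker state, an issue that is still "In Progress" in the tracker can be Running, RetryQueued, or Unclaimed on the Symphony side.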

Transition triggers

  • Poll Tick: Reconcile · validate config · fetch candidates · dispatch until slots exhausted.
  • Worker Exit (normal): Remove running entry · update totals · schedule continuation retry (attempt 1).
  • Worker Exit (abnormal): Remove running entry · update totals · schedule exponential-backoff retry.
  • Codex Update Event: Update live session, token counters, rate limits.
  • Retry Timer Fired: Re-fetch active candidates and attempt re-dispatch, or release if ineligible.
  • Reconciliation Refresh: Stop runs whose issue states are terminal or no longer active.
  • Stall Timeout: Kill the worker and schedule a retry.
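
The two worker-exit triggers share their first two steps and differ only in the kind of retry they schedule. A minimal sketch of that branch, assuming a plain dict for the state, a hypothetical `total_tokens` counter, and that a failure retry increments the previous attempt number (the text only fixes attempt 1 for the continuation case):

```python
def on_worker_exit(state, issue_id, normal, tokens_used):
    """Handle a worker exit and return (retry_kind, attempt)."""
    # Step 1: remove the running entry.
    state["running"].pop(issue_id, None)
    # Step 2: update the cumulative totals (counter name is hypothetical).
    state["codex_totals"]["total_tokens"] += tokens_used
    # Step 3: pick the retry kind.
    if normal:
        kind, attempt = "continuation", 1
    else:
        kind = "failure"
        attempt = state["retry_attempts"].get(issue_id, 0) + 1  # assumption
    state["retry_attempts"][issue_id] = attempt
    return kind, attempt
```

A caller would then arm a timer using the fixed continuation delay or the exponential backoff described in §8.4.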

Run attempt lifecycle

  • 01 PreparingWorkspace: create / reuse dir
  • 02 BuildingPrompt: render template
  • 03 LaunchingAgent: spawn process
  • 04 InitSession: handshake
  • 05 StreamingTurn: main loop
  • 06 Finishing: cleanup
  • 07 Succeeded: terminal · ok
  • 08 Failed: terminal · retry?
  • 09 TimedOut: terminal · retry?
  • 10 Stalled: no activity
  • 11 Canceled: by reconciliation

Phases fall into three kinds: in-progress, success terminal, and failure terminal.
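
The eleven phases map naturally onto an enumeration. In this sketch the member names and numbering follow the list above; treating phases 01 to 06 as in-progress and 07 to 11 as end states is an assumption, since the text marks only Succeeded, Failed, and TimedOut as terminal explicitly.

```python
from enum import Enum


class RunPhase(Enum):
    PREPARING_WORKSPACE = 1   # create / reuse dir
    BUILDING_PROMPT = 2       # render template
    LAUNCHING_AGENT = 3       # spawn process
    INIT_SESSION = 4          # handshake
    STREAMING_TURN = 5        # main loop
    FINISHING = 6             # cleanup
    SUCCEEDED = 7             # terminal, ok
    FAILED = 8                # terminal, may retry
    TIMED_OUT = 9             # terminal, may retry
    STALLED = 10              # no activity
    CANCELED = 11             # stopped by reconciliation


# Assumption: everything up to Finishing is still in progress.
IN_PROGRESS = frozenset(p for p in RunPhase if p.value <= RunPhase.FINISHING.value)


def is_in_progress(phase):
    return phase in IN_PROGRESS


def is_success(phase):
    return phase is RunPhase.SUCCEEDED
```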
§8 · Polling, scheduling, reconciliation

Every tick is six steps,
in a strict order.

On every tick the orchestrator reconciles, validates, fetches, sorts, dispatches, and notifies.

1
Reconcile running issues
stall check · refresh tracker state · stop ineligible runs
reconcile()
2
Dispatch preflight validation
workflow loaded · agent exec present · tracker creds resolved
validate()
3
Fetch candidates
tracker query filtered to active_states
tracker.list()
4
Sort by dispatch priority
priority ↑ · created_at ↑ · identifier lex tie-breaker
sort()
5
Dispatch while slots remain
global & per-state concurrency · blocker rule on Todo
dispatch()
6
Notify observability
structured logs · OPTIONAL status surface / metrics
emit()
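
Steps 4 and 5 can be sketched as a sort key followed by a slice over the free slots. This is an illustration under simplifying assumptions: the `Issue` fields are hypothetical, `priority` is always an integer where a lower number sorts first, and only the global concurrency limit is modeled (per-state limits and the Todo blocker rule are left out).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Issue:
    identifier: str
    priority: int
    created_at: datetime


def dispatch_order(issues):
    """Sort ascending by priority, then created_at, then identifier."""
    return sorted(issues, key=lambda i: (i.priority, i.created_at, i.identifier))


def pick_for_dispatch(issues, running_count, max_concurrent):
    """Return the issues to dispatch this tick, limited by free global slots."""
    free_slots = max(max_concurrent - running_count, 0)
    return dispatch_order(issues)[:free_slots]
```

The identifier tie-breaker makes the order deterministic, so two ticks over the same candidate set pick the same issues.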

§8.4 · Retry & backoff

Exponential, but capped.

Failure-driven retries grow as 10000 · 2^(attempt − 1) ms, clamped to agent.max_retry_backoff_ms (default 5 min).

delay = min(10 000 · 2^(attempt − 1), 300 000)

Two different retry kinds

A clean worker exit triggers a continuation retry with a fixed 1000 ms delay: the worker may have just finished a turn loop, and the orchestrator wants to check whether the issue is still active.

An abnormal exit triggers a failure retry with exponential backoff and a per-issue cap.
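
The failure-retry delay follows directly from the formula in this section. A minimal sketch, where the constant and function names are hypothetical and the numbers (10 000 ms base, 300 000 ms default cap, 1 000 ms continuation delay) are the ones stated in the text:

```python
CONTINUATION_DELAY_MS = 1_000            # fixed delay after a clean exit
BASE_BACKOFF_MS = 10_000                 # first failure retry
DEFAULT_MAX_RETRY_BACKOFF_MS = 300_000   # default agent.max_retry_backoff_ms


def failure_retry_delay_ms(attempt, cap_ms=DEFAULT_MAX_RETRY_BACKOFF_MS):
    """delay = min(10 000 * 2^(attempt - 1), cap), with attempt starting at 1."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return min(BASE_BACKOFF_MS * 2 ** (attempt - 1), cap_ms)


if __name__ == "__main__":
    for attempt in range(1, 8):
        print(attempt, failure_retry_delay_ms(attempt))
```

With the default cap the delays run 10 000, 20 000, 40 000, 80 000, 160 000 ms and are clamped to 300 000 ms from attempt 6 onward, where the uncapped value would be 320 000 ms.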

Example readout (attempt 3 by the formula above):

  • Current delay: 40 000 ms
  • Capped: no
  • Continuation delay: 1 000 ms (fixed)
  • Default cap: 300 000 ms · 5 min

§9.5 · Safety invariants

Three rules the runtime must uphold.

These are the most important portability constraints. Each is enforced before the agent subprocess is launched.

01

Agent runs only in its workspace

Before launching, the runtime verifies that the subprocess cwd equals the issue's workspace path. Anything else aborts the attempt.

assert cwd == workspace_path
02

Workspace stays inside workspace root

Both paths are normalized to absolute, and the workspace path must have the root as a prefix directory. Any escape is rejected.

workspace_path.startswith(workspace_root + '/')
03

Workspace key is sanitized

Only [A-Za-z0-9._-] survive in workspace directory names. Everything else is replaced with _.

re.sub(r'[^A-Za-z0-9._-]', '_', identifier)
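
The three one-liners above can be combined into a single pre-launch check. This is a sketch under assumptions: the function names are hypothetical, paths are normalized with `os.path.abspath` (which does not resolve symlinks), and a violation raises `ValueError` rather than using `assert`, which is stripped when Python runs with `-O`.

```python
import os
import re


def sanitize_workspace_key(identifier):
    """Invariant 03: keep only [A-Za-z0-9._-], replace the rest with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", identifier)


def workspace_path_for(workspace_root, identifier):
    """Invariant 02: the workspace must sit strictly inside the root."""
    root = os.path.abspath(workspace_root)
    path = os.path.abspath(os.path.join(root, sanitize_workspace_key(identifier)))
    # Prefix check on a directory boundary, as in the rule above.
    if not path.startswith(root + os.sep):
        raise ValueError(f"workspace {path!r} escapes root {root!r}")
    return path


def check_cwd(cwd, workspace_path):
    """Invariant 01: the subprocess cwd must equal the workspace path."""
    if os.path.abspath(cwd) != workspace_path:
        raise ValueError(f"cwd {cwd!r} is not the workspace {workspace_path!r}")
```

The prefix check is still needed after sanitizing, because an identifier such as `..` consists only of allowed characters and would otherwise resolve to the parent of the root.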
§14 · Failure model & recovery

Five failure classes,
one principle: stay alive.

The orchestrator never crashes on transient errors. Dashboard, log sink, and tracker fetch failures all degrade gracefully, and worker failures become retries.

01

Workflow / Config

  • Missing WORKFLOW.md
  • Invalid YAML front matter
  • Unsupported tracker kind
  • Missing agent executable
02

Workspace

  • Directory creation failed
  • Population / sync failed
  • Invalid path config
  • Hook timeout / failure
03

Agent Session

  • Startup handshake failure
  • Turn failed / cancelled
  • Turn timeout
  • Subprocess exit
  • Stalled session
04

Tracker

  • API transport errors
  • Non-200 status
  • GraphQL errors
  • Malformed payloads
05

Observability

  • Snapshot timeout
  • Dashboard render errors
  • Log sink config failure

Recovery behavior

  • dispatch validation: Skip new dispatches, keep the service alive, continue reconciliation.
  • worker failure: Convert to an exponential-backoff retry.
  • tracker fetch failure: Skip this tick and try again on the next one.
  • state-refresh failure: Keep current workers, retry on the next tick.
  • dashboard / log failure: Never crash the orchestrator.
  • process restart: No in-memory retry timer or live session survives. Recovery is tracker- and filesystem-driven.
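
The first few recovery rules amount to catching each failure class at the step where it occurs and letting the tick end early instead of propagating. A rough sketch, where the exception classes and the four injected callables are hypothetical stand-ins for the real components:

```python
import logging

log = logging.getLogger("symphony.sketch")


class DispatchValidationError(Exception):
    """Hypothetical: preflight validation failed."""


class TrackerFetchError(Exception):
    """Hypothetical: candidate fetch from the tracker failed."""


def run_tick(reconcile, validate, fetch_candidates, dispatch):
    """Run one tick; return the number of dispatched issues, never raise."""
    try:
        reconcile()
    except Exception:
        # State-refresh failure: keep current workers, retry on the next tick.
        log.exception("reconciliation failed")
    try:
        validate()
    except DispatchValidationError as exc:
        # Skip new dispatches; reconciliation has already run above.
        log.warning("dispatch validation failed: %s", exc)
        return 0
    try:
        candidates = fetch_candidates()
    except TrackerFetchError as exc:
        # Skip this tick and try again on the next one.
        log.warning("tracker fetch failed: %s", exc)
        return 0
    return dispatch(candidates)
```

Running reconciliation before validation is what lets a broken workflow file block new dispatches without leaving existing runs unsupervised.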