Runs IDE: Smithers at runs.supachad.com

Chad's second front-end is a web IDE for durable workflows, served at runs.supachad.com. Where Open WebUI is the chat surface, the runs IDE is the workflow surface — it lets the operator watch, inspect, edit, and launch Smithers workflows (experiments, fusion runs, the autonomy ladder) with full run history and crash-resume.

It exists because the deterministic experiment cron could stall mid-loop and leave silence on every surface — no artifact, no ledger, nothing to debug. Smithers persists every frame to SQLite regardless of whether the model cooperates, and this dashboard reads those DBs directly. A stalled run is now visible, not invisible.

What you see

The dashboard is seven tabs over a dark, dependency-light UI (vanilla JS + two CDN libs: CodeMirror for the editor, mermaid for the DAG).

[Screenshot: Runs dashboard, showing the run list and live run detail]

Tab	What it shows
Runs	Every run across every workflow DB, newest first. Live-polls (5 s for the list, 2.5 s while a run is active). Click a run for the task tree, per-node outputs (with per-node token counts and a "reasoning hidden" badge), logs, chat, and diffs.
Workflows	The full catalog of `.jsx` files — including scaffolds that have never run (with run counts) — each rendered as an interactive DAG (`smithers graph --format json`). Inline CodeMirror editor to read/edit/save the source (path-validated), a Launch settings drawer (input JSON, reasoning, timeouts, max-output-tokens, backend, dry-run), and Re-run ⟳ on any finished run.
Approvals	Pending `<Approval>` gates across every DB, plus the notify-channel config (which channels the server pushes a waiting gate on).
Chains	Workflow chaining — string several workflows into a sequential pipeline; each step launches when the prior reaches `finished`, optionally feeding its output forward. Per chain: Run again (re-run the whole chain in place), Resume (continue a failed/stalled chain from the stuck step), Fork (copy to a new run), Delete, Cancel. A step whose run dies with no heartbeat is auto-marked stalled (not left falsely "running").
Schedules	The scheduled jobs (launchd timers) — schedule, target workflow, and live status.
Experiments	The evolutionary-experiment leaderboard (variants scored, ranked, retired) plus the cost-savings hero metric.
Directives	Operator free-text that steers the self-improvement loop, the trace-grounded DB-signal digest (failed/stale/low-quality runs), and the arena fixtures (static + harvested from real run inputs).
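
The Runs-tab poll cadence and the Chains tab's stalled-step rule are small enough to sketch as pure functions. This is a minimal sketch, not the dashboard's code. Several details are assumptions:

- the run and step shapes;
- the `lastHeartbeatMs` field;
- the rule of polling fast while any listed run is running;
- the 60 s heartbeat timeout.

```javascript
// Hypothetical sketch: Runs-tab poll cadence and the Chains stalled-step rule.
// Shapes, field names, and HEARTBEAT_TIMEOUT_MS are assumptions.

const LIST_POLL_MS = 5000; // run list refresh
const ACTIVE_POLL_MS = 2500; // refresh while a run is active
const HEARTBEAT_TIMEOUT_MS = 60_000; // assumed silence window before "stalled"

// Poll faster while any listed run is still running.
function pollDelay(runs) {
  return runs.some((r) => r.status === "running") ? ACTIVE_POLL_MS : LIST_POLL_MS;
}

// Derive a chain step's status from its run. A run that claims "running" but
// has not heartbeated within the window is reported as stalled.
function reconcileStep(step, run, now) {
  if (!run) return { ...step, status: "pending" }; // not launched yet
  const silentFor = now - (run.lastHeartbeatMs ?? 0);
  if (run.status === "running" && silentFor > HEARTBEAT_TIMEOUT_MS) {
    return { ...step, status: "stalled" };
  }
  return { ...step, status: run.status };
}

// Steps run in order, each launching once the prior one is finished, so the
// first unfinished step is where the chain continues or where Resume restarts.
function resumeIndex(steps) {
  return steps.findIndex((s) => s.status !== "finished");
}
```

With a finished first step and a second step whose run went silent two minutes ago, `reconcileStep` marks the second step `stalled` and `resumeIndex` returns `1`.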

[Screenshot: Workflows tab, showing the full catalog with run counts]

[Screenshot: Workflows tab, showing the interactive DAG of a Smithers workflow]

From a run detail you can cancel, resume a stalled run, approve / deny an <Approval> gate, fork a run for replay, diff any node's output, and launch a new run of any workflow — all from the browser.

How it's served

The summary: a durable Hono server reads the Smithers SQLite DBs directly. There is no separate Smithers daemon to babysit — the DBs are the source of truth, and the server auto-discovers new ones.

```mermaid
flowchart LR
    Op[Browser<br/>runs.supachad.com] --> CFA[Cloudflare Access<br/>SSO / service token]
    CFA --> CFT[Cloudflare Tunnel<br/>cloudflared in Docker]
    CFT --> Hono[serve-runs.js<br/>Hono :7331]
    Hono -->|read-write WAL| DBs[(workflow *.db<br/>experiments.db<br/>fusion.db ...)]
    Hono -->|CLI in temp dir| CLI[smithers up / graph<br/>cancel / approve / fork]
    Agent[chad-runs CLI<br/>Chad's hands] --> Hono
```

Key pieces, all under scripts/chad-smithers/:

Path	Role
`serve-runs.js`	Hono server. Scans the workspace for `.db` files (skipping `smithers.db`), opens each one read-write (WAL mode throws on a read-only open), and exposes the REST API below.
`public/index.html`	The single-file dashboard. No build step.
`chad-runs`	An ESM Node CLI — Chad's programmatic hands on the same API (see below).
`workflows/*.jsx`	The workflow definitions (fusion, experiments, probes, the autonomy ladder).
`dev.nemoclaw.chad-runs-ui.plist`	launchd agent. Binds `0.0.0.0:7331`, `KeepAlive=true`.
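
The discovery step in the `serve-runs.js` row can be sketched as a recursive walk that collects `.db` files and skips `smithers.db`. This sketch rests on assumptions. The `SKIP_DIRS` list is hypothetical. Opening each DB read-write in WAL mode is left out, because the SQLite driver the server uses isn't named here.

```javascript
// Minimal sketch of serve-runs.js-style DB discovery (assumed behavior):
// walk the workspace, collect *.db files, skip any file named smithers.db.
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Directories assumed never worth scanning (hypothetical list).
const SKIP_DIRS = new Set(["node_modules", ".git"]);

function discoverDbs(root) {
  const found = [];
  const walk = (dir) => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const full = join(dir, entry.name);
      if (entry.isDirectory()) {
        if (!SKIP_DIRS.has(entry.name)) walk(full);
      } else if (entry.isFile() && entry.name.endsWith(".db") && entry.name !== "smithers.db") {
        found.push(full);
      }
    }
  };
  walk(root);
  return found.sort(); // stable order for the run list
}
```

Rescanning on each request, or on a timer, is what lets a new workflow's DB appear without a server restart.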

Auth model

The server is fail-closed. Every request must present either:

- `x-smithers-key: <SMITHERS_RUNS_API_KEY>` (read from credentials.json), which is how chad-runs and other programmatic callers authenticate; or
- `Cf-Access-Authenticated-User-Email`, the header cloudflared injects after Cloudflare Access SSO, which is how a human operator authenticates.

Anonymous requests (neither present) get 401. Write endpoints (launch / cancel / approve / fork / save-file) additionally pass through an operator(c) gate.
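
Below is a framework-agnostic sketch of the fail-closed check. The header names come from the list above. The `authenticate` helper, its return shape, and the constant-time key comparison are illustrative assumptions, not serve-runs.js itself.

```javascript
// Minimal sketch of the fail-closed check. Header names are from this page;
// the helper names and return shape are hypothetical.
import { timingSafeEqual } from "node:crypto";

function keyMatches(presented, expected) {
  const a = Buffer.from(presented ?? "", "utf8");
  const b = Buffer.from(expected ?? "", "utf8");
  // timingSafeEqual throws on unequal lengths, so compare lengths first;
  // an empty key never matches, even if the server key is unset.
  return a.length === b.length && a.length > 0 && timingSafeEqual(a, b);
}

// headers: a plain object with lower-cased header names.
function authenticate(headers, apiKey) {
  if (keyMatches(headers["x-smithers-key"], apiKey)) {
    return { ok: true, principal: "machine" };
  }
  const email = headers["cf-access-authenticated-user-email"];
  if (email) return { ok: true, principal: email };
  return { ok: false, status: 401 }; // anonymous: fail closed
}
```

In Hono this would plausibly sit in an `app.use("/api/*", handler)` middleware that reads `c.req.header(...)`. Trusting the email header here assumes only cloudflared can reach the origin. Cloudflare also documents validating the `Cf-Access-Jwt-Assertion` header at the origin as a stronger check.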

The CLI-vs-DB quirk

Smithers' CLI finds its DB by walking up for a file literally named smithers.db. Chad's workflows use named DBs (fusion.db, experiments.db, …) so several can coexist in one workspace. The server bridges this with cliWithDb(dbPath, args): it execs the CLI in a temp dir holding a smithers.db symlink to the target DB. It also captures the exit code and output instead of throwing on a non-zero exit, because smithers cancel exits with code 2 on success.
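
Here is a minimal sketch of the `cliWithDb(dbPath, args)` shim, assuming Node's `child_process.spawnSync`. The `bin` option and the returned object are hypothetical additions for illustration. `spawnSync` is used because it reports a non-zero exit in `status` rather than throwing, which lets a caller accept `smithers cancel`'s exit code 2.

```javascript
// Minimal sketch of the cliWithDb(dbPath, args) shim described above.
// The `bin` option and the return shape are assumptions for illustration.
import { spawnSync } from "node:child_process";
import { mkdtempSync, symlinkSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, resolve } from "node:path";

function cliWithDb(dbPath, args, { bin = "smithers" } = {}) {
  // Temp dir containing only smithers.db -> <target DB>, so the CLI's
  // discovery lands on the named DB (fusion.db, experiments.db, ...).
  const dir = mkdtempSync(join(tmpdir(), "smithers-cli-"));
  try {
    symlinkSync(resolve(dbPath), join(dir, "smithers.db"));
    // spawnSync does not throw on a non-zero exit: the caller gets the status
    // and output and decides what counts as success.
    const r = spawnSync(bin, args, { cwd: dir, encoding: "utf8" });
    return { status: r.status, stdout: r.stdout ?? "", stderr: r.stderr ?? "", error: r.error };
  } finally {
    rmSync(dir, { recursive: true, force: true }); // removes the link, not the DB
  }
}
```

A cancel call would then look like `cliWithDb("/path/to/fusion.db", ["cancel", runId])`, assuming the CLI takes the run ID as a positional argument. The caller treats status 0 or 2 as success.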

The REST API

serve-runs.js endpoints (all under /api). Every one has a chad-runs verb (see below) — the CLI is a 1:1 mirror of the API.

Method + path	Purpose
`GET /health`	Liveness + DB count.
`GET /runs`	All runs across all DBs, newest first.
`GET /runs/:id`	One run: task tree + per-node outputs + status + telemetry.
`GET /runs/:id/logs`	Run logs (event stream).
`GET /runs/:id/chat`	Run chat transcript (`smithers chat`).
`GET /runs/:id/diff/:node`	A node's output diff.
`GET /runs/:id/trace/:node`	Token / time / failure attribution for a node.
`POST /runs/:id/cancel|resume|fork`	Run lifecycle (cancel, checkpoint-resume, time-travel fork).
`POST /runs/:id/approve|deny`	Resolve an autonomy gate (then auto-resume).
`GET /approvals`	Pending gates across all DBs.
`GET/POST /notify-config`	Read / set the channels the server pushes gates on.
`GET /workflows` · `GET /catalog`	Available workflow files · catalog with matched DB + run counts.
`GET /workflow-graph`	`{dag, tree}` from `smithers graph --format json`.
`GET/POST /workflow-file`	Read / write a workflow's `.jsx` source (path-validated).
`POST /preflight` · `POST /launch`	Validate a launch's settings · start a new run.
`GET /models` · `GET /model-limits` · `GET /model-matrix`	Live roster · per-model ceilings · best-model-per-task grid.
`GET /efficiency` · `GET /schedules`	Tokens + downgrade savings · scheduled jobs + status.
`GET /experiments`	The experiment leaderboard.
`GET /signal` · `GET /fixtures`	Trace-grounded review signal · arena fixture set.
`GET/POST /directives`	Read / set the operator directives steering the loop.
`GET /chains` · `GET /chains/:id`	List chains · one chain's step states.
`POST /chains` · `POST /chains/:id/fork|run-again`	Create · copy to a new chain · re-run the whole chain in place.
`POST /chains/:id/cancel|resume|rerun-step` · `DELETE /chains/:id`	Cancel · resume a failed/stalled chain · re-run from a step · delete a terminal chain.

Reads need either credential from Auth model above. Write endpoints (anything that launches, mutates state, or saves a file) additionally pass the operator(c) gate, which requires an authenticated operator: a Cloudflare Access email or the machine key.

chad-runs: the agent's hands

Chad doesn't click through a browser. scripts/chad-smithers/chad-runs is an ESM CLI that talks to the same API so Chad can drive runs from a cron turn or a chat reply. It reads SMITHERS_RUNS_API_KEY and the CF_ACCESS_CLIENT_ID/SECRET service-token pair from credentials.json, and targets CHAD_RUNS_URL (default 127.0.0.1:7331).

```text
chad-runs health
chad-runs runs                       # list (live + history)
chad-runs get <id>                   # run detail + telemetry
chad-runs logs <id> / chat <id> / trace <id> <node> / diff <id> <node>
chad-runs workflows / catalog        # launchable files / catalog w/ run counts
chad-runs graph <wf> / cat <wf> / save <wf> <file>
chad-runs preflight <wf> [--env JSON]          # is this launch safe?
chad-runs launch <wf> [--input JSON] [--env JSON]
chad-runs cancel <id> / resume <id> / fork <id>
chad-runs approvals / approve <id> / deny <id>  # autonomy gates
chad-runs models / model-limits / model-matrix / efficiency / schedules
chad-runs experiments                # evolutionary leaderboard
chad-runs signal [--days N] / fixtures          # self-improvement inputs
chad-runs directives / set-directives <file>    # steer the loop
chad-runs notify-config / set-notify <ch1,ch2>  # approval-notify channels
chad-runs chains / chain <id>                   # workflow chaining
chad-runs chain-create <file.json> / chain-cancel|resume <id> / chain-rerun <id> --index N
```

Run chad-runs with no args for the grouped help. The CLI mirrors every endpoint above 1:1. The sibling Smithers/runs skill (scripts/chad-smithers/SKILL.md) documents each verb with worked examples (steering the self-improvement loop, driving a chain).
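
A client like chad-runs needs two things: a base URL and the auth headers. The sketch below makes two assumptions. First, credentials.json is a flat object keyed by the names above. Second, `CHAD_RUNS_URL` defaults to `http://127.0.0.1:7331`; this page gives the host and port but not the scheme. `CF-Access-Client-Id` and `CF-Access-Client-Secret` are the headers Cloudflare Access checks for service tokens. `resolveConfig` and `call` are hypothetical names.

```javascript
// Sketch of chad-runs-style config resolution and an authenticated call.
// credentials.json key names and the injectable fetch are assumptions.

function resolveConfig(env, creds) {
  const headers = {
    "x-smithers-key": creds.SMITHERS_RUNS_API_KEY,
    // Cloudflare Access service-token headers (checked at the edge).
    "CF-Access-Client-Id": creds.CF_ACCESS_CLIENT_ID,
    "CF-Access-Client-Secret": creds.CF_ACCESS_CLIENT_SECRET,
  };
  return {
    baseUrl: (env.CHAD_RUNS_URL ?? "http://127.0.0.1:7331").replace(/\/+$/, ""),
    // Drop credentials that aren't configured rather than sending "undefined".
    headers: Object.fromEntries(Object.entries(headers).filter(([, v]) => v)),
  };
}

// Paths are relative to /api, as in the REST table (e.g. "/runs/<id>/cancel").
async function call(cfg, method, path, body, fetchImpl = fetch) {
  const res = await fetchImpl(`${cfg.baseUrl}/api${path}`, {
    method,
    headers: { ...cfg.headers, ...(body ? { "content-type": "application/json" } : {}) },
    body: body ? JSON.stringify(body) : undefined,
  });
  if (!res.ok) throw new Error(`${method} ${path} -> ${res.status}`);
  return res.json();
}
```

Under the 1:1 mapping described above, `await call(cfg, "POST", "/runs/<id>/cancel")` would be the request behind `chad-runs cancel <id>`.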

Workflow catalog

Every .jsx under the workspace is auto-discovered (it shows in the Workflows tab and in chad-runs workflows). Side-effecting workflows are shadow-safe by default: they log what they would do until an explicit env flag enables irreversible actions (send, apply, publish). This is the same draft-only, fail-safe default as Chad's shell cron wrappers and the chad-action-gate. The "Enable side-effects" column below names the flag or input each workflow needs; it does not mark a missing feature.
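
The shadow-by-default contract reduces to one guard around each irreversible action. This sketch assumes a flag counts as set only when it equals `"1"`, matching the `=1` values in the table. It uses a hypothetical `guarded` helper. Workflows with extra conditions, such as email-ladder's allowlist, would check those too.

```javascript
// Minimal sketch of the shadow-by-default guard. The exact-"1" rule, the
// helper names, and the log format are assumptions.

function sideEffectsEnabled(env, flag) {
  return env[flag] === "1"; // anything else ("true", "0", unset) stays in shadow
}

async function guarded(env, flag, description, action, log = console.log) {
  if (!sideEffectsEnabled(env, flag)) {
    // Shadow mode: record what would have happened, perform nothing.
    log(`[shadow] would ${description} (set ${flag}=1 to enable)`);
    return { performed: false };
  }
  return { performed: true, result: await action() };
}
```

A workflow step would wrap its publish call as `await guarded(process.env, "CHAD_CHANGELOG_POST", "post the changelog note", postNote)`, where `postNote` is the step's real side effect.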

Workflow	What it does	Enable side-effects
`experiments.jsx`	Evolutionary drafter-prompt arena (start wide → score → keep)	runs nightly
`fusion.jsx`	One prompt across N models in parallel → fuse best	`--input '{"prompt":"…"}'`
`mcp-health-probe.jsx`	Probe gbrain / webui MCP surfaces, escalate on failure	runs as-is
`fail-only-report.jsx`	Quiet on green; report only on failure	runs as-is
`email-ladder.jsx`	Autonomy ladder: triage → draft → moderate → `Approval` → send	`CHAD_EMAIL_SEND=1` + allowlist
`issue-triage.jsx`	Fetch issues → score+route → `Parallel` spawn fixes → report	`CHAD_SPAWN_SSH=<host>`
`content-pipeline.jsx`	research → draft → review (spawns) → `Approval` → publish	`CHAD_CONTENT_PUBLISH=1`
`self-improve.jsx`	Cron telemetry → propose tunings → gate → apply	`CHAD_SELFIMPROVE_APPLY=1`
`memory-curator.jsx`	Inactivity-gate → snapshot → propose consolidations → `Approval`	`CHAD_CURATOR_APPLY=1`
`log-digest.jsx`	Cluster host service-log errors → note (quiet if clean)	`CHAD_LOGDIGEST_POST=1`
`token-optimize.jsx`	"Tokenmaxxing": probe whether a cheaper model matches a task's quality → `Approval`-gated downgrade written into `task-profiles.json`. Feeds the Experiments model × task matrix.	`CHAD_TOKENOPT_APPLY=1`
`bug-report.jsx`	Chad catches his OWN failures (failed runs/nodes + host logs) → clusters into distinct bugs → `Approval` → `gh issue create` (dedups).	`CHAD_BUGREPORT_POST=1`
`skill-improve.jsx`	Chad proposes ENHANCEMENTS to his own workflows/skills → `Approval` → files GitHub enhancement issues (never edits source).	`CHAD_SKILLIMPROVE_POST=1`
`code-review-loop.jsx`	Iterate a PR review to convergence with the `<Loop>` primitive + explicit context threading — produce → judge → refine until `approved` or max iters. (Uses raw `<Loop>`, not the `<ReviewLoop>` composite, which doesn't inject the produced work into the reviewer's prompt.) Read-only diff, draft-only.	`--input '{"repo":"o/r","pr":N}'`, `CHAD_CODEREVIEW_POST=1`
`dependency-update.jsx`	Keep deps current via `<ScanFixVerify>` — triage `npm outdated` safe/review/risky → draft bump set → verify. Proposal only.	`CHAD_DEPUPDATE_APPLY=1`
`debate.jsx`	Adversarial reasoning via `<Debate>` — two models argue for/against, a judge rules. The counterpart to `fusion.jsx`.	`--input '{"topic":"…"}'`, `CHAD_DEBATE_POST=1`
`canary-judge.jsx`	Post-deploy verification via `<Poller>` — poll a health endpoint until stably healthy or timeout → judge promote/hold/rollback. Advisory.	`--input '{"url":"…/health"}'`, `CHAD_CANARY_POST=1`
`changelog.jsx`	Draft a changelog entry from recent git log → `Approval` → note. A plain `Sequence`.	`CHAD_CHANGELOG_POST=1`
`pr-shepherd.jsx`	Keep open PRs moving — fetch → deterministic per-PR action (`lib/pr.js`, no LLM) → one digest of "what's blocked on whom". Read-only, advisory.	`CHAD_PRSHEP_REPO`, `CHAD_PRSHEP_POST=1`
`coverage-loop.jsx`	Raise coverage toward a target via `<Loop>` — measure → draft focused tests → re-measure until target/max iters. Draft-only unless `APPLY=1`.	`CHAD_COVERAGE_TARGET`, `CHAD_COVERAGE_APPLY=1`
`coding-task.jsx`	Chad (nemotron) DRIVES a long, validated opencode big-pickle build: plan (+ a real validate command) → [ code → validate → assess ] looped until done → Moshi ping → `Approval` → optional draft PR. Coder: `spawn` (isolated pod/GHA, default) or `direct` (host opencode). Never edits the live repo.	`--input '{"task":"…"}'`, `CHAD_CODING_CODER=direct`, `CHAD_CODING_APPLY=1`
`landing-lab.jsx`	Grounded landing-page lab: nemotron distills the real docs into a fact sheet (claim → source) → opencode writes N distinct pages per angle using only those facts → nemotron compares + flags any unbacked claim. Never touches the live landing repo.	`CHAD_LAB_ANGLES="dev|founder"`, `CHAD_LAB_SOURCES`

The seven new workflows landed with the Smithers 0.26 upgrade and lean on Smithers' own composite components (ScanFixVerify, Debate, Poller, Loop) where the shape fits. One exception: the <ReviewLoop> composite doesn't inject the produced work into the reviewer's prompt (nor converge on approved), so code-review-loop hand-rolls produce→judge over <Loop> with explicit ctx.outputs threading — verified against a live PR (iterated to approved, no "no work provided"). Routing stays deterministic where it can (pr-shepherd, issue-triage), and fan-outs (fusion, token-optimize) cap concurrency with <Parallel maxConcurrency>. See Smithers version.
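
The produce → judge → refine shape with explicit output threading can be shown in plain JavaScript. This is not the Smithers `<Loop>` API. `produce` and `judge` are injected async functions (hypothetical), and the `outputs` array stands in for `ctx.outputs`. The sketch shows two things the `<ReviewLoop>` composite lacked: the judge receives the produced work explicitly, and the producer receives the judge's feedback.

```javascript
// Plain-JS sketch of produce -> judge -> refine with explicit threading.
// Not the Smithers API: produce/judge are hypothetical injected functions.

async function reviewLoop({ produce, judge, maxIters = 3 }) {
  const outputs = []; // one { work, verdict } per iteration (stand-in for ctx.outputs)
  for (let i = 0; i < maxIters; i++) {
    const prev = outputs.at(-1);
    // Thread the previous work and the reviewer's feedback into the producer.
    const work = await produce({ iteration: i, previousWork: prev?.work, feedback: prev?.verdict.feedback });
    // The judge gets the produced work explicitly, never an empty prompt.
    const verdict = await judge({ iteration: i, work });
    outputs.push({ work, verdict });
    if (verdict.approved) return { approved: true, iterations: i + 1, outputs };
  }
  return { approved: false, iterations: maxIters, outputs }; // hit the cap
}
```

The loop stops on the first `approved` verdict or at `maxIters`, mirroring "until `approved` or max iters" in the code-review-loop row.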

The five autonomy-ladder rows (email-ladder through memory-curator) are the ported chad-spawn / cron features (the keep-both decision below). The token-optimize, bug-report, and skill-improve rows are Chad's self-improvement loop: he profiles his own cost, files his own bugs, and proposes his own enhancements, all Approval-gated. The Directives tab steers what he prioritizes; see Self-improvement. All of these workflows run on the same dashboard, resume after a crash, and route their spawns through the bridge; see Orchestrator for the per-workflow mapping.

Fusion: all the models, fused

workflows/fusion.jsx is the marquee workflow: run one prompt across N models in parallel, then fuse the answers into a single best result (the mixture-of-models pattern, with a model shootout as a side effect).

- Each model is its own durable <Task> inside a <Parallel>, so a slow or failed model doesn't sink the run.
- The roster resolves in order: CHAD_FUSION_MODELS env → the daily-refreshed state/models.json featured list → a hardcoded fallback. This is how the runs IDE gets every model Open WebUI has: the same NVIDIA catalog, kept current automatically.
- A capable synthesizer (pickAgent("judge")) merges the candidates, picks best_model, and explains the choice.
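
The roster order, and the property that a failed model doesn't sink the run, can be sketched outside Smithers. Three things are assumptions: `CHAD_FUSION_MODELS` is comma-separated, `models.json` has a `featured` array of model IDs, and `callModel` is a hypothetical per-model call. The worker-pool cap plays the role `<Parallel maxConcurrency>` plays in the real workflow.

```javascript
// Sketch of roster resolution and a failure-tolerant, capped fan-out.
// Not fusion.jsx: env format, models.json shape, and callModel are assumed.

function resolveRoster(env, modelsJson, fallback) {
  const fromEnv = (env.CHAD_FUSION_MODELS ?? "").split(",").map((s) => s.trim()).filter(Boolean);
  if (fromEnv.length) return fromEnv; // 1. explicit env override
  if (modelsJson?.featured?.length) return modelsJson.featured; // 2. daily refresh
  return fallback; // 3. hardcoded list
}

// Call every model with at most `maxConcurrency` in flight. A rejected call
// becomes { ok: false } instead of failing the whole run.
async function fanOut(models, callModel, maxConcurrency = 4) {
  const results = new Array(models.length);
  let next = 0;
  async function worker() {
    while (next < models.length) {
      const i = next++; // single-threaded JS: no two workers take the same index
      try {
        results[i] = { model: models[i], ok: true, answer: await callModel(models[i]) };
      } catch (err) {
        results[i] = { model: models[i], ok: false, error: String(err) };
      }
    }
  }
  await Promise.all(Array.from({ length: Math.min(maxConcurrency, models.length) }, worker));
  return results;
}
```

The synthesizer would then see only the `ok` candidates, so a single timeout costs one answer rather than the run.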

Daily model refresh

refresh-models.js (launchd dev.nemoclaw.chad-models-refresh, daily 04:45 local) fetches integrate.api.nvidia.com/v1/models, filters to chat / agentic models (excluding embed / safety / guard classes), and writes state/models.json with featured (priority-ordered flagships), chat, new, and removed lists. When NVIDIA launches a new open model, the fusion roster and experiments pick it up the next morning with no manual edit. (GLM 5.1 is on the NVIDIA API today; 5.2 is not yet — the refresh adopts it automatically when it lands.)
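
The filtering and diffing half of the refresh can be sketched as a pure function. The sketch makes three assumptions:

- the endpoint returns an OpenAI-style `{ data: [{ id }] }` list;
- models are classified by ID substring, a heuristic, since the real script's rules aren't given here;
- a hypothetical `priority` list picks the featured flagships.

The HTTP fetch and the file write are omitted.

```javascript
// Sketch of refresh-models.js-style classification. The payload shape,
// EXCLUDE regex, and priority list are assumptions, not the script's values.

const EXCLUDE = /(embed|safety|guard)/i; // non-chat classes to drop

function classifyModels(payload, previousChat = [], priority = []) {
  const chat = payload.data.map((m) => m.id).filter((id) => !EXCLUDE.test(id)).sort();
  const current = new Set(chat);
  const before = new Set(previousChat);
  return {
    featured: priority.filter((id) => current.has(id)), // flagships still served, in priority order
    chat,
    new: chat.filter((id) => !before.has(id)), // appeared since the last refresh
    removed: previousChat.filter((id) => !current.has(id)), // no longer listed
  };
}
```

Because `new` is computed against the previous `chat` list, a newly launched model shows up in the next morning's file with no manual edit.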

Two orchestrators, on purpose

The runs IDE drives chad-Smithers, which coexists with the older chad-spawn sub-agent orchestrator. They are complementary, not a migration:

	chad-spawn	chad-Smithers
Shape	imperative one-shot sub-agents (kind manifests)	declarative durable workflows (`.jsx`)
State	branch-as-record on chad-state	SQLite + live dashboard + resume
Best for	isolated one-shot agents (writer / coder / reviewer), GHA offload	multi-step pipelines, experiments, the autonomy ladder, anything inspectable / resumable
Substrate	in-container or GHA	host (live) — plus GHA via the bridge

The decision is keep both, bridge them: chad-spawn stays the GHA-isolated one-shot substrate, and a Smithers workflow can call chad-spawn to offload one heavy step (a build, a big parallel eval) onto a GitHub Actions runner, reconciling its result.json as that task's output. The bridge is lib/spawn.js's runSpawn() — a durable task helper that shells out to the existing chad-spawn rather than rebuilding any GHA machinery. See Orchestrator for the bridge code and the migrated-workflow mapping, and Substrates for the offload path.

Models and reasoning

Workflows route through agents.js, a model router that auto-detects tiers and routes by role (pickAgent(role)), with opts.model for any specific model. The default capable agent is Nemotron 3 Ultra 550B (nvidia/nemotron-3-ultra-550b-a55b) with reasoning on — the tool-call harness bug that affected earlier models is absent in Ultra (verified by a tool-call round-trip). Backends: nemotron / local / claudecode / codex / anthropic / opencode (the opencode/big-pickle free model, via Smithers' built-in OpenCode adapter).
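
Here is a hypothetical sketch of role-based routing in the spirit of `pickAgent(role)`. Only the default model ID and the opencode/big-pickle name come from this page. The role table, backend labels, and return shape are assumptions, and tier auto-detection is not modeled.

```javascript
// Hypothetical sketch of pickAgent(role) routing. Role table and shapes are
// assumptions; only the two model names come from this page.

const DEFAULT_CAPABLE = {
  backend: "nemotron",
  model: "nvidia/nemotron-3-ultra-550b-a55b",
  reasoning: true,
};

// Assumed role -> agent table; any unlisted role gets the capable default.
const ROLES = {
  judge: DEFAULT_CAPABLE,
  coder: { backend: "opencode", model: "opencode/big-pickle", reasoning: false },
};

function pickAgent(role, opts = {}) {
  const base = ROLES[role] ?? DEFAULT_CAPABLE;
  // opts.model pins a specific model for one call without mutating the table.
  return opts.model ? { ...base, model: opts.model } : { ...base };
}
```

Returning copies keeps a per-call `opts.model` override from leaking into later `pickAgent` calls.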

Smithers version

The workspace pins smithers-orchestrator ^0.26.1, upgraded from 0.23.0 (previously an unpinned latest, which risked a surprise jump). On a 0.x version the caret allows only patch releases (>=0.26.1 <0.27.0), so moving to 0.27 is an explicit bump. The 0.23→0.26.1 upgrade landed clean: all workflows graph-validate, the full test suite passes, and the dashboard's read, graph, and durable-write paths were smoke-tested on 0.26. There were no breaking changes to createSmithers, the JSX components, or the CLI across that range. One change could have bitten us: 0.24.0 moved smithers.db resolution to a .smithers/ project anchor. It doesn't, because the runs IDE points the CLI at our named DBs via cliWithDb's smithers.db symlink shim rather than relying on CWD discovery.

Because the runs IDE reads the SQLite DBs directly and ships its own Hono server + vanilla-JS dashboard, it doesn't depend on the upstream UI packages that were rebuilt (gateway-client/gateway-react, 0.24.0) or removed (the POC chat/studio apps, 0.25.0).

Features adopted from the upgrade:

- Reliable <Loop> (0.24.0: parallel loops no longer starve, and deps resolve across loop boundaries). This is what makes code-review-loop.jsx safe to ship (iterate-until-clean). The graph renderer draws the loop back-edge.
- <Saga> / sub-workflows render with a group badge in the DAG (the renderer walks these container types).

Deferred, with rationale:

- Native masked-child-failure fields (failedChildren/failedChildKeys, 0.25.1) live on the gateway event API and the run-result object. The dashboard reads SQLite directly, so it keeps inferring the "tolerated failure" banner from the attempts table (count failed node states on a finished run). That is correct for a DB-read architecture and needs no gateway.
- Gateway event streaming (0.24.0), a headless smithers gateway that streams persisted events from detached runs, is the native replacement for the dashboard's 2.5–5 s polling and the hand-rolled resumeDetached(). It's the biggest single upgrade available but a larger migration; tracked as future work.
- Hermes agent runtime integration (0.26.0) adds a native Smithers plugin, slash commands, and tools. Chad's model routing (agents.js) could gain a hermes backend here; deferred until there's a reason to add another capable-tier option.
- Typed ctx.output() / ctx.outputMaybe() (0.25.0) and workflow input JSON schemas in inspect (0.24.0; these could auto-generate the launch-drawer form) are DX niceties to adopt incrementally.
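
The inference described in the first deferred item can be sketched over attempt rows. The row shape (`nodeId`, `attempt`, `state`) and the state names are assumptions about the attempts table. Counting only each node's latest attempt is a design choice of this sketch; it keeps a node that failed, was retried, and then passed from being flagged.

```javascript
// Sketch of inferring the "tolerated failure" banner from attempt rows.
// Row shape and state names are assumptions about the attempts table.

function toleratedFailures(runStatus, attempts) {
  if (runStatus !== "finished") return []; // the banner only applies to finished runs
  // Keep each node's latest attempt; earlier failures that a retry fixed
  // were recovered, not tolerated.
  const latest = new Map();
  for (const a of attempts) {
    const prev = latest.get(a.nodeId);
    if (!prev || a.attempt > prev.attempt) latest.set(a.nodeId, a);
  }
  return [...latest.values()].filter((a) => a.state === "failed").map((a) => a.nodeId).sort();
}
```

The dashboard would show the banner whenever the returned list is non-empty.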

Deploying it

Host side is already wired: the launchd agent binds 0.0.0.0:7331, and cloudflared (in Docker, with host.docker.internal:host-gateway) fronts it. The remaining operator steps are dashboard-only, because token-managed tunnels can't be edited from a config file:

1. In Cloudflare Zero Trust, add a public-hostname route runs.supachad.com → http://host.docker.internal:7331.
2. Add an Access application for that subdomain, mirroring chad.supachad.com's policy.

Exact values are in scripts/chad-smithers/cloudflared-runs.ingress.example.yaml. Until that route is live, the nightly leaderboard still reaches the operator as an Open WebUI note.

Surfacing runs in Open WebUI

The two front-ends are bridged, not merged (Open WebUI has no slot for embedding an external app):

- Run-report note (live today): a post-run task writes a "Night of <date>" note into Open WebUI listing tasks run, verdicts, artifacts created, and anything stalled, so the execution record is operator-visible even on a failed night.
- Run-state tool (later): an Open WebUI Tool that queries live run state, so "what happened last night?" in chat answers from the Smithers DBs.
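
Here is a hypothetical sketch of rendering the run-report note body. The run fields (`workflow`, `status`, `verdict`, `artifacts`) are assumptions, and how the note is delivered to Open WebUI is out of scope. The renderer handles an empty run list too, so a note still exists on a night when nothing ran.

```javascript
// Hypothetical sketch of the "Night of <date>" note body. Run fields are
// assumptions; delivery to Open WebUI is not shown.

function renderNightNote(date, runs) {
  const lines = [`# Night of ${date}`, "", `Tasks run: ${runs.length}`];
  for (const r of runs) {
    const artifacts = r.artifacts?.length ? ` (artifacts: ${r.artifacts.join(", ")})` : "";
    lines.push(`- ${r.workflow}: ${r.status}${r.verdict ? `, ${r.verdict}` : ""}${artifacts}`);
  }
  // Always state the stalled count explicitly, so silence is never ambiguous.
  const stalled = runs.filter((r) => r.status === "stalled").map((r) => r.workflow);
  lines.push("", stalled.length ? `Stalled: ${stalled.join(", ")}` : "Stalled: none");
  return lines.join("\n");
}
```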

See Front-ends for the chat surface.