Premium escalation¶

Chad's primary inference is nvidia/nemotron-3-super-120b-a12b on the NVIDIA Endpoints free tier. It's fast, cheap, and reasoning-safe. It is not, however, deep enough for every task — multi-turn coding, long architecture sketches, careful PR review can outrun what an embedded 12B-active reasoning model produces.

For those, Chad escalates to Anthropic Claude (Sonnet by default, Opus on demand) through a tightly gated wrapper. "Tightly gated" is the interesting part — premium routes cost real money, so the policy boundary is more aggressive than the autonomy gate.

The four components¶

Component	Path	What it does
`chad-premium`	`scripts/chad-cron-wrappers/chad-premium`	User-facing wrapper. Auto-detects `NEMOCLAW_INVOKER_TOKEN` for terminal use; reads `$CHAD_AUTH_CONTEXT_PATH` for cron use.
`chad-premium-client`	`scripts/chad-cron-wrappers/chad-premium-client`	Python helper that POSTs to `api.anthropic.com/v1/messages`. Shells the actual HTTP out to `curl` so OPA can pin a real binary identity.
`chad-auth-context`	`scripts/chad-cron-wrappers/chad-auth-context`	AuthContext drop and show. Each context is `{source, verifiedIdentity, allowsPremium, createdAt, scope}`.
`chad-premium.yaml`	`nemoclaw-blueprint/policies/presets/chad-premium.yaml`	L7 policy preset. Only `chad-premium-client` and `curl` may POST to `/v1/messages`.

Why a Python helper that shells out to curl¶

The L7 policy keys on /proc/self/exe — the literal binary identity of the calling process. If chad-premium-client made the HTTP request directly through Python's urllib, the policy would have to allow /usr/bin/python3 to reach Anthropic. That's far too broad — any Python script in the sandbox would inherit network access to api.anthropic.com.

By shelling the actual POST out to curl, the policy can pin both binaries: chad-premium-client is allowed to invoke curl in this context, and curl is allowed to reach /v1/messages. A compromised Python script that fabricates an AuthContext still can't reach Anthropic — its /proc/self/exe is /usr/bin/python3, which has no allowlist entry for that endpoint.

The AuthContext¶

Premium calls fail closed unless an AuthContext blob is on disk with allowsPremium: true. Four code paths drop a valid context:

Path	Trigger	How AuthContext is dropped
Dashboard `/premium <prompt>`	User types the prefix in the chat front-end	`chad-route-prompt` calls `chad-auth-context drop --source dashboard --identity operator-1`
Terminal `chad-premium`	Operator runs the binary directly	Wrapper auto-detects `NEMOCLAW_INVOKER_TOKEN` env, drops a context with `source: terminal`
`email-check` cron	`From:` matches an allowlisted operator address	Cron drops `source: email`, `identity: <sender>`
`issue-triage` cron	An open issue body or comment mentions an allowlisted operator handle	Cron drops `source: issue`, `identity: <mention>`

Cron ticks with no inbound trigger have no AuthContext. The Python helper checks for the file at startup; missing or stale context means exit 1 before any network call.

The full call path¶

flowchart LR
    Caller[Caller<br/>cron, terminal, dashboard] --> Auth{AuthContext<br/>present?}
    Auth -- no --> Refuse[Exit 1<br/>no premium for you]
    Auth -- yes --> Premium[chad-premium]
    Premium --> Client[chad-premium-client]
    Client --> Curl[curl]
    Curl --> L7{L7 proxy<br/>OPA policy}
    L7 -- pinned by exe --> API[api.anthropic.com<br/>v1/messages]
    L7 -- wrong exe --> Block[Refused]
    API --> Log[chad-premium.jsonl<br/>model, tokens, latency]

The L7 hop is the load-bearing one. Even if chad-premium-client were compromised and called urllib directly, the proxy would deny the egress because Python isn't in the chad-premium.yaml allowlist.

Logging¶

Every call appends to /tmp/chad-premium.jsonl — one JSON record per line, with model, source, identity, in/out tokens, latency. chad-budget-audit rolls these up weekly:

model × source × identity × calls × tokens × p95 latency
Sender breakdown (which inbound triggers cause the most spend)
Outliers — single calls with high latency or high token cost

The roll-up is appended to feedback-proposals.md for human review.

When Chad uses premium vs Nemotron¶

The gate is the phase2: block in scripts/task-profiles.json. Two profiles are wired today:

email-check — uses premium when the wrapper has parked replies worth thinking about and the sender is allowlisted.
issue-triage — uses premium when an open issue mentions an allowlisted operator handle and is in the top-2 by score.

Other crons (workspace-backup, gbrain-dream, spawn-poll) do not escalate. They're deterministic shell — there's nothing to think about.

Daily cap¶

auto-actions.json includes _budgets.spawn_premium, today set to 8 calls/day. The action gate enforces the cap before the AuthContext check:

{
  "_budgets": { "spawn_premium": 8 },
  "spawn_premium": {
    "_default": "block",
    "operator-1": "auto",
    "agent":     "auto"
  }
}

When the counter is at 8, premium calls return budget (exit 3) and fall through to draft mode — the work is parked, not refused.

Drafts are still drafts¶

A premium-quality reply is still draft-only unless the action type (email_reply, issue_comment, pr_open, etc.) is set to auto under the recipient. Premium changes the quality of the reasoning; the autonomy gate decides whether the result ships.

The two boundaries compose:

Autonomy gate — may Chad take this kind of action, against this target, today?
Premium gate — if the action is allowed, may Chad use Anthropic to reason about it?

A reply can be auto + premium (ship now, with deep thinking), auto + nemotron (ship now, with embedded reasoning), draft + premium (think hard, park for review), or block (refuse).