# Witbitz Agent Manifest — author spec

This is the spec for the **agent manifest** (the registry record) that third parties submit to run their own
AI agent on Witbitz. A manifest is a JSON object. The control plane validates it
(`control-plane/src/lib/agentRecord.mjs` — the source of truth), stores it in the registry, and on each
launch **bakes it into the agent container's environment** (`taskenv.mjs`) so the one shared image
(`Dockerfile`) runs as *your* agent: your name, persona, wake words, tile, and tools.

> One image, many agents. You don't ship code or a container — you ship a manifest. The image is generic;
> the manifest configures it per launch.

Related: [affordance-contract.md](affordance-contract.md) (how a tool result reaches the call),
[third-party-mcp-agents.md](third-party-mcp-agents.md) (the program, identity, metering, revenue share),
[agent-actions.md](agent-actions.md) (the in-call menu the agent publishes).

---

## How a manifest becomes a running agent

```
your manifest  ──POST /agents──▶  validateAgentRecord  ──▶  registry (DynamoDB)
                                                              │
launch (coupon) ──▶ /launch-agent ──▶ identityEnvFromRecord ──▶ task env ──▶ ECS Fargate
                                       + per-tool catalog        (WAKE_WORD,    runs the generic
                                                                  AGENT_PERSONA, image AS your agent
                                                                  AGENT_TOOLS…)
```

- **Identity** (name/persona/wake/tile…) → env the agent reads at boot.
- **Tools** (`declaredTools`) → `AGENT_TOOLS` JSON the agent offers the LLM as function-calling tools and
  routes through the metered `/tool` gateway (or, operator-only, spawns locally).
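The tools half of this can be sketched as a small filter and map, assuming the agent exposes every non-`direct` tool (see `declaredTools[]` below) to the LLM. `toolsForLlm` is a hypothetical name, and the `{ type: "function", function: {...} }` shape is an assumed OpenAI-style function-calling layout; the actual `AGENT_TOOLS` JSON format is not specified in this document.

```javascript
// Hypothetical sketch: which declaredTools the LLM is offered, and in what shape.
// ASSUMPTION: an OpenAI-style {type:"function", function:{...}} layout; the real
// AGENT_TOOLS JSON format is not defined in this spec.
function toolsForLlm(declaredTools = []) {
  return declaredTools
    // direct tools are invoked by the agent itself and never offered to the LLM
    .filter((t) => !t.direct)
    .map((t) => ({
      type: "function",
      function: {
        name: t.name,
        // the model reads the description to decide when to call the tool
        description: t.description ?? "",
        // inputSchema is already a JSON-Schema object for the tool's arguments
        parameters: t.inputSchema ?? { type: "object", properties: {} },
      },
    }));
}
```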

---

## Top-level fields

| Field | Type | Req | Rules / default |
|---|---|---|---|
| `agentId` | string | ✓ | lowercase slug `^[a-z0-9][a-z0-9-]{0,63}$`. Stable id; the launch key. |
| `author` | string | ✓ | Your keyless author identity (`a_<hash>` from your coupon). **Injected server-side** for self-serve — don't set it by hand. |
| `defaultLang` | string | – | 2-letter (`en` default). Your agent's own default language — shown **first** in the marketplace and the fallback for every localized field. See [Localization](#localization-one-manifest-many-languages). |
| `display.name` | string \| {lang:text} | ✓ | ≤80 chars. The tile/menu label. **Localizable** (see Localization). |
| `display.emoji` | string | – | ≤8 codepoints. The tile's avatar (when `runtime.tile` is true). |
| `display.tagline` | string \| {lang:text} | – | ≤280 chars. One-line description in the catalog. **Localizable**. |
| `disclosure` | string | – | ≤500 chars. Shown to humans: what the agent does with their audio/data. |
| `media` | enum | – | `audio` (default) · `video` · `text`. |
| `status` | enum | – | `draft` (default) · `active` · `suspended`. Only `active` launches. |
| `requiredKeys` | string[] | – | BYOK env-var names the agent/tools need (e.g. `OPENAI_API_KEY`). Each must be a safe, non-reserved env name. |
| `caps` | object | – | `{ dailyUsd, perSessionUsd }` spend caps. |
| `declaredTools` | object[] | – | The agent's tools (see below). Absent → a chat/voice agent with no tools. |
| `runtime` | object | – | Behavior + wake/tile config (see below). |
| `settings` | object[] | – | Launch-time knobs the user fills; each value is templated into the persona via `{{key}}` (see below). |

`comped` and `sealedDefaultKeys` are **operator-only** (billing exemption / custodied default keys) and are
ignored from self-serve submissions.
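A few of the top-level rules above can be sketched as plain checks. This is a minimal, hypothetical illustration: `checkTopLevel` and `prepareSelfServe` are names invented here, the real checks live in `validateAgentRecord` (`agentRecord.mjs`) and cover more, and treating a localized field's length limit as applying to each language variant is an assumption.

```javascript
// Hypothetical sketch of some top-level manifest rules; NOT the real validateAgentRecord.
const AGENT_ID_RE = /^[a-z0-9][a-z0-9-]{0,63}$/;
const LANG_RE = /^[a-z]{2}$/; // ASSUMPTION: "2-letter" means lowercase ISO-639-1 style
const MEDIA = ["audio", "video", "text"];
const STATUS = ["draft", "active", "suspended"];

// A localized field is a string or a {lang: text} map; return every variant.
function variants(field) {
  return typeof field === "string" ? [field] : Object.values(field ?? {});
}

function checkTopLevel(m) {
  const errors = [];
  if (!AGENT_ID_RE.test(m.agentId ?? "")) errors.push("agentId: not a lowercase slug");
  if (m.defaultLang !== undefined && !LANG_RE.test(m.defaultLang)) errors.push("defaultLang: not 2-letter");
  const name = m.display?.name;
  if (name === undefined) errors.push("display.name: required");
  else if (variants(name).some((t) => typeof t !== "string" || t.length > 80)) errors.push("display.name: >80 chars");
  if (variants(m.display?.tagline).some((t) => t.length > 280)) errors.push("display.tagline: >280 chars");
  // Spreading a string iterates by codepoint, not by UTF-16 code unit.
  if (m.display?.emoji !== undefined && [...m.display.emoji].length > 8) errors.push("display.emoji: >8 codepoints");
  if ((m.disclosure ?? "").length > 500) errors.push("disclosure: >500 chars");
  if (!MEDIA.includes(m.media ?? "audio")) errors.push("media: unknown value");
  if (!STATUS.includes(m.status ?? "draft")) errors.push("status: unknown value");
  return errors;
}

// Self-serve: drop operator-only fields and set the author server-side.
function prepareSelfServe(submitted, authorId) {
  const { comped, sealedDefaultKeys, author, ...rest } = submitted;
  return { ...rest, author: authorId };
}
```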

---

## Localization (one manifest, many languages)

A manifest serves every language from **one record**. A **localized field** is either a plain string (your
agent's one language) **or** a per-language map of 2-letter codes to text:

```json
"display": { "name": { "en": "Expert", "he": "מומחה", "es": "Experto" } }
```

**Localizable fields:** `display.name`, `display.tagline`, `runtime.persona`, `runtime.greeting`,
`runtime.wakeWord`. (Per-tool `pending` notices and `menu` `label`/`desc`/`voice` are also string-or-map.)
Everything else (`musicStyle`, `wakePrefix`, `disclosure`, ids, prices) is single-value.

- **`defaultLang`** (top-level, default `en`) is your agent's own language: it's shown **first** in the
  marketplace, and it's the fallback for any viewer/session language you didn't translate. A plain string is
  treated as your `defaultLang`.
- **At launch**, each localized field resolves to the **session language**, falling back to your default
  (`field[sessionLang] ?? field[defaultLang]`), so your agent's **in-call name, greeting and wake word follow
  the language of the call**.
- **In the catalog**, the headline is your `defaultLang`; the public `name`/`tagline` maps are returned so a
  localized marketplace can switch. (`persona`/`greeting`/`wakeWord` stay server-side — never published.)
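The launch-time resolution described in the bullets above fits in a few lines. `resolveLocalized` is a hypothetical helper written for illustration, not a function from the control plane; it follows the stated rules: a plain string is the `defaultLang` text, and a map is looked up by session language with `defaultLang` as the fallback.

```javascript
// Hypothetical sketch of resolving one localized field at launch.
function resolveLocalized(field, sessionLang, defaultLang = "en") {
  if (field === undefined || field === null) return undefined;
  // A plain string is the agent's defaultLang text, used for every session language.
  if (typeof field === "string") return field;
  // A map: prefer the session language, else fall back to defaultLang.
  return field[sessionLang] ?? field[defaultLang];
}

const expertName = { en: "Expert", he: "מומחה", es: "Experto" };
console.log(resolveLocalized(expertName, "he")); // מומחה
console.log(resolveLocalized(expertName, "fr")); // Expert (fallback to defaultLang "en")
console.log(resolveLocalized("Meteo", "he"));    // Meteo
```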

### THE RULE — completeness (a manifest is rejected if it breaks it)

> If your manifest **declares** a language anywhere, that language must be defined in **every** localized
> field. Half-translated manifests are an **error**, not a silent fallback.

A manifest's declared languages are the union of the languages used across its localized fields, plus
`defaultLang`. So if `display.name` has `{en, he}`, then `tagline`, `persona`, `greeting`, and `wakeWord`
(whichever you include) must **each** define both `en` and `he`. A single-language agent declares only its
`defaultLang`, so it is trivially complete. (Validated by `enforceLocalized` in `agentRecord.mjs`.)

```jsonc
// ✗ REJECTED — name declares he+en, but greeting is missing "he"
{ "defaultLang": "en",
  "display":  { "name": { "en": "Expert", "he": "מומחה" } },
  "runtime":  { "greeting": { "en": "Hi there" } } }   // error: runtime.greeting missing "he"
```
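The completeness rule can be sketched as "compute the declared set, then check every present field against it." This is a hypothetical illustration, not `enforceLocalized` itself: `checkCompleteness` is an invented name, and it deliberately covers only the five fields in the **Localizable fields** list (per-tool `pending` and `menu` fields are left out).

```javascript
// Hypothetical sketch of THE RULE; the real check is enforceLocalized in agentRecord.mjs.
// Covers only the five top-level localizable fields (pending/menu omitted).
const LOCALIZED = [
  ["display", "name"], ["display", "tagline"],
  ["runtime", "persona"], ["runtime", "greeting"], ["runtime", "wakeWord"],
];

function get(obj, path) {
  return path.reduce((o, k) => (o == null ? undefined : o[k]), obj);
}

// Languages one field defines: a plain string counts as defaultLang.
function langsOf(field, defaultLang) {
  return typeof field === "string" ? [defaultLang] : Object.keys(field);
}

function checkCompleteness(manifest) {
  const defaultLang = manifest.defaultLang ?? "en";
  const present = LOCALIZED
    .map((path) => [path.join("."), get(manifest, path)])
    .filter(([, value]) => value != null);
  // Declared = defaultLang plus the union of every present field's languages.
  const declared = new Set([defaultLang]);
  for (const [, value] of present) for (const lang of langsOf(value, defaultLang)) declared.add(lang);
  // Every present field must define every declared language.
  const errors = [];
  for (const [path, value] of present) {
    const have = new Set(langsOf(value, defaultLang));
    for (const lang of declared) if (!have.has(lang)) errors.push(`${path} missing "${lang}"`);
  }
  return errors;
}
```

Run against the rejected manifest above, this yields the single error `runtime.greeting missing "he"`.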

---

## `runtime` — behavior, voice & presence

All fields are optional; an absent field falls back to the image default. Values are sanitized because they
become container env vars and prompt text.

| Field | Type | Rules | Becomes env |
|---|---|---|---|
| `persona` | string \| {lang:text} | ≤2000 | `AGENT_PERSONA` — the system prompt voice. **Localizable**. |
| `greeting` | string \| {lang:text} | ≤500 | `AGENT_GREETING` — spoken/chatted on join. **Localizable**. |
| `lang` | string | 2-letter (`he`,`en`,…) | `AGENT_LANG` — STT + menu language. (Not the same as `defaultLang`: this is the agent's *runtime* STT/menu language; the session usually overrides it.) |
| `speak` | bool | | `SPEAK` — TTS replies into the call. |
| `songMs` | int | 3000–600000 | `SONG_MS` — song length (song agents). |
| `wakeWord` | string \| {lang:text} | ≤40 | `WAKE_WORD` — the agent's name; "hey \<name>". **Localizable** (so "hey Expert" / "היי מומחה" each match in their language). |
| `wakePrefix` | string | ≤60, comma-list | `WAKE_PREFIX` — Siri-style prefix, e.g. `hey,היי,הי`. **With a prefix the bare word never triggers** (kills false triggers from passing mentions). |
| `wakeGeneric` | string | ≤40 | `WAKE_GENERIC` — a shared friendly wake (e.g. `חבר`/friend) every agent answers to **when it's the only agent**; auto-suppressed with ≥2 agents (then use the name), and the display name shrinks "friend · name" → "name". |
| `wakeRegex` | string | ≤200, must compile | `WAKE_REGEX` — a hand-tuned name matcher (the *core*, after the prefix), overriding the auto-derived one. Use when STT mis-transcribes your name (validated; an invalid regex is rejected at submit). |
| `tile` | bool | | `AGENT_TILE` — show a real tile with the emoji avatar (vs. an unseen kibitzer). |
| `smartTurn` | bool | | `SMART_TURN` — semantic endpointing (merge a mid-thought pause into one turn). |
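The field-to-env mapping in the table can be sketched as a lookup plus localization. `runtimeEnv` is a hypothetical helper, not the control plane's `identityEnvFromRecord`; it assumes booleans and integers are passed as their `String()` form and omits sanitization, both of which the real code may handle differently.

```javascript
// Hypothetical sketch of the runtime -> env mapping from the table above.
// ASSUMPTIONS: values are stringified with String(); no sanitization is done here.
const ENV_NAMES = {
  persona: "AGENT_PERSONA", greeting: "AGENT_GREETING", lang: "AGENT_LANG",
  speak: "SPEAK", songMs: "SONG_MS", wakeWord: "WAKE_WORD", wakePrefix: "WAKE_PREFIX",
  wakeGeneric: "WAKE_GENERIC", wakeRegex: "WAKE_REGEX", tile: "AGENT_TILE", smartTurn: "SMART_TURN",
};
const LOCALIZABLE = new Set(["persona", "greeting", "wakeWord"]);

function runtimeEnv(runtime = {}, sessionLang, defaultLang = "en") {
  const env = {};
  for (const [field, envName] of Object.entries(ENV_NAMES)) {
    let value = runtime[field];
    if (value === undefined || value === null) continue; // absent: image default applies
    if (LOCALIZABLE.has(field) && typeof value === "object") {
      // Localized map: session language first, then defaultLang.
      value = value[sessionLang] ?? value[defaultLang];
    }
    if (value === undefined) continue;
    env[envName] = String(value);
  }
  return env;
}
```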

**Wake-word reliability (important for voice agents):** STT often mis-transcribes distinctive or uncommon
names, and in several different ways (a Hebrew example: `פרשן`, "commentator", came out as פושן/פאשן/פונצ׳ן).
Prefer a **common, clearly-articulated** word as the name, always set a `wakePrefix`, and supply a `wakeRegex`
for stubborn names. The generic `wakeGeneric` ("hey friend") is the most reliable trigger when your agent is solo.
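How the wake fields combine can be sketched as one regular expression: optional prefix, then a core (the escaped name, or your `wakeRegex`), plus the generic word only when the agent is alone. This is a hypothetical sketch, not the image's matcher. `buildWakeMatcher` is an invented name; case-insensitive matching, the word-delimiter set, and compiling a user `wakeRegex` with the `u` flag are all assumptions.

```javascript
// Hypothetical sketch of a wake matcher; not the image's actual implementation.
// ASSUMPTIONS: comma-separated prefixes, case-insensitive matching, words delimited
// by start/end of text, whitespace, or , . ! ?
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function buildWakeMatcher({ wakeWord, wakePrefix = "", wakeRegex, wakeGeneric }, agentCount = 1) {
  // Core: the hand-tuned wakeRegex if given, else the literal (escaped) name.
  const cores = [wakeRegex ?? escapeRegex(wakeWord)];
  // The generic wake only applies while this is the only agent on the call.
  if (wakeGeneric && agentCount === 1) cores.push(escapeRegex(wakeGeneric));
  const prefixes = wakePrefix.split(",").map((p) => p.trim()).filter(Boolean).map(escapeRegex);
  // With a prefix, the core must follow it, so the bare word alone never triggers.
  const head = prefixes.length ? `(?:${prefixes.join("|")})[\\s,]+` : "";
  return new RegExp(`(?:^|[\\s,.!?])${head}(?:${cores.join("|")})(?=$|[\\s,.!?])`, "iu");
}
```

The trailing lookahead keeps "hey Meteo" from matching inside a longer word such as "hey Meteorology".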

---

## `declaredTools[]` — tools the agent can call

Each tool is offered to the agent's LLM as a function-calling tool (name + description + JSON-Schema), and
its result is delivered into the call via the **affordance contract**.

| Field | Type | Req | Rules |
|---|---|---|---|
| `name` | string | ✓ | lowercase, unique within the manifest. |
| `description` | string | – | ≤500. **The LLM reads this** to decide when to call — write it for the model. |
| `inputSchema` | object | – | A JSON-Schema object (size-bounded). The tool's arguments. |
| `transport` | enum | – | `http` (default) · `mcp` · `stdio`. `http`/`mcp` = the gateway calls your **URL**; `stdio` = the agent **spawns a local server** (operator-only — see security). |
| `endpoint` | string | ✓* | A **public https URL** (required for `http`/`mcp`). SSRF-guarded: no private/loopback/metadata hosts. |
| `keyName` | string | – | One of `requiredKeys` — the key the gateway presents upstream. |
| `priceCents` | int | ✓* | ≥0. Per-call price (required for `http`/`mcp`). |
| `byokPriceCents` | int | – | ≥0, default 0. Price when the caller brought their own key. |
| `timeoutMs` | int | – | 1000–28000. |
| `direct` | bool | – | `true` → not offered to the LLM (invoked directly by the agent), default false. |
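Several of the per-tool rules in this table can be sketched as a validator, together with the price a call is billed at. `checkTool` and `callPriceCents` are hypothetical names. The sketch only checks that the endpoint parses as an `https:` URL; the real validator's SSRF guard (no private, loopback, or metadata hosts) is **not** reproduced here, and name uniqueness is a manifest-level check left out.

```javascript
// Hypothetical sketch of some declaredTools[] rules; NOT the real validator.
// The SSRF host guard and manifest-level name uniqueness are intentionally omitted.
const TRANSPORTS = ["http", "mcp", "stdio"];
const isNonNegInt = (n) => Number.isInteger(n) && n >= 0;

function checkTool(tool, requiredKeys = []) {
  const errors = [];
  const transport = tool.transport ?? "http";
  if (!tool.name || tool.name !== tool.name.toLowerCase()) errors.push("name: required, lowercase");
  if ((tool.description ?? "").length > 500) errors.push("description: >500 chars");
  if (!TRANSPORTS.includes(transport)) errors.push("transport: unknown");
  if (transport === "http" || transport === "mcp") {
    // Gateway-called tools need an https endpoint and a per-call price.
    let url = null;
    try { url = new URL(tool.endpoint); } catch { /* missing or unparsable */ }
    if (!url || url.protocol !== "https:") errors.push("endpoint: https URL required");
    if (!isNonNegInt(tool.priceCents)) errors.push("priceCents: required int >= 0");
  }
  if (tool.keyName !== undefined && !requiredKeys.includes(tool.keyName)) errors.push("keyName: not in requiredKeys");
  if (tool.byokPriceCents !== undefined && !isNonNegInt(tool.byokPriceCents)) errors.push("byokPriceCents: int >= 0");
  const t = tool.timeoutMs;
  if (t !== undefined && !(Number.isInteger(t) && t >= 1000 && t <= 28000)) errors.push("timeoutMs: 1000-28000");
  return errors;
}

// Per-call price: byokPriceCents (default 0) when the caller brought their own key.
function callPriceCents(tool, callerBroughtKey) {
  return callerBroughtKey ? (tool.byokPriceCents ?? 0) : tool.priceCents;
}
```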

### Affordance fields (how the result reaches the call)

The image owns a **closed surface vocabulary** — `chat`, `speak`, `audio`, `image`, `tile`, `screen`,
`file`, `map`, `widget` (`SURFACE_WIRED`; `menu` is declared but not yet wired). Your tool binds its output
to one. Surfaces are **medium-typed**: an image/file targets its own surface, never `chat` (which carries
text). See [affordance-contract.md](affordance-contract.md) for the full model.

| Field | Type | Rules |
|---|---|---|
| `result` | enum | `text` (default) · `audio` · `image` · `file` · `map` · `widget` · `none` — the modality your tool returns. |
| `surface` | enum/array | where it lands. Allowed per result: `text`→`chat`/`speak`, `audio`→`audio`/`file` (a clip that plays AND downloads), `image`→`image`/`tile`/`screen`, `file`→`file`, `map`→`map`, `widget`→`widget`, `none`→`menu`. |
| `resultPath` | string | dot-path into the tool's JSON result, e.g. `content.0.resource.blob` (`^[A-Za-z0-9_.]{1,128}$`). |
| `invoke` | enum | `brain` (the LLM decides via function-calling — default/recommended) · `intent` (deterministic wake-routed). |
| `intent` | slug | required for `invoke:intent` (`^[a-z][a-z0-9_-]{0,31}$`). |
| `caption` | string | ≤200. Chat caption shown alongside a non-text result. |
| `pending` | string \| {lang:text} | ≤300 each. A "one moment…" notice shown while the tool runs. |
| `perceive` | enum/array | `image` · `file` — an INPUT the loop SUPPLIES to your tool (the bytes a participant shared), so the model never handles base64. e.g. `read_pdf` declares `perceive:["file"]`. |
| `fileMime` / `fileName` | string | ≤100. For a `file` surface, the mime / name the posted file downloads with (the singer's audio → `audio/mpeg`, `song.mp3`). |
| `menu` | object | A menu action for this tool (see below) — puts it in the in-call menu AND the agent's capability list. |
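The allowed `result` → `surface` pairs in the `surface` row can be written down as a table and checked mechanically, along with the `intent` requirement for `invoke:intent`. `checkAffordance` is a hypothetical helper; when `surface` is absent it reports nothing, since which default surface the image picks is not stated here.

```javascript
// Allowed surfaces per result modality, transcribed from the `surface` row above.
// checkAffordance is a hypothetical checker, not the real validator.
const ALLOWED_SURFACES = {
  text: ["chat", "speak"],
  audio: ["audio", "file"],
  image: ["image", "tile", "screen"],
  file: ["file"],
  map: ["map"],
  widget: ["widget"],
  none: ["menu"],
};
const INTENT_RE = /^[a-z][a-z0-9_-]{0,31}$/;

function checkAffordance(tool) {
  const errors = [];
  const result = tool.result ?? "text";
  const allowed = ALLOWED_SURFACES[result];
  if (!allowed) errors.push(`result: unknown "${result}"`);
  else if (tool.surface !== undefined) {
    // `surface` may be one value or an array; every entry must fit the result's medium.
    const surfaces = Array.isArray(tool.surface) ? tool.surface : [tool.surface];
    for (const s of surfaces) if (!allowed.includes(s)) errors.push(`surface "${s}" not allowed for result "${result}"`);
  }
  if ((tool.invoke ?? "brain") === "intent" && !INTENT_RE.test(tool.intent ?? "")) {
    errors.push("intent: slug required for invoke:intent");
  }
  return errors;
}
```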

#### `menu` — surface a tool as an in-call menu action

A tool that should appear in the agent's in-call menu (and in the capability list the LLM is told it can act
on) declares a `menu` block. For example, a painter agent shows **"Paint the conversation"** and a music agent
**"Summarize with a song"**: each agent's menu reflects *its own* tools rather than a fixed list.

| Field | Type | Req | Rules |
|---|---|---|---|
| `id` | slug | – | `^[a-z][a-z0-9_-]{0,31}$`. The action id + its `/slash` command (e.g. `paint` → `/paint`). Defaults to the tool name. |
| `label` | string \| {lang:text} | ✓ | The menu button text. **Localizable**. |
| `request` | string | ✓ | ≤500. The prompt a tap / slash routes to the brain (which then calls the tool). Language-neutral. |
| `desc` | string \| {lang:text} | – | A one-line description under the label. **Localizable**. |
| `voice` | string \| {lang:text} | – | A short spoken phrase for the "say '\<wake> \<voice>'" hint. **Localizable**. |
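How a `menu` block might turn into one in-call action for a given session language is sketched below. `menuAction` is a hypothetical helper and its output object is an assumed shape; only the defaults (`id` falls back to the tool name, `/id` is the slash command, localized fields resolve with a `defaultLang` fallback) come from the table.

```javascript
// Hypothetical sketch: a tool's `menu` block -> an in-call action for one language.
// The returned object shape is an assumption; the defaults follow the table above.
function pick(field, lang, defaultLang) {
  if (field === undefined) return undefined;
  return typeof field === "string" ? field : field[lang] ?? field[defaultLang];
}

function menuAction(tool, lang, defaultLang = "en") {
  if (!tool.menu) return null; // no menu block: the tool stays out of the menu
  const id = tool.menu.id ?? tool.name; // id defaults to the tool name
  return {
    id,
    slash: `/${id}`,
    label: pick(tool.menu.label, lang, defaultLang),
    desc: pick(tool.menu.desc, lang, defaultLang),
    voice: pick(tool.menu.voice, lang, defaultLang),
    request: tool.menu.request, // language-neutral prompt routed to the brain
  };
}
```

Usage, with an illustrative `request` string:

```javascript
menuAction({ name: "paint", menu: { label: { en: "Paint the conversation" }, request: "Paint the conversation so far." } }, "fr");
```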

### `stdio` tools — the bundled local MCP servers

`transport:stdio` lets the agent spawn a local MCP server **inside the container** (instead of the gateway
calling a URL). Because that runs a program on our box, the `command` must be one of a fixed, operator-baked,
**vetted allow-list** of bundled servers — any designer may declare those; an arbitrary command stays
**operator-only** (the RCE guard). See **[bundled-mcp-tools.md](./bundled-mcp-tools.md)** for the full catalog
(Wikipedia, DuckDuckGo, calculator, memory, weather, Brave/Tavily/Exa search, Firecrawl, Google Maps, …),
how to declare them, and the security model. Fields:

| Field | Rules |
|---|---|
| `name` | must equal the **MCP server's own tool name** (e.g. `search`, `search_wikipedia`, `get_weather`, `calculate`) — that's the name the agent calls over the wire. |
| `command` | a **bare** program name from the bundled allow-list (`^[A-Za-z0-9._-]{1,64}$`, no path/shell). Anything not baked + allow-listed is operator-only. |
| `args` | ≤32 strings, each ≤256 chars. |
| `env` | `{ NAME: value }` — **non-secret config only**. Names are gated by `isSafeKeyName` (no `AWS_*`/`LD_*`/`PATH`/`NODE_OPTIONS`/`PYTHON*`/`UV_*`/`WITBITZ_*`/`AGENT_*`…), ≤32 entries, values ≤512. |

For a **key-required** bundled server (Brave/Tavily/Exa/Firecrawl/Google Maps), add its key (e.g.
`BRAVE_API_KEY`) to your `requiredKeys` — the server reads it from the agent's inherited env (cloud creds +
session secrets are stripped from the child). Keyless servers (Wikipedia, DuckDuckGo, calculator, memory,
weather) need nothing. stdio tools take no `endpoint`/`priceCents`: they don't traverse the gateway.
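The stdio field rules can be sketched as follows. Everything here is hypothetical: `checkStdioTool` is an invented name, `BUNDLED` holds a **placeholder** command (the real allow-list is in [bundled-mcp-tools.md](./bundled-mcp-tools.md)), the reserved names are only the subset listed above rather than the full `isSafeKeyName` list, and the uppercase env-name pattern is an assumption.

```javascript
// Hypothetical sketch of the stdio field rules; NOT the real validator.
// BUNDLED is a placeholder; the real allow-list lives in bundled-mcp-tools.md.
const BUNDLED = new Set(["hypothetical-bundled-server"]);
const COMMAND_RE = /^[A-Za-z0-9._-]{1,64}$/; // bare program name: no path, no shell
// Subset of reserved names/prefixes named in this spec; isSafeKeyName has the full list.
const RESERVED_PREFIXES = ["AWS_", "LD_", "PYTHON", "UV_", "WITBITZ_", "AGENT_"];
const RESERVED_NAMES = new Set(["PATH", "NODE_OPTIONS"]);

function looksSafeEnvName(name) {
  // ASSUMPTION: env names are uppercase letters, digits, and underscores.
  return /^[A-Z_][A-Z0-9_]*$/.test(name) &&
    !RESERVED_NAMES.has(name) &&
    !RESERVED_PREFIXES.some((p) => name.startsWith(p));
}

function checkStdioTool(tool, isOperator = false) {
  const errors = [];
  if (!COMMAND_RE.test(tool.command ?? "")) errors.push("command: bare program name only");
  else if (!BUNDLED.has(tool.command) && !isOperator) errors.push("command: not allow-listed (operator-only)");
  const args = tool.args ?? [];
  if (args.length > 32 || args.some((a) => typeof a !== "string" || a.length > 256)) {
    errors.push("args: at most 32 strings of at most 256 chars");
  }
  const env = Object.entries(tool.env ?? {});
  if (env.length > 32) errors.push("env: more than 32 entries");
  for (const [k, v] of env) {
    if (!looksSafeEnvName(k)) errors.push(`env: reserved or unsafe name ${k}`);
    if (String(v).length > 512) errors.push(`env.${k}: value over 512 chars`);
  }
  return errors;
}
```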

---

## `settings[]` — launch-time knobs that shape the persona

Let the **person launching** your agent customize it, without you writing code. Each setting is a typed
input shown on the launch form; its value is **substituted into your persona** wherever you write `{{key}}`.
This is the one lever a manifest-only agent has over its own behavior (the shared image reads a fixed env
set, so a custom setting feeds the **system prompt**, not an arbitrary env).

| Field | Type | Req | Rules |
|---|---|---|---|
| `key` | string | ✓ | lowercase slug `^[a-z][a-z0-9_]{0,31}$`. Used as the `{{key}}` placeholder in `runtime.persona`. Unique. |
| `type` | enum | – | `text` (default) · `textarea` · `number` · `select` · `color` · `emoji`. |
| `label` | string | – | ≤80. The field label on the form. |
| `default` | string\|number | – | Used when the user doesn't set it. |
| `options` | (string\|{value,label})[] | ✓* | Required for `select` — the choices (≤32). |
| `help` | string | – | ≤200. A hint under the field. |
| `maxLength` | int | – | 1–2000, for `text`/`textarea`. |

At launch each `{{key}}` is replaced by the user's value → else the setting's `default` → else empty. A
`{{x}}` that isn't a declared setting is left as literal text. (At most 16 settings.)

```json
{
  "agentId": "tutor",
  "runtime": { "persona": "You are a patient tutor for {{subject}}, teaching at the {{level}} level. Keep it simple." },
  "settings": [
    { "key": "subject", "type": "text",   "label": "Subject", "default": "math" },
    { "key": "level",   "type": "select", "label": "Level",   "default": "beginner",
      "options": ["beginner", "intermediate", "advanced"] }
  ]
}
```

A launcher who picks *subject = physics, level = advanced* gets the system prompt *"You are a patient tutor
for physics, teaching at the advanced level. Keep it simple."*
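The substitution rule can be sketched with a single `replace` callback. `renderPersona` is a hypothetical name; treating only `undefined`/`null` as "not set" (so an empty string from the user wins over the default) is an assumption. Using a replacer function also means a value that itself contains `{{...}}` or `$&` is inserted literally and not re-expanded.

```javascript
// Hypothetical sketch of persona templating: user value -> setting default -> "".
// Placeholders that are not declared settings are left as literal text.
function renderPersona(persona, settings = [], values = {}) {
  const declared = new Map(settings.map((s) => [s.key, s]));
  // The placeholder pattern mirrors the `key` rule: ^[a-z][a-z0-9_]{0,31}$
  return persona.replace(/\{\{([a-z][a-z0-9_]{0,31})\}\}/g, (match, key) => {
    const setting = declared.get(key);
    if (!setting) return match; // undeclared: keep "{{x}}" as-is
    return String(values[key] ?? setting.default ?? "");
  });
}
```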

---

## Lifecycle

1. **Register** — `POST /agents` with your manifest (auth = your coupon-derived author identity). It lands
   as `draft`.
2. **Activate** — flip `status:'active'` (subject to review for the catalog). Only `active` agents launch.
3. **Launch** — a human funds a session with a coupon → `/launch-agent` bakes your manifest into a Fargate
   task. New launches pick up manifest changes immediately (no image rebuild — the image is generic).
4. **Kill switch** — `status:'suspended'` blocks all launches instantly.

See [third-party-mcp-agents.md](third-party-mcp-agents.md) for identity, metering, and revenue share.

---

## Complete example (a voice agent with a metered HTTP tool)

```json
{
  "agentId": "acme-weather",
  "display": { "name": "Meteo", "emoji": "🌦️", "tagline": "Live forecasts, right in your call." },
  "media": "audio",
  "disclosure": "An AI agent from witbitz.chat is on this call. It listens and transcribes; audio goes to OpenAI & ElevenLabs.",
  "requiredKeys": ["OPENAI_API_KEY", "ELEVENLABS_API_KEY", "WEATHER_API_KEY"],
  "declaredTools": [
    {
      "name": "forecast",
      "description": "Get the weather forecast for a place. Call when someone asks about weather.",
      "inputSchema": {
        "type": "object",
        "properties": { "place": { "type": "string" }, "when": { "type": "string" } },
        "required": ["place"]
      },
      "transport": "http",
      "endpoint": "https://acme.example/mcp/forecast",
      "keyName": "WEATHER_API_KEY",
      "priceCents": 2,
      "byokPriceCents": 0,
      "result": "text",
      "surface": "speak",
      "invoke": "brain",
      "pending": { "en": "Checking the forecast…", "he": "בודק את התחזית…" }
    }
  ],
  "runtime": {
    "persona": "You are Meteo, a concise, upbeat weather companion on a live call. Speak only when addressed; give one clear, useful forecast.",
    "greeting": "Hi, I'm Meteo — say 'hey Meteo' or 'hey friend' for a forecast.",
    "lang": "en",
    "speak": true,
    "wakeWord": "Meteo",
    "wakePrefix": "hey",
    "wakeGeneric": "friend",
    "tile": true,
    "smartTurn": true
  }
}
```

This runs the generic image as **Meteo 🌦️**: a tiled, English voice agent that wakes on "hey Meteo" (and on
"hey friend" while it is the only agent), endpoints with smart-turn, and offers the LLM a metered `forecast`
tool whose answer is spoken into the call.