Architecture Overview
Claudezilla is a four-component pipeline that gives Claude Code direct control over a Firefox browser.
## Component Pipeline

```
┌─────────────────────┐
│  Firefox Extension  │  MV2 WebExtension
│  (background.js +   │
│   content.js)       │
└─────────┬───────────┘
          │  Native Messaging
          │  (stdin/stdout, 4-byte framing)
          ▼
┌─────────────────────┐
│  Native Host        │  Bridge process (Node.js)
│  (host/index.js)    │
└─────────┬───────────┘
          │  Unix socket (macOS/Linux)
          │  Named pipe (Windows)
          │  32-byte auth token, 0600 perms
          ▼
┌─────────────────────┐
│  MCP Server         │  ~30 tool definitions (Node.js)
│  (mcp/server.js)    │
└─────────┬───────────┘
          │  stdio / MCP protocol (JSON-RPC)
          ▼
┌─────────────────────┐
│  Claude Code CLI    │  AI agent
└─────────────────────┘
```

## Component Responsibilities

### Firefox Extension (`extension/`)

The browser-side component, packaged as a Manifest V2 WebExtension.
- **background.js**: Maintains a persistent connection to the native messaging host. Receives commands, dispatches them to content scripts or browser APIs, and returns results.
- **content.js**: Injected into web pages to perform DOM operations: clicking elements, typing text, reading page content, capturing accessibility snapshots, and rendering the Claudezilla watermark.
- **popup/**: Toolbar popup showing connection status, tab pool usage, and focus loop state.

The extension enforces a shared **12-tab pool** across all connected Claude agents. Tab ownership, screenshot serialization, and slot reservations are all tracked in the background script.
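The shared pool and ownership rule can be pictured with a small sketch. This is an illustration only, not the extension's actual code: the `TabPool` class and its method names are hypothetical, and only the 12-tab limit comes from the description above.

```javascript
// Hypothetical sketch of a shared tab pool with per-agent ownership.
const MAX_TABS = 12; // pool size stated in the text

class TabPool {
  constructor(limit = MAX_TABS) {
    this.limit = limit;
    this.owners = new Map(); // tabId -> agentId
  }

  // Claim a slot for a tab. Returns false when the pool is full
  // or the tab already belongs to another agent.
  claim(tabId, agentId) {
    if (this.owners.has(tabId)) return this.owners.get(tabId) === agentId;
    if (this.owners.size >= this.limit) return false;
    this.owners.set(tabId, agentId);
    return true;
  }

  // Only the owning agent may act on a tab.
  isOwner(tabId, agentId) {
    return this.owners.get(tabId) === agentId;
  }

  // Release every tab held by an agent, e.g. when its session ends.
  // Returns the number of slots freed.
  releaseAll(agentId) {
    let released = 0;
    for (const [tabId, owner] of this.owners) {
      if (owner === agentId) {
        this.owners.delete(tabId);
        released += 1;
      }
    }
    return released;
  }
}
```

Keeping a single `Map` in the background script means the pool limit and the ownership check share one source of truth, which is one plausible reason to track both in the same place.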
### Native Messaging Host (`host/`)

A Node.js process launched by Firefox when the extension connects. It acts as the bridge between the browser and external tooling.
- **protocol.js**: Implements Firefox's native messaging wire format, a 4-byte length prefix in native byte order (little-endian on x86 and ARM) followed by UTF-8 JSON. Host-to-extension messages are capped at 1 MB; extension-to-host messages at 4 GB.
- **index.js**: Main event loop. Reads commands from the extension on stdin and runs a Unix domain socket server (or a named pipe on Windows) for the MCP server to connect to. Routes commands between the two channels.
- **ipc.js**: Platform abstraction for socket paths, auth tokens, and file permissions across macOS, Linux, and Windows.

Security features include a command whitelist, a 10 MB buffer limit, socket authentication tokens, and `0600` file permissions.
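The framing used on the native messaging channel can be sketched as follows. This is a minimal illustration, not the contents of `protocol.js`: the function names `encodeMessage` and `decodeMessages` are hypothetical. The prefix is written in the machine's native byte order, as Firefox specifies, which is little-endian on x86 and ARM.

```javascript
// Sketch of 4-byte length-prefixed JSON framing; function names are hypothetical.
const os = require("node:os");

const LITTLE_ENDIAN = os.endianness() === "LE";
const MAX_TO_EXTENSION = 1024 * 1024; // 1 MB cap on host-to-extension messages

// Serialize a message as a 4-byte length prefix plus a UTF-8 JSON body.
function encodeMessage(message) {
  const body = Buffer.from(JSON.stringify(message), "utf8");
  if (body.length > MAX_TO_EXTENSION) {
    throw new RangeError(`message is ${body.length} bytes, limit is ${MAX_TO_EXTENSION}`);
  }
  const header = Buffer.alloc(4);
  if (LITTLE_ENDIAN) header.writeUInt32LE(body.length, 0);
  else header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Extract every complete message from a buffer and return them
// together with the unread remainder (a partial frame, if any).
function decodeMessages(buffer) {
  const messages = [];
  let offset = 0;
  while (buffer.length - offset >= 4) {
    const length = LITTLE_ENDIAN
      ? buffer.readUInt32LE(offset)
      : buffer.readUInt32BE(offset);
    if (buffer.length - offset - 4 < length) break; // body not fully received yet
    const body = buffer.subarray(offset + 4, offset + 4 + length);
    messages.push(JSON.parse(body.toString("utf8")));
    offset += 4 + length;
  }
  return { messages, rest: buffer.subarray(offset) };
}
```

Because stdin delivers arbitrary chunks, a reader would typically append each chunk to the previous `rest` and call `decodeMessages` again. A buffer limit such as the 10 MB one mentioned above would be enforced on that accumulated remainder.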
### MCP Server (`mcp/`)

A Model Context Protocol server that exposes browser automation as callable tools for Claude Code.

- **server.js**: Registers ~30 MCP tools (`firefox_navigate`, `firefox_click`, `firefox_screenshot`, etc.) and translates tool calls into commands sent over the Unix socket to the native host.
- Generates a unique **128-bit agent ID** at startup for multi-agent tab ownership.
- Handles graceful shutdown (`goodbye` command) to release tabs when a Claude session ends.
- Manages per-operation timeouts (default 150 seconds, configurable 5-300 seconds per call).
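The agent ID and timeout handling described in the list above might look roughly like this. The sketch makes several assumptions: the helper names are hypothetical, the `agent_` prefix is taken from the sample ID shown in this document, and clamping out-of-range timeouts (rather than rejecting them) is a guess at the behaviour.

```javascript
// Hypothetical helpers for agent IDs and per-call timeouts.
const crypto = require("node:crypto");

const DEFAULT_TIMEOUT_MS = 150_000; // default 150 seconds
const MIN_TIMEOUT_MS = 5_000;       // configurable lower bound, 5 seconds
const MAX_TIMEOUT_MS = 300_000;     // configurable upper bound, 300 seconds

// 16 random bytes are 128 bits; hex encoding gives 32 characters.
function createAgentId() {
  return `agent_${crypto.randomBytes(16).toString("hex")}`;
}

// Use the default when no valid timeout is given, otherwise clamp
// the requested value into the allowed range (assumed behaviour).
function resolveTimeout(requestedMs) {
  if (!Number.isFinite(requestedMs)) return DEFAULT_TIMEOUT_MS;
  return Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, requestedMs));
}

// Reject if the operation does not settle within timeoutMs.
function withTimeout(promise, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A tool handler could then wrap each socket round trip as `withTimeout(sendCommand(cmd), resolveTimeout(args.timeout))`, where `sendCommand` and `args.timeout` are placeholders for whatever the real server uses.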
### Claude Code CLI

The AI agent that invokes MCP tools. Claude Code discovers the Claudezilla MCP server through its settings file (`~/.claude/settings.json`) and calls tools like `firefox_navigate`, `firefox_get_content`, and `firefox_screenshot` as part of its workflow.
## Data Flow Example

A typical `firefox_screenshot` call flows through the system like this:

1. **Claude Code** calls the `firefox_screenshot` MCP tool with `{ tabId: 42 }`.
2. **MCP Server** validates the request, attaches the agent ID, and sends `{ command: "screenshot", tabId: 42, agentId: "agent_a1b2..." }` over the Unix socket.
3. **Native Host** receives the socket message, verifies the auth token, and forwards the command to Firefox via stdin (native messaging protocol).
4. **Extension background.js** receives the command, checks tab ownership, acquires the screenshot mutex, switches to the target tab, and tells the content script to prepare.
5. **Extension content.js** hides the watermark, waits for page readiness (network idle, visual idle, render settlement), then signals ready.
6. **Extension background.js** calls `browser.tabs.captureVisibleTab()`, compresses the image (JPEG 60%, 50% scale), and sends the base64 data URL back through the native messaging channel.
7. **Native Host** relays the response over the Unix socket.
8. **MCP Server** returns the result to Claude Code.
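The screenshot mutex matters because `captureVisibleTab()` captures whichever tab is currently visible, so two agents switching tabs at the same time could capture each other's pages. One common way to serialize such work is a promise-chain mutex, sketched below. The `Mutex` class, `takeScreenshot`, and the `captureFn` parameter are hypothetical names, and the real extension may serialize captures differently.

```javascript
// Hypothetical promise-chain mutex: each task starts only after the
// previous one has settled, so captures never overlap.
class Mutex {
  constructor() {
    this.tail = Promise.resolve();
  }

  // Queue a task behind all previously queued tasks.
  runExclusive(task) {
    const result = this.tail.then(() => task());
    // Swallow rejections on the chain so one failure does not block later tasks;
    // the caller still sees the rejection through the returned promise.
    this.tail = result.catch(() => {});
    return result;
  }
}

const screenshotMutex = new Mutex();

// captureFn stands in for the switch-tab, prepare, capture, and compress
// sequence; it receives the tab ID and returns a promise.
function takeScreenshot(tabId, captureFn) {
  return screenshotMutex.runExclusive(() => captureFn(tabId));
}
```

With this shape, two concurrent `takeScreenshot` calls run strictly one after the other in the order they were queued.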
## Security Boundaries

| Boundary | Mechanism |
|---|---|
| Extension ↔ Host | Firefox native messaging (launched by browser, not user-accessible) |
| Host ↔ MCP Server | Unix socket with 32-byte auth token, 0600 permissions |
| MCP Server ↔ Claude Code | stdio (same user, same machine) |
| Command filtering | Whitelist of ~30 allowed commands in the host |
| Expression safety | Blocked patterns in `firefox_evaluate` (no `fetch`, `eval`, `document.cookie`) |
| URL validation | Scheme whitelist: `http:`, `https:`, `about:`, `file:` |
| Agent isolation | 128-bit agent IDs, tab ownership enforcement, screenshot mutex |
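
Two of the mechanisms in the table, URL scheme validation and socket token verification, can be sketched in a few lines. This is a sketch under assumptions rather than the project's implementation: `isAllowedUrl` and `tokensMatch` are hypothetical names, and the use of a constant-time comparison is a suggestion, not something the source states.

```javascript
// Hypothetical checks for the URL scheme whitelist and the 32-byte auth token.
const crypto = require("node:crypto");

const ALLOWED_SCHEMES = new Set(["http:", "https:", "about:", "file:"]);

// Parse the URL and accept it only if its scheme is on the whitelist.
function isAllowedUrl(input) {
  try {
    return ALLOWED_SCHEMES.has(new URL(input).protocol);
  } catch {
    return false; // not parseable as an absolute URL
  }
}

// Compare tokens in constant time. timingSafeEqual throws on a length
// mismatch, so the length is checked first.
function tokensMatch(expected, provided) {
  if (!Buffer.isBuffer(provided) || provided.length !== expected.length) {
    return false;
  }
  return crypto.timingSafeEqual(expected, provided);
}
```

Parsing with `URL` before checking the scheme avoids string-prefix pitfalls such as mixed-case schemes, since the parser normalizes `protocol` to lowercase.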