Architecture Overview

Claudezilla is a four-component pipeline that gives Claude Code direct control over a Firefox browser.

Component Pipeline

┌─────────────────────┐
│  Firefox Extension  │  MV2 WebExtension
│  (background.js +   │
│   content.js)       │
└─────────┬───────────┘
          │ Native Messaging
          │ (stdin/stdout, 4-byte framing)
          ▼
┌─────────────────────┐
│   Native Host       │  Bridge process (Node.js)
│   (host/index.js)   │
└─────────┬───────────┘
          │ Unix socket (macOS/Linux)
          │ Named pipe (Windows)
          │ 32-byte auth token, 0600 perms
          ▼
┌─────────────────────┐
│   MCP Server        │  ~30 tool definitions (Node.js)
│   (mcp/server.js)   │
└─────────┬───────────┘
          │ stdio / MCP protocol (JSON-RPC)
          ▼
┌─────────────────────┐
│   Claude Code CLI   │  AI agent
└─────────────────────┘

Component Responsibilities

Firefox Extension (`extension/`)

The browser-side component, packaged as a Manifest V2 WebExtension.

background.js — Maintains a persistent connection to the native messaging host. Receives commands, dispatches them to content scripts or browser APIs, and returns results.
content.js — Injected into web pages to perform DOM operations: clicking elements, typing text, reading page content, capturing accessibility snapshots, and rendering the Claudezilla watermark.
popup/ — Toolbar popup showing connection status, tab pool usage, and focus loop state.

The extension enforces a shared 12-tab pool across all connected Claude agents. Tab ownership, screenshot serialization, and slot reservations are all tracked in the background script.
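
As a rough illustration of how a shared pool with per-agent ownership could be tracked, the following minimal sketch keeps a map from tab ID to agent ID. The `TabPool` class and its method names are hypothetical and are not taken from background.js; only the 12-tab limit comes from the description above.

```javascript
// Hypothetical sketch of a shared tab pool; not the extension's actual code.
const MAX_TABS = 12; // pool size stated in the text above

class TabPool {
  constructor(limit = MAX_TABS) {
    this.limit = limit;
    this.owners = new Map(); // tabId -> agentId
  }

  // Record a new tab for an agent, or refuse when the pool is full.
  claim(tabId, agentId) {
    if (this.owners.size >= this.limit) {
      return { ok: false, error: `tab pool full (${this.limit})` };
    }
    this.owners.set(tabId, agentId);
    return { ok: true };
  }

  // Only the owning agent may operate on a tab.
  isOwner(tabId, agentId) {
    return this.owners.get(tabId) === agentId;
  }

  // Release every tab held by an agent, e.g. when its session ends.
  releaseAll(agentId) {
    const released = [];
    for (const [tabId, owner] of this.owners) {
      if (owner === agentId) {
        this.owners.delete(tabId);
        released.push(tabId);
      }
    }
    return released;
  }
}
```

Keeping all ownership state in one place means a single component decides whether a slot is free, so two agents cannot both believe they hold the last tab.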

Native Messaging Host (`host/`)

A Node.js process launched by Firefox when the extension connects. It acts as the bridge between the browser and external tooling.

protocol.js — Implements Firefox’s native messaging wire format: a 4-byte length prefix in native byte order (little-endian on common platforms) followed by UTF-8 JSON. Host-to-extension messages are capped at 1 MB; extension-to-host at 4 GB.
index.js — Main event loop. Reads messages from the extension on stdin, writes to it on stdout, and runs a Unix domain socket server (or named pipe on Windows) that the MCP server connects to. Routes commands between the two channels.
ipc.js — Platform abstraction for socket paths, auth tokens, and file permissions across macOS, Linux, and Windows.
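
The framing described for protocol.js can be sketched as follows. This is a minimal illustration, not the project's code: the function names `encodeMessage` and `decodeMessages` are hypothetical, and the sketch hard-codes little-endian reads and writes, which matches the byte order on common desktop platforms.

```javascript
// Sketch of native messaging framing: 32-bit length prefix + UTF-8 JSON.
// Function names are illustrative, not taken from protocol.js.
const MAX_HOST_TO_EXTENSION = 1024 * 1024; // 1 MB cap from the text above

// Serialize one message from the host to the extension.
function encodeMessage(message) {
  const body = Buffer.from(JSON.stringify(message), "utf8");
  if (body.length > MAX_HOST_TO_EXTENSION) {
    throw new RangeError(`message too large: ${body.length} bytes`);
  }
  const header = Buffer.alloc(4);
  header.writeUInt32LE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Pull every complete message out of an accumulated buffer and return the
// decoded messages plus any trailing bytes of a partial frame.
function decodeMessages(buffer) {
  const messages = [];
  let offset = 0;
  while (buffer.length - offset >= 4) {
    const length = buffer.readUInt32LE(offset);
    if (buffer.length - offset - 4 < length) break; // frame not complete yet
    const start = offset + 4;
    messages.push(JSON.parse(buffer.toString("utf8", start, start + length)));
    offset = start + length;
  }
  return { messages, rest: buffer.subarray(offset) };
}
```

Because stdin delivers arbitrary chunks, a reader would typically append each chunk to the `rest` buffer from the previous call and decode again, rather than assume one chunk equals one message.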

Security features include a command whitelist, 10 MB buffer limit, socket authentication tokens, and 0600 file permissions.
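
To make these checks concrete, here is a minimal sketch of how a host might gate an incoming socket request. Everything named here is an assumption for illustration: the `authToken` field, the `checkRequest` helper, the hex encoding of the token, and the three whitelisted command names (only `screenshot` appears elsewhere in this document).

```javascript
const crypto = require("node:crypto");

// Illustrative subset; the real whitelist is described as holding ~30 commands.
const ALLOWED_COMMANDS = new Set(["navigate", "click", "screenshot"]);
const MAX_BUFFER_BYTES = 10 * 1024 * 1024; // 10 MB limit from the text above

// A 32-byte random token, hex-encoded here for easy transport (assumption).
function createAuthToken() {
  return crypto.randomBytes(32).toString("hex");
}

// Constant-time comparison so timing does not leak how much of a guess matched.
function tokenMatches(expected, provided) {
  const a = Buffer.from(String(expected), "utf8");
  const b = Buffer.from(String(provided), "utf8");
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Hypothetical gate applied to each request before it is forwarded.
function checkRequest(request, expectedToken, bufferedBytes) {
  if (bufferedBytes > MAX_BUFFER_BYTES) {
    return { ok: false, error: "buffer limit exceeded" };
  }
  if (!tokenMatches(expectedToken, request.authToken)) {
    return { ok: false, error: "invalid auth token" };
  }
  if (!ALLOWED_COMMANDS.has(request.command)) {
    return { ok: false, error: "command not allowed" };
  }
  return { ok: true };
}
```

In a setup like this the token would be written to a file created with mode `0o600`, so that only the same user can read it and connect.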

MCP Server (`mcp/`)

A Model Context Protocol server that exposes browser automation as callable tools for Claude Code.

server.js — Registers ~30 MCP tools (firefox_navigate, firefox_click, firefox_screenshot, etc.) and translates tool calls into commands sent over the Unix socket to the native host.
Generates a unique 128-bit agent ID at startup for multi-agent tab ownership.
Handles graceful shutdown (goodbye command) to release tabs when a Claude session ends.
Manages per-operation timeouts (default 150 seconds, configurable 5-300 seconds per call).
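
The agent ID and timeout handling might look roughly like the sketch below. The helper names are hypothetical, and clamping out-of-range timeouts (rather than rejecting them) is an assumption; the 128-bit size, the `agent_` prefix, and the 5/150/300 second values come from this document.

```javascript
const crypto = require("node:crypto");

const DEFAULT_TIMEOUT_MS = 150_000; // default from the text above
const MIN_TIMEOUT_MS = 5_000;
const MAX_TIMEOUT_MS = 300_000;

// 16 random bytes = 128 bits, hex-encoded with an "agent_" prefix.
function createAgentId() {
  return "agent_" + crypto.randomBytes(16).toString("hex");
}

// Convert a per-call timeout in seconds to milliseconds, falling back to the
// default when none is given and clamping to the allowed range (assumption).
function resolveTimeoutMs(requestedSeconds) {
  if (!Number.isFinite(requestedSeconds)) return DEFAULT_TIMEOUT_MS;
  return Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, requestedSeconds * 1000));
}

// Reject if the operation does not settle within the timeout.
function withTimeout(promise, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A tool handler could then wrap each socket round trip as `withTimeout(sendCommand(cmd), resolveTimeoutMs(args.timeout))`, where `sendCommand` and `args.timeout` are placeholders for whatever the server actually uses.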

Claude Code CLI

The AI agent that invokes MCP tools. Claude Code discovers the Claudezilla MCP server through its settings file (~/.claude/settings.json) and calls tools like firefox_navigate, firefox_get_content, and firefox_screenshot as part of its workflow.

Data Flow Example

A typical firefox_screenshot call flows through the system like this:

Claude Code calls the firefox_screenshot MCP tool with { tabId: 42 }.
MCP Server validates the request, attaches the agent ID, and sends { command: "screenshot", tabId: 42, agentId: "agent_a1b2..." } over the Unix socket.
Native Host receives the socket message, verifies the auth token, and forwards the command to Firefox by writing it to stdout (native messaging protocol).
Extension background.js receives the command, checks tab ownership, acquires the screenshot mutex, switches to the target tab, and tells the content script to prepare.
Extension content.js hides the watermark, waits for page readiness (network idle, visual idle, render settlement), then signals ready.
Extension background.js calls browser.tabs.captureVisibleTab(), compresses the image (JPEG 60%, 50% scale), and sends the base64 data URL back through the native messaging channel.
Native Host relays the response over the Unix socket.
MCP Server returns the result to Claude Code.
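
The screenshot mutex in the flow above exists because browser.tabs.captureVisibleTab() captures only the active tab of a window, so two agents requesting screenshots of different tabs at the same time would interfere with each other. One common way to serialize such work is a promise-chain mutex, sketched below. The `Mutex` class, `takeScreenshot`, and the `capture` callback are hypothetical stand-ins, not the extension's actual code.

```javascript
// Minimal promise-chain mutex: each caller waits for the previous holder.
class Mutex {
  constructor() {
    this.tail = Promise.resolve();
  }

  // Run `task` once all earlier tasks have finished, even if they failed.
  runExclusive(task) {
    const result = this.tail.then(() => task());
    this.tail = result.catch(() => {}); // keep the chain alive after errors
    return result;
  }
}

const screenshotMutex = new Mutex();

// `capture` stands in for the switch-tab / wait-for-ready / capture sequence.
function takeScreenshot(tabId, capture) {
  return screenshotMutex.runExclusive(() => capture(tabId));
}
```

With this shape, a second request does not start switching tabs until the first capture has fully completed, and a failed capture does not block later ones.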

Security Boundaries

Boundary	Mechanism
Extension ↔ Host	Firefox native messaging (launched by browser, not user-accessible)
Host ↔ MCP Server	Unix socket with 32-byte auth token, 0600 permissions
MCP Server ↔ Claude Code	stdio (same user, same machine)
Command filtering	Whitelist of ~30 allowed commands in the host
Expression safety	Blocked patterns in `firefox_evaluate` (no `fetch`, `eval`, `document.cookie`)
URL validation	Scheme whitelist: `http:`, `https:`, `about:`, `file:`
Agent isolation	128-bit agent IDs, tab ownership enforcement, screenshot mutex
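
The URL and expression checks in the table could be approximated as in the sketch below. The scheme list is taken from the table; the regular expressions and function names are assumptions covering only the three examples named there, and the real blocklist may be broader or implemented differently.

```javascript
const ALLOWED_SCHEMES = new Set(["http:", "https:", "about:", "file:"]);

// Accept only absolute URLs whose scheme is on the whitelist.
function isAllowedUrl(candidate) {
  try {
    return ALLOWED_SCHEMES.has(new URL(candidate).protocol);
  } catch {
    return false; // not parseable as an absolute URL
  }
}

// Illustrative patterns for the three examples named in the table.
const BLOCKED_PATTERNS = [
  /\bfetch\s*\(/,
  /\beval\s*\(/,
  /\bdocument\s*\.\s*cookie\b/,
];

// Return the source of the first blocked pattern found, or null if none match.
function findBlockedPattern(expression) {
  const hit = BLOCKED_PATTERNS.find((pattern) => pattern.test(expression));
  return hit ? hit.source : null;
}
```

Parsing with the `URL` constructor normalizes the scheme to lowercase, so `HTTPS://…` and `https://…` are treated alike. Pattern matching on source text is easy to evade with string concatenation or property lookups, so a filter like this is best regarded as one layer among the other boundaries listed above rather than a sandbox.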