Product May 20, 2025 8 min read

Introducing FORG v3: The AI Control Plane

After six months of rebuilding from the ground up, FORG v3 is here. Complete architecture rewrite, three new pillars, and a fundamentally different approach to AI observability at team scale.


When we shipped FORG v1.5.2 last September, we had a working product. Engineers could install the Claude Code adapter, signals would flow into the dashboard, and you could see token counts. It was useful. But it wasn't what we set out to build.

The problem with v1 was that it answered the question "how much are we spending?" without answering the more important ones: why are we spending that, who is spending it, what budgets should guide it going forward, and what can we learn from six months of usage data. Those are the questions engineering leaders actually have. v3 is built to answer them.

The Three Pillars

FORG v3 is organized around three pillars: Observe, Control, and Optimize. Each pillar has dedicated infrastructure, a dedicated UI surface, and dedicated API primitives. They're designed to be used together, but each delivers standalone value.

Pillar 1: Observe

Observation in FORG v3 means capturing a complete, structured record of every LLM interaction your team makes through a connected adapter — across every supported tool, every model, every developer — without storing a single prompt or completion.

Each signal contains: timestamp, session ID, adapter ID (which tool), model, token counts (input/output/cached), latency (TTFT + total), cost in USD, and a set of dimensions you can add (user, project, team, environment). That's it. No payloads. No context. No way to accidentally exfiltrate sensitive code or data.

FORG v3 ingests signals at over 10,000/second per worker with sub-5ms median processing latency. The schema is stable and versioned. You can backfill from local adapter logs if you started collecting before connecting to the cloud.

Pillar 2: Control

The Control pillar is new in v3. It focuses on budgets, threshold alerts, anomaly alerts, and optional gateway hard-blocks when a budget is exceeded. Default telemetry mode stays out of the LLM request path; gateway mode is opt-in.

A budget looks like this at the configuration level:

budgets:
  - name: "monthly-dev-budget"
    scope: user
    limit_usd: 100.00
    period: monthly
    alert_at_percent: 80
    mode: alert

  - name: "team-daily-cap"
    scope: team
    limit_usd: 500.00
    period: daily
    alert_at_percent: 80
    notify: webhook
    gateway_hard_block: opt-in

Budgets are cumulative across daily or monthly windows. Threshold crossings, anomaly alerts, and optional gateway hard-blocks are written to the audit log with the scope, budget, and triggering usage metadata.
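
To make the threshold logic concrete, here's a minimal Python sketch of how cumulative spend in a window might be classified against a budget. The `Budget` class and `evaluate` function are hypothetical names for illustration, not part of the FORG API, and modeling `gateway_hard_block` as a boolean is an assumption; only the field names come from the configuration above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    # Field names mirror the YAML keys above; the class itself is hypothetical.
    name: str
    limit_usd: float
    alert_at_percent: float
    gateway_hard_block: bool = False  # assumption: opt-in modeled as a flag


def evaluate(budget: Budget, spend_in_window_usd: float) -> str:
    """Classify cumulative spend for the current daily or monthly window.

    Returns "ok", "alert", "exceeded", or "block" (hard-block opted in).
    """
    if spend_in_window_usd >= budget.limit_usd:
        return "block" if budget.gateway_hard_block else "exceeded"
    threshold_usd = budget.limit_usd * budget.alert_at_percent / 100.0
    if spend_in_window_usd >= threshold_usd:
        return "alert"
    return "ok"
```

With the two budgets above, a developer at $80 of a $100 monthly limit would land in "alert", while a team at $500 of a $500 daily cap would land in "block" only because that budget opted into the gateway hard-block.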

Pillar 3: Optimize

Optimize is the intelligence layer. It's two things: a cost intelligence dashboard that surfaces waste patterns, and Ask FORG — a deterministic assistant that answers questions about your usage data using keyword and intent matching ($0 LLM cost, no embeddings).

The cost intelligence dashboard surfaces patterns such as: developer X is running 50 sessions/day that each use a 128k context but terminate after 3 turns (probably a misconfigured tool); team Y is paying 3x the market rate for a model family that underperforms on their task type; and 40% of your spend happens in the last 2 hours of the business day.
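
The first of those patterns (large context, very few turns) can be detected from signal metadata alone. Here's a rough Python sketch of that idea. The function name, the thresholds, and the assumption that each signal corresponds to one turn are ours for illustration; this is not the dashboard's actual detection logic.

```python
from collections import defaultdict


def short_heavy_sessions(signals, min_input_tokens=100_000, max_turns=3):
    """Return IDs of sessions with a large context that end after few turns.

    Each signal is a dict with "session_id" and "tokens" -> "input".
    Treating one signal as one turn is an assumption of this sketch.
    """
    turns = defaultdict(int)
    peak_input = defaultdict(int)
    for signal in signals:
        sid = signal["session_id"]
        turns[sid] += 1
        peak_input[sid] = max(peak_input[sid], signal["tokens"]["input"])
    return sorted(
        sid
        for sid in turns
        if turns[sid] <= max_turns and peak_input[sid] >= min_input_tokens
    )
```

Counting how many flagged sessions a single user produces per day would then point at the kind of misconfigured tool described above.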

Ask FORG lets you ask: "Which model has the best cost-per-task ratio for code review tasks in the last 30 days?" and get an answer grounded in your actual usage data.
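
To show what "deterministic" can mean in practice, here's a small Python sketch of keyword-based intent matching followed by a plain aggregation. The intent table, the function names, and the row shape (including a per-row `tasks` count) are all hypothetical; Ask FORG's real matching rules aren't described in this post.

```python
import re
from collections import defaultdict

# Hypothetical intent table: an intent matches when all its keywords appear.
INTENTS = {
    "best_cost_per_task": ("model", "cost-per-task"),
    "top_spenders": ("who", "spending"),
}


def match_intent(question):
    """Return (intent, days); either part is None when it cannot be found."""
    text = question.lower()
    window = re.search(r"last (\d+) days", text)
    days = int(window.group(1)) if window else None
    for intent, keywords in INTENTS.items():
        if all(keyword in text for keyword in keywords):
            return intent, days
    return None, days


def best_cost_per_task(rows):
    """Pick the model with the lowest total cost divided by task count."""
    cost = defaultdict(float)
    tasks = defaultdict(int)
    for row in rows:
        cost[row["model"]] += row["cost_usd"]
        tasks[row["model"]] += row["tasks"]
    # Skip models with no recorded tasks to avoid dividing by zero.
    candidates = [m for m in cost if tasks[m] > 0]
    return min(candidates, key=lambda m: cost[m] / tasks[m])
```

Because the question is resolved by string matching and the answer by arithmetic over stored rows, the same question over the same data always produces the same answer, with no model call involved.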

Technical Architecture

FORG v3 is three independent services:

  • Signal Ingestion Worker — Cloudflare Worker at forg.pro/engine/*. Signal ingestion, classification, budget checks, Supabase profile read/write. No LLM in the real-time path.
  • License Worker — Cloudflare Worker at forg.pro/agent/*. License and identity on D1. Handles activation, verification, machine fingerprinting, release manifest.
  • Dashboard (site/) — Next.js 15 on Vercel. Marketing plus the authenticated dashboard. Supabase auth. Reads from both workers via API routes.

Data is split by design: D1 for license/identity data (License Worker), Supabase + pgvector for all behavioral data. These two stores never talk to each other directly. The agent binary holds a signed license token (format: lic_<20hex>) and session keys derived per-session using HKDF-SHA256.
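
For readers unfamiliar with HKDF, here's a minimal Python sketch of checking the license token format and deriving a per-session key with HKDF-SHA256 as specified in RFC 5869. The post doesn't say which secret, salt, or info string the agent feeds into HKDF, so the inputs below (a root secret plus the session ID as `info`) are assumptions, as is treating the 20 hex characters as lowercase. The function names are hypothetical.

```python
import hashlib
import hmac
import re

# Format from the post: lic_<20hex>. Lowercase-only hex is an assumption.
LICENSE_RE = re.compile(r"lic_[0-9a-f]{20}")


def is_license_token(token):
    """Check only the token's shape, not its signature."""
    return LICENSE_RE.fullmatch(token) is not None


def hkdf_sha256(ikm, salt, info, length=32):
    """HKDF with SHA-256 (RFC 5869): extract a PRK, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key is too long for HKDF-SHA256")
    if not salt:
        salt = b"\x00" * hash_len  # RFC 5869 default salt
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(
            prk, block + info + bytes([counter]), hashlib.sha256
        ).digest()
        okm += block
        counter += 1
    return okm[:length]
```

Using the session ID as the `info` input means two sessions sharing the same root secret still end up with unrelated keys, which is the usual reason to derive keys per session.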

The Agent: Signal Collection Only

The forg binary is a Go agent. It is a signal collector. Nothing more. Zero on-device intelligence. It hooks into your tools (Claude Code, Cursor, VS Code) via lightweight adapters, captures metadata from LLM call completions, and emits signals to FORG over HTTPS. The critical design constraint: the agent never touches prompt content. It reads token counts and timing from the tool's completion event, not from the HTTP body.

# Install and activate
curl -fsSL https://forg.pro/install | sh

# Activate with your license key
forg activate lic_a1b2c3d4e5f6a7b8c9d0

# Check status
forg status

# Tail live signals
forg tail --format=json

The signal payload that flows from your machine to FORG looks like this:

{
  "v": 3,
  "session_id": "sess_01hwxyzabc123",
  "adapter": "claude-code",
  "model": "claude-sonnet-4-5",
  "ts": 1716840000000,
  "tokens": {
    "input": 2847,
    "output": 412,
    "cache_read": 1200,
    "cache_write": 0
  },
  "cost_usd": 0.00892,
  "latency_ms": {
    "ttft": 312,
    "total": 1847
  },
  "dimensions": {
    "user": "alice@company.com",
    "project": "backend-api",
    "environment": "development"
  }
}

No prompt. No completion. No file paths. No tool calls. Just the metadata you need to understand and govern your AI usage.
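
One way to hold a payload to that promise is an allowlist check on the receiving side. Here's a minimal Python sketch that uses the keys from the example payload above. Treating those keys as the complete v3 schema is an assumption, and `check_signal` is a hypothetical helper, not part of the FORG ingestion API.

```python
import json

# Keys taken from the example payload; treating them as the full schema
# is an assumption of this sketch.
ALLOWED_KEYS = {
    "v", "session_id", "adapter", "model", "ts",
    "tokens", "cost_usd", "latency_ms", "dimensions",
}
TOKEN_KEYS = {"input", "output", "cache_read", "cache_write"}


def check_signal(raw):
    """Parse a JSON signal and return a list of problems (empty if clean)."""
    signal = json.loads(raw)
    problems = [
        f"unexpected key: {key}" for key in sorted(signal.keys() - ALLOWED_KEYS)
    ]
    problems += [
        f"missing key: {key}" for key in sorted(ALLOWED_KEYS - signal.keys())
    ]
    tokens = signal.get("tokens", {})
    problems += [
        f"unexpected token field: {key}"
        for key in sorted(tokens.keys() - TOKEN_KEYS)
    ]
    if signal.get("v") != 3:
        problems.append("unsupported schema version")
    return problems
```

An allowlist fails closed: a field such as `prompt` is rejected because it isn't on the list, rather than slipping through because nobody thought to ban it.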

What's in v3.0.0 GA

  • Complete Go agent rewrite — CGO-native keystore on macOS/Linux/Windows
  • Budgets and alerts — daily/monthly cost controls, anomaly alerts, and optional gateway hard-blocks
  • Signal ingestion API with versioned schema
  • Unified dashboard with Observe / Control / Optimize tabs
  • Ask FORG — deterministic usage queries ($0 LLM, no embeddings)
  • Claude Code, Cursor, VS Code, JetBrains adapters
  • Team + Enterprise plans with org hierarchy, SCIM, SAML SSO
  • Data residency: US and EU (Enterprise)
  • Audit log with cryptographic chain (tamper-evident)
  • Webhooks for budget and anomaly events
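
If "cryptographic chain" is unfamiliar, the general technique is a hash chain: each entry's hash covers the previous entry's hash, so editing any earlier entry invalidates everything after it. Here's a minimal Python sketch of that general idea. The entry layout, the SHA-256 choice, and the function names are assumptions for illustration, not a description of FORG's audit log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    # Serialize with sorted keys so the same event always hashes the same.
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    )


def verify(log):
    """Recompute every link; return False if any entry was altered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this is tamper-evident rather than tamper-proof: it shows that something changed, provided the latest hash is also stored somewhere an attacker can't rewrite.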

Roadmap

v3 is the foundation. Here's what we're building next:

  • Q3 2025 — Gateway mode for zero-latency enforcement, pre-call budget enforcement via sidecar proxy
  • Q3 2025 — Terraform provider for budgets
  • Q4 2025 — Anomaly detection on usage patterns
  • Q4 2025 — Cost forecasting with 30/60/90 day projections
  • Q1 2026 — Model recommendation engine (optimize for cost-per-task by category)

Getting Started

FORG v3 is available today on all plans. If you're on v1 or v2, the migration is straightforward — the adapter protocol is backwards-compatible, and we have an automated migration guide in the docs.

New to FORG? Start with the Solo plan — one user, full access to the Observe pillar, and pricing that scales cleanly into Team and Enterprise as your usage grows.

The goal was never to build a billing dashboard for AI. The goal was to build the control plane that gives engineering leaders the same visibility into their AI toolchain that they have into their cloud infrastructure. v3 is the first version that actually does that.

If you have questions, the best place to ask is the community. We're in there daily, alongside a growing group of engineers who are all dealing with the same problems you are.

Download FORG v3 and read the docs. We can't wait to see what you build.