Engineering Jun 12, 2026 9 min read

The FORG Index, Explained

We get asked about it every week. What does the number on the dashboard actually mean? Why is mine stuck at 54? And what should I do when it drops? Here's the full accounting.

A 0–100 score, but not a grade

The FORG Index is the single number FORG shows on the session detail page and the dashboard's hero card. It is a 0–100 score. The intuition: how much of your AI work is being kept productive vs. wasted. Higher is better, and the math is simple enough to fit on an index card.

The index is a ratio:

FORG Index = 100 * (
  measured_savings +
  protected_savings (capped)
) / (
  measured_savings +
  protected_savings (capped) +
  tokens_used
)

Read that as: the share of total token flow that FORG kept productive. A 70 means 70% of the tokens flowing through your session were either cached, compacted, or otherwise prevented from re-entering the model. A 30 means the model did 70% of the work fresh.
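As a sanity check on that reading, here is a minimal TypeScript sketch of the ratio. It is not the code in site/lib/session-metrics.ts. The function name forgIndex is hypothetical, the inputs are plain token counts, and protected savings are assumed to arrive already capped. Returning 0 for a session with no token flow is also an assumption, because the post does not say how that case is handled.

```typescript
// Hypothetical transcription of the FORG Index ratio, using token counts.
// protectedSavings is assumed to be already capped by the caller.
function forgIndex(
  measuredSavings: number,
  protectedSavings: number,
  tokensUsed: number,
): number {
  const kept = measuredSavings + protectedSavings; // numerator: tokens kept productive
  const total = kept + tokensUsed; // denominator: all token flow
  // Assumption: an empty session scores 0 instead of producing NaN.
  if (total === 0) return 0;
  return (100 * kept) / total;
}

console.log(forgIndex(600, 100, 300)); // 70
console.log(forgIndex(200, 100, 700)); // 30
```

In the second call the model processed 700 of 1,000 tokens fresh, which is the "A 30 means the model did 70% of the work fresh" case.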

The five inputs

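The two savings terms are themselves sums; tokens_used is the only term that is not savings. Inputs 1 and 2 below are measured and make up measured_savings; inputs 3 to 5 are estimates and make up protected_savings. Each input is a different kind of savings, and the canonical renderer (see computeSavings in site/lib/session-metrics.ts) treats them differently on purpose.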

  1. Cache reuse tokens: tokens that would have been re-sent to the model but were served from the cache instead. Measured, in dollars. This is the largest contributor for most sessions.
  2. Compaction tokens: tokens saved by summarizing the conversation history. Measured, in dollars. Fires on every pre-compact signal.
  3. Cascade savings tokens: tokens FORG kept from re-entering context by detecting repeated goal drift. Estimated, never billed as dollars.
  4. Counterfactual overflow tokens: tokens that would have overflowed the model's context window if FORG had not pre-compacted. Estimated, range only.
  5. Avoided rework tokens: tokens saved by anchoring the goal and avoiding correction loops. Estimated, capped at 20% of tokens used. The cap is important; it prevents runaway extrapolation.
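The list above can be sketched as a split of the five inputs into the two savings terms of the ratio. This is a hypothetical illustration, not computeSavings: the SavingsInputs field names are invented, and because counterfactual overflow is reported only as a range, the sketch assumes a single point estimate is what enters the sum. The 20% cap on avoided rework is taken from the list.

```typescript
// Hypothetical per-session inputs, all in tokens. Field names are illustrative.
interface SavingsInputs {
  tokensUsed: number;
  cacheReuse: number; // 1. measured
  compaction: number; // 2. measured
  cascade: number; // 3. estimated
  counterfactualOverflow: number; // 4. estimated (point value assumed)
  avoidedRework: number; // 5. estimated, subject to the cap
}

// Avoided rework may count for at most 20% of tokens used.
const AVOIDED_REWORK_CAP = 0.2;

function splitSavings(s: SavingsInputs): { measured: number; protectedSavings: number } {
  const measured = s.cacheReuse + s.compaction;
  // Clip the extrapolated rework estimate so it cannot run away.
  const cappedRework = Math.min(s.avoidedRework, AVOIDED_REWORK_CAP * s.tokensUsed);
  const protectedSavings = s.cascade + s.counterfactualOverflow + cappedRework;
  return { measured, protectedSavings };
}

const example = splitSavings({
  tokensUsed: 1000,
  cacheReuse: 500,
  compaction: 100,
  cascade: 50,
  counterfactualOverflow: 0,
  avoidedRework: 400, // exceeds the cap, so only 200 counts
});
console.log(example.measured, example.protectedSavings); // 600 250
```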

Measured savings (inputs 1 and 2) are the only figures ever shown in dollars, as $X.XX. The other three are reported as token counts with a ±35% range. We will never bill a customer for an estimated dollar amount, and we will never show a dollar savings figure that we cannot back up with a real token accounting delta.
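A minimal sketch of the two display rules in this paragraph, assuming the ±35% band is applied symmetrically around a point estimate and rounded to whole tokens. Both of those details, and the helper names estimatedRange and formatMeasuredDollars, are assumptions for illustration.

```typescript
// The ±35% spread comes from the post; symmetric application and rounding are assumed.
const ESTIMATE_SPREAD = 0.35;

interface TokenRange {
  low: number;
  high: number;
}

// Estimated savings are shown as a token range, never as dollars.
function estimatedRange(pointEstimate: number): TokenRange {
  return {
    low: Math.round(pointEstimate * (1 - ESTIMATE_SPREAD)),
    high: Math.round(pointEstimate * (1 + ESTIMATE_SPREAD)),
  };
}

// Only measured savings are ever formatted as a dollar amount.
function formatMeasuredDollars(dollars: number): string {
  return `$${dollars.toFixed(2)}`;
}

const range = estimatedRange(1000);
console.log(range.low, range.high); // 650 1350
console.log(formatMeasuredDollars(12.5)); // $12.50
```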

What each band means

Across tens of thousands of sessions in the beta cohort, we have observed the following distribution. The bands are not a grade; they are a fingerprint.

Above 85 · "Excellent"

Tight feedback loop. Goals are declared, models are appropriate, rework is rare. You're operating near the FORG ceiling.

65–85 · "Good"

You're saving more than you waste. There is room to compound, but the workflow is healthy.

40–65 · "Fair"

A large share of the work, roughly 35–60% of token flow, is still done fresh by the model. Most users in this band can lift 10–15 points by declaring a goal at session start, or by switching the routine parts of the workflow to a smaller model.

Below 40 · "Needs attention"

Rework or oversized contexts dominate. The recommendation panel on the session detail page will name the specific cause: peak context too high, correction loops firing, or a model class that is a poor fit for the task.
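The bands can be expressed as a simple lookup. One wrinkle: the headings overlap at exactly 65, which appears in both "Good" and "Fair", so this hypothetical bandFor helper assumes 65 belongs to "Good". A value of exactly 85 falls in "Good" and exactly 40 in "Fair", following the "Above" and "Below" wording.

```typescript
type Band = "Excellent" | "Good" | "Fair" | "Needs attention";

// Thresholds follow the band headings; 65 is assumed to map to "Good".
function bandFor(index: number): Band {
  if (index > 85) return "Excellent"; // "Above 85"
  if (index >= 65) return "Good"; // 65–85
  if (index >= 40) return "Fair"; // 40–65
  return "Needs attention"; // "Below 40"
}

console.log(bandFor(54)); // Fair
```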

Why we won't ship a single number for it

A common ask is "can you just give me one number that means FORG is good?" The honest answer is no, and here is why.

A single number would force us to collapse two fundamentally different things: the measured dollars (which are real and billable) and the estimated protection (which is a forecast, not a bill). The natural temptation is to weight them, but any weighting misrepresents what is going on. A 70 made up entirely of measured savings does not make the same claim as a 70 that mixes measured savings and estimated protection in equal parts. The first one is something you can put in front of a CFO; the second one is not.

So the index is published as a single number, but the dashboard always shows the decomposition alongside it: the measured savings in dollars, the protected savings as a token range, and a confidence label (Final, Calculated, Collecting) that tells you which inputs are real and which are inferred. The index is the summary; the breakdown is the truth.
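To make the weighting argument concrete, here is a small sketch of two sessions that report the same index while resting on very different evidence. The breakdown helper and its measuredShare field are hypothetical, not the dashboard's code; the point is only that the same headline number can hide a different mix of measured and estimated savings.

```typescript
interface Breakdown {
  index: number; // the headline FORG Index
  measuredShare: number; // fraction of kept tokens that were measured, 0..1
}

function breakdown(measured: number, protectedSavings: number, tokensUsed: number): Breakdown {
  const kept = measured + protectedSavings;
  const total = kept + tokensUsed;
  return {
    index: total === 0 ? 0 : (100 * kept) / total,
    measuredShare: kept === 0 ? 0 : measured / kept,
  };
}

// Same index, different evidence behind it.
const allMeasured = breakdown(700, 0, 300);
const halfEstimated = breakdown(350, 350, 300);
console.log(allMeasured.index, halfEstimated.index); // 70 70
console.log(allMeasured.measuredShare, halfEstimated.measuredShare); // 1 0.5
```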

How to read your own index

The most useful thing to do with the index is watch its shape over time, not its absolute value. The trend tells you whether the patterns you are building are compounding or degrading. A 60 that is climbing 3 points a week is healthier than a 75 that is flat.

The Progress view on the dashboard draws exactly that trend, bucketed by day, with a second-half vs. first-half delta and a best-day callout. If you want to push the index up, the recommendation panel on any session will point you at the highest-leverage fix: declare a goal, enable pre-compaction, or split an oversized workflow into sub-tasks.
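A rough sketch of how such a trend could be computed from per-session readings. The daily bucketing, the second-half minus first-half delta, and the best-day pick mirror the description of the Progress view, but the Reading shape, the use of a per-day mean, and putting the middle day of an odd-length series into the second half are all assumptions, not necessarily how the Progress view is built.

```typescript
// Hypothetical input: one index reading per session, dated with an ISO "YYYY-MM-DD" string.
interface Reading {
  day: string;
  index: number;
}

interface Trend {
  daily: { day: string; mean: number }[];
  delta: number; // mean of second half minus mean of first half
  bestDay: string | null;
}

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

function trend(readings: Reading[]): Trend {
  // Bucket readings by day.
  const buckets = new Map<string, number[]>();
  for (const r of readings) {
    const bucket = buckets.get(r.day) ?? [];
    bucket.push(r.index);
    buckets.set(r.day, bucket);
  }

  // ISO dates sort correctly as strings; average each day's readings.
  const daily = Array.from(buckets.entries())
    .sort((x, y) => (x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : 0))
    .map(([day, xs]) => ({ day, mean: mean(xs) }));

  // Compare halves of the daily series; with fewer than two days there is no trend.
  const mid = Math.floor(daily.length / 2);
  const firstHalf = daily.slice(0, mid).map((d) => d.mean);
  const secondHalf = daily.slice(mid).map((d) => d.mean);
  const delta = daily.length < 2 ? 0 : mean(secondHalf) - mean(firstHalf);

  // Best day is the day with the highest mean index (earliest wins ties).
  let bestDay: string | null = null;
  let bestMean = -Infinity;
  for (const d of daily) {
    if (d.mean > bestMean) {
      bestMean = d.mean;
      bestDay = d.day;
    }
  }
  return { daily, delta, bestDay };
}
```

With daily means of 61, 63, 66 and 70, for example, the halves average 62 and 68, giving a delta of +6 and a best day on the last date.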

The index is not a vanity number. It is the proportion of your AI work that FORG kept productive, computed from the same token accounting that backs your bill. We will keep publishing it as long as it stays honest.