Methodology

Show me my number.

Plug in your actual setup — hardware, OS, subscriptions, usage. Mooter projects your tier mix and monthly cost against the all-Opus baseline, computed live from the same per-tier costs as the benchmark below. Every routing decision is logged so you can verify it yourself.

Step 1 · Hardware

No discrete GPUMacBook Air, basic laptop, Chromebook8 GB GPURTX 3060/4060, M1 Pro 16GB16 GB GPURTX 4070, M2/M3 Pro 32GB24+ GB GPURTX 4090, RTX 5090, M2/M3/M4 Max

Step 2 · Operating system

Step 3 · Subscriptions

Step 4 · Usage patternPrompts per day 80% critical (T3) 8%

Your monthly projection

without mooter

$100.80

all-Opus on every prompt

with mooter

$7.49

mooter-routed

saved

$93.31

93%

Predicted tier distribution

T0local · Ollama

62%

T1haiku · gpt-4o-mini

18%

T2sonnet · gpt-4o

14%

T3opus · o1-pro

Your stack supports

All local models (24+ GB GPU)

Sonnet & Opus (Anthropic)

GPT-4o (OpenAI)

Gemini (Google)

A concrete case

Solo founder, Claude Code Max plan, RTX 4090, ~80 prompts/day, ~8% critical. Most of the day is renames, commits, small edits and “explain this” — those route to T0 local (free). The ~8% that's real debugging or a cross-file refactor goes to Sonnet/Opus. Set those inputs above to see this profile's monthly figure — it's computed from the same per-tier costs as the N=34 benchmark below, not a marketing number. Your mix (and your savings) shift with how much you keep local.

Benchmark proof

Cost per prompt on a 34-prompt blind-judged validation set.

Mooter

$0.022

Sonnet-only

$0.028

Opus-only

$0.034

Reproduce it yourself

The benchmark is pre-registered and open. 34 prompts × 3 arms (mooter, Sonnet-only, Opus-only) with a blind LLM judge — design, prompts, raw rows and per-pack diagnostics all live in the repo. Clone it and run the harness:

git clone https://github.com/pauloloureiroshp-ship-it/mooter
cd mooter/packages/router/scripts/wave1-benchmark
tsx run.ts            # full run: 34 × 3 arms → outputs/

See wave1-benchmark/README.md + BENCHMARK_DESIGN.md for the full method, confidence intervals and mis-routing analysis. Pinned to mooter v1.39.0.

N=34 is a small set — only medium-to-large effects are detectable. On this cloud-only set mooter matches the quality bar at lower cost per prompt; the larger savings come from routing simple work to free local T0, which this set doesn't isolate.