What it does
Compares the cost of serving your throughput target through major hosted-inference providers against the all-in cost of running your own GPU instances. Self-hosting wins above a certain utilization threshold; hosted wins below it. This tool shows where that crossover falls for your workload.
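The crossover can be sketched with a few lines of arithmetic. This is a minimal illustration, not the tool's actual model: it assumes a single instance billed at a flat hourly rate, a fixed peak throughput, and per-token hosted pricing. The function name and every number below are hypothetical placeholders, not real provider quotes.

```python
def breakeven_utilization(
    instance_cost_per_hour: float,
    peak_tokens_per_sec: float,
    hosted_price_per_million_tokens: float,
) -> float:
    """Fraction of peak throughput at which self-hosted cost per token
    equals the hosted price. A result above 1.0 means no break-even exists."""
    tokens_per_hour_at_peak = peak_tokens_per_sec * 3600
    # Cheapest possible self-hosted price per 1M tokens: the instance is busy 100% of the time.
    self_hosted_floor = instance_cost_per_hour / tokens_per_hour_at_peak * 1_000_000
    # At utilization u the self-hosted price is floor / u, so break-even is floor / hosted price.
    return self_hosted_floor / hosted_price_per_million_tokens


# Placeholder inputs (assumptions for illustration only):
# $4.00/hour instance, 2000 tokens/sec peak, $1.00 per 1M hosted tokens.
u_star = breakeven_utilization(4.00, 2000, 1.00)
print(f"break-even utilization: {u_star:.1%}")
```

With these placeholder inputs the break-even lands at roughly 56% utilization: keep the instance busier than that and self-hosting is cheaper per token; below it, hosted is. A result above 100% would mean hosted is cheaper at any load.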
Why it’s useful
“It’s cheaper to self-host” and “it’s cheaper to use a provider” are both true depending on parameters most teams haven’t pinned down. The cost calculator forces those parameters into the open so the decision rests on numbers instead of vibes.
How to use it
- Enter target tokens/sec or requests/sec.
- Pick the model class (7B / 13B / 70B / MoE).
- Toggle between on-demand, reserved, and spot pricing for self-hosting.
- Compare the rendered cost curves side by side.
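To make the inputs above concrete, here is a rough sketch of how cost curves of this kind could be computed. It is an assumption-laden illustration rather than the tool's implementation: the hourly rates, the per-instance peak throughput, the 730-hour month, and all function names are hypothetical, and the target is treated as a sustained average with instances billed for the whole month.

```python
import math

HOURS_PER_MONTH = 730  # assumed average month length in hours

# Hypothetical hourly rates per GPU instance; placeholders, not real prices.
HOURLY_RATE = {"on_demand": 4.00, "reserved": 2.60, "spot": 1.40}


def tokens_per_sec_from_requests(requests_per_sec: float, avg_tokens_per_request: float) -> float:
    """Convert a requests/sec target into tokens/sec (assumes a known average request size)."""
    return requests_per_sec * avg_tokens_per_request


def self_hosted_monthly(target_tps: float, peak_tps_per_instance: float, pricing: str) -> float:
    """Monthly cost of enough always-on instances to cover the target throughput."""
    instances = max(1, math.ceil(target_tps / peak_tps_per_instance))
    return instances * HOURLY_RATE[pricing] * HOURS_PER_MONTH


def hosted_monthly(target_tps: float, price_per_million_tokens: float) -> float:
    """Monthly cost of the same sustained throughput at per-token hosted pricing."""
    tokens_per_month = target_tps * 3600 * HOURS_PER_MONTH
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def cost_curves(targets, peak_tps_per_instance, price_per_million_tokens):
    """One row per throughput target, with hosted and each self-hosted pricing mode."""
    rows = []
    for tps in targets:
        row = {"target_tps": tps, "hosted": hosted_monthly(tps, price_per_million_tokens)}
        for pricing in HOURLY_RATE:
            row[pricing] = self_hosted_monthly(tps, peak_tps_per_instance, pricing)
        rows.append(row)
    return rows


# Placeholder sweep: 2000 tokens/sec per instance, $1.00 per 1M hosted tokens.
for row in cost_curves([250, 500, 1000, 2000, 4000], 2000, 1.00):
    print(row)
```

In this simplified model the hosted curve rises linearly with throughput, while each self-hosted curve is a step function that jumps whenever another instance is needed. The crossover is where the hosted line first rises above a given step.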
Limitations
- Provider prices change quickly, but the values here are refreshed only quarterly, so they may lag current rates.
- Doesn’t model fine-tuning, custom models, or BYOC arrangements.
- Cold-start costs and minimum instance commitments are noted but not exhaustively modeled.