Inference Cost Calculator — Playground

What it does

Compares your throughput target against the pricing of major hosted-inference providers and the all-in cost of running your own GPU instances. Self-hosting is cheaper above a certain utilization threshold; hosted inference is cheaper below it. This tool tells you where that crossover falls for your workload.
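The crossover can be illustrated with a minimal sketch. It assumes a single GPU instance billed at a flat hourly rate and a hosted provider billed per million tokens. The function name, parameters, and all numbers below are hypothetical and are not taken from the calculator or from any provider's price list.

```python
def breakeven_utilization(gpu_hourly_usd, capacity_tokens_per_sec,
                          hosted_usd_per_million_tokens):
    """Fraction of one instance's capacity at which both options cost the same.

    Assumption: self-hosting is a fixed hourly cost regardless of load,
    while hosted cost scales linearly with tokens served.
    """
    # Tokens one instance could serve in an hour if fully loaded.
    tokens_per_hour_at_full_load = capacity_tokens_per_sec * 3600
    # What the hosted provider would charge for that same hour of traffic.
    hosted_usd_per_hour_at_full_load = (
        tokens_per_hour_at_full_load / 1e6 * hosted_usd_per_million_tokens
    )
    # Self-hosting breaks even when the fixed cost equals the hosted bill.
    return gpu_hourly_usd / hosted_usd_per_hour_at_full_load


# Hypothetical inputs: $4.00/hour GPU, 2000 tokens/sec peak, $1.00 per 1M tokens.
u = breakeven_utilization(4.0, 2000, 1.0)
print(f"break-even utilization: {u:.1%}")
```

A result above 1.0 would mean that, under these assumptions, a fully loaded instance still costs more than the hosted provider, so there is no crossover at those prices.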

Why it’s useful

“It’s cheaper to self-host” and “it’s cheaper to use a provider” are both true depending on parameters most teams haven’t pinned down. The cost calculator forces those parameters into the open so the decision rests on numbers instead of vibes.

How to use it

Enter target tokens/sec or requests/sec.
Pick the model class (7B / 13B / 70B / MoE).
Toggle between on-demand, reserved, and spot pricing for self-hosting.
Compare the rendered cost curves side by side.
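The comparison behind those steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the calculator's actual model: the hourly rates for each pricing mode, the 730-hour month, and the per-instance capacity are all placeholder values, and the helper names are hypothetical. A requests/sec target would first need converting to tokens/sec using an assumed average tokens per request.

```python
import math

HOURS_PER_MONTH = 730  # assumed average month length in hours

# Hypothetical hourly rates per instance for each self-hosting pricing mode.
HOURLY_USD = {"on_demand": 4.0, "reserved": 2.5, "spot": 1.5}


def hosted_monthly_usd(tokens_per_sec, usd_per_million_tokens):
    """Hosted cost scales linearly with tokens served."""
    tokens_per_month = tokens_per_sec * 3600 * HOURS_PER_MONTH
    return tokens_per_month / 1e6 * usd_per_million_tokens


def self_hosted_monthly_usd(tokens_per_sec, capacity_tokens_per_sec, gpu_hourly_usd):
    """Self-hosted cost is a step function: pay for whole instances."""
    instances = max(1, math.ceil(tokens_per_sec / capacity_tokens_per_sec))
    return instances * gpu_hourly_usd * HOURS_PER_MONTH


def cost_table(targets, capacity_tokens_per_sec, hosted_usd_per_million_tokens):
    """One row per throughput target, one column per option."""
    rows = []
    for target in targets:
        row = {
            "tokens_per_sec": target,
            "hosted": hosted_monthly_usd(target, hosted_usd_per_million_tokens),
        }
        for mode, rate in HOURLY_USD.items():
            row[mode] = self_hosted_monthly_usd(target, capacity_tokens_per_sec, rate)
        rows.append(row)
    return rows


for row in cost_table([500, 1000, 2500], 2000, 1.0):
    print(row)
```

Plotting each column against tokens_per_sec gives one straight line for the hosted option and a staircase for each self-hosting mode; the intersections are the crossover points.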

Limitations

Provider prices change quickly, but the values here are refreshed only quarterly, so they may lag current rates.
Doesn’t model fine-tuning, custom models, or BYOC arrangements.
Cold-start costs and minimum instance commitments are noted but not exhaustively modeled.