Inference Cost Calculator

API Frontier LLM

Model name

Input token price

$ per 1M tokens

Output token price

$ per 1M tokens

Oumi-Hosted Custom SLM

Model name

H100 GPU cost

$ per GPU hour

Throughput

samples (queries) per second

Input : Output ratio

—

derived from task token counts below

Task Query Profile

Avg input tokens per query

tokens

Avg output tokens per query

tokens

GPU utilization

fraction of each GPU hour in active use (0 – 1)

API-Based Cost

GPT-4.1 · OpenAI

—

per 1,000 queries

Oumi-Hosted Cost

Qwen3.5-4B · Alibaba

—

per 1,000 queries

Savings

Self-hosted vs API

—

cheaper with self-hosted

Assumptions: Short context window · FP16/BF16 precision · Batch size 1 · No continuous batching · Caching effects ignored · Web & storage costs negligible · Engineering & training costs negligible · GPU cost can be prorated by utilization
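The page does not state the formulas behind the three result cards, so the following is only a minimal sketch of how they could be computed from the inputs above. It assumes one reading of the "prorated by utilization" assumption: only the utilized fraction of each GPU hour is billed to this workload, and throughput is the sustained query rate. The function names and the example input values are hypothetical placeholders, not figures taken from the calculator.

```python
SECONDS_PER_HOUR = 3600
TOKENS_PER_PRICE_UNIT = 1_000_000  # API prices are quoted in $ per 1M tokens


def api_cost(input_price, output_price, in_tokens, out_tokens, queries=1000):
    """Dollar cost of `queries` API calls at the given per-1M-token prices."""
    per_query = (in_tokens * input_price + out_tokens * output_price) / TOKENS_PER_PRICE_UNIT
    return per_query * queries


def hosted_cost(gpu_cost_per_hour, throughput_qps, utilization, queries=1000):
    """Dollar cost of `queries` self-hosted calls on one GPU.

    Assumption (not confirmed by the page): the hourly GPU price is
    prorated by utilization, and throughput_qps is the sustained rate.
    """
    if throughput_qps <= 0:
        raise ValueError("throughput must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    billed_per_hour = gpu_cost_per_hour * utilization
    queries_per_hour = throughput_qps * SECONDS_PER_HOUR
    return billed_per_hour / queries_per_hour * queries


def input_output_ratio(in_tokens, out_tokens):
    """Input : Output ratio derived from the task token counts."""
    return in_tokens / out_tokens


def savings_fraction(api, hosted):
    """Fraction by which self-hosting is cheaper than the API (negative if dearer)."""
    return 1 - hosted / api


# Placeholder inputs chosen only to exercise the functions.
api = api_cost(input_price=2.0, output_price=8.0, in_tokens=1000, out_tokens=200)
hosted = hosted_cost(gpu_cost_per_hour=3.0, throughput_qps=5.0, utilization=0.5)
print(f"API: ${api:.2f} per 1,000 queries")
print(f"Hosted: ${hosted:.4f} per 1,000 queries")
print(f"Savings: {savings_fraction(api, hosted):.1%}")
```

If idle GPU time is paid for in full rather than prorated, the hosted cost would instead divide the hourly price by `throughput_qps * SECONDS_PER_HOUR * utilization`, which raises the per-query cost as utilization falls.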
