API-Based Cost
GPT-4.1 · OpenAI
—
per 1,000 queries
Oumi-Hosted Cost
Qwen3.5-4B · Alibaba
—
per 1,000 queries
Savings
Self-hosted vs API
—
cheaper with self-hosting
Assumptions:
Short context window · FP16/BF16 precision · Batch size 1 · No continuous batching ·
Caching effects ignored · Web & storage costs negligible · Engineering & training costs negligible ·
GPU cost can be prorated by utilization
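
The three figures above can be reproduced with a small calculation. The sketch below is one plausible way to do it under the listed assumptions, not the calculator's actual implementation. All names (`ApiPricing`, `api_cost_per_1k_queries`, `hosted_cost_per_1k_queries`, `savings_factor`) are hypothetical, and every number in the example is a placeholder rather than a real GPT-4.1 or GPU price. It reads "prorated by utilization" as meaning that only the GPU time actually spent serving queries is charged.

```python
from dataclasses import dataclass


@dataclass
class ApiPricing:
    """Token prices for a hosted API, in USD per one million tokens."""
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float


def api_cost_per_1k_queries(pricing: ApiPricing,
                            input_tokens: int,
                            output_tokens: int) -> float:
    """API cost of 1,000 queries; caching effects are ignored."""
    per_query = (
        input_tokens * pricing.usd_per_million_input_tokens
        + output_tokens * pricing.usd_per_million_output_tokens
    ) / 1_000_000
    return per_query * 1000


def hosted_cost_per_1k_queries(gpu_usd_per_hour: float,
                               seconds_per_query: float) -> float:
    """Self-hosted cost of 1,000 queries.

    Assumes batch size 1 with no continuous batching, so queries run
    one after another, and assumes only busy GPU time is billed
    (our reading of "prorated by utilization").
    """
    busy_seconds = 1000 * seconds_per_query
    return gpu_usd_per_hour * busy_seconds / 3600


def savings_factor(api_cost: float, hosted_cost: float) -> float:
    """How many times cheaper self-hosting is than the API."""
    if hosted_cost <= 0:
        raise ValueError("hosted_cost must be positive")
    return api_cost / hosted_cost


if __name__ == "__main__":
    # Placeholder inputs only; substitute current prices and measured latency.
    pricing = ApiPricing(usd_per_million_input_tokens=1.0,
                         usd_per_million_output_tokens=4.0)
    api = api_cost_per_1k_queries(pricing, input_tokens=500, output_tokens=200)
    hosted = hosted_cost_per_1k_queries(gpu_usd_per_hour=1.80,
                                        seconds_per_query=0.5)
    print(f"API:     ${api:.2f} per 1,000 queries")
    print(f"Hosted:  ${hosted:.2f} per 1,000 queries")
    print(f"Savings: {savings_factor(api, hosted):.1f}x cheaper self-hosted")
```

Because the model assumes batch size 1 and no continuous batching, the self-hosted figure is conservative: batching several queries per GPU pass would lower `seconds_per_query` and widen the gap.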