o10
Last updated 2026-06-09

Owned / Open-Weight + o10

Self-hosted open-weight models integrate as a venue in o10's routing graph with eval-gated selection alongside cloud APIs.

Spread observed: 638×
Routing modes: shadow → enforce
Framework: KYI

"Cheaper tokens miss the point. Up to 90% of an AI system's operational life is inference — where value, reliability, and risk are decided."

— Shen Pandi, Know Your Inference
Dashboards observe.
o10 enforces.

Cost dashboards tell you what you spent. o10 sits in the request path and changes what you spend — shadow first, then enforce.

Summary: Key takeaways

What you need to know

Short, self-contained answers with cited stats — read the sections below for full context.

How does o10 integrate with Self-hosted?

Self-hosted open-weight models integrate as a venue in o10's routing graph with eval-gated selection alongside cloud APIs.

o10's State of Inference Spend 2026 found up to 638× compliant price spread across venues for identical workloads.

Where does o10 sit relative to Self-hosted?

Self-hosted infrastructure handles access and connectivity in the inference stack. o10 is the control plane above it, enforcing eval-gated routing, shadow proof, spend envelopes, and KYI governance without replacing the developer-facing gateway or cloud API.

What is the rollout pattern with Self-hosted?

Connect Self-hosted in shadow mode first. Mirror a week of traffic, segment by use case, prove compliant savings versus your baseline, then flip to enforce mode, typically within two to three weeks and without a six-week migration project.

01 Deep dive

Stack position: Self-hosted + o10

Self-hosted open-weight models integrate as a venue in o10's routing graph with eval-gated selection alongside cloud APIs.

Enterprises already use Self-hosted for unified API access or cloud inference. Finance still receives blended invoices; platform teams lack a single control point when prompts, models, or retries change.

o10 routes every call to the cheapest model that clears the use-case quality floor — across Self-hosted and other venues — with an immutable per-call ledger.

  • Self-hosted: access and connectivity
  • o10: policy, evals, routing, ledger
  • Shadow mode before enforce
  • KYI scores the supply chain above routing
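
The routing rule described above (cheapest model that clears the use-case quality floor) can be sketched as a simple filter-then-minimize step. This is a minimal illustration, not o10's implementation: the `Candidate` type, the model names, the eval scores, and the $0.40 self-hosted price are all hypothetical; only the $2.40 and $31.90 figures come from the price table on this page.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    model: str                 # hypothetical model name
    venue: str                 # e.g. "self-hosted" or "gateway"
    usd_per_1m_tokens: float
    eval_score: float          # score on the use case's eval suite, 0.0-1.0


def cheapest_compliant(candidates: list[Candidate],
                       quality_floor: float) -> Optional[Candidate]:
    """Return the lowest-priced candidate whose eval score clears the floor."""
    passing = [c for c in candidates if c.eval_score >= quality_floor]
    if not passing:
        return None  # nothing clears the floor; the caller keeps its baseline
    return min(passing, key=lambda c: c.usd_per_1m_tokens)


candidates = [
    Candidate("open-weight-small", "self-hosted", 0.40, 0.78),  # illustrative
    Candidate("mini-class", "gateway", 2.40, 0.86),
    Candidate("frontier", "gateway", 31.90, 0.95),
]
choice = cheapest_compliant(candidates, quality_floor=0.85)
print(choice.model if choice else "no compliant candidate")
```

In this sketch the self-hosted candidate is cheapest but fails the floor, so the mini-class route is selected. Price only matters among candidates that already pass the eval gate.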

02 Deep dive

Deployment pattern

Typical enterprise rollout starts in shadow, proves per use case, then enforces.

Week one: connect venue, mirror traffic, run eval suites per workload.

Week two: verified savings per use case; CFO sign-off on envelopes.

Week three: enforce mode; KYI composite live for board reporting.

Venue price tiers ($/1M tokens)

Tier           Gateway   Committed
Mini-class     $2.40     $1.85
Sonnet-class   $9.40     $7.20
Frontier       $31.90    $24.50
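
As a worked example of the arithmetic behind the table above, the sketch below computes the committed-capacity discount relative to the gateway price for each tier. The prices are copied from the table; the dictionary layout and function name are illustrative only.

```python
# Prices copied from the table above, in USD per 1M tokens.
PRICE_TIERS = {
    "Mini-class": {"gateway": 2.40, "committed": 1.85},
    "Sonnet-class": {"gateway": 9.40, "committed": 7.20},
    "Frontier": {"gateway": 31.90, "committed": 24.50},
}


def committed_discount_pct(tier: str) -> float:
    """Percentage saved by the committed price versus the gateway price."""
    prices = PRICE_TIERS[tier]
    saved = prices["gateway"] - prices["committed"]
    return round(100 * saved / prices["gateway"], 1)


discounts = {tier: committed_discount_pct(tier) for tier in PRICE_TIERS}
for tier, pct in discounts.items():
    print(f"{tier}: {pct}% below gateway price")
```

On these figures the committed discount is roughly 23% in every tier (22.9%, 23.4%, and 23.2%), so within this table the larger lever is the choice of tier rather than the choice between gateway and committed pricing.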

How-to: Operational steps

Rolling out o10 with Self-hosted

  1. Connect venue

     Wire Self-hosted into o10 without changing application code paths initially.

  2. Shadow mirror

     Replay traffic; quantify compliant savings per use case.

  3. Prove eval equivalence

     Cheaper candidates must clear quality floor on your samples.

  4. Enforce routes

     Hold budget envelopes; KYI and ledger stay continuous.

Source: Methodology

Integration guide for Owned / Open-Weight + o10. o10 State of Inference Spend 2026. Shen Pandi, KYI framework.

FAQ: Frequently asked questions

Common questions

Does o10 work with Self-hosted?

Self-hosted open-weight models integrate as a venue in o10's routing graph with eval-gated selection alongside cloud APIs. o10 integrates above Self-hosted, not as a replacement — preserving developer ergonomics while adding eval-gated routing, shadow-mode savings proof, spend enforcement, and KYI governance. Traffic flows through the control plane; o10 selects the cheapest compliant model across Self-hosted and other connected venues per use-case policy.

Does o10 replace the gateway?

No. o10 does not replace your AI gateway or developer-facing APIs. It sits above gateways and clouds, adding spend enforcement, eval-gated routing, policy, and a CFO-grade ledger rather than proxy compatibility. Teams keep Vercel AI Gateway, OpenRouter, or LiteLLM for access; o10 changes which model and venue serve each request based on cost, eval floor, and governance rules. The split is intentional: gateways provide doors; control planes enforce economics.

What is shadow mode?

Shadow mode mirrors live inference traffic through o10 without changing production routes. For every request, o10 evaluates candidate models against your per-use-case quality floors and records which route would have been cheapest and compliant — along with the cost delta — while the original provider still serves the response. Engineering sees proof without production risk; finance gets a verified savings figure tied to your traffic, not industry averages. Most teams run shadow for 7–14 days segmented by use case (support, RAG, code, batch) before flipping enforce mode.
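
The shadow-mode behavior described above can be sketched as a function that records the would-have-been route and cost delta without changing who serves the request. This is an illustration under stated assumptions, not o10's API: the `Route` and `ShadowRecord` types, the model names, and the eval scores are hypothetical, and the sketch assumes the production route already meets the quality floor.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str                 # hypothetical model name
    usd_per_1m_tokens: float
    eval_score: float          # eval-suite result for this use case, 0.0-1.0


@dataclass(frozen=True)
class ShadowRecord:
    use_case: str
    served_by: str             # production route, unchanged in shadow mode
    would_route_to: str        # cheapest compliant route
    cost_delta_usd: float      # positive means the shadow route was cheaper


def shadow_evaluate(use_case: str, tokens: int, production: Route,
                    candidates: list[Route], quality_floor: float) -> ShadowRecord:
    """Record what enforce mode would have done; never reroute the request."""
    passing = [c for c in candidates if c.eval_score >= quality_floor]
    # Assumption: production is the baseline and stays eligible, so the
    # delta is zero when no cheaper compliant candidate exists.
    best = min([production] + passing, key=lambda r: r.usd_per_1m_tokens)

    def cost(route: Route) -> float:
        return tokens / 1_000_000 * route.usd_per_1m_tokens

    delta = round(cost(production) - cost(best), 4)
    return ShadowRecord(use_case, production.model, best.model, delta)


record = shadow_evaluate(
    use_case="support",
    tokens=500_000,
    production=Route("frontier", 31.90, 0.95),
    candidates=[Route("mini-class", 2.40, 0.90), Route("tiny", 0.40, 0.60)],
    quality_floor=0.85,
)
print(record)
```

The cheapest candidate here fails the floor and is ignored, so the record names the mini-class route while the frontier model still serves the response.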

What venues are supported together?

o10 unifies routing policy and ledger across Vercel AI Gateway (per-token API), OpenRouter (multi-provider aggregator), Amazon Bedrock (per-token and committed capacity), and owned or open-weight infrastructure. A single control plane sits above all venues — you do not need separate dashboards per provider. o10 selects the cheapest compliant supply per call while honoring data residency, zero-retention, and model approval rules. Committed Bedrock drawdown and open-weight routing are first-class venues, not afterthoughts.
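
The three governance rules named above (data residency, zero-retention, and model approval) can be pictured as a compliance check that runs before any price comparison. The sketch below is hypothetical: the `Venue` and `Policy` types, region names, and model names are assumptions made for illustration and do not describe o10's actual policy schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Venue:
    name: str
    region: str                # where inference runs
    zero_retention: bool       # venue does not retain prompts or outputs


@dataclass(frozen=True)
class Policy:
    allowed_regions: frozenset[str]
    require_zero_retention: bool
    approved_models: frozenset[str]


def is_compliant(model: str, venue: Venue, policy: Policy) -> bool:
    """True only if the model/venue pair satisfies every policy rule."""
    if model not in policy.approved_models:
        return False
    if venue.region not in policy.allowed_regions:
        return False
    if policy.require_zero_retention and not venue.zero_retention:
        return False
    return True


policy = Policy(
    allowed_regions=frozenset({"eu-west"}),
    require_zero_retention=True,
    approved_models=frozenset({"open-weight-small", "mini-class"}),
)
self_hosted = Venue("self-hosted", "eu-west", zero_retention=True)
aggregator = Venue("aggregator", "us-east", zero_retention=False)

print(is_compliant("open-weight-small", self_hosted, policy))
print(is_compliant("mini-class", aggregator, policy))
```

Treating compliance as a hard filter rather than a weighted score means a cheaper venue can never win by trading away residency or retention guarantees.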

How fast to integrate?

Most stacks connect o10 in shadow mode within a day: point traffic through the control plane, segment by use case, and start the verified savings clock. Enforce mode follows after per-use-case eval equivalence is proven — typically one to two weeks for enterprises with multiple workloads. No six-week gateway migration is required; o10 sits above existing gateways and clouds. KYI scoring and the immutable ledger stay live from day one in shadow.

What is KYI?

Know Your Inference (KYI) is a governance framework by Shen Pandi that scores inference systems across five weighted pillars: Performance (25%), Economics (25%), Integration (20%), Strategy (20%), and Risk (10%). Each pillar scores 0–100; the composite rolls into a confidence level and board-signable recommendation. KYI runs continuously in the o10 control plane rather than as a one-off audit, so every routed call and eval updates the score. A composite below the floor of 65 triggers enforcement levers: cap, rightsizing, or sunset per policy.
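
To make the pillar weights concrete, the sketch below computes a composite as a weighted arithmetic mean of the five pillar scores and compares it with the floor of 65. The weights and the floor come from the text above; the weighted-mean formula, the reading that a composite under the floor is what triggers the levers, and the example scores are all assumptions, since the exact calculation is not specified here.

```python
from __future__ import annotations

# Weights and floor as stated in the text above.
KYI_WEIGHTS = {
    "performance": 0.25,
    "economics": 0.25,
    "integration": 0.20,
    "strategy": 0.20,
    "risk": 0.10,
}
COMPOSITE_FLOOR = 65


def kyi_composite(pillars: dict[str, float]) -> float:
    """Weighted mean of pillar scores (assumed formula), each scored 0-100."""
    if set(pillars) != set(KYI_WEIGHTS):
        raise ValueError("expected exactly the five KYI pillars")
    for name, score in pillars.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score must be within 0-100")
    return sum(KYI_WEIGHTS[name] * score for name, score in pillars.items())


# Illustrative scores only.
scores = {"performance": 80, "economics": 55, "integration": 70,
          "strategy": 60, "risk": 40}
composite = kyi_composite(scores)
below_floor = composite < COMPOSITE_FLOOR
print(f"composite={composite:.2f} below_floor={below_floor}")
```

With these example scores the composite is 63.75, which sits under the floor even though Performance alone scores 80. A weak Economics or Risk pillar can pull the composite down despite strong performance.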

Where is pricing documented?

o10 pricing combines a governance fee for the control plane (evals, KYI, policy, ledger) with gainshare on verified shadow savings — you pay a share of savings only when they are proven against your baseline, not estimated. Shadow mode is the audit that establishes the baseline. This aligns incentives: o10 wins when compliant routing reduces your fully loaded inference cost per use case.

What research backs benchmarks?

Benchmarks and spread methodology are documented in the State of Inference Spend 2026 report at o10.io/research/state-of-inference-spend-2026, including venue price tables, workload savings models, and the 638× compliant spread calculation. The KYI framework whitepaper at o10.io/research/kyi-whitepaper provides the governance methodology cited across glossary and hub content. Both are primary sources designed for search snippets and AI answer engine citation.

What is enforce mode?

Enforce mode places o10 in the request path. On every call, o10 selects the cheapest model and venue that clears your eval-defined quality floor, holds the budget envelope, and applies residency and retention policy before the request reaches the provider. Failed eval candidates are never routed. Each enforced call writes an immutable ledger entry: model, venue, policy, jurisdiction, and fully loaded cost. Enforce without shadow proof is possible but discouraged — shadow establishes trust with engineering and finance first.
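
One way to picture the per-call ledger entry listed above (model, venue, policy, jurisdiction, fully loaded cost) is a frozen record in an append-only, hash-chained list. The hash chaining is an assumption made here to illustrate tamper evidence; how o10 actually achieves immutability is not described on this page, and every name and value in the sketch is hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class LedgerEntry:
    model: str
    venue: str
    policy: str
    jurisdiction: str
    fully_loaded_cost_usd: float
    prev_hash: str

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_entry(ledger: list[LedgerEntry], **fields) -> LedgerEntry:
    """Append a new entry that commits to the digest of the previous one."""
    prev = ledger[-1].digest() if ledger else GENESIS
    entry = LedgerEntry(prev_hash=prev, **fields)
    ledger.append(entry)
    return entry


def verify(ledger: list[LedgerEntry]) -> bool:
    """Check that every entry still points at its predecessor's digest."""
    expected = GENESIS
    for entry in ledger:
        if entry.prev_hash != expected:
            return False
        expected = entry.digest()
    return True


ledger: list[LedgerEntry] = []
append_entry(ledger, model="mini-class", venue="gateway", policy="support-v1",
             jurisdiction="EU", fully_loaded_cost_usd=0.0012)
append_entry(ledger, model="open-weight-small", venue="self-hosted",
             policy="batch-v1", jurisdiction="EU", fully_loaded_cost_usd=0.0004)
print(verify(ledger))
```

Because each entry commits to the digest of the one before it, rewriting any earlier entry breaks verification for everything that follows.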

How are savings verified?

Savings are verified against your own shadow baseline per use case — not industry averages or vendor marketing claims. o10 mirrors a week or more of production traffic, segments by workload, and compares what you actually spent versus what you would have spent on the cheapest eval-passing route at the same quality floor. Finance signs off on the delta before enforce mode flips. Gainshare pricing ties o10 fees to this verified number, so savings must be real and auditable.
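
The comparison described above (actual spend versus the cheapest eval-passing route, segmented by workload) reduces to a per-use-case sum of differences. The sketch below assumes each shadow record has already been reduced to a tuple of use case, actual cost, and counterfactual cost; that record shape and the example figures are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

# (use_case, actual_cost_usd, cheapest_compliant_cost_usd) per mirrored call
ShadowRow = tuple[str, float, float]


def verified_savings(rows: Iterable[ShadowRow]) -> dict[str, float]:
    """Sum actual minus counterfactual cost for each use case."""
    totals: dict[str, float] = defaultdict(float)
    for use_case, actual, counterfactual in rows:
        totals[use_case] += actual - counterfactual
    return {use_case: round(total, 2) for use_case, total in totals.items()}


# Illustrative figures only.
rows = [
    ("support", 10.00, 4.00),
    ("support", 6.00, 6.00),   # no cheaper compliant route for this call
    ("rag", 20.00, 12.50),
]
savings = verified_savings(rows)
print(savings)
```

Keeping the totals separate per use case matches the sign-off flow in the text, where finance approves the delta for each workload rather than one blended number.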

o10
Set the envelope. o10 holds it.

See what you're overpaying.

Paste a week of traffic. Get the number that books the audit.

verified savings methodology · State of Inference Spend 2026