<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>o10 — inference spend &amp; AI supply chain</title>
<link>https://o10.io</link>
<description>o10 routes every AI inference call to the cheapest model that clears your quality floor — across Vercel AI Gateway, OpenRouter, Amazon Bedrock, and owned capacity. Shadow mode, evals, and KYI governance.</description>
<language>en-us</language>
<lastBuildDate>Tue, 09 Jun 2026 00:00:00 GMT</lastBuildDate>
<atom:link href="https://o10.io/rss.xml" rel="self" type="application/rss+xml"/>
<item>
<title>Inference spend trends in 2026</title>
<link>https://o10.io/blog/inference-spend-2026-trends</link>
<guid isPermaLink="true">https://o10.io/blog/inference-spend-2026-trends</guid>
<description>Enterprise inference spend is shifting from frontier defaults to eval-gated routing across gateways, Bedrock committed capacity, and open-weight models.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>When OpenAI prices change, routing matters</title>
<link>https://o10.io/blog/openai-price-change-routing</link>
<guid isPermaLink="true">https://o10.io/blog/openai-price-change-routing</guid>
<description>Per-token list price changes are only half the story — venue mix and model tier selection drive fully loaded cost.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Why shadow mode is non-negotiable</title>
<link>https://o10.io/blog/shadow-mode-savings-proof</link>
<guid isPermaLink="true">https://o10.io/blog/shadow-mode-savings-proof</guid>
<description>CFOs will not flip enforce mode without a verified baseline — shadow mirrors traffic without changing production.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Drawing down Bedrock commitments</title>
<link>https://o10.io/blog/bedrock-commitment-drawdown</link>
<guid isPermaLink="true">https://o10.io/blog/bedrock-commitment-drawdown</guid>
<description>Reserved AWS AI capacity lowers marginal cost — route compliant steady workloads through committed tiers.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>RAG token explosion and what to do</title>
<link>https://o10.io/blog/rag-token-explosion</link>
<guid isPermaLink="true">https://o10.io/blog/rag-token-explosion</guid>
<description>Retrieval plus generation multiplies tokens; eval-gated mini-class routing is often the largest absolute savings lever.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Agent inference cost compounding</title>
<link>https://o10.io/blog/agent-cost-compounding</link>
<guid isPermaLink="true">https://o10.io/blog/agent-cost-compounding</guid>
<description>Multi-step agents multiply spend; per-step routing prevents frontier defaults on every hop.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>KYI for board reporting</title>
<link>https://o10.io/blog/kyi-board-reporting</link>
<guid isPermaLink="true">https://o10.io/blog/kyi-board-reporting</guid>
<description>Boards need a recommendation and a view of risk, not token totals. The KYI composite scores five pillars continuously.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Eval drift in production</title>
<link>https://o10.io/blog/eval-drift-production</link>
<guid isPermaLink="true">https://o10.io/blog/eval-drift-production</guid>
<description>Models drift; weekly eval replay on production samples keeps quality floors honest.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Gateway sprawl and the control plane</title>
<link>https://o10.io/blog/gateway-sprawl</link>
<guid isPermaLink="true">https://o10.io/blog/gateway-sprawl</guid>
<description>Multiple gateways without unified policy fragment spend — one control plane above all venues.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Open-weight in production 2026</title>
<link>https://o10.io/blog/open-weight-production-2026</link>
<guid isPermaLink="true">https://o10.io/blog/open-weight-production-2026</guid>
<description>8B-class open-weight models on committed infrastructure clear many workloads at $0.05 per 1M tokens when evals permit.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Four CFO questions that stick</title>
<link>https://o10.io/blog/cfo-inference-questions</link>
<guid isPermaLink="true">https://o10.io/blog/cfo-inference-questions</guid>
<description>Fully loaded cost, cost per outcome, failing unit economics, forecast drivers — each with a lever.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>UK inference residency in practice</title>
<link>https://o10.io/blog/uk-inference-residency</link>
<guid isPermaLink="true">https://o10.io/blog/uk-inference-residency</guid>
<description>Policy PDFs do not route traffic — per-call jurisdiction enforcement does.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>One ledger across multi-cloud inference</title>
<link>https://o10.io/blog/multi-cloud-inference-ledger</link>
<guid isPermaLink="true">https://o10.io/blog/multi-cloud-inference-ledger</guid>
<description>Immutable per-call records across AWS, gateways, and self-hosted — finance-grade attribution.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Quality floors without evals are hopes</title>
<link>https://o10.io/blog/quality-floor-without-evals</link>
<guid isPermaLink="true">https://o10.io/blog/quality-floor-without-evals</guid>
<description>Define the floor from replayed production samples — not vendor marketing tiers.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>The 638× spread explained</title>
<link>https://o10.io/blog/638x-spread-explained</link>
<guid isPermaLink="true">https://o10.io/blog/638x-spread-explained</guid>
<description>Same workload, same eval floor, different venues — compliant price spread drives routing economics.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>FinOps reporting vs enforcement</title>
<link>https://o10.io/blog/finops-vs-enforcement</link>
<guid isPermaLink="true">https://o10.io/blog/finops-vs-enforcement</guid>
<description>Reporting on last month versus changing the next request: different layers, both needed, but only one controls spend.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Support bot routing economics</title>
<link>https://o10.io/blog/support-bot-routing</link>
<guid isPermaLink="true">https://o10.io/blog/support-bot-routing</guid>
<description>High volume with a strict QA floor still clears on mini tiers for many enterprises.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Code copilot eval gates</title>
<link>https://o10.io/blog/code-copilot-eval-gates</link>
<guid isPermaLink="true">https://o10.io/blog/code-copilot-eval-gates</guid>
<description>Correctness suites often clear below frontier — prove on your repos before paying frontier prices.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Forecast inference from business drivers</title>
<link>https://o10.io/blog/inference-forecasting-drivers</link>
<guid isPermaLink="true">https://o10.io/blog/inference-forecasting-drivers</guid>
<description>Users, tickets, documents — not straight-line token growth.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>llms.txt and GEO hygiene</title>
<link>https://o10.io/blog/llms-txt-for-geo</link>
<guid isPermaLink="true">https://o10.io/blog/llms-txt-for-geo</guid>
<description>Machine-readable site summaries orient AI crawlers — supplement to extractable passages, not a substitute.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>What is AI inference?</title>
<link>https://o10.io/answers/what-is-ai-inference</link>
<guid isPermaLink="true">https://o10.io/answers/what-is-ai-inference</guid>
<description>AI inference is running a trained model on live inputs to produce outputs — the operational phase where up to 90% of AI cost and risk accrues in production.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>What are AI tokens?</title>
<link>https://o10.io/answers/what-are-ai-tokens</link>
<guid isPermaLink="true">https://o10.io/answers/what-are-ai-tokens</guid>
<description>AI tokens are the billing units LLM APIs use — typically subword pieces of text charged separately for prompts and completions.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>How to reduce LLM inference cost?</title>
<link>https://o10.io/answers/how-to-reduce-llm-inference-cost</link>
<guid isPermaLink="true">https://o10.io/answers/how-to-reduce-llm-inference-cost</guid>
<description>Reduce LLM inference cost by routing each use case to the cheapest model that clears its eval quality floor — prove in shadow mode, then enforce in the request path.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>What is LLM routing?</title>
<link>https://o10.io/answers/what-is-llm-routing</link>
<guid isPermaLink="true">https://o10.io/answers/what-is-llm-routing</guid>
<description>LLM routing selects which model and venue serves each request — ideally eval-gated so the cheapest compliant option wins, not the default frontier model.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>What is shadow mode in AI routing?</title>
<link>https://o10.io/answers/what-is-shadow-mode</link>
<guid isPermaLink="true">https://o10.io/answers/what-is-shadow-mode</guid>
<description>Shadow mode mirrors production traffic through a control plane without changing live routes — building a verified savings baseline before enforce mode.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>What is enforce mode?</title>
<link>https://o10.io/answers/what-is-enforce-mode</link>
<guid isPermaLink="true">https://o10.io/answers/what-is-enforce-mode</guid>
<description>Enforce mode changes production routes in the request path — holding budget envelopes and quality floors on every subsequent call.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>What is an AI quality floor?</title>
<link>https://o10.io/answers/what-is-quality-floor</link>
<guid isPermaLink="true">https://o10.io/answers/what-is-quality-floor</guid>
<description>A quality floor is a measured eval bar per use case — the minimum score a model must clear. The cheapest model that passes is the route to select.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>What is Know Your Inference (KYI)?</title>
<link>https://o10.io/answers/what-is-know-your-inference</link>
<guid isPermaLink="true">https://o10.io/answers/what-is-know-your-inference</guid>
<description>Know Your Inference (KYI) scores inference systems across performance, economics, integration, strategy, and risk — producing a board-signable recommendation.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>How much does GPT-4o cost per million tokens?</title>
<link>https://o10.io/answers/gpt-4o-price-per-million-tokens</link>
<guid isPermaLink="true">https://o10.io/answers/gpt-4o-price-per-million-tokens</guid>
<description>GPT-4o-class frontier pricing ranges from roughly $2.50 to $15 per 1M input tokens, and higher for output tokens, depending on venue: gateway, aggregator, or committed cloud capacity.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>How much does Claude 3.5 Sonnet cost?</title>
<link>https://o10.io/answers/claude-3-5-sonnet-pricing</link>
<guid isPermaLink="true">https://o10.io/answers/claude-3-5-sonnet-pricing</guid>
<description>Sonnet-class models typically run $3–$9/1M input tokens on gateways and lower on committed Bedrock or enterprise agreements.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>OpenRouter vs LiteLLM — what is the difference?</title>
<link>https://o10.io/answers/openrouter-vs-litellm</link>
<guid isPermaLink="true">https://o10.io/answers/openrouter-vs-litellm</guid>
<description>LiteLLM is an open-source LLM gateway for API unification. OpenRouter is a multi-provider aggregator. Neither enforces spend — both sit below a control plane like o10.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>AI gateway vs control plane?</title>
<link>https://o10.io/answers/ai-gateway-vs-control-plane</link>
<guid isPermaLink="true">https://o10.io/answers/ai-gateway-vs-control-plane</guid>
<description>Gateways provide API access and failover. Control planes enforce policy, evals, routing, and ledger above all gateways and clouds.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Inference vs training cost?</title>
<link>https://o10.io/answers/inference-vs-training-cost</link>
<guid isPermaLink="true">https://o10.io/answers/inference-vs-training-cost</guid>
<description>Training is a one-time CapEx spike; inference is continuous OpEx that compounds with users, agents, and retries — often 90%+ of operational AI spend.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>How much does RAG inference cost?</title>
<link>https://o10.io/answers/how-much-does-rag-cost</link>
<guid isPermaLink="true">https://o10.io/answers/how-much-does-rag-cost</guid>
<description>RAG multiplies tokens via retrieval plus generation and is often the highest-volume enterprise workload. Savings from eval-gated routing to mini or Haiku tiers are typically largest here.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>What is Amazon Bedrock committed capacity?</title>
<link>https://o10.io/answers/bedrock-committed-capacity</link>
<guid isPermaLink="true">https://o10.io/answers/bedrock-committed-capacity</guid>
<description>Bedrock committed capacity reserves inference throughput at a lower marginal $/token than on-demand — ideal for steady workloads once evals clear on reserved tiers.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>AI</title>
<link>https://o10.io/ai</link>
<guid isPermaLink="true">https://o10.io/ai</guid>
<description>AI in production is inference, which accounts for up to 90% of operational life and spend. o10 is the control plane that enforces inference spend rather than just reporting it.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>AI Inference</title>
<link>https://o10.io/ai-inference</link>
<guid isPermaLink="true">https://o10.io/ai-inference</guid>
<description>AI inference is running trained models on live requests. o10 routes inference to the cheapest compliant model across gateways, Bedrock, and owned capacity — with up to 638× observed price spread.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>AI Tokens (LLM Token Pricing)</title>
<link>https://o10.io/tokens</link>
<guid isPermaLink="true">https://o10.io/tokens</guid>
<description>AI tokens are how LLMs meter text for billing — not cryptocurrency. o10 optimizes token spend by routing to the cheapest compliant model per use case. Interactive calculator and venue price tables.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>AI Models</title>
<link>https://o10.io/ai-models</link>
<guid isPermaLink="true">https://o10.io/ai-models</guid>
<description>AI model selection should be eval-gated: choose the cheapest model that clears your quality floor. Compare open-weight, mini, Sonnet-class, and frontier tiers with a live eval gate demo.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>LLM Inference Routing</title>
<link>https://o10.io/routing</link>
<guid isPermaLink="true">https://o10.io/routing</guid>
<description>AI model routing sends each inference call to the cheapest compliant model and venue. o10 offers shadow mode (prove savings) and enforce mode (hold the envelope in the path). Live routing console demo.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>AI Supply Chain</title>
<link>https://o10.io/ai-supply-chain</link>
<guid isPermaLink="true">https://o10.io/ai-supply-chain</guid>
<description>The AI supply chain spans sourcing capacity, evals, enforcement, KYI governance, and board assurance. o10 is the control plane; KYI scores sustainability across five pillars.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Inference Spend</title>
<link>https://o10.io/inference-spend</link>
<guid isPermaLink="true">https://o10.io/inference-spend</guid>
<description>Inference spend is production AI cost metered in tokens and venues. o10 enforces budget envelopes per use case — dashboards observe, o10 changes what you spend.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>AI FinOps</title>
<link>https://o10.io/ai-finops</link>
<guid isPermaLink="true">https://o10.io/ai-finops</guid>
<description>AI FinOps answers four CFO questions with levers — not slides. Live ledger per use case, unit economics, kill criteria, and forecasts tied to volume drivers.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>AI Governance</title>
<link>https://o10.io/governance</link>
<guid isPermaLink="true">https://o10.io/governance</guid>
<description>Enforce zero-retention, jurisdiction-aware routing, and immutable per-call audit trails at the control plane — not in policy documents alone.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
<item>
<title>Know Your Inference</title>
<link>https://o10.io/kyi</link>
<guid isPermaLink="true">https://o10.io/kyi</guid>
<description>KYI scores inference use cases across five pillars — performance, economics, integration, strategy, risk — with a composite score and board recommendation. By Shen Pandi.</description>
<pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>