How does the context window affect document summarization?
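A context window is the maximum number of tokens a model accepts in one request, counting both the prompt and the completion. A larger window lets you pass more of a document, or richer instructions, in a single call, but filling it raises per-call cost, which grows linearly with the tokens used. For document summarization at 22.0B/mo, context-window requirements shape routing: up to 80% of that volume is a compliant routing opportunity at a balanced quality floor.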
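The token arithmetic behind this can be sketched in Python. This is a minimal sketch that assumes three things: the window covers prompt plus completion, input and output tokens have flat per-million-token prices, and a document that does not fit is split into equal chunks. The chunking is a common approach, not the source's stated method. The function names, token counts, and prices below are hypothetical illustrations, not any provider's actual API or rates.

```python
import math


def fits_in_window(prompt_tokens: int, max_completion_tokens: int, context_window: int) -> bool:
    """The window caps prompt + completion together, so both count against it."""
    return prompt_tokens + max_completion_tokens <= context_window


def chunks_needed(doc_tokens: int, instruction_tokens: int,
                  max_completion_tokens: int, context_window: int) -> int:
    """Number of chunks if each call repeats the instructions and reserves room for the summary.

    Hypothetical helper: assumes the document is split into equal-sized chunks.
    """
    budget = context_window - instruction_tokens - max_completion_tokens
    if budget <= 0:
        raise ValueError("window too small for instructions plus completion")
    return math.ceil(doc_tokens / budget)


def call_cost(prompt_tokens: int, completion_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Per-call cost is linear in tokens used (prices are hypothetical, per million tokens)."""
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100,000-token document, 500 instruction tokens, 1,000-token summary.
    doc, instr, out = 100_000, 500, 1_000
    for window in (32_000, 128_000):
        print(window, fits_in_window(doc + instr, out, window),
              chunks_needed(doc, instr, out, window))
    print(call_cost(doc + instr, out, 0.50, 1.50))
```

With these numbers, a 32,000-token window needs 4 chunks, and a 128,000-token window takes the document in one call. Because cost is linear, doubling the tokens sent doubles the cost of the call. Chunking also repeats the instruction tokens in every call, so it adds a small overhead.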
For identical workloads at the same quality floor, the most and least expensive compliant routes can differ in cost by up to 638× (o10 State of Inference Spend 2026).