OpenAI launches Flex processing for cheaper, slower AI tasks
In a bid to more aggressively compete with rival AI companies like Google, OpenAI is launching Flex processing, an API option that provides lower AI model usage prices in exchange for slower response times and “occasional resource unavailability.”
Flex processing, which is available in beta for OpenAI’s recently released o3 and o4-mini reasoning models, is aimed at lower-priority and “non-production” tasks such as model evaluations, data enrichment, and asynchronous workloads, OpenAI says.
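Since Flex is an API option rather than a separate model, opting in is a per-request choice. The sketch below shows, as an assumption based on OpenAI's published Flex documentation, how a request payload might differ: a `service_tier` field set to `"flex"` selects the cheaper tier. The payload is only built, not sent, so no API key or network access is needed, and the `input` field name is likewise an assumption.

```python
# Hypothetical sketch: how a Flex request could differ from a standard one
# at the payload level. The "service_tier": "flex" field follows OpenAI's
# Flex processing docs; field names here are assumptions, not verified calls.
import json


def build_request(model: str, prompt: str, flex: bool = False) -> dict:
    """Build a request body; set flex=True to opt into Flex processing."""
    body = {
        "model": model,
        "input": prompt,  # assumed Responses-API-style field name
    }
    if flex:
        # Flex trades slower responses (and occasional resource
        # unavailability) for half-price tokens, so a generous
        # client-side timeout is advisable for these jobs.
        body["service_tier"] = "flex"
    return body


req = build_request("o3", "Summarize these evaluation results.", flex=True)
print(json.dumps(req, indent=2))
```

Because Flex requests can fail with resource-unavailability errors, batch jobs built this way would typically also add retry logic with backoff.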
It reduces API costs by exactly half. For o3, Flex processing is $5/M input tokens (~750,000 words) and $20/M output tokens versus the standard $10/M input tokens and $40/M output tokens. For o4-mini, Flex brings the price down to $0.55/M input tokens and $2.20/M output tokens from $1.10/M input tokens and $4.40/M output tokens.
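To make the halving concrete, here is a small Python sketch that computes the cost of a job at the per-million-token rates quoted above. The workload sizes are made up for illustration; only the rates come from OpenAI's published pricing.

```python
# Cost comparison between standard and Flex pricing, using the per-million-token
# rates quoted in the article. The example workload below is hypothetical.

PRICING = {
    # model/tier: (input $/M tokens, output $/M tokens)
    "o3-standard": (10.00, 40.00),
    "o3-flex": (5.00, 20.00),
    "o4-mini-standard": (1.10, 4.40),
    "o4-mini-flex": (0.55, 2.20),
}


def job_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a job at the given tier's per-million-token rates."""
    in_rate, out_rate = PRICING[tier]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate


# Example batch-evaluation workload: 2M input tokens, 0.5M output tokens on o3.
standard = job_cost("o3-standard", 2_000_000, 500_000)  # 2*$10 + 0.5*$40 = $40.00
flex = job_cost("o3-flex", 2_000_000, 500_000)          # 2*$5  + 0.5*$20 = $20.00
print(f"standard: ${standard:.2f}, flex: ${flex:.2f}, savings: {1 - flex / standard:.0%}")
# → standard: $40.00, flex: $20.00, savings: 50%
```

For latency-insensitive workloads like the evaluations and data-enrichment jobs OpenAI names, the savings are a flat 50% regardless of the input/output mix, since both rates are halved.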
The launch of Flex processing comes as the price of frontier AI continues to climb, and as rivals release cheaper, more efficient budget-oriented models. On Thursday, Google rolled out Gemini 2.5 Flash, a reasoning model that matches or bests DeepSeek’s R1 in terms of performance at a lower input token cost.
In an email to customers announcing the launch of Flex pricing, OpenAI also said that developers in tiers 1-3 of its usage-tier hierarchy must complete its newly introduced ID verification process to access o3. (Tiers are determined by the amount of money spent on OpenAI services.) Reasoning summaries and streaming API support for o3 are also gated behind verification.
OpenAI previously said ID verification is intended to stop bad actors from violating its usage policies.