OpenAI Introduces “Flex” Pricing: Now Half the Price

April 18, 2025

OpenAI is giving developers a new way to trim their AI bills. The company has rolled out Flex processing, a beta pricing tier that trades fast responses for substantial savings, aimed at background jobs and experimental work where speed isn’t critical.

Contents
  • Flex vs. Batch
  • Why Flex sometimes feels “almost as fast”
  • When to choose which
  • New Verification Requirements

It’s a new budget‑friendly option in the OpenAI API. Flex processing slices every per‑token rate in half for the o3 and o4‑mini models, but your calls drop to the back of the queue. In practice, that means:

| Model | Standard Input | Flex Input | Standard Output | Flex Output |
| --- | --- | --- | --- | --- |
| o3 | $10 / M tokens | $5 / M tokens | $40 / M tokens | $20 / M tokens |
| o4‑mini | $1.10 / M tokens | $0.55 / M tokens | $4.40 / M tokens | $2.20 / M tokens |

For example, the “$5 / M” means $5 per million tokens processed. In other words, if you send the model one million input tokens (≈ 750,000 English words), you’ll be charged $5 on the Flex tier.
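
To make the arithmetic concrete, here is a minimal Python cost helper using the o3 Flex rates from the table above (the token counts in the usage lines are illustrative):

```python
# o3 Flex rates from the table above, in dollars per million tokens.
FLEX_INPUT_PER_M = 5.00
FLEX_OUTPUT_PER_M = 20.00

def flex_cost_o3(input_tokens: int, output_tokens: int) -> float:
    """Estimated Flex-tier cost of one o3 request."""
    return (input_tokens * FLEX_INPUT_PER_M
            + output_tokens * FLEX_OUTPUT_PER_M) / 1_000_000

print(flex_cost_o3(1_000_000, 0))   # 5.0  -- the $5 example from the text
print(flex_cost_o3(20_000, 2_000))  # 0.14 -- a more typical single request
```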

  • Target use‑cases: Data enrichment pipelines, large‑scale evaluations, asynchronous tasks, and any project you’d tag as “lower priority” or “non‑production.”
  • Trade‑off: Responses may take longer, and, at peak demand, requests might be queued or throttled.
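
Switching an existing call to Flex is, per the comparison table below, a one‑parameter change. A minimal sketch with the official openai Python SDK; the model choice, prompt, and timeout value are placeholder assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The only Flex-specific change is the service_tier parameter; everything
# else is a normal synchronous chat.completions call. The long per-request
# timeout is a precaution (hypothetical value), since Flex calls can queue.
response = client.chat.completions.create(
    model="o4-mini",  # Flex pricing covers o3 and o4-mini
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
    service_tier="flex",
    timeout=900,
)
print(response.choices[0].message.content)
```

The response shape is unchanged, so existing parsing code keeps working.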

Flex vs. Batch

| | Flex (service_tier="flex") | Batch (/v1/batch endpoint) |
| --- | --- | --- |
| Call style | Normal synchronous chat/completions call; just add service_tier: "flex". | Upload one .jsonl file that can bundle millions of requests. |
| Turn‑around | No stated SLA. Requests run after real‑time traffic clears; typical latency is seconds to minutes, but it can spike to “please retry later” (TechCrunch). | Guaranteed ≤ 24 h. In practice results often arrive within ~10 min–1 h, but OpenAI reserves the full day (OpenAI Help Center). |
| Discount | 50 % off the synchronous price list for o3 and o4‑mini (the rates above). | 50 % off the synchronous price list for the same model (OpenAI Help Center). |
| Rate limits | Still governed by your normal per‑minute/token caps, just deprioritized. | Separate “batch quota” of up to ~250 M input tokens in a single job, which doesn’t count against live‑API rate limits (OpenAI Community). |
| Streaming / functions | Allowed: everything the live endpoint supports, including streaming chunks and function calling. | No streaming. Each response is written to an output file you download after the job finishes (OpenAI Help Center). |
| Integration effort | One extra parameter; ideal if your code already makes chat/completions calls. | Requires a small pipeline: create file → submit batch → poll status → fetch results. |
| Best for | Medium‑latency tasks that still benefit from an immediate HTTP response: user‑facing features that can wait a bit, eval dashboards where freshness matters. | Huge offline workloads: nightly data enrichment, embedding or summarising millions of documents, large prompt A/B tests where real‑time speed is irrelevant. |
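
For comparison, the Batch pipeline from the last row (create file → submit batch → poll status → fetch results) is a handful of SDK calls. A minimal sketch; the file name and poll interval are illustrative, and real code would want error handling:

```python
import time
from openai import OpenAI

client = OpenAI()

# 1. Upload the .jsonl input file (one JSON-encoded request per line,
#    each targeting /v1/chat/completions).
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

# 2. Submit the batch with the 24-hour completion window.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll until the job reaches a terminal state (interval is arbitrary).
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)

# 4. Download the output file; each line holds one response.
if batch.status == "completed":
    print(client.files.content(batch.output_file_id).text)
```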

Why Flex sometimes feels “almost as fast”

Flex jobs piggyback on whatever idle GPU slots are free at the moment. During quiet periods the queue may be practically empty, so you get your answer in under a minute, exactly what you would have seen on the standard tier.

But unlike Batch, there’s no SLA that guarantees completion; at peak usage, you can hit multi‑minute waits or transient “resource unavailable” errors. If consistent latency matters, you still have to pay full price (or build retry logic).
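
In practice, that retry logic can be a few lines: exponential backoff on transient failures, then a fallback to the default tier. A sketch assuming the openai SDK’s exported error classes; the retry count and backoff constants are arbitrary:

```python
import time
from openai import OpenAI, APIStatusError, APITimeoutError

client = OpenAI()

def flex_with_fallback(messages, model="o3", retries=3):
    """Try the Flex tier a few times, then pay full price for a guaranteed run."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
                service_tier="flex",
            )
        except (APIStatusError, APITimeoutError):
            # Transient "resource unavailable" or timeout: back off and retry.
            time.sleep(2 ** attempt)  # 1 s, 2 s, 4 s
    # Flex stayed saturated; fall back to the standard (full-price) tier.
    return client.chat.completions.create(model=model, messages=messages)
```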

When to choose which

Choose Flex if…
  • You can tolerate variable latency but not the complexity of new tooling.

Choose Batch if…
  • You’re processing hundreds of thousands or millions of prompts and don’t need them back immediately.
  • You need to blow past your normal rate limits or run jobs while you sleep.

Just remember:
• Seconds‑to‑minutes latency target → Flex.
• Minutes‑to‑hours latency target, huge volume, or you want to forget about rate limits → Batch.

Both tiers deliver the same model quality—only the queueing strategy changes. So if 50 % off was the main attraction of Batch and your workloads need answers in < 10 min, Flex is the simpler lever to pull.

New Verification Requirements

Flex isn’t the only change: developers in usage tiers 1‑3 must now clear an ID‑verification step to unlock o3 (and certain features such as reasoning summaries and the streaming API). OpenAI says the measure helps keep malicious actors out of its ecosystem.

Just hours before OpenAI’s announcement, Google unveiled Gemini 2.5 Flash, a leaner model that squares up to DeepSeek’s R1 while undercutting it on input costs. OpenAI’s move indicates a broader race to serve developers who care as much about price efficiency as raw horsepower.

If your application can tolerate the occasional delay or brief unavailability, Flex processing offers a straightforward way to halve your token spend without switching models or vendors. For latency‑sensitive production systems, the standard tier still reigns.

This is a welcome relief as the cost of running frontier models keeps creeping upward, and competitors rush out “budget” alternatives.
