OpenAI Introduces “Flex” Pricing: Now Half the Price

rebruit · April 18, 2025

OpenAI is giving developers a new way to trim their AI bills. The company has rolled out Flex processing, a beta pricing tier that trades blazing‑fast responses for substantial savings, making it a fit for background jobs and experimental work where speed isn’t critical.

Contents
  • Flex vs. Batch
  • Why Flex sometimes feels “almost as fast”
  • When to choose which
  • New Verification Requirements

It’s a new budget‑friendly option in the OpenAI API. Flex processing slices every per‑token rate in half for the o3 and o4‑mini models, but your calls drop to the back of the queue. In practice, that means:

| Model   | Standard Input   | Flex Input | Standard Output  | Flex Output |
|---------|------------------|------------|------------------|-------------|
| o3      | $10 / M tokens   | $5 / M     | $40 / M tokens   | $20 / M     |
| o4‑mini | $1.10 / M tokens | $0.55 / M  | $4.40 / M tokens | $2.20 / M   |

For example, the “$5 / M” means $5 per million tokens processed. In other words, if you send the model one million input tokens (≈ 750,000 English words), you’ll be charged $5 on the Flex tier.
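To sanity‑check the math, here is the same arithmetic as a small Python helper. The rates are the Flex figures from the table above; the token counts in the example call are illustrative.

```python
# Flex per-million-token rates (USD), taken from the pricing table above.
FLEX_RATES = {
    "o3": {"input": 5.00, "output": 20.00},
    "o4-mini": {"input": 0.55, "output": 2.20},
}

def flex_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the Flex-tier cost of a request, in dollars."""
    rates = FLEX_RATES[model]
    return (input_tokens / 1e6) * rates["input"] + (output_tokens / 1e6) * rates["output"]

# One million input tokens plus 100k output tokens on o3:
print(f"${flex_cost('o3', 1_000_000, 100_000):.2f}")  # -> $7.00
```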


  • Target use‑cases: Data enrichment pipelines, large‑scale evaluations, asynchronous tasks, and any project you’d tag as “lower priority” or “non‑production.”
  • Trade‑off: Responses may take longer, and, at peak demand, requests might be queued or throttled.
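Opting in takes a single request parameter. Below is a minimal sketch using the official openai Python SDK; the model, prompt, and timeout value are illustrative, and the generous timeout reflects the queueing behavior just described.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3",
    messages=[{"role": "user", "content": "Classify the sentiment of this review: ..."}],
    service_tier="flex",  # opt in to the discounted, deprioritized tier
    timeout=900.0,        # Flex calls can sit in the queue; allow a generous timeout
)
print(response.choices[0].message.content)
```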

Flex vs. Batch

|                       | Flex (service_tier="flex")                                                                                                              | Batch (/v1/batch endpoint)                                                                                                                |
|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| Call style            | Normal synchronous chat/completions call; just add service_tier: "flex".                                                                   | Upload one .jsonl file that can bundle millions of requests.                                                                                |
| Turn‑around           | No stated SLA. Requests run after real‑time traffic clears; typical latency is seconds to minutes, but can spike to “please retry later” (TechCrunch). | Guaranteed ≤ 24 h. In practice results often arrive within ~10 min–1 h, but OpenAI reserves the full day (OpenAI Help Center, OpenAI Platform). |
| Discount              | –50 % versus the synchronous price list for the same model (OpenAI Help Center).                                                           | –50 % versus the synchronous price list for the same model.                                                                                 |
| Rate limits           | Still governed by your normal per‑minute/token caps, just deprioritized.                                                                   | Separate “batch quota”—today up to ~250 M input tokens in a single job—that doesn’t count against live‑API rate limits (OpenAI Community).  |
| Streaming / functions | Allowed: everything the live endpoint supports, including streaming chunks and function calling.                                           | No streaming. Each response is written to an output file you download after the job finishes (OpenAI Help Center).                          |
| Integration effort    | One extra parameter; ideal if your code already makes chat/completions calls.                                                              | Requires a small pipeline: create file → submit batch → poll status → fetch results.                                                        |
| Best for              | Medium‑latency tasks that still benefit from an immediate HTTP response: user‑facing features that can wait a bit; eval dashboards where freshness matters. | Huge offline workloads: nightly data enrichment; embedding or summarizing millions of documents; large prompt A/B tests where real‑time speed is irrelevant. |
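The Batch workflow in the right‑hand column looks roughly like this with the openai Python SDK. Treat it as a sketch: the file name, custom IDs, and prompts are illustrative placeholders.

```python
import json
import time

from openai import OpenAI

client = OpenAI()

# 1) Build a .jsonl file: one request per line, each tagged with a custom_id.
with open("requests.jsonl", "w") as f:
    for i in range(3):  # a real job could bundle millions of lines
        f.write(json.dumps({
            "custom_id": f"doc-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "o4-mini",
                "messages": [{"role": "user", "content": f"Summarize document {i}."}],
            },
        }) + "\n")

# 2) Upload the file and submit the batch with a 24 h completion window.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3) Poll until the job reaches a terminal state.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)

# 4) Fetch the results file: one JSON-encoded response per line.
if batch.status == "completed":
    output = client.files.content(batch.output_file_id)
    for line in output.text.splitlines():
        result = json.loads(line)
        print(result["custom_id"], result["response"]["status_code"])
```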

Why Flex sometimes feels “almost as fast”

Flex jobs piggyback on whatever idle GPU capacity is free at the moment. During quiet periods the queue may be practically empty, so you get your answer in under a minute, exactly what you would have seen on the standard tier.

But unlike Batch, there’s no SLA guaranteeing completion; at peak usage you can hit multi‑minute waits or transient “resource unavailable” errors. If consistent latency matters, you still have to pay full price, or build retry logic like the sketch below.
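A minimal retry wrapper might look like the following. Which exception a throttled Flex call actually raises is an assumption here; the SDK’s RateLimitError, APITimeoutError, and InternalServerError are plausible candidates, so adjust to what you observe in practice.

```python
import time

import openai
from openai import OpenAI

client = OpenAI()

def flex_completion_with_retry(prompt: str, max_attempts: int = 5):
    """Call the Flex tier, backing off exponentially when capacity is scarce."""
    delay = 2.0
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="o4-mini",
                messages=[{"role": "user", "content": prompt}],
                service_tier="flex",
                timeout=900.0,  # Flex requests can queue for a while
            )
        # Assumed failure modes for a deprioritized request (see note above):
        except (openai.RateLimitError, openai.APITimeoutError, openai.InternalServerError):
            if attempt == max_attempts - 1:
                raise
            time.sleep(delay)
            delay *= 2  # back off: 2 s, 4 s, 8 s, ...
```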

When to choose which

Choose Flex if…
  • You can tolerate variable latency but not the complexity of new tooling.

Choose Batch if…
  • You’re processing hundreds of thousands or millions of prompts and don’t need them back immediately.
  • You need to blow past your normal rate limits or run jobs while you sleep.

Just remember:
• Seconds‑to‑minutes latency target → Flex.
• Minutes‑to‑hours latency target, huge volume, or you want to forget about rate limits → Batch.

Both tiers deliver the same model quality—only the queueing strategy changes. So if 50 % off was the main attraction of Batch and your workloads need answers in < 10 min, Flex is the simpler lever to pull.

New Verification Requirements

Flex isn’t the only change: developers in usage tiers 1–3 must now clear an ID‑verification step to unlock o3 (and certain features such as reasoning summaries and the streaming API). OpenAI says the measure helps keep malicious actors out of its ecosystem.

Just hours before OpenAI’s announcement, Google unveiled Gemini 2.5 Flash, a leaner model that squares up to DeepSeek’s R1 while undercutting it on input costs. OpenAI’s move indicates a broader race to serve developers who care as much about price efficiency as raw horsepower.

If your application can tolerate the occasional delay or brief unavailability, Flex processing offers a straightforward way to halve your token spend without switching models or vendors. For latency‑sensitive production systems, the standard full‑price tier still reigns.

This is a welcome relief as the cost of running frontier models keeps creeping upward, and competitors rush out “budget” alternatives.
