Flat-rate LLM subscriptions used to be simple: pay a monthly fee and you can "use it as much as you want" (with quiet limits in the background). That model is now breaking down. Reasoning models and agent-style workflows can vary compute cost by orders of magnitude, and providers can't sustainably subsidize heavy usage inside a single fixed price.
GitHub Copilot is the clearest signal yet: it will retire "Premium requests" on June 1, 2026 and move fully to token-based billing via GitHub AI Credits. Cursor (June 2025) and Windsurf (March 2026) already made similar shifts. Using Copilot's redesign as a concrete example, this article explains what's changing across the industry, how each plan is affected, and what engineering, procurement, and operations teams should do to stay in control.
What's happening
From 2025 through 2026, AI coding tools have been rapidly moving from "flat monthly fee + request-count limits" to "usage-based + token/credit metering." Cursor (June 2025), Windsurf (March 2026), and now GitHub Copilot (scheduled for June 2026) are all converging on the same direction. This isn't a one-off price hike; it's a structural shift in how LLM products can be priced sustainably.
Copilot started as a product that felt like "pay monthly, use it." Under the hood, however, costs vary dramatically by model choice and feature (chat, agents, code review, and so on). GitHub has gradually moved toward a design where it (1) isolates higher-cost experiences, (2) measures them, (3) introduces caps, and (4) pushes overages toward usage-based billing. You can see the same pattern in other AI coding tools (Cursor, Windsurf, Claude Code, and others), even if they use different naming.
In GitHub's April 2026 announcement, Chief Product Officer Mario Rodriguez explained the core issue: today a quick chat question and an hours-long automated coding session can cost the user the same amount. GitHub has been absorbing most of the growing inference cost, but the Premium requests model is no longer sustainable.
Key takeaways
- "Flat rate = unlimited" is increasingly incompatible with modern LLM workloads
- Monthly usage limits (budget) and short-window throttling (rate limits / capacity) are different constraints
- Vendors are converging on token-based credits and quota-style controls; 2026 is a tipping point
- GitHub Copilot will retire Premium requests on June 1, 2026 and switch to GitHub AI Credits (1 credit = $0.01 USD)
How Copilot has measured usage
What "Premium requests" meant
In Copilot, GitHub separated "baseline" experiences (for example, inline code completions) from experiences that are more likely to be expensive (for example, higher-end chat models and agent-style features). Those higher-cost actions were metered as Premium requests.
GitHub's documentation also makes the enforcement dates explicit: billing for Premium requests began on June 18, 2025 for paid Copilot plans on GitHub.com, and on August 1, 2025 on GHE.com.
Premium request consumption depended on model-specific multipliers. The intent was reasonable: make the cost impact of "which model did you pick?" visible to the user. In practice, it also made budgeting harder, because teams had to reason about both monthly allowances and multipliers.
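To see why multipliers complicated budgeting, here is a minimal sketch of the arithmetic. The model names and multiplier values are hypothetical, not GitHub's published table:

```python
# Hypothetical multiplier table: premium requests consumed per interaction.
MULTIPLIERS = {
    "base-model": 1.0,       # baseline chat model
    "reasoning-model": 10.0, # heavier reasoning model (illustrative value)
}

def premium_requests_used(interactions: dict) -> float:
    """Total premium requests consumed for a batch of {model: call_count}."""
    return sum(MULTIPLIERS[model] * count for model, count in interactions.items())

# 200 cheap calls plus just 30 expensive ones already consume 500 budget units.
print(premium_requests_used({"base-model": 200, "reasoning-model": 30}))  # 500.0
```

The point of the sketch: a team's monthly allowance depends not only on how many requests it makes, but on which models those requests hit, which is exactly the two-variable budgeting problem described above.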
How overages worked
For Copilot in organizations (Business / Enterprise), GitHub's design allowed add-on purchases once the included Premium requests were exhausted (depending on your contract and admin settings). GitHub Docs listed an add-on price of $0.04 USD per request. Copilot Free and some mobile-based subscribers were not eligible for overage billing.
Warning
You can sometimes hit a "can't use Copilot right now" situation even when you still have Premium requests left. In many cases, that's not a monthly budget issue; it's a rate limit issue. Operationally, monitor "billing (budget)" and "availability (rate limiting)" as separate signals.
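One way to keep the two signals separate in your own tooling is to classify failures explicitly before alerting. This is a minimal sketch, not Copilot's documented API behavior: the status codes and the budget field are assumptions for illustration.

```python
def classify_failure(status_code: int, remaining_budget: float) -> str:
    """Distinguish 'being throttled' from 'out of budget' so each gets its own alert."""
    if status_code == 429:
        return "rate_limited"      # short-window capacity issue: back off and retry
    if remaining_budget <= 0:
        return "budget_exhausted"  # monthly allowance gone: needs a spend decision
    return "other_error"

print(classify_failure(429, remaining_budget=50.0))  # rate_limited
print(classify_failure(403, remaining_budget=0.0))   # budget_exhausted
```

Routing these to different dashboards matters because the remedies differ: throttling calls for retry/backoff policies, while budget exhaustion calls for a procurement or plan decision.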
Why flat-rate pricing is collapsing
Cost variance has exploded
Two interactions can look identical to the user while being wildly different computationally. Models that do heavier reasoning, long-context prompts, multi-step agent runs, and repo-wide analysis can multiply per-task cost by orders of magnitude. Under a fixed subscription, heavy usage can quickly turn unit economics negative for the vendor.
Reporting has also highlighted this dynamic: complex prompts that trigger extensive "thinking" can cost more to run than what the user pays in subscription revenue. That mismatch is fatal to a true "all-you-can-eat" plan.
Agents are the real tipping point
Modern coding agents don't just generate once; they plan, execute, test/verify, and iterate. So even if the user experiences it as "one task," the product may be making dozens or hundreds of model calls behind the scenes. Metered units like Premium requests (and model multipliers) were an attempt to align pricing with that hidden work.
There's also a broader operational reality: as agent usage grows, the cost to operate AI features can jump quickly, forcing vendors to redesign both pricing and reliability controls.
Fairness and abuse prevention
"Effectively unlimited" flat-rate plans encourage a small fraction of heavy users to consume a disproportionate share of compute. Breaking usage into measurable units, then setting caps and overages, serves multiple goals: cost recovery, abuse mitigation, and performance protection during peak load. In other words, LLM products are shifting from classic SaaS logic ("maximize LTV with a predictable subscription") toward cloud-compute logic ("meter usage as close to cost as possible").
The same shift across the industry
This isn't only about GitHub Copilot. Major AI coding tools are moving in the same direction. The metering labels differ (credits, quotas, Premium requests), but the core idea is the same: align pricing to token consumption and compute.
Cursor: switched to credit-based usage (June 2025)
Cursor moved away from a flat monthly allowance of "fast requests" and adopted a credit system tied to actual model usage. The design gives each paid plan a monthly credit pool (roughly aligned to the plan price), and consumption varies with model choice and context length. Higher tiers effectively buy larger usage envelopes.
The rollout also showed the risk of abrupt billing changes. Some heavy users saw unexpected charges immediately after the transition, triggering significant community pushback. Cursor issued an apology and processed refunds for certain unexpected charges during the early transition window. The lesson: if you change the unit economics, you must also change communication, controls, and observability at the same time.
Windsurf: moved to quota-based limits (March 2026)
Windsurf shifted from credits to quota-based limits (daily/weekly usage caps). It also raised its Pro price from $15 to $20 and added a $200/month tier. Existing users were auto-migrated without grandfathering; long-time users received a limited one-time goodwill credit.
Claude Code: tiered quotas at similar price points
Claude Code uses tiered quotas (for example, 5× and 20× tiers) to give heavy users a way to buy substantially larger token budgets. Because Anthropic controls the underlying models, optimization can sometimes make high-usage scenarios relatively cost-effective compared to direct API usage, depending on your workload.
Watch the pricing convergence
A "Pro at ~$20 / power tier at ~$200" structure has emerged across multiple tools (Cursor, Windsurf, Claude Code). Even with different metering systems (credits, quotas, Premium requests), the market is landing in the same place: a $20/month seat can't fund unlimited agentic compute anymore. Copilot's June 2026 shift brings GitHub into that broader equilibrium.
The 2026 inflection point
Full move to usage-based billing
According to GitHub's official documentation and blog, Copilot will move all plans from request-based metering to usage-based billing starting June 1, 2026. Premium requests will be retired. Instead, each plan includes a monthly allowance of GitHub AI Credits, and any additional usage is billed based on token consumption at the published per-model rates.
Usage is calculated from the total of input tokens, output tokens, and cached tokens, converted into credits using each model's public API pricing. GitHub AI Credits have a fixed conversion value: 1 credit = $0.01 USD.
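The conversion is simple arithmetic, sketched below. Only the 1 credit = $0.01 rate comes from GitHub's announcement; the per-million-token prices are placeholders, not any model's real rates:

```python
CREDIT_USD = 0.01  # 1 GitHub AI Credit = $0.01 (from GitHub's announcement)

def credits_used(input_tokens: int, output_tokens: int, cached_tokens: int,
                 price_per_mtok: dict) -> float:
    """Convert token counts into AI Credits via per-million-token USD prices.

    price_per_mtok keys: 'input', 'output', 'cached' (placeholder values)."""
    usd = (input_tokens * price_per_mtok["input"]
           + output_tokens * price_per_mtok["output"]
           + cached_tokens * price_per_mtok["cached"]) / 1_000_000
    return usd / CREDIT_USD

# Hypothetical model priced at $3 / $15 / $0.30 per million tokens.
prices = {"input": 3.0, "output": 15.0, "cached": 0.30}
print(round(credits_used(50_000, 10_000, 200_000, prices), 2))  # 36.0
```

Note how output tokens dominate at typical API price ratios: a session that emits a lot of generated code burns credits much faster than one that mostly reads cached context.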
Monthly AI Credits by plan
After the transition, the monthly subscription price stays the same, but it effectively becomes a monthly AI Credits allowance of equal value.
| Plan | Monthly price | Included AI Credits | Notes |
|---|---|---|---|
| Copilot Pro | $10 | 1,000 credits (=$10) | Individual |
| Copilot Pro+ | $39 | 3,900 credits (=$39) | Higher-tier individual |
| Copilot Business | $19/user | 1,900 credits (3,000 for Jun–Aug) | Organizations; temporary boost for first 3 months |
| Copilot Enterprise | $39/user | 3,900 credits (7,000 for Jun–Aug) | Enterprises; temporary boost for first 3 months |
For existing Business and Enterprise customers, GitHub provides an automatic transition buffer: increased AI Credits are granted from June 1, 2026 through September 1, 2026, then revert to the standard allowance.
How existing subscriptions are handled
Monthly Pro / Pro+ subscribers will be automatically migrated to usage-based billing on June 1, 2026. Annual Pro / Pro+ subscribers will keep Premium-request-based billing until their renewal date, but multipliers change starting June 1, 2026 (meaning Premium request consumption increases for certain models). Annual plans won't auto-renew; at the end of term, accounts downgrade to Copilot Free (and you can later re-subscribe monthly). If you switch from annual to monthly mid-term, GitHub applies prorated credits based on remaining time.
No more "cheap model fallback" after you run out
Under the current system, some users could keep working with cheaper models after exhausting Premium requests. After the move to usage-based billing, that fallback goes away. Once you exhaust AI Credits, you must either (1) set additional spend/budget to continue, (2) wait for the monthly reset, or (3) upgrade your plan. If a user hits their individual budget cap, access stops for that user even if the organization's pooled balance still exists.
Code review will also consume GitHub Actions minutes
Another significant change: starting June 1, 2026, Copilot Code Review will consume GitHub Actions minutes. Each code review can therefore incur both (1) AI Credits (usage-based) and (2) GitHub Actions minutes for private repositories. Public repositories continue to receive free Actions minutes under GitHub's standard model. If a user without a Copilot license runs a review, it can still consume the organization's Actions minutes.
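A rough way to model the dual cost of one private-repo review, assuming hypothetical numbers for credits consumed and a placeholder Actions per-minute rate (check GitHub's published pricing for real values):

```python
def review_cost_usd(credits: float, actions_minutes: float,
                    actions_rate_usd: float = 0.008) -> float:
    """Estimated USD cost of one Copilot code review on a private repo:
    AI Credits (1 credit = $0.01) plus metered GitHub Actions minutes.
    The default Actions rate is a placeholder assumption."""
    return credits * 0.01 + actions_minutes * actions_rate_usd

# A review that burns 25 credits and 3 Actions minutes (hypothetical figures).
print(round(review_cost_usd(25, 3), 4))  # 0.274
```

The practical takeaway for budgeting: reviews now draw from two cost centers, so a dashboard that tracks only AI Credits will understate the true per-review cost on private repositories.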
This is the key shift
Premium requests were a coarse, feature-level meter. Usage-based billing is closer to the underlying compute reality because it tracks tokens. Expect LLM pricing to look less like SaaS and more like cloud infrastructure (CPU/storage metering).
There's also a practical downside: usage-based billing is inherently non-deterministic. You can't know in advance exactly how many tokens (and how much runtime) a model will spend on a given prompt, especially with agentic workflows. That forces a change in how teams plan and control budgets.
GitHub also plans to ship a "preview invoice" experience in early May 2026, so you can estimate what your usage would cost under the new model before the June 1 cutover.
Understanding the differences by comparison
Billing models directly shape adoption decisions, budgets, and governance. Here are the most common patterns.
| Model | Billing unit | Vendor goal | Customer watch-outs |
|---|---|---|---|
| Flat-rate (effectively unlimited) | Monthly fee | Reduce adoption friction | Limits appear under heavy usage; weak long-term sustainability |
| Flat-rate + caps (quota-based) | Monthly fee + daily/weekly limits | Predictable costs | Definitions can get complex across features/models |
| Flat-rate + overages | Monthly fee + metered overage | Recover cost + fairness | Budget overrun risk; misconfiguration can cause bill spikes |
| Usage-based | Tokens/credits | Price closer to cost | Hard to estimate upfront; requires visibility and controls |
Across the market, the common path is: start with "effectively unlimited," add caps and/or overages, then move closer to full usage-based. Copilot is following that path: Premium requests were the "flat-rate + overage" bridge, and June 2026 is the move to fully usage-based billing. Cursor jumped more directly to usage-based, while Windsurf partially "rebalanced" toward quotas, but the overall direction remains alignment to real compute cost.
What to consider before rollout
From seat-based thinking to value-based deployment
As flat-rate pricing fades, the question becomes less about "how many seats?" and more about "where does this deliver measurable value?" For example: Does PR summarization cut review time? Do agents reduce repetitive refactor work? If you can point to specific high-value workflows, it becomes much easier to defend usage-based spend with ROI.
The opposite approach, "give everyone a seat and see what happens," collides with usage-based billing. Light users still receive an allowance you may not fully utilize, while heavy users burn through credits and trigger overages. That's how you end up with both weak outcomes and unpleasant bills.
After June 2026, Copilot Pro's $10/month effectively becomes $10/month worth of AI Credits. With inference costs rising quickly in the agent era, budgeting purely as "seats × monthly fee" is no longer enough. Teams increasingly need cloud-style forecasting: "task type × expected token spend × unit price," plus guardrails.
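That "task type × expected token spend × unit price" forecast can be sketched directly. Every task profile and token figure below is an illustrative assumption, not measured data:

```python
# Monthly forecast: expected runs x avg tokens per run x USD per million tokens.
TASK_PROFILES = {
    # (runs/month, avg tokens/run, blended USD per million tokens) - all assumed
    "chat_question":  (2_000,     3_000, 5.0),
    "pr_review":      (400,      40_000, 5.0),
    "agent_refactor": (50,    1_500_000, 5.0),
}

def monthly_forecast_usd(profiles: dict) -> float:
    """Sum expected monthly spend across task types."""
    return sum(runs * tokens * usd_per_mtok / 1_000_000
               for runs, tokens, usd_per_mtok in profiles.values())

print(round(monthly_forecast_usd(TASK_PROFILES), 2))  # 485.0
```

Even with made-up numbers, the shape of the result is instructive: a handful of agentic refactors can dwarf thousands of chat questions, which is exactly why seat-count budgeting fails.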
Procurement and governance checklist
- Included allowances: whatâs metered, and at what reset cadence (monthly/weekly/daily)
- Overage pricing and spend caps: token/credit rates, and how flexible your limits can be
- Rate limits and SLA implications: how throttling impacts developer workflow
- BYOK (Bring Your Own Key): whether heavy users can shift to direct API billing
- Logs and auditability: who used what, and whether you can export via API
- Change management: how the vendor communicates billing changes (see Cursor June 2025, Windsurf March 2026, GitHub June 2026)
- Fallback behavior: whether "cheaper mode after you run out" exists (Copilot removes this after June 2026)
- Linked cost centers: cross-product metering such as GitHub Actions minutes
Because Copilot plan details can change, confirm the latest terms on GitHub's official pricing pages before making purchasing decisions. If you have access to GitHub's early-May 2026 "preview invoice," use it to estimate spend based on your real usage patterns before June 1.
Need guardrails for usage-based AI spend?
If Copilot (and other coding assistants) now behave like metered cloud services, you need the same playbook: budgets, visibility, and governance. Talk to us about setting up usage controls, cost reporting, and rollout policies that won't surprise your teams or finance.
Conclusion
LLM pricing is changing because modern usage is changing. Reasoning-heavy models, long-context prompts, and agentic workflows make "one flat monthly fee" increasingly unworkable. GitHub has officially announced that Copilot will retire Premium requests and transition all plans to usage-based billing on June 1, 2026, using GitHub AI Credits (1 credit = $0.01 USD). Subscription prices remain the same, but the value is delivered as monthly credits; overages are billed based on token consumption. Cheap-model fallback is removed, and Copilot Code Review will also begin consuming GitHub Actions minutes.
Copilot isn't alone. Cursor's June 2025 shift and Windsurf's March 2026 move show the same structural trend across AI coding tools. Even as metering units differ, vendors are converging on a world where heavy usage must be explicitly paid for, and where "~$20 Pro / ~$200 power tier" pricing structures are increasingly common. Cursor's backlash during its transition is a reminder that the biggest risk isn't just cost; it's poor communication and insufficient controls during a billing-model change.
For customers, the practical response is clear: (1) understand the metering unit, (2) put guardrails on overages, (3) operationalize rate-limit monitoring separately from budget monitoring, and (4) focus spend on workflows with measurable value. The end of flat-rate AI doesn't automatically mean worse ROI; it means you have to deploy AI with intent, not by default.