Flat-rate LLM subscriptions used to be simple: pay a monthly fee and you can "use it as much as you want" (with quiet limits in the background). That model is now breaking down. Reasoning models and agent-style workflows can vary compute cost by orders of magnitude, and providers can't sustainably subsidize heavy usage inside a single fixed price.
GitHub Copilot is the clearest signal yet: it will retire "Premium requests" on June 1, 2026 and move fully to token-based billing via GitHub AI Credits. Cursor (June 2025) and Windsurf (March 2026) already made similar shifts. Using Copilot's redesign as a concrete example, this article explains what's changing across the industry, how each plan is affected, and what engineering, procurement, and operations teams should do to stay in control.
What's happening
From 2025 through 2026, AI coding tools have been rapidly moving from "flat monthly fee + request-count limits" to "usage-based + token/credit metering." Cursor (June 2025), Windsurf (March 2026), and now GitHub Copilot (scheduled for June 2026) are all converging on the same direction. This isn't a one-off price hike; it's a structural shift in how LLM products can be priced sustainably.
Copilot started as a product that felt like "pay monthly, use it." Under the hood, however, costs vary dramatically by model choice and feature (chat, agents, code review, and so on). GitHub has gradually moved toward a design where it (1) isolates higher-cost experiences, (2) measures them, (3) introduces caps, and (4) pushes overages toward usage-based billing. You can see the same pattern in other AI coding tools (Cursor, Windsurf, Claude Code, and others), even if they use different naming.
In GitHub's April 2026 announcement, Chief Product Officer Mario Rodriguez explained the core issue: today a quick chat question and an hours-long automated coding session can cost the user the same amount. GitHub has been absorbing most of the growing inference cost, but the Premium requests model is no longer sustainable.
Key takeaways
- "Flat rate = unlimited" is increasingly incompatible with modern LLM workloads
- Monthly usage limits (budget) and short-window throttling (rate limits / capacity) are different constraints
- Vendors are converging on token-based credits and quota-style controls; 2026 is a tipping point
- GitHub Copilot will retire Premium requests on June 1, 2026 and switch to GitHub AI Credits (1 credit = $0.01 USD)
How Copilot has measured usage
What "Premium requests" meant
In Copilot, GitHub separated "baseline" experiences (for example, inline code completions) from experiences that are more likely to be expensive (for example, higher-end chat models and agent-style features). Those higher-cost actions were metered as Premium requests.
GitHub's documentation also makes the enforcement dates explicit: billing for Premium requests began on June 18, 2025 for paid Copilot plans on GitHub.com, and on August 1, 2025 on GHE.com.
Premium request consumption depended on model-specific multipliers. The intent was reasonable: make the cost impact of "which model did you pick?" visible to the user. In practice, it also made budgeting harder, because teams had to reason about both monthly allowances and multipliers.
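To see why multipliers complicated budgeting, here is a minimal sketch of the arithmetic. The model names and multiplier values are hypothetical, not GitHub's published table:

```python
# Hypothetical multiplier table: premium requests consumed per interaction.
MULTIPLIERS = {
    "base-model": 1.0,       # baseline chat model
    "reasoning-model": 10.0, # heavier reasoning model (illustrative value)
}

def premium_requests_used(interactions: dict) -> float:
    """Total premium requests consumed for a batch of {model: call_count}."""
    return sum(MULTIPLIERS[model] * count for model, count in interactions.items())

# 200 cheap calls plus just 30 expensive ones already consume 500 budget units.
print(premium_requests_used({"base-model": 200, "reasoning-model": 30}))  # 500.0
```

The point of the sketch: a team's monthly allowance depends not only on how many requests it makes, but on which models those requests hit, which is exactly the two-variable budgeting problem described above.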
How overages worked
For Copilot in organizations (Business / Enterprise), GitHub's design allowed add-on purchases once the included Premium requests were exhausted (depending on your contract and admin settings). GitHub Docs listed an add-on price of $0.04 USD per request. Copilot Free and some mobile-based subscribers were not eligible for overage billing.
Warning
You can sometimes hit a "can't use Copilot right now" situation even when you still have Premium requests left. In many cases, that's not a monthly budget issue; it's a rate limit issue. Operationally, monitor "billing (budget)" and "availability (rate limiting)" as separate signals.
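One way to keep the two signals separate in your own tooling is to classify failures explicitly before alerting. This is a minimal sketch, not Copilot's documented API behavior: the status codes and the budget field are assumptions for illustration.

```python
def classify_failure(status_code: int, remaining_budget: float) -> str:
    """Distinguish 'being throttled' from 'out of budget' so each gets its own alert."""
    if status_code == 429:
        return "rate_limited"      # short-window capacity issue: back off and retry
    if remaining_budget <= 0:
        return "budget_exhausted"  # monthly allowance gone: needs a spend decision
    return "other_error"

print(classify_failure(429, remaining_budget=50.0))  # rate_limited
print(classify_failure(403, remaining_budget=0.0))   # budget_exhausted
```

Routing these to different dashboards matters because the remedies differ: throttling calls for retry/backoff policies, while budget exhaustion calls for a procurement or plan decision.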
Why flat-rate pricing is collapsing
Cost variance has exploded
Two interactions can look identical to the user while being wildly different computationally. Models that do heavier reasoning, long-context prompts, multi-step agent runs, and repo-wide analysis can multiply per-task cost by orders of magnitude. Under a fixed subscription, heavy usage can quickly turn unit economics negative for the vendor.
Reporting has also highlighted this dynamic: complex prompts that trigger extensive "thinking" can cost more to run than what the user pays in subscription revenue. That mismatch is fatal to a true "all-you-can-eat" plan.
Agents are the real tipping point
Modern coding agents don't just generate once; they plan, execute, test/verify, and iterate. So even if the user experiences it as "one task," the product may be making dozens or hundreds of model calls behind the scenes. Metered units like Premium requests (and model multipliers) were an attempt to align pricing with that hidden work.
There's also a broader operational reality: as agent usage grows, the cost to operate AI features can jump quickly, forcing vendors to redesign both pricing and reliability controls.
Fairness and abuse prevention
"Effectively unlimited" flat-rate plans encourage a small fraction of heavy users to consume a disproportionate share of compute. Breaking usage into measurable units, then setting caps and overages, serves multiple goals: cost recovery, abuse mitigation, and performance protection during peak load. In other words, LLM products are shifting from classic SaaS logic ("maximize LTV with a predictable subscription") toward cloud-compute logic ("meter usage as close to cost as possible").
The same shift across the industry
This isn't only about GitHub Copilot. Major AI coding tools are moving in the same direction. The metering labels differ (credits, quotas, Premium requests), but the core idea is the same: align pricing to token consumption and compute.
Cursor: switched to credit-based usage (June 2025)
Cursor moved away from a flat monthly allowance of "fast requests" and adopted a credit system tied to actual model usage. The design gives each paid plan a monthly credit pool (roughly aligned to the plan price), and consumption varies with model choice and context length. Higher tiers effectively buy larger usage envelopes.
The rollout also showed the risk of abrupt billing changes. Some heavy users saw unexpected charges immediately after the transition, triggering significant community pushback. Cursor issued an apology and processed refunds for certain unexpected charges during the early transition window. The lesson: if you change the unit economics, you must also change communication, controls, and observability at the same time.
Windsurf: moved to quota-based limits (March 2026)
Windsurf shifted from credits to quota-based limits (daily/weekly usage caps). It also raised its Pro price from $15 to $20 and added a $200/month tier. Existing users were auto-migrated without grandfathering; long-time users received a limited one-time goodwill credit.
Claude Code: tiered quotas at similar price points
Claude Code uses tiered quotas (for example, 5× and 20× tiers) to give heavy users a way to buy substantially larger token budgets. Because Anthropic controls the underlying models, optimization can sometimes make high-usage scenarios relatively cost-effective compared to direct API usage, depending on your workload.
Watch the pricing convergence
A "Pro at ~$20 / power tier at ~$200" structure has emerged across multiple tools (Cursor, Windsurf, Claude Code). Even with different metering systems (credits, quotas, Premium requests), the market is landing in the same place: a $20/month seat can't fund unlimited agentic compute anymore. Copilot's June 2026 shift brings GitHub into that broader equilibrium.
The 2026 inflection point
Full move to usage-based billing
According to GitHub's official documentation and blog, Copilot will move all plans from request-based metering to usage-based billing starting June 1, 2026. Premium requests will be retired. Instead, each plan includes a monthly allowance of GitHub AI Credits, and any additional usage is billed based on token consumption at the published per-model rates.
Usage is calculated from the total of input tokens, output tokens, and cached tokens, converted into credits using each model's public API pricing. GitHub AI Credits have a fixed conversion value: 1 credit = $0.01 USD.
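The conversion is simple arithmetic, sketched below. Only the 1 credit = $0.01 rate comes from GitHub's announcement; the per-million-token prices are placeholders, not any model's real rates:

```python
CREDIT_USD = 0.01  # 1 GitHub AI Credit = $0.01 (from GitHub's announcement)

def credits_used(input_tokens: int, output_tokens: int, cached_tokens: int,
                 price_per_mtok: dict) -> float:
    """Convert token counts into AI Credits via per-million-token USD prices.

    price_per_mtok keys: 'input', 'output', 'cached' (placeholder values)."""
    usd = (input_tokens * price_per_mtok["input"]
           + output_tokens * price_per_mtok["output"]
           + cached_tokens * price_per_mtok["cached"]) / 1_000_000
    return usd / CREDIT_USD

# Hypothetical model priced at $3 / $15 / $0.30 per million tokens.
prices = {"input": 3.0, "output": 15.0, "cached": 0.30}
print(round(credits_used(50_000, 10_000, 200_000, prices), 2))  # 36.0
```

Note how output tokens dominate at typical API price ratios: a session that emits a lot of generated code burns credits much faster than one that mostly reads cached context.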
Monthly AI Credits by plan
After the transition, the monthly subscription price stays the same, but it effectively becomes a monthly AI Credits allowance of equal value.
| Plan | Monthly price | Included AI Credits | Notes |
|---|---|---|---|
| Copilot Pro | $10 | 1,000 credits (=$10) | Individual |
| Copilot Pro+ | $39 | 3,900 credits (=$39) | Higher-tier individual |
| Copilot Business | $19/user | 1,900 credits (3,000 for Jun–Aug) | Organizations; temporary boost for first 3 months |
| Copilot Enterprise | $39/user | 3,900 credits (7,000 for Jun–Aug) | Enterprises; temporary boost for first 3 months |
For existing Business and Enterprise customers, GitHub provides an automatic transition buffer: increased AI Credits are granted from June 1, 2026 through September 1, 2026, then revert to the standard allowance.
How existing subscriptions are handled
Monthly Pro / Pro+ subscribers will be automatically migrated to usage-based billing on June 1, 2026. Annual Pro / Pro+ subscribers will keep Premium-request-based billing until their renewal date, but multipliers change starting June 1, 2026 (meaning Premium request consumption increases for certain models). Annual plans won't auto-renew; at the end of term, accounts downgrade to Copilot Free (and you can later re-subscribe monthly). If you switch from annual to monthly mid-term, GitHub applies prorated credits based on remaining time.
No more "cheap model fallback" after you run out
Under the current system, some users could keep working with cheaper models after exhausting Premium requests. After the move to usage-based billing, that fallback goes away. Once you exhaust AI Credits, you must either (1) set additional spend/budget to continue, (2) wait for the monthly reset, or (3) upgrade your plan. If a user hits their individual budget cap, access stops for that user even if the organization's pooled balance still exists.
Code review will also consume GitHub Actions minutes
Another significant change: starting June 1, 2026, Copilot Code Review will consume GitHub Actions minutes. Each code review can therefore incur both (1) AI Credits (usage-based) and (2) GitHub Actions minutes for private repositories. Public repositories continue to receive free Actions minutes under GitHub's standard model. If a user without a Copilot license runs a review, it can still consume the organization's Actions minutes.
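A rough way to model the dual cost of one private-repo review, assuming hypothetical numbers for credits consumed and a placeholder Actions per-minute rate (check GitHub's published pricing for real values):

```python
def review_cost_usd(credits: float, actions_minutes: float,
                    actions_rate_usd: float = 0.008) -> float:
    """Estimated USD cost of one Copilot code review on a private repo:
    AI Credits (1 credit = $0.01) plus metered GitHub Actions minutes.
    The default Actions rate is a placeholder assumption."""
    return credits * 0.01 + actions_minutes * actions_rate_usd

# A review that burns 25 credits and 3 Actions minutes (hypothetical figures).
print(round(review_cost_usd(25, 3), 4))  # 0.274
```

The practical takeaway for budgeting: reviews now draw from two cost centers, so a dashboard that tracks only AI Credits will understate the true per-review cost on private repositories.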
This is the key shift
Premium requests were a coarse, feature-level meter. Usage-based billing is closer to the underlying compute reality because it tracks tokens. Expect LLM pricing to look less like SaaS and more like cloud infrastructure (CPU/storage metering).
There's also a practical downside: usage-based billing is inherently non-deterministic. You can't know in advance exactly how many tokens (and how much runtime) a model will spend on a given prompt, especially with agentic workflows. That forces a change in how teams plan and control budgets.
GitHub also plans to ship a "preview invoice" experience in early May 2026, so you can estimate what your usage would cost under the new model before the June 1 cutover.
Understanding the differences by comparison
Billing models directly shape adoption decisions, budgets, and governance. Here are the most common patterns.
| Model | Billing unit | Vendor goal | Customer watch-outs |
|---|---|---|---|
| Flat-rate (effectively unlimited) | Monthly fee | Reduce adoption friction | Limits appear under heavy usage; weak long-term sustainability |
| Flat-rate + caps (quota-based) | Monthly fee + daily/weekly limits | Predictable costs | Definitions can get complex across features/models |
| Flat-rate + overages | Monthly fee + metered overage | Recover cost + fairness | Budget overrun risk; misconfiguration can cause bill spikes |
| Usage-based | Tokens/credits | Price closer to cost | Hard to estimate upfront; requires visibility and controls |
Across the market, the common path is: start with "effectively unlimited," add caps and/or overages, then move closer to full usage-based. Copilot is following that path: Premium requests were the "flat-rate + overage" bridge, and June 2026 is the move to fully usage-based billing. Cursor jumped more directly to usage-based, while Windsurf partially "rebalanced" toward quotas, but the overall direction remains alignment to real compute cost.
What to consider before rollout
From seat-based thinking to value-based deployment
As flat-rate pricing fades, the question becomes less about "how many seats?" and more about "where does this deliver measurable value?" For example: Does PR summarization cut review time? Do agents reduce repetitive refactor work? If you can point to specific high-value workflows, it becomes much easier to defend usage-based spend with ROI.
The opposite approach, "give everyone a seat and see what happens," collides with usage-based billing. Light users still receive an allowance you may not fully utilize, while heavy users burn through credits and trigger overages. That's how you end up with both weak outcomes and unpleasant bills.
After June 2026, Copilot Pro's $10/month effectively becomes $10/month worth of AI Credits. With inference costs rising quickly in the agent era, budgeting purely as "seats × monthly fee" is no longer enough. Teams increasingly need cloud-style forecasting: "task type × expected token spend × unit price," plus guardrails.
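That "task type × expected token spend × unit price" forecast can be sketched directly. Every task profile and token figure below is an illustrative assumption, not measured data:

```python
# Monthly forecast: expected runs x avg tokens per run x USD per million tokens.
TASK_PROFILES = {
    # (runs/month, avg tokens/run, blended USD per million tokens) - all assumed
    "chat_question":  (2_000,     3_000, 5.0),
    "pr_review":      (400,      40_000, 5.0),
    "agent_refactor": (50,    1_500_000, 5.0),
}

def monthly_forecast_usd(profiles: dict) -> float:
    """Sum expected monthly spend across task types."""
    return sum(runs * tokens * usd_per_mtok / 1_000_000
               for runs, tokens, usd_per_mtok in profiles.values())

print(round(monthly_forecast_usd(TASK_PROFILES), 2))  # 485.0
```

Even with made-up numbers, the shape of the result is instructive: a handful of agentic refactors can dwarf thousands of chat questions, which is exactly why seat-count budgeting fails.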
Procurement and governance checklist
- Included allowances: whatâs metered, and at what reset cadence (monthly/weekly/daily)
- Overage pricing and spend caps: token/credit rates, and how flexible your limits can be
- Rate limits and SLA implications: how throttling impacts developer workflow
- BYOK (Bring Your Own Key): whether heavy users can shift to direct API billing
- Logs and auditability: who used what, and whether you can export via API
- Change management: how the vendor communicates billing changes (see Cursor June 2025, Windsurf March 2026, GitHub June 2026)
- Fallback behavior: whether "cheaper mode after you run out" exists (Copilot removes this after June 2026)
- Linked cost centers: cross-product metering such as GitHub Actions minutes
Because Copilot plan details can change, confirm the latest terms on GitHub's official pricing pages before making purchasing decisions. If you have access to GitHub's early-May 2026 "preview invoice," use it to estimate spend based on your real usage patterns before June 1.
Need guardrails for usage-based AI spend?
If Copilot (and other coding assistants) now behave like metered cloud services, you need the same playbook: budgets, visibility, and governance. Talk to us about setting up usage controls, cost reporting, and rollout policies that won't surprise your teams or finance.
Conclusion
LLM pricing is changing because modern usage is changing. Reasoning-heavy models, long-context prompts, and agentic workflows make "one flat monthly fee" increasingly unworkable. GitHub has officially announced that Copilot will retire Premium requests and transition all plans to usage-based billing on June 1, 2026, using GitHub AI Credits (1 credit = $0.01 USD). Subscription prices remain the same, but the value is delivered as monthly credits; overages are billed based on token consumption. Cheap-model fallback is removed, and Copilot Code Review will also begin consuming GitHub Actions minutes.
Copilot isn't alone. Cursor's June 2025 shift and Windsurf's March 2026 move show the same structural trend across AI coding tools. Even as metering units differ, vendors are converging on a world where heavy usage must be explicitly paid for, and where "~$20 Pro / ~$200 power tier" pricing structures are increasingly common. Cursor's backlash during its transition is a reminder that the biggest risk isn't just cost; it's poor communication and insufficient controls during a billing-model change.
For customers, the practical response is clear: (1) understand the metering unit, (2) put guardrails on overages, (3) operationalize rate-limit monitoring separately from budget monitoring, and (4) focus spend on workflows with measurable value. The end of flat-rate AI doesn't automatically mean worse ROI; it means you have to deploy AI with intent, not by default.