What is “429 Too Many Requests”? Causes and solutions

Learn what HTTP 429 Too Many Requests means, how to read Retry-After and rate-limit headers, and implement safe Python backoff for APIs and scraping.

Ibuki Yamamoto
May 22, 2026 · 6 min read

HTTP status code 429 Too Many Requests means you’re sending requests faster than the server (or an upstream gateway/CDN/WAF) allows. It’s common in API integrations, web scraping, and batch jobs. If your retry logic doesn’t handle wait time, retry limits, and request distribution correctly, you’ll usually just loop into the same failure.
This guide breaks down the most common 429 patterns and the most practical fixes on both the client and server side—with code you can copy into production.

What You’ll Learn
  • What 429 Too Many Requests means (and when it’s returned)
  • How to tell 429 vs 503 vs 403 apart
  • How to interpret Retry-After and X-RateLimit-* headers
  • Exponential backoff in Python (hand-rolled, tenacity, and backoff)
  • How to design scrapers to avoid 429—and what to do when it doesn’t go away

What is 429 Too Many Requests?

Bottom line: a 429 response is the server telling you “you’re making too many requests.” This isn’t a random server outage—it’s intentional throttling (rate limiting). In most cases, you can fix it from the client side.
The three fundamentals are:
(1) follow the response’s Retry-After header and wait,
(2) reduce request rate and concurrency, and
(3) retry with exponential backoff + jitter.

More precisely, 429 Too Many Requests is an HTTP client error returned when a client sends too many requests within a given time window. The typical trigger is a rate limit, where the server (or an upstream layer) deliberately rejects requests to protect capacity.

MDN explains that 429 is returned when a client has sent too many requests in a given amount of time, and that the response may include a Retry-After header indicating how long to wait before retrying. 429 was added in RFC 6585 (published April 2012).

“429 Too Many Requests” indicates that the user has sent too many requests in a given amount of time and might include a “Retry-After” header to indicate how long to wait before retrying.

Telling 429 vs 503 vs 403 apart

These statuses can all look like “my requests aren’t getting through,” but they mean very different things. If you classify the failure correctly up front, the fix usually becomes obvious.

| Status | Typical cause | What it implies | What to do on the client |
| --- | --- | --- | --- |
| 429 Too Many Requests | Your request rate exceeded a limit | Throttling applied per client (IP/key/user/endpoint) | Slow down and implement a proper retry strategy |
| 503 Service Unavailable | Server overload or maintenance | Service-side capacity issue affecting many clients | Wait, then retry (with backoff) |
| 403 Forbidden | Missing permission or access denied (including bot detection) | Policy/authorization problem | Fix auth, review User-Agent/headers, confirm ToS |

429 and 503 are generally “eventually recoverable” if you wait and retry appropriately. 403 usually isn’t. If you keep getting 403, spending effort on more aggressive retries is typically wasted—focus on authentication, headers, IP reputation, and the site’s access rules instead.
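As a rough illustration of "classify first," the sketch below maps a status code to the kind of response the table suggests. The `Action` enum and `classify` function are hypothetical names introduced here, and the mapping covers only the three statuses discussed in this section.

```python
from enum import Enum


class Action(Enum):
    RETRY_WITH_BACKOFF = "retry_with_backoff"  # wait, then try again
    FIX_ACCESS = "fix_access"                  # retrying will not help
    OTHER = "other"                            # not covered by this sketch


def classify(status: int) -> Action:
    """Map an HTTP status to the client-side action from the table above."""
    if status in (429, 503):
        # Both are usually recoverable if the client waits and retries.
        return Action.RETRY_WITH_BACKOFF
    if status == 403:
        # Policy or authorization problem: review auth, headers, and terms.
        return Action.FIX_ACCESS
    return Action.OTHER
```

Routing 403 away from the retry path is the point of the sketch: it keeps a retry loop from hammering a site that has already said no.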

Common causes

Exceeding a rate limit

The most common case is exceeding a quota like “N requests per second” or “N requests per minute.” Sometimes this is documented as an API constraint. Other times it’s enforced by an upstream layer (for example, a CDN/WAF such as Cloudflare) and applied per IP.

Too much concurrency

Even if your average request rate looks low, a high burst of parallel requests (too many concurrent connections) can trigger 429. Typical culprits include launching a large batch of parallel fetches without a queue, letting threads/async tasks grow without bounds, or failing to cap per-host concurrency.

Infinite “retry immediately” loops

Retrying instantly on failure looks a lot like abusive traffic from the server’s perspective. It often triggers tighter throttling—or temporary blocks—making the problem worse.

Shared egress IPs (NAT) and noisy neighbors

In office networks, cloud NAT setups, or serverless environments with shared outbound IPs, other workloads can consume the same rate limit. Your code may be fine, but the combined traffic still crosses the threshold.

Warning: a 429 is not always “your code’s fault.” If limits are per IP or per account, other jobs sharing the same egress can trigger it.

What to check first

Response headers (Retry-After and X-RateLimit-*)

When you get a 429, check the response headers before changing anything else. Many services tell you exactly when to retry or how close you are to the limit. Ignoring that and using a fixed sleep() is basically throwing away the best available signal.

Retry-After tells the client how long to wait before making a follow-up request. Per MDN, it can be returned in 429 responses and indicates the delay before retrying. It comes in two formats:

  • Seconds: Retry-After: 120 (retry after 120 seconds)
  • HTTP-date: Retry-After: Mon, 16 Nov 2026 10:00:00 GMT (retry at the specified time)

In production, parse and support both formats. Seconds are common in APIs and gateways/CDNs; date-form values show up frequently around planned maintenance windows.
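One way to support both formats is sketched below using only the standard library. The function name `parse_retry_after` and its `now` parameter (there to make the function testable) are assumptions of this sketch, not part of any library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return seconds to wait, or None if the header is missing or unparseable."""
    if value is None:
        return None
    value = value.strip()
    # Format 1: delay in seconds (a non-negative integer).
    if value.isdigit():
        return float(value)
    # Format 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        # Treat a date without zone information as UTC (assumption).
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", so never return a negative wait.
    return max(0.0, (retry_at - now).total_seconds())
```

Returning None for unparseable values lets the caller fall back to its own backoff schedule instead of guessing.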

You’ll also often see X-RateLimit-* headers. They’re not standardized in an RFC, but they’re widely used in practice:

  • X-RateLimit-Limit: maximum requests allowed in the time window
  • X-RateLimit-Remaining: requests left before you hit the limit
  • X-RateLimit-Reset: when the quota resets (either a Unix timestamp or “seconds until reset,” depending on the API)

These headers are most useful before you hit 429. If Remaining drops into the single digits, start slowing down proactively. Adaptive throttling like this often prevents 429 entirely.
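A minimal sketch of that proactive slowdown follows. The threshold `LOW_WATERMARK`, the function names, and the rule used to tell a Unix timestamp from "seconds until reset" are all assumptions; check the API's documentation for the real semantics of X-RateLimit-Reset.

```python
import time
from typing import Mapping, Optional

LOW_WATERMARK = 10  # assumed threshold: slow down below this many remaining requests


def seconds_until_reset(reset_value: str, now: Optional[float] = None) -> float:
    """Interpret X-RateLimit-Reset as a Unix timestamp or as seconds until reset."""
    now = time.time() if now is None else now
    reset = float(reset_value)
    # Heuristic (assumption): values above 1e9 are Unix timestamps.
    if reset > 1_000_000_000:
        return max(0.0, reset - now)
    return max(0.0, reset)


def proactive_delay(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next request; 0.0 means no pause."""
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None:
        return 0.0
    try:
        remaining_n = int(remaining)
        window = seconds_until_reset(reset, now)
    except ValueError:
        return 0.0  # unparseable headers: do not throttle on bad data
    if remaining_n >= LOW_WATERMARK:
        return 0.0
    # Spread the remaining budget evenly over the time left in the window.
    return window / max(remaining_n, 1)
```

Header lookups here are case-sensitive because the sketch takes a plain mapping; an HTTP client's own header object is typically case-insensitive.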

What the limit is keyed on

The best fix depends on whether the limit is per IP, per API key/user, per endpoint, or a combination. If the API documentation defines quotas, match the documented rules rather than guessing.

Your logging granularity

Log at least: timestamp, endpoint, status code, request rate, and concurrency. If you can answer “when did it start, which endpoint, how many parallel workers,” root cause analysis gets dramatically faster.
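A structured log line makes those questions easy to answer with a search. The sketch below is one possible shape; the logger name, the function `log_request`, and the field names are hypothetical.

```python
import json
import logging
import time
from typing import Optional

logger = logging.getLogger("http_client")  # hypothetical logger name


def log_request(endpoint: str, status: int, in_flight: int,
                requests_last_minute: int, now: Optional[float] = None) -> str:
    """Emit one JSON log line per request and return it."""
    record = {
        "ts": time.time() if now is None else now,
        "endpoint": endpoint,
        "status": status,
        "in_flight": in_flight,                        # concurrency at send time
        "requests_last_minute": requests_last_minute,  # observed request rate
    }
    line = json.dumps(record, sort_keys=True)
    # Escalate throttling responses so they stand out in log searches.
    logger.log(logging.WARNING if status == 429 else logging.INFO, line)
    return line
```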

Client-side mitigations

Exponential backoff

The most common mitigation for 429 is exponential backoff: increase the delay after each failed attempt (often doubling), and stop after a fixed number of retries. To avoid a “thundering herd” where many clients retry at the same time, add jitter (randomness) to the wait time.
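The delay calculation itself is small. The sketch below uses "full jitter" (a random wait between zero and an exponentially growing ceiling), which is one common variant rather than the only valid one; the function names and default values are assumptions.

```python
import random
from typing import Optional


def backoff_ceiling(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound for the wait before retry number `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Pick a random wait in [0, ceiling] so clients do not retry in lockstep."""
    rng = rng if rng is not None else random.Random()
    return rng.uniform(0.0, backoff_ceiling(attempt, base, cap))
```

With the defaults, the ceiling doubles from 1 second up to the 60-second cap; the randomness is what keeps many clients from retrying at the same instant.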

Prefer Retry-After when available

If the server provides Retry-After, treat it as authoritative. The HTTP Semantics specification (RFC 9110) defines Retry-After as the server's indication of how long the user agent ought to wait before making a follow-up request.

Cap concurrency

Limit threads, async tasks, and queue consumers to control maximum in-flight requests. For web scraping in particular, keeping per-host concurrency low is one of the simplest ways to stabilize runs.
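With asyncio, a semaphore is a simple way to bound in-flight requests. In the sketch below, `fake_fetch` is a stand-in for a real HTTP call and `MAX_IN_FLIGHT` is an assumed value; the peak counter exists only to show that the cap holds.

```python
import asyncio

MAX_IN_FLIGHT = 3  # assumed per-host cap; tune by measurement


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP call (hypothetical); sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return f"ok:{url}"


async def fetch_all(urls, limit: int = MAX_IN_FLIGHT):
    """Fetch every URL while keeping at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(url):
        nonlocal in_flight, peak
        async with semaphore:  # at most `limit` workers pass this point at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_fetch(url)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(u) for u in urls))
    return results, peak
```

All tasks are created up front, but only `limit` of them hold the semaphore at any moment, so the remote host never sees more than that many simultaneous requests from this process.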

Cache and fetch only what changed

Don’t fetch the same data repeatedly. If the target supports conditional requests like ETag/If-None-Match or Last-Modified/If-Modified-Since, you can reduce traffic dramatically.
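The bookkeeping for conditional requests can be sketched without any HTTP library. `CacheEntry`, `conditional_headers`, and `update_cache` are hypothetical helpers: the first function builds the request headers from a cached response, and the second decides what to keep once the server answers.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass
class CacheEntry:
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(entry: Optional[CacheEntry]) -> Dict[str, str]:
    """Build validators to send with the next request for a cached resource."""
    headers: Dict[str, str] = {}
    if entry is None:
        return headers
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def update_cache(entry: Optional[CacheEntry], status: int,
                 response_headers: Mapping[str, str], body: bytes) -> CacheEntry:
    """Return the entry to keep after a response arrives."""
    if status == 304 and entry is not None:
        return entry  # Not Modified: reuse the cached body, nothing was re-downloaded
    return CacheEntry(
        body=body,
        etag=response_headers.get("ETag"),
        last_modified=response_headers.get("Last-Modified"),
    )
```

A 304 response still counts as a request against most rate limits, so this reduces bandwidth and server work more than it reduces request count; pair it with a sensible re-check interval.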

Example: hand-rolled retry logic

If you implement it without external libraries, a practical baseline is: follow Retry-After when present, otherwise use exponential backoff with jitter.

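import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

import requests

MAX_RETRIES = 6
BASE_DELAY = 1.0  # seconds

url = "https://example.com/api"


def retry_after_seconds(value):
    # Retry-After is either a number of seconds or an HTTP-date
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable: let the caller fall back to backoff
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    # A date in the past means "retry now", so never return a negative wait
    return max(0.0, (retry_at - datetime.now(timezone.utc)).total_seconds())


for attempt in range(MAX_RETRIES):
    r = requests.get(url, timeout=15)
    if r.status_code != 429:
        r.raise_for_status()
        print(r.json())
        break

    # Prefer Retry-After when provided
    header = r.headers.get("Retry-After")
    wait = retry_after_seconds(header) if header is not None else None
    if wait is None:
        # Exponential backoff + jitter
        wait = BASE_DELAY * (2 ** attempt) + random.uniform(0, 0.25)

    # Don't sleep after the final attempt; we're about to give up
    if attempt < MAX_RETRIES - 1:
        time.sleep(wait)
else:
    raise RuntimeError("Aborting because 429 persisted")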

Example: using tenacity

In real projects, retry behavior is often clearer and safer when expressed declaratively. tenacity lets you define exponential backoff + jitter and a retry cap with a single decorator.

import requests
from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_exception_type

class RateLimited(Exception):
    pass

@retry(
    wait=wait_random_exponential(multiplier=1, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type(RateLimited),
    reraise=True,
)
def fetch(url):
    r = requests.get(url, timeout=15)
    if r.status_code == 429:
        raise RateLimited(f"429 for {url}")
    r.raise_for_status()
    return r

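wait_random_exponential picks a random delay from a window that widens exponentially with each attempt, capped at 60 seconds, and stop_after_attempt(5) bounds the total number of tries. The custom RateLimited exception keeps retries scoped to 429 only, so other errors fail fast instead of masking real issues. Note that this decorator doesn't read Retry-After; if the server sends it, you need to honor it separately.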

Example: using backoff

Another good option is backoff. It integrates naturally with requests exceptions and supports a giveup condition to express “don’t retry certain statuses.”

import backoff
import requests

def is_fatal(e):
    # Retry only 429 among 4xx responses. Give up immediately on other 4xx.
    if e.response is None:
        return False
    status = e.response.status_code
    return 400 <= status < 500 and status != 429

@backoff.on_exception(
    backoff.expo,
    requests.exceptions.RequestException,
    max_tries=8,
    max_time=300,
    giveup=is_fatal,
)
def fetch(url):
    r = requests.get(url, timeout=15)
    r.raise_for_status()
    return r

Both tenacity and backoff can solve the core problem. If you need fine-grained control (pluggable strategies, hooks before/after retries), tenacity is often the better fit. If you want a minimal, clean declaration, backoff is usually enough.

Practical baseline: design retries with a max attempts, a max backoff, and an overall timeout. Avoid infinite retries.
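Those three limits can live in one small helper. The sketch below is library-independent: `Throttled` and `retry_with_budget` are hypothetical names, the caller's operation is expected to raise `Throttled` on a 429, and the injectable `sleep` and `clock` parameters exist so the logic can be tested without waiting.

```python
import random
import time


class Throttled(Exception):
    """Raised by the caller's operation when the server answers 429 (hypothetical)."""


def retry_with_budget(operation, *, max_attempts=5, base=1.0, max_backoff=30.0,
                      deadline=120.0, sleep=time.sleep, clock=time.monotonic):
    """Call `operation` until it succeeds or one of the three limits is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    for attempt in range(max_attempts):
        try:
            return operation()
        except Throttled:
            # Limit 2: no single wait exceeds max_backoff.
            wait = random.uniform(0, min(max_backoff, base * 2 ** attempt))
            # Limit 1: stop after max_attempts tries.
            out_of_attempts = attempt == max_attempts - 1
            # Limit 3: stop if the next wait would pass the overall deadline.
            out_of_time = (clock() - start) + wait > deadline
            if out_of_attempts or out_of_time:
                raise
            sleep(wait)
```

Checking the deadline before sleeping, rather than after, avoids spending the last wait on a retry that is no longer allowed to happen.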

Server-side mitigations

Design rate limits that are diagnosable

If your service returns 429, make it easy for clients to recover. At minimum, expose which limit was hit via logs, documentation, or response metadata. The fewer guesses clients have to make, the more stable your overall traffic will be.

Return Retry-After and X-RateLimit-*

When possible, return Retry-After and X-RateLimit-* so clients can decide when to retry and how close they are to the quota. Better-behaved clients reduce load and improve system stability.
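To make the header contract concrete, here is a minimal in-memory fixed-window limiter that produces a status code and the headers described above. It is a sketch under simplifying assumptions: the class name is hypothetical, X-RateLimit-Reset is expressed as seconds until reset, state lives in one process, and old windows are never evicted.

```python
import math
from typing import Dict, Tuple


class FixedWindowLimiter:
    """Allow `limit` requests per client key in each fixed window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        # (client key, window index) -> requests counted; never evicted in this sketch
        self._counts: Dict[Tuple[str, int], int] = {}

    def check(self, client_key: str, now: float) -> Tuple[int, Dict[str, str]]:
        """Return (status code, rate-limit headers) for one incoming request."""
        window_index = int(now // self.window)
        reset_in = math.ceil((window_index + 1) * self.window - now)
        used = self._counts.get((client_key, window_index), 0)
        allowed = used < self.limit
        if allowed:
            used += 1
            self._counts[(client_key, window_index)] = used
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(0, self.limit - used)),
            "X-RateLimit-Reset": str(reset_in),
        }
        if not allowed:
            # Tell the client exactly how long to wait before retrying.
            headers["Retry-After"] = str(reset_in)
        return (200 if allowed else 429), headers
```

A production limiter would usually keep counters in a shared store and may prefer a sliding window or token bucket; the header contract stays the same either way.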

Use 429 vs 503 intentionally

If you’re explicitly rejecting requests due to rate limiting, 429 is the clearest signal. If the service is overloaded or temporarily unable to process requests for broader reasons, 503 may communicate the situation better.

Double-check WAF / CDN settings

If your application isn’t intentionally returning 429 but clients still see it, an upstream WAF/CDN (for example, Cloudflare rate limiting) may be generating 429 responses. Check dashboard logs for overly strict thresholds or bot rules that are catching legitimate traffic.

Quick comparison (what to fix, fast)

In production, you often just need to answer: “What should we change first?” Here’s a simple mapping from symptoms to the highest-leverage action.

| Situation | First action | Expected impact |
| --- | --- | --- |
| 429 with Retry-After | Wait exactly as instructed by Retry-After | Fastest stabilization |
| X-RateLimit-Remaining is low | Slow down proactively | Prevents 429 before it happens |
| High concurrency | Cap parallelism / in-flight requests | Reduces burst load |
| Retries are too aggressive | Exponential backoff + jitter | Strong protection against repeat incidents |
| Suspected shared egress IP | Separate egress IPs / review routing | Makes root causes visible |
| Unknown limits | Confirm provider quota rules | Eliminates trial-and-error |

Web scraping tips

429 is especially common in web scraping. Many sites don’t use a single threshold; they combine signals like bursty access patterns, IP concentration, and “non-browser-like” headers. That means waiting longer may help—but it may not fully solve the issue.

Spread out request timing

Jittered intervals are usually safer than fixed delays because they reduce accidental synchronization and “obvious automation” patterns.
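A jittered pause between requests takes only a few lines. In this sketch the function name and the `spread` parameter (the fraction of the base interval to vary by) are assumptions.

```python
import random
from typing import Optional


def jittered_interval(base_seconds: float, spread: float = 0.5,
                      rng: Optional[random.Random] = None) -> float:
    """Return a wait drawn uniformly from base * (1 - spread) to base * (1 + spread)."""
    rng = rng if rng is not None else random.Random()
    low = base_seconds * (1 - spread)
    high = base_seconds * (1 + spread)
    return rng.uniform(low, high)
```

For example, `time.sleep(jittered_interval(2.0))` between page fetches waits somewhere between 1 and 3 seconds, averaging 2.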

Request only what you truly need

Repeatedly fetching search result pages, crawling too deep into pagination, or re-downloading unchanged pages are easy ways to trigger throttling. Prefer differential fetches, pre-filter target URLs, and avoid redundant hits.

Pick concurrency based on measurement

There’s no universal “correct” concurrency. Start conservative (for example, one worker with a one-second delay), increase gradually, and treat the last stable point before you see 429s or rising latency as your operational limit.
Also note: a concurrency level that works at night may fail during peak traffic. Set limits based on your busiest expected window.
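The ramp-up procedure can be expressed as a small search over candidate levels. In this sketch, `probe` is a hypothetical callable you would supply: it runs a short trial at the given concurrency and returns the fraction of responses that were 429. Rising latency, the other signal mentioned above, is not modeled here.

```python
from typing import Callable, Iterable, Optional


def last_stable_level(levels: Iterable[int], probe: Callable[[int], float],
                      max_429_rate: float = 0.0) -> Optional[int]:
    """Return the highest level tried before 429s exceeded the threshold, or None."""
    stable = None
    for level in levels:  # levels should be in increasing order
        rate = probe(level)  # fraction of requests throttled at this level
        if rate > max_429_rate:
            break  # stop ramping as soon as throttling appears
        stable = level
    return stable
```

Run the probe during your busiest expected window, since a level that is stable off-peak may not hold under load.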

Terms of service and robots.txt

Even if something is technically feasible, scraping in violation of a site’s terms or robots.txt can create operational and legal risk. For business use cases, consider whether an official API exists and whether you need permission.

When it still won’t go away: decision points

If you’ve implemented the mitigations above and 429 (or 403) still persists, you’re likely past the “retry harder” phase. This is usually the signal to revisit assumptions and system design.

You may be blocked for policy reasons

If you lower the rate significantly and still see persistent 429 or 403, the site may be intentionally blocking your source. Useful signals include:

  • No recovery after waiting: typical rate limits lift within minutes to hours; blocks may last days or indefinitely
  • It persists at very low volume: if it still fails at ~1 request/minute, it’s likely not a simple quota issue
  • Other IPs/environments work: if only your IP fails, you may be on an IP-based blocklist

At this point, spending more time on evasive technical workarounds is often less effective than re-checking the site’s terms and contacting the operator via legitimate channels when appropriate.

Consider switching to a commercial API

If the target offers a paid API, decide based on:

  • If your required volume fits the API plan, switching usually wins on reliability, legal risk, and operational overhead
  • If the API doesn’t cover some fields you need (certain search filters, older data, etc.), consider a hybrid: API for core data + limited scraping for gaps
  • If pricing doesn’t work, adjust requirements first (lower frequency, narrower scope) before investing in fragile scraping

Continuing to scrape when a suitable API exists is usually not a technical decision—it’s a business design decision. The “cheap” option often becomes more expensive once you include compliance risk, anti-bot escalation, and maintenance time.

Summary

  • 429 Too Many Requests means you’re sending too many requests in a short time window
  • It’s different from 503 (service overload) and 403 (policy/permission). Classify first
  • Always read response headers like Retry-After and X-RateLimit-*
  • Prevent repeat incidents with exponential backoff + jitter, concurrency caps, and caching
  • In Python, tenacity and backoff make retries much easier to express safely
  • Consider non-code causes like shared egress IPs and WAF/CDN rate limits
  • If it persists at low rates, reassess: policy blocks and/or switching to an official API

For exact header meaning and protocol intent, keep the official references close—they speed up debugging and reduce guesswork.

This article reflects information current as of May 2026. HTTP standards, service-specific rate limits, and library behavior can change—confirm details in the latest official documentation before using in production.

About the Author

Ibuki Yamamoto

Web scraping engineer with over 10 years of practical experience, having worked on numerous large-scale data collection projects. Specializes in Python and JavaScript, sharing practical scraping techniques in technical blogs.
