HTTP status code 429 Too Many Requests means you're sending requests faster than the server (or an upstream gateway/CDN/WAF) allows. It's common in API integrations, web scraping, and batch jobs. If your retry logic doesn't handle wait time, retry limits, and request distribution correctly, you'll usually just loop into the same failure.
This guide breaks down the most common 429 patterns and the most practical fixes on both the client and server side, with code you can copy into production.
What is 429 Too Many Requests?
Bottom line: a 429 response is the server telling you "you're making too many requests." This isn't a random server outage; it's intentional throttling (rate limiting). In most cases, you can fix it from the client side.
The three fundamentals are:
(1) follow the response's Retry-After header and wait,
(2) reduce request rate and concurrency, and
(3) retry with exponential backoff + jitter.
More precisely, 429 Too Many Requests is an HTTP client error returned when a client sends too many requests within a given time window. The typical trigger is a rate limit, where the server (or an upstream layer) deliberately rejects requests to protect capacity. MDN explains that 429 is returned when a client has sent too many requests in a given amount of time, and that the response may include a Retry-After header indicating how long to wait before retrying. 429 was added in RFC 6585 (published April 2012).
Telling 429 vs 503 vs 403 apart
These statuses can all look like "my requests aren't getting through," but they mean very different things. If you classify the failure correctly up front, the fix usually becomes obvious.
| Status | Typical cause | What it implies | What to do on the client |
|---|---|---|---|
| 429 Too Many Requests | Your request rate exceeded a limit | Throttling applied per client (IP/key/user/endpoint) | Slow down and implement a proper retry strategy |
| 503 Service Unavailable | Server overload or maintenance | Service-side capacity issue affecting many clients | Wait, then retry (with backoff) |
| 403 Forbidden | Missing permission or access denied (including bot detection) | Policy/authorization problem | Fix auth, review User-Agent/headers, confirm ToS |
429 and 503 are generally "eventually recoverable" if you wait and retry appropriately. 403 usually isn't. If you keep getting 403, spending effort on more aggressive retries is typically wasted; focus on authentication, headers, IP reputation, and the site's access rules instead.
Common causes
Exceeding a rate limit
The most common case is exceeding a quota like "N requests per second" or "N requests per minute." Sometimes this is documented as an API constraint. Other times it's enforced by an upstream layer (for example, a CDN/WAF such as Cloudflare) and applied per IP.
Too much concurrency
Even if your average request rate looks low, a high burst of parallel requests (too many concurrent connections) can trigger 429. Typical culprits include launching a large batch of parallel fetches without a queue, letting threads/async tasks grow without bounds, or failing to cap per-host concurrency.
Infinite "retry immediately" loops
Retrying instantly on failure looks a lot like abusive traffic from the server's perspective. It often triggers tighter throttling or temporary blocks, making the problem worse.
Shared egress IPs (NAT) and noisy neighbors
In office networks, cloud NAT setups, or serverless environments with shared outbound IPs, other workloads can consume the same rate limit. Your code may be fine, but the combined traffic still crosses the threshold.
Warning: a 429 is not always "your code's fault." If limits are per IP or per account, other jobs sharing the same egress can trigger it.
What to check first
Response headers (Retry-After and X-RateLimit-*)
When you get a 429, check the response headers before changing anything else. Many services tell you exactly when to retry or how close you are to the limit. Ignoring that and using a fixed sleep() is basically throwing away the best available signal.
Retry-After tells the client how long to wait before making a follow-up request. Per MDN, it can be returned in 429 responses and indicates the delay before retrying. It comes in two formats:
- Retry-After: 120 (retry after 120 seconds)
- Retry-After: Mon, 16 Nov 2026 10:00:00 GMT (retry at the specified time)
In production, parse and support both formats. Seconds are common in APIs and gateways/CDNs; date-form values show up frequently around planned maintenance windows.
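One way to support both Retry-After formats is sketched below using only the Python standard library. The function name `parse_retry_after` and its `now` parameter are illustrative assumptions, not part of any library; treat it as a starting point rather than a complete parser.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the number of seconds to wait, or None if the value is unparseable.

    Hypothetical helper: handles delay-seconds ("120") and HTTP-date
    ("Mon, 16 Nov 2026 10:00:00 GMT") forms of Retry-After.
    """
    value = value.strip()
    if value.isdigit():
        # Delay-seconds form
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Neither a plain integer nor a parseable date
        return None
    if when.tzinfo is None:
        # Treat a date without an explicit zone as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative wait
    return max(0.0, (when - now).total_seconds())
```

Returning None for unparseable values lets the caller fall back to its own backoff schedule instead of guessing.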
You'll also often see X-RateLimit-* headers. They're not standardized in an RFC, but they're widely used in practice:
- X-RateLimit-Limit: maximum requests allowed in the time window
- X-RateLimit-Remaining: requests left before you hit the limit
- X-RateLimit-Reset: when the quota resets (either a Unix timestamp or "seconds until reset," depending on the API)
These headers are most useful before you hit 429. If Remaining drops into the single digits, start slowing down proactively. Adaptive throttling like this often prevents 429 entirely.
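Proactive slowdown based on these headers might look like the sketch below. The function name `proactive_delay`, the `low_water` threshold, and the rule for telling a Unix timestamp from "seconds until reset" are all assumptions made for illustration; check how your provider actually defines X-RateLimit-Reset.

```python
import time


def proactive_delay(headers, low_water=10, now=None):
    """Suggest a pause (in seconds) before the next request.

    Hypothetical helper: spreads the remaining quota evenly over the
    time left in the window once Remaining drops below low_water.
    Header lookup is case-sensitive for a plain dict; the headers
    object from requests is case-insensitive.
    """
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset = float(headers["X-RateLimit-Reset"])
    except (KeyError, ValueError):
        return 0.0  # Headers missing or malformed: no adaptive delay
    if remaining >= low_water:
        return 0.0
    now = time.time() if now is None else now
    # Assumption: very large values are Unix timestamps, small ones are seconds
    seconds_left = reset - now if reset > 1_000_000_000 else reset
    seconds_left = max(0.0, seconds_left)
    return seconds_left / max(remaining, 1)
```

A caller would sleep for the returned value before sending the next request, so the last few requests in a window are paced instead of fired in a burst.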
What the limit is keyed on
The best fix depends on whether the limit is per IP, per API key/user, per endpoint, or a combination. If the API documentation defines quotas, match the documented rules rather than guessing.
Your logging granularity
Log at least: timestamp, endpoint, status code, request rate, and concurrency. If you can answer "when did it start, which endpoint, how many parallel workers," root cause analysis gets dramatically faster.
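As a rough illustration of that logging granularity, the sketch below emits one JSON line per request. The logger name, the field names, and the 60-second rate window are assumptions chosen for the example, not a required schema.

```python
import json
import logging
import time

logger = logging.getLogger("http_client")  # hypothetical logger name


def log_request(endpoint, status, in_flight, recent_timestamps, now=None):
    """Log one structured record per request and return it.

    recent_timestamps: send times (epoch seconds) of recent requests,
    used to derive a simple requests-per-minute figure.
    """
    now = time.time() if now is None else now
    rate = sum(1 for t in recent_timestamps if now - t <= 60)
    record = {
        "ts": now,
        "endpoint": endpoint,
        "status": status,
        "requests_last_60s": rate,
        "in_flight": in_flight,
    }
    logger.info(json.dumps(record))
    return record
```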
Client-side mitigations
Exponential backoff
The most common mitigation for 429 is exponential backoff: increase the delay after each failed attempt (often doubling), and stop after a fixed number of retries. To avoid a "thundering herd" where many clients retry at the same time, add jitter (randomness) to the wait time.
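The delay calculation can be sketched as a small function. This version uses "full jitter" (a uniformly random wait between zero and the exponential ceiling), which is one common variant and an assumption here, not the only valid choice; the name `backoff_delay` and its defaults are illustrative.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a randomized wait for the given zero-based attempt number.

    The ceiling doubles each attempt (base, 2*base, 4*base, ...) and is
    capped at `cap`; the actual wait is drawn uniformly from [0, ceiling].
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)
```

Passing a seeded `random.Random` instance as `rng` makes the schedule reproducible in tests.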
Prefer Retry-After when available
If the server provides Retry-After, treat it as authoritative. The HTTP Semantics specification describes Retry-After as a server signal for how long the user agent ought to wait before a follow-up request.
Cap concurrency
Limit threads, async tasks, and queue consumers to control maximum in-flight requests. For web scraping in particular, keeping per-host concurrency low is one of the simplest ways to stabilize runs.
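A minimal way to cap in-flight work is a fixed-size thread pool, sketched below with the standard library. The helper name `run_capped` and the `task` callable are hypothetical; in practice `task` would wrap your own request function.

```python
from concurrent.futures import ThreadPoolExecutor


def run_capped(task, items, max_in_flight=4):
    """Run task(item) for every item with at most max_in_flight at once."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map preserves input order and re-raises the first task exception
        return list(pool.map(task, items))
```

For example, `run_capped(fetch, urls, max_in_flight=2)` would never have more than two calls to a `fetch` function running at the same time.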
Cache and fetch only what changed
Don't fetch the same data repeatedly. If the target supports conditional requests like ETag/If-None-Match or Last-Modified/If-Modified-Since, you can reduce traffic dramatically.
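A conditional request with ETag could be sketched as follows. The function `fetch_if_changed` and the in-memory `cache` dict are illustrative assumptions; `session` is assumed to be a `requests.Session` (or any object with a compatible `get` method), and a real cache would need persistence and eviction.

```python
def fetch_if_changed(session, url, cache):
    """Fetch url, reusing the cached body when the server answers 304.

    cache maps url -> {"etag": ..., "body": ...} and is updated in place.
    """
    headers = {}
    cached = cache.get(url)
    if cached and cached.get("etag"):
        # Ask the server to skip the body if our copy is still current
        headers["If-None-Match"] = cached["etag"]
    r = session.get(url, headers=headers, timeout=15)
    if r.status_code == 304 and cached:
        return cached["body"]
    r.raise_for_status()
    cache[url] = {"etag": r.headers.get("ETag"), "body": r.text}
    return r.text
```

A 304 response still counts as a request on many rate limiters, but it is much cheaper for both sides than transferring the full body again.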
Example: hand-rolled retry logic
If you implement it without external libraries, a practical baseline is: follow Retry-After when present, otherwise use exponential backoff with jitter.
```python
import random
import time

import requests

MAX_RETRIES = 6
BASE_DELAY = 1.0  # seconds
url = "https://example.com/api"

for attempt in range(MAX_RETRIES):
    r = requests.get(url, timeout=15)
    if r.status_code != 429:
        r.raise_for_status()
        print(r.json())
        break

    # Prefer Retry-After when provided
    retry_after = r.headers.get("Retry-After")
    if retry_after is not None:
        try:
            wait = float(retry_after)
        except ValueError:
            # Date formats are simplified here (parse in production)
            wait = BASE_DELAY
    else:
        # Exponential backoff + jitter
        wait = BASE_DELAY * (2 ** attempt) + random.uniform(0, 0.25)
    time.sleep(wait)
else:
    # The loop ended without a break: every attempt returned 429
    raise RuntimeError("Aborting because 429 persisted")
```
Example: using tenacity
In real projects, retry behavior is often clearer and safer when expressed declaratively. tenacity lets you define exponential backoff + jitter and a retry cap with a single decorator.
```python
import requests
from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_exception_type


class RateLimited(Exception):
    pass


@retry(
    wait=wait_random_exponential(multiplier=1, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type(RateLimited),
    reraise=True,
)
def fetch(url):
    r = requests.get(url, timeout=15)
    if r.status_code == 429:
        raise RateLimited(f"429 for {url}")
    r.raise_for_status()
    return r
```
wait_random_exponential returns a randomized delay whose upper bound grows exponentially, capped at 60 seconds. The custom RateLimited exception keeps retries scoped to 429 only; other errors fail fast instead of masking real issues.
Example: using backoff
Another good option is backoff. It integrates naturally with requests exceptions and supports a giveup condition to express "don't retry certain statuses."
```python
import backoff
import requests


def is_fatal(e):
    # Retry only 429 among 4xx responses. Give up immediately on other 4xx.
    if e.response is None:
        return False
    status = e.response.status_code
    return 400 <= status < 500 and status != 429


@backoff.on_exception(
    backoff.expo,
    requests.exceptions.RequestException,
    max_tries=8,
    max_time=300,
    giveup=is_fatal,
)
def fetch(url):
    r = requests.get(url, timeout=15)
    r.raise_for_status()
    return r
```
Both tenacity and backoff can solve the core problem. If you need fine-grained control (pluggable strategies, hooks before/after retries), tenacity is often the better fit. If you want a minimal, clean declaration, backoff is usually enough.
Practical baseline: design retries with a maximum number of attempts, a maximum backoff, and an overall timeout. Avoid infinite retries.
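The three limits in that baseline can be combined in one loop, sketched below without external libraries. `Retryable`, `RetryBudgetExceeded`, and `call_with_budget` are hypothetical names introduced for this example, and the default numbers are placeholders to tune for your own service.

```python
import random
import time


class Retryable(Exception):
    """Hypothetical marker for failures worth retrying (for example, a 429)."""


class RetryBudgetExceeded(Exception):
    """Raised when the attempts or the overall deadline run out."""


def call_with_budget(func, max_attempts=5, base=1.0, max_backoff=30.0,
                     deadline=120.0, sleep=time.sleep, clock=time.monotonic):
    """Call func() until it succeeds or one of the three budgets is spent."""
    start = clock()
    last_error = None
    attempts_made = 0
    for attempt in range(max_attempts):
        attempts_made += 1
        try:
            return func()
        except Retryable as exc:
            last_error = exc
        # Exponential ceiling capped at max_backoff, with full jitter
        delay = random.uniform(0, min(max_backoff, base * 2 ** attempt))
        out_of_attempts = attempt == max_attempts - 1
        out_of_time = (clock() - start) + delay > deadline
        if out_of_attempts or out_of_time:
            break
        sleep(delay)
    raise RetryBudgetExceeded(f"gave up after {attempts_made} attempts") from last_error
```

Only exceptions of type `Retryable` are retried; anything else propagates immediately, which keeps unrelated bugs from being hidden behind retries.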
Server-side mitigations
Design rate limits that are diagnosable
If your service returns 429, make it easy for clients to recover. At minimum, expose which limit was hit via logs, documentation, or response metadata. The fewer guesses clients have to make, the more stable your overall traffic will be.
Return Retry-After and X-RateLimit-*
When possible, return Retry-After and X-RateLimit-* so clients can decide when to retry and how close they are to the quota. Better-behaved clients reduce load and improve system stability.
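On the server side, the headers can fall out of the limiter itself. The sketch below is a deliberately simple in-memory fixed-window limiter; the class name `FixedWindowLimiter`, the choice of "seconds until reset" for X-RateLimit-Reset, and the single-process storage are all assumptions for illustration, and a production limiter would typically use shared storage and a smoother algorithm.

```python
import math
import time


class FixedWindowLimiter:
    """Count requests per key in fixed windows and build response headers."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}  # key -> (window_start, count)

    def check(self, key):
        """Return (status_code, headers) for one incoming request."""
        now = self.clock()
        window_start = now - (now % self.window)
        start, count = self.counters.get(key, (window_start, 0))
        if start != window_start:
            # A new window has begun: reset the counter
            start, count = window_start, 0
        reset_in = max(1, math.ceil(window_start + self.window - now))
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Reset": str(reset_in),  # seconds until reset
        }
        if count >= self.limit:
            headers["X-RateLimit-Remaining"] = "0"
            headers["Retry-After"] = str(reset_in)
            return 429, headers
        count += 1
        self.counters[key] = (start, count)
        headers["X-RateLimit-Remaining"] = str(self.limit - count)
        return 200, headers
```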
Use 429 vs 503 intentionally
If you're explicitly rejecting requests due to rate limiting, 429 is the clearest signal. If the service is overloaded or temporarily unable to process requests for broader reasons, 503 may communicate the situation better.
Double-check WAF / CDN settings
If your application isn't intentionally returning 429 but clients still see it, an upstream WAF/CDN (for example, Cloudflare rate limiting) may be generating 429 responses. Check dashboard logs for overly strict thresholds or bot rules that are catching legitimate traffic.
Quick comparison (what to fix, fast)
In production, you often just need to answer: "What should we change first?" Here's a simple mapping from symptoms to the highest-leverage action.
| Situation | First action | Expected impact |
|---|---|---|
| 429 with Retry-After | Wait exactly as instructed by Retry-After | Fastest stabilization |
| X-RateLimit-Remaining is low | Slow down proactively | Prevents 429 before it happens |
| High concurrency | Cap parallelism / in-flight requests | Reduces burst load |
| Retries are too aggressive | Exponential backoff + jitter | Strong protection against repeat incidents |
| Suspected shared egress IP | Separate egress IPs / review routing | Makes root causes visible |
| Unknown limits | Confirm provider quota rules | Eliminates trial-and-error |
Web scraping tips
429 is especially common in web scraping. Many sites don't use a single threshold; they combine signals like bursty access patterns, IP concentration, and "non-browser-like" headers. That means waiting longer may help, but it may not fully solve the issue.
Spread out request timing
Jittered intervals are usually safer than fixed delays because they reduce accidental synchronization and "obvious automation" patterns.
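A jittered pause between requests can be as small as the sketch below. The name `jittered_interval` and the defaults (a 2-second base with a spread of plus or minus 50%) are illustrative assumptions, not recommended values for any particular site.

```python
import random


def jittered_interval(base=2.0, spread=0.5, rng=random):
    """Return a pause drawn uniformly from base * (1 - spread) to base * (1 + spread)."""
    low = base * (1 - spread)
    high = base * (1 + spread)
    return rng.uniform(low, high)
```

Calling `time.sleep(jittered_interval())` between requests keeps the average pace at `base` seconds while avoiding a perfectly regular rhythm.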
Request only what you truly need
Repeatedly fetching search result pages, crawling too deep into pagination, or re-downloading unchanged pages are easy ways to trigger throttling. Prefer differential fetches, pre-filter target URLs, and avoid redundant hits.
Pick concurrency based on measurement
There's no universal "correct" concurrency. Start conservative (for example, one worker with a one-second delay), increase gradually, and treat the last stable point before you see 429s or rising latency as your operational limit.
Also note: a concurrency level that works at night may fail during peak traffic. Set limits based on your busiest expected window.
Terms of service and robots.txt
Even if something is technically feasible, scraping in violation of a site's terms or robots.txt can create operational and legal risk. For business use cases, consider whether an official API exists and whether you need permission.
When it still wonât go away: decision points
If you've implemented the mitigations above and 429 (or 403) still persists, you're likely past the "retry harder" phase. This is usually the signal to revisit assumptions and system design.
You may be blocked for policy reasons
If you lower the rate significantly and still see persistent 429 or 403, the site may be intentionally blocking your source. Useful signals include:
- No recovery after waiting: typical rate limits lift within minutes to hours; blocks may last days or indefinitely
- It persists at very low volume: if it still fails at ~1 request/minute, it's likely not a simple quota issue
- Other IPs/environments work: if only your IP fails, you may be on an IP-based blocklist
At this point, spending more time on evasive technical workarounds is often less effective than re-checking the site's terms and contacting the operator via legitimate channels when appropriate.
Consider switching to a commercial API
If the target offers a paid API, decide based on:
- If your required volume fits the API plan, switching usually wins on reliability, legal risk, and operational overhead
- If the API doesn't cover some fields you need (certain search filters, older data, etc.), consider a hybrid: API for core data + limited scraping for gaps
- If pricing doesn't work, adjust requirements first (lower frequency, narrower scope) before investing in fragile scraping
Continuing to scrape when a suitable API exists is usually not a technical decision; it's a business design decision. The "cheap" option often becomes more expensive once you include compliance risk, anti-bot escalation, and maintenance time.
Summary
- 429 Too Many Requests means you're sending too many requests in a short time window
- It's different from 503 (service overload) and 403 (policy/permission). Classify first
- Always read response headers like Retry-After and X-RateLimit-*
- Prevent repeat incidents with exponential backoff + jitter, concurrency caps, and caching
- In Python, tenacity and backoff make retries much easier to express safely
- Consider non-code causes like shared egress IPs and WAF/CDN rate limits
- If it persists at low rates, reassess: policy blocks and/or switching to an official API
For exact header meaning and protocol intent, keep the official references close: they speed up debugging and reduce guesswork.
This article reflects information current as of May 2026. HTTP standards, service-specific rate limits, and library behavior can change, so confirm details in the latest official documentation before using them in production.