How Akamai, Cloudflare, and Imperva Detect Web Scraping (Anti-Bot Explained)

You've probably seen it: "I made my User-Agent look like Chrome, but I still get a 403." Or: "I stopped using headless mode and it still blocks me." Akamai, Cloudflare, and Imperva don't rely on HTTP headers alone. They combine signals from the network layer (TLS/HTTP2), the execution environment (JavaScript/device fingerprinting), and behavior over a session, then continuously recalibrate those signals using large-scale, real-world traffic observations. This article breaks down what's actually being measured, and why "just making requests look browser-like" rarely holds up in production.

What you'll learn

How the three detection layers (network, client, behavior) work, what Cloudflare, Akamai, and Imperva each publicly document about their approaches, and how to think about mitigation without resorting to evasion tactics.

The short answer

Akamai, Cloudflare, and Imperva detect scraping because they validate not only individual request traits, but also: "Is this client actually a real browser?" and "Does this session look like a human flow?" They cross-check multiple layers of signals.

The three detection layers

A useful mental model is these three layers: network-layer fingerprints (TLS/HTTP2), client-layer signals (JavaScript and device fingerprinting), and behavior-layer scoring (session-level patterns), each covered in the sections below.

Key point: These vendors rarely "win" with a single silver-bullet detector. Instead, they stack many weaker signals to reduce false positives while making evasion expensive. Cloudflare explicitly positions its bot classification as a combination of machine learning, behavioral analysis, and fingerprinting.

Network-layer fingerprints

Even if a scraper looks "HTTP-correct," the TLS handshake and HTTP/2 behavior often reveal the underlying library/runtime. For many protected sites, this is the first major filter. Cloudflare documents that JA3/JA4 fingerprints can be used as part of its bot solutions.
Client-layer signals
Next comes consistency of the execution environment, mainly from JavaScript. This is where "device fingerprinting" and "headless/automation tells" show up.
- JS challenges / JS-based detection: return HTML first, execute JS, and feed execution results into detection
- Fingerprint consistency: alignment between UA, screen size, WebGL/Canvas, fonts, timezone, and other attributes
- Automation indicators: suspicious navigator properties, skewed event patterns, and execution-timing signatures
Warning: A pure HTTP client that never executes JavaScript leaves this layer blank. Anti-bot systems can treat "no JS telemetry" as an elevated risk signal (with exceptions like documented APIs).
Cloudflare explains that JavaScript detections typically don't have enough data on the very first request; after an HTML response, Cloudflare injects/sets what it needs. Cloudflare also states that cf_clearance is required for JavaScript detections.
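To make "fingerprint consistency" concrete, here is a minimal sketch of the kind of cross-check a client-layer detector can run. The attribute names and rules are illustrative assumptions, not any vendor's actual schema; real systems collect far more attributes and weight them statistically.

```python
def consistency_flags(fp: dict) -> list[str]:
    """Flag contradictions between attributes a browser reports about itself.

    `fp` is a hypothetical bag of client-side attributes for illustration.
    """
    flags = []
    ua = fp.get("user_agent", "")
    # A Windows UA paired with a Linux navigator.platform is a contradiction.
    if "Windows" in ua and fp.get("platform", "").startswith("Linux"):
        flags.append("ua_platform_mismatch")
    # A screen smaller than the reported viewport is physically implausible.
    if fp.get("screen_height", 0) < fp.get("viewport_height", 0):
        flags.append("viewport_exceeds_screen")
    # Timezone that contradicts the Accept-Language locale is a weak signal.
    if fp.get("timezone") == "UTC" and fp.get("language", "").startswith("ja"):
        flags.append("timezone_locale_mismatch")
    return flags

print(consistency_flags({
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "platform": "Linux x86_64",
    "screen_height": 720,
    "viewport_height": 1080,
}))
```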
Behavior-layer scoring
Finally, session-level behavior tends to separate humans from automation. Scrapers often diverge from real user flows, even when each individual request looks plausible.
- Request rate patterns: fixed intervals, overly consistent pacing, excessive parallelism
- Unnatural navigation: missing referrers, skipping typical page → asset loading sequences (images/JS/CSS), jumping straight to deep URLs
- State inconsistencies: cookie/local storage/cache behavior that isn't browser-like
- Identity breaks: stable IP with changing fingerprint, or stable fingerprint spread across too many IPs/locations
Cloudflare states it classifies bots using a combination of machine learning, behavioral analysis, and fingerprinting. Akamai's Bot Manager messaging likewise emphasizes multi-layer detection, including behavioral analysis (AI/ML), browser/device fingerprinting, and anomaly detection, among other signals.
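As a concrete example of the "request rate patterns" signal above, overly consistent pacing is directly measurable: human inter-request gaps are noisy, while naive loops produce near-constant ones. The coefficient-of-variation threshold below is an illustrative assumption, not a published detector.

```python
import statistics

def pacing_is_suspicious(timestamps: list[float], cv_threshold: float = 0.1) -> bool:
    """Flag sessions whose inter-request gaps are implausibly regular.

    The coefficient of variation (stdev / mean) of the gaps is near zero
    for fixed-interval loops; the 0.1 cutoff is an illustrative assumption.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 5:
        return False  # too few requests to judge either way
    mean = statistics.mean(gaps)
    if mean == 0:
        return True  # simultaneous requests: clearly scripted
    return statistics.stdev(gaps) / mean < cv_threshold

# A scraper sleeping exactly 2 seconds between requests:
print(pacing_is_suspicious([0.0, 2.0, 4.0, 6.0, 8.0, 10.0]))   # True
# A human browsing with irregular pauses:
print(pacing_is_suspicious([0.0, 3.1, 4.2, 9.8, 11.0, 19.5]))  # False
```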
How Cloudflare works
Cloudflare's bot defense starts with edge evaluation (ML/heuristics/behavior). When necessary, it adds JavaScript-derived signals for stronger confidence. The key technical pieces to understand are:
Separation of detection engines
Cloudflare documents a set of bot detection "engines." In particular, its JSD (JavaScript Detections) engine is described as identifying malicious fingerprints associated with headless browsers and similar automation. Cloudflare's documentation also describes how the __cf_bm cookie relates to smoothing bot score calculation.
Continuous evaluation via cookies
Cloudflare states that __cf_bm is set by Bot Management (and Bot Fight Mode) and contains information related to Cloudflare's proprietary bot score. It also notes that when Anomaly Detection is enabled on Bot Management, the cookie includes a session identifier. In other words, Cloudflare strongly favors session continuity over one-off requests.
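The word "smoothing" suggests scores are aggregated over a session rather than computed per request. Cloudflare does not publish the algorithm; the exponential moving average below is a generic illustration of the idea, keyed by a session cookie, and not Cloudflare's actual method.

```python
# Hypothetical illustration of session-level score smoothing. The score
# semantics and alpha value are invented; the point is that a single
# "clean" request cannot reset a session's accumulated reputation.
session_scores: dict[str, float] = {}

def smoothed_score(session_id: str, raw_score: float, alpha: float = 0.3) -> float:
    """Blend each new per-request score into a running session score."""
    prev = session_scores.get(session_id, raw_score)
    score = alpha * raw_score + (1 - alpha) * prev
    session_scores[session_id] = score
    return score

# One outlier request barely moves the running score for the session.
for raw in [10, 12, 95, 11]:
    print(round(smoothed_score("cookie-abc", raw), 1))
```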
Prerequisites for JS detections
JavaScript detections often require an initial HTML request because the first request typically does not contain enough signal. This is why implementations that "only hit the API endpoint" often struggle: they never establish the same client-layer evidence a normal browser session produces.
How Akamai works
Akamai Bot Manager publicly emphasizes a multi-layer strategy: behavioral analysis, fingerprinting, HTTP anomaly detection, and policy-driven responses. Akamai also highlights that client-side telemetry can be enabled by inserting a lightweight script, making behavior-layer signals much richer.
Stacking multiple detection layers
Akamai's public materials list signals such as AI/ML-based behavior analysis, browser/device fingerprinting, HTTP anomaly detection, automation/headless detection, and user interaction signals. Product materials also commonly highlight patterns like high request rates and protocol/HTTP inconsistencies as part of classification.
Collecting client-side signals
Akamai notes that client-side behavior telemetry can be enabled by inserting a lightweight script. This helps capture differences scrapers often can't reproduce well, like realistic event sequences and interaction-driven timing.
A structured detection framework
Akamai's TechDocs describes that Bot Manager provides multiple detection methods. In practice, teams typically map these into operational controls such as "category," "score," and "action" (allow/monitor/challenge/deny) to tune outcomes over time.
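A minimal sketch of that category/score/action mapping, assuming a 0-100 bot score like the one Akamai publishes. The ranges, category names, and actions here are illustrative assumptions, not Akamai's defaults.

```python
# Illustrative policy table: score ranges and categories are assumptions,
# not Akamai defaults. Higher score = more bot-like in this sketch.
POLICY = [
    (0, 30, "allow"),      # likely human
    (30, 60, "monitor"),   # log and watch, don't interfere
    (60, 85, "challenge"), # ask for proof of browser/human
    (85, 101, "deny"),     # confident automation
]

ALWAYS_ALLOW_CATEGORIES = {"search_engine", "monitoring", "partner"}

def action_for(score: int, category: str) -> str:
    """Map a bot score and category to an operational action."""
    if category in ALWAYS_ALLOW_CATEGORIES:
        return "allow"  # known good bots bypass score-based controls
    for low, high, action in POLICY:
        if low <= score < high:
            return action
    return "deny"

print(action_for(72, "uncategorized"))  # challenge
print(action_for(95, "search_engine"))  # allow
```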
How Imperva works
Imperva positions Advanced Bot Protection as using advanced techniques like behavior analysis and device fingerprinting. It also states it can build a fingerprint across 700+ "dimensions." The key idea is that this is not a single parameter; it's multivariate classification.
700+ dimensional fingerprints
Imperva's product page states that it detects across 700+ dimensions to separate humans, good bots, and bad bots, producing fingerprints designed to hold up against sophisticated evasion. Put differently: "fixing headers" isn't remotely enough when the model uses hundreds of correlated signals.
Combining behavior analysis with fingerprinting
Imperva also emphasizes using behavior analysis and device fingerprinting in a way that can block bots without constantly interrupting real users. This aligns with the "quiet classification in the background" model, rather than relying on CAPTCHAs for everything.
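To see why hundreds of dimensions beat header fixes, consider a toy linear classifier over many weak signals. The feature names and weights below are invented for illustration; the point is that spoofing any single feature barely moves the combined score.

```python
import math

# Invented feature weights for illustration; a real model learns hundreds
# of correlated dimensions, so no single spoofed value dominates.
WEIGHTS = {
    "ua_looks_browser": -0.5,
    "tls_matches_ua": -2.0,
    "js_telemetry_present": -2.0,
    "pacing_regularity": 1.5,
    "referrer_chain_plausible": -1.0,
    "fingerprint_entropy_typical": -1.0,
}

def bot_probability(features: dict[str, float]) -> float:
    """Logistic combination of many weak signals into one probability."""
    z = sum(WEIGHTS[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-z))

# A scraper that only fixed its headers: every other dimension still betrays it.
print(round(bot_probability({
    "ua_looks_browser": 1, "tls_matches_ua": 0, "js_telemetry_present": 0,
    "pacing_regularity": 1, "referrer_chain_plausible": 0,
    "fingerprint_entropy_typical": 0,
}), 2))  # ~0.73 despite the browser-like User-Agent
```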
Comparing the three vendors
Most implementation details are black-boxed, but we can still compare the "center of gravity" of each approach based on what they publicly document.
| Area | Cloudflare | Akamai | Imperva |
|---|---|---|---|
| Core strategy | ML + behavioral analysis + fingerprinting | Multi-layer detection + policy-driven response | High-dimensional fingerprinting + behavior analysis |
| JS-derived signals | JavaScript Detections, cf_clearance, etc. | Client telemetry via lightweight script insertion | States fingerprinting/behavior, but details are less explicit publicly |
| Network fingerprints | Explicitly documents JA3/JA4 availability | Explicitly documents HTTP anomaly detection (plus broader detection methods) | Emphasizes high-dimensional fingerprinting depth |
| Scoring / classification | Documents bot-score-related cookies and session handling | Publishes Bot Score model (0–100) | Emphasizes separating human vs good vs bad bots |
Common patterns that get scrapers flagged
Calling APIs directly without JS
If you never fetch HTML and only call JSON endpoints, you often miss JS-derived signals and struggle to build a browser-like session state. That can lower trust and trigger challenges, 403s, or rate limits.
TLS doesn't match real browsers
Many curl/requests-style clients and some automation runtimes produce TLS/HTTP2 fingerprints that do not match real browsers. Even with "perfect" headers, the network layer can still say "this isn't Chrome/Safari/Firefox."
Navigation doesn't look human
Humans typically follow paths like search → list → detail → back. Scrapers often enumerate detail pages directly, with thin referrer context and fewer supporting asset requests. Behavior scoring tends to catch this.
Identity consistency breaks
Rotating only the IP, rotating only the fingerprint, or constantly discarding cookies can create contradictions. Anti-bot systems love contradictions, because they're hard to explain as legitimate user behavior.
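Here is a sketch of how a defender might surface those contradictions from access logs. The log shape (one `(ip, fingerprint)` pair per request) and the thresholds are illustrative assumptions, not any vendor's values.

```python
from collections import defaultdict

def identity_contradictions(records: list[tuple[str, str]],
                            max_fps_per_ip: int = 3,
                            max_ips_per_fp: int = 5) -> list[str]:
    """Flag IPs cycling fingerprints, and fingerprints spread across IPs."""
    fps_by_ip: dict[str, set] = defaultdict(set)
    ips_by_fp: dict[str, set] = defaultdict(set)
    for ip, fp in records:
        fps_by_ip[ip].add(fp)
        ips_by_fp[fp].add(ip)
    flags = [f"ip {ip} used {len(fps)} fingerprints"
             for ip, fps in fps_by_ip.items() if len(fps) > max_fps_per_ip]
    flags += [f"fingerprint {fp} seen from {len(ips)} IPs"
              for fp, ips in ips_by_fp.items() if len(ips) > max_ips_per_fp]
    return flags

# One IP presenting six different fingerprints gets flagged.
records = [("10.0.0.1", f"fp{i}") for i in range(6)]
print(identity_contradictions(records))
```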
How to think about mitigation (without "bypass" tactics)
This section focuses on practical design and operational responses, not on providing evasion steps. Always validate applicable laws, terms of service, and robots.txt before collecting data.
Start with legitimate access paths
Your best option is an official API, partner feed, export feature, or a commercial data offering. The stronger the anti-bot stack, the higher the long-term cost of unofficial scraping tends to be.
Reduce load and collection frequency
Minimize what you fetch, implement incremental updates, cache aggressively, and tune refresh intervals. "High volume + high frequency" increases both detection risk and your own operational burden.
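One way to cut both load and frequency is conditional requests: if the server supports ETags, an unchanged resource costs a 304 response instead of a full download. A minimal sketch with the third-party requests library; the in-memory dict cache is for illustration only.

```python
import requests  # third-party HTTP client: pip install requests

def fetch_if_changed(url: str, cache: dict) -> bytes:
    """Conditional GET: re-download the body only when the resource changed.

    `cache` maps url -> (etag, body); an in-memory dict for illustration.
    """
    headers = {}
    if url in cache:
        headers["If-None-Match"] = cache[url][0]  # server answers 304 if unchanged
    resp = requests.get(url, headers=headers, timeout=30)
    if resp.status_code == 304:
        return cache[url][1]  # unchanged: serve the cached body, nothing re-fetched
    resp.raise_for_status()
    etag = resp.headers.get("ETag")
    if etag:
        cache[url] = (etag, resp.content)
    return resp.content
```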
Design for failure
Challenges/403/429 will happen. In production, what matters is having a retry strategy, recovery paths, and instrumentation (log where the request failed, what changed, and how often it happens). A minimal sketch:
```python
import time
import random

def backoff(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff + jitter (not for evasion, but for retry control on failures)."""
    delay = min(cap, base * (2 ** attempt))
    return delay * (0.5 + random.random())  # jitter avoids synchronized retry bursts

for attempt in range(6):
    try:
        # request() etc.
        break  # success: stop retrying
    except Exception:
        time.sleep(backoff(attempt))
```

If you're defending your own site
If you're on the protection side, the same three-layer model is a helpful checklist. In practice, you can often improve outcomes faster by focusing on operations and tuning, not just "adding one more control."
- Use bot scores to apply staged controls on high-value flows (login/search/cart, etc.)
- Handle good bots (search engines, monitoring, partners) differently from abusive automation
- Don't jump from "allow everything" to "block everything." Start with observe → limited rollout → expand coverage
Official quote
Cloudflare Bot Management uses machine learning, behavioral analysis, and fingerprinting to accurately classify bots.
Need a scraping plan that survives anti-bot?
If you're collecting data from sites protected by Akamai, Cloudflare, or Imperva, the hard part is usually operations: access paths, frequency, session design, and failure handling. We can help you define a realistic data collection architecture for your requirements.
Summary
- Akamai, Cloudflare, and Imperva can detect web scraping because they score stacked signals across the network layer, client layer, and behavior layer
- Cloudflare publicly documents elements like JA3/JA4 fingerprints, JavaScript Detections, and bot-related cookies such as __cf_bm and cf_clearance
- Akamai emphasizes multi-layer detection (behavior analysis, fingerprinting, HTTP anomalies) and client-side telemetry enabled via script insertion
- Imperva highlights high-dimensional (700+) fingerprints combined with behavioral analysis