
How Akamai, Cloudflare, and Imperva Detect Web Scraping (Anti-Bot Explained)

Learn how Akamai, Cloudflare, and Imperva detect web scraping using TLS/HTTP2 fingerprints, JS device signals, and session behavior scoring.

Ibuki Yamamoto
April 24, 2026 · 5 min read


You’ve probably seen it: “I made my User-Agent look like Chrome, but I still get a 403.” Or: “I stopped using headless mode and it still blocks me.” Akamai, Cloudflare, and Imperva don’t rely on HTTP headers alone. They combine signals from the network layer (TLS/HTTP2), the execution environment (JavaScript/device fingerprinting), and behavior over a session—then continuously recalibrate those signals using large-scale, real-world traffic observations.

This article breaks down what’s actually being measured, and why “just making requests look browser-like” rarely holds up in production.

What You’ll Learn
  • The full picture of anti-bot signals (network/device/behavior)
  • How Akamai, Cloudflare, and Imperva differ in approach (based on public info)
  • Common patterns that trigger scraping detection—and how to think about mitigation

The short answer

Akamai, Cloudflare, and Imperva detect scraping because they validate not only individual request traits but also questions like “Is this client actually a real browser?” and “Does this session look like a human flow?” They cross-check signals across multiple layers. A useful mental model is these three layers:

  • Network layer: TLS fingerprints (JA3/JA4, etc.), HTTP/2 quirks, abnormal request structure
  • Client layer: JavaScript-derived device/browser fingerprinting, headless/automation indicators
  • Behavior layer: human-like navigation and timing, plus consistency across a session

Key point: These vendors rarely “win” with a single silver-bullet detector. Instead, they stack many weaker signals to reduce false positives while making evasion expensive. Cloudflare explicitly positions its bot classification as a combination of machine learning, behavioral analysis, and fingerprinting.
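
To make signal stacking concrete, here is a minimal sketch that combines several weak boolean indicators into one risk score with a logistic function. The signal names, weights, and bias are illustrative assumptions, not any vendor's actual model.

import math

# Illustrative weights for weak signals -- assumptions, not vendor values.
WEIGHTS = {
    "tls_fingerprint_unknown": 1.2,   # ClientHello doesn't match known browsers
    "no_js_telemetry": 1.5,           # client never ran the JS challenge
    "headless_indicator": 2.0,        # automation tells in navigator properties
    "fixed_request_interval": 1.0,    # suspiciously regular pacing
    "missing_referer": 0.4,           # weak alone, additive with others
}

def risk_score(signals: dict[str, bool], bias: float = -3.0) -> float:
    """Combine weak boolean signals into a 0-1 risk score via a logistic function."""
    z = bias + sum(w for name, w in WEIGHTS.items() if signals.get(name))
    return 1.0 / (1.0 + math.exp(-z))

# One weak signal alone stays low...
print(round(risk_score({"missing_referer": True}), 2))             # ~0.07
# ...but several weak signals together push the score up sharply.
print(round(risk_score({"tls_fingerprint_unknown": True,
                        "no_js_telemetry": True,
                        "headless_indicator": True}), 2))          # ~0.85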

The three detection layers

Network-layer fingerprints

Even if a scraper looks “HTTP-correct,” the TLS handshake and HTTP/2 behavior often reveal the underlying library or runtime. For many protected sites, this is the first major filter.

  • TLS fingerprinting: hashes of ClientHello details such as cipher suite order, extensions, and ALPN (for example, JA3/JA4)
  • HTTP/2 fingerprinting: SETTINGS values, prioritization behavior, pseudo-header ordering, and other implementation quirks
  • HTTP anomalies: missing headers, unnatural ordering, contradictions in Accept* headers, inconsistent compression/cache behavior

Cloudflare documents that JA3/JA4 fingerprints can be used as part of its bot solutions.
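
As a concrete example of how a TLS fingerprint is derived, here is a minimal sketch following the public JA3 recipe: the ClientHello’s version, ciphers, extensions, curves, and point formats are serialized into a comma-separated string and MD5-hashed. The numeric values below are made-up examples, not a real browser’s handshake.

import hashlib

def ja3_hash(tls_version: int, ciphers: list[int], extensions: list[int],
             curves: list[int], point_formats: list[int]) -> str:
    """Build the JA3 string from ClientHello fields and return its MD5 hex digest."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)  # e.g. "771,4865-4866,0-11-10,29-23,0"
    return hashlib.md5(ja3_string.encode()).hexdigest()

# Example values only -- real input comes from parsing a captured ClientHello.
print(ja3_hash(771, [4865, 4866, 49195], [0, 11, 10, 35, 16], [29, 23, 24], [0]))

Because cipher and extension order feed the hash, two clients sending identical headers over different TLS stacks still yield different JA3 values.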


Client-layer signals

Next comes consistency of the execution environment, mainly from JavaScript. This is where “device fingerprinting” and “headless/automation tells” show up.

  • JS challenges / JS-based detection: return HTML first, execute JS, and feed execution results into detection
  • Fingerprint consistency: alignment between UA, screen size, WebGL/Canvas, fonts, timezone, and other attributes
  • Automation indicators: suspicious navigator properties, skewed event patterns, and execution-timing signatures

Warning: A pure HTTP client that never executes JavaScript leaves this layer blank. Anti-bot systems can treat “no JS telemetry” as an elevated risk signal (with exceptions like documented APIs).

Cloudflare explains that JavaScript detections typically don’t have enough data on the very first request; after serving an HTML response, Cloudflare injects the script and sets the state it needs. Cloudflare also states that cf_clearance is required for JavaScript detections.
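
As a toy illustration of fingerprint-consistency checking, the sketch below flags contradictions among attributes a client reported via JavaScript. The attribute names and rules are invented for illustration; real products evaluate far more attributes and weigh them statistically.

def consistency_flags(fp: dict) -> list[str]:
    """Return human-readable contradictions in a reported fingerprint (toy rules)."""
    flags = []
    ua = fp.get("user_agent", "")
    if "Windows" in ua and fp.get("platform") not in ("Win32", "Windows"):
        flags.append("UA claims Windows but navigator.platform disagrees")
    if "Chrome" in ua and fp.get("webdriver"):
        flags.append("navigator.webdriver is set (automation tell)")
    return flags

print(consistency_flags({
    "user_agent": "Mozilla/5.0 (Windows NT 10.0) ... Chrome/124.0",
    "platform": "Linux x86_64",  # contradicts the UA
    "webdriver": True,           # automation tell
}))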


Behavior-layer scoring

Finally, session-level behavior tends to separate humans from automation. Scrapers often diverge from real user flows—even when each individual request looks plausible.

  • Request rate patterns: fixed intervals, overly consistent pacing, excessive parallelism
  • Unnatural navigation: missing referrers, skipping typical page → asset loading sequences (images/JS/CSS), jumping straight to deep URLs
  • State inconsistencies: cookie/local storage/cache behavior that isn’t browser-like
  • Identity breaks: stable IP with changing fingerprint, or stable fingerprint spread across too many IPs/locations

Cloudflare states it classifies bots using a combination of machine learning, behavioral analysis, and fingerprinting. Akamai’s Bot Manager messaging likewise emphasizes multi-layer detection, including behavioral analysis (AI/ML), browser/device fingerprinting, and anomaly detection—among other signals.
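
To make “overly consistent pacing” concrete, here is a toy detector a defender might run over a session’s request timestamps: it flags sessions whose inter-request gaps have a very low coefficient of variation. The threshold is an illustrative assumption, not a vendor value.

import statistics

def pacing_suspicion(timestamps: list[float], cv_threshold: float = 0.1) -> bool:
    """Flag a session whose inter-request gaps are suspiciously regular."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 5:
        return False  # not enough data to judge
    mean = statistics.mean(gaps)
    if mean <= 0:
        return True  # zero or non-monotonic gaps are themselves suspicious
    # Coefficient of variation: bursty human traffic scores high, schedulers low.
    return statistics.stdev(gaps) / mean < cv_threshold

print(pacing_suspicion([0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]))   # True: fixed 2 s gaps
print(pacing_suspicion([0.0, 1.2, 5.7, 6.3, 14.8, 16.0, 30.2]))  # False: bursty, human-like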


How Cloudflare works

Cloudflare’s bot defense starts with edge evaluation (ML/heuristics/behavior). When necessary, it adds JavaScript-derived signals for stronger confidence. The key technical pieces to understand are:

Separation of detection engines

Cloudflare documents a set of bot detection “engines.” In particular, its JSD (JavaScript Detections) engine is described as identifying malicious fingerprints associated with headless browsers and similar automation. Cloudflare’s documentation also describes how the __cf_bm cookie relates to smoothing bot score calculation.


Continuous evaluation via cookies

Cloudflare states that __cf_bm is set by Bot Management (and Bot Fight Mode) and contains information related to Cloudflare’s proprietary bot score. It also notes that when Anomaly Detection is enabled on Bot Management, the cookie includes a session identifier. In other words, Cloudflare strongly favors session continuity over one-off requests.
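
As a neutral illustration of why session continuity matters, the sketch below contrasts one-off requests with a cookie-persisting Session in Python’s requests library. The URL is a placeholder for a site you are authorized to test; the point is only that a Session carries server-set cookies (such as __cf_bm, where present) into later requests, while one-off requests discard them.

import requests

URL = "https://example.com/"  # placeholder

# One-off requests: each call starts with an empty cookie jar, so any
# session cookies set by the edge are discarded every time.
requests.get(URL)
requests.get(URL)  # no relationship to the previous request

# A Session persists cookies between requests, the way a browser does.
with requests.Session() as s:
    s.get(URL)
    print(dict(s.cookies))  # cookies the server set on the first response
    s.get(URL)              # those cookies are sent back automatically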


Prerequisites for JS detections

JavaScript detections often require an initial HTML request because the first request typically does not contain enough signal. This is why implementations that “only hit the API endpoint” often struggle: they never establish the same client-layer evidence a normal browser session produces.


How Akamai works

Akamai Bot Manager publicly emphasizes a multi-layer strategy: behavioral analysis, fingerprinting, HTTP anomaly detection, and policy-driven responses. Akamai also highlights that client-side telemetry can be enabled by inserting a lightweight script—making behavior-layer signals much richer.

Stacking multiple detection layers

Akamai’s public materials list signals such as AI/ML-based behavior analysis, browser/device fingerprinting, HTTP anomaly detection, automation/headless detection, and user interaction signals. Product materials also commonly highlight patterns like high request rates and protocol/HTTP inconsistencies as part of classification.


Collecting client-side signals

Akamai notes that client-side behavior telemetry can be enabled by inserting a lightweight script. This helps capture differences scrapers often can’t reproduce well—like realistic event sequences and interaction-driven timing.


A structured detection framework

Akamai’s TechDocs describes that Bot Manager provides multiple detection methods. In practice, teams typically map these into operational controls such as “category,” “score,” and “action” (allow/monitor/challenge/deny) to tune outcomes over time.
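
To make “category, score, and action” concrete, here is a toy policy table of the kind operations teams maintain. The thresholds, actions, and score direction (higher = more likely human in this sketch) are illustrative assumptions, not Akamai defaults.

# Toy score-to-action policy over a 0-100 bot score.
# Direction chosen for this sketch: higher = more likely human.
POLICY = [
    (0, 10, "deny"),        # near-certain automation
    (10, 50, "challenge"),  # suspicious enough to challenge
    (50, 90, "monitor"),    # observe and log, don't interfere
    (90, 101, "allow"),     # near-certain human
]

def action_for(score: int) -> str:
    """Map a bot score to an operational action via the policy table."""
    for low, high, action in POLICY:
        if low <= score < high:
            return action
    raise ValueError(f"score out of range: {score}")

print(action_for(5), action_for(30), action_for(75), action_for(95))
# deny challenge monitor allow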


How Imperva works

Imperva positions Advanced Bot Protection as using advanced techniques like behavior analysis and device fingerprinting. It also states it can build a fingerprint across 700+ “dimensions.” The key idea is that this is not a single parameter—it’s multivariate classification.

700+ dimensional fingerprints

Imperva’s product page states that it detects across 700+ dimensions to separate humans, good bots, and bad bots, producing fingerprints designed to hold up against sophisticated evasion. Put differently: “fixing headers” isn’t remotely enough when the model uses hundreds of correlated signals.


Combining behavior analysis with fingerprinting

Imperva also emphasizes using behavior analysis and device fingerprinting in a way that can block bots without constantly interrupting real users. This aligns with the “quiet classification in the background” model, rather than relying on CAPTCHAs for everything.


Comparing the three vendors

Most implementation details are black-boxed, but we can still compare the “center of gravity” of each approach based on what they publicly document.

Area | Cloudflare | Akamai | Imperva
Core strategy | ML + behavioral analysis + fingerprinting | Multi-layer detection + policy-driven response | High-dimensional fingerprinting + behavior analysis
JS-derived signals | JavaScript Detections, cf_clearance, etc. | Client telemetry via lightweight script insertion | States fingerprinting/behavior; public details are less explicit
Network fingerprints | Explicitly documents JA3/JA4 availability | Explicitly documents HTTP anomaly detection (plus broader detection methods) | Emphasizes high-dimensional fingerprinting depth
Scoring / classification | Documents bot-score-related cookies and session handling | Publishes a Bot Score model (0–100) | Emphasizes separating humans, good bots, and bad bots


Common patterns that get scrapers flagged

Calling APIs directly without JS

If you never fetch HTML and only call JSON endpoints, you often miss JS-derived signals and struggle to build a browser-like session state. That can lower trust and trigger challenges, 403s, or rate limits.

TLS doesn’t match real browsers

Many curl/requests-style clients and some automation runtimes produce TLS/HTTP2 fingerprints that do not match real browsers. Even with “perfect” headers, the network layer can still say “this isn’t Chrome/Safari/Firefox.”

Navigation doesn’t look human

Humans typically follow paths like search → list → detail → back. Scrapers often enumerate detail pages directly, with thin referrer context and fewer supporting asset requests. Behavior scoring tends to catch this.

Identity consistency breaks

Rotating only the IP, rotating only the fingerprint, or constantly discarding cookies can create contradictions. Anti-bot systems love contradictions—because they’re hard to explain as legitimate user behavior.

How to think about mitigation (without “bypass” tactics)

This section focuses on practical design and operational responses, not on providing evasion steps. Always validate applicable laws, terms of service, and robots.txt before collecting data.

Start with legitimate access paths

Your best option is an official API, partner feed, export feature, or a commercial data offering. The stronger the anti-bot stack, the higher the long-term cost of unofficial scraping tends to be.

Reduce load and collection frequency

Minimize what you fetch, implement incremental updates, cache aggressively, and tune refresh intervals. “High volume + high frequency” increases both detection risk and your own operational burden.
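
One concrete way to cut collection volume is HTTP conditional requests, which re-validate a cached copy instead of re-downloading it. A minimal sketch using Python’s requests library; the URL is a placeholder.

import requests

URL = "https://example.com/data"  # placeholder

# First fetch: store the validator the server returns.
resp = requests.get(URL)
etag = resp.headers.get("ETag")

# Later re-check: ask for the body only if it changed since last time.
headers = {"If-None-Match": etag} if etag else {}
resp = requests.get(URL, headers=headers)
if resp.status_code == 304:
    print("Not modified -- reuse the cached copy")
else:
    print("Changed -- process the new body")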

Design for failure

Challenges, 403s, and 429s will happen. In production, what matters is having a retry strategy, recovery paths, and instrumentation (log where the request failed, what changed, and how often it happens).

import time
import random

def backoff(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff + jitter (not for evasion, but for retry control on failures)."""
    # Random factor in [0.5, 1.5) desynchronizes retries across workers;
    # the cap is applied after jitter so the delay never exceeds `cap` seconds.
    delay = base * (2 ** attempt) * (0.5 + random.random())
    return min(cap, delay)

for attempt in range(6):
    try:
        # request() etc. -- replace with your actual fetch call
        break  # success: stop retrying
    except Exception:
        time.sleep(backoff(attempt))
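
The jitter factor matters more than it looks: without it, a fleet of workers that failed together retries together, and that synchronized burst is exactly the kind of pattern behavior scoring flags.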

If you’re defending your own site

If you’re on the protection side, the same three-layer model is a helpful checklist. In practice, you can often improve outcomes faster by focusing on operations and tuning, not just “adding one more control.”

  • Use bot scores to apply staged controls on high-value flows (login/search/cart, etc.)
  • Handle good bots (search engines, monitoring, partners) differently from abusive automation
  • Don’t jump from “allow everything” to “block everything.” Start with observe → limited rollout → expand coverage

Official quote

“Cloudflare Bot Management uses machine learning, behavioral analysis, and fingerprinting to accurately classify bots.” (Cloudflare)


Need a scraping plan that survives anti-bot?

If you’re collecting data from sites protected by Akamai, Cloudflare, or Imperva, the hard part is usually operations: access paths, frequency, session design, and failure handling. We can help you define a realistic data collection architecture for your requirements.

Contact Us: Feel free to reach out for scraping consultations and quotes.

Summary

  • Akamai, Cloudflare, and Imperva can detect web scraping because they score stacked signals across the network layer, client layer, and behavior layer
  • Cloudflare publicly documents elements like JA3/JA4 fingerprints, JavaScript Detections, and bot-related cookies such as __cf_bm and cf_clearance
  • Akamai emphasizes multi-layer detection (behavior analysis, fingerprinting, HTTP anomalies) and client-side telemetry enabled via script insertion
  • Imperva highlights high-dimensional (700+) fingerprints combined with behavioral analysis

About the Author

Ibuki Yamamoto

Web scraping engineer with over 10 years of practical experience, having worked on numerous large-scale data collection projects. Specializes in Python and JavaScript, sharing practical scraping techniques in technical blogs.

Leave It to the Data Collection Professionals

Our professional team with over 100 million data collection records annually solves all challenges including large-scale scraping and anti-bot measures.

  • 100M+ records collected annually
  • 24/7 uptime
  • High-quality data