How to Block Headless Browsers on Shopify (Puppeteer, Selenium, Playwright)
Headless browsers power most modern scraping and bot fraud. At first inspection, they look like real browsers. Here's how to detect and block them in 2026.

Almost every modern scraping, automation, and bot-fraud tool runs on a headless browser. The original generation of "bots" used direct HTTP libraries (Python requests, curl scripts) that were easy to detect because they never executed JavaScript. The current generation uses real browsers (Chrome, Firefox) running without a visible window. They execute JavaScript, render pages, click buttons, and submit forms. To naive detection, they look exactly like normal users.
This guide covers what headless browsers are, why they've become the default tool for sophisticated bots, and the detection techniques that work in 2026.
What headless browsers are
A headless browser is a regular browser engine — usually Chromium, but also Firefox and WebKit — running without a graphical UI. The engine is real: JavaScript executes, the DOM renders, network requests behave normally, cookies persist correctly. The only difference: no human is looking at the screen.
The major headless tools:
Puppeteer. Google's Node.js library for controlling Chrome/Chromium. The dominant tool for scraping and automation as of 2026 — well-supported, well-documented, well-maintained by Google itself.
Selenium. An older, broader cross-browser framework. Originally built for QA testing and now widely used for scraping. Generally slower than Puppeteer, but it supports more browsers.
Playwright. Microsoft's modern competitor to Puppeteer. Cross-browser (Chromium, Firefox, WebKit), increasingly popular for both legitimate testing and scraping.
Headless Chrome (direct). Chrome's built-in --headless mode. Usable without a framework. Common among smaller-scale scrapers.
Custom Chromium builds. Modified Chromium with anti-detection features baked in. Used by serious operations.
These tools all have legitimate uses — QA testing, end-to-end verification, accessibility scanning, screenshot generation. The same tools, pointed at someone else's store, become scraping and fraud infrastructure.
Why detection is harder than it sounds
A first-pass developer might think: "I'll check for the headless user-agent and block it." That works for the laziest scrapers and stops working for everyone else within minutes. Sophisticated scrapers know all the basic checks and trivially evade them.
The honest framing: bot detection raises the cost of scraping rather than eliminating it. Casual scrapers get caught easily. Serious scrapers can often get through, but the engineering cost goes up — sometimes enough that they target easier stores instead.
A headless browser running anti-detection extensions can spoof user-agent, set realistic browser fingerprints, fake mouse movements, randomize timing, rotate proxies, and pass most browser-side detection. The detection arms race is real.
What headless browsers actually leak
Even sophisticated headless browsers leak detectable signals if you know where to look:
User-agent strings
The default user-agent for Chrome in headless mode contains "HeadlessChrome". Almost every publicly available scraping tool changes this immediately, so checking for it catches only the laziest attempts. Still worth doing: it's free.
navigator.webdriver flag
When a browser is controlled by automation (Selenium, Puppeteer, Playwright), navigator.webdriver is set to true. Real user-controlled browsers have it false or undefined. Sophisticated scrapers patch this out, but unpatched ones leak.
Browser feature inconsistencies
Headless browsers sometimes have inconsistent or missing features. Real Chrome exposes plugins; headless Chrome by default does not. Real Chrome has a chrome object with specific properties; in headless Chrome that object can have a different shape. Real Chrome reports specific WebGL capabilities; headless Chrome reports a different set.
Each is checkable from JavaScript. Sophisticated scrapers patch each, but each patch is engineering work, and most scrapers have only patched the most obvious ones.
Timing patterns
Real users have inconsistent interaction timing — pause to read, re-read, hover before clicking. Headless browsers tend toward machine consistency: cart added exactly 250ms after page load, click on buy button within 80ms of cart completion. Behavioral-detection systems use these patterns.
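One way to quantify "machine consistency" is the coefficient of variation of the gaps between events in a session. The sketch below is a minimal illustration under stated assumptions: the function names and the 0.1 threshold are hypothetical and would need tuning against real traffic.

```python
from statistics import mean, pstdev
from typing import Sequence


def timing_variation(event_times_ms: Sequence[float]) -> float:
    """Coefficient of variation (stdev / mean) of gaps between consecutive events."""
    gaps = [b - a for a, b in zip(event_times_ms, event_times_ms[1:])]
    if len(gaps) < 2 or mean(gaps) <= 0:
        raise ValueError("need at least three timestamps spanning a positive duration")
    return pstdev(gaps) / mean(gaps)


def looks_machine_timed(event_times_ms: Sequence[float], min_variation: float = 0.1) -> bool:
    """Flag sessions whose event spacing is almost perfectly regular.

    The 0.1 cut-off is an illustrative assumption, not a measured value.
    """
    return timing_variation(event_times_ms) < min_variation
```

A session with events exactly 250 ms apart has zero variation and is flagged; a session with irregular pauses is not. On its own this is a weak signal, so it is best combined with the others in this section.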
Mouse movement
Real users move the mouse with characteristic micro-movements and corrections. A headless browser either doesn't move the mouse, or moves in straight lines between points. Mouse-movement entropy is a strong behavioral signal.
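As a rough illustration of mouse-movement entropy, the sketch below bins the direction of travel between successive pointer samples and computes the Shannon entropy of those bins. The function name, the 16-bin choice, and the sample format (a list of x, y pairs collected client-side) are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, float]


def heading_entropy(points: Sequence[Point], bins: int = 16) -> float:
    """Shannon entropy (bits) of the direction of travel between successive samples."""
    headings = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # pointer did not move between these two samples
        angle = math.atan2(dy, dx) % (2 * math.pi)  # normalize to 0 .. 2*pi
        headings.append(int(angle / (2 * math.pi) * bins) % bins)
    if not headings:
        return 0.0  # no movement at all
    total = len(headings)
    counts = Counter(headings)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())
```

A pointer that never moves, or that travels in a perfectly straight line, scores 0 bits. Human paths with small corrections spread across several direction bins and score higher. Where to draw the line between the two is a tuning decision this sketch does not make.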
TLS fingerprinting (JA3/JA4)
The lowest-level signal. Different browser engines have different TLS handshake characteristics — cipher suites advertised, order, extensions supported. The JA3 / JA4 fingerprints capture this.
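Real Chrome on Windows produces a characteristic fingerprint. An HTTP library that merely claims to be Chrome in its user-agent (Python requests, Go's net/http) produces a different one, because its TLS stack is not Chrome's. A genuine Chrome driven by Puppeteer, by contrast, uses Chrome's own TLS stack, so its fingerprint matches a normal Chrome of the same version. Recent Chrome versions also randomize the order of TLS extensions, which makes raw JA3 hashes unstable; JA4 sorts the extensions to compensate.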
TLS fingerprinting is harder to evade because it happens at the protocol level, before JavaScript runs. Commercial bot-detection services use it heavily.
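To make the fingerprint concrete: a JA3 value is the MD5 hash of a string built from five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats), with GREASE placeholder values removed. The sketch below assumes those fields have already been parsed into integers by the edge or WAF; the function names are hypothetical.

```python
import hashlib
from typing import Iterable

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ... 0xFAFA. JA3 ignores them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def _field(values: Iterable[int]) -> str:
    """Join one ClientHello field with dashes, skipping GREASE values."""
    return "-".join(str(v) for v in values if v not in GREASE)


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """Build the comma-separated JA3 string in the order the client sent each value."""
    return ",".join([str(version), _field(ciphers), _field(extensions),
                     _field(curves), _field(point_formats)])


def ja3_hash(ja3: str) -> str:
    """MD5 hex digest of the JA3 string (used as an identifier, not for security)."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()
```

Because the string preserves the order in which values were sent, two clients offering the same cipher suites in a different order produce different hashes.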
The five detection layers
A working detection setup combines several techniques:
Layer 1: User-agent filtering
Block requests with user-agents matching common headless patterns:
- HeadlessChrome
- PhantomJS
- chromedriver
- python-requests
- Go-http-client
- curl
- wget
Apply at the edge (CDN/WAF) or as the first check in your fraud app.
Catches: Lazy scrapers, default-configured tools, hobbyist scripts.
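A minimal sketch of this check, assuming it runs in your own request-handling code rather than in a CDN rule language. The function name is hypothetical, and treating a missing user-agent as suspicious is an assumption of this example rather than something the pattern list requires.

```python
import re
from typing import Optional

# Substrings from the pattern list above; matching is case-insensitive.
HEADLESS_UA_PATTERNS = [
    "HeadlessChrome",
    "PhantomJS",
    "chromedriver",
    "python-requests",
    "Go-http-client",
    "curl",
    "wget",
]

_UA_REGEX = re.compile("|".join(re.escape(p) for p in HEADLESS_UA_PATTERNS), re.IGNORECASE)


def is_blocked_user_agent(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent is missing or matches an automation pattern."""
    if not user_agent:
        # Assumption: real browsers always send a User-Agent header.
        return True
    return _UA_REGEX.search(user_agent) is not None
```

For example, `is_blocked_user_agent("curl/8.4.0")` returns `True`, while a stock desktop Chrome user-agent returns `False`.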
Layer 2: JavaScript challenge
Serve a small JavaScript snippet performing browser-fingerprinting checks: navigator.webdriver, plugin count, language settings, WebGL capabilities, presence of the chrome object. If responses are inconsistent or match headless signatures, flag the session.
Catches: Moderately-sophisticated scrapers that haven't patched all detection points.
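The browser-side snippet has to report what it finds somewhere. The sketch below shows one possible server-side half: it takes a fingerprint payload posted by the client script and lists the headless indicators it contains. The payload field names (`webdriver`, `userAgent`, `pluginCount`, `hasChromeObject`, `languages`, `webglRenderer`) are hypothetical, and the checks are heuristics rather than proof.

```python
from typing import Any, Dict, List


def headless_signals(fp: Dict[str, Any]) -> List[str]:
    """Return the headless indicators present in a client-reported fingerprint."""
    signals = []
    ua = fp.get("userAgent", "")
    # Mobile Chrome legitimately reports no plugins, so limit those checks to desktop.
    desktop_chrome = "Chrome/" in ua and "Mobile" not in ua and "Android" not in ua

    if fp.get("webdriver") is True:
        signals.append("navigator.webdriver is true")
    if desktop_chrome and fp.get("pluginCount", 0) == 0:
        signals.append("desktop Chrome user-agent with zero plugins")
    if desktop_chrome and not fp.get("hasChromeObject", False):
        signals.append("Chrome user-agent without a chrome object")
    if not fp.get("languages"):
        signals.append("empty navigator.languages")
    if not fp.get("webglRenderer"):
        signals.append("no WebGL renderer reported")
    return signals
```

A session would typically be flagged only when several indicators fire together, since any single one can occur in a legitimate browser with unusual settings.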
Layer 3: Behavioral analysis
Track session-level signals: time-on-page, mouse movement entropy, scroll behavior, keystroke timing. Compare to known-human patterns. Flag anomalies.
Catches: Scrapers that pass all static checks but have machine-like behavior.
Layer 4: TLS fingerprinting
At the WAF or CDN layer, capture TLS handshake characteristics and match against known browser-engine fingerprints. Flag requests with TLS signatures that don't match a real browser.
Catches: Sophisticated scrapers that pass JavaScript and behavior checks but use libraries with non-browser TLS signatures.
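The matching step can be sketched as a comparison between the browser family a request claims in its user-agent and the family its TLS fingerprint is known to belong to. The fingerprint values below are placeholders, not real hashes; in practice the table would come from your CDN, WAF, or a threat-intelligence feed. Function names are hypothetical.

```python
from typing import Dict

# Placeholder keys: real deployments would map observed JA3/JA4 values here.
KNOWN_FINGERPRINTS: Dict[str, str] = {
    "placeholder-chrome-fingerprint": "chrome",
    "placeholder-firefox-fingerprint": "firefox",
    "placeholder-python-requests-fingerprint": "http-library",
}


def claimed_family(user_agent: str) -> str:
    """Very rough browser-family guess from the User-Agent header."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:
        return "chrome"
    return "unknown"


def tls_mismatch(user_agent: str, tls_fingerprint: str) -> bool:
    """True when a known TLS fingerprint contradicts the claimed browser family."""
    observed = KNOWN_FINGERPRINTS.get(tls_fingerprint)
    claimed = claimed_family(user_agent)
    if observed is None or claimed == "unknown":
        return False  # no evidence either way in this sketch
    return observed != claimed
```

Unknown fingerprints are deliberately not treated as mismatches here, because new browser releases change fingerprints regularly and flagging every unseen value would generate false positives.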
Layer 5: CAPTCHA on suspicion
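For sessions flagged by other layers, serve a CAPTCHA. Modern CAPTCHAs (reCAPTCHA v3, hCaptcha) are designed to be invisible to most legitimate users while challenging bots. Note that reCAPTCHA v3 never shows a puzzle itself: it returns a risk score, and your site decides what to do with low-scoring sessions.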
Trade-off: any CAPTCHA degrades the legitimate experience somewhat. Apply selectively, only on sessions already flagged by velocity or fingerprint signals.
Catches: Bots that pass all static and behavior checks but can't pass interactive challenges.
What works for which scenarios
The right combination depends on what you're defending against:
| Threat level | Layers needed |
|---|---|
| Casual scraping (default tools, small-time) | Layers 1-2. Can be configured in an afternoon and catches roughly 80% of this traffic. |
| Commercial scraping services | Layers 1-3. Commercial scrapers patch obvious signals but skimp on behavioral mimicry. |
| Determined adversaries | Layers 1-5, with TLS fingerprinting (Layer 4) enforced at the edge. Required for stores under sustained scraping. |
| Sustained scalping bots | All layers + rate limiting + behavior-pattern matching. Scalpers are willing to invest more because per-event revenue is higher. |
False positives and how to handle them
Headless detection produces specific false-positive patterns:
- Customers with JavaScript disabled (NoScript, aggressive uBlock Origin settings, privacy-focused browsers)
- Customers using accessibility tools (screen readers, automation aids)
- Mobile customers in battery-saver mode with unusual browser behavior
- Corporate browser-isolation services that route through isolated environments with automated characteristics
The mitigation: don't auto-block on suspicion. Use softer actions — serve a CAPTCHA, slow down the response, require additional authentication — that real customers can solve but bots usually can't.
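The softer actions can be expressed as a graduated response keyed on a risk score. This is a sketch only: the 0 to 1 score, the thresholds, and the action names are illustrative assumptions, and there is intentionally no outright "block" outcome.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"          # slow down the response
    CAPTCHA = "captcha"            # serve an interactive challenge
    STEP_UP_AUTH = "step_up_auth"  # require additional authentication


def choose_action(risk_score: float, at_checkout: bool = False) -> Action:
    """Map a 0..1 risk score to the mildest action that fits (thresholds are assumed)."""
    if risk_score < 0.3:
        return Action.ALLOW
    if risk_score < 0.6:
        return Action.THROTTLE
    if at_checkout and risk_score >= 0.8:
        return Action.STEP_UP_AUTH
    return Action.CAPTCHA
```

A real customer who trips a false positive can still complete a CAPTCHA or wait out a slower response, while a bot usually cannot or will not.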
Where Shieldy fits in
Shieldy Fraud Filter includes headless-browser detection out of the box:
- Auto-block spam bots (Settings → Mitigator → toggle on) — covers Layer 1 (user-agent) and Layer 2 (basic JavaScript checks)
- Spy Extensions Blocker — handles a specific category (Alihunter, PPSpy, Minea) that overlaps with headless detection
- IP/ASN-level filtering on major cloud providers — catches the bulk of headless traffic running on AWS/GCP/Azure/OVH/Hetzner
- Rate limiting at checkout — Layer 5-adjacent, catches velocity-based bot abuse
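For readers curious how IP-range filtering of this kind works in general, here is a minimal sketch using Python's standard `ipaddress` module. It is not Shieldy's implementation. The ranges are documentation-only placeholders (RFC 5737), not real cloud-provider ranges, and the allow-list stands in for verified crawlers.

```python
import ipaddress

# Placeholder ranges only; real lists come from provider-published IP ranges or ASN data.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]
# Allow-listed sub-range, e.g. a verified search-engine crawler hosted in a blocked range.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.64/26")]


def is_datacenter_blocked(ip: str) -> bool:
    """True when the address is in a blocked range and not explicitly allowed."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in ALLOWED_NETWORKS):
        return False  # the allow-list wins over the block-list
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Checking the allow-list first matters: search-engine and social-platform crawlers often run from the same cloud providers that host scrapers.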
For stores under serious sustained attacks (scalping on limited drops, large-scale card testing), Shieldy can be layered with a dedicated bot-management service (Cloudflare Bot Management, DataDome, Akamai Bot Manager) that handles Layers 4-5 at the edge.
A practical first-week setup
If you're starting from "no specific bot defense":
- Enable Shieldy's Auto-block spam bots
- Apply rate limiting at 90 requests/minute per IP for non-whitelisted traffic
- Block ASN-level traffic from major cloud providers (AWS, GCP, Azure, OVH, DigitalOcean, Hetzner)
- Configure the Allowed Bot List with major search engines and social platforms
- Subscribe to threat-intelligence updates (Shieldy bundles this)
That setup catches a large fraction of casual scraping in an afternoon. The remaining defenses (Layers 3-5) come later if you observe sustained attacks.
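To illustrate what a per-IP limit of 90 requests per minute means mechanically, here is a minimal in-memory sliding-window limiter. It is a sketch, not how Shieldy or any particular product implements rate limiting; the class name and the injectable clock are assumptions made so the example is easy to test.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Set


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within the last `window_s` seconds."""

    def __init__(self, limit: int = 90, window_s: float = 60.0,
                 allowlist: Optional[Set[str]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.allowlist = allowlist or set()
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        if ip in self.allowlist:
            return True  # allow-listed traffic is never limited
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

With the defaults, the 91st request from one IP inside a minute is rejected, and capacity returns as older requests age out of the window. A production setup would keep this state in shared storage rather than in process memory.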
A practical close
Headless browsers are the default infrastructure for sophisticated bots. Detection is layered and never perfect, but it raises the cost of scraping enough to redirect most attacks to easier targets.
Shieldy covers the practical layers for most Shopify stores. Specialized bot-management services exist for high-volume targets. Start with Shieldy's defaults; escalate as your traffic shows you need to.


