
Tutorial | 2026-05-20 | 8 min read

Allowed Bot List on Shopify — Protect SEO While Blocking Malicious Bots

Aggressive bot blocking damages SEO. The fix is an explicit allowed-bot list. Here's the exact list of crawlers to whitelist and how to verify them.

A surprising number of "the fraud app broke our SEO" stories trace to the same root cause: the merchant enabled aggressive bot blocking, didn't configure an explicit allowed-bot list, and the fraud system started returning blocking pages to Googlebot, Bingbot, and social-platform crawlers along with the actual bad traffic.

Search rankings drifted downward. The merchant didn't notice for weeks. Recovery took months.

The fix is straightforward: maintain an explicit allowed-bot list that ensures legitimate crawlers can always reach your store, regardless of other fraud rules. Most major fraud apps support this. Configuring it should be one of the first things you do, not one of the last.

Why an allowed-bot list is non-optional

If you do nothing else in bot defense, do this. The reason is direct:

Search engines crawl from identifiable infrastructure. Googlebot runs from Google's data centers, in IP ranges that Google publishes. Bingbot runs from Microsoft's. When you block "all traffic from data centers" or "all anonymization services," you'll catch these crawlers along with the bad traffic.

Social platforms fetch your URLs from their infrastructure. Facebook's open-graph crawler fetches pages when someone shares a link to generate a preview. LinkedIn's, Twitter/X's, and Pinterest's crawlers do the same. These all run from data-center infrastructure that fraud systems might treat as suspicious.

SEO and analytics tools crawl from their own infrastructure. Ahrefs, Semrush, Screaming Frog, Moz — your own diagnostic tools. They run from cloud infrastructure overlapping with IP ranges fraud systems classify as bot-like.

Without explicit allow-listing, every aggressive bot-blocking rule degrades one of these channels. The damage is rarely immediate but compounds: lower indexing, broken social previews, incomplete analytics, missed marketing opportunities.

What should be on the allowed-bot list

A working list covers four categories:

1. Search engine crawlers

The major search engine crawlers you almost certainly want reaching your store:

Googlebot (Google Search)
Googlebot-Image (Google Images, separate user-agent)
Googlebot-News (if your store has news/blog content)
Bingbot (Microsoft Bing)
DuckDuckBot (DuckDuckGo)
YandexBot (Yandex, primarily for Russian-language markets)
Baiduspider (Baidu, primarily for Chinese-language markets)
Applebot (Apple Spotlight Search and Siri suggestions)

Verify these by user-agent AND by reverse-DNS lookup to the search engine's official domain. Impersonators frequently claim to be Googlebot; verification distinguishes real from fake.

2. Social-platform crawlers

The crawlers that generate link previews when your URLs are shared:

facebookexternalhit (Facebook and Instagram previews)
LinkedInBot (LinkedIn previews)
Twitterbot (X / Twitter card generation)
PinterestBot (Pinterest Rich Pins)
WhatsApp (WhatsApp link previews)
TelegramBot (Telegram link previews)
Slackbot-LinkExpanding (Slack message link previews)

These crawlers fetch your pages whenever someone shares one of your URLs. Without allow-listing, shared links lose their previews or look ugly.

3. SEO and analytics tools

Tools you or your team use to monitor your own SEO:

AhrefsBot
SemrushBot
MJ12bot (Majestic SEO)
DataForSEO (and sub-bots)
DotBot (Moz)
Screaming Frog SEO Spider (when you run it)

You may want to rate-limit these rather than allow them without limits, because they can crawl aggressively. Allow them, but with caps.
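One common way to cap a crawler without blocking it is a token bucket per bot. The sketch below is illustrative only: the class name, the rate, and the burst size are assumptions, not settings from any particular fraud app, and it presumes you have a place to run request-level logic (a WAF rule engine or middleware).

```python
import time
from typing import Callable


class TokenBucket:
    """Allow short bursts, then hold a bot to a steady request rate."""

    def __init__(
        self,
        rate_per_second: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_second
        self._capacity = float(burst)
        self._tokens = float(burst)  # start with a full bucket
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        """Return True if this request fits within the cap."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# Hypothetical caps: one request per second with a burst of five.
LIMITS = {
    "AhrefsBot": TokenBucket(rate_per_second=1.0, burst=5),
    "SemrushBot": TokenBucket(rate_per_second=1.0, burst=5),
}
```

When `allow()` returns False, answering with HTTP 429 rather than a blocking page is usually the gentler choice, since well-behaved crawlers tend to slow down and retry later.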

4. Ecommerce / marketplace crawlers

If you syndicate products or appear in shopping aggregators:

GoogleOther (Google's general-purpose crawler)
AdsBot-Google (Google Ads landing-page checks)
Smartling, Phrase, Lokalise (if you use localization tools)
Comparison-engine bots (specific to your category)

This category varies by store type and partnerships. Add the specific crawlers relevant to your business.

How to verify a bot is who it claims to be

User-agent strings are trivially spoofable. A fraudster's scraper can claim User-Agent: Googlebot/2.1 and be a Python script running on AWS. Without verification, a permissive allow-list becomes a hole in your bot defense.

The standard verification: reverse DNS lookup combined with forward DNS lookup.

For a request claiming to be Googlebot:

Get the request's IP address
Reverse-DNS-lookup the IP — should resolve to a hostname like crawl-66-249-66-1.googlebot.com
Forward-DNS-lookup that hostname — should resolve back to the same IP
If both match and the hostname is in Google's official crawler-domain list, the request is genuinely Googlebot
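The four steps above can be sketched with Python's standard library. This is a minimal illustration rather than how any specific fraud app implements it: the function names are made up for the example, and the suffix tuple simply mirrors the Google hostname patterns mentioned in this guide. The resolvers are passed in as parameters so the logic can be exercised without live DNS.

```python
import ipaddress
import socket
from typing import Callable, Iterable

# Hostname suffixes for Google's crawlers (leading dot keeps the match
# on a label boundary, so "evilgooglebot.com" does not pass).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Step 2: PTR lookup, returning the primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> set[str]:
    """Step 3: every address the hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_verified_bot(
    ip: str,
    allowed_suffixes: Iterable[str] = GOOGLE_SUFFIXES,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], set[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for one request IP."""
    try:
        claimed = ipaddress.ip_address(ip)
        hostname = reverse(ip).rstrip(".").lower()
        # Step 4a: the hostname must belong to a trusted crawler domain.
        if not any(hostname.endswith(s) for s in allowed_suffixes):
            return False
        # Step 4b: the hostname must resolve back to the same IP.
        resolved = {ipaddress.ip_address(a) for a in forward(hostname)}
    except (OSError, ValueError):
        # No PTR record, failed lookup, or malformed address: unverified.
        return False
    return claimed in resolved
```

Both checks matter. Anyone who controls the reverse DNS for their own IP block can make it return a googlebot.com name, but they cannot make Google's forward DNS point back at their IP.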

Each major search engine publishes verification criteria:

Search engine	Verified hostname pattern
Google	`*.googlebot.com` or `*.google.com`
Microsoft (Bing)	`*.search.msn.com`
Yandex	(see Yandex's published list)
Baidu	`*.baidu.com` or `*.baidu.jp`

Most fraud apps support this verification when you enable "verify allowed bots." If yours doesn't, the verification is doable at the WAF level or in your store's middleware.

A small operational note: reverse DNS lookups are slow. Caching verified IPs aggressively (with reasonable TTL) is important for performance.
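A small time-to-live cache in front of the verification call is one way to do this. The sketch below assumes a `verify` callable that takes an IP and returns True or False; the class name is hypothetical, and the 24-hour default is only an example value.

```python
import time
from typing import Callable


class VerifiedIpCache:
    """Remember each IP's verification result for a fixed TTL."""

    def __init__(
        self,
        verify: Callable[[str], bool],
        ttl_seconds: float = 86400.0,  # example: 24 hours
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._verify = verify
        self._ttl = ttl_seconds
        self._clock = clock
        # ip -> (result, expiry time on the injected clock)
        self._entries: dict[str, tuple[bool, float]] = {}

    def is_verified(self, ip: str) -> bool:
        now = self._clock()
        cached = self._entries.get(ip)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: skip the DNS round trips
        result = self._verify(ip)
        self._entries[ip] = (result, now + self._ttl)
        return result
```

Note that this version caches failures as well as successes and never evicts entries. In production you would likely want a shorter TTL for negative results and a size bound on the dictionary.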

How to add bots to your allowed list

Most fraud apps offer three management methods:

By user-agent pattern. Allow requests whose user-agent matches a known bot signature. Easy to configure, easy to spoof — pair with verification.

By IP range. Allow requests from specific published IP ranges (Google publishes their crawler IPs in their documentation). Less brittle than user-agent matching; better for high-confidence whitelisting.

By verified hostname. Allow requests where verified reverse-DNS hostname matches a trusted domain. Most secure; pair with caching to manage latency.

For most Shopify stores: combine all three. User-agent for matching, IP range for sanity-check, hostname verification for security.
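As a rough sketch of how the three methods can fit together: match the user-agent first, accept immediately if the IP sits in a published range, and otherwise fall back to hostname verification. Everything named here is an assumption for illustration. `BotRule`, `classify`, and the three return labels are hypothetical, and the CIDR range in the usage example is a documentation placeholder, not a real crawler range.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class BotRule:
    name: str
    user_agent_pattern: str       # regex searched in the User-Agent header
    cidr_ranges: tuple[str, ...]  # ranges published by the bot's operator


def classify(
    user_agent: str,
    ip: str,
    rules: Iterable[BotRule],
    hostname_verified: Callable[[str], bool],
) -> str:
    """Return 'allow', 'deny' (claimed but unverified), or 'unlisted'."""
    address = ipaddress.ip_address(ip)
    for rule in rules:
        if not re.search(rule.user_agent_pattern, user_agent, re.IGNORECASE):
            continue
        # Cheap check first: is the IP inside a published range?
        if any(address in ipaddress.ip_network(c) for c in rule.cidr_ranges):
            return "allow"
        # Slower fallback: reverse plus forward DNS verification.
        if hostname_verified(ip):
            return "allow"
        # The user-agent claims to be this bot, but nothing backs it up.
        return "deny"
    # Not claiming to be a listed bot: apply the normal fraud rules.
    return "unlisted"
```

For example, with a placeholder range:

```python
rules = [BotRule("googlebot", r"Googlebot", ("192.0.2.0/24",))]
classify("Googlebot/2.1", "203.0.113.5", rules, lambda ip: False)  # "deny"
```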

Configuring it in Shieldy

Shieldy Fraud Filter supports an allowed-bot list out of the box:

Settings → Allowed Bot List
Toggle on the bots you want to allow (Googlebot, Bingbot, Facebook, LinkedIn, Twitter/X, Pinterest, Ahrefs, Semrush, etc. are pre-listed)
Reverse-DNS verification is automatic and cached (24-hour TTL)
Custom additions: add your own user-agent patterns or IP ranges for specific tools

Once enabled, allowed bots always reach your store regardless of other fraud rules (country blocks, VPN/proxy detection, IP-range bans). This is critical: configure the allowed-bot list before any aggressive blocking rules.

Common mistakes

A few patterns to avoid:

Allowing by user-agent only. Without verification, you've allowed anyone willing to spoof the user-agent. Most fraud apps default to verifying — check yours.

Forgetting niche but important crawlers. Apple's Applebot drives Siri suggestions and Spotlight search. Each missed crawler costs you a specific channel.

Allowing aggressive crawlers without rate limits. Ahrefs, Semrush, and similar can hammer your store if unrestricted. Allow with rate limits.

Not updating the list. New search engines, social platforms, and crawlers appear over time. A list from 2023 might be missing 2026 entrants. A quarterly review keeps it current.

Whitelisting "bots" generically. Some merchants allow anything bot-like to avoid the SEO problem. That defeats the purpose of bot blocking entirely. Allow specific, verified bots rather than granting blanket permission.

What if you've already damaged SEO

If you suspect over-aggressive fraud configuration has been degrading SEO:

Check Google Search Console

Look at the Page indexing report (formerly "Index coverage"). If pages show "Blocked due to access forbidden (403)", or show "Crawled - currently not indexed" or "Discovered - currently not indexed" at unusual rates, crawling is failing somewhere.

Run a live URL test

From Search Console, use the URL Inspection tool's live test (the successor to the retired Fetch as Google feature) on a representative product page. If it fails with a 403 or returns your blocking page, you've found the leak.

Review fraud-app logs

Search for requests from Googlebot user-agents. If you're seeing 403s in volume, you have a problem.
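If you can export request logs, a short script can count blocked crawler hits. This sketch assumes the common Apache/Nginx "combined" log format, which your fraud app or WAF may not use, so treat the regular expression as a starting point. The function name and the default token list are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed "combined" log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def blocked_crawler_hits(
    lines: Iterable[str],
    bot_tokens: tuple[str, ...] = ("Googlebot", "bingbot"),
) -> Counter:
    """Count 403 responses per crawler token found in the user-agent."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or match["status"] != "403":
            continue
        user_agent = match["ua"].lower()
        for token in bot_tokens:
            if token.lower() in user_agent:
                counts[token] += 1
    return counts
```

This matches on the user-agent alone, so the counts include impersonators. A high number is a reason to check whether the blocked IPs pass reverse-DNS verification, not proof by itself that real crawlers are being blocked.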

Configure the allowed bot list correctly

Enable verification, add missing crawlers, test from each major crawler perspective if possible.

Submit re-indexing requests

Once the block is fixed, request re-indexing of your most important pages through Search Console.

Recovery takes weeks to months, because Google's crawl frequency doesn't speed up the moment you fix the configuration. The earlier you catch the problem, the faster the recovery.

A practical close

If you take one thing from this guide: enable the allowed-bot list before any other bot-blocking rule.

Shieldy ships with sensible defaults pre-enabled — Googlebot, Bingbot, major social platforms, top SEO tools all whitelisted out of the box. You can customize and add specific tools you use.

Total setup time: 30-60 minutes. Prevents almost all "fraud app broke my SEO" stories.
