Blackhole Pro automatically detects, traps, and blocks bad bots. There are some cases, however, where you may want to always allow access for a particular bot or IP address. This may come in handy for testing purposes, proxy servers, caching plugins, and so forth. This tutorial explains how to whitelist (always allow) bots based on their reported IP address and/or user agent.

Note: The information in this post applies to both free and pro versions of Blackhole. Check the Help tab on the plugin settings page for more details, tricks, and tips.

Whitelist bots by User Agent

Blackhole Pro blocks bad bots based on their reported IP address. To whitelist a bot by its reported user agent, visit the plugin setting “Whitelisted Bots”. There you can enter any strings that should never be blocked. That way you will never block important things like Google, Facebook, Twitter, etc. Any strings entered here will be matched against the reported user agent via regular expression.

Note: When adding user agents to the list, keep the names short, simple, and as unique as possible. Also do not include any special characters. For example, let’s say you want to whitelist the following reported user agent:

Cygnus X-1 (Space Invaders) User Agent (Atari 2600), Mobile Business Edition, Est. 2020

Instead of adding that entire lengthy user agent to the whitelist setting, just pick the most unique, relevant part of the string, for example this:

Cygnus

...would probably be sufficient to match all instances of the target user agent. Alternately, you could use Space Invaders, Atari 2600, or Mobile Business Edition, etc. Choosing the best unique portion of the user-agent string is a bit of an art. A little trial and error should help get you there. If in doubt, feel free to reach out anytime and we’ll try to help asap.
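
To make the matching idea concrete, here is a minimal sketch in Python. It is not the plugin's actual code: the function names are hypothetical, and it assumes that entries are separated by commas, compared case-insensitively, and treated as literal text rather than raw patterns.

```python
import re

def parse_whitelist(setting: str) -> list[str]:
    """Split the comma-separated setting value into trimmed, non-empty entries."""
    return [entry.strip() for entry in setting.split(",") if entry.strip()]

def is_whitelisted_agent(user_agent: str, setting: str) -> bool:
    """Return True if any whitelist entry occurs in the reported user agent.

    Assumption: matching is case-insensitive and each entry is escaped,
    so it behaves like a plain substring search.
    """
    entries = parse_whitelist(setting)
    if not entries:
        return False
    pattern = "|".join(re.escape(entry) for entry in entries)
    return re.search(pattern, user_agent, re.IGNORECASE) is not None

ua = ("Cygnus X-1 (Space Invaders) User Agent (Atari 2600), "
      "Mobile Business Edition, Est. 2020")

print(is_whitelisted_agent(ua, "Cygnus"))              # short, unique entry
print(is_whitelisted_agent(ua, "googlebot, bingbot"))  # unrelated entries

# A comma inside an entry splits it into two separate entries:
print(parse_whitelist("Atari 2600), Mobile Business Edition"))
```

The last line shows why a single short entry works better than a long one: any comma inside the pasted text is read as a separator, so the entry ends up as two unrelated strings.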

Important: Commas are used to separate the user-agent strings. Do NOT include them anywhere else.

Important information

The “Whitelist Bots” setting works by checking the bot’s reported user agent. Bots often spoof or fake their reported user agent. So if the bot is claiming to be Googlebot, and “google” is included in the Whitelist Bots setting, the bot will not be added to the block list, and will be able to visit your site just like anyone else. To prevent this, you can remove all entries from the Whitelist Bots setting, so it is empty. That way, there will be no whitelisted user agents to spoof, and all bots will be blocked or not blocked based on their IP address.

The downside to not whitelisting any user agents is that even “good” bots sometimes disobey robots.txt and nofollow rules, and thus may fall into the blackhole trap and get blocked from your site. So basically you have a couple of options:

  • Recommended approach: leave the whitelist bots in place and let the plugin work as it has for over 10 years. This way, many bad bots will be blocked, but any bots pretending to be Googlebot or any other whitelisted user agent will not be blocked.
  • Or, you can remove all entries from the Whitelist Bots setting, and use the “Whitelist IPs” setting instead. It requires more work to find and add the related IP addresses, but doing so means that a bot can no longer avoid being blocked simply by spoofing a whitelisted user agent.

If you go with the second approach, you can find the IP addresses for most bots online. For example, here is how to get all IP addresses for Googlebot, and all IP addresses for Bingbot.
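
The trade-off between the two options can be sketched in a few lines of Python. This is a simplified illustration, not the plugin's logic: the function name and the IP addresses are hypothetical placeholders (taken from reserved documentation ranges), user agents are matched as case-insensitive substrings, and IPs are matched exactly.

```python
def is_always_allowed(user_agent: str, ip: str,
                      agent_whitelist: list[str],
                      ip_whitelist: list[str]) -> bool:
    """Return True if the visitor matches either whitelist (simplified)."""
    ua = user_agent.lower()
    if any(entry.lower() in ua for entry in agent_whitelist):
        return True
    return ip in ip_whitelist

# A bad bot that merely claims to be Googlebot.
spoofed_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
bad_bot_ip = "203.0.113.50"  # placeholder address for the bad bot
crawler_ip = "192.0.2.10"    # placeholder for a real crawler IP you looked up

# Option 1: user-agent whitelist in place, so the spoofed bot is allowed.
print(is_always_allowed(spoofed_ua, bad_bot_ip, ["googlebot"], []))   # True

# Option 2: IP whitelist only, so the spoofed user agent no longer helps.
print(is_always_allowed(spoofed_ua, bad_bot_ip, [], [crawler_ip]))    # False
print(is_always_allowed(spoofed_ua, crawler_ip, [], [crawler_ip]))    # True
```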

Default whitelisted bots

By default, Blackhole Pro whitelists (always allows) the following user agents. Note that these are subject to change, so check the Help tab on the settings screen for current defaults.

a6-indexer, adsbot-google, ahrefsbot, aolbuild, apis-google, baidu, bingbot, bingpreview, butterfly, chrome, cloudflare, duckduckgo, embedly, facebookexternalhit, facebot, googlebot, google page speed, ia_archiver, linkedinbot, mediapartners-google, msnbot, netcraftsurvey, outbrain, pinterest, quora, rogerbot, showyoubot, slackbot, slurp, sogou, teoma, tweetmemebot, twitterbot, uptimerobot, urlresolver, vkshare, w3c_validator, wordpress, wp rocket, yandex

Important: The default user-agent strings added for this setting ensure that the main search engines and other popular services are never blocked, so don’t make any changes unless you are 100% sure that you know what you are doing.

Whitelist bots by IP Address

To whitelist specific bots by their reported IP address, visit the setting “Whitelisted IPs”, and enter the IPs that you would like to always allow access. Any IPs entered in the Whitelisted IPs option will be matched against the reported IP address via regular expression. So you can do any of the following:

  • Whitelist an individual IP address, like 173.203.204.22
  • Whitelist a range of sequential IP addresses, like 173.203.
  • Whitelist a range of IP addresses in CIDR notation, like 173.203.204.22/24

Separate multiple IPs or strings with commas. Note that the plugin automatically adds your server IP address and your local IP address, if available. If you are using anything like caching, load balancing, or a reverse proxy, make sure to add their respective IPs to the whitelist.

Important: Commas are used to separate the IP addresses. Do NOT include them anywhere else.
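
The three kinds of entries described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, an entry ending in a dot is treated as a prefix, an entry containing a slash is treated as CIDR, and anything else must match exactly.

```python
import ipaddress

def ip_matches(entry: str, ip: str) -> bool:
    """Check one whitelist entry against a reported IP address."""
    entry = entry.strip()
    if "/" in entry:
        # CIDR range. strict=False accepts entries with host bits set,
        # such as 173.203.204.22/24 (treated as 173.203.204.0/24).
        network = ipaddress.ip_network(entry, strict=False)
        return ipaddress.ip_address(ip) in network
    if entry.endswith("."):
        # Partial address: matches every IP that starts with this prefix.
        return ip.startswith(entry)
    # Individual address: exact match only.
    return ip == entry

def is_whitelisted_ip(ip: str, setting: str) -> bool:
    """Check a reported IP against the comma-separated setting value."""
    entries = [e.strip() for e in setting.split(",") if e.strip()]
    return any(ip_matches(e, ip) for e in entries)

setting = "173.203.204.22, 173.203., 198.51.100.7/24"
print(is_whitelisted_ip("173.203.204.22", setting))   # exact entry
print(is_whitelisted_ip("173.203.9.1", setting))      # prefix entry
print(is_whitelisted_ip("198.51.100.200", setting))   # CIDR entry
print(is_whitelisted_ip("203.0.113.5", setting))      # no entry matches
```

Keeping the trailing dot on a partial address avoids accidental matches: a prefix of `173.203.` cannot match an address such as `173.2035...`, whereas a bare `173.203` compared as a prefix could.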

By default, Blackhole Pro whitelists the following IP addresses:

  1. The admin’s server IP address
  2. The admin’s local IP address

For more information, check out the article on whitelisting plugins.

Redirecting whitelisted bots

Along with the other whitelist settings, there is also an option to specify a custom URL to which all whitelisted bots will be redirected. Normally, whitelisted bots can access and browse your site just like any other visitor. This setting enables you to redirect all bots that match your user-agent or IP-address whitelists. Important: it is recommended to leave this setting blank unless you know what you are doing.

Related Information