
USING ROBOTS.TXT TO OPTIMIZE YOUR SITE PERFORMANCE AND REDUCE YOUR WEBSITE LOAD


Web crawlers, or so-called web spiders/robots, can place significant load on your website, especially if you use a platform such as WordPress.

 
This article contains a couple of quick ways to reduce the load from crawlers, but we recommend that you consult with a web designer/professional for more information on how to implement these changes effectively.
 
Here is a sample robots.txt which is WordPress friendly (similar rules can be applied to any website and platform).
 
You need to create this file on your computer and upload it via FTP or your control panel's file manager (in cPanel you can create the file directly using the cPanel File Manager), then place it in the root folder of each of your domain names (the same folder where your main index.php / index.html file resides).
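If you are working from a command line, the file-creation step above can be sketched as follows. This only creates a minimal robots.txt locally; you would still upload it to your document root via FTP or your file manager, and the rules shown are a shortened example, not the full file from this article:

```shell
# Create a minimal robots.txt locally (rules shortened for illustration)
cat > robots.txt <<'EOF'
User-agent: *
Crawl-delay: 30
Disallow: /wp-admin/
EOF

# Quick sanity check before uploading: the file should begin with a
# User-agent line, since rules only apply within a User-agent group
head -n 1 robots.txt
# prints: User-agent: *
```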
 
Simply copy/paste everything between the "---" lines below:
 
---
 
User-agent: *
Crawl-delay: 30
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /wp-login.php
Disallow: /wp-content/plugins/
Disallow: /comments/feed/

Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
 
 
User-agent: Yandex
Disallow: /
 
User-agent: Baiduspider
Disallow: /
 
User-agent: Googlebot-Image
Disallow: /
 
User-agent: bingbot
Crawl-delay: 10
Disallow:
 
User-agent: Slurp
Crawl-delay: 10
Disallow:
 
---

The above sample slows down the search engines so that they don't aggressively scan your site all at once (this does not affect how often a search engine will crawl your site). The code also blocks some spiders entirely: Baiduspider (a Chinese search engine; keep it blocked unless your site needs to be indexed in Chinese), Yandex (a Russian search engine; unblock it if you have a Russian website or visitors from Russia), and Googlebot-Image, which crawls your images. If you need any of these engines, simply remove its "User-agent" and "Disallow" lines and leave the remaining code.
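If you want to check how your rules will be interpreted before uploading the file, Python's standard urllib.robotparser module can evaluate a robots.txt against specific user agents. The snippet below uses a shortened version of the rules above, and example.com is a placeholder domain:

```python
import urllib.robotparser

# A shortened version of the robots.txt rules from this article
rules = """
User-agent: *
Disallow: /wp-admin/

User-agent: Baiduspider
Disallow: /
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# Baiduspider is blocked from the whole site; other bots only from /wp-admin/
print(rp.can_fetch("Baiduspider", "https://example.com/"))         # False
print(rp.can_fetch("Googlebot", "https://example.com/"))           # True
print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/"))  # False
```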

In addition, you can control how Google and Bing index your site and take measures to slow them down individually. For more info on Google, you need to set up an account with Google Webmaster Tools at http://www.google.com/webmasters/tools/ and follow their directions on how to optimize the crawling rates and speeds.

For Bing, please sign up or log in to their webmaster tools at http://www.bing.com/toolbox/webmaster and follow their directions on how to optimize the crawling settings.

There are many other optimizations you can apply to your robots.txt file, but we recommend working with professionals to get the best results and avoid breaking your website.

For advanced users:

In addition to robots.txt implementation, you can block unwanted crawlers (especially those that completely ignore your robots.txt file) using the following .htaccess code:

RewriteEngine On

# Match the user-agent string case-insensitively; an unanchored pattern such
# as "Baiduspider" already covers versioned strings like "Baiduspider/2.0",
# and "MJ12" covers "MJ12bot/v1.4.5", so one condition per bot is enough
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
RewriteRule . - [F,L]
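As a rough illustration (this is Python, not Apache itself), the RewriteCond lines above perform unanchored regex matches against the User-Agent header. The sketch below simulates that substring matching for the bot names listed in the rules; the user-agent strings are abbreviated examples:

```python
import re

# Patterns mirroring the RewriteCond lines: unanchored regexes tested
# against the raw User-Agent header, one per blocked crawler family
blocked_patterns = [r"Baiduspider", r"MJ12", r"AhrefsBot", r"Yandex"]

def is_blocked(user_agent: str) -> bool:
    """Return True if any blocked pattern occurs in the user-agent string."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in blocked_patterns)

print(is_blocked("Mozilla/5.0 (compatible; Baiduspider/2.0)"))  # True
print(is_blocked("Mozilla/5.0 (compatible; YandexBot/3.0)"))    # True
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)"))    # False
```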

 

