What is Robots.txt?
Robots.txt is a text file placed in the root directory of a website that tells search engine robots which pages of the site they may crawl and which they may not. The robots.txt file is not a hard directive, only a recommendation, but most search engines, such as Google, Yandex, Bing, and Yahoo!, respect it.
What is robots.txt for?
This file gives you control over how your website is presented in search results.
Here are a few reasons why you need robots.txt:

- Prevent unnecessary pages from being indexed: Robots.txt can help you prevent search engines from indexing pages such as site search results pages, login pages, or service pages. This can improve the relevancy of search results for your site.
- Protect sensitive information: You can use robots.txt to discourage search engines from crawling pages that contain sensitive information, such as login or account pages. Keep in mind, however, that robots.txt is publicly readable and does not actually restrict access, so it should never be the only safeguard.
- Increase crawling efficiency: Robots.txt helps search engine robots spend their crawl budget on the pages that matter by excluding unimportant ones. This can result in your important pages being discovered and indexed faster.
It’s important to note that robots.txt is not a hard-and-fast rule: search engine robots can ignore it if they see fit. However, it is still an important tool that can help you improve your website’s SEO.
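As an illustration, a minimal robots.txt covering the first two reasons might look like this (the paths /search/ and /login/ are hypothetical examples):

```text
User-agent: *
Disallow: /search/
Disallow: /login/
```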
The main directives of Robots.txt
User-agent:
This directive is used to specify to which search robots the following rules apply.
- User-agent: *: This entry applies to all search robots.
- User-agent: Googlebot: This entry applies only to the Google search robot.
Disallow:
This directive is used to prevent search robots from crawling certain pages or folders on your site.
- Disallow: /admin/: This entry prevents search robots from crawling any pages in the /admin/ folder.
- Disallow: /images/: This entry prevents search robots from crawling any files in the /images/ folder.
Allow:
This directive is used to permit search robots to crawl certain pages or folders that would otherwise be blocked by a broader Disallow rule.
- Allow: /index.html: This entry allows search robots to crawl the /index.html page, even if it sits inside a disallowed folder.
Sitemap:
This directive is used to point search engine robots to the location of your sitemap.
- Sitemap: https://www.site.com/sitemap.xml: This entry tells search robots that the sitemap is located at https://www.site.com/sitemap.xml.
Crawl-delay:
This directive tells search robots how many seconds to wait between requests to your site. Note that Googlebot ignores Crawl-delay; it is honored by some other crawlers, such as Bing.
- Crawl-delay: 10: This entry tells supporting robots to wait 10 seconds between requests to your site.
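Putting these directives together, a complete robots.txt built from the examples above might look like this:

```text
User-agent: *
Allow: /index.html
Disallow: /admin/
Disallow: /images/
Crawl-delay: 10

Sitemap: https://www.site.com/sitemap.xml
```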
Important:
- The robots.txt file is not a 100% reliable way to deny access to pages: the file itself is publicly readable, and blocked pages can still appear in search results if other sites link to them.

- It is not recommended to use robots.txt to block important pages of your website.
- It is recommended to create a backup copy before making changes to robots.txt.
How to create a robots.txt file:
- Create a text file named “robots.txt”.
- Add the robots.txt directives to the file.
- Save the file in the root directory of your website.
Once the file is uploaded and accessible at the root of your domain (e.g. https://www.site.com/robots.txt), search engine robots will fetch it and follow its rules when crawling your site.
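Before relying on the file, you can sanity-check your rules locally. This is a minimal sketch using Python's standard-library urllib.robotparser to parse a robots.txt and ask which URLs a given robot may fetch; the rules and URLs are made up for illustration. Note that Python's parser applies rules in file order (first match wins), so the narrower Allow line is placed before the broader Disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, supplied as a list of lines.
rules = """\
User-agent: *
Allow: /admin/help.html
Disallow: /admin/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Ask whether a given user agent may fetch specific URLs.
print(rp.can_fetch("Googlebot", "https://www.site.com/index.html"))       # True
print(rp.can_fetch("Googlebot", "https://www.site.com/admin/users"))      # False
print(rp.can_fetch("Googlebot", "https://www.site.com/admin/help.html"))  # True
```

Keep in mind that real search engines differ in matching details (Google, for instance, uses longest-match precedence rather than file order), so this is a quick check, not a guarantee of how any particular crawler will behave.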
Common mistakes when setting up Robots.txt
Incorrectly setting up robots.txt can lead to important pages of your website not being indexed, which will negatively affect your website’s SEO.
- Blocking important pages:
The most common mistake is blocking important pages on your site, such as product pages, categories or contact information. This can lead to users not being able to find your site using search engines.
- Incorrect use of the Disallow directive:
The Disallow directive is used to prevent search engines from indexing certain pages. Incorrect use of this directive can lead to blocking of important pages.
- Lack of a sitemap:
A sitemap is a file that lists all the pages on your website. Having a sitemap helps search engines find and index all the pages on your site.
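For example, a single misplaced character in a Disallow rule can block an entire site. The two snippets below contrast the mistake with the intent (the /tmp/ folder is a hypothetical example):

```text
# Blocks the whole site — usually a mistake:
User-agent: *
Disallow: /

# Blocks only the /tmp/ folder:
User-agent: *
Disallow: /tmp/
```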
To avoid these errors:
- Before editing robots.txt, read the Google Search Console documentation.
- Use the Disallow directive only to block pages that should not be indexed.
- Create a sitemap and submit it to Google Search Console.
Customizing robots.txt can be a daunting task. If you are not sure how to do it, it is recommended to contact an SEO specialist.
Conclusion
By properly customizing the Robots.txt file, you can effectively control the access of search engine robots to different parts of your website. By using this file, you will keep some pages private, improve site indexing and prevent unwanted indexing. Remember that proper Robots.txt setup is an important part of your website’s SEO strategy.