This post will discuss how your SEO can be negatively affected if you block content using the robots.txt file on your website.

Accidentally blocking website content using the robots.txt file can prevent search engines from crawling your content, so they cannot read it or present it properly in search results. In other words, Google can still index a URL it knows about (for example, because other pages link to it), but the block in the site’s robots.txt file stops Google from reading the page, so the result in SERPs may show little or no description.

Knowing the implications of blocking crawlers matters because search engines need to be able to find the information on your site.

How to fix “Indexed, though blocked by robots.txt” in Google Search Console

What is a robots.txt file?

A robots.txt file is a text file that tells robots (search engine crawlers) which pages on your site they should crawl and which they should not. In the example below, we can see that Walmart is telling crawlers not to visit the URL “/cart” by using the disallow rule. This also implies that since the robots are unable to crawl the page, they shouldn’t be able to index it. However, this is not always the case.

Walmart robots.txt file
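
As a rough illustration of how a disallow rule is evaluated, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules in `SAMPLE_ROBOTS_TXT` and the `example.com` URLs are hypothetical, modelled on the “/cart” example rather than copied from Walmart's real file. Python's parser uses simple prefix matching, so it only approximates how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules modelled on the "/cart" example; not a real site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given rules let `user_agent` crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network needed
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/cart", "https://example.com/products/tv"):
        print(url, "crawlable:", is_crawlable(SAMPLE_ROBOTS_TXT, url))
```

A result of `False` only tells you the URL cannot be crawled. It says nothing about whether the URL is indexed.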

How to fix “Indexed, though blocked by robots.txt”

1. Export or Sort through the URLs with errors in GSC

Export the list of URLs from Google Search Console. Make sure you want these pages indexed. If you don’t want these pages indexed, ignore the warning. If you do want these pages in the SERPs (search engine result pages, aka Google Search Results), check a few pages from the exported list for any settings in your CMS that could have added a no-index tag/setting to the page.

Often, a no-index setting was added in your Page/Post Settings. If you are using WordPress and the Yoast plugin, check advanced settings for no-index settings.

Allow search engines to show this Post in search results? No
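
If the exported list is long, grouping the URLs by site section can show at a glance where the affected pages live. The sketch below assumes the export is a CSV with a column named `URL`; that column name and the sample rows are assumptions for illustration, so adjust `url_column` to match your own file.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse


def count_by_section(csv_text: str, url_column: str = "URL") -> Counter:
    """Count URLs per first path segment, e.g. '/blog' or '/cart'."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = urlparse(row[url_column]).path
        first_segment = path.strip("/").split("/")[0]
        counts["/" + first_segment] += 1
    return counts


# Hypothetical export contents, for illustration only.
SAMPLE_EXPORT = """\
URL,Last crawled
https://example.com/blog/post-a,2023-01-01
https://example.com/blog/post-b,2023-01-02
https://example.com/cart,2023-01-03
"""

if __name__ == "__main__":
    for section, total in count_by_section(SAMPLE_EXPORT).most_common():
        print(section, total)
```

Sections you never meant to have indexed (a cart, for instance) can then be ignored, and the rest checked more closely.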

2. Check for Disallow text in robots.txt

The easiest way to figure out if you are blocking pages is to use the robots.txt tester in GSC, which identifies any issues.

If you know what you’re looking for, you can navigate to yourdomain.com/robots.txt to read the file. You are looking for a Disallow rule that matches the affected URLs. The broadest one blocks the whole site, and it is the code you will want to remove:

Disallow: /
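
To spot a site-wide block quickly, a small script can scan the file for it. The following is a minimal sketch, not a full robots.txt parser: the function name `find_blanket_disallows` is hypothetical, and it only reports user-agent groups that contain a bare `Disallow: /` line.

```python
def find_blanket_disallows(robots_txt: str) -> list[str]:
    """Return the user-agents whose rule group contains a bare 'Disallow: /'."""
    blocked = []
    current_agents = []
    in_rules = False  # True once the current group's rule lines have started
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                for agent in current_agents:
                    if agent not in blocked:
                        blocked.append(agent)
    return blocked


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /\n\nUser-agent: Googlebot\nDisallow: /cart\n"
    print(find_blanket_disallows(sample))
```

An empty `Disallow:` line blocks nothing, so the sketch deliberately ignores it.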

How to edit your Robots.txt file

To edit your Robots.txt file, you will need access to FTP, the File Manager in cPanel, a File Manager plugin, or Yoast (outlined below).

Using WordPress + Yoast SEO Plugin to edit your Robots.txt

If you’re using the Yoast SEO plugin, follow the steps below to edit your robots.txt file.

  1. Go to the Yoast plugin in WordPress; it’s named “SEO” in the sidebar
  2. Click on Tools
  3. Select File Editor
  4. Edit your Robots.txt file
  5. Press Save
Tools: Yoast Plugin robots.txt editor

Rank Math

If you use the Rank Math SEO plugin, follow these steps:

  1. Log in to WordPress
  2. In the sidebar, go to the Rank Math plugin
  3. Click on General Settings
  4. Select Robots.txt

All in One SEO

To edit your robots.txt file using the All in One SEO plugin:

  1. Log in to WordPress
  2. In the sidebar, go to the All in One SEO plugin
  3. Select Robots.txt

3. WordPress is set to No-Index

Quite often, when a website has been redesigned, the developer forgets to uncheck the no-index checkbox in WordPress. Developers set the website to no-index in staging because they don’t want the staging server to be visible to the public, nor do they want the site to be seen as duplicate content.

How to fix WordPress websites set to no-index

  1. Log in to WordPress
  2. Go to Settings
  3. Select the Reading tab
  4. Uncheck “Discourage search engines from indexing this site”
WordPress: Search engine visibility

4. Your website is adding intermittent blocks

How to fix intermittent blocks

If you are receiving intermittent blocks, check whether you’ve been pushing a staging server into production and back again. A developer could be adding and removing a no-index tag during this process, which sends intermittent blocking signals to crawlers. It’s best to clear your cache, validate within GSC and check with your developers.

5. A user-agent rule is blocking the crawler

How to fix user-agent blocks

User-agent blocks can be challenging to pinpoint. The first thing you should check is your .htaccess file.

You can access the .htaccess file using an FTP manager with “View Invisible files” turned on or using the Yoast file editor tool (mentioned above).

Within the .htaccess file, you should be able to locate any user-agent rules that block crawlers.

If you have a WordPress website, the file should look like this:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
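
Anything beyond this default that mentions the user agent deserves a closer look. As a rough aid, the sketch below flags lines in an .htaccess file that use common Apache ways of matching on the User-Agent header (`%{HTTP_USER_AGENT}`, `SetEnvIf`/`SetEnvIfNoCase User-Agent`, `BrowserMatch`). The function name and the sample rule are hypothetical, and a flagged line is only a candidate: you still need to read it to see whether it actually blocks a crawler.

```python
import re

# Apache directives that commonly key off the User-Agent request header.
USER_AGENT_PATTERN = re.compile(
    r"HTTP_USER_AGENT|SetEnvIf(NoCase)?\s+User-Agent|BrowserMatch(NoCase)?",
    re.IGNORECASE,
)


def find_user_agent_rules(htaccess_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for lines that match on the user agent."""
    hits = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):  # skip comments
            continue
        if USER_AGENT_PATTERN.search(stripped):
            hits.append((number, stripped))
    return hits


# Hypothetical .htaccess content with a crawler block, for illustration only.
SAMPLE_HTACCESS = """\
# BEGIN WordPress
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot) [NC]
RewriteRule .* - [F,L]
# END WordPress
"""

if __name__ == "__main__":
    for number, line in find_user_agent_rules(SAMPLE_HTACCESS):
        print(f"line {number}: {line}")
```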

If you run into trouble, you can back up and then delete the .htaccess file and save the permalinks in WordPress; this will force WP to create a new .htaccess file.

If you DO NOT want the URL indexed

When you block a page from being crawled, Google may still index it because crawling and indexing are two different things. If you want the page no-indexed, add a no-index tag to the page’s <head>, and make sure the page is not blocked in robots.txt, because Google has to crawl the page to see the tag.

Here is how to block search engines:

Add a <meta> tag

To prevent search engine crawlers from indexing a page on your website, add the following meta tag into the <head> section of your page:

<meta name="robots" content="noindex">

To prevent only Google crawlers from indexing a page:

<meta name="googlebot" content="noindex">

You should be aware that some search engine crawlers might ignore the no-index tag and your page might still appear in search results from other search engines.
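
To confirm that a page really carries (or no longer carries) one of these tags, you can inspect its HTML. The sketch below uses Python's standard-library `html.parser`. The class and function names are hypothetical, and it only looks at `<meta>` tags in the HTML you pass in, so it will not see directives set any other way.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": ["noindex"]}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        name = attributes.get("name", "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content", "").lower()
            self.directives.setdefault(name, []).extend(
                part.strip() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str, crawler: str = "robots") -> bool:
    """Return True if the HTML has a noindex meta tag addressed to `crawler`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives.get(crawler, [])


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(page))
```

Pass `crawler="googlebot"` to check for the Google-only variant of the tag.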

No-index a page using Yoast

How to No-Index a specific page using the Yoast Plugin:

  1. Find the page/post you want to edit
  2. Press edit in WordPress
  3. Scroll to the bottom of the page
  4. Click on the Advanced dropdown
  5. Set “Allow search engines to show this Page in search results?” to NO
Yoast Plugin No-index setting

Conclusion

Hopefully, this guide helped you to resolve the Google Search Console warning. If not, you can always contact me for additional troubleshooting and support.