If you’ve been following the series, you’ll remember that we created and submitted XML sitemaps in the last post. Before I move on to the spider simulation, let’s check on the status of the sitemap. Navigate to your Google Webmaster Tools page and make note of any messages or warnings you see there. In my case, there is nothing new to report. No news is good news!
The Google Webmaster Tools dashboard is a great way to quickly spot potential issues such as crawl errors or problems with your sitemap. The middle chart depicts the number of times my site is showing up in Google search. It’s low, but that’s to be expected with a new campaign. My focus is on the right graph, which shows the status of my sitemap.
It looks like things are running smoothly in the sitemap department, but my next step is to simulate a crawl of my website using a tool called Screaming Frog.
Screaming Frog SEO Spider
The free version of Screaming Frog will crawl up to 500 URLs (pages), but since my website is new, that’s plenty. You can download Screaming Frog here: http://www.screamingfrog.co.uk/seo-spider/.
Starting from my home page, as directed, the spider will hop through hyperlinks much like Googlebot or another search engine spider would. Sitemaps and crawling are two ways that spiders can discover and report on a website’s content. For this exercise I’m going to filter my results to show only HTML. In other words, my pages. Screaming Frog found about 20 pages versus the 7 pages Google Webmaster Tools is showing in my sitemap.
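To make the crawling idea concrete, here is a minimal sketch of a link-following spider in Python. This is not how Screaming Frog or Googlebot actually work internally; it just illustrates the hop-through-hyperlinks idea. The start URL and page limit are made-up values for the example.

```python
# Minimal link-following crawler sketch (illustration only, not Screaming Frog).
# Starts at one page, follows same-site hyperlinks breadth-first,
# and stops after a small page limit.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

START_URL = "http://example.com/"   # placeholder: your home page
MAX_PAGES = 20                      # placeholder: small demo limit

class LinkParser(HTMLParser):
    """Collects href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages):
    site = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            with urlopen(url, timeout=10) as resp:
                # Filter to HTML pages, like filtering the crawl report to HTML.
                if "text/html" not in resp.headers.get("Content-Type", ""):
                    continue
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip pages that fail to load
        print(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Stay on the same site, like a site-limited spider.
            if urlparse(absolute).netloc == site and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

if __name__ == "__main__":
    crawl(START_URL, MAX_PAGES)
```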
Earlier in the series I used Yoast’s WordPress SEO plugin to give the search engine spiders directions. I asked them (not ordered them) not to index certain types of pages: tags and categories. I also instructed my WordPress installation to exclude these post types from the sitemap. The image below shows that even though these pages were found, the directive “noindex” is present.
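A quick way to double-check the directive outside of any tool is to fetch one of those tag or category pages and look for a robots meta tag in its HTML (Yoast emits something along the lines of <meta name="robots" content="noindex,follow">). Here is a small Python sketch that does exactly that; the example URL is a placeholder, and the parsing is deliberately simple.

```python
# Check whether a page carries a "noindex" robots directive.
# Looks at the X-Robots-Tag response header and at robots <meta>
# tags in the HTML. The URL below is a placeholder for illustration.
from html.parser import HTMLParser
from urllib.request import urlopen

class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots" ...> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.directives.append(a.get("content") or "")

def is_noindexed(url):
    with urlopen(url, timeout=10) as resp:
        header = resp.headers.get("X-Robots-Tag", "")
        html = resp.read().decode("utf-8", errors="replace")
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = [header] + parser.directives
    return any("noindex" in d.lower() for d in directives)

# Example (placeholder URL):
# print(is_noindexed("http://example.com/category/news/"))
```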
Looking good! Let’s do one final check to make sure this piece of the puzzle really is falling into place. I mentioned earlier that directives like “noindex” are requests, not commands. It’s up to the search engine to honor them, so I’m going to do a site search of this domain to see what Google has in its index. (Google is, at its core, one very big database.)
You can see what Google has indexed for a website by typing site:http://yourwebsite as the search string. In my case, it’s site:http://marketingchris.com.
I see that there are 14 pages indexed versus 7 pages in my sitemap. So what happened? Google must have crawled my site before I updated the settings asking it not to include categories in its index. Not a big deal! I know that these instructions are in place now, so I just need to be patient and wait for the next crawl.
Fetch as Google
There is a tool in Google Webmaster Tools for the impatient SEO, like me, called Fetch as Google.
I want Googlebot to re-crawl my content and see that my requests about what to index have changed. Please let me know if you have any questions and I’ll see you next time!