If things are radically out of whack, you can download a table of indexed pages from Webmaster Central and diagnose, on a page-by-page level, what is or isn't in the index.
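To make that page-by-page comparison less tedious, a set difference between the URLs you expect to be indexed and the URLs in the exported table does most of the work. This is only a minimal sketch: the file names and the one-URL-per-row CSV format are assumptions, not the actual Webmaster Central export format.

```python
# Minimal sketch: compare the pages you expect to be indexed against an
# exported list of indexed URLs. File names and the one-URL-per-row CSV
# layout are assumptions for illustration only.
import csv

def load_urls(path):
    """Read a single-column CSV of URLs into a set."""
    with open(path, newline="") as f:
        return {row[0].strip() for row in csv.reader(f) if row}

site_urls = load_urls("all_site_urls.csv")      # e.g. dumped from your CMS or sitemap
indexed_urls = load_urls("indexed_pages.csv")   # hypothetical export of indexed pages

missing = site_urls - indexed_urls   # pages Google apparently doesn't know about
orphaned = indexed_urls - site_urls  # indexed pages no longer in your own URL list

print(f"{len(missing)} pages missing from the index")
print(f"{len(orphaned)} indexed pages not in your current URL list")
```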
Next, you want to try to do a full crawl of the website using something like Xenu. While it's usually used to check for broken links, it does crawl the whole website in the process. If you have a large website, you are going to want to limit the crawling.
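If you want a feel for what a limited crawl looks like under the hood, here is a minimal standard-library sketch of a breadth-first crawl capped by page count and depth. It stands in for a tool like Xenu rather than reproducing it, and it skips things a real crawler needs, such as robots.txt handling and politeness delays.

```python
# Minimal sketch of a depth- and page-limited crawl of a single host.
# Standard library only; real crawlers need robots.txt handling, rate
# limiting, and better error reporting.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=200, max_depth=3):
    """Breadth-first crawl restricted to the start URL's host."""
    host = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([(start_url, 0)])
    while queue and len(seen) <= max_pages:  # soft cap on discovered pages
        url, depth = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue  # broken link or fetch error
        if depth == max_depth:
            continue  # don't expand links beyond the depth limit
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return seen

# pages = crawl("http://www.example.com/", max_pages=500, max_depth=3)
```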
Another product that I like to use is Website Auditor. One of the interesting things about Website Auditor is that you can specify crawling depth, which is how deep you want a crawl to go. Start at the homepage and go only one level. Run it again, this time with 2 levels, then 3. Additionally, use your Webmaster Central report on your most-linked pages (think of them as link hubs). If your important pages aren't within 2-3 clicks of the linking hubs on your website, you will have problems. IMHO, it's more important than ever to cultivate deep linking and to use it to spread your link equity, inbound trust, and authority wisely around your website.
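One way to check that 2-3 click rule yourself is to compute each page's click depth from the homepage over the link graph your crawl produced. This is a sketch under the assumption that you already have the graph as a simple page-to-links mapping; the function name and example URLs are illustrative.

```python
# Minimal sketch: given a link graph (page -> list of pages it links to),
# compute each page's click depth from the homepage and flag anything more
# than three clicks away. The graph is assumed to come from a crawl.
from collections import deque

def click_depths(link_graph, homepage):
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# link_graph = {"/": ["/products", "/blog"], "/blog": ["/blog/post-1"]}
# depths = click_depths(link_graph, "/")
# too_deep = [page for page, d in depths.items() if d > 3]
```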
Pages with the most links pointing at them are going to get crawled more frequently, and pages with the most trust and authority are going to get crawled most often. Pages that are linked to from those linking hubs, or trusted and authoritative hubs, will get crawled next most frequently. At each step away from the linking hubs, or authority points, crawl frequency decreases; think of it like a classic PageRank model.
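To make the PageRank analogy concrete, here is a simplified power-iteration sketch over the same link graph. It is illustrative only: dangling pages simply leak rank in this version, and the 0.85 damping factor is the conventional textbook value, not anything crawl-specific that Google has confirmed.

```python
# Minimal, simplified PageRank iteration over a link graph, to illustrate how
# authority (and, by this model, crawl attention) thins out the further a page
# sits from the hubs that link to it.
def pagerank(link_graph, damping=0.85, iterations=50):
    pages = set(link_graph) | {p for targets in link_graph.values() for p in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, targets in link_graph.items():
            if not targets:
                continue  # dangling pages leak rank in this simplified version
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# top_hubs = sorted(pagerank(link_graph).items(), key=lambda kv: kv[1], reverse=True)[:10]
```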
By Michael Gray | http://www.wolf-howl.com/seo/diagnose-improve-crawling/
When you are reviewing a website, whether for your own projects or for a client project, one of the important areas to review is crawlability. In this post I’d like to talk about some of the ways you can look for and diagnose crawling issues.
The first step to diagnosing a crawling problem is to use a simple [site:example.com] search and compare how many pages you really have with how many Google thinks you have. Bear in mind that this number is an estimate; what you are trying to get is a rough idea of how many pages Google knows about, as Matt Cutts recently discussed in a Webmaster Central video.
If you have several hundred or several thousand pages but Google only shows 100, then you have a problem. Depending on how large the site is, an estimate that comes within 10-30% of your real page count is a good rule of thumb.
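If you want to keep a record of this check, a tiny helper like the one below works: you type in your real page count and the [site:] estimate by hand. The 30% tolerance is just the rule of thumb above, not a published figure, and the function name and numbers are illustrative.

```python
# Minimal sketch: compare your real page count against the estimate from a
# [site:example.com] search, entered by hand, and flag large gaps.
def index_coverage(actual_pages, estimated_indexed, tolerance=0.30):
    gap = abs(actual_pages - estimated_indexed) / actual_pages
    status = "looks fine" if gap <= tolerance else "worth investigating"
    return gap, status

gap, status = index_coverage(actual_pages=2400, estimated_indexed=1900)
print(f"Discrepancy: {gap:.0%} -> {status}")
```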
The second thing you want to look at is Webmaster Central. If you submit a sitemap, Google tells you how many URLs you submitted and how many are in the index. The closer those numbers are, the better. Don't worry if it's not a 100% match, because sometimes you include pages in your sitemap that get blocked at the page level with a robots meta tag. At this point, you are just concerned with gross numbers.
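To know the "submitted" side of that comparison without trusting your memory, you can count the URLs in your local sitemap file. The sketch below assumes a standard <urlset> sitemap rather than a sitemap index file.

```python
# Minimal sketch: count the URLs in a local sitemap.xml so you have the
# "submitted" figure to set against the "indexed" count Webmaster Central reports.
import xml.etree.ElementTree as ET

def count_sitemap_urls(path):
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.parse(path).getroot()
    return len(root.findall("sm:url/sm:loc", ns))

submitted = count_sitemap_urls("sitemap.xml")
print(f"{submitted} URLs submitted in the sitemap")
```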