How to Diagnose and Improve Website Crawling

Sitemap Statistics

If things are radically out of whack, you can download a table of indexed pages from Webmaster Central and diagnose page by page to see what is or isn’t in the index.
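A page-by-page comparison like this boils down to a set difference. Here is a minimal sketch, assuming you already have two plain lists of URLs: one from your own crawl or CMS and one exported from Webmaster Central. The function name and the trailing-slash normalization are my own illustration, not anything the export provides.

```python
def index_gaps(site_urls, indexed_urls):
    """Compare the pages you have with the pages reported as indexed.

    Returns (missing, unexpected):
      missing    -- pages on the site that are not in the index export
      unexpected -- indexed pages you did not list as part of the site
    Trailing slashes are stripped so /page and /page/ compare as equal
    (an assumption; adjust if your site treats them as different pages).
    """
    site = {url.rstrip("/") for url in site_urls}
    indexed = {url.rstrip("/") for url in indexed_urls}
    return sorted(site - indexed), sorted(indexed - site)


# Hypothetical example data
missing, unexpected = index_gaps(
    ["http://example.com/a", "http://example.com/b/"],
    ["http://example.com/b", "http://example.com/old"],
)
print("Not indexed:", missing)
print("Indexed but not on your list:", unexpected)
```

The "unexpected" list is often as interesting as the "missing" one, since it tends to surface old or duplicate URLs.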

Next, try a full crawl of the website using something like Xenu. It’s usually used to check for broken links, but in the process it crawls the whole website. If you have a large website, you will want to limit the crawl.

Another product that I like to use is Website Auditor. One of the interesting things about Website Auditor is that you can specify crawling depth, which is how deep you want a crawl to go. Start at the homepage and go only one level. Run it again, this time with 2 levels, then 3. Additionally, use your Webmaster Central report on most linked pages (think of them as link hubs). If your important pages aren’t within 2-3 pages of linking hubs on your website, you will have problems. IMHO it’s more important than ever to cultivate deep linking and to use that deep linking to spread your link equity, inbound trust, and authority wisely around your website.
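If you can export your internal links from a crawler, the "within 2-3 pages of a hub" check can be sketched as a breadth-first search. This is only an illustration: the `links` dictionary, the page paths, and the function names are hypothetical, and it is not how Website Auditor itself works.

```python
from collections import deque


def click_depths(links, hubs):
    """Breadth-first search outward from the hub pages.

    links -- dict mapping each page to the pages it links to
    hubs  -- pages treated as depth 0 (homepage, most linked pages)
    Returns a dict of page -> number of clicks from the nearest hub.
    """
    depths = {hub: 0 for hub in hubs}
    queue = deque(hubs)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, hubs, important_pages, max_depth=3):
    """List important pages that are unreachable or deeper than max_depth."""
    depths = click_depths(links, hubs)
    return [p for p in important_pages
            if depths.get(p, float("inf")) > max_depth]


# Hypothetical internal link graph
links = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": ["/deep-page"],
}
print(too_deep(links, ["/"], ["/products", "/deep-page", "/orphan"]))
```

Pages that never show up in the search at all (like `/orphan` here) are reported too, since a page no hub can reach is an even bigger problem than one that is merely deep.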

In recent years Google has done away with the term/classification “supplemental index.” IMHO this was more of a public relations move, as they just grew tired of hearing from people who were upset that any part of their site was in the supplemental index–but I digress. There are certain parts of your website that aren’t as important as others or, as in the case of say a privacy policy, are important to people but not for rankings. To help you understand what pages Google thinks are important, you need to look at last crawl date in the Google Cache.

Pages that have the most links are going to get crawled more frequently. Pages that have the most trust and authority are going to get crawled most often. Pages that are linked to from those linking hubs, or trusted and authoritative hubs, will get crawled next most frequently. At each step away from the linking hubs, or authority points, crawl frequency will decrease–think of it like a classic PageRank model.


Michael Gray - Graywolf's SEO Blog


By Michael Gray | http://www.wolf-howl.com/seo/diagnose-improve-crawling/

When you are reviewing a website, whether for your own projects or for a client project, one of the important areas to review is crawlability. In this post I’d like to talk about some of the ways you can look for and diagnose crawling issues.

If your important pages aren’t within 2-3 pages of linking hubs on your website, you will have problems …

The first step to diagnosing a crawling problem is to use a simple [site:example.com] search and compare how many pages you really have with how many Google thinks you have. Now, bear in mind that this number is an estimate. What you are trying to do is get a rough estimate of how many pages Google knows about, as Matt Cutts recently discussed in a Webmaster Central video.

If you have several hundred or several thousand pages but Google only shows 100, then you have a problem. Depending on how large the site is, an estimate that lands within 10-30% of your real page count is a good rule of thumb.
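To make the rule of thumb concrete, here is a small sketch that treats the 10-30% figure as the allowed gap between your real page count and the site: estimate. That reading of the rule is an assumption, and the function names and default tolerance are purely illustrative.

```python
def index_estimate_gap(actual_pages, estimated_pages):
    """Relative gap between your real page count and the site: estimate."""
    if actual_pages <= 0:
        raise ValueError("actual_pages must be positive")
    return abs(actual_pages - estimated_pages) / actual_pages


def looks_healthy(actual_pages, estimated_pages, tolerance=0.30):
    """True when the estimate is within the tolerance (30% by default)."""
    return index_estimate_gap(actual_pages, estimated_pages) <= tolerance


# 1,000 real pages, Google shows about 800: a 20% gap, probably fine
print(looks_healthy(1000, 800))
# 1,000 real pages, Google shows about 100: a 90% gap, time to dig in
print(looks_healthy(1000, 100))
```

For a small site you might tighten the tolerance toward 0.10; for a very large one the looser 0.30 is more forgiving of how rough the estimate is.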

The second thing you would want to look at would be Webmaster Central. If you submit a sitemap, Google tells you how many URLs you submitted and how many are in the index. The closer those numbers are, the better. Don’t worry if it’s not a 100% match because sometimes you include pages in your sitemap that get blocked at the page level with a robots meta tag. At this point, you are just concerned with gross numbers.
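If you want to double-check the "submitted" side of that comparison yourself, you can count the URLs in your sitemap file directly. The sketch below assumes a standard XML sitemap using the sitemaps.org namespace; the sample document and the indexed count of 1 are made up, and the indexed number would still come from the Webmaster Central report.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(xml_text.strip())
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def indexed_share(submitted, indexed):
    """Fraction of submitted URLs reported as indexed (0.0 if none submitted)."""
    return indexed / submitted if submitted else 0.0


# Hypothetical two-page sitemap
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>
"""

urls = sitemap_urls(SAMPLE)
print(len(urls), "URLs submitted")
print("Share indexed:", indexed_share(len(urls), 1))
```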

The Rise of Page View Journalism

In the early days of newspapers, success and advertising were measured by total circulation. The ability to measure how many people were reading just the business section, lifestyle section, or sports section didn’t exist. As more consumers switch their news reading habits to online consumption, our ability to track which sections and pages are being read has improved. However, this enhanced tracking has a dark side: the rise of page view journalism. Simply put, page view journalism is the deliberate creation of stories that are designed to increase page views. It often results in an increase of low-quality, high-volume reporting and off-topic stories.

people will have to reach the conclusion that there is some quality news that is worth paying to have access to …

While page view journalism is often cited as the primary cause of Demand Media-style content, the fact is it’s so pervasive now that it has almost become the norm. Look at the homepage of Techmeme on any given day and you’ll see an increasingly large number of websites trying to siphon off some of that traffic by “reblogging” the top stories of the day, adding little or no value to the discussion. While rebloggers are at the lower end of the food chain, page view journalism also occurs at the top. Techcrunch, for example, covers in voluminous detail almost every story that is even slightly connected to Twitter. It wouldn’t surprise me if MG Siegler did an exposé on how Mary in the AP department at Twitter killed the staple market by switching to paper clips. Don’t laugh…it’s not that far-fetched.

Want an example of how to lose your focus? Check out Mashable, a site that regularly stretches to cover things like Tiger Woods and Fashion Week in an effort to bolster page views. The king of page view media is the Huffington Post, which reblogs, over-covers everything, and has gone off-topic so much it no longer has a main topic.

if you aren’t paying something, then you aren’t a customer: you are the product that’s being sold…



How to Do A Content Audit of Your Website

If you have a website that’s been around for a few years and you’re looking for ways to make some improvements, one of the tactics I recommend is doing a content audit.

When you do a content audit you have a few goals in mind:

* Get rid of any low quality or unimportant pages
* Look for pages or sections that can be improved or updated
* Improve your rankings by more effectively using your link equity, internal anchor text, and interlinking your content

Get the Data

your inbound link equity can only support a certain number of pages …

The first thing you need to do is to get an understanding of where your website currently stands. You’ll need a list of the pages of your website, the number of inbound links, and the number of visitors each page receives. If you are using Webmaster Central, you can export a spreadsheet of all the pages with the number of links. The next thing you have to do is add a column for page views. I like to use a timeframe between a year and a year and a half.

Depending on the number of pages your website has, it could take a while to get all this data. This is the perfect task for an intern or outsourced labor from a place like oDesk. I recently performed this task on a website that has 1,800 URLs. It cost me $75, and I had the data back in just over 24 hours.

Identify the Low Performing Pages

The two primary factors I like to look at are how many links a post/page has and how much traffic it generated in the past 18 months. Any page that generated fewer than 100 page views is a candidate for deletion. Additionally, any page that attracted fewer than 25 links is also a candidate for deletion.
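Once the spreadsheet is assembled, flagging the candidates is easy to script. This is a rough sketch that assumes the data has been saved as a CSV with `url`, `links`, and `page_views` columns; those column names and the sample rows are hypothetical, and the thresholds are simply the ones mentioned above.

```python
import csv
import io


def deletion_candidates(rows, min_views=100, min_links=25):
    """Flag pages that fall under either threshold.

    rows -- iterable of dicts with 'url', 'links' and 'page_views' keys
    Returns a list of (url, reasons) tuples in the original row order.
    """
    flagged = []
    for row in rows:
        reasons = []
        if int(row["page_views"]) < min_views:
            reasons.append("low traffic")
        if int(row["links"]) < min_links:
            reasons.append("few links")
        if reasons:
            flagged.append((row["url"], reasons))
    return flagged


# Hypothetical export covering the past 18 months
SAMPLE_CSV = """url,links,page_views
/popular-post,140,5200
/quiet-post,3,40
/linked-but-unread,60,12
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for url, reasons in deletion_candidates(rows):
    print(url, "->", ", ".join(reasons))
```

Treat the output as a review list rather than a delete list; a page that misses both thresholds is a stronger candidate than one that misses only one.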



By Michael Gray
http://www.wolf-howl.com/seo/content-audit-website/


Can Google Detect an Affiliate Website

Now if the folks at Sitonomy can detect that 4% of the* links on a page are from CJ, I’m positive that Google can as well. I’m sure Google can tell at the page level and for the site as a whole. I’m also quite sure Google has an idea of the point, whether by percentage or by total number of links, at which a site becomes an affiliate website. It would also be fairly easy to say, once you cross that threshold, you need a higher level of trust to rank for competitive terms. This is one of the reasons I strongly disagree with Lori Weiman, who says affiliates should never cloak links.

*UPDATED: the % is of total links scanned, not just links on the page. My bad.

So what are the takeaways here:

* Use a tool like Sitonomy to check your most important pages and see what they are able to find as far as affiliate links
* Look into redirection tools that mask your links, and make sure you block them from search engine spiders
* Obfuscate some of your other links as well even if they aren’t affiliate links: people should always be unsure of your intent
* Always make sure you comply with FTC regulations for disclosure. If needed, use a nice non-machine-readable graphic for maximum stealthiness
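For the second takeaway, it is worth verifying that your redirect paths really are blocked. Here is a minimal sketch using Python's standard `urllib.robotparser`; the `/go/` redirect directory, the sample robots.txt, and the `example.com` host are all hypothetical stand-ins for whatever your redirection tool uses.

```python
from urllib.robotparser import RobotFileParser


def crawlable_paths(robots_txt, paths, user_agent="Googlebot",
                    site="http://example.com"):
    """Return the paths that the given robots.txt still lets a spider fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if parser.can_fetch(user_agent, site + p)]


# Hypothetical robots.txt that blocks a /go/ redirect directory
ROBOTS = """User-agent: *
Disallow: /go/
"""

# Anything printed here is still open to spiders
print(crawlable_paths(ROBOTS, ["/go/widget", "/reviews/widget"]))
```

If one of your masked link paths shows up in the output, the block is not doing what you think it is.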



By Michael Gray
http://www.wolf-howl.com

One of the questions that often comes up is: does Google hate affiliate websites, and are they penalized in the algorithm?

I’m also quite sure Google has an idea at what point, whether by percentage or by total number of links, that a site becomes an affiliate website

The answer to that is slightly nuanced but, for simplicity’s sake, they don’t hate affiliate websites. Nor have I seen any evidence that shows affiliate sites are penalized. What Google does hate is thin affiliate websites with little or no trust. However, a better question to ask is: can Google detect affiliate websites, and can they make it harder for affiliate websites to rank? But those are entirely different questions.

If you’ve read the leaked quality rater guide from 2009, you’ll see that Google has set up a lot of hurdles specifically making it harder for affiliate websites to “pass” the sniff test. One of the quickest and easiest ways that Google can determine an affiliate website is through “naked” links to common affiliate programs like Linkshare, CJ, ShareASale, and others. But, really, how good can Google be at detecting those links? Well, here’s a publicly available free tool put out by Sitonomy that checks what types of programming tools are being used by a website.

Adsense: Why Bloggers Don’t Get It

You may post about commercially related subjects like your job, what you like to buy, or even your hobbies. However, these posts are all about your life; they are no more commercially viable or attractive than, say, Aunt Millie’s Holiday Newsletter. Yes, we all have an Aunt Millie in our family. Every year she sends out a finely crafted newsletter in a coordinating envelope she ordered from paperdirect.com, telling us all about her family. We learn how hard her husband works, how many activities her kids are in, and how good they are at them. We also read the details of how her scrapbooking business hasn’t taken off yet, but she promises to spend more time on it right after New Year’s. So if you were a business owner, would you want to advertise anywhere on Aunt Millie’s Newsletter? Then why would a business want to pay you top dollar to advertise on your blog? What’s that? You say your blog gets (insert a high number here) readers per day, so surely that has to be worth something? Well, did you know Aunt Millie sends out over 800 copies of her holiday newsletter to 17 countries on 4 continents? Now, before you get all fired up about it, understand that I don’t have a problem with you having a personal blog or sharing it with the public. However, your expectation that it has value outside of your family/friends/community is a serious misconception.



By Michael Gray | //wolf-howl.com

In doing the research for my series of Adsense articles, two common ideas kept getting repeated:

  • My Adsense ads are horrible, they only pay out (insert low dollar figure here)
  • My Adsense CTR is horrible, I only get a (insert extremely low CTR here)

To be fair, these comments weren’t coming just from bloggers, but bloggers did make up an overwhelmingly large percentage. I think this stems from a misconception on the part of the bloggers that they are entitled to a high payout and CTR. I’d like to spend a little time sharing my feelings on this subject. In the early days a blog may just have been an online diary or journal, but like the days of the Nehru jacket, those days are gone. What a blog is now is a chronologically structured content management system, as opposed to the classic hierarchically structured web implementation. Let’s be clear: you can still use a blog as your online diary or journal, but nowadays it’s just as likely to be used as a commercial blog. Yes, I did just say commercial blog, and no, the earth didn’t open under my feet and swallow me whole for saying it. Let’s take some time to look at your typical blog.

Your Facebook Fan Page – 5 Ways to Make the Most of It

If your business targets consumers, do you already have a Facebook fan page? If not, why? With more than 350 million active Facebook users, it is time to embrace social media and let it help grow your business. Not only is Facebook a great place to expose yourself to thousands of new customers, but Google is now using content from social media sites in their search results. Once you have created a fan page, there are a few things you can do to ensure you get the most out of it. Here are some tips and guidelines to use.



By Curtis Stevens | //wolf-howl.com


How To Silo Your Website: The Sidebar




By Michael Gray

The following post is part of a series on How to Silo Your Website. You should review How to Silo Your Website the Masthead, How to Silo Your Website the Breadcrumb, and How to Silo Your Website the Content. For this part, we’ll be taking a look at the sidebar.

You want to keep the sidebar content dynamic …

IMHO the sidebar is the second most abused and misused part of a website (the footer is the most abused, which we’ll talk about in a later article). The sidebar is so abused because people stuff it with too much third-party content, too many widgets and social blocks, and simply too many links. In the past year I have worked on 5 client sites with between 300 and 500 links in the sidebar. No, that’s not a typo. That’s over 300 links in just the sidebar.
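If you are curious how many links your own sidebar carries, a quick count is easy to script. The sketch below uses Python's standard `html.parser` and assumes the sidebar is a single element with `id="sidebar"`; that id, the class name, and the sample markup are hypothetical, so adjust them to your theme.

```python
from html.parser import HTMLParser


class SidebarLinkCounter(HTMLParser):
    """Count <a href> tags inside the element whose id matches sidebar_id."""

    def __init__(self, sidebar_id="sidebar"):
        super().__init__()
        self.sidebar_id = sidebar_id
        self.sidebar_tag = None   # tag name of the sidebar element, e.g. "div"
        self.depth = 0            # nesting depth of that tag inside the sidebar
        self.link_count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Outside the sidebar: only watch for the sidebar's opening tag
            if attrs.get("id") == self.sidebar_id:
                self.sidebar_tag = tag
                self.depth = 1
            return
        if tag == self.sidebar_tag:
            self.depth += 1
        elif tag == "a" and "href" in attrs:
            self.link_count += 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.sidebar_tag:
            self.depth -= 1


def count_sidebar_links(html, sidebar_id="sidebar"):
    counter = SidebarLinkCounter(sidebar_id)
    counter.feed(html)
    counter.close()
    return counter.link_count


# Hypothetical page markup
PAGE = """
<div id="content"><a href="/post">Post</a></div>
<div id="sidebar">
  <div class="widget"><a href="/a">A</a><a href="/b">B</a></div>
  <a href="/c">C</a>
</div>
<div id="footer"><a href="/about">About</a></div>
"""

print(count_sidebar_links(PAGE))
```

Links injected by third party JavaScript widgets will not appear in the raw HTML, so the real number a visitor sees may be higher than what this reports.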

My first bit of advice: do some click tracking to see what people are clicking on. I like to use crazyegg (full disclosure: they are an advertiser, but I used them before they became one) or a similar service that actually tracks X/Y coordinates on a page. See what people are clicking on and remove the elements that people don’t use.