It’s a question that has come up a lot recently, and it comes with a flurry of misconceptions. 404 pages don’t necessarily harm your search rankings and shouldn’t automatically be treated as a high-priority issue, although some on-page optimisation is sometimes needed to find dead links and improve the customer experience.
What is a 404 page?
A 404 is the status code a server returns when it cannot find the requested URL. The error message usually appears when the original page no longer exists and no 301 redirect has been put in place. 404s are often discovered through internal links or via external sources. They shouldn’t be treated as a high-priority issue unless the site has a high number of 404 status codes; it really is all about finding the right balance.
Should all 404s be redirected?
Google won’t penalise you for having a natural number of 404 error pages, as they are common across the web. Redirecting every 404 page to the homepage can actually be more frustrating for users than showing the error page, because telling them the page no longer exists is more helpful than an unexpected redirect. For 404 pages that do need redirecting, it’s recommended to use a 301 (permanent) redirect rather than a 302 (temporary) redirect to make sure the link juice is passed through. There’s no need to submit a crawl request to Google, as the search engine will automatically pick up on the removed pages and drop them from its index, although having an XML sitemap makes this process much quicker.
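To make the difference concrete, here is a minimal sketch of a site that answers with a 301 for a page that has permanently moved and a genuine 404 (with a link back to the homepage) for anything else it cannot find. It uses only Python’s standard-library `http.server`; the paths in `PAGES` and `REDIRECTS` and the `route()` helper are hypothetical, and on a real site you would normally configure redirects in your web server or CMS instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Hypothetical pages that still exist on the site.
PAGES = {
    "/": "<h1>Home</h1>",
    "/new-product": "<h1>New product</h1>",
}

# Hypothetical permanent moves: only URLs worth preserving are listed here.
REDIRECTS = {
    "/old-product": "/new-product",
}

NOT_FOUND_HTML = "<h1>Page not found</h1><p><a href='/'>Back to the homepage</a></p>"


def route(path: str) -> Tuple[int, Optional[str], str]:
    """Return (status code, Location header or None, HTML body) for a path."""
    if path in REDIRECTS:
        # 301 = moved permanently; a 302 would signal a temporary move.
        return 301, REDIRECTS[path], ""
    if path in PAGES:
        return 200, None, PAGES[path]
    # Genuinely gone: send a real 404 instead of redirecting everyone to the homepage.
    return 404, None, NOT_FOUND_HTML


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, location, body = route(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, run `HTTPServer(("127.0.0.1", 8000), SiteHandler).serve_forever()` and request `/old-product` and `/missing` with a client that shows status codes, such as `curl -i http://127.0.0.1:8000/old-product`.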
You should do a little analysis to determine whether:
- The 404 pages are receiving important links from external sources: TIP – create a 301 redirect to keep the link juice flowing; Google Webmaster Tools is great for this.
- They are receiving a large amount of visitor traffic: TIP – consider replacing the page to keep customers happy; Google Analytics can provide this data.
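The two checks above can be turned into a simple rule of thumb. The sketch below assumes you have already exported, for each 404 URL, a count of external links (for example from Google Webmaster Tools) and monthly visits (for example from Google Analytics); the `BrokenUrl` type, the `triage()` function, the example URLs and both thresholds are hypothetical and should be tuned to your own site.

```python
from dataclasses import dataclass

# Hypothetical thresholds: adjust them to your own site's traffic and link profile.
MIN_EXTERNAL_LINKS = 1
MIN_MONTHLY_VISITS = 100


@dataclass
class BrokenUrl:
    url: str
    external_links: int  # e.g. from a Google Webmaster Tools export
    monthly_visits: int  # e.g. from Google Analytics


def triage(page: BrokenUrl) -> str:
    """Suggest what to do with a URL that currently returns 404."""
    if page.monthly_visits >= MIN_MONTHLY_VISITS:
        # Visitors still want this content: consider bringing the page back.
        return "replace page"
    if page.external_links >= MIN_EXTERNAL_LINKS:
        # Inbound links worth keeping: 301 them to the closest live page.
        return "301 redirect"
    # Nothing to preserve: leave it as a 404 and let it drop out of the index.
    return "leave as 404"


# Hypothetical example URLs.
for page in [
    BrokenUrl("/old-guide", external_links=12, monthly_visits=40),
    BrokenUrl("/popular-offer", external_links=0, monthly_visits=850),
    BrokenUrl("/typo-url", external_links=0, monthly_visits=2),
]:
    print(page.url, "->", triage(page))
```

Traffic is checked first because restoring a popular page keeps both its visitors and its inbound links, whereas a redirect only preserves the links.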
Are 404 errors all that bad for SEO?
So here’s the deal: SERP rankings only tend to suffer when a website has an abnormally high volume of 404s. For example, if your website received 5,000 unique page views this month but 20,000 404 errors, that would be a problem.
You should base your optimisation changes on your visitors, not bots; if users have a good experience, Google will usually give you the thumbs up.
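The arithmetic behind that example is simple: 20,000 404 errors against 5,000 page views is four 404s for every page view. Below is a minimal sketch of that check; the function names and the `threshold` default of one 404 per page view are hypothetical, chosen for illustration rather than taken from any figure Google has published.

```python
def not_found_ratio(page_views: int, not_found_errors: int) -> float:
    """Number of 404 errors recorded per page view over the same period."""
    if page_views <= 0:
        raise ValueError("page_views must be positive")
    return not_found_errors / page_views


def looks_abnormal(page_views: int, not_found_errors: int, threshold: float = 1.0) -> bool:
    # Hypothetical rule of thumb: more 404s than page views deserves a closer look.
    return not_found_ratio(page_views, not_found_errors) > threshold


# The example from the text: 5,000 page views and 20,000 404 errors in a month.
ratio = not_found_ratio(5_000, 20_000)
print(f"{ratio:.1f} 404s per page view, abnormal: {looks_abnormal(5_000, 20_000)}")
# prints "4.0 404s per page view, abnormal: True"
```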
How to find 404 errors
First of all, head over to Google Webmaster Tools, click on your site URL, then navigate to Crawl > Crawl Errors. You should get something similar to the below. Further down the page you will find a breakdown of all the 404 URLs, along with an option to download the full list of links. It’s recommended to redirect only the URLs that have a high number of internal links, to improve the customer experience.
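The crawl errors report only shows what Google has found so far. To catch broken internal links on one of your pages yourself, a minimal sketch using only Python’s standard library is to collect every `<a href>` on the page, keep the links that point to the same host, and check each one’s status code. The function names are hypothetical; `fetch_status` sends a `HEAD` request (some servers answer `HEAD` differently from `GET`), and `urlopen` follows redirects, so a 301 to a live page counts as working.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html: str, page_url: str) -> List[str]:
    """Absolute, de-duplicated http(s) links on the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links: List[str] = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in links:
            links.append(absolute)
    return links


def fetch_status(url: str) -> int:
    """Final HTTP status for url after following redirects."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def find_broken_links(
    html: str, page_url: str, get_status: Callable[[str], Optional[int]] = fetch_status
) -> List[str]:
    """Internal links on the page that answer with a 404."""
    return [link for link in internal_links(html, page_url) if get_status(link) == 404]
```

Passing `get_status` as a parameter keeps the checker easy to test and lets you swap in a cached or rate-limited version when auditing many pages.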
What about status 200 “not found” pages that don’t return a 404?
Google’s Matt Cutts was asked the question back in March 2011, and the same answer still applies today: “How does Google deal with ‘page not found’ pages that are returning a 200 response code instead of a 404? Is this a form of spam? Can Google determine this mismatch algorithmically?”
“Well, I’m not sure that I would necessarily call it spam. Almost all of the time it’s not done deliberately, because if it were, people would return a 404 on purpose. We do have a name for this at Google; I’m not sure whether we’ve talked about it externally before, but we call them ‘krymméno 404s’, ‘krymméno’ from the Greek meaning ‘hidden’. So a krymméno 404 is a page that looks like a 404 to a regular user but returns a 200 response code to a search engine.
There is a team in charge of trying to write algorithms to deal with those sorts of issues, including these krymméno 404s. Most of the time it works pretty well: if you look for a phrase like ‘page not found’, you can reasonably write an algorithm that says this returned a 200 status code, or response code, but it still looks like a 404 page, so treat it like a 404 page. However, people make a lot of really weird ‘page not found’ pages, so there are definitely a few that don’t look like a regular page not found and still return a 200 status code. I wouldn’t claim that we can handle that all of the time, but most of the time we do have relatively good algorithms in place to detect these so-called ‘krymméno 404s’.”
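The phrase-matching idea described in that answer can be sketched in a few lines: flag a page that returns 200 but whose text reads like an error page. This is not Google’s algorithm; the `looks_like_hidden_404()` function and its phrase list are hypothetical starting points, and, as the quote points out, unusual error pages will slip through while pages that merely discuss errors may be flagged. It is handy for auditing your own site for these hidden 404s, which Google’s own tools report as “soft 404s”.

```python
# Hypothetical phrases; real "not found" pages vary a lot, so extend this list.
NOT_FOUND_PHRASES = (
    "page not found",
    "no longer exists",
    "could not be found",
    "can't be found",
)


def looks_like_hidden_404(status: int, body: str) -> bool:
    """True if a page answers 200 but its text reads like a 'not found' page."""
    if status != 200:
        # Only a 200 response can be a hidden 404; a real 404 already tells crawlers the truth.
        return False
    text = body.lower()
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)
```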