If you have a blog or website with a lot of content, managing duplicate content can be complex. While Google has become very good at handling duplicate content, there are still cases where it cannot tell which copy is the original. And if you are targeting other search engines (say, Bing), duplicate content becomes troublesome.
When I say duplicate content here, I mean two things.
The first is internal duplicate content, or content repetition, and the second is external duplicate content, or plagiarism.
Let’s understand both.
In content repetition, or internal duplicate content, you post the same or very similar content two or more times, knowingly or unknowingly. This can happen page by page, or an archives page may duplicate content from your earlier published articles.
In external duplicate content, you post an article that was previously published somewhere on any other website without checking it for plagiarism. This happens most of the time unknowingly (unless your site has a content grabber).
In the first case, it is wise to occasionally check for duplicate content by using a content crawler or manually. Duplicate content can be problematic for your search engine rankings as the duplicated pages will compete against each other, negating one another’s effect.
In the second case, use a duplicate content checker. There are several such checkers but the duplicate content checker by Sitechecker covers your whole website and provides a list of tips to fix the problems.
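For the internal check from the first case, here is a minimal sketch of how a crawler might flag near-duplicate pages on your own site. It compares pages by word shingles and Jaccard similarity, which is one common technique and not necessarily what any particular checker uses. The page paths, sample text, and the 0.5 threshold are all assumptions for illustration.

```python
import re
from itertools import combinations


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles across both pages."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_near_duplicates(pages: dict, threshold: float = 0.5) -> list:
    """Return (url_a, url_b, score) for every page pair at or above threshold."""
    fingerprints = {url: shingles(body) for url, body in pages.items()}
    hits = []
    for (url_a, fp_a), (url_b, fp_b) in combinations(fingerprints.items(), 2):
        score = jaccard(fp_a, fp_b)
        if score >= threshold:
            hits.append((url_a, url_b, round(score, 2)))
    return hits


# Hypothetical pages; in practice the text would come from your crawl.
pages = {
    "/a": "duplicate content can hurt your search rankings badly",
    "/b": "duplicate content can hurt your search rankings badly",
    "/c": "completely unrelated text about cooking pasta at home",
}
print(find_near_duplicates(pages))
```

Comparing every pair is fine for a small blog. For a very large site you would want a cheaper fingerprinting approach, but the idea stays the same.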
But that’s not it. There are various scenarios that need different solutions, other than just checking for duplicate content.
Let’s see how you can prevent duplicate content with the SEO checks listed below.
Scraper sites are websites that use a tool to copy content from other websites. While most of these are automated and Google understands how they work, you can make copying your content harder with a simple trick.
The trick is to use relative URLs instead of absolute ones. For example, instead of a full URL that includes your domain name, link to just the path, such as /about, wherever that is possible.
If you are on WordPress, there are several ways to use relative URLs. I’d suggest Relative URL plugin by Tunghsiao Liu if you are looking for something to use.
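If you are not on WordPress, the same idea can be sketched in a few lines of Python. This is only an illustration: `to_relative` is a hypothetical helper, and `example.com` stands in for your own domain.

```python
from urllib.parse import urlsplit, urlunsplit


def to_relative(url: str, site_host: str) -> str:
    """Strip the scheme and host from links that point at site_host."""
    parts = urlsplit(url)
    if parts.netloc.lower() != site_host.lower():
        return url  # external or already-relative link: leave untouched
    path = parts.path or "/"
    # Rebuild the URL without scheme or host, keeping query and fragment.
    return urlunsplit(("", "", path, parts.query, parts.fragment))


print(to_relative("https://example.com/about", "example.com"))
print(to_relative("https://other.org/page", "example.com"))
```

The first call prints the path only, and the second prints the external link unchanged. Links to other sites have to stay absolute, which is why the helper only rewrites URLs on your own host.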
Also, create a license or terms page that states you will take legal action if someone copies your content. This puts your position on record and scares the scrapers a little.
If none of this works, you can issue DMCA notices to the offending websites, and also to Google if it indexes the plagiarised content.
Localized copies of your own site are a problem that most of us ignore, and no, it is not related to scraper sites.
Say you have a website like example.com that covers the USA audience, and you also have example.in, which covers the Indian audience. In this case, you will have to copy your own content to both domains (with interest-based modifications, of course), but that creates a conflict in search results. If this is not set up correctly, search engines may end up ranking your own pages against each other.
Here the trick is to use rel="alternate" together with the hreflang attribute. By doing so, you tell search engines that the different domains are related to each other and are localized copies.
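As a rough sketch, the tags for the two domains above could be generated like this. The `en-us` and `en-in` codes and the `/pricing` path are my assumptions for illustration; use the language and region codes that match your own audiences.

```python
from html import escape


def hreflang_links(path: str, locales: dict) -> list:
    """Build one <link rel="alternate"> tag per localized domain."""
    tags = []
    for lang_code, origin in locales.items():
        href = origin.rstrip("/") + path
        tags.append(
            f'<link rel="alternate" hreflang="{escape(lang_code)}" '
            f'href="{escape(href)}" />'
        )
    return tags


# Assumed mapping of hreflang codes to the localized domains.
LOCALES = {
    "en-us": "https://example.com",
    "en-in": "https://example.in",
}

for tag in hreflang_links("/pricing", LOCALES):
    print(tag)
```

The same full set of tags goes into the head of the page on every domain, so each localized copy lists itself as well as its alternates.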
HTTP and HTTPS URLs
Sometimes you have both HTTP and HTTPS versions of your website. Doing this can only harm your website, not just in SERPs but also in user experience. So, no matter what kind of website you have, always use the HTTPS version and redirect non-HTTPS traffic to it. You must use a 301 redirect for this.
Similarly, you should check for WWW and non-WWW versions of your website, pick one, and 301-redirect the other to it.
Not following these guidelines will result in the search bot crawling the pages as if they were two different websites.
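In practice you would set these redirects up in your web server or hosting panel, but the decision logic is simple enough to sketch. The `canonical_redirect` helper below is hypothetical, and it assumes the non-WWW host on HTTPS is the version you chose to keep.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_www(host: str) -> str:
    """Drop a leading 'www.' so both host variants compare equal."""
    return host[4:] if host.startswith("www.") else host


def canonical_redirect(url: str, canonical_host: str):
    """Return (301, target) if url should be redirected, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased, without any port
    if strip_www(host) != strip_www(canonical_host):
        return None  # a different site: not ours to redirect
    if parts.scheme == "https" and host == canonical_host:
        return None  # already the canonical form
    target = urlunsplit(
        ("https", canonical_host, parts.path or "/", parts.query, "")
    )
    return 301, target


print(canonical_redirect("http://www.example.com/about?x=1", "example.com"))
print(canonical_redirect("https://example.com/about", "example.com"))
```

The first call returns a 301 to the HTTPS, non-WWW address with the path and query string kept, and the second returns None because nothing needs to change. Keeping the path matters: redirecting every old URL to the homepage would throw away the page-level match.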
If you use an RSS feed and have submitted your content to a directory, you will notice that they copy your content and syndicate that on their own website. By doing so, they become your search engine competitors which you wouldn’t want.
There are three ways you can handle this duplicate content issue.
The first one is the simplest: the directory or syndication site adds canonical tags pointing back to the original content.
The second method is to have the syndicated content no-indexed so that it doesn’t appear in the search results.
But there are still cases in which you never submitted your site and some directory is rendering your RSS feed anyway. In this case, you can truncate your RSS feed content and add a read more link so that readers must come to your website to get the full content.
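Most blogging platforms have a setting for excerpt-only feeds, so check there first. If you build the feed yourself, the truncation could look something like the sketch below. `feed_excerpt` is a hypothetical helper, the 50-word limit is an arbitrary choice, and the tag stripping is deliberately crude.

```python
import re
from html import escape, unescape


def feed_excerpt(html_body: str, link: str, max_words: int = 50) -> str:
    """Return a plain-text excerpt of html_body followed by a read-more link."""
    # Crude tag stripping; a real feed generator should use an HTML parser.
    text = unescape(re.sub(r"<[^>]+>", " ", html_body))
    words = text.split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        excerpt += "..."
    return (
        f"<p>{escape(excerpt)}</p>"
        f'<p><a href="{escape(link)}">Read more</a></p>'
    )


print(feed_excerpt(
    "<p>one two three four five</p>",
    "https://example.com/post",  # placeholder URL of the full article
    max_words=3,
))
```

The read more link should be an absolute URL here, because feed items are displayed on other sites where a relative link would not resolve to your domain.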
Removed subdomains are an exceptional case: they may have some duplicate content that is still indexed in search engines.
Take an example. You had a sub.example.com website on which you wrote about a certain topic. After a while, you decided to move all the subdomain content to the main website, and you did. But if you forget to de-index or redirect the subdomain properly, you will have a duplicate content issue here as well.
Deindexing is the first step. Go to Google Search Console and, with the domain property selected, look for the Removals menu on the left. There, in the Temporarily Remove URL tab, enter the subdomain, check “Remove all URLs with this prefix” and click Next. You are all done.
Redirecting the subdomain to the working website is very important as well, especially if your subdomain has a lot of backlinks pointing to it.
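To keep those backlinks useful, each old URL should land on its matching new page. Here is a small sketch of that mapping, assuming the paths stayed the same when you moved the content. The `subdomain_redirect` helper and the optional `prefix` argument (for content that now lives under a folder) are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def subdomain_redirect(url: str, old_host: str, new_host: str, prefix: str = ""):
    """Map a URL on the retired subdomain to its new home on the main site."""
    parts = urlsplit(url)
    if (parts.hostname or "") != old_host:
        return None  # not a retired-subdomain URL
    # Assumption: the page path is unchanged, optionally under a folder prefix.
    new_path = prefix.rstrip("/") + (parts.path or "/")
    return urlunsplit(("https", new_host, new_path, parts.query, ""))


print(subdomain_redirect(
    "http://sub.example.com/guide?ref=a", "sub.example.com", "example.com"
))
print(subdomain_redirect(
    "http://sub.example.com/guide", "sub.example.com", "example.com", "/sub"
))
```

If your paths changed during the move, you would need an explicit old-to-new lookup table instead of this one-rule mapping.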
Regularly Remove Outdated Content
While adding new content is important, removing outdated content is important as well. But how does this help in handling duplicate content, you may ask?
Once you remove outdated content from your website and search results, you can restructure the same content with new data and try to rank it on the same keyword with better chances.
So, that’s all in this article now. I hope these SEO checks will help you keep duplicate content in check and help you gain better search rankings.