As a website owner, it’s important that you understand how to avoid duplicate content because this type of content is often a sign of low quality and “spammy” websites. Duplicate content can cost you a lot if you are looking to increase your prominence on Google and other search engines.
One of the best ways to brand your online business is to consistently develop unique, top-notch, and credible content for your audience: content that provides value.
While search engines love fresh content, they don’t like websites with duplicate content. Whenever you publish duplicate content, you force search engines to decide which of your pages or sites should get credit for it. Search engines may fail to rank or index some of the pages that carry duplicate content, which is why you need to avoid both internal and cross-domain duplicate content.
In this post, we are going to look at the best way to detect and avoid duplication. Read on to find out more.
What Is Duplicate Content?
The topic of duplicate content confuses many people. According to Google’s Search documentation, “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.”
Duplicate content is a term commonly used by content marketers who use SEO techniques to promote their sites. The term refers to situations where different web pages, within or across domains, appear to contain very similar or identical content. Website owners are sometimes tempted to copy and paste content to different pages within their site in order to populate their web pages.
Any duplicate content will hurt your site’s SEO campaign because this kind of content compromises the user’s experience. Since your ultimate goal is to reach the number 1 position on the search engine results pages (SERPs), your efforts may go to waste if you don’t produce unique, high-quality, and plagiarism-free content.
FACT: Content creation improves indexation rates by more than 434%.
Types of Duplicate Content
Typically, there are two broad categories of duplicate content:
- Internal duplicate content: This is where the same content appears at multiple internal URLs on one hostname or domain. The duplication is limited to your own website.
- Cross-domain duplicate content: This occurs when multiple domains carry the same content and search engines have indexed it on more than one of them.
Impact of Duplicate Content on SEO
SEO experts know that information that has been replicated on various domains is rarely customer focused. Moreover, the aim of search engines is to return high-quality result pages for their users. If search engines, such as Google, don’t meet their users’ needs, users will seek alternatives.
Although Google doesn’t impose penalties on duplicate content, your site’s SEO campaign will be negatively affected because Google filters out identical or nearly identical information.
What does this mean for your site?
For many SEO experts, filtering amounts to a penalty because the filtered pages lose their place in the index. Irrespective of who produced the content, there is a real chance that the original web page will not be the one selected for Google’s top search results.
According to Dan Petrovic of Dejan Marketing, “If there are multiple instances of the same document on the web, the highest authority URL becomes the canonical version. The rest are considered duplicates.”
How Do Duplicate Content Issues Occur?
There are many causes of duplicate content, with most of them being technical. It’s crucial that you identify and fix these issues before they can cause serious harm to your ranking.
Other than copied content, here are some of the main causes of duplicate content:
URL Structure
Different search engines have different rules on URL structures. While URLs are case-sensitive for Google, they aren’t case-sensitive for Bing.
- For instance: https://yourdomainname.com/url-r/ is the same as https://yourdomainname.com/url-R/ for Bing. However, these URLs are seen as different by the Google search engine.
Take care when you create links to your content. Otherwise, a typo in letter case can produce two versions of the same URL, and neither may rank as well as it should.
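One way to catch this kind of typo is to scan your link list for URLs that differ only by letter case. The following is a minimal sketch, not a full crawler; the function name `find_case_variants` and the sample links are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_case_variants(urls):
    """Group URLs whose host and path differ only by letter case."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Fold host and path to lowercase so case variants share one key.
        key = (parts.netloc.lower(), parts.path.lower())
        groups[key].add(url)
    # Keep only the keys that were reached through more than one spelling.
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


links = [
    "https://yourdomainname.com/url-r/",
    "https://yourdomainname.com/url-R/",
    "https://yourdomainname.com/about/",
]
print(find_case_variants(links))
```

Each group in the result is a set of links you would want to collapse to a single spelling.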
Order of Parameters
When a Content Management System (CMS) doesn’t use clean URLs, URLs that differ only in the order of their parameters usually return the same page, yet search engines treat them as separate URLs.
- For example, messy URLs such as: /?id=3&cat=4 and /?cat=4&id=3 return the same result in most website systems, although they’re different URLs for search engines.
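If you control how links are generated, you can put query parameters into one fixed order before the link is written out. Here is a rough sketch using Python’s standard `urllib.parse` module; the helper name `sort_query_params` is an assumption for this example, and alphabetical order is just one possible convention.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sort_query_params(url):
    """Return the URL with its query parameters in alphabetical order."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    ordered = urlencode(sorted(params))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, ordered, parts.fragment))


# Two URLs that carry the same parameters in a different order.
a = sort_query_params("https://yourdomainname.com/?id=3&cat=4")
b = sort_query_params("https://yourdomainname.com/?cat=4&id=3")
print(a == b)
```

Both inputs come out as the same string, so only one URL ever appears in your internal links.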
Printer-Friendly Pages
Does your website have printer-friendly pages? If so, do you link to those pages from your content or article pages? Ask yourself which of the two versions you really want Google to show.
Linking to printer-friendly pages may be detrimental to your site’s SEO because Google usually locates printer-friendly pages and ranks them as, you guessed it, duplicate content. Here is a good illustration for this:
(Image credit: tronicglobal)
Index Pages
If your website homepage is misconfigured, people may come to your site through multiple URLs. Misconfiguration usually happens without your knowledge. If your website homepage URL is https://yourdomainname.com, it’s important to note that it can be accessed through other URLs such as:
- https://yourdomainname.com/index.asp
- https://yourdomainname.com/index.html
- https://yourdomainname.com/index.php
- https://yourdomainname.com/index.aspx
To avoid such cases, take your time to select the best way to serve your homepage.
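As an illustration, the index-file variants listed above can be mapped back to the bare homepage URL before you compare or publish links. This is only a sketch: the list of file names in the pattern is an assumption based on the examples above, and `strip_index_file` is a made-up helper name.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed set of default-document names; adjust to what your server serves.
INDEX_FILE = re.compile(r"/index\.(?:html?|php|aspx?)$", re.IGNORECASE)


def strip_index_file(url):
    """Map /index.html, /index.php and similar to the bare directory URL."""
    parts = urlsplit(url)
    path = INDEX_FILE.sub("/", parts.path) or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


for name in ("index.asp", "index.html", "index.php", "index.aspx"):
    print(strip_index_file("https://yourdomainname.com/" + name))
```

All four variants print the same homepage URL, which is the one you would link to everywhere.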
WWW vs. NON-WWW or HTTP vs. HTTPS
Although this problem rarely occurs nowadays, some website owners still serve their content under more than one hostname or protocol. If you’re using HTTPS and the WWW subdomain, you want your web pages served in the form of:
https://www.yourdomainname.com
However, if your web server is incorrectly configured, your articles can also be accessed via different URLs such as:
https://yourdomainname.com or http://yourdomainname.com or http://www.yourdomainname.com
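The idea of a single preferred form can be sketched in a few lines. The example below assumes HTTPS with the WWW subdomain is the preferred version, as in the text above; the constant and function names are made up for illustration, and in practice this rule usually lives in the web server configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: HTTPS and the www host are the preferred form.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.yourdomainname.com"
BARE_HOST = "yourdomainname.com"


def preferred_url(url):
    """Rewrite any scheme or host variant of the site to the preferred one."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in (PREFERRED_HOST, BARE_HOST):
        return url  # a different site; leave it alone
    path = parts.path or "/"
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, parts.fragment))


for variant in (
    "https://yourdomainname.com",
    "http://yourdomainname.com",
    "http://www.yourdomainname.com",
):
    print(preferred_url(variant))
```

Every variant resolves to the same address, which is the one a redirect should send visitors to.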
Dedicated Pages For Images
Does your website show images on an otherwise empty page? Your CMS can sometimes create a separate page for every image you use in your content. Because such pages don’t have any content of their own, they look much like every other image page on the internet. As such, they are seen as duplicate content by search engines.
Content Syndication
This occurs quite often, especially if your website is popular in a given niche. Sometimes blogs or sites that offer goods and services similar to yours may use your content. This often happens without your consent, although other website owners may also ask for permission to republish your content for various reasons.
If the re-published content doesn’t link to your site, search engines may not know the source of the article.
Search Result Pages
Your website probably allows visitors to search for information within your site. The search results displayed on these pages are more or less the same and don’t offer any value to search engines. To avoid this, it’s important that you don’t link from your website content to your search result pages.
Session IDs
Quite often, you may want to track your website visitors. To achieve this, you need to give your visitors a “session.” So, what is a session?
A session is the history of your website visitors. It tells you the visitors’ activities on your sites, such as the number of items put in the shopping cart vs. the ones bought. For a website to maintain a session as visitors move from one page to another, a Session ID is used.
(Image credit: tronicglobal)
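Session IDs are usually stored in cookies. However, search engine crawlers generally don’t store cookies. Some systems then fall back to adding the Session ID to the URL, so every internal link a crawler follows carries a different Session ID. Each of those URLs leads to the same page, and search engines perceive them as duplicate content.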
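When a Session ID ends up in the URL as a query parameter, one way to see the underlying page address is to strip that parameter. The sketch below is illustrative only: the parameter names in `SESSION_PARAMS` are hypothetical, so check what your own platform appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; use whatever your platform appends.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_id(url):
    """Drop session parameters so every visitor's URL maps to one address."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(strip_session_id("https://yourdomainname.com/cart?item=7&sid=a1b2c3"))
```

Other query parameters, such as `item` here, are left untouched.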
How to Identify Duplicate Content
It’s not easy to identify duplicate content on your site. To find out if your website content is copied, go to the “content heading” and “Meta information” cards. You’ll find information relating to your page title, meta description, and H1 headings.
For duplicate content outside your website, try searching for content already published on your website. For example, if you want to see if there is duplicate content for this article, “How to Avoid Duplicate Content,” you can search for the words, “For duplicate content outside your website, try searching for content already published on your website,” or “Which of these is one possible solution for dealing with the duplicate content issue?” (used towards the end of this post).
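Searching for an exact sentence finds word-for-word copies. To get a rough feel for text that is only “appreciably similar,” you can compare overlapping word sequences between two texts. The sketch below is a simple illustration of that idea and is not how Google or any of the tools in this post score duplication; the function names and the sequence length of five words are assumptions.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of the given length."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=5):
    """Share of word sequences the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


original = "For duplicate content outside your website, try searching for content already published on your website."
rewrite = "To find copies elsewhere, paste one sentence from your article into a search engine."
print(similarity(original, original))
print(similarity(original, rewrite))
```

A score near 1.0 suggests the two texts are close copies, while a score near 0.0 suggests they share little wording.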
Since you’ll probably be publishing a lot of content on your website, it’s advisable to double-check your content with Google duplicate content checker tools to ensure your content is unique. Here is an example of the results you would expect to see from that tool:
(Image credit: Moz)
Here are some tools you can use to check for duplicate content and save your time.
Copyscape
Copyscape is a widely recognized tool for checking duplicate content. It has a comparison tool that highlights any duplicate content in your text. The good thing with Copyscape is that the tool gives you results in just a few seconds, and you get to know the exact percentage of your text that has already been published.
Siteliner
Occasionally, you might need to check your entire site for duplicate content. Siteliner is an excellent tool for this: it checks your whole site not only for duplicate content but also for broken links, and it identifies the web pages that are ranked most prominently by search engines.
Duplichecker
Duplichecker is a tool that checks your content for plagiarism. The site lets you check your content by uploading a DocX or text file or by entering a URL. Before signing up, you are only allowed one free search per day, with the limit going up to 50 searches after you sign up.
PlagSpotter
PlagSpotter URL search is efficient, free, and delivers results within a few seconds. The results from your URL scan include links to the sources of the duplicate content. As such, you can compare your text with similar content online.
The tool can also automatically monitor your website every week.
Duplicate Content Removal
Fixing duplicate content on your site can greatly improve your SEO, particularly if you have an online business. For effective duplicate content removal, here are a few things you can do.
Remove Unnecessary Duplication
Although it is time-consuming, the most straightforward way to remove duplicate content is to rewrite your information or articles. Take your time to read similar content online, such as several websites that cover the same topic, and then put the ideas into your own words. Feel free to add more information and use various framing devices to ensure the content you produce is 100% unique.
Use a 301 Redirect
Sometimes it may be impossible to entirely prevent your CMS from creating multiple or wrong URLs for your content. In most of those cases, you can still redirect the wrong URLs. A redirect sends a browser from one URL to another, whether within the same website or across websites. A 301 redirect marks the move as permanent, which signals to search engines that the destination is the URL to keep.
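To make the mechanics concrete, here is a minimal sketch of a 301 redirect written as a small Python WSGI application. The paths in `REDIRECTS` are hypothetical, and most sites would configure redirects in the web server or CMS rather than in code like this.

```python
# Hypothetical mapping of duplicate paths to the one URL that should rank.
REDIRECTS = {
    "/index.html": "/",
    "/print/how-to-avoid-duplicate-content": "/how-to-avoid-duplicate-content",
}


def app(environ, start_response):
    """WSGI app: answer known duplicate paths with a permanent redirect."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"page content"]
```

To try it locally, you could pass `app` to `wsgiref.simple_server.make_server` from the standard library and request one of the mapped paths.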
Check Boilerplate Repetition
Long boilerplates should not be used on different pages within the same website. Rather, they should be used on one page. For example, rather than using a long copyright notice at the bottom of every page, write a summary of the notice and link it to a page with more information.
Noindex Meta Tag
As stated earlier, other website owners can copy your content without your knowledge. Because you can’t always prevent this, include a small note on your content page, usually at the bottom, asking anyone who republishes your content to add a “noindex” meta tag so that the copy isn’t indexed by Google or other search engines.
Avoid Publishing Stubs
How would you feel if you opened a website and found only a few words and several empty pages? You’d probably be shocked. In most cases, the website owner simply hasn’t published content on those pages yet. This can be detrimental because Google may treat all of the empty pages as duplicate content.
Whenever you want to create a placeholder page, always use noindex meta tags to prevent such pages from being indexed.
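A noindex meta tag is written as `<meta name="robots" content="noindex">` in the page’s head section. If you want to confirm that a placeholder page actually carries it, a small check like the following can help. This is a sketch using Python’s built-in HTML parser; the class name, function name, and sample page are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def is_noindex(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


placeholder = '<html><head><meta name="robots" content="noindex"></head><body>Coming soon</body></html>'
print(is_noindex(placeholder))
```

Running the check over your placeholder pages before launch shows which ones still need the tag.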
Use Only One URL
Although you can use several URLs to link to your website, it’s important that you choose only one URL. Keep your customers in mind when choosing your URL because your URL needs to be user friendly. A single URL makes it easier for not only Google to rank your website, but also your users to locate your site or a page.
You need to set your preferred standard as either WWW or non-WWW. The idea is to avoid creating any confusion to your users and search engines.
Use a Hreflang Tag
An hreflang tag uses the HTML hreflang attribute to tell search engines the language and/or geographical region a page is intended for. Hreflang is essential for sites with multiple languages.
Serving visitors in their own language improves their experience on your site.
If you have several versions of a single page in different languages, use hreflang tags to tell Google and other search engines about the variations.
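In HTML, hreflang annotations are commonly written as `<link rel="alternate">` elements in the page’s head section. The sketch below generates them from a mapping of language codes to URLs; the helper name `hreflang_links` and the URLs are hypothetical.

```python
from html import escape


def hreflang_links(variants):
    """Build <link rel="alternate"> tags from a {language-code: URL} mapping."""
    return "\n".join(
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in variants.items()
    )


# Hypothetical language versions of one page.
print(hreflang_links({
    "en": "https://yourdomainname.com/en/",
    "es": "https://yourdomainname.com/es/",
    "x-default": "https://yourdomainname.com/",
}))
```

Each language version of the page should carry the full set of tags, including the one that points to itself.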
Always Link Back To Original Content
Which of these is one possible solution for dealing with the duplicate content issue? Well, if you can’t get rid of duplicate content for various reasons, always remember to include a link to the original content. This can be just below or on top of the duplicate content.
If search engines come across several article links that point to your content, they’ll figure out that your content is the original or canonical version.
How Much Duplicate Content Is Acceptable?
Google only rewards unique content that adds value to customers, which means that Google doesn’t welcome any amount of content duplication. However, the answer to the question, “How much duplicate content is acceptable to Google or other search engines?” is still debatable because no one answer is perfect. As such, always use a Google duplicate content checker and ensure your articles are 100% unique before publishing them. This is how search engines determine duplicate content:
(Image credit: www.elliance.com)
Diib®: Boost Your SEO Ranking by Avoiding Duplicate Content
SEO experts will warn you against duplicate content — they are right. Although duplicate content occurs almost everywhere these days, it’s important that you keep an eye on what you want to publish on your site if you want to improve your ranking. The Diib User Dashboard is configured to spot any cases of duplicate content and send you an alert with steps for remediation. Here are some of the features of that dashboard you’re sure to appreciate:
- Keyword and backlink competitor research tools will help you find what keywords your competitors are ranking for and create content around those keywords.
- Key metrics, like bounce rate, duplicate content and returning visitors can keep your website healthy.
- Check how your Facebook page followers like content you share.
- Enjoy a monthly call with a Diib growth expert.
Click here for a free 60 second site analysis or call 800-303-3510 to chat with a Growth Expert today!
FAQs
The best idea is to analyze each page and look for duplicate content. If only a few items on the page are duplicated, you’re likely fine. If the majority of the page looks similar to another page, merge those pages into one strong page.
While duplicate content doesn’t actually get you penalized, it can confuse readers and cause a high bounce rate. Google is specifically targeting this issue in its latest algorithms.
Once you have identified your copied content, go to the Google DMCA page, select “Submit a legal request,” and then choose “Web Search” (or “Blogger” if appropriate).
A canonical tag is the technical term for telling search engines that a certain page is the master copy of the page. Using the canonical tag prevents problems caused by identical or “duplicate” content appearing on multiple URLs.
All pages should contain a canonical tag. This helps to head off any possibility of duplication. Even if there aren’t any duplicates yet, they could appear in the future.
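A canonical tag is written as a `<link rel="canonical">` element in the page’s head section. As a rough sketch, the helper below builds one from a page URL. It assumes the query string and fragment do not change the page content, which will not be true for every site, and the function name `canonical_tag` is made up for this example.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_tag(url):
    """Return a <link rel="canonical"> tag for the URL without query or fragment."""
    parts = urlsplit(url)
    # Assumption: tracking parameters and fragments do not identify a different page.
    clean = urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/", "", ""))
    return f'<link rel="canonical" href="{escape(clean)}">'


print(canonical_tag("https://yourdomainname.com/blog/post?utm_source=newsletter#comments"))
```

Every URL variant of the page would then output the same tag, pointing search engines to the master copy.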