Duplicate Content Penalty vs. Content Spinning

Summaries of Articles

Articles About Duplicate Content Penalty

* [https://www.hobo-web.co.uk/duplicate-content-problems/ Duplicate content problems]

* QUOTE: “Important: The Lowest rating is appropriate if all or almost all of the MC (main content) on the page is copied with little or no time, effort, expertise, manual curation, or added value for users. Such pages should be rated Lowest, even if the page assigns credit for the content to another source.” (Google Search Quality Evaluator Guidelines, July 2018)
* A Google patent on identifying a primary version of duplicate pages suggests that Google places some importance on identifying what it believes to be the most important version among many duplicate documents.
* QUOTE: “We now recommend not blocking access to duplicate content on your website, whether with a robots.txt file or other methods” (John Mueller)

Do not block your own duplicate content:
* Google does not recommend blocking crawler access to duplicate content (DC) on your website, whether with a robots.txt file or other methods.
* If search engines can't crawl pages with DC, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages.
* A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects.
* In cases where DC leads to Google crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools.
* DC on a site is not grounds for action on that site unless it appears that the intent of the DC is to be deceptive and manipulate search engine results.

Should pages have self-referencing canonical link elements?
* QUOTE: “I recommend doing this self-referential canonical because it really makes it clear to us which page you want to have indexed, or what the URL should be when it is indexed. Even if you have one page, sometimes there’s different variations of the URL that can pull that page up. For example, with parameters in the end, perhaps with upper lower case or www and non-www. All of these things can be cleaned up with a rel canonical tag.” (John Mueller, Google, 2017)
* QUOTE: “Use 301s: If you’ve restructured your site, use 301 redirects (“RedirectPermanent”) in your .htaccess file to smartly redirect users, Googlebot, and other spiders. (In Apache, you can do this with an .htaccess file; in IIS, you can do this through the administrative console.)” (Google)
* QUOTE: “Be consistent: Try to keep your internal linking consistent. For example, don’t link to http://www.example.com/page/ and http://www.example.com/page and http://www.example.com/page/index.htm.” (Google)

Advice for syndicating content to other sites:
* QUOTE: “Syndicate carefully: If you syndicate your content on other sites, Google will always show the version we think is most appropriate for users in each given search, which may or may not be the version you’d prefer. However, it is helpful to ensure that each site on which your content is syndicated includes a link back to your original article. You can also ask those who use your syndicated material to use the noindex meta tag to prevent search engines from indexing their version of the content.” (Google)

Empty or stub pages:
* Avoid empty or stub pages, especially several of them without unique content. Use the noindex meta tag to keep them out of the index until they are built out.
* Google's Panda algorithm is designed to spot thin content.

Paginated content solutions:
* Focus on ranking a canonical version of a URL at all times.
* Specify a view-all page.
* QUOTE: “Use rel="next" and rel="prev" links to indicate the relationship between component URLs. This markup provides a strong hint to Google that you would like us to treat these pages as a logical sequence, thus consolidating their linking properties and usually sending searchers to the first page.” (Google)
* QUOTE: “While it’s fine to set rel="canonical" from a component URL to a single view-all page, setting the canonical to the first page of a parameter-less sequence is considered improper usage.” (Google)

Internal search result pages (be very careful):
* Google wants internal search result pages blocked in robots.txt.
* One problem can be infinite search spaces, which should be avoided because they use up crawl budget.
* Very risky, especially in a competitive niche.

Casing in URLs:
* It is better to force all URLs to lower case, because uppercase letters create a different URL.

Google gives a few common examples of pages that often have thin content with little or no added value:
* 1. Automatically generated content
* 2. Thin affiliate pages
* 3. Content from other sources, for example scraped content or low-quality guest blog posts
* 4. Doorway pages

Google classifications to avoid: “duplicate content”, “copied content”, “thin content”, “boilerplate”, or hastily rewritten. Even worse: “synonymised” or “spun text”.

How to find duplicate content:
* Put a piece of text in quotes and search for it on Google.
* Copyscape is a duplicate content checker tool. It can check the duplicate content ratio between two selections of text.
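
For a rough local check of how much two selections of text overlap, a small script can help. The sketch below is only an illustration: it is not how Copyscape or Google measure duplication (neither is documented here). It assumes a simple approach, comparing sets of overlapping three-word sequences ("shingles") with Jaccard similarity. The function names `shingles` and `duplicate_ratio` are made up for this example.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text.

    Words are lowercased and punctuation is ignored, so trivial
    formatting changes do not hide copied passages.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one unit.
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def duplicate_ratio(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0.

    This is an assumed stand-in metric, not the ratio any real tool reports.
    """
    a = shingles(text_a, size)
    b = shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "the quick brown fox jumps"
    rewritten = "the quick brown fox sleeps"
    print(duplicate_ratio(original, rewritten))
```

A ratio near 1.0 suggests the two texts are nearly identical, and a ratio near 0.0 suggests little shared phrasing. Spun or synonymised text can score low on this measure while still being a rewrite of the same source, so treat the number as a hint only.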
More reading:
* http://googlewebmastercentral.blogspot.co.uk/2009/02/specify-your-canonical.html
* https://support.google.com/webmasters/answer/66359?hl=en
* http://googlewebmastercentral.blogspot.com/2009/10/reunifying-duplicate-content-on-your.html
* https://webmasters.googleblog.com/2011/05/more-guidance-on-building-high-quality.html
* https://support.google.com/webmasters/answer/2604719?hl=en
* http://www.SEObythesea.com/2008/02/google-omits-needless-words-on-your-pages/
* https://support.google.com/webmasters/answer/1663744?hl=en&ref_topic=4617741
* http://searchengineland.com/google-panda-demotes-adjusts-rankings-not-devalue-261142

== Other terms related to duplicate content penalty ==

Other important terms:
* Low quality pages
* Doorway pages - be careful about building ‘doorway pages’ by creating too many keyword, tag, or category pages.
* Unnatural links - linkbacks from syndicated content can be considered unnatural links.
* Thin content - Google has manual actions against this.
* PBN - private blog network - https://www.postmm.com/seo/how-to-automate-large-disavow-link-list/
* Link farming - https://www.postmm.com/seo/link-farms-and-why-they-are-bad/

== Questions to research about duplicate content penalties ==

web_development/duplicate_content_penalty.txt · Last modified: 2019/05/31 01:14 by adminuser