
Blog scraping

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 146.171.254.65 (talk) at 00:59, 9 October 2007 (Defense: remove section. not a howto. links contain information on defense). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Blog scraping is the process in which automated software scans hundreds of thousands of blogs per day, searching for and copying content. The process is sometimes referred to by the name given to the software or individuals responsible for it, “blog scrapers.”

"Scraping" essentially means copying or, in the case of copyrighted material, stealing content from a blog that is not owned by the individual initiating the scraping process. The scraped content is often used on spam blogs, or splogs.

Dangers

If blog scrapers gather copyrighted material without permission, that can constitute copyright infringement. Even setting the legal side aside for a moment, blog scraping causes a number of practical problems for the person or business whose blog is being scraped. The problem is particularly worrisome for business owners and business bloggers.

Sometimes a blog scraper will copy an entire post from an independent or business blog. That duplicate content will include the author's tag and a link back to the author's site (if that link appears in the author's tag).
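Whether a copied post still carries the link back to the author's site can be checked mechanically. The following is a minimal sketch, not a definitive tool: the function name `links_back_to`, the helper class `LinkCollector`, and the example domains are all hypothetical, and it assumes the copied page's HTML has already been obtained as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_back_to(copied_html, author_domain):
    """Return True if any link in copied_html points at author_domain.

    author_domain is a hypothetical parameter such as "example.com";
    subdomains (e.g. "blog.example.com") are treated as matches.
    """
    collector = LinkCollector()
    collector.feed(copied_html)
    author_domain = author_domain.lower()
    for href in collector.links:
        # Relative links have no hostname and therefore never match.
        host = (urlparse(href).hostname or "").lower()
        if host == author_domain or host.endswith("." + author_domain):
            return True
    return False
```

A full-post copy that keeps the author's tag would be expected to return `True` for the author's domain, while a partial copy with the links stripped would return `False`.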

Many times, though, blog scrapers copy only the portion of the content that is keyword-relevant to their splog topic.

The reason the more 'advanced' blog scrapers do this is simple. First, by copying only the content that is relevant to their splog topic, they can increase the keyword relevancy of their site(s). Second, by not scraping the entire post, they eliminate any outbound links, which would reduce their search engine ranking.
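The effect of partial copying on keyword relevancy can be illustrated with a simple keyword-density calculation. This is only a sketch under stated assumptions: keyword density (matching words divided by total words) is used here as a rough stand-in for "keyword relevancy" and is not a description of how any search engine actually ranks pages. The function name `keyword_density` and the sample text are hypothetical.

```python
import re


def keyword_density(text, keywords):
    """Return the fraction of words in text that are in keywords.

    Assumption: density is a crude proxy for keyword relevancy.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    targets = {k.lower() for k in keywords}
    hits = sum(1 for w in words if w in targets)
    return hits / len(words)


# Hypothetical post: only the first sentence matches the splog topic.
full_post = (
    "We sell garden tools. Our office moved last week. "
    "Visit our partner site for recipes."
)
excerpt = "We sell garden tools."
topic = {"garden", "tools"}

full_density = keyword_density(full_post, topic)
excerpt_density = keyword_density(excerpt, topic)
```

In this example the excerpt has a higher density than the full post, because the off-topic sentences that dilute the keywords have been left out.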

Additionally, scraped content can appear on literally any type of splog or RSS-fed spam site. That means unsuspecting individuals could find their creative or even copyrighted material showing up on a site promoting pornography or another type of content that would be offensive to the original author or the author's audience. This can damage the original author's reputation.


WordPress Feed Copywriter Plugin

Six Steps to Prevent Content Theft and Combat Copyright Infringement on Your Business Blog

Behind Splogging: Why Sploggers Splog