Bots and SEO Rankings
How Bad Bots affect your SEO rankings.
Most webmasters ignore crawlers and scrapers that they view as ‘well behaved’. ‘Well behaved’ usually means that a bot doesn’t make too many requests, doesn’t hog resources, and doesn’t appear capable of causing harm. Classifying a bot as harmless just because it rate-limits itself and looks well behaved is almost never a good idea. These ‘slow and low’ bots, which often slip under the radar, are programmed to do precisely that, and they deserve special attention. What harm can a crawler really do?
The answer is plenty!
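To make the ‘slow and low’ point concrete, here is a minimal Python sketch. It assumes a hypothetical per-minute rule of the kind a WAF might enforce (more than 60 requests in any 60-second window) and a hypothetical crawler that requests one page every 10 seconds. The function `exceeds_rate_limit`, the thresholds and the traffic pattern are illustrative assumptions, not any particular product’s configuration.

```python
from collections import deque


def exceeds_rate_limit(timestamps, limit, window_seconds):
    """Return True if any sliding window of `window_seconds` holds more than `limit` requests.

    `timestamps` are request times in seconds. This is an illustrative
    sliding-window check, not the rule engine of any specific WAF.
    """
    window = deque()
    for t in sorted(timestamps):
        window.append(t)
        # Drop requests that fall outside the window (t - window_seconds, t].
        while window[0] <= t - window_seconds:
            window.popleft()
        if len(window) > limit:
            return True
    return False


# Hypothetical 'slow and low' crawler: one request every 10 seconds for 24 hours.
bot_requests = [i * 10 for i in range(24 * 60 * 60 // 10)]

print(len(bot_requests))                                     # 8640
print(exceeds_rate_limit(bot_requests, 60, 60))              # False: only 6 requests per minute
print(exceeds_rate_limit(bot_requests, 5000, 24 * 60 * 60))  # True: the daily volume stands out
```

The crawler never trips the per-minute rule, yet it fetches 8,640 pages a day; only a much longer window (here a hypothetical threshold of 5,000 requests per day) makes its volume visible.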
Entire Website Cloning
Imagine you wake up one morning and your entire website has been copied to another domain under a different brand name. The owner can’t be traced. All your hard work creating content has simply been hijacked. Worse, the new website is generating lots of clicks and links, and is starting to gain traction and domain authority over your domain.
Even worse, it’s not just a copy. The cybercriminals have built the new website from scratch using the latest front-end technology, and it looks much better than yours and is easier to navigate. They have also stolen content from your competitors, so the cloned site has richer content.
We’re seeing a big increase in this type of total cloning. Although it seems like a huge effort on the part of the cybercriminals, sophisticated bots can automatically capture the content and then insert it into new content templates using a CMS framework. The website then just needs cleaning up and organizing.
How does Google respond to the cloning? Google’s algorithms don’t simply check which version was published first and treat everything else as a copy. Instead, they look at which domain has the higher domain authority: which domain the web rates more highly through inbound links, and which version satisfies users more. The cloned sites are often supported by link-building campaigns and even use click farms to generate user interest. Site owners can report the offending site to Google or file a DMCA takedown notice, but Google manually reviews only a tiny percentage of websites, and the process takes time.
In the meantime, the risk is that Google regards both versions of the same content as ‘spammy’ and sees your website content as ‘thin content’ because it’s no longer unique. In a matter of days, you’ve lost the unique, exclusive content you spent so long building up. Your loss is the cybercriminals’ gain. One of the latest trends is to drive huge volumes of traffic to the cloned web portal via social media. The site looks legitimate and has rich content, so these new visitors may well become customers and develop brand allegiance to the cloned website. The concept of the ‘original’ has been destroyed.
Scraping can take away business
Even if your entire website is not cloned, scrapers can steal valuable content and IP, destroying the value of the investment you’ve made in that content. Scrapers can also be much more subtle than whole-site cloning. For example, one common type of scraper systematically crawls through your entire inventory, puts each product in the shopping cart, and comprehensively scrapes the pricing, shipping and total cost of the product to and from a set of test locations. This lets competitors monitor your actual delivery costs and refine their competitive pricing policies accordingly. These bots are often hard to spot because they are rate limited so that they won’t stand out in the web logs or trigger any rate-limiting rules in the WAF. They will also skew your basket-to-conversion metrics considerably. Because these bots never transact, the effect of UX changes you make to push conversion may be hidden by the skewed data.
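The skew is easy to see in a small worked example. The Python sketch below assumes each session record has hypothetical `added_to_cart`, `ordered` and `is_bot` fields; in practice, reliably setting `is_bot` is the hard part. The session counts are invented for illustration.

```python
def basket_to_order_rate(sessions, exclude_bots=False):
    """Share of sessions with an add-to-cart event that went on to place an order."""
    carts = [
        s for s in sessions
        if s["added_to_cart"] and not (exclude_bots and s["is_bot"])
    ]
    if not carts:
        return 0.0
    return sum(s["ordered"] for s in carts) / len(carts)


# Hypothetical traffic: 1,000 human cart sessions (200 of which order)
# plus 1,000 price-scraping bot sessions that add to cart but never transact.
sessions = (
    [{"added_to_cart": True, "ordered": i < 200, "is_bot": False} for i in range(1000)]
    + [{"added_to_cart": True, "ordered": False, "is_bot": True} for _ in range(1000)]
)

print(basket_to_order_rate(sessions))                     # 0.1
print(basket_to_order_rate(sessions, exclude_bots=True))  # 0.2
```

With the bot sessions mixed in, the reported rate is half the real one, and a UX change that lifted human conversion from 20% to 22% would show up as a move from only 10% to 11%.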
Degraded Website User Engagement
Bots will degrade your overall site metrics: not only key conversion rates, but also bounce rates, time spent per page, dwell time (the time spent on your content before returning to the search results), and your site’s general metrics. Although Google doesn’t monitor your Google Analytics metrics per se, it does want to know which websites users find valuable. Click-bait can generate lots of inbound links, but if users immediately click away, the perceived value of the content is degraded.
Increased web infrastructure costs
Sites with little headroom can suffer downtime from widespread scraper activity alone. Most sites have enough spare capacity to cope with scraper activity, but why pay additional bandwidth fees for traffic you don’t actually want, which can account for as much as 50% of the total? The effects can be particularly hard on websites backed by legacy databases that can’t simply auto-scale.
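A back-of-the-envelope calculation shows what that unwanted share can cost. This minimal Python sketch assumes a flat, hypothetical per-GB bandwidth price; real hosting and CDN pricing is usually tiered, and the traffic volume below is invented.

```python
def bot_bandwidth_cost(monthly_gb, bot_share, price_per_gb):
    """Monthly bandwidth spend attributable to unwanted bot traffic (flat pricing assumed)."""
    if not 0.0 <= bot_share <= 1.0:
        raise ValueError("bot_share must be between 0 and 1")
    return monthly_gb * bot_share * price_per_gb


# Hypothetical figures: 20,000 GB served per month, half of it to bots, $0.05 per GB.
monthly_waste = bot_bandwidth_cost(20_000, 0.5, 0.05)
print(f"${monthly_waste:,.2f} per month, ${monthly_waste * 12:,.2f} per year")
# $500.00 per month, $6,000.00 per year
```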
The effect of form spam and fake listings
Malicious bots can fill in fraudulent registration forms and generate fake listings on your real-estate portal. On some portals, fake listings have reached as much as 80% of all listings. Spam leads waste your sales effort on leads that can never be contacted. They also create a poor user experience for genuine buyers, brokers and real estate agents.