What is a 'bot'?
‘Bot’ is short for ‘robot’: a software application that runs automated tasks over the Internet without human interaction. Such an application is also called a web robot, an internet bot or simply a bot. The names ‘scraper’ and ‘spider’ are used as well.
Internet bots are mostly used for web crawling or scraping: the bot retrieves, processes and stores information from websites. According to a 2023 report, almost 50% of all internet traffic worldwide is generated by bots. (Read the complete article on the NDTV website.)
Difference between ‘good’ and ‘bad’ bots
Some bots are ‘good’, for example search engine spiders and bots for SEO analysis. We need these so that our websites and information can be found. Bots and scrapers are also used to check whether a website contains copyrighted material belonging to others, such as images. Other bots are ‘bad’: they enable misuse of, and attacks on, websites. They allow the bot operator (who may be a competitor or a fraudster) to carry out malicious activities such as competitive data mining (price lists and conditions), personal data harvesting (contact scraping), advertising fraud, denial-of-service (DoS) attacks, disrupting web stores and much more. With today's techniques, readily available cloud servers and ready-made bot programs, building a bot is cheap and easy.
Identifying bot traffic
Building a bot is easy; keeping bots away from a website's information is much harder. After all, a website is meant to be available to every visitor. Website owners can take measures against bad bots, such as bot detection and blocking specific bots from retrieving pages. In response, however, more and more bots manipulate their browser data, frequently change their IP address, or use natural language processing to simulate a real person visiting the site.
Impact on analytics
When bot traffic is counted as regular traffic, metrics such as the number of visitors, pages viewed, average session duration and bounce rate can differ dramatically from the actual figures for real visitors. This in turn can lead to budget decisions based on incorrect data.
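To illustrate how this distortion arises, here is a minimal Python sketch that computes visits, bounce rate and average session duration with and without bot sessions. The session data is entirely hypothetical, and it assumes, purely for illustration, that each bot session fetches a single page with zero duration; real bots can behave very differently and may distort metrics in other directions.

```python
from dataclasses import dataclass


@dataclass
class Session:
    pages_viewed: int
    duration_s: float
    is_bot: bool  # hypothetical flag; in practice this must be inferred


def summarize(sessions):
    """Return visit count, bounce rate (share of single-page sessions) and average duration."""
    n = len(sessions)
    if n == 0:
        return {"visits": 0, "bounce_rate": 0.0, "avg_duration_s": 0.0}
    bounces = sum(1 for s in sessions if s.pages_viewed == 1)
    return {
        "visits": n,
        "bounce_rate": bounces / n,
        "avg_duration_s": sum(s.duration_s for s in sessions) / n,
    }


# Hypothetical data: four human sessions...
sessions = [
    Session(3, 120.0, False),
    Session(1, 30.0, False),
    Session(5, 300.0, False),
    Session(2, 90.0, False),
]
# ...plus four assumed bot sessions that each fetch one page instantly.
sessions += [Session(1, 0.0, True) for _ in range(4)]

all_traffic = summarize(sessions)
humans_only = summarize([s for s in sessions if not s.is_bot])
print(all_traffic)
print(humans_only)
```

In this made-up example, counting the bots doubles the number of visits (8 instead of 4), raises the bounce rate from 25% to 62.5% and halves the average session duration (67.5 s instead of 135 s).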
Snoobi Analytics and bot detection
What can Snoobi do to tackle bots so that analytics metrics reflect actual visitors?
Since its inception, Snoobi Analytics has been able to exclude traffic from known good bots. Traffic generated by these bots, such as search engine crawlers, is stored like any other traffic data but excluded from the user reports.
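The idea of storing all traffic while excluding known bots from reports can be sketched as follows. This is a minimal Python illustration assuming bots are recognized by matching tokens in the user-agent string; it is not Snoobi's implementation, and the token list is a hypothetical, deliberately short example (Googlebot and bingbot are real crawler user-agent tokens, but a production list is much longer and needs regular updates).

```python
from dataclasses import dataclass

# Hypothetical, illustrative list; NOT Snoobi's actual bot list.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot")


@dataclass
class Hit:
    url: str
    user_agent: str


def is_known_bot(user_agent: str) -> bool:
    """True if the user-agent string contains a known bot token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


class HitStore:
    """Keeps every hit, but reports only on hits not attributed to known bots."""

    def __init__(self):
        self.hits = []

    def record(self, hit: Hit) -> None:
        # All traffic is stored, bots included, so it can be reclassified later.
        self.hits.append(hit)

    def report_hits(self):
        # The bot check runs at report time, so an updated token list
        # also applies to hits recorded in the past.
        return [h for h in self.hits if not is_known_bot(h.user_agent)]


store = HitStore()
store.record(Hit("/", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"))
store.record(Hit("/", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
store.record(Hit("/prices", "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"))

print(len(store.hits), len(store.report_hits()))  # 3 1
```

Applying the filter at report time rather than at collection time is what makes it possible to exclude newly identified bots from historical data as well. A user-agent check only catches bots that identify themselves; bots that spoof browser data, as described earlier, pass straight through it.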
Because new ‘good’ bots appear almost every month and existing good bots change their profiles, the Snoobi bot lists are updated regularly. Where possible, Snoobi also excludes traffic that can be identified as visits by other types of bots. In an upcoming release, Snoobi will improve its bot recognition so that additional bot traffic is excluded from the standard web analytics reports, including reports on historical data.
In addition, Snoobi has started a project to automatically identify patterns in the millions of hits on our customers' websites, to ensure that metrics are reliable and reflect actual visitors.