Detection of Advanced Web Bots by Combining Web Logs with Mouse Behavioural Biometrics

Authors:
Christos Iliou

Information Technologies Institute, CERTH and BU-CERT, Bournemouth University, Bournemouth, United Kingdom

Theodoros Kostoulas

Department of Information & Communication Systems Engineering, University of the Aegean and Department of Computing and Informatics, Bournemouth University, Bournemouth, United Kingdom

Theodora Tsikrika

Information Technologies Institute, CERTH, Thessaloniki, Greece

Vasilis Katos

BU-CERT, Bournemouth University, Bournemouth, United Kingdom

ORCID: 0000-0001-6132-3004

Stefanos Vrochidis

Information Technologies Institute, CERTH, Thessaloniki, Greece

Ioannis Kompatsiaris

Information Technologies Institute, CERTH, Thessaloniki, Greece


Digital Threats: Research and Practice, Volume 2, Issue 3, Article No. 24, pp. 1–26. https://doi.org/10.1145/3447815

Published: 08 June 2021

Abstract

Web bots vary in sophistication based on their purpose, ranging from simple automated scripts to advanced web bots that have a browser fingerprint, support the main browser functionalities, and exhibit humanlike behaviour. Advanced web bots are especially appealing to malicious web bot creators because their browserlike fingerprint and humanlike behaviour reduce their detectability. This work proposes a web bot detection framework that comprises two detection modules: (i) a detection module that utilises web logs, and (ii) a detection module that leverages mouse movements. The framework combines the results of each module in a novel way to capture the different temporal characteristics of the web logs and the mouse movements, as well as the spatial characteristics of the mouse movements. We assess its effectiveness on web bots of two levels of evasiveness: (a) moderate web bots that have a browser fingerprint and (b) advanced web bots that have a browser fingerprint and also exhibit humanlike behaviour. We show that combining web logs with visitors’ mouse movements is more effective and robust at detecting advanced web bots that try to evade detection than using either approach alone.
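
The abstract does not specify how the two modules' results are combined, so the following is only a minimal sketch of one common option, a weighted average of per-session scores (late fusion). It is not the authors' method. The names SessionScores, fuse_scores, and is_bot, the weight, and the threshold are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionScores:
    """Bot probabilities for one session from two hypothetical detectors."""
    web_log_score: float  # in [0, 1], assumed output of a web-log classifier
    mouse_score: float    # in [0, 1], assumed output of a mouse-movement classifier


def fuse_scores(scores: SessionScores, log_weight: float = 0.5) -> float:
    """Weighted average of the two module scores (an assumed fusion rule)."""
    if not 0.0 <= log_weight <= 1.0:
        raise ValueError("log_weight must be in [0, 1]")
    for value in (scores.web_log_score, scores.mouse_score):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be in [0, 1]")
    return log_weight * scores.web_log_score + (1.0 - log_weight) * scores.mouse_score


def is_bot(scores: SessionScores, log_weight: float = 0.5, threshold: float = 0.5) -> bool:
    """Flag the session as a bot when the fused score reaches the threshold."""
    return fuse_scores(scores, log_weight) >= threshold


if __name__ == "__main__":
    # A session whose web logs look benign but whose mouse movements look automated.
    session = SessionScores(web_log_score=0.25, mouse_score=0.75)
    print(fuse_scores(session), is_bot(session))
```

In such a sketch, a lower log_weight lets the mouse-movement module dominate the decision, which might suit bots that already mimic human request patterns; suitable values would have to be tuned on labelled sessions.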

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        1. Supervised learning by classification
2. Information systems
  1. World Wide Web
    1. Web mining
      1. Traffic analysis
      2. Web log analysis

Published in
Digital Threats: Research and Practice Volume 2, Issue 3
September 2021
143 pages
EISSN: 2576-5337
DOI: 10.1145/3470118
Editors:
Arun Lakhotia, University of Louisiana at Lafayette and Cythereal, USA
Leigh Metcalf, CERT, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 8 June 2021
- Online AM: 15 April 2021
- Accepted: 1 January 2021
- Revised: 1 December 2020
- Received: 1 February 2020
Author Tags
Web bot detection
advanced web bots
evasive web bots
humanlike behaviour
mouse biometrics
mouse movements
Qualifiers
- research-article
- Research
- Refereed