Abstract
Web bots vary in sophistication based on their purpose, ranging from simple automated scripts to advanced web bots that have a browser fingerprint, support the main browser functionalities, and exhibit humanlike behaviour. Advanced web bots are especially appealing to malicious web bot creators, because their browserlike fingerprint and humanlike behaviour reduce their detectability. This work proposes a web bot detection framework that comprises two detection modules: (i) a detection module that utilises web logs, and (ii) a detection module that leverages mouse movements. The framework combines the results of the two modules in a novel way to capture the different temporal characteristics of the web logs and the mouse movements, as well as the spatial characteristics of the mouse movements. We assess its effectiveness on web bots of two levels of evasiveness: (a) moderate web bots that have a browser fingerprint and (b) advanced web bots that have a browser fingerprint and also exhibit humanlike behaviour. We show that combining web logs with visitors' mouse movements is more effective and robust at detecting advanced web bots that try to evade detection than using either approach alone.
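The abstract does not specify how the two modules are built or how their outputs are merged, so the following is only a minimal sketch of the general two-module idea under stated assumptions, not the authors' method. It assumes each session is available as a list of request timestamps taken from the web logs and a list of timestamped cursor positions. The record types (`LogRequest`, `MousePoint`), the feature names (`mean_gap`, `gap_std`, `straightness`, `mean_speed`), the hand-written scoring rules that stand in for trained classifiers, and the weighted-average fusion in `fuse` are all hypothetical illustrations; the paper's own combination scheme is described as novel and is not reproduced here.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class LogRequest:
    timestamp: float  # seconds since the session's first request (hypothetical schema)


@dataclass
class MousePoint:
    t: float  # seconds
    x: float  # pixels
    y: float  # pixels


def web_log_features(requests: list[LogRequest]) -> dict[str, float]:
    """Temporal features of one session, computed from its web log entries."""
    times = sorted(r.timestamp for r in requests)
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "n_requests": float(len(requests)),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "gap_std": pstdev(gaps) if len(gaps) > 1 else 0.0,
    }


def mouse_features(points: list[MousePoint]) -> dict[str, float]:
    """Spatial and temporal features of one non-empty mouse trajectory."""
    path_len = sum(math.dist((a.x, a.y), (b.x, b.y)) for a, b in zip(points, points[1:]))
    direct = math.dist((points[0].x, points[0].y), (points[-1].x, points[-1].y))
    duration = points[-1].t - points[0].t
    return {
        # Spatial: 1.0 for a perfectly straight path, smaller for curved paths.
        "straightness": direct / path_len if path_len > 0 else 1.0,
        # Temporal: average cursor speed in pixels per second.
        "mean_speed": path_len / duration if duration > 0 else 0.0,
    }


def web_log_score(f: dict[str, float]) -> float:
    """Hypothetical stand-in for a trained web log classifier:
    very regular request timing (low coefficient of variation) looks bot-like."""
    cv = f["gap_std"] / f["mean_gap"] if f["mean_gap"] > 0 else 0.0
    return 1.0 / (1.0 + cv)  # 1.0 when all inter-request gaps are identical


def mouse_score(f: dict[str, float]) -> float:
    """Hypothetical stand-in for a trained mouse classifier: straight paths look
    bot-like. A real classifier would also use mean_speed and other features."""
    return f["straightness"]


def fuse(p_log: float, p_mouse: float, w_log: float = 0.5) -> float:
    """Weighted-average (late) fusion of the two module outputs."""
    if not 0.0 <= w_log <= 1.0:
        raise ValueError("w_log must be in [0, 1]")
    return w_log * p_log + (1.0 - w_log) * p_mouse


def detect_session(requests: list[LogRequest], points: list[MousePoint],
                   w_log: float = 0.5) -> float:
    """Bot-likeness score in [0, 1] for one session (higher = more bot-like)."""
    p_log = web_log_score(web_log_features(requests))
    p_mouse = mouse_score(mouse_features(points))
    return fuse(p_log, p_mouse, w_log)


if __name__ == "__main__":
    # Bot-like session: one request per second, cursor moves in a straight line.
    bot = detect_session(
        [LogRequest(float(i)) for i in range(4)],
        [MousePoint(0.0, 0.0, 0.0), MousePoint(0.1, 10.0, 0.0), MousePoint(0.2, 20.0, 0.0)],
    )
    # Human-like session: irregular request timing, curved cursor path.
    human = detect_session(
        [LogRequest(t) for t in (0.0, 0.5, 3.0, 3.5)],
        [MousePoint(0.0, 0.0, 0.0), MousePoint(0.4, 3.0, 4.0), MousePoint(0.9, 6.0, 0.0)],
    )
    print(f"bot={bot:.2f} human={human:.2f}")
```

With these toy inputs the bot-like session scores 1.0 and the human-like one roughly 0.58. A weighted average is simply the most basic late-fusion baseline; in practice each stand-in scorer would be replaced by a classifier trained on labelled human and bot sessions, and the weight would be tuned on held-out data.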
Index Terms
- Detection of Advanced Web Bots by Combining Web Logs with Mouse Behavioural Biometrics