research-article
Open Access

Detection of Advanced Web Bots by Combining Web Logs with Mouse Behavioural Biometrics

Published: 08 June 2021

Abstract

Web bots vary in sophistication based on their purpose, ranging from simple automated scripts to advanced web bots that have a browser fingerprint, support the main browser functionalities, and exhibit humanlike behaviour. Advanced web bots are especially appealing to malicious web bot creators because their browserlike fingerprint and humanlike behaviour reduce their detectability. This work proposes a web bot detection framework that comprises two detection modules: (i) a module that utilises web logs, and (ii) a module that leverages mouse movements. The framework combines the results of each module in a novel way to capture the different temporal characteristics of the web logs and the mouse movements, as well as the spatial characteristics of the mouse movements. We assess its effectiveness on web bots of two levels of evasiveness: (a) moderate web bots that have a browser fingerprint and (b) advanced web bots that have a browser fingerprint and also exhibit humanlike behaviour. We show that combining web logs with visitors’ mouse movements is more effective and robust in detecting advanced web bots that try to evade detection than using either approach alone.
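
The abstract does not specify how the two modules' results are combined, so the following is only a minimal sketch of the general idea of score-level fusion, not the authors' method. It assumes each module emits a per-session bot probability in [0, 1]; the names `ModuleScores`, `combine_scores`, `is_bot`, the weighted-average rule, and the default weight and threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModuleScores:
    """Per-session bot probabilities from two hypothetical detection modules."""
    web_log: float  # assumed output of a web-log classifier, in [0, 1]
    mouse: float    # assumed output of a mouse-movement classifier, in [0, 1]


def combine_scores(scores: ModuleScores, mouse_weight: float = 0.5) -> float:
    """Fuse the two scores with a weighted average.

    This is an illustrative fusion rule only; the paper's actual
    combination scheme is not described in the abstract.
    """
    if not 0.0 <= mouse_weight <= 1.0:
        raise ValueError("mouse_weight must be in [0, 1]")
    return mouse_weight * scores.mouse + (1.0 - mouse_weight) * scores.web_log


def is_bot(scores: ModuleScores, threshold: float = 0.5,
           mouse_weight: float = 0.5) -> bool:
    """Flag a session as a bot when the fused score reaches the threshold."""
    return combine_scores(scores, mouse_weight) >= threshold


if __name__ == "__main__":
    # A session whose web-log features look benign but whose mouse
    # movements look automated (values are made up for illustration).
    session = ModuleScores(web_log=0.3, mouse=0.9)
    print(combine_scores(session), is_bot(session))
```

In this sketch, a session that evades one module can still be flagged when the other module scores it highly, which is the intuition behind using both signals rather than either one alone.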


        • Published in

          Digital Threats: Research and Practice  Volume 2, Issue 3
          September 2021
          143 pages
          EISSN:2576-5337
          DOI:10.1145/3470118

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 8 June 2021
          • Online AM: 15 April 2021
          • Accepted: 1 January 2021
          • Revised: 1 December 2020
          • Received: 1 February 2020

          Qualifiers

          • research-article
          • Research
          • Refereed
