Comparative Analysis of GA-Optimised Decision Tree and Logistic Regression for Web Intrusion Detection

Authors

  • Lawal Mufidah, Department of Computer Science, Federal University Dutsin-Ma
  • Oyenike Mary Olanrewaju
  • Ibrahim Adamu

DOI:

https://doi.org/10.33003/fjorae.2024.0102.09

Abstract

This study compares Decision Tree and Logistic Regression models, each optimised through Genetic Algorithm (GA)-based feature selection, with the aim of improving the accuracy of website intrusion detection. It uses the Thursday Morning Web Attack dataset from the Canadian Institute for Cybersecurity. The dataset contains 170,366 records and 76 features and focuses on web-based attacks: SQL injection, brute force, and Cross-Site Scripting (XSS). The GA iteratively refines candidate feature subsets through selection, crossover, and mutation, and each model is then trained on the selected features. Model performance is assessed with accuracy, F1 scores, and ROC curves. The findings show that GA optimisation significantly improves the Decision Tree model, with notable gains in classification metrics, especially for the Brute Force and SQL Injection classes. The Logistic Regression model is also competitive, but GA optimisation costs it a small amount of accuracy on XSS detection. By evaluating GA-enhanced models side by side, this research addresses a gap in the cybersecurity literature, and its findings lay the groundwork for future research on hybrid models and effective website intrusion detection.
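The selection, crossover, and mutation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ga_select_features`, every hyperparameter value, and the `toy_fitness` function are assumptions made for illustration. In a real pipeline the fitness of a feature mask would more plausibly be a validation score (for example accuracy or F1) of a Decision Tree or Logistic Regression model trained only on the selected columns.

```python
import random


def ga_select_features(n_features, fitness, pop_size=20, generations=30,
                       crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Search for a good feature mask with a simple generational GA.

    `fitness` maps a 0/1 mask (list of ints) to a score; higher is better.
    All default hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each individual is a bit mask: 1 keeps the feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")

    for _ in range(generations):
        scores = [fitness(mask) for mask in population]
        # Remember the best mask seen so far.
        for mask, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = mask[:], score

        def tournament():
            # Selection: keep the fitter of two randomly drawn individuals.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_population = [best_mask[:]]  # elitism: carry the best forward
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # Single-point crossover: head of one parent, tail of the other.
                cut = rng.randrange(1, n_features)
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population

    # Score the final generation, which the loop above has not evaluated yet.
    for mask in population:
        score = fitness(mask)
        if score > best_score:
            best_mask, best_score = mask[:], score
    return best_mask, best_score


# Hypothetical stand-in for a classifier-based fitness: features 0, 3 and 5
# are treated as informative, and every selected feature carries a small cost.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)


best_mask, best_score = ga_select_features(10, toy_fitness)
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
print("fitness:", round(best_score, 2))
```

Because the fitness function is passed in as an argument, the same loop could wrap either classifier, which is one way to keep a Decision Tree versus Logistic Regression comparison on equal footing.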

Published

2024-12-28