Comparative Analysis of GA-Optimised Decision Tree and Logistic Regression for Web Intrusion Detection
DOI: https://doi.org/10.33003/fjorae.2024.0102.09
Abstract
This study presents a comparative analysis of Decision Tree and Logistic Regression models, each optimised through Genetic Algorithm (GA)-based feature selection, to improve the accuracy of website intrusion detection. The study uses the Thursday Morning Web Attack dataset from the Canadian Institute for Cybersecurity. The dataset comprises 170,366 records and 76 features and focuses on web-based attacks such as SQL injection, brute force, and Cross-Site Scripting (XSS). The GA iteratively refines feature subsets through selection, crossover, and mutation, with the aim of improving model performance. Accuracy, F1 score, and ROC curves are used to assess each model. The findings show that GA optimisation significantly improves the performance of the Decision Tree model, with the largest gains in classification metrics for complex attack types such as brute force and SQL injection. The Logistic Regression model also achieves competitive results, but with GA optimisation it incurs a minor loss of accuracy in detecting XSS attacks. This research addresses a gap in the cybersecurity literature by systematically comparing GA-enhanced models. The insights gained lay the groundwork for future research on hybrid models and effective website intrusion detection.
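The selection, crossover, and mutation loop mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not state the GA settings or the fitness function, so the population size, mutation rate, tournament selection, elitism, and the small per-feature penalty are hypothetical choices. To keep the example dependency-free, a nearest-centroid classifier stands in for the Decision Tree and Logistic Regression models, and synthetic data stands in for the web attack dataset.

```python
import random

def make_data(n_samples, n_features, informative, rng):
    """Synthetic binary data in which only `informative` columns carry signal."""
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Test accuracy of a nearest-centroid classifier using only masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        dists = {label: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
                 for label, cen in centroids.items()}
        if min(dists, key=dists.get) == t:
            correct += 1
    return correct / len(y_test)

def ga_select(fitness, n_features, rng, pop_size=20, generations=15,
              mutation_rate=0.05):
    """Evolve a 0/1 feature mask with tournament selection, one-point
    crossover, bit-flip mutation, and elitism (all hypothetical settings)."""
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]

        def tournament():
            a, b = rng.sample(range(pop_size), 2)
            return population[a] if scores[a] >= scores[b] else population[b]

        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                        # bit-flip mutation
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best

rng = random.Random(0)
N_FEATURES = 12
X, y = make_data(200, N_FEATURES, informative=[0, 3, 7], rng=rng)
X_train, y_train, X_test, y_test = X[:140], y[:140], X[140:], y[140:]

def fitness(mask):
    # Accuracy minus a small penalty per selected feature (assumed trade-off).
    acc = centroid_accuracy(X_train, y_train, X_test, y_test, mask)
    return acc - 0.01 * sum(mask)

best_mask = ga_select(fitness, N_FEATURES, rng)
best_accuracy = centroid_accuracy(X_train, y_train, X_test, y_test, best_mask)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
print("held-out accuracy:", round(best_accuracy, 3))
```

In a setting closer to the study, the fitness function would instead train and score the actual classifier on each candidate subset of the 76 features, ideally on a validation split kept separate from the final test data so that the reported metrics are not biased by the search.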