Genetic Algorithm-Based Feature Selection for Optimizing Website Intrusion Detection Model

Authors

  • LAWAL MUFIDAH
  • Z.U ARMAYAU

DOI:

https://doi.org/10.33003/fjorae.2024.0102.04

Abstract

The widespread use of the internet and digital platforms has made websites essential for communication, business, and access to information. However, this reliance has also heightened cybersecurity concerns, as cyber-attacks on web applications pose risks such as data theft, malware installation, and redirection to malicious sites. Cybersecurity increasingly leverages machine learning and predictive analysis to proactively identify and mitigate potential web application attacks. Nonetheless, the key challenge in website attack prediction is identifying the most relevant features, so that accurate predictions are balanced against manageable computational overhead. While Genetic Algorithm-based feature selection techniques hold promise, their effectiveness should be evaluated on datasets containing common web attacks such as SQL injection, cross-site scripting (XSS), and cross-site request forgery (CSRF). Additionally, metrics beyond accuracy should be considered to assess their generalizability and overall performance. This research investigates the application of Genetic Algorithms (GA) for feature selection in website attack prediction using Decision Tree and Logistic Regression models. The models were evaluated using accuracy and F1-score. The findings show that GA-based feature selection reduces the number of features by up to 86.8% while maintaining or improving detection accuracy and reducing overfitting. This reduction simplifies the model and saves computational resources. The findings highlight the potential of GA as a valuable method for feature selection in website attack prediction.
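The abstract does not give implementation details, so the following is only a minimal sketch of how GA-based feature selection is commonly set up, not the authors' method. Each candidate solution is a 0/1 mask over the features, and the GA evolves masks by tournament selection, one-point crossover, and bit-flip mutation. The function names, the hyperparameter values, and `toy_fitness` are all assumptions made for illustration. In a real experiment the fitness would more plausibly be a validation accuracy or F1-score of a Decision Tree or Logistic Regression model trained on the selected columns.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=40,
              crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Evolve a 0/1 feature mask; return (best_mask, best_fitness).

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Random initial population of feature masks.
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]

        def tournament():
            # Binary tournament: pick two individuals, keep the fitter one.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate and n_features > 1:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: toggle each gene with small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)

    return best, fitness(best)


# Hypothetical stand-in for a classifier-based fitness: pretend only
# features 0, 3 and 7 are informative, and penalize every selected feature
# to reward smaller subsets.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.01 * sum(mask)


if __name__ == "__main__":
    mask, score = ga_select(20, toy_fitness)
    selected = [i for i, bit in enumerate(mask) if bit]
    print("selected features:", selected, "fitness:", round(score, 3))
```

The per-feature penalty in the fitness is one simple way to express the trade-off the abstract describes between predictive performance and the number of features kept. Swapping `toy_fitness` for a function that trains and scores a model on the masked columns leaves the rest of the loop unchanged.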

Published

2024-11-02