Network Intrusion Detection System Using XGBoost and Random Forest Algorithms


Published: 2023-08-31

Page: 321-335


Agu Edward Onyebueke

Computer Science Department, Federal University Wukari, Nigeria.

Addakenjo Ali David *

Information and Communication Technology Center, Federal University Wukari, Nigeria.

Stephen Munu

Computer Science Department, Federal University Wukari, Nigeria.

*Author to whom correspondence should be addressed.


Abstract

Data mining is a relatively young discipline that emerged in response to the proliferation of digital information. As the amount of data stored on the internet continues to grow, security and privacy issues have received increasing attention, and data theft and intrusion are a common source of frustration for users. This study develops a model based on the XGBoost and Random Forest algorithms to anticipate and identify intrusions. The methodology uses Python (Anaconda distribution) and datasets obtained from Kaggle (www.kaggle.com). The study applies the XGBoost and Random Forest algorithms to two datasets, UNSW-NB15 2017 and KDD. On the first dataset, XGBoost achieves 100% accuracy, precision, and recall, and hence a perfect F1-score. On the second dataset, after pre-processing (normalization, feature selection, and scaling) and the application of the Synthetic Minority Over-sampling Technique (SMOTE), both algorithms attain near-perfect accuracy (99% and 98%, respectively). These findings indicate how well each algorithm meets the study's goal of detecting intrusions.
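The abstract names normalization/scaling and SMOTE as pre-processing steps but does not give their settings. The following is a minimal, pure-Python sketch of those two steps, assuming min-max scaling to [0, 1] as the normalization and the classic SMOTE interpolation rule (a new minority row is placed on the segment between a minority row and one of its k nearest minority neighbours). The helper names `min_max_scale` and `smote`, the default `k=5`, the seed and the toy data are illustrative assumptions, not the authors' code.

```python
import math
import random


def min_max_scale(rows):
    """Rescale every feature column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class rows by SMOTE-style interpolation.

    Each new row lies on the line segment between a randomly chosen minority
    row and one of its k nearest minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Usage on a toy, imbalanced dataset (label 1 = hypothetical "attack" class).
rows = [[0, 100], [1, 110], [2, 120], [3, 130], [9, 190], [10, 200]]
labels = [0, 0, 0, 0, 1, 1]
scaled = min_max_scale(rows)
minority = [r for r, y in zip(scaled, labels) if y == 1]
new_rows = smote(minority, n_new=2)
balanced_X = scaled + new_rows
balanced_y = labels + [1] * len(new_rows)
print(balanced_y.count(0), balanced_y.count(1))  # 4 4
```

In a real pipeline one would more likely use scikit-learn's `MinMaxScaler` and imbalanced-learn's `SMOTE` (whose `k_neighbors` also defaults to 5), and apply oversampling to the training split only, so that synthetic rows do not leak into the evaluation data.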

The results indicate that, of the two algorithms, XGBoost is the more accurate and dependable option for the datasets under consideration.
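The comparison above rests on accuracy, precision, recall and F1-score. As a minimal sketch of the standard definitions, assuming a binary normal/attack labelling in which 1 marks an attack, the hypothetical helper `binary_metrics` below computes all four from confusion counts; the labels and predictions are invented for illustration and are not the study's data. Because F1 is the harmonic mean of precision and recall, 100% precision and recall necessarily give an F1-score of 1. For multi-class attack labels, per-class scores would have to be averaged (for example macro or weighted averaging), a choice the abstract does not specify.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 with `positive` as the attack class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # attacks caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed attacks
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions from two classifiers (not the study's data).
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
model_a = [1, 1, 0, 0, 1, 0, 0, 0]  # every sample classified correctly
model_b = [1, 0, 0, 1, 1, 0, 0, 0]  # one missed attack, one false alarm
for name, pred in [("model_a", model_a), ("model_b", model_b)]:
    print(name, binary_metrics(y_true, pred))
```

Here `model_a` scores 1.0 on every metric, while `model_b` reaches an accuracy of 0.75 and a precision, recall and F1 of about 0.67. scikit-learn's `sklearn.metrics` module provides equivalent functions such as `accuracy_score` and `precision_recall_fscore_support`.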

Keywords: Anomalous, random forest, overfitting, XGBoost (eXtreme Gradient Boosting)


How to Cite

Onyebueke, A. E., David, A. A., & Munu, S. (2023). Network Intrusion Detection System Using XGBoost and Random Forest Algorithms. Asian Journal of Pure and Applied Mathematics, 5(1), 321–335. Retrieved from https://globalpresshub.com/index.php/AJPAM/article/view/1854
