Determine the Most Effective Machine Learning Technique for Detecting Phishing Websites

EasyChair Preprint 6497

12 pages • Date: August 31, 2021

Sm Mahamudul Hasan, Nirjas Mohammad Jakilim and Md Forhad Rabbi

Abstract

The Internet's rapid growth has shifted consumers away from conventional shopping and toward electronic commerce. Rather than robbing banks or shops, today's criminals use a range of sophisticated cyber techniques to target their victims. One such technique is phishing, in which attackers use fake websites to collect sensitive information such as account IDs, usernames, and passwords. Because these attacks are semantic in nature and mainly exploit the vulnerabilities of computer users, establishing the authenticity of a web page is difficult. Machine learning (ML) is a widely used data analysis technique that has shown promise in combating phishing. This article examines the applicability of machine learning methods to identifying phishing attempts, along with their advantages and disadvantages. Specifically, we explore a variety of machine learning methods in search of suitable anti-phishing solutions, and we evaluate them on real-world phishing datasets against several criteria. Six machine learning classification methods are used to detect phishing websites. In this study, the Random Forest classifier achieved the highest accuracy, 97.17%, followed by the Gradient Boost classifier at 94.75% and the Decision Tree classifier at 94.69%. Logistic Regression reached 92.76%, while KNN and SVM reached only 60.45% and 56.04%, respectively. We showed that KNN has trouble detecting phishing sites, which we attribute to the model not having been adjusted to improve its accuracy. The Decision Tree performs almost identically to Gradient Boosting.
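To make the comparison easier to read, here is a minimal sketch that ranks the six classifiers by the accuracies reported in the abstract and computes the gap between any two of them. The figures in `REPORTED_ACCURACY` are copied from the abstract. The function names (`accuracy`, `rank`, `gap`) and the toy labels are assumptions introduced for illustration only; this is not the authors' evaluation code.

```python
# Accuracies (percent) exactly as reported in the abstract.
REPORTED_ACCURACY = {
    "Random Forest": 97.17,
    "Gradient Boost": 94.75,
    "Decision Tree": 94.69,
    "Logistic Regression": 92.76,
    "KNN": 60.45,
    "SVM": 56.04,
}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank(scores):
    """Return (name, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def gap(scores, a, b):
    """Difference in percentage points between classifiers a and b."""
    return round(scores[a] - scores[b], 2)


if __name__ == "__main__":
    for name, score in rank(REPORTED_ACCURACY):
        print(f"{name:<20} {score:6.2f}%")

    # Hypothetical toy labels (not from the paper): 1 = phishing, 0 = legitimate.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Under these figures, `gap(REPORTED_ACCURACY, "Gradient Boost", "Decision Tree")` is 0.06 percentage points, which is the basis for describing the two as nearly identical in performance.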

Keyphrases: Classification, Decision Tree, Gradient Boost classifiers, Phishing Detection, Random Forest, SVM, Website Security, machine learning

Links:

https://easychair.org/publications/preprint/Rs1b

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:6497,
  author    = {Sm Mahamudul Hasan and Nirjas Mohammad Jakilim and Md Forhad Rabbi},
  title     = {Determine the Most Effective Machine Learning Technique for Detecting Phishing Websites},
  howpublished = {EasyChair Preprint 6497},
  year      = {EasyChair, 2021}}
