Abstract:
Phishing is a type of social engineering attack intended to steal user data,
such as login credentials and credit card numbers, causing financial losses for
both organizations and individuals. In a phishing attack, an attacker posing as
a trusted entity lures a victim into clicking a link or opening an attachment in
an email or text message. Such attacks are typically launched through email,
text messages, or social networks. Previous research has shown that phishing
attacks can be identified from their uniform resource locators (URLs) alone.
However, identifying the techniques phishers use to make a phishing URL mimic a
legitimate one remains challenging, and there is currently limited understanding
of how cybercriminals craft URLs with the same look and feel as legitimate ones
to entice people into clicking links. This paper therefore applies feature
selection to phishing URLs to explore the strategies phishers use to mimic URLs
that trick people into clicking links. We applied two machine learning (ML)
feature selection methods, information gain (IG) and chi-squared, to a phishing
dataset containing 48 features extracted from 5,000 phishing and 5,000
legitimate URLs of web pages downloaded between January and May 2015 and between
May and June 2017. Our results revealed 10 techniques that phishers use to mimic
URLs and manipulate people into clicking links. Identifying these URL
manipulation techniques can help educate individuals and organizations and
protect them from phishing attacks. The findings can also inform the development
of anti-phishing tools, frameworks, and browser plugins for phishing prevention.
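The two feature-scoring criteria named in the abstract can be illustrated with a
minimal sketch. It assumes discrete (here binary) URL features and binary labels
(1 = phishing, 0 = legitimate). The feature names and the eight-URL toy dataset
are hypothetical, not taken from the paper's 48-feature dataset, and the paper
does not state how its IG and chi-squared scores were computed. Information gain
is computed as IG(Y; X) = H(Y) - H(Y | X), and the chi-squared score as Pearson's
statistic over the feature-by-class contingency table; features are then ranked
by score, highest first.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and class labels Y."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Labels of the URLs that take this feature value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    x_counts = Counter(feature)
    y_counts = Counter(labels)
    stat = 0.0
    for x in x_counts:
        for y in y_counts:
            # Expected cell count if the feature were independent of the class.
            expected = x_counts[x] * y_counts[y] / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 4 phishing URLs (label 1) followed by 4 legitimate (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "ip_in_hostname": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly separates the classes
    "at_symbol": [1, 1, 1, 0, 0, 0, 0, 0],       # partially informative
    "tilde_symbol": [1, 0, 1, 0, 1, 0, 1, 0],    # same distribution in both classes
}

for name, column in features.items():
    print(f"{name}: IG={information_gain(column, labels):.3f}, "
          f"chi2={chi_squared(column, labels):.2f}")

ranking = sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True)
print(ranking)  # ['ip_in_hostname', 'at_symbol', 'tilde_symbol']
```

A feature whose values perfectly separate the classes (the hypothetical
`ip_in_hostname`) reaches the maximum IG of 1 bit for balanced binary labels,
while a feature distributed identically across both classes scores zero under
both criteria. Library routines such as scikit-learn's `chi2` could stand in for
the hand-written scorer, but that function treats feature values as counts, so
its scores can differ from the full contingency-table statistic computed here.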