TY - GEN
T1 - Automatic detection of phishing target from phishing webpage
AU - Liu, Gang
AU - Qiu, Bite
AU - Wenyin, Liu
PY - 2010
Y1 - 2010
N2 - An approach to identifying the phishing target of a given (suspicious) webpage is proposed, based on clustering the set of webpages consisting of all of its associated webpages and the given webpage itself. We first find its associated webpages, and then explore their relationships to the given webpage as their features for clustering. Such relationships include link relationship, ranking relationship, text similarity, and webpage layout similarity. The DBSCAN clustering method is employed to determine whether there is a cluster around the given webpage. If such a cluster exists, we claim the given webpage is a phishing webpage and then find its phishing target (i.e., the legitimate webpage it is attacking) from this cluster. Otherwise, we identify it as a legitimate webpage. Our test dataset consists of 8745 phishing pages (targeting 76 well-known websites) selected from PhishTank, and preliminary experiments show that the approach can successfully identify 91.44% of their phishing targets. Another dataset of 1000 legitimate webpages is collected to test our method's false alarm rate, which is 3.40%. © 2010 IEEE.
UR - http://www.scopus.com/inward/record.url?scp=78149484030&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-78149484030&origin=recordpage
U2 - 10.1109/ICPR.2010.1010
DO - 10.1109/ICPR.2010.1010
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9780769541099
SP - 4153
EP - 4156
BT - Proceedings - International Conference on Pattern Recognition
T2 - 2010 20th International Conference on Pattern Recognition, ICPR 2010
Y2 - 23 August 2010 through 26 August 2010
ER -