[2022.02.17] Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages

강민혁

(@mhkang)

Abstract

Recent years have seen the development of phishing detection and identification approaches to defend against phishing attacks. Phishing detection solutions often report binary results, i.e., phishing or not, without any explanation. In contrast, phishing identification approaches identify phishing webpages by visually comparing webpages with predefined legitimate references and report phishing along with its target brand, thereby having explainable results. However, there are technical challenges in visual analyses that limit existing solutions from being effective (with high accuracy) and efficient (with low runtime overhead), to be put to practical use.

In this work, we design a hybrid deep learning system, Phishpedia, to address two prominent technical challenges in phishing identification, i.e., (i) accurate recognition of identity logos on webpage screenshots, and (ii) matching logo variants of the same brand. Phishpedia achieves both high accuracy and low runtime overhead. And very importantly, different from common approaches, Phishpedia does not require training on any phishing samples. We carry out extensive experiments using real phishing data; the results demonstrate that Phishpedia significantly outperforms baseline identification approaches (EMD, PhishZoo, and LogoSENSE) in accurately and efficiently identifying phishing pages. We also deployed Phishpedia with CertStream service and discovered 1,704 new real phishing websites within 30 days, significantly more than other solutions; moreover, 1,133 of them are not reported by any engines in VirusTotal.
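To make the idea of reference-based identification concrete, here is a minimal sketch of the matching step only. It is not the paper's implementation. It assumes a logo detector has already produced an embedding vector for the logo on the screenshot, and it swaps in plain cosine similarity with a fixed threshold for the matching model. The final domain check is likewise an assumption of this sketch, not something the abstract states. The reference list, brand names, domains, vectors, and the `identify` function are all hypothetical.

```python
import math

# Hypothetical reference list: brand -> known logo variants and legitimate domains.
# The vectors are toy values, not the output of any real model.
REFERENCES = {
    "ExampleBank": {
        "embeddings": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
        "domains": {"examplebank.com"},
    },
    "ExamplePay": {
        "embeddings": [[0.0, 1.0, 0.0]],
        "domains": {"examplepay.com"},
    },
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(logo_embedding, page_domain, threshold=0.85):
    """Return the impersonated brand, or None if the page is not reported.

    Assumption: a page is reported when its logo matches a reference brand
    but its domain is not one of that brand's legitimate domains.
    """
    best_brand, best_score = None, 0.0
    for brand, ref in REFERENCES.items():
        # Compare against every stored variant of the brand's logo.
        for variant in ref["embeddings"]:
            score = cosine(logo_embedding, variant)
            if score > best_score:
                best_brand, best_score = brand, score

    if best_brand is None or best_score < threshold:
        return None  # no reference logo matched closely enough
    if page_domain in REFERENCES[best_brand]["domains"]:
        return None  # the brand's logo on the brand's own domain
    return best_brand  # logo matches a brand, domain does not
```

Because the result names the matched brand, a report carries its own explanation, and no phishing samples are needed to build the reference list, only legitimate logos.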

[USENIX '21][paper] Phishpedia- A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages.pdf

Posted: February 17, 2022, 9:58 AM