
Graduate student: Sun, Tzu-Hung (孫慈鴻)
Thesis title: Using hash algorithms to identify personal photos of fake LINE accounts (雜湊演算法用在辨識LINE假帳號個人圖片)
Advisor: Huang, Yueh-Min (黃悅民)
Degree: Master
Department: College of Engineering, Department of Engineering Science (on-the-job master's program)
Year of publication: 2023
Graduation academic year: 111 (ROC calendar, i.e. 2022–2023)
Language: Chinese
Pages: 73
Keywords: Anti-fraud, Fake account detection, Content-Based Image Retrieval (CBIR), Image hashing algorithm, Hamming distance, Structural Similarity Index (SSIM), LINE bot

    In recent years, the number of online fraud cases in Taiwan has risen every year. According to data from the Criminal Investigation Bureau's Anti-Fraud Center, 418,633 fraud cases were recorded over the past five years. Since 2020, fraud has been the most common type of crime in Taiwan, reaching 124,899 cases in 2021, an average of 342 cases per day. In monetary terms, from 2017 to 2021 assets were seized in 1,048 fraud cases, totaling NT$1.189 billion, an average loss of about NT$1.13 million per case.
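As a quick consistency check, the two averages quoted above follow directly from the reported totals (NT$1.189 billion is 11億8913萬, i.e. NT$1,189,130,000, in the original Chinese figures):

```latex
\frac{124{,}899\ \text{cases}}{365\ \text{days}} \approx 342.2\ \text{cases/day},
\qquad
\frac{\text{NT\$}\,1{,}189{,}130{,}000}{1{,}048\ \text{cases}} \approx \text{NT\$}\,1{,}134{,}666\ \text{per case} \approx \text{NT\$}\,1.13\ \text{million}.
```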

    In considering why online fraud happens, one key factor stands out: the first point of contact with a victim is usually a fake account on a social media platform. Fake accounts generally exist to serve some purpose, but whatever that purpose is, there is still no effective method for determining whether a given account is fake. The only options today are reports from other users or the platform's own automatic fake-account detection, after which an account identified as fake can be taken down. However, when I personally reported a large number of fake accounts, the platforms did not actually take them down; they simply replied that the reported accounts were not fake. In my view, fake-account moderation on these platforms has effectively broken down. I therefore began to consider whether there could be a more effective way to warn others that a particular account is fake. The most convenient approach is to provide a lookup platform, ideally on the messaging app with the most users in Taiwan, which is LINE.

    Although government agencies and ministries, working with private organizations, continually promote anti-fraud awareness and expose scam techniques, neither the number of fake accounts nor the number of actual fraud cases has declined year over year; both keep growing. How to recognize a fake account before a fraud ring starts using it to defraud people is therefore, in my view, a topic worth studying.

    In this study, I take the LINE fake accounts previously published on the "165 National Fraud Prevention Website" of the National Police Agency, Ministry of the Interior, extract their profile photos, and use those photos for image retrieval and comparison on an image search platform. The goal is a tool that can quickly identify fake accounts. Such a tool benefits the general public and can also help stop potential fraud cases, protecting people's money before any loss occurs.
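The matching step behind such a tool can be sketched in a few lines of Python, the language the thesis lists for its implementation. This is a minimal sketch under stated assumptions, not the author's code. It implements average hashing (aHash), one of the three hashes the thesis compares alongside dHash and pHash, on a grayscale image that is assumed to be already shrunk to 8×8 (for example with Pillow's `Image.convert("L")` followed by `Image.resize((8, 8))`), and then looks up stored hashes by Hamming distance. The function names, the in-memory `known` dictionary (standing in for the study's MariaDB table), and the default threshold of 5 bits are hypothetical; the thesis determines its own thresholds with SSIM.

```python
from typing import Dict, List, Tuple


def average_hash(gray_8x8: List[List[int]]) -> int:
    """aHash: one bit per pixel, set to 1 when the pixel is brighter than the mean.

    Assumes the image was already converted to grayscale and shrunk to 8x8,
    so the result is a 64-bit integer built row by row, most significant bit first.
    """
    pixels = [p for row in gray_8x8 for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def find_matches(query: int, known: Dict[str, int], threshold: int = 5) -> List[Tuple[str, int]]:
    """Return (line_id, distance) for stored hashes within `threshold` bits, closest first.

    `known` maps a hypothetical LINE ID to the hash of its profile photo;
    the default threshold is a placeholder, not a tuned value.
    """
    hits = [(line_id, hamming_distance(query, h)) for line_id, h in known.items()]
    return sorted((hit for hit in hits if hit[1] <= threshold), key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical 8x8 grid: left half dark, right half bright.
    grid = [[10] * 4 + [200] * 4 for _ in range(8)]
    query = average_hash(grid)
    print(hex(query))  # 0xf0f0f0f0f0f0f0f
    known = {"id_near": query ^ 0b111, "id_far": query ^ 0xFFFFFFFFFFFFFFFF}
    print(find_matches(query, known))  # [('id_near', 3)]
```

The linear scan compares the query against every stored hash, which keeps the sketch simple; a near match only flags a candidate, and a second check such as SSIM can then confirm it.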

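Abstract
Extended Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
  1.1 Research Background
  1.2 Research Motivation and Objectives
    1.2.1 Research Motivation
    1.2.2 Research Objectives
  1.3 Research Methods
    1.3.1 Data Collection
    1.3.2 Image Preprocessing
    1.3.3 Hash Value Generation
    1.3.4 Image Labeling
    1.3.5 Hash Value Comparison
    1.3.6 Anti-Fraud Bot Implementation
  1.4 Research Contributions
Chapter 2: Literature Review
  2.1 Content-Based Image Retrieval (CBIR)
  2.2 Image Labeling (IL) and Image Annotation (IA)
  2.3 Image Hash Algorithms
  2.4 Hamming Distance
  2.5 Image Search Platforms
    2.5.1 Google Image Search
    2.5.2 Bing Image Search
    2.5.3 TinEye
    2.5.4 Baidu Image Search
    2.5.5 Yandex Image Search
    2.5.6 Sogou Images
    2.5.7 Summary of Image Search Platforms
  2.6 Anti-Fraud Assistance Platforms in Taiwan
    2.6.1 Trend Micro 防詐達人 (Anti-Fraud Expert)
    2.6.2 National Police Agency, Ministry of the Interior: 165 National Fraud Prevention Website
    2.6.3 NPA 165 Anti-Fraud: Reporting Frequency of LINE IDs
    2.6.4 Taiwan FactCheck Center
  2.7 Optical Character Recognition (OCR)
  2.8 The Python Programming Language and Related Libraries
    2.8.1 The Python Programming Language
    2.8.2 Python Libraries
Chapter 3: System Design and Architecture
  3.1 Research Architecture
  3.2 Data Collection and Organization
    3.2.1 Data Sources
    3.2.2 Preprocessing: Obtaining LINE Profile Pictures
    3.2.3 Preprocessing: Image Cropping
  3.3 Image Retrieval Algorithms
    3.3.1 Data Preprocessing
    3.3.2 Structural Similarity Index
    3.3.3 Average Hash Algorithm
    3.3.4 Difference Hash Algorithm
    3.3.5 Perceptual Hash Algorithm
    3.3.6 Hamming Distance Calculation
  3.4 Simulated Implementation
    3.4.1 Hardware and Software Specifications for Hosting the LINE Bot in the Cloud
    3.4.2 Collecting Data with the LINE Bot
    3.4.3 Anti-Fraud Bot Features
    3.4.4 Quick Image-Labeling Tool
    3.4.5 How SSIM Is Averaged in This Study
Chapter 4: Experimental Design and Results
  4.1 Dataset
    4.1.1 Source of LINE ID Data
    4.1.2 Image Labels
    4.1.3 Source of LINE Profile Pictures
  4.2 Experimental Design
    4.2.1 Hardware Used for Validation
    4.2.2 Verifying Whether Image Hash Functions Are Suitable for Reverse Lookup of Fake Accounts
    4.2.3 Using SSIM to Find the Most Suitable Hamming Distance
  4.3 Experimental Results
    4.3.1 Comparison of Hash Generation Time
    4.3.2 Comparison of False-Positive Rates on Similar Images Across Hash Algorithms
    4.3.3 Image Classification Results
    4.3.4 Using SSIM to Validate the Optimal Hamming-Distance Threshold for Each Hash Algorithm
    4.3.6 Example Operation of LINE Anti-Fraud Bot K
  4.4 Experimental Limitations and Issues
    4.4.1 Limitations of This Experiment
    4.4.2 Issue 1: Grayscale Conversion Differs Between OpenCV and Pillow
    4.4.3 Issue 2: Images Sent to the Bot for Comparison Need a White Circular Mask
    4.4.4 Issue 3: Resizing Images Causes Small Deviations in SSIM Values
    4.4.5 Comparing Images Visually to Validate the Hamming-Distance Thresholds for aHash, dHash, and pHash
Chapter 5: Conclusions and Future Work
  5.1 Summary
  5.2 Future Work
    5.2.1 Collecting Fraudulent Profile Pictures from Other Platforms
    5.2.2 AI-Generated Fake Images as a Future Challenge
References
Appendix
  MariaDB table used in this study [cheat_line]
  Reference screen for the quick-labeling setup (written in PHP)
  LINE profile pictures collected from "165 Anti-Fraud: Fraudulent LINE IDs" (written in PHP)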


    Full-text availability: on campus, open immediately; off campus, open immediately.