| Graduate student: | 黃哲緯 Huang, Che-Wei |
|---|---|
| Thesis title: | 可應用於高效性偵測詐騙電話之新穎圖形探勘方法 Novel Graph Mining Approaches for Efficient Fraudulent Phone Call Detection |
| Advisor: | 謝孫源 Hsieh, Sun-Yuan |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| Year of publication: | 2016 |
| Academic year of graduation: | 104 |
| Language: | English |
| Number of pages: | 78 |
| Chinese keywords: | 詐騙電話偵測 、信賴程度探勘 、有權重HITS演算法 、增量學習 |
| English keywords: | Fraudulent Phone Call Detection, Trust Value Mining, Weighted HITS Algorithm, Incremental Learning |
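In recent years, with the development of modern technology and global communication, fraud has become increasingly rampant internationally. Although some studies have addressed fraudulent phone call detection, existing methods formulate it as a binary classification problem. Because call records carry limited information, classifier-based methods generally perform poorly at detecting fraudulent calls. In this thesis, we propose a graph-mining-based fraudulent phone call detection framework, FrauDetector, which automatically tags fraudulent phone numbers as fraud, an important application for distinguishing normal calls from fraudulent ones. From call records, we build two kinds of directed graphs, CPG and UPG, to represent the relationships between users and remote phone numbers. To weight the edges of CPG and UPG, we extract features that represent the calling behavior between users and remote numbers. We run a weighted HITS algorithm on the weighted directed graphs CPG and UPG to learn experience values for users and trust values for remote numbers, and we propose a scoring function to estimate the probability that a call is fraudulent. We conduct a comprehensive experimental study on data provided by the anti-fraud mobile application Whoscall. The results of FrauDetector demonstrate the effectiveness of our weighted HITS algorithm and of taking edge weights into account in feature extraction. Although FrauDetector achieves notable results, it can still be improved in efficiency and scalability. We therefore propose an efficient, parallel, graph-mining-based fraudulent phone call detection framework, FrauDetector+. FrauDetector+ first generates smaller, more manageable subgraphs from the original graph and runs a parallelized HITS algorithm on them, which significantly speeds up graph learning. It adopts a novel aggregation method that derives users' experience values and remote numbers' trust values from their respective scores in the subgraphs. After this initial step, the experience and trust values can be updated incrementally whenever a new fraudulent number is detected. In the detection module, a fraud-centric hash structure speeds up detection so that fraudulent calls can be identified in real time. The results show that FrauDetector+ is more efficient than FrauDetector and outperforms classifier-based methods.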
In recent years, fraud has become more rampant internationally with the development of modern technology and global communication. Although many studies have addressed the fraud detection problem, these existing works focus only on formulating it as a binary classification problem. Because telecommunication records provide limited information, such classifier-based approaches to fraudulent phone call detection normally do not work well. In this thesis, we develop a graph-mining-based fraudulent phone call detection framework, FrauDetector, for a mobile application that automatically annotates fraudulent phone numbers with a “fraud” tag, which is a crucial prerequisite for distinguishing fraudulent phone calls from normal ones. Our detection approach performs a weighted HITS algorithm to learn the trust value of a remote phone number. Based on telecommunication records, we build two kinds of directed bipartite graphs, CPG and UPG, to represent users’ telecommunication behavior. To weight the edges of CPG and UPG, we extract features that represent the telecommunication behavior of users. On the weighted CPG and UPG, we perform the weighted HITS algorithm to learn experience values for users and trust values for phone numbers, and we propose a scoring function to evaluate the probability that a phone number is fraudulent. Moreover, we propose a highly efficient incremental graph-mining-based fraudulent phone call detection framework, FrauDetector+. FrauDetector+ initially generates smaller, more manageable sub-networks from the original graph and performs a parallelized weighted HITS algorithm for a significant speedup in the graph learning module. It adopts a novel aggregation approach to generate the trust (or experience) value for each phone number (or user) based on their respective local values.
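To make the graph-learning step concrete, the following is a minimal sketch of a generic weighted HITS iteration on a user-to-number bipartite graph. It is not the thesis's implementation: the abstract does not define how CPG and UPG edges are weighted, so the `edges` dictionary, its weights, and the function name `weighted_hits` are all illustrative assumptions. Users play the hub role (experience values) and phone numbers the authority role (trust values).

```python
import math


def weighted_hits(edges, iterations=50, tol=1e-9):
    """Generic weighted HITS on a user -> number bipartite graph.

    edges: dict mapping (user, number) -> non-negative weight.
           The weights are hypothetical; the thesis derives its own
           weights from extracted call-behavior features.
    Returns (experience, trust), each L2-normalized.
    """
    if not edges:
        return {}, {}

    users = {u for u, _ in edges}
    numbers = {n for _, n in edges}
    experience = {u: 1.0 for u in users}
    trust = {n: 1.0 for n in numbers}

    def normalize(scores):
        norm = math.sqrt(sum(v * v for v in scores.values()))
        if norm == 0.0:
            return scores
        return {k: v / norm for k, v in scores.items()}

    for _ in range(iterations):
        # Authority update: a number's trust is the weighted sum of the
        # experience of the users connected to it.
        new_trust = {n: 0.0 for n in numbers}
        for (u, n), w in edges.items():
            new_trust[n] += w * experience[u]
        new_trust = normalize(new_trust)

        # Hub update: a user's experience is the weighted sum of the
        # trust of the numbers the user is connected to.
        new_experience = {u: 0.0 for u in users}
        for (u, n), w in edges.items():
            new_experience[u] += w * new_trust[n]
        new_experience = normalize(new_experience)

        # Stop once neither score vector changes noticeably.
        delta = max(
            max(abs(new_trust[n] - trust[n]) for n in numbers),
            max(abs(new_experience[u] - experience[u]) for u in users),
        )
        trust, experience = new_trust, new_experience
        if delta < tol:
            break

    return experience, trust


# Toy graph with made-up users, numbers, and weights.
toy_edges = {
    ("alice", "n1"): 3.0,
    ("alice", "n2"): 1.0,
    ("bob", "n1"): 2.0,
}
toy_experience, toy_trust = weighted_hits(toy_edges)
print(toy_experience, toy_trust)
```

In this toy graph, `n1` receives heavier edges from both users, so it ends up with a higher trust value than `n2`. How such values feed into a fraud probability depends on the scoring function, which the abstract does not specify.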
After the initial procedure, we can incrementally update the trust (or experience) value for each phone number (or user) whenever a new fraudulent phone number is identified. An efficient fraud-centric hash structure is constructed to support fast, real-time detection of fraudulent phone numbers in the detection module. We conduct a comprehensive experimental study based on a real dataset collected through an anti-fraud mobile application, Whoscall. The results of FrauDetector demonstrate the effectiveness of our weighted HITS-based approach and show the strength of taking weighted edges into account in feature extraction. In addition, the results of FrauDetector+ demonstrate significantly improved efficiency compared to FrauDetector and superior performance over other major classifier-based methods.
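The abstract does not describe the layout of the fraud-centric hash structure, so the sketch below only illustrates the general idea of constant-time lookup keyed by phone number, with entries updated as new scores arrive. The class name `FraudIndex`, the `threshold` parameter, the score semantics, and the sample phone numbers are all hypothetical and are not taken from the thesis.

```python
class FraudIndex:
    """Hypothetical hash-based index of numbers currently flagged as fraud."""

    def __init__(self, threshold=0.5):
        # Assumed cut-off: a score at or above this value flags the number.
        self.threshold = threshold
        # Only flagged numbers are stored, so lookups touch fraud entries only.
        self._fraud_scores = {}

    def update(self, number, fraud_score):
        """Insert, refresh, or drop a number after its score is recomputed."""
        if fraud_score >= self.threshold:
            self._fraud_scores[number] = fraud_score
        else:
            self._fraud_scores.pop(number, None)

    def is_fraud(self, number):
        """Average-case constant-time membership check for an incoming call."""
        return number in self._fraud_scores

    def score(self, number):
        """Return the stored score, or None if the number is not flagged."""
        return self._fraud_scores.get(number)


# Made-up numbers and scores, for illustration only.
index = FraudIndex(threshold=0.5)
index.update("0900-000-001", 0.92)
index.update("0900-000-002", 0.10)
print(index.is_fraud("0900-000-001"), index.is_fraud("0900-000-002"))
```

Storing only flagged numbers keeps the table small and makes each incoming call a single dictionary lookup. An incremental update then amounts to calling `update` for the numbers whose scores changed.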