
Graduate student: 黃繼民 (Huang, Chi-Min)
Thesis title: Frame-based Semantics Mining for Automatic Place Labeling (以訊框及語義探勘為基礎之自動化地點標記方法)
Advisor: 曾新穆 (Tseng, Shin-Mu)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 62
Keywords (Chinese): 地點標記, 使用者行為分析, 類別不平衡問題, 多層式分類
Keywords (English): Place labeling, User behavior analysis, Class imbalance problem, Multi-level classification
  • In recent years, research on automatic place labeling based on users’ smartphone data has attracted considerable attention. Although the literature offers many techniques for automatic place labeling, most of them simply recast the task as a multi-class classification problem. Furthermore, because users often perform many different activities in places that carry the same semantic label, building a flexible labeling model from such heterogeneous data remains a challenging issue. In this thesis, we propose a novel approach named Frame-based Semantics Mining (FS-Mining) that integrates users’ smartphone data to label a place based on both the users’ behaviors and the environment of the place. To the best of our knowledge, this is the first work on automatic place labeling that considers both the similarity between semantic labels and the frame features in users’ smartphone data. Through comprehensive experimental evaluations on a real dataset from the Nokia Mobile Data Challenge [31], the proposed FS-Mining is shown to label places effectively.

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    1. Introduction
        1.1 Background
        1.2 Motivation
        1.3 Overview of Our Approach
        1.4 Contribution
        1.5 Thesis Organization
    2. Related Work
        2.1 Automatic Place Labeling
        2.2 Class Imbalance Handling
        2.3 Ensemble Learning
    3. Proposed Methods
        3.1 Overview of the Proposed Approach
        3.2 Frame Feature Handling
            3.2.1 Behavior Feature
            3.2.2 Environment Feature
            3.2.3 Feature Selection
        3.3 Classification Model Building
            3.3.1 Basic Multi-Level Classification
            3.3.2 Modified Multi-Level Classification
            3.3.3 Ensemble Classification for Handling Class Imbalance
    4. Experiments and Evaluation
        4.1 Experimental Dataset
        4.2 Evaluation Methodology
        4.3 Effectiveness of Features
            4.3.1 Recall
            4.3.2 Precision
            4.3.3 F-measure
        4.4 Effectiveness of Feature Selection
        4.5 Effectiveness on Labels
        4.6 Comparison with Existing Methods
            4.6.1 Effectiveness in Terms of Recall
            4.6.2 Effectiveness in Terms of Precision
            4.6.3 Effectiveness in Terms of F-measure
        4.7 Discussions
            4.7.1 Impact on Number of Visits of Places
            4.7.2 Impact on Number of Frames of Places
    5. Conclusions and Future Work
        5.1 Conclusions
        5.2 Future Work
    References

    [1] https://research.nokia.com/page/12000
    [2] https://foursquare.com/
    [3] https://www.everytrail.com/
    [4] S. Bergamaschi, E. Domnori, F. Guerra, M. Orsini, R. T. Lado, and Y. Velegrakis, "Keymantic: semantic keyword-based searching in data integration systems," Proceedings of the Very Large Data Base Endowment, vol. 3, pp. 1637-1640, 2010.
    [5] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, pp. 123-140, 1996.
    [6] P. Bühlmann and B. Yu, "Analyzing Bagging," The Annals of Statistics, vol. 30, pp. 927-961, 2002.
    [7] A. Buja and W. Stuetzle, "Observations on bagging," Statistica Sinica, vol. 16, pp. 323-351, 2006.
    [8] N. Chawla, K. Bowyer, L. Hall, and W. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
    [9] N. Chawla, A. Lazarevic, L. Hall, and K. Bowyer, "SMOTEBoost: Improving prediction of the minority class in boosting," Proceedings of European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 107-119, Cavtat-Dubrovnik, Croatia, 2003.
    [10] X. Chen, B. Gerlach, and D. Casasent, "Pruning Support Vectors for Imbalanced Data Classification," Proceedings of International Joint Conference on Neural Networks, pp. 1883-1888, Canada, 2005.
    [11] Y. Chon, Y. Kim, and H. Cha, "Autonomous place naming system using opportunistic crowdsensing and knowledge from crowdsourcing," Proceedings of the ACM/IEEE Conference on Information Processing in Sensor Networks, pp. 19-30, Philadelphia, USA, 2013.
    [12] Y. Chon, Y. Kim, H. Shin, and H. Cha, "Topic Modeling-based Semantic Annotation of Place using Personal Behavior and Environmental Features," Proceedings of the Mobile Data Challenge by Nokia Workshop, co-located with Pervasive 2012, Newcastle, United Kingdom, 2012.
    [13] B. V. Dasarathy and B. V. Sheela, "Composite Classifier System-Design - Concepts and Methodology," Proceedings of the IEEE, vol. 67, pp. 708-713, 1979.
    [14] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, pp. 1-38, 1977.
    [15] T. Do and D. Gatica-Perez, "The Places of Our Lives: Visiting Patterns and Automatic Labeling from Longitudinal Smartphone Data," IEEE Transactions on Mobile Computing, vol. PP, p. 1, 2013.
    [16] P. Domingos, "MetaCost: A General Method for Making Classifiers Cost-Sensitive," Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 155-164, San Diego, CA, USA, 1999.
    [17] A. Estabrooks, T. Jo, and N. Japkowicz, "A Multiple Resampling Method for Learning from Imbalanced Data Sets," Computational Intelligence, vol. 20, pp. 18-36, 2004.
    [18] Y. Freund and R. E. Schapire, "Experiments with a new boosting algorithm," Proceedings of the International Conference on Machine Learning, pp. 148-156, Bari, Italy, 1996.
    [19] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, pp. 119-139, 1997.
    [20] B. Güc, M. May, Y. Saygin, and C. Körner, "Semantic Annotation of GPS Trajectories," Proceedings of the AGILE International Conference on Geographic Information Science, Girona, Spain, 2008.
    [21] V. Hegde, J. X. Parreira, and M. Hauswirth, "Semantic Tagging of Places Based on User Interest Profiles from Online Social Networks," Proceedings of European Conference on Information Retrieval, pp. 218-229, Moscow, Russia, 2013.
    [22] T. R. Hoens, Q. Qian, N. V. Chawla, and Z.-H. Zhou, "Building Decision Trees for the Multi-class Imbalance Problem," Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 122-134, Malaysia, 2012.
    [23] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multi-class support vector machines," IEEE Transactions on Neural Networks, vol. 13, pp. 415-425, 2002.
    [24] C.-M. Huang, J. J.-C. Ying, and V. S. Tseng, "Mining Users Behaviors and Environments for Semantic Place Prediction," Proceedings of the Mobile Data Challenge by Nokia Workshop, co-located with Pervasive 2012, Newcastle, United Kingdom, 2012.
    [25] S.-J. Huang, Y. Yu, and Z.-H. Zhou, "Multi-label hypothesis reuse," Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 525-533, Beijing, China, 2012.
    [26] S.-J. Huang and Z.-H. Zhou, "Multi-label learning by exploiting label correlations locally," Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 949-955, Toronto, Canada, 2012.
    [27] M. Kubat and S. Matwin, "Addressing the Curse of Imbalanced Data Sets: One Sided Sampling," Proceedings of the 14th International Conference on Machine Learning, pp. 179-186, Nashville, Tennessee, USA, 1997.
    [28] M. Kubat and S. Matwin, "Learning When Negative Examples Abound," Proceedings of the European Conference on Machine Learning, pp. 146-153, Prague, Czech Republic, 1997.
    [29] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms: Wiley-Interscience, 2004.
    [30] L. I. Kuncheva and C. J. Whitaker, "Controlling the diversity in classifier ensembles through a measure of agreement," Pattern Recognition, vol. 38, no. 11, pp. 2195-2199, 2005.
    [31] J. K. Laurila, D. Gatica-Perez, I. Aad, J. Blom, O. Bornet, T. Do, O. Dousse, J. Eberle, and M. Miettinen, "The Mobile Data Challenge: Big Data for Mobile Computing Research," Proceedings of the Mobile Data Challenge by Nokia Workshop, co-located with Pervasive 2012, Newcastle, United Kingdom, 2012.
    [32] D. Lian and X. Xie, "Learning location naming from user check-in histories," Proceedings of ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, pp. 112-121, Chicago, IL, USA, 2011.
    [33] L. Liao, D. Fox, and H. Kautz, "Location-based activity recognition using relational markov networks," Proceedings of the International Joint Conference on Artificial Intelligence, pp. 773-778, Edinburgh, Scotland, United Kingdom, 2005.
    [34] L. Liao, D. Fox, and H. Kautz, "Extracting places and activities from GPS traces using hierarchical conditional random fields," International Journal of Robotics Research, vol. 26, pp. 119-134, 2007.
    [35] X. Y. Liu, J. X. Wu, and Z. H. Zhou, "Exploratory Undersampling for Class-Imbalance Learning," IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics, vol. 39, pp. 539-550, 2009.
    [36] H.-Y. Lo, K.-W. Chang, S.-T. Chen, T.-H. Chiang, C.-S. Ferng, C.-J. Hsieh, Y.-K. Ko, T.-T. Kuo, H.-C. Lai, K.-Y. Lin, C.-H. Wang, H.-F. Yu, C.-J. Lin, H.-T. Lin, and S.-D. Lin, "An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes," Journal of Machine Learning Research - Proceedings Track, vol. 7, pp. 57-64, 2009.
    [37] R. Montoliu, A. Martínez-Usó, and J. Martínez-Sotoca, "Semantic place prediction by combining smart binary classifiers," Proceedings of the Mobile Data Challenge by Nokia Workshop, co-located with Pervasive 2012, Newcastle, United Kingdom, 2012.
    [38] M. Pazzani, C. Merz, P. Murphy, K. Ali, T. Hume, and C. Brunk, "Reducing misclassification costs," Proceedings of the International Conference on Machine Learning, pp. 217-225, New Brunswick, NJ, 1994.
    [39] T. Rattenbury, N. Good, and M. Naaman, "Towards automatic extraction of event and place semantics from flickr tags," Proceedings of ACM SIGIR Special Interest Group on Information Retrieval, pp. 103-110, Amsterdam, The Netherlands, 2007.
    [40] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, "Activity Recognition from Accelerometer Data," Proceedings of Association for the Advancement of Artificial Intelligence, pp. 1541-1546, Pittsburgh, Pennsylvania, USA, 2005.
    [41] J. D. Rodríguez, A. P. Martínez, and J. A. Lozano, "Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, pp. 569-575, 2010.
    [42] L. Rokach, Pattern Classification Using Ensemble Methods: World Scientific, 2010.
    [43] S. M. Ross, Introduction to probability and statistics for engineers and scientists: Elsevier Academic Press, 2009.
    [44] A. Sae-Tang, M. Catasta, and L. K. McDowell, "Report on Dedicated Task 1: Semantic Place Prediction," Proceedings of the Mobile Data Challenge by Nokia Workshop, co-located with Pervasive 2012, Newcastle, United Kingdom, 2012.
    [45] R. E. Schapire, "The Strength of Weak Learnability," Machine Learning, vol. 5, pp. 197-227, 1990.
    [46] Y. Sun, M. Kamel, and Y. Wang, "Boosting for learning multiple classes with imbalanced class distribution," Proceedings of the IEEE International Conference on Data Mining, pp. 592-602, Hong Kong, China, 2006.
    [47] J. Wu, H. Xiong, and J. Chen, "COG: local decomposition for rare class analysis," Data Mining and Knowledge Discovery, vol. 20, pp. 191-220, 2010.
    [48] G. Xu, Y. Gu, P. Dolog, Y. Zhang, and M. Kitsuregawa, "SemRec: A Semantic Enhancement Framework for Tag Based Recommendation," Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1267-1272, San Francisco, California, USA, 2011.
    [49] Z. Yan and D. Chakraborty, "SAMMPLE: Detecting Semantic Indoor Activities in Practical Settings using Locomotive Signatures," Proceedings of International Symposium on Wearable Computers, pp. 37-40, Newcastle, United Kingdom, 2012.
    [50] Y. Yang and J. Pedersen, "A comparative study on feature selection in text categorization," Proceedings of the International Conference on Machine Learning, pp. 412-420, Nashville, Tennessee, USA, 1997.
    [51] M. Ye, K. Janowicz, C. Mulligann, and W.-C. Lee, "What you are is when you are: the temporal dimension of feature types in location-based social networks," Proceedings of ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, pp. 102-111, Chicago, IL, USA, 2011.
    [52] M. Ye, D. Shou, W.-C. Lee, P. Yin, and K. Janowicz, "On the semantic annotation of places in location-based social networks," Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 520-528, San Diego, CA, USA, 2011.
    [53] J. J.-C. Ying, W.-C. Lee, T.-C. Weng, and V. S. Tseng, "Semantic Trajectory Mining for Location Prediction," Proceedings of the ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 34-43, Chicago, IL, USA, 2011.
    [54] H.-F. Yu, H.-Y. Lo, H.-P. Hsieh, J.-K. Lou, T. G. McKenzie, J.-W. Chou, P.-H. Chung, C.-H. Ho, C.-F. Chang, Y.-H. Wei, J.-Y. Weng, E.-S. Yan, C.-W. Chang, T.-T. Kuo, Y.-C. Lo, P. T. Chang, C. Po, C.-Y. Wang, Y.-H. Huang, C.-W. Hung, Y.-X. Ruan, Y.-S. Lin, S.-D. Lin, H.-T. Lin, and C.-J. Lin, "Feature Engineering and Classifier Ensemble for KDD Cup 2010," Proceedings of the KDD Cup 2010 Workshop, pp. 1-16, Washington, DC, 2010.
    [55] J. Yuan, Y. Zheng, and X. Xie, "Discovering regions of different functions in a city using human mobility and POIs," Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 186-194, Beijing, China, 2012.
    [56] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms: Chapman & Hall/CRC, 2012.
    [57] Z.-H. Zhou and X.-Y. Liu, "Training cost-sensitive neural networks with methods addressing the class imbalance problem," IEEE Transactions on Knowledge and Data Engineering, vol. 18, pp. 63-77, 2006.
    [58] Z.-H. Zhou and X.-Y. Liu, "On Multi-Class Cost-Sensitive Learning," Computational Intelligence, vol. 26, pp. 232-257, 2010.
    [59] Y. Zhu, Y. Sun, and Y. Wang, "Predicting Semantic Place and Next Place via Mobile Data," Proceedings of the Mobile Data Challenge by Nokia Workshop, co-located with Pervasive 2012, Newcastle, United Kingdom, 2012.
    [60] Y. Zhu, E. Zhong, Z. Lu, and Q. Yang, "Feature Engineering for Place Category Classification," Proceedings of the Mobile Data Challenge by Nokia Workshop, co-located with Pervasive 2012, Newcastle, United Kingdom, 2012.

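    Full-text availability: on campus, open from 2018-08-29; off campus, not open.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.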