| 研究生: |
邱哲夫 Chiu, Che-Fu |
|---|---|
| 論文名稱: |
結合動態分群人造蜂群演算法之自動化分群系統 An automatic clustering system with dynamic clustering artificial bee colony algorithm |
| 指導教授: |
王惠嘉
Wang, Hei-Chia |
| 學位類別: |
碩士 Master |
| 系所名稱: |
管理學院 - 資訊管理研究所 Institute of Information Management |
| 論文出版年: | 2013 |
| 畢業學年度: | 101 |
| 語文別: | 中文 |
| 論文頁數: | 48 |
| 中文關鍵詞: | 自動化分群 、啟發式分群 、人造蜂群演算法 、模範策略 |
| 外文關鍵詞: | Automatic clustering, Metaheuristic clustering, Artificial Bee Colony algorithm, Model Strategy |
| 相關次數: | 點閱:93 下載:0 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
分群是資料探勘中相當重要的一個技術,它是一種非監督式的學習方法。經過資料彼此間的相似度計算,再根據相似度將資料分配給各群,使得群內具有高相似度,跨群的資料則具有低相似度。在眾多的分群方法中,啟發式分群在近年來漸漸受到重視,啟發式分群指的是運用啟發式演算法或是啟發式的概念解決分群問題,以分群的技術類型來說,它屬於分割式分群的一種,但相較於著名的分割式分群演算法-k-means演算法,啟發式分群顯得更加穩定亦更具有效能。
一般的分群演算法在實作的時候,通常需要使用者給定群數這個先備知識,但多數的資料集空間維度過大,高維度使得資料難以觀察,想要給定群數這個資訊相對困難,因此,發展一個可以自動化決定群數的分群演算法,已經成為一個重要的議題。鑑於啟發式分群在分群問題的成功,開始有學者嘗試設計可以自動化分群的啟發式分群演算法,像是衍生自基因演算法(Genetic algorithm, GA)的GCUK(Genetic clustering for unknown K),以及衍生自粒子群最佳化演算法(Particle swarm optimization, PSO)的MEPSO(Multi-Elitist PSO)。上述這兩種方法雖然可以成功的自動化分群,但卻有著效率不佳的問題,其原因有二:(1)編碼格式設計不佳(2)選用之啟發式演算法不適合分群問題或能力不足。
因此,本研究提出一個可以自動化決定群數的自動化分群系統(Automatic Clustering System, ACS),此系統先使用群數搜尋演算法(Cluster Range Discovery algorithm, CRD)縮減欲搜尋的群數區間,再使用動態分群人造蜂群演算法(Dynamic Clustering Artificial Bee Colony algorithm, DCABC)進行自動化分群,DCABC加入了模範策略(Model Strategy)以克服既有人造蜂群演算法(Artificial Bee Colony algorithm, ABC)的缺點,並透過特別設計的編碼格式,使其可以在分群的時候同時達成決定群數和優化分群品質的功能。
最後,本研究將在實驗的部分驗證DCABC和ACS的效能。
Clustering is an important technology in data mining. It is an unsupervised learning method. Clustering algorithm partitions input data into each cluster through similarity measure, where the data within a cluster are similar and the data in different clusters are dissimilar. Among many clustering algorithms, Metaheuristic clustering algorithms becomes popular recently in solving clustering problems. In different types of clustering methods, Metaheuristic clustering is a partition clustering method. To compare with the most well-known partition clustering algorithm-k-means algorithm, Metaheuristic clustering seems to have more stable and better performances.
General clustering algorithms require users to provide prior knowledge of the dataset (e.g. the number of clusters). However, most datasets have high-dimensional, which makes the data hard to be observed, and thus the information (prior knowledge) is not easy to obtain. Therefore, it is important to develop a clustering algorithm which can automatically cluster the data and determine the number of clusters. With the success of Metaheuristic clustering, some researchers tend to design an automatic clustering algorithm with Metaheuristic method. For example, Genetic clustering for unknown K (GCUK) is based on Genetic algorithm (GA); and Multi-Elitist particle swarm optimization (MEPSO) is based on Particle swarm optimization (PSO). Though the two algorithms can successfully automatic clustering, their performances are not very good. Here are two possible reasons: (1) Encoding design is improper (2) The selected Metaheuristic algorithm does not suit the clustering problem or it lacks capability.
This study proposes an Automatic Clustering System(ACS), which can automatically determine the number of clusters. First, ACS uses a Cluster Range Discovery algorithm(CRD) to reduce the search range of cluster number. Then, ACS uses Dynamic Clustering Artificial Bee Colony algorithm(DCABC) to complete the automatic clustering. DCABC adopts the Model Strategy to overcome the drawback of original Artificial bee colony algorithm (ABC). DCABC also designs a brand-new encoding format. Combining this encoding format, DCABC can cluster the data and find the number of clusters simultaneously. Finally, the performance of the DCABC was examined in the study.
Akay, B., & Karaboga, D. (2012). A modified Artificial Bee Colony algorithm for real-parameter optimization. Information Sciences, 192(0), 120-142.
Akbari, R., Hedayatzadeh, R., Ziarati, K., & Hassanizadeh, B. (2012). A multi-objective artificial bee colony algorithm. Swarm and Evolutionary Computation, 2(0), 39-52.
Arbelaitz, Olatz, Gurrutxaga, Ibai, Muguerza, Javier, Pérez, Jesús M., & Perona, Iñigo. (2013). An extensive comparative study of cluster validity indices. Pattern Recognition, 46(1), 243-256.
Bandyopadhyay, S., & Maulik, U. (2002). Genetic clustering for automatic evolution of clusters and application to image classification. Pattern Recognition, 35(6), 1197-1208.
Chatterjee, S., Carrera, C., & Lynch, L. A. (1996). Genetic algorithms and traveling salesman problems. European Journal of Operational Research, 93(3), 490-510.
Chou, C.-H., Su, M.-C., & Lai, E. (2004). A new cluster validity measure and its application to image compression. Pattern Anal. Appl., 7(2), 205-220.
Chowdhury, A., & Das, S. (2012). Automatic shape independent clustering inspired by ant dynamics. Swarm and Evolutionary Computation, 3(0), 33-45.
Cowgill, M. C., Harvey, R. J., & Watson, L. T. (1999). A genetic algorithm approach to cluster analysis. Computers & Mathematics with Applications, 37(7), 99-108.
Das, S., Abraham, A., & Konar, A. (2008). Automatic kernel clustering with a Multi-Elitist Particle Swarm Optimization Algorithm. Pattern Recognition Letters, 29(5), 688-699.
Das, S., Abraham, A., & Konar, A. (2009). Metaheuristic Clustering: Springer.
Davies, David L., & Bouldin, Donald W. (1979). A Cluster Separation Measure. Pattern Analysis and Machine Intelligence, IEEE Transactions on, PAMI-1(2), 224-227.
Erisoglu, M., Calis, N., & Sakallioglu, S. (2011). A new algorithm for initial cluster centers in k-means algorithm. Pattern Recognition Letters, 32(14), 1701-1705.
Güngör, Z., & Ünler, A. (2008). K-Harmonic means data clustering with tabu-search method. Applied Mathematical Modelling, 32(6), 1115-1125.
Geem, Z. W., Kim, J. H., & Loganathan, G. V. (2001). A new heuristic optimization algorithm: Harmony search. Simulation, 76(2), 60-68.
Glover, F. (1986). Future paths for integer programming and links to artificial intelligence. Computers & Operations Research, 13(5), 533-549.
Handl, J., & Knowles, J. (2007). An Evolutionary Approach to Multiobjective Clustering. Evolutionary Computation, IEEE Transactions on, 11(1), 56-76.
Hasan, M. A., Chaoji, V., Salem, S., & Zaki, M. J. (2009). Robust partitional clustering by outlier and density insensitive seeding. Pattern Recognition Letters, 30(11), 994-1002.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems: University of Michigan Press.
Hubert, Lawrence, & Arabie, Phipps. (1985). Comparing partitions. Journal of Classification, 2(1), 193-218.
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651-666.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Comput. Surv., 31(3), 264-323.
Ji, J., Pang, W., Zhou, C., Han, X., & Wang, Z. (2012). A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data. Knowledge-Based Systems, 30(0), 129-135.
Karaboga, D. (2005). An idea based on honey bee swarm for numerical optimization: Technical Report-TR06, Erciyes University, Engineering Faculty, Computer Engineering Department.
Karaboga, D., & Akay, B. (2009). A comparative study of Artificial Bee Colony algorithm. Applied Mathematics and Computation, 214(1), 108-132.
Karaboga, D., & Ozturk, C. (2011). A novel clustering approach: Artificial Bee Colony (ABC) algorithm. Applied Soft Computing, 11(1), 652-657.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. Paper presented at the Neural Networks, 1995. Proceedings., IEEE International Conference on, Perth, WA, Australia.
Laszlo, M., & Mukherjee, S. (2007). A genetic algorithm that exchanges neighboring centers for k-means clustering. Pattern Recognition Letters, 28(16), 2359-2366.
Lee, K. S., & Geem, Z. W. (2005). A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice. Computer Methods in Applied Mechanics and Engineering, 194(36–38), 3902-3933.
Li, G., Niu, P., & Xiao, X. (2012). Development and investigation of efficient artificial bee colony algorithm for numerical function optimization. Applied Soft Computing, 12(1), 320-332.
Liao, S.-H., Chu, P.-H., & Hsiao, P.-Y. (2012). Data mining techniques and applications – A decade review from 2000 to 2011. Expert Systems with Applications, 39(12), 11303-11311.
Likas, A., Vlassis, N., & J. Verbeek, J. (2003). The global k-means clustering algorithm. Pattern Recognition, 36(2), 451-461.
Liu, M., Jiang, X., & Kot, A. C. (2009). A multi-prototype clustering algorithm. Pattern Recognition, 42(5), 689-698.
Mahajan, M., Nimbhorkar, P., & Varadarajan, K. (2012). The planar k-means problem is NP-hard. Theoretical Computer Science, 442(0), 13-21.
Mahdavi, M., Chehreghani, M. H., Abolhassani, H., & Forsati, R. (2008). Novel meta-heuristic algorithms for clustering web documents. Applied Mathematics and Computation, 201(1–2), 441-451.
Niknam, T., & Amiri, B. (2010). An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis. Applied Soft Computing, 10(1), 183-197.
Oftadeh, R., Mahjoob, M. J., & Shariatpanahi, M. (2010). A novel meta-heuristic optimization algorithm inspired by group hunting of animals: Hunting search. Computers & Mathematics with Applications, 60(7), 2087-2098.
Omran, M. G. H., Salman, A., & Engelbrecht, A. P. (2006). Dynamic clustering using particle swarm optimization with application in image segmentation. Pattern Analysis and Applications, 8(4), 332-344.
Osman, I., & Laporte, G. (1996). Metaheuristics: A bibliography. Annals of Operations Research, 63(5), 511-623.
Rashedi, E., Nezamabadi-pour, H., & Saryazdi, S. (2009). GSA: A Gravitational Search Algorithm. Information Sciences, 179(13), 2232-2248.
Ren, Yan, Liu, Xiaodong, & Liu, Wanquan. (2012). DBCAMM: A novel density based clustering algorithm via using the Mahalanobis metric. Applied Soft Computing, 12(5), 1542-1554.
Rousseeuw, Peter J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20(0), 53-65.
Senthilnath, J., Omkar, S. N., & Mani, V. (2011). Clustering using firefly algorithm: Performance study. Swarm and Evolutionary Computation, 1(3), 164-171.
Simpson, C. W., & Prusak, L. (1995). Troubles with information overload—Moving from quantity to quality in information provision. International Journal of Information Management, 15(6), 413-425.
Steinbach, M., Karypis, G., & Kumar, V. (2000). A comparison of document clustering techniques. Paper presented at the KDD Workshop on Text Mining.
Su, Z.-g., Wang, P.-h., Shen, J., Li, Y.-g., Zhang, Y.-f., & Hu, E.-j. (2012). Automatic fuzzy partitioning approach using Variable string length Artificial Bee Colony (VABC) algorithm. Applied Soft Computing, 12(11), 3421-3441.
Szeto, W. Y., Wu, Y., & Ho, S. C. (2011). An artificial bee colony algorithm for the capacitated vehicle routing problem. European Journal of Operational Research, 215(1), 126-135.
Tan, Pang-Ning, Steinbach, Michael, & Kumar, Vipin. (2006). Introduction to Data Mining: Addison Wesley.
Tan, S.C., Ting, K.M., & Teng, S.W. (2011). A general stochastic clustering method for automatic cluster discovery. Pattern Recognition, 44(10–11), 2786-2799.
UCI Machine Learning Repository. from http://archive.ics.uci.edu/ml/index.html
Yan, Xiaohui, Zhu, Yunlong, Zou, Wenping, & Wang, Liang. (2012). A new approach for data clustering using hybrid artificial bee colony algorithm. Neurocomputing, 97(0), 241-250.
Zhang, C., Ouyang, D., & Ning, J. (2010). An artificial bee colony approach for clustering. Expert Systems with Applications, 37(7), 4761-4767.
Zhang, L., & Cao, Q. (2011). A novel ant-based clustering algorithm using the kernel method. Information Sciences, 181(20), 4658-4672.
校內:2018-08-12公開