Student: Tzeng, Hsin-Chen (曾馨稹)
Thesis title: Clustering and Feature Weighting Case-Based Data Sets by Genetic Algorithm (Chinese title: 利用基因演算法對Case-Based案例進行分群與屬性權重設定)
Advisor: Wang, Hei-Chia (王惠嘉)
Degree: Master
Department: College of Management, Institute of Information Management
Year of publication: 2006
Academic year of graduation: 94 (Republic of China calendar, i.e. the 2005-2006 academic year)
Language: Chinese
Number of pages: 46
Keywords: Case-Based Reasoning, Genetic Algorithm, Clustering, Feature Weighting
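      CBR (Case-Based Reasoning) is a technique widely used today for building expert systems. Its distinguishing feature is that when the target domain has no fixed, usable rules and no domain experts who can be consulted at any time, system designers can analyze and organize the attributes and features of the cases and data they already have and, based on the outcomes they are interested in, build a query system. When users encounter a new case and want reference information from past cases, they can use this query system to compare the new case against the stored ones and obtain a recommended answer together with related information.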
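The case-matching step described above is usually implemented as a nearest-neighbour search (the thesis outline lists KNN in its literature review). The Python sketch below is illustrative only and rests on assumptions the abstract does not state: each case is a numeric attribute vector paired with an outcome label, similarity is a weighted Euclidean distance, and the recommended answer is a majority vote among the k closest cases. The names `weighted_distance`, `retrieve` and `recommend` are hypothetical.

```python
import math
from collections import Counter


def weighted_distance(a, b, w):
    """Weighted Euclidean distance; w[i] is the (assumed) importance of attribute i."""
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))


def retrieve(case_base, query, weights, k=3):
    """Compare the query with EVERY stored (attributes, outcome) case and return the k closest."""
    ranked = sorted(case_base, key=lambda case: weighted_distance(case[0], query, weights))
    return ranked[:k]


def recommend(case_base, query, weights, k=3):
    """Majority vote over the outcomes of the k retrieved cases."""
    votes = Counter(outcome for _, outcome in retrieve(case_base, query, weights, k))
    return votes.most_common(1)[0][0]


# Hypothetical toy case base: two attributes, outcomes "A" and "B".
cases = [([1.0, 1.0], "A"), ([1.2, 0.9], "A"), ([5.0, 5.0], "B"), ([5.1, 4.8], "B")]
print(recommend(cases, [1.1, 1.0], [1.0, 1.0], k=3))  # → A
```

Because `retrieve` scans the whole case base for every query, its cost grows linearly with the number of stored cases, and equal weights treat every attribute as equally important; these are the two limitations the abstract goes on to describe.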

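      However, the traditional CBR approach faces two problems: (1) when the existing case base is large, answering a query by comparing cases one by one takes considerable time; (2) the features and attributes defined when the cases were originally captured are not equally important to the cases in the system, and comparing all of them as if they were equally important hurts both the accuracy and the timeliness of query results. This study proposes a framework for building a case base. In the first stage, a clustering technique groups the cases in the case base according to their similarity, so that a query need not be compared against every case: the system first determines which cluster the query's likely result belongs to and then compares only the cases in that cluster. In the second stage, a genetic algorithm assigns a weight to every feature attribute to establish its importance, improving query accuracy. With these two stages, the system achieves faster queries while maintaining the original query accuracy.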
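The two-stage query described above (cluster the case base offline, then route each query to one cluster and compare only that cluster's cases) can be sketched as follows. The thesis title says clustering is done with a genetic algorithm, but this record does not describe the chromosome design, so every choice here is an assumption for illustration: each chromosome encodes k centroid vectors, fitness is the within-cluster sum of squared errors (lower is better), selection keeps the best half, crossover takes each centroid from one parent or the other, and mutation adds Gaussian noise. Function names, parameter values and the toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, p):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j], p))


def sse(centroids, data):
    """Within-cluster sum of squared errors; the (assumed) GA fitness to minimise."""
    return sum(sq_dist(centroids[nearest(centroids, p)], p) for p in data)


def ga_cluster(data, k, pop_size=20, generations=50, mutation_sd=0.5, seed=0):
    """Evolve k centroids; each chromosome is a list of k centroid vectors."""
    rng = random.Random(seed)
    # Initialise each chromosome with k distinct data points as centroids.
    pop = [[list(p) for p in rng.sample(data, k)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: sse(c, data))
        elite = pop[: pop_size // 2]  # truncation selection: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover at centroid level ...
            child = [rng.choice((ca, cb)) for ca, cb in zip(a, b)]
            # ... then Gaussian mutation of every coordinate (builds new lists, parents untouched).
            children.append([[x + rng.gauss(0, mutation_sd) for x in c] for c in child])
        pop = elite + children
    return min(pop, key=lambda c: sse(c, data))


def assign(centroids, data):
    """Cluster label of every stored case, computed once when the case base is built."""
    return [nearest(centroids, p) for p in data]


def two_stage_retrieve(data, outcomes, centroids, labels, query):
    """Stage 1: route the query to its nearest cluster. Stage 2: 1-NN inside that cluster only."""
    c = nearest(centroids, query)
    members = [i for i, lab in enumerate(labels) if lab == c] or list(range(len(data)))
    best = min(members, key=lambda i: sq_dist(data[i], query))
    return outcomes[best]


data = [(0, 0), (0.5, 0.2), (0.1, 0.8), (0.9, 0.4), (0.3, 0.6),
        (10, 10), (10.4, 9.7), (9.6, 10.2), (10.1, 10.8), (9.9, 9.5)]
outcomes = ["low"] * 5 + ["high"] * 5
centroids = ga_cluster(data, k=2)
labels = assign(centroids, data)
print(two_stage_retrieve(data, outcomes, centroids, labels, (9.8, 10.1)))  # → high
```

Keeping the best half unchanged (elitism) means the best fitness never worsens between generations. The fallback to all cases in `two_stage_retrieve` only matters if the query is routed to a cluster with no members.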

      CBR (Case-Based Reasoning) is a commonly used technique for developing modern expert systems. Its distinguishing feature is that system developers can build query systems by analyzing and organizing the attributes of their datasets, even when no domain experts or regular consultants are available. With such systems, users can obtain relevant information and proposed answers from past data by entering the attributes of the new cases they receive.

      However, traditional CBR has two major problems: (1) query time grows with the number of cases stored in the database, and scanning through every case for each individual query is time-consuming; (2) the attributes of the cases differ in importance, and if all attributes are treated the same, the accuracy of the proposed answer suffers.

      This research proposes a two-stage CBR framework. In the first stage, a clustering algorithm separates all cases into several clusters by comparing the similarities between cases, so that the system only has to scan the cluster most similar to the queried case instead of every case in the database. In the second stage, a genetic algorithm sets a weight for each attribute in each cluster; these weights indicate the importance of the attributes and make the proposed answers more accurate. Through these two stages, the resulting system processes queries faster while keeping accuracy as high as scanning through all cases.
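For the second stage, a genetic algorithm searches for attribute weights within each cluster. The abstract does not give the fitness function, so the sketch below assumes one plausible choice: the leave-one-out accuracy of weighted 1-nearest-neighbour matching among the cluster's own cases, with one weight in [0, 1] per attribute, one-point crossover and random-reset mutation. The function names, parameters and the toy cluster (where attribute 0 decides the outcome and attribute 1 is irrelevant) are hypothetical.

```python
import random


def weighted_sq_dist(a, b, w):
    """Squared Euclidean distance with per-attribute weights w."""
    return sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w))


def loo_accuracy(cases, outcomes, w):
    """Leave-one-out accuracy of weighted 1-NN within one cluster (the assumed GA fitness).

    Needs at least two cases.
    """
    hits = 0
    for i, q in enumerate(cases):
        others = [j for j in range(len(cases)) if j != i]
        nn = min(others, key=lambda j: weighted_sq_dist(cases[j], q, w))
        hits += outcomes[nn] == outcomes[i]
    return hits / len(cases)


def ga_feature_weights(cases, outcomes, pop_size=20, generations=40,
                       mutation_rate=0.2, seed=0):
    """Evolve one weight in [0, 1] per attribute for the cases of a single cluster."""
    rng = random.Random(seed)
    n = len(cases[0])

    def fitness(w):
        return loo_accuracy(cases, outcomes, w)

    # Include the equal-weight vector, so the result is never worse than unweighted matching.
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: replace each gene by a fresh random weight with probability mutation_rate.
            children.append([rng.random() if rng.random() < mutation_rate else g for g in child])
        pop = parents + children
    return max(pop, key=fitness)


cases = [(0.0, 0), (0.1, 1), (0.2, 2), (1.0, 0), (1.1, 1), (1.2, 2)]
outcomes = ["A", "A", "A", "B", "B", "B"]
print(loo_accuracy(cases, outcomes, [1.0, 1.0]))  # → 0.0
weights = ga_feature_weights(cases, outcomes)
print(loo_accuracy(cases, outcomes, weights))  # → 1.0
```

With equal weights the irrelevant attribute makes every case's nearest neighbour belong to the other class, which is the accuracy problem the abstract describes; the evolved weights down-weight that attribute. In the two-stage framework this function would be run once per cluster, on that cluster's cases only, and the resulting weights used in the within-cluster distance.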

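    Chinese Abstract
    Abstract
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Scope and Limitations
      1.4 Thesis Outline
    Chapter 2 Literature Review
      2.1 CBR (Case-Based Reasoning)
        2.1.1 Case Organization and Retrieval
        2.1.2 KNN (K-Nearest Neighbor)
        2.1.3 Applications of CBR Systems
      2.2 Genetic Algorithm
      2.3 Clustering
        2.3.1 Hierarchical Clustering Algorithm
        2.3.2 Partitional Clustering
        2.3.3 Fuzzy Clustering
        2.3.4 Artificial Neural Networks for Clustering (ANNs)
        2.3.5 Genetic Algorithm Clustering
      2.4 Related Work on Clustering Combined with Feature Weighting
      2.5 CBR Systems Using Feature Weighting and Genetic Algorithms
    Chapter 3 Research Methodology
      3.1 Research Framework
      3.2 Genetic Algorithm Clustering
      3.3 Genetic Algorithm Feature Weighting
      3.4 Building the System
    Chapter 4 Implementation and Validation
      4.1 System Construction
      4.2 Experimental Method and Comparison Criteria
        4.2.1 Parameter Settings
        4.2.2 Data Sources
        4.2.3 Experimental Design and Comparison Criteria
      4.3 Experimental Results and Analysis
        4.3.1 Heart Disease Data Set
          4.3.1.1 Using the Class-Outcome Attribute of the Original Data
          4.3.1.2 Without the Class-Outcome Attribute of the Original Data
        4.3.2 Australian Credit Card Data Set
          4.3.2.1 Using the Class-Outcome Attribute of the Original Data
          4.3.2.2 Without the Class-Outcome Attribute of the Original Data
      4.4 Discussion of Parameters
    Chapter 5 Conclusion
    References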

    Full-text release: on campus 2007-06-27; off campus 2007-06-27