Student: Chang, En-Yao (張恩耀)
Thesis Title: A Study on Estimating the Accuracies of Ensemble Models by Diversity Metrics (用多樣性測度評估集成模型正確率之研究)
Advisor: Wong, Tzu-Tsung (翁慈宗)
Degree: Master
Department: Department of Industrial and Information Management, College of Management
Publication Year: 2021
Graduation Academic Year: 109 (ROC calendar; 2020–2021)
Language: Chinese
Number of Pages: 107
Chinese Keywords: 集成模型 (ensemble model), 基本模型 (base model), 多樣性測度 (diversity metric), 分類正確率 (classification accuracy)
English Keywords: accuracy, base model, diversity, ensemble model
    An ensemble model integrates several models with differing predictive abilities to raise prediction accuracy and thereby improve a decision maker's ability to determine the class of a data instance. Intuitively, the more independent the base models combined into an ensemble are, the higher the prediction accuracy of the resulting ensemble model should be. Existing research on ensemble models, however, shows that this idea is not entirely correct. Researchers have therefore studied how the prediction accuracy of an ensemble model relates to two aspects of its base models, namely the individual prediction accuracy of each base model and the relationships among the base models, in the hope that these two aspects can be used to estimate the ensemble model's prediction accuracy. Yet more than a decade of research has not identified a definite relationship between base models and ensemble models; only relationships carrying uncertainty have been found. This study therefore analyzes, from the viewpoint of the amount of information provided by diversity metrics, whether the individual prediction accuracies of the base models and the existing measures of the relationships among them truly supply enough information to estimate the prediction accuracy of an ensemble model. It further attempts to identify information that does suffice for this estimation, called the basic information, and to let it take over the roles formerly played by individual base-model accuracies and inter-model relationships. Finally, this study proposes a diversity metric with higher sensitivity and relatively stable values for measuring the relationships among base models, which yields a more stable and significant relationship between base models and ensemble models. Based on this diversity metric, an ensemble model with low prediction cost and a competitive level of prediction accuracy is constructed.
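
    To make the voting intuition concrete, the following minimal Python sketch (not taken from the thesis; the arrays and function name are illustrative assumptions) shows how a majority-vote ensemble of three hypothetical base models can be more accurate than any single base model when their errors fall on different instances.

import numpy as np

def majority_vote(base_predictions):
    # base_predictions: array of shape (n_models, n_samples) with 0/1 class labels.
    votes = base_predictions.sum(axis=0)            # number of models voting for class 1
    return (votes > base_predictions.shape[0] / 2).astype(int)

# Three hypothetical base models predicting five test instances.
preds = np.array([[1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 1, 1]])
truth = np.array([1, 0, 1, 1, 0])

print((preds == truth).mean(axis=1))                # individual accuracies: [0.8 0.6 0.6]
print((majority_vote(preds) == truth).mean())       # ensemble accuracy: 1.0

    Because each hypothetical base model errs on different instances, every instance still receives a correct majority; this is exactly the independence intuition whose limits the thesis examines.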

    Ensemble algorithms induce several classification models, called base models, from a data set to enhance prediction accuracy. Intuitively, the more independent the base models are, the more accurate the ensemble model should be. This argument has been shown to be not entirely correct. Hence, researchers began to explore the relationship between the accuracy of an ensemble model and two characteristics of its base models: the diversity among their predictions and their individual accuracies. No past study has provided a clear result in specifying this relationship. This thesis attempts to identify the information carried by the two characteristics of base models in order to explain why the relationship cannot be clearly specified. The analytical results show that the diversity among base models and the accuracy of each base model cannot provide enough information to derive the accuracy of their corresponding ensemble model. A novel diversity metric is thus proposed to measure the agreement among the predictions of base models. The experimental results on fifteen data sets demonstrate that our diversity metric can be applied at a relatively lower cost to find an ensemble model with competitive accuracy.
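
    The diversity metrics discussed here are typically built from pairwise agreement counts between base models. As a hedged illustration (a sketch of classical measures, not the novel metric the thesis proposes), the Python code below computes two well-known pairwise measures, the disagreement measure and Yule's Q statistic, from the correctness patterns of two models; the input arrays are hypothetical.

import numpy as np

def pairwise_diversity(correct_i, correct_j):
    # correct_i, correct_j: boolean arrays, True where each base model predicts correctly.
    n11 = np.sum(correct_i & correct_j)        # both correct
    n00 = np.sum(~correct_i & ~correct_j)      # both wrong
    n10 = np.sum(correct_i & ~correct_j)       # only model i correct
    n01 = np.sum(~correct_i & correct_j)       # only model j correct
    n = n11 + n00 + n10 + n01
    disagreement = (n10 + n01) / n                           # higher => more diverse
    # Yule's Q lies in [-1, 1]; lower => more diverse. Assumes the denominator is nonzero.
    q = (n11 * n00 - n10 * n01) / (n11 * n00 + n10 * n01)
    return disagreement, q

correct_1 = np.array([True, True, True, False, False])
correct_2 = np.array([True, False, False, True, False])
print(pairwise_diversity(correct_1, correct_2))              # (0.6, -0.333...)

    When base models tend to err together (large n00), Q rises toward 1 and the disagreement falls; how such metric values relate to ensemble accuracy is the relationship the thesis analyzes.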

    Abstract
    Acknowledgements
    Chapter 1  Introduction
      1.1  Research Background and Motivation
      1.2  Research Objectives
    Chapter 2  Literature Review
      2.1  Ensemble Models
      2.2  Factors Affecting Ensemble Models
      2.3  The Relationship between Ensemble Models and Diversity Metrics
      2.4  Summary
    Chapter 3  Analysis of Relationship Measures for Ensemble Models
      3.1  Basic Concepts and Derivation of Fundamental Properties
      3.2  Existing Measures of the Relationships among Base Models
      3.3  Summary
    Chapter 4  Empirical Study
      4.1  Data Set Characteristics
      4.2  Experimental Results
      4.3  Summary
    Chapter 5  Conclusions and Suggestions
      5.1  Conclusions
      5.2  Future Work and Suggestions
    References

    Full-Text Access: Immediately available on campus and off campus.