
Graduate student: Kuo, Chia-Hao (郭家豪)
Thesis title: Using Clustering Techniques with Genetic Algorithm to Improve Software Defect Prediction (使用叢集技術結合基因演算法改善軟體瑕疵預測)
Advisor: Chu, Chih-Ping (朱治平)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar)
Language: English
Number of pages: 55
Keywords: Defect prediction, Clustering technology, Genetic algorithm
  • Software defects not only degrade software quality but also incur additional costs and may even jeopardize the success of a software project. If likely defects can be identified in advance and prevented, software quality improves, and unnecessary costs and their negative impact on project development are reduced. Predicting software defects is therefore an important activity, but it is difficult to determine whether an action will cause defects before that action is executed. To address this problem, this study proposes clustering-based defect prediction with a genetic algorithm (CBDPGA), an approach adapted from association-rule-based defect prediction (ARDP) and action-based defect prediction (ABDP). Genetic algorithms are inspired by natural selection and survival of the fittest: they simulate competition among organisms in which the survivors reproduce the next generation. Multiple candidate solutions are generated randomly at the same time, the better solutions are retained for further computation, and over many iterations the search converges toward an optimal solution. CBDPGA uses this property to tune the modeling parameters of the clustering algorithm, and it analyzes records of previously executed software development actions together with the reported defects to build a defect prediction model. The model can then be used to predict whether an action that is about to be executed is likely to cause defects.

    To demonstrate the performance and effectiveness of the proposed approach, we applied it to a dataset collected from a commercial project. We built prediction models with different values of the parameter K for both the K-means clustering algorithm and the outlier-removed K-means approach, and we compared the results with the previously proposed ARDP approach. The experimental results show that the proposed approach achieves better prediction performance than ARDP.
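
    The idea of letting a genetic algorithm tune a clustering parameter can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the synthetic two-feature "action" records, the helper names (`make_data`, `kmeans`, `fitness`, `evolve_k`), the fitness function (accuracy on held-out records when each cluster predicts its majority label), and the GA operators (tournament selection, averaging crossover, a one-step mutation, and elitism). It tunes only K for plain K-means and does not cover the outlier-removed variant or categorical attributes.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def make_data(seed=42):
    """Hypothetical action records: (two numeric features, defect flag)."""
    rng = random.Random(seed)
    blobs = [((2.0, 2.0), 0), ((2.0, 8.0), 0), ((8.0, 8.0), 1)]
    data = []
    for (cx, cy), label in blobs:
        for _ in range(30):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data[:60], data[60:]  # training records, held-out records

def kmeans(points, k, seed=0, iters=20):
    """Plain K-means with a fixed seed so the result is reproducible."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return centroids

def fitness(k, train, valid):
    """Assumed fitness: held-out accuracy of majority-label clusters."""
    centroids = kmeans([p for p, _ in train], k)

    def nearest(p):
        return min(range(k), key=lambda c: dist2(p, centroids[c]))

    votes = [Counter() for _ in range(k)]
    for p, y in train:
        votes[nearest(p)][y] += 1
    majority = [v.most_common(1)[0][0] if v else 0 for v in votes]
    hits = sum(majority[nearest(p)] == y for p, y in valid)
    return hits / len(valid)

def evolve_k(train, valid, k_min=2, k_max=8, pop_size=6,
             generations=10, seed=1):
    """Search for K with a small GA; each individual is one integer K."""
    rng = random.Random(seed)
    cache = {}

    def score(k):
        if k not in cache:
            cache[k] = fitness(k, train, valid)
        return cache[k]

    population = [rng.randint(k_min, k_max) for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        best = max(population, key=score)
        history.append(score(best))
        next_gen = [best]  # elitism: the best individual always survives
        while len(next_gen) < pop_size:
            a = max(rng.sample(population, 2), key=score)  # tournament
            b = max(rng.sample(population, 2), key=score)
            child = (a + b) // 2  # averaging crossover
            if rng.random() < 0.3:  # mutation: nudge K by one
                child += rng.choice((-1, 1))
            next_gen.append(min(k_max, max(k_min, child)))
        population = next_gen
    best = max(population, key=score)
    return best, score(best), history

train, valid = make_data()
best_k, best_fit, history = evolve_k(train, valid)
print("selected K:", best_k, "held-out accuracy:", best_fit)
```

    Because the best individual is copied unchanged into each new generation, the best fitness in this sketch never decreases from one generation to the next. With a search space this small, an exhaustive scan over K would work equally well; a genetic search becomes more useful when several parameters are tuned together.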

    Thesis Oral Examination Committee Approval Form (Chinese)
    Thesis Oral Examination Committee Approval Form (English)
    Abstract (in Chinese)
    Abstract (in English)
    Acknowledgement
    Contents
    List of Tables
    List of Figures
    1 Introduction
      1.1 Introduction
      1.2 Motivation
      1.3 Organization
    2 Related Work
      2.1 Software Defects
        2.1.1 Defect Estimation
        2.1.2 Defect Prediction
        2.1.3 Benefit of Defect Removal
      2.2 Clustering Technology
        2.2.1 Hierarchical Clustering Algorithms
        2.2.2 Partition Clustering Algorithms
        2.2.3 Clustering Algorithm for Mixed Categorical and Numeric Attributes
      2.3 Genetic Algorithm
        2.3.1 Initialization
        2.3.2 Selection
        2.3.3 Crossover
        2.3.4 Mutation
    3 The CBDPGA Approach
      3.1 Action Defect Dataset
      3.2 Data Preprocessing
      3.3 Data Analysis
      3.4 Action Prediction
    4 The Experimental Results and Discussion
      4.1 The Evaluation Method
        4.1.1 Accuracy
        4.1.2 Precision
        4.1.3 Recall
        4.1.4 Specificity
        4.1.5 False Alarm
        4.1.6 F-Measure
      4.2 The Results of Experiment
        4.2.1 The Results of K-means Clustering Algorithm
        4.2.2 The Results of Outlier-Removed K-means Approach
        4.2.3 The Results of CBDPGA Approach
        4.2.4 The Result of ARDP Approach
      4.3 Discussion
    5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References

    On-campus access: public from 2016-08-24
    Off-campus access: public from 2016-08-24