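| Graduate student: | 劉其瑋 Liu, Chi-Wei |
|---|---|
| Thesis title: | 質譜資料前處理中基底線修正與波峰校準之新方法 Novel Baseline Correction and Peak Alignment Methods for Mass Spectrometry Data Preprocessing |
| Advisor: | 曾新穆 Tseng, Shin-Mu |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 資訊工程學系 College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 |
| Language: | Chinese |
| Number of pages: | 59 |
| Chinese keywords: | 波峰校準 (peak alignment), 基底線修正 (baseline correction), 質譜資料前處理 (MS data preprocessing), 資料探勘 (data mining), 蛋白質譜 (protein mass spectrometry) |
| English keywords: | baseline correction, MS data preprocessing, data mining, mass spectrometry, peak alignment |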
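Mass spectrometry is one of the important techniques in proteomics research, and within mass spectrometry data preprocessing, baseline correction and peak alignment are the steps that most strongly affect the quality of the final analysis. Existing methods tend to distort the signal too much during baseline correction and to be too sensitive to noise during peak alignment. In this study, we therefore propose an improved method for each step. For baseline correction, we combine the strengths of the convex hull algorithm and LOESS regression to find a more accurate baseline for a mass spectrum, which improves the quality of the spectral signal. For peak alignment, existing methods cannot locate noise, so their alignment results are easily affected by it. We therefore propose a new peak alignment algorithm, TPC (Two-Phases Clustering), which can effectively screen potential noise out of a noisy peak set and thereby improve the correctness of alignment across mass spectrometry peak data. In the experiments, we use both real and synthetic data to evaluate performance. On the real data, our method performs better than previous methods. On the synthetic data, it identifies the potential noise planted in the experiment more accurately, and its recall, precision, and F-measure values are all high. The experimental results indicate that the proposed methods are indeed more accurate than existing analysis methods.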
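The recall, precision, and F-measure named in the synthetic-data evaluation can be illustrated with a minimal sketch. It assumes that noise detection is scored by comparing the set of peaks flagged as noise against the set of noise peaks planted in the synthetic data. The function name `noise_detection_scores` and the peak identifiers are hypothetical and do not come from the thesis.

```python
def noise_detection_scores(flagged, planted):
    """Precision, recall and F-measure for a set of peaks flagged as noise.

    flagged: identifiers of peaks the method reported as noise (assumed input)
    planted: identifiers of noise peaks actually planted in the synthetic data
    """
    flagged, planted = set(flagged), set(planted)
    true_pos = len(flagged & planted)  # noise peaks that were correctly flagged
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(planted) if planted else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical example: 4 planted noise peaks, 5 flagged peaks, 3 of them correct.
planted = {"p03", "p07", "p11", "p19"}
flagged = {"p03", "p07", "p11", "p02", "p15"}
precision, recall, f_measure = noise_detection_scores(flagged, planted)
print(precision, recall, round(f_measure, 4))
```

In this made-up example, three of the five flagged peaks are real noise (precision 0.6) and three of the four planted noise peaks are found (recall 0.75).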
Mass spectrometry (MS) data analysis has become an important protein identification technique in most proteomic studies. Baseline correction and peak alignment are the key steps of the MS data preprocessing stage, because they determine the quality of the data passed on to further analysis. However, existing baseline correction methods may distort the original peak signals, and existing peak alignment methods may be sensitive to noise peaks across MS samples. In this study, we propose a novel algorithm for each of these two steps. For baseline correction, we combine the convex hull algorithm with LOESS regression to find a better baseline for an MS spectrum. The method corrects each MS peak profile, and the result stays closer to the original profile than the results of existing methods. For peak alignment, no existing study has attempted to point out the peaks that are inconsistent across MS samples. We therefore propose a new algorithm, TPC (Two-Phases Clustering), which aligns multiple MS samples while flagging potential noise peaks. In our experiments, we use real MS datasets as well as generated synthetic datasets to evaluate the accuracy of the peak alignment method. The results show that our method performs better than previous methods.
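The abstract does not spell out how the convex hull and LOESS are combined, so the following is only a sketch of one plausible combination, not the thesis's procedure. It assumes the lower convex hull of the spectrum gives a rough baseline, a basic LOESS (local linear regression with tricube weights, no robustness iterations) smooths that hull, and the smoothed curve is capped at the signal before subtraction. All function names, the `frac` parameter, and the synthetic spectrum are illustrative assumptions.

```python
import math


def lower_hull(points):
    """Lower convex hull of (x, y) points sorted by x (Andrew's monotone chain)."""
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or above the chord hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def interpolate(hull, xs):
    """Piecewise-linear interpolation of the hull vertices at each x in xs (sorted)."""
    out, j = [], 0
    for x in xs:
        while j < len(hull) - 2 and hull[j + 1][0] < x:
            j += 1
        (x1, y1), (x2, y2) = hull[j], hull[j + 1]
        out.append(y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    return out


def loess(xs, ys, frac=0.3):
    """Basic LOESS: weighted local linear fit at every x, using tricube weights."""
    n = len(xs)
    k = min(n, max(3, int(frac * n)))  # number of neighbours in each local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def correct_baseline(mz, intensity, frac=0.3):
    """Assumed combination: hull -> interpolate -> LOESS smooth -> subtract."""
    rough = interpolate(lower_hull(list(zip(mz, intensity))), mz)
    smooth = loess(mz, rough, frac)
    # Cap the baseline at the signal so corrected intensities stay non-negative.
    baseline = [min(s, y) for s, y in zip(smooth, intensity)]
    corrected = [y - b for y, b in zip(intensity, baseline)]
    return baseline, corrected


# Synthetic spectrum (made up): decaying drift plus one Gaussian peak at m/z index 30.
mz = [float(i) for i in range(60)]
drift = [50.0 * math.exp(-x / 20.0) for x in mz]
peak = [30.0 * math.exp(-((x - 30.0) ** 2) / 4.0) for x in mz]
spectrum = [d + p for d, p in zip(drift, peak)]
baseline, corrected = correct_baseline(mz, spectrum)
print(corrected.index(max(corrected)))
```

The hull step keeps the baseline from climbing into peaks, while the smoothing step removes the corners of the piecewise-linear hull. Real spectra have unevenly spaced m/z values and far more points, so a library LOESS implementation would be a more practical choice than this quadratic-time loop.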