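| Graduate student: | 張彙音 Chang, Hui-Yin |
|---|---|
| Thesis title: | 質譜儀資料分析中基於叢集方式之尖峰對準技術 Cluster-based Peak Alignment Techniques for LC-MS Data Analysis |
| Advisors: | 廖寶琦 Liao, Pao-Chi; 曾新穆 Tseng, Vincent Shin-Mu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Medical Informatics |
| Year of publication: | 2009 |
| Academic year of graduation: | 97 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 86 |
| Chinese keywords: | 波峰校正 (peak alignment), 質譜資料前置處理 (mass spectrometry data preprocessing), 資料探勘 (data mining), 蛋白質譜 (protein mass spectra) |
| Foreign-language keywords: | Data Mining, Liquid Chromatography-Mass Spectrometry, peak alignment, LC-MS data preprocessing |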
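With the sequencing of the human genome complete, proteomics has become one of the hottest research topics of the twenty-first century. Researchers now commonly use mass spectrometry to investigate the relationship between the genetic code and different proteins, hoping to better understand the causes of disease in organisms and how to prevent it. When a mass spectrometer is run, however, instrument error or differences in experiment time can shift the peaks in the resulting protein spectra. This peak-shifting phenomenon makes it hard for researchers to identify where the peak of the same protein lies in different mass spectrometry samples, which complicates the subsequent analysis of proteome sequences. This thesis therefore focuses on improving peak alignment methods to address the peak-shifting problem in protein spectra. The proposed method aims to remove the latent shift errors that arise during the experiment and restore peak positions to their correct values, so that peaks from different samples are aligned more accurately and later protein analysis becomes easier. This thesis proposes the PeakAlign algorithm, which first uses locally weighted regression smoothing (the LOESS regression method) to remove the error from the spectra and then uses a modified clustering algorithm to group highly similar peaks together. In the experiments, we use real data and synthetic data as the basis for performance evaluation. The results on real data show that peak alignment is more accurate when the peaks have first been shift-adjusted and their error removed. In addition, on both real and synthetic data, the accuracy of the proposed PeakAlign algorithm is better than that of other commonly used algorithms, such as the DTW and SlidingWin algorithms.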
Identifying proteomic markers for disease classification by using mass spectrometry coupled with high-performance liquid chromatography (LC-MS) has recently become a trend. However, experimental error causes a long-standing problem in the preprocessing of multiple LC-MS datasets, known as the peak-shifting problem: identical peptides from different samples may show different retention time drifts. In this study, we propose an algorithm, PeakAlign, to solve the peak-shifting problem. The algorithm consists of two phases, an adjustment phase and an alignment phase. In the adjustment phase, the LOESS regression method is used to estimate the shift values of peaks along the retention time axis. In the alignment phase, a novel cluster-based technique with a distance constraint is applied to align the adjusted peaks. We evaluated the accuracy and similarity of the alignment results of PeakAlign on two real LC-MS datasets and a set of generated semi-synthetic datasets. The experimental results show that our algorithm performs much better than other methods, such as the DTW and SlidingWin algorithms.
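The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's PeakAlign implementation: the abstract does not specify how shift values are sampled or how the clustering works, so everything below is an assumption made for illustration. In particular, the landmark pairs (peaks already matched between a sample run and a reference run), the single-pass local linear fit standing in for LOESS, the greedy one-peak-per-sample clustering, and the names `loess_at`, `adjust`, `align`, `mz_tol`, and `rt_tol` are all hypothetical.

```python
import math

def loess_at(xs, ys, x0, frac=0.6):
    """Evaluate a locally weighted linear fit of ys on xs at x0 (tricube weights)."""
    k = max(2, math.ceil(frac * len(xs)))  # number of neighbours in the window
    # Window half-width: slightly beyond the k-th nearest point so it keeps weight > 0.
    h = 1.01 * sorted(abs(x - x0) for x in xs)[k - 1] + 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def adjust(peaks, landmarks, frac=0.6):
    """Adjustment phase (assumed form): subtract the smoothed drift from each peak.

    peaks: list of (sample, mz, rt); landmarks: list of (observed_rt, reference_rt).
    """
    observed = [o for o, _ in landmarks]
    drift = [o - r for o, r in landmarks]
    return [(s, mz, rt - loess_at(observed, drift, rt, frac)) for s, mz, rt in peaks]

def align(peaks, mz_tol=0.01, rt_tol=0.3):
    """Alignment phase (assumed form): greedy clustering under distance constraints."""
    clusters = []
    for peak in sorted(peaks, key=lambda p: (p[1], p[2])):
        sample, mz, rt = peak
        best, best_d = None, None
        for c in clusters:
            c_mz = sum(p[1] for p in c) / len(c)
            c_rt = sum(p[2] for p in c) / len(c)
            ok = (abs(mz - c_mz) <= mz_tol and abs(rt - c_rt) <= rt_tol
                  and all(p[0] != sample for p in c))  # at most one peak per sample
            if ok and (best is None or abs(rt - c_rt) < best_d):
                best, best_d = c, abs(rt - c_rt)
        if best is None:
            clusters.append([peak])  # no compatible cluster: start a new one
        else:
            best.append(peak)
    return clusters

# Toy data: run "B" drifts against reference run "A" by 0.5 plus 1% of the retention time.
def drifted(rt):
    return 0.5 + 1.01 * rt

landmarks = [(drifted(r), r) for r in (10.0, 20.0, 30.0, 40.0, 50.0, 60.0)]
run_a = [("A", 500.250, 15.0), ("A", 622.100, 35.0), ("A", 710.400, 55.0)]
run_b = [("B", 500.251, drifted(15.0)), ("B", 622.101, drifted(35.0)),
         ("B", 710.399, drifted(55.0))]

print(len(align(run_a + run_b)))                     # cluster count without adjustment
print(len(align(run_a + adjust(run_b, landmarks))))  # cluster count after adjustment
```

On this toy data the drift exceeds `rt_tol`, so the unadjusted runs stay in separate clusters, while the adjusted peaks pair up across runs. That mirrors the abstract's claim that alignment is more accurate once the shift has been removed, though the toy numbers say nothing about the thesis's actual results.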