| Graduate student: | 林倩如 Lin, Chien-Ru |
|---|---|
| Thesis title: | 受污染微陣列資料的診斷與矯正 The Diagnosis and Remedy on Contaminated Microarray Data |
| Advisor: | 詹世煌 Chan, Shin-Huang |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 統計學系 College of Management, Department of Statistics |
| Year of publication: | 2004 |
| Academic year of graduation: | 92 |
| Language: | Chinese |
| Number of pages: | 30 |
| Keywords (Chinese): | 微陣列 (microarray) |
| Keywords (English): | Microarray, Single linkage clustering, K-means clustering, loess |
In 1865 Mendel introduced the concept of the gene. Through eight years of pea experiments, reasonable assumptions, and statistical methods, he laid the foundation of modern genetics. The gene is the basic unit of heredity and is encoded in DNA. Decoding genes lets investigators understand how they relate to disease. Microarray technology yields expression values for thousands of genes in a single experiment, and gene screening then identifies the genes associated with a disease, which is why the technology is regarded as a major breakthrough in biology.

Many sources of noise can be introduced during a microarray experiment, however, and the data are occasionally contaminated by instrument or human error, so diagnosis and remedial measures are necessary if data quality is to be assured. Existing diagnostic practice, including the criteria some commercial companies use to select good gene spots, mostly lacks a clear theoretical basis, and the usual treatment is simply to delete the suspect spots. This study considers contaminated image spots. We use single linkage clustering to detect whether the genes on a chip are contaminated, and K-means clustering to locate the contaminated genes. Because a gene expression value consists of a foreground and a background value, we use loess to estimate the amount of contamination in the background and foreground of each contaminated spot and subtract it. A simulation study shows that the suggested method performs reasonably well, and we illustrate the approach with bladder cancer microarray data.
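The three steps named in the abstract (detect with single linkage, locate with K-means, estimate the contamination with loess) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say which features are clustered, so the sketch works on one-dimensional spot intensities indexed by position. The function names, the `ratio` threshold, the `span` value, and the rule that the higher-intensity cluster is the contaminated one are all hypothetical. The loess step is reduced to a single tricube-weighted local linear fit without robustness iterations.

```python
import math
import statistics


def looks_contaminated(values, ratio=10.0):
    """Single-linkage check on 1-D data (needs at least two values).

    In one dimension single linkage merges neighbours in sorted order, so the
    final merge height is the largest gap between adjacent sorted values.
    The ratio threshold is a hypothetical rule, not taken from the thesis.
    """
    s = sorted(values)
    gaps = [b - a for a, b in zip(s, s[1:])]
    return max(gaps) > ratio * statistics.median(gaps)


def kmeans_two_groups(values, max_iter=100):
    """Lloyd's k-means with k=2 on 1-D data; label 1 is the higher cluster."""
    centers = [min(values), max(values)]  # deterministic start at the extremes
    labels = []
    for _ in range(max_iter):
        labels = [int(abs(v - centers[1]) < abs(v - centers[0])) for v in values]
        updated = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            updated.append(sum(members) / len(members) if members else centers[k])
        if updated == centers:
            break
        centers = updated
    return labels


def local_linear_fit(x, y, x0, span=0.5):
    """Loess-style estimate at x0: tricube-weighted straight-line fit."""
    k = max(2, math.ceil(span * len(x)))
    h = sorted(abs(xi - x0) for xi in x)[k - 1]  # distance to k-th nearest point
    w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x] if h > 0 else []
    if sum(w) == 0:  # degenerate neighbourhood: fall back to equal weights
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Synthetic example: a smooth intensity trend with three contaminated spots.
positions = list(range(20))
values = [100.0 + 2.0 * p for p in positions]
for p in (5, 6, 7):
    values[p] += 300.0  # simulated contamination

flagged, excess = [], []
if looks_contaminated(values):
    labels = kmeans_two_groups(values)
    flagged = [p for p, lab in zip(positions, labels) if lab == 1]
    clean_x = [p for p, lab in zip(positions, labels) if lab == 0]
    clean_y = [v for v, lab in zip(values, labels) if lab == 0]
    # Contamination estimate: observed value minus the trend fitted on clean spots.
    excess = [values[p] - local_linear_fit(clean_x, clean_y, p) for p in flagged]

print(flagged)
print([round(e, 6) for e in excess])
```

On this synthetic input the three shifted spots are flagged, and the estimated excess is close to the 300 units that were added. Subtracting `excess` from the flagged values corresponds to the correction step, which keeps the spots rather than deleting them. Real chips would need the same treatment applied separately to foreground and background intensities, as the abstract describes.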