
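Graduate student: Chen, Xin-Jie (陳新杰)
Thesis title: Performance Tuning of Next-generation Sequencing Assembly via Gaussian Process Model with Branching and Nested Factors
Thesis title (Chinese): 在分支因子和巢型因子的高斯過程模型下研究次序代基因組裝參數調整
Advisor: Chen, Ray-Bing (陳瑞彬)
Degree: Master
Department: College of Management, Department of Statistics
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 54
Keywords (translated from Chinese): Gaussian process model, branching Latin hypercube design, expected improvement function, next-generation sequencing assembly, computer experiment
Keywords: Gaussian Process model, Branching Latin hypercube design, Next-generation sequencing, De novo assembly, Optimization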
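  • For next-generation sequencing assembly, the choice of assembly tool and its parameters strongly affects assembly quality. In this study, three assembly tools (Velvet, SOAPdenovo, and ABySS) are treated as the levels of a branching factor, the parameters specific to each tool as nested factors, and the parameters common to all tools as shared factors. Because the assembly is carried out by computer simulation, this selection problem becomes an optimization problem over these factors in a computer experiment.

    We propose a two-stage procedure for this optimization problem. In the first stage, a branching Latin hypercube design is used to explore the overall shape of the response surface. In the second stage, a Gaussian process model is fitted to the response surface, and the next experimental point is chosen by maximizing the expected improvement criterion until a stopping condition is met. The two-stage procedure performs well in simulation. On real data, it quickly identifies a region with larger response values, and the final selected point has a better response than the points found by other methods.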

    For de novo assembly of next-generation sequencing data, the choice of assembly tool and its parameters has a great effect on assembly quality. The tool is treated as a branching factor whose levels are the three tools considered: Velvet, SOAPdenovo, and ABySS. The parameters specific to each tool are treated as nested factors, and the parameters shared by all tools are treated as shared factors. In this study, we aim to choose the tool and the corresponding parameters that optimize assembly quality under limited resources.
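
The factor structure described above can be sketched in Python as follows. This is only an illustration under loud assumptions: the tool names come from the abstract, but every factor name and range is a hypothetical placeholder, and the sampler is a plain Latin hypercube drawn separately within each tool, not the Branching Latin hypercube design construction used in the thesis.

```python
import random

# Tool names are from the abstract. All factor names and ranges below are
# hypothetical placeholders, not the parameters tuned in the thesis.
SHARED = {"shared_1": (0.0, 1.0)}
NESTED = {
    "Velvet": {"velvet_nested_1": (0.0, 1.0), "velvet_nested_2": (0.0, 1.0)},
    "SOAPdenovo": {"soap_nested_1": (0.0, 1.0)},
    "ABySS": {"abyss_nested_1": (0.0, 1.0), "abyss_nested_2": (0.0, 1.0)},
}


def latin_hypercube(n, bounds, rng):
    """Return n points; each factor takes one value in each of n equal-width strata."""
    columns = {}
    for name, (lo, hi) in bounds.items():
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across factors
        columns[name] = [lo + (s + rng.random()) / n * (hi - lo) for s in strata]
    return [{name: columns[name][i] for name in bounds} for i in range(n)]


def initial_design(n_per_tool, seed=0):
    """Simplified initial design: one independent Latin hypercube per tool (branch)."""
    rng = random.Random(seed)
    design = []
    for tool, nested in NESTED.items():
        # A run only carries the shared factors plus the nested factors of its own tool.
        bounds = {**SHARED, **nested}
        for point in latin_hypercube(n_per_tool, bounds, rng):
            design.append({"tool": tool, **point})
    return design


design = initial_design(4)
```

Each run in `design` names a tool and gives values only for the shared factors and for that tool's own nested factors, which is what distinguishes a nested factor from an ordinary one.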

    Because de novo assembly is simulated on a computer, the selection becomes an optimization problem for a computer experiment with respect to these factors. We propose a sequential procedure that chooses the assembly tool and its optimal parameters simultaneously. First, we apply a Branching Latin hypercube design to explore the response surface. Second, a Gaussian Process model is fitted to the response surface, and the next experimental point is selected by maximizing the Expected Improvement function until the stopping criterion is met. The procedure performs well in numerical simulations. On real data, it quickly moves into a region with larger response values and reaches a better experimental point than other methods.
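
As a rough illustration of the selection step, the sketch below computes the standard closed-form Expected Improvement for a Gaussian prediction, written for maximization because the abstract seeks larger response values. The function names, the candidate labels, and the predicted means and standard deviations are all made up for illustration. In the thesis the predictions would come from the fitted Gaussian Process model, which is not reproduced here.

```python
import math


def expected_improvement(mean, std, best):
    """Expected improvement of a prediction N(mean, std^2) over the current best (maximization)."""
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def pick_next(candidates, best):
    """candidates: list of (label, predicted mean, predicted std); return the one with largest EI."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))


# Hypothetical surrogate predictions at three candidate points.
candidates = [("a", 0.90, 0.01), ("b", 0.85, 0.20), ("c", 0.50, 0.05)]
next_point = pick_next(candidates, best=0.92)
```

With these made-up numbers, candidate "b" is selected even though "a" has the higher predicted mean, because its larger predictive uncertainty leaves more room to exceed the current best. This trade-off between exploiting good predictions and exploring uncertain regions is what drives a sequential procedure of this kind.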

    1 Introduction
      1.1 Background and Motivation
      1.2 Data Description
      1.3 Literature Reviews
      1.4 Overview
    2 Branching Latin Hypercube Design
      2.1 Branching Latin Hypercube Design
      2.2 Maximin Branching Latin Hypercube Design
    3 Gaussian Process Model and Expected Improvement Function
      3.1 Positive Definite Correlation Matrix
      3.2 Parameter Estimation and Prediction
      3.3 Expected Improvement Criterion
      3.4 Other Models
        3.4.1 Independent Model
        3.4.2 BQ Model
          3.4.2.1 Singularity Problem of BQ Model
        3.4.3 QQ Model
    4 Simulation
      4.1 Case 1 with Same Marginal Effect
        4.1.1 Sequential Procedure for Case 1
        4.1.2 Comparison with Other Correlation Structure
      4.2 Case 2
      4.3 Case 3
      4.4 Case 4
      4.5 Case 5
      4.6 Case 6
        4.6.1 Sequential Procedure for Case 6
        4.6.2 Comparison with Other Methods
    5 De novo Assembly
    6 Conclusion and Future Work
      6.1 Future Work
    Bibliography

    Full text: publicly available on campus from 2019-08-13; off campus from 2019-08-13.