
Graduate Student: 王康霖 (Wong, Kang-Lin)
Thesis Title: 全基因變異位點篩選的高效網頁檢視器 (An Efficient Web-Application for Filtering Whole Genome Variants)
Advisor: 張天豪 (Chang, Tien-Hao)
Co-advisor: 陳倩瑜 (Chen, Chien-Yu)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2020
Academic Year of Graduation: 108
Language: Chinese
Number of Pages: 41
Keywords (Chinese): 變異點偵測, 巨型表格片段讀取, 變異位點檢視器
Keywords (English): Variant Calling, Big Table Block Reading, VCF Viewer
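  • Next-generation sequencing (NGS) is now widely used for genome sequencing and falls into three categories: whole-genome sequencing, whole-exome sequencing, and targeted sequencing. NGS is also widely used clinically and has become one of the methods for variant calling. Variant calling aligns a subject's sequence against a reference genome and records where the two differ; the resulting variants are mainly single-nucleotide polymorphisms (SNPs), small insertions and deletions, and structural variants. A variant is a difference between the subject's genome and the reference genome; it may be pathogenic, or it may only produce a minor change in appearance.
    Variant calling produces very large files. Analysts usually judge whether a variant is pathogenic from the information available about it in public databases, so filtering typically involves conditions on annotation scores. Because the files are large and contain many records, ordinary editors cannot open them, and analysts must read or filter them with command-line programs, which is unfriendly to analysts without a programming background. Several variant viewers with a graphical user interface (GUI) have therefore been developed, but they are hard to use because they perform poorly or are no longer maintained.
    This study proposes an efficient web-based viewer that can open large whole-genome variant call files and respond quickly when users filter and browse them. It also adopts a microservice approach, so it can be deployed quickly on different operating systems with little installation effort; even analysts without a programming background can deploy the viewer on a personal computer.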

    Next-generation sequencing (NGS) is widely used in genome sequencing and includes whole-genome sequencing, whole-exome sequencing, and targeted sequencing. NGS is also used in variant calling, which compares a sample sequence (a single gene, a whole exome, or a whole genome) with a reference sequence. The differences between the sample and the reference may be SNPs, indels, or structural variants. A variant may produce no discernible change in the organism, or it may be pathogenic.
    Variant calling results are stored in the Variant Call Format (VCF). A VCF file lists all the differences, and analysts compare the variants against information from public databases to determine which ones may be pathogenic. Analysts usually have to filter out irrelevant variants, but to do so they must read the files row by row and run filters from the command line, and the files are usually too large to open easily. Several GUI-based VCF viewers have been developed for analysts, but the available viewers usually perform poorly or are no longer maintained, so the problem remains.
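To make the row-by-row filtering described above concrete, the following is a minimal sketch of the kind of script an analyst might otherwise write by hand. It is not taken from the thesis: the function names `parse_info` and `filter_vcf`, the `AF` annotation key, and the 0.01 threshold are illustrative assumptions. It relies only on the standard VCF layout, in which the eighth tab-separated column (INFO) holds semicolon-separated `key=value` annotations.

```python
import io
from typing import Dict, Iterator, TextIO


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column such as 'DP=14;AF=0.5;DB' into a dict."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value  # flag entries without '=' map to an empty string
    return fields


def filter_vcf(handle: TextIO, key: str, max_value: float) -> Iterator[str]:
    """Stream a VCF and yield data lines whose INFO[key] is <= max_value.

    Lines are processed one at a time, so the whole file is never
    held in memory. Records without a numeric INFO[key] are skipped.
    """
    for line in handle:
        if line.startswith("#"):  # meta-information and header lines
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:  # malformed record: no INFO column
            continue
        raw = parse_info(columns[7]).get(key)
        if raw is None:
            continue
        try:
            # For multi-allelic sites, use the first listed value (assumption).
            value = float(raw.split(",")[0])
        except ValueError:
            continue
        if value <= max_value:
            yield line.rstrip("\n")


# Hypothetical three-record VCF used only for demonstration.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=20;AF=0.001\n"
    "chr1\t200\trs1\tC\tT\t60\tPASS\tDP=15;AF=0.30\n"
    "chr2\t300\t.\tG\tGA\t40\tPASS\tDP=9\n"
)

rare = list(filter_vcf(io.StringIO(SAMPLE_VCF), key="AF", max_value=0.01))
```

In this sample only the record at position 100 passes the filter. Even a script this small requires programming knowledge, which is the barrier a GUI viewer is meant to remove.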
    This work proposes an efficient web-application viewer for viewing and filtering VCF files that remains efficient even when the VCF file comes from a whole genome. The viewer also applies the microservice idea and is packaged into an image, so it can be installed easily across platforms, even on a personal computer.
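The keyword "Big Table Block Reading" suggests that the viewer fetches only one block of rows at a time instead of loading the whole table. The abstract does not describe how this is implemented, so the following is only a sketch of one common way to do it: keyset pagination over a database table, shown here with Python's built-in `sqlite3` module. The table name `variant`, its columns, and the helper names `build_demo_table` and `read_block` are all hypothetical.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, int, float]


def build_demo_table(n_rows: int) -> sqlite3.Connection:
    """Create an in-memory table of synthetic variants (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variant ("
        "id INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER, qual REAL)"
    )
    rows = ((i, "chr1", 1000 + i, float(i % 100)) for i in range(1, n_rows + 1))
    conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def read_block(
    conn: sqlite3.Connection,
    after_id: int,
    block_size: int,
    min_qual: float = 0.0,
) -> List[Row]:
    """Return the next block of filtered rows whose id is > after_id.

    Seeking on the primary key lets the database jump straight to the
    start of the block rather than scanning and discarding earlier rows,
    as a large OFFSET would.
    """
    cursor = conn.execute(
        "SELECT id, chrom, pos, qual FROM variant "
        "WHERE id > ? AND qual >= ? ORDER BY id LIMIT ?",
        (after_id, min_qual, block_size),
    )
    return cursor.fetchall()


conn = build_demo_table(1000)
# Fetch filtered rows 50 at a time, as a table view might when scrolling.
first = read_block(conn, after_id=0, block_size=50, min_qual=90.0)
second = read_block(conn, after_id=first[-1][0], block_size=50, min_qual=90.0)
third = read_block(conn, after_id=second[-1][0], block_size=50, min_qual=90.0)
```

With this approach the client holds at most one block in memory, and the filter is applied inside the database, so the cost of each request depends on the block size rather than on the total number of variants. An empty block (here `third`) signals the end of the filtered result.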

    Abstract
    Summary
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Introduction
    Chapter 1 Related Work
    1.1 Variant Calling
    1.2 Related File Formats
    1.2.1 SAM/BAM Files
    1.2.2 VCF Files
    1.2.3 Pedigree Files
    1.3 Annotation Tools
    1.3.1 Annovar
    1.3.2 KGGSeq
    1.4 VCF Viewers
    1.4.1 Varapp
    1.4.2 VCF-Miner
    1.5 Genome Viewers
    1.5.1 IGV
    Chapter 2 Methods
    2.1 Web Technologies
    2.2 Architecture Design
    2.2.1 Group Creation
    2.2.2 Built-in Annotation
    2.2.2.1 Annovar
    2.2.2.2 KGGSeq
    2.2.3 Data Processing
    2.2.4 Database
    2.2.5 Table Blocks
    2.2.6 Column Selection
    2.2.7 Filtering
    2.2.8 Saving Results
    2.2.9 Genome Viewer Integration
    2.3 Containerization
    Chapter 3 Results
    3.1 Comparison with Other Methods
    3.1.1 Group Creation Time Comparison
    3.1.2 Filtering Time Comparison
    3.2 ORM Performance Measurements
    3.2.1 Database Performance Analysis
    3.2.2 Table Display Performance Analysis
    3.2.3 Per-Row versus Batch Write Performance Analysis
    3.3 Comparison of Batch Write Sizes
    3.4 Browser Performance
    Chapter 4 Conclusion
    4.1 Discussion of Results
    4.2 Future Work
    References


    Full-text availability: on campus, open immediately; off campus, open immediately