Author: Kuo, Yen-Hung (郭彥宏)
Thesis title: PageRank based Reading Pattern Analysis for eDocument Summarization (以PageRank演算法分析閱讀行為以達成數位文章摘要)
Advisor: Huang, Yueh-Min (黃悅民)
Degree: Doctor
Department: College of Engineering - Department of Engineering Science
Year of publication: 2009
Graduation academic year: 97 (ROC calendar, 2008-2009)
Language: English
Number of pages: 73
Keywords: Reading Pattern Graph (RPG), Reading behavior, PageRank, Low connectivity, eDocument summarization
    People's reading behavior has become more selective than it was two decades ago. To acquire more information from eDocuments, readers usually concentrate on significant content and skim the remainder. This behavioral change motivated the author to develop an RPG-based eDocument summarization approach that extracts the significant segments of an eDocument by analyzing people's reading activities. To analyze reading activity efficiently, a set of reading behaviors is categorized and used to model each reading activity as a reading pattern graph (RPG), in which vertices are segments of an eDocument and edges are reading paths. For each RPG, the PageRank algorithm is applied to rank its vertices, and the per-segment importance values from all RPGs are then averaged into a synthetic ranking result. The significant segments of the eDocument are those in the top portion of the synthetic ranking. However, the PageRank analysis may overrate some segments, which lowers the accuracy of the summary. Two potential causes of the overrating problem are identified: (1) some eDocuments are unlikely to cause memory difficulty or selective reading, and (2) a reading activity can be treated as a thread that travels back and forth over an eDocument. Both causes give an RPG's vertices low connectivity, and low connectivity is the main cause of the overrating problem. Three treatments are therefore introduced and tested: (1) adding the forward checking (FC) and reverse backtracking (RBT) behaviors to an RPG, (2) using the Site-Ranking algorithm, and (3) using the AggregateRPG algorithm. The experimental results indicate that no single treatment makes a substantial improvement under the testing conditions, but the overrating problem is properly handled when all three treatments are used together. In addition, the AggregateRPG-based process greatly reduces the time required for RPG-based eDocument summarization. Finally, the author recommends using all three treatments together for future RPG-based eDocument summarization tasks.

    Chapter 1 Introduction
    Chapter 2 Research Background
      2.1. The PageRank analysis
      2.2. Reading behaviors
    Chapter 3 Construction of Reading Pattern Graph
      3.1. Notation definitions
      3.2. Mapping reading behaviors to the RPG
    Chapter 4 Summarizing eDocument by PageRank based Analysis
      4.1. The RPG analysis by the PageRank algorithm
      4.2. Aggregating of ranking results
      4.3. Evaluation
        4.3.1. Experimental process
        4.3.2. Measurements
        4.3.3. Results
      4.4. Discussions
    Chapter 5 Improving the Effect and Efficiency of the RPG based eDocument Summarization
      5.1. The Site-Ranking algorithm
      5.2. Aggregating of RPGs
      5.3. Evaluations
        5.3.1. Experiment I - Evaluating effectiveness of proposed treatments
        5.3.2. Experiment II - Evaluation of eDocument summarization efficiency
      5.4. Discussions
    Chapter 6 Conclusions and Future Works
    References
    Appendix A Questionnaire for Surveying Backgrounds of Examinee
    Appendix B Questionnaire for Surveying Experiences of Examinee
    Vita

    Full text available on campus from 2011-01-06
    Full text available off campus from 2014-01-06