| Graduate student: | 陳威男 Chen, Wei-nan |
|---|---|
| Thesis title: | 利用隱藏式馬可夫模型辨識網頁上的結構化資源 Identifying Structured Resource Objects on the Web Using Hidden Markov Model |
| Advisor: | 盧文祥 Lu, Wen-hsiang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2009 |
| Academic year of graduation: | 97 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 39 |
| Chinese keywords: | structured web resources, structured resource identification model, web search |
| English keywords: | Sequential Structured Resource Objects Identification Model, Structured Web Resources, Web Search |
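Web pages contain many resources that are useful to users, and the Web offers them a rich variety of resources in many forms. On a general-purpose search engine, however, users can only enter keywords: the engine retrieves related pages through keyword matching, and users must then look through those pages for the resource they actually want. Search engines therefore cannot respond effectively when a user is looking for a specific resource. Examining resources on the Web, we observe that some textual resources have a particular sequential structure: many of them consist of sentences that describe ordered steps, and this series of step-by-step instructions serves to make or prepare an item or to carry out a task. We define this kind of resource as Sequential Structured Resource Objects. To identify such structured resources on web pages effectively, this thesis proposes a Sequential Structured Resource Objects Identification Model, which is trained statistically on the structure of these resources and can then identify structured resources on web pages.
In the experiments, we selected three different kinds of structured resources for training and testing. According to our results, the proposed model can effectively determine whether web page content is a structured resource for all three kinds. If the set of resource types is extended in the future, the model could be applied to all web pages on the Web.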
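The abstract states that the model is trained statistically on the structure of these resources but does not say how its parameters are estimated. As a minimal sketch, assuming that every sentence of a training page has been mapped to a discrete feature symbol and hand-labeled with a hidden role, the initial, transition, and emission probabilities of a discrete HMM can be estimated by counting, with add-one (Laplace) smoothing so that no probability is zero. The function `train_hmm`, the state names `intro`/`step`, and the symbols `NUM`/`VERB`/`OTHER` are hypothetical illustrations, not taken from the thesis.

```python
from collections import Counter, defaultdict


def train_hmm(sequences, states, symbols, alpha=1.0):
    """Estimate a discrete HMM (pi, A, B) by maximum-likelihood counting.

    Each training sequence is a list of (state, symbol) pairs, i.e. the
    hidden state of every sentence is assumed to be labeled (hypothetical
    setup). alpha is an additive smoothing constant keeping every
    probability strictly positive.
    """
    init = Counter()              # how often each state starts a sequence
    trans = defaultdict(Counter)  # trans[s][t]: count of transitions s -> t
    emit = defaultdict(Counter)   # emit[s][o]: count of symbol o in state s
    for seq in sequences:
        init[seq[0][0]] += 1
        for state, symbol in seq:
            emit[state][symbol] += 1
        for (s, _), (t, _) in zip(seq, seq[1:]):
            trans[s][t] += 1

    def normalize(counts, keys):
        # Smoothed relative frequencies over a fixed set of keys.
        total = sum(counts[k] for k in keys) + alpha * len(keys)
        return {k: (counts[k] + alpha) / total for k in keys}

    pi = normalize(init, states)
    A = {s: normalize(trans[s], states) for s in states}
    B = {s: normalize(emit[s], symbols) for s in states}
    return pi, A, B


# Hypothetical toy data: an "intro" sentence precedes numbered "step" sentences.
STATES = ["intro", "step"]
SYMBOLS = ["NUM", "VERB", "OTHER"]
data = [
    [("intro", "OTHER"), ("step", "NUM"), ("step", "VERB")],
    [("intro", "OTHER"), ("step", "NUM")],
]
pi, A, B = train_hmm(data, STATES, SYMBOLS)
print(pi["intro"], A["intro"]["step"], B["step"]["NUM"])  # 0.75 0.75 0.5
```

Supervised counting is only one option; if sentence roles were not labeled, the same parameters would instead have to be fitted with an unsupervised method such as Baum-Welch.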
There are many useful resources for users on the Web. However, general search engines only let users search with keywords, so users usually have to find the web resources they want by themselves. In effect, search engines cannot respond effectively to users' demands for specific resources.
We observe that some textual web resources are structured. These resources are sets of instructions for making, preparing, or doing something, and they consist of sequential sentences. We call this kind of web resource a sequential structured resource object. To identify sequential structured resource objects, we propose a Sequential Structured Resource Objects Identification Model, which learns the structure of these objects and then identifies them on the Web.
In this thesis, we collected three kinds of sequential structured resource objects to train and test the model. For all three kinds, our model effectively identifies sequential structured resource objects on the Web.
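Identifying whether a page region is a sequential structured resource object can be framed as scoring its sentence sequence under a hidden Markov model. The sketch below is one possible decision rule, not the procedure described in the thesis: it computes log P(O | model) with the standard forward algorithm and compares a hypothetical resource HMM against a one-state background model, labeling the region as structured when the log-likelihood ratio exceeds a threshold. The names `log_forward`, `is_structured`, `RESOURCE`, and `BACKGROUND`, the sentence symbols `NUM`/`VERB`/`OTHER`, and all parameter values are illustrative assumptions.

```python
import math


def logsumexp(xs):
    # Numerically stable log(sum(exp(x) for x in xs)).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_forward(obs, pi, A, B):
    """Return log P(obs | HMM) using the forward algorithm in log space.

    All probabilities must be strictly positive (e.g. smoothed), because
    math.log(0) raises ValueError.
    """
    states = list(pi)
    # alpha[s] = log P(o_1..o_t, q_t = s)
    alpha = {s: math.log(pi[s]) + math.log(B[s][obs[0]]) for s in states}
    for o in obs[1:]:
        alpha = {
            t: logsumexp([alpha[s] + math.log(A[s][t]) for s in states])
            + math.log(B[t][o])
            for t in states
        }
    return logsumexp(list(alpha.values()))


def is_structured(obs, resource_hmm, background_hmm, threshold=0.0):
    # Hypothetical log-likelihood ratio test: resource model vs. background.
    score = log_forward(obs, *resource_hmm) - log_forward(obs, *background_hmm)
    return score > threshold


# Hypothetical toy parameters (pi, A, B); not values from the thesis.
RESOURCE = (
    {"intro": 0.8, "step": 0.2},
    {"intro": {"intro": 0.3, "step": 0.7}, "step": {"intro": 0.1, "step": 0.9}},
    {"intro": {"NUM": 0.1, "VERB": 0.2, "OTHER": 0.7},
     "step": {"NUM": 0.5, "VERB": 0.4, "OTHER": 0.1}},
)
BACKGROUND = (
    {"text": 1.0},
    {"text": {"text": 1.0}},
    {"text": {"NUM": 0.1, "VERB": 0.2, "OTHER": 0.7}},
)

print(is_structured(["OTHER", "NUM", "VERB", "NUM", "VERB"], RESOURCE, BACKGROUND))  # True
print(is_structured(["OTHER"] * 5, RESOURCE, BACKGROUND))  # False
```

Working in log space keeps long pages from underflowing to zero probability. The threshold of 0.0 simply picks the more likely model; in practice it would presumably be tuned on held-out pages for each kind of resource.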