
Graduate student: 紀永昌 (Chi, Yung-Chang)
Thesis title: Forecasting New Technology Patent Infringement Risks Using Deep Learning and Data Augmentation (以深度學習及資料擴增預測新興技術專利侵權風險)
Advisor: 王惠嘉 (Wang, Hei-Chia)
Degree: Doctor
Department: College of Management - Department of Industrial and Information Management
Year of publication: 2022
Academic year of graduation: 110
Language: English
Number of pages: 96
Keywords: Deep Learning, Data Augmentation, Patent Risk, Patent Infringement
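  • Technology patents are regarded as a source of emerging technologies. Developing a new patent requires substantial investment, yet the resulting application may turn out to be similar to existing patents, fail to meet the conditions for patentability, and be rejected at examination, causing investment losses. Determining in advance whether a new product technology infringes existing patents is likewise an important issue for reducing the risk of damages. Patent examination has so far been carried out manually, and limited manpower and time make the process very long. Existing research on patent similarity comparison mostly uses text-mining classification algorithms to analyze the likelihood that an application passes examination, with little discussion of the likelihood of infringement.
    This study attempts to address the problem of assessing patent application and infringement risk from existing patent databases. For each patent application, it uses Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) prediction models, together with published patent applications and their examination results retrieved from the United States Patent and Trademark Office (USPTO) by keyword search. After data augmentation and before training, 10% of the approved and rejected applications are randomly drawn as test cases, and the remaining 90% are used to train the prediction model, with the aim of obtaining a model that can predict patent infringement and examination outcomes. In the experiments, the model's prediction accuracy is at least 87.7% for every classification, and the model can identify the category of reason for which a patent application may be rejected.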

    Technology patents are considered the source and bedrock of emerging technologies, and patents create value in any enterprise. However, obtaining a patent is time-consuming, expensive, and risky, especially if the application is rejected. Developing a new patent requires extensive costs and resources, yet once the technology is fully developed it may prove similar to existing patents, lack the relevant patentable features, and fail the patent examination, resulting in investment losses. Patent infringement is also an especially important topic for reducing the risk of legal damages to patent holders, applicants, and manufacturers. Patent examinations have so far been performed manually, and because of manpower and time limitations the examination process is exceedingly long and inefficient. Current research on patent similarity comparison most commonly employs text-mining classification algorithms to analyze the likelihood of examination approval, but the likelihood of infringement is insufficiently discussed. If it can be accurately determined in advance whether a new technology or innovation is likely to pass or fail examination (and why), or whether it is at risk of patent infringement, losses can be mitigated.
    This research attempts to address the problem of evaluating patent application and infringement risks from existing patent databases. For each patent application, it uses Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) prediction models, together with public utility patent applications and their examination results retrieved from the United States Patent and Trademark Office (USPTO) by keyword search. Data augmentation is applied before model training; 10% of the approved and rejected applications are randomly selected as test cases, and the remaining 90% are used to train the prediction model, with the aim of obtaining a model that can predict patent infringement and examination outcomes. In the experiments, the model achieves an accuracy of at least 87.7% for every classification, and it can be used to identify the category of reason for which a patent application is rejected.
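
    The 90%/10% holdout described above can be illustrated with a minimal sketch. It assumes each application is a (text, label) pair with labels such as "approved" and "rejected", and that the 10% test sample is drawn separately within each label; the function name `stratified_holdout`, the toy records, and the seed are hypothetical and not taken from the thesis. The sketch covers only the split, not the data augmentation or the CNN and LSTM models.

```python
import random
from collections import defaultdict


def stratified_holdout(records, test_fraction=0.1, seed=0):
    """Split (text, label) records into train and test lists.

    A random test_fraction of each label group goes to the test list
    (at least one record per label); the rest goes to the train list.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = defaultdict(list)
    for text, label in records:
        by_label[label].append((text, label))

    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical toy data: 50 approved and 50 rejected applications.
records = [
    (f"claim text {i}", "approved" if i % 2 == 0 else "rejected")
    for i in range(100)
]
train, test = stratified_holdout(records, test_fraction=0.1)
```

    Sampling within each label keeps approved and rejected cases in the same proportion in both sets, which matters when per-class accuracy is reported.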

    Abstract
    Contents
    List of tables
    List of figures
    Chapter 1. Introduction
      1.1 Research background and motivation
      1.2 Research purpose
      1.3 Research procedure
    Chapter 2. Related Works
      2.1 Patent Infringement
      2.2 Literal Infringement
      2.3 Novelty
      2.4 Non-obviousness
      2.5 Natural Language Processing
      2.6 Text Mining
      2.7 Convolutional Neural Networks
      2.8 LSTM
      2.9 Text data augmentation
      2.10 GloVe
    Chapter 3. Methodology
      3.1 Preprocess module
      3.2 Classification module
        3.2.1 Text data augmentation
      3.3 Patent Risk Evaluation Module
        3.3.1 Predictive Model Training
        3.3.2 Risk Evaluation
    Chapter 4. Experiments and Results
      4.1 Data set
      4.2 Evaluation Indicators
      4.3 Predictive model architecture and hyperparameter
      4.4 Holdout cross validation
      4.5 Experiment I, Utility Patent Application Claims Pass or Fail Classification
      4.6 Experiment II, Utility Patent Application Rejected Reason 101 Classified
      4.7 Experiment III, Utility Patent Application Rejected Reason 102 Classified
      4.8 Experiment IV, Utility Patent Application Rejected Reason 103 Classified
      4.9 Experiment V, Utility Patent Application Rejected Reason 112 Classified
      4.10 Experiment VI, Utility Patent Application Rejected by Other Reasons are Classified
      4.11 Results
      4.12 Discuss
    Chapter 5. Conclusion
      5.1 Research Contributions
      5.2 Research scope and limitations
    Chapter 6. Future work
      6.1 Introduction
      6.2 Related works
        6.2.1 Machine learning
        6.2.2 Supervised learning
        6.2.3 Unsupervised learning
        6.2.4 Deep learning
          6.2.4.1 Supervised Neural Networks
          6.2.4.2 Unsupervised Pre-trained Neural Networks
        6.2.5 Generative Adversarial Network
          6.2.5.1 Relational Generative Adversarial Networks
          6.2.5.2 Seq-GAN
          6.2.5.3 LeakGAN
      6.3 Methodology
        6.3.1 Data preprocess
        6.3.2 Relational Generative Adversarial Networks for claims generate
      6.4 Experiments
      6.5 Expected contribution
    Reference
    Appendix
