| Graduate student: | 楊家豪 Yang, Chia-Hao |
|---|---|
| Thesis title: | 應用機器學習於甘藷良品種植之預測 On the Use of Machine Learning to Predict the Production of Premium Grade Sweet Potatoes |
| Advisor: | 徐立群 Shu, Li-Chun |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 會計學系 College of Management, Department of Accountancy |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | Chinese |
| Number of pages: | 44 |
| Chinese keywords: | 甘藷, 機器學習, 隨機森林, 良率, 重要因子 |
| English keywords: | Sweet Potato, Machine Learning, Random Forest, Premium Grade Ratio, Important Factors |
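Sweet potato is Taiwan's second-largest agricultural product. It was one of Taiwan's important staple foods in earlier times and is now one of the representative foods of a low-GI (glycemic index) healthy diet. Because Taiwan's land area is limited, raising the overall yield and quality of the sweet potato harvest requires studying how land, planting conditions, climate, human factors, and so on affect the harvest; the same study also lets a company estimate its likely future harvest volume or quality. The company can then estimate future revenue before harvest and adjust its management or decisions in advance. Using planting records and fertilizer and pesticide records provided by a production and marketing group in Taiwan, together with agricultural climate data and soil data provided by a regional agricultural improvement station, this study applies five machine learning methods to build and evaluate prediction models separately for the early, middle, and late planting periods. Among the five methods, random forest performed best at predicting the sweet potato premium grade ratio, and the prediction error fell as the planting period advanced. Beyond predicting the premium grade ratio accurately, the study also identifies the important factors that affect premium grade sweet potatoes and derives how each factor raises or lowers the ratio. Given these results, the study believes that if more data become available for model validation and analysis, the predictions will better confirm the value of the model and give agricultural business managers a tool for management decisions, making sweet potato cultivation higher in quality and more efficient.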
SUMMARY
The purpose of this thesis is to accurately predict the premium grade ratio of sweet potatoes so that the company can make timely decisions. The data come from four sources: five years of planting records, meteorological data from an agricultural improvement station in a certain area, fertilization and pesticide records, and soil data. The author uses five machine learning methods (decision tree, linear regression, neural network, gradient boosting regression tree, and random forest) to predict the future production of premium grade sweet potatoes. To analyze the planting situation in a timely manner, the author splits the planting days into three periods: pre-planting, mid-planting, and post-planting. Each planting period yields five results, one per method, so the author selects the model with the minimum root-mean-square error for interpretation. The model comparison shows that random forest performs best. The author therefore uses tree-interpreter to calculate each input variable's contribution and visualizes the result to explain the findings.
Key words: Sweet Potato, Machine Learning, Random Forest, Premium Grade Ratio, Important Factors
INTRODUCTION
Sweet potato is the second-largest agricultural product in Taiwan. It is not only one of Taiwan's most important foods but also a representative food of the low-GI (glycemic index) healthy diet. However, because land in Taiwan is limited, increasing the yield and quality of sweet potatoes requires studying how soil, planting conditions, climate, human factors, and so on influence planting; the same analysis lets the company estimate the potential future yield and quality of its sweet potatoes. The premium grade ratio of sweet potato is the proportion of premium grade produce in the total harvest volume of one planting field. The purpose of this study is therefore to identify the important factors in each planting period and to predict the premium grade ratio, so that the company can estimate future revenue before harvest and adjust its management or decisions in advance.
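The ratio defined above can be sketched in a few lines of Python. This is only an illustration of the definition: the function name, the use of kilograms as the unit of harvest volume, and the example figures are assumptions, not values from the thesis.

```python
def premium_grade_ratio(premium_kg: float, total_kg: float) -> float:
    """Share of one field's harvest that is premium grade (0.0 to 1.0)."""
    if total_kg <= 0:
        raise ValueError("total harvest must be positive")
    if not 0 <= premium_kg <= total_kg:
        raise ValueError("premium harvest must lie between 0 and the total")
    return premium_kg / total_kg


# Hypothetical field: 1,300 kg of premium grade out of 2,000 kg harvested.
ratio = premium_grade_ratio(1300.0, 2000.0)
print(f"{ratio:.1%}")
```

Computing the ratio per planting field, rather than over the whole farm, matches the per-field wording of the definition.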
MATERIAL AND METHODS
This chapter is organized according to the Cross Industry Standard Process for Data Mining (CRISP-DM). First, the author studies the business background, evaluates and analyzes the needs of the enterprise, and converts them into a feasible data mining solution.
Next comes data collection: the author identifies the factors that may affect the subject and, after a preliminary understanding of the data, organizes, transforms, and combines them into data sets for the final modeling analysis. The data in this research consist of cultivation history, fertilization records, pesticide records, agricultural climate data, and soil data.
After setting up the data set, the author uses decision tree, linear regression, neural network, gradient boosting regression tree, and random forest models to predict the premium grade ratio of sweet potatoes in the pre-planting, mid-planting, and post-planting periods, respectively, to find hidden rules in the data and to check the model design for problems. Finally, the five models are evaluated by the difference between actual and predicted output, measured as root-mean-square error.
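The evaluation step can be sketched as follows: compute the root-mean-square error of each candidate model on held-out fields and keep the model with the lowest value. The model names come from the text, but every number below is made up for illustration and does not reproduce the thesis results; only three of the five models are shown to keep the sketch short.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical held-out premium grade ratios for four fields.
actual = [0.60, 0.70, 0.50, 0.80]

# Hypothetical predictions from three candidate models (illustrative only).
predictions = {
    "linear_regression": [0.50, 0.80, 0.60, 0.70],
    "decision_tree": [0.65, 0.60, 0.55, 0.70],
    "random_forest": [0.62, 0.68, 0.53, 0.78],
}

# Score every model, then pick the one with the smallest error.
scores = {name: rmse(actual, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {score:.4f}")
print("selected:", best)
```

Running the same comparison once per planting period would give one selected model for each of the pre-planting, mid-planting, and post-planting stages.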
RESULTS AND DISCUSSION
This chapter uses Gartner's four stages of data analytics maturity to present each stage of the analysis. First, descriptive and diagnostic analytics explain what happened in the past and why it happened. Then predictive and prescriptive analytics predict what will happen and how to make it happen in the future.
Among the five machine learning methods, random forest performed best at predicting the premium grade ratio of sweet potato. In addition, the root-mean-square error decreases as the planting period advances, falling to 5.17% in the post-planting period.
In addition to accurately predicting the premium grade ratio, the study uses the tree-interpreter Python package to calculate each input variable's contribution and identify the important factors affecting premium grade sweet potatoes. It then uses the waterfall Python visualization package to draw waterfall charts showing how these factors raise or lower the premium grade ratio.
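The kind of decomposition that tree-interpreter-style tools produce can be sketched on a single hand-built regression tree: the prediction equals a bias term (the mean at the root) plus one contribution per split feature along the sample's path. This is a minimal pure-Python illustration of the idea, not the package's own code; the feature names `rainfall_mm` and `soil_ph`, the thresholds, and all node values are hypothetical. For a random forest, the same quantities would be averaged over the trees.

```python
from collections import defaultdict

# Each node stores the mean target of the training samples that reach it;
# internal nodes also store the feature and threshold used to split.
TREE = {
    "value": 0.60, "feature": "rainfall_mm", "threshold": 100.0,
    "left": {
        "value": 0.70, "feature": "soil_ph", "threshold": 6.0,
        "left": {"value": 0.65},
        "right": {"value": 0.80},
    },
    "right": {"value": 0.45},
}


def explain(tree, sample):
    """Return (prediction, bias, contributions) for one sample."""
    bias = tree["value"]
    contributions = defaultdict(float)
    node = tree
    while "feature" in node:  # leaves have no split feature
        go_left = sample[node["feature"]] <= node["threshold"]
        child = node["left"] if go_left else node["right"]
        # Credit the change in node mean to the feature that was split on.
        contributions[node["feature"]] += child["value"] - node["value"]
        node = child
    return node["value"], bias, dict(contributions)


prediction, bias, contributions = explain(
    TREE, {"rainfall_mm": 80.0, "soil_ph": 6.5}
)
print("prediction:", prediction, "bias:", bias)
for feature, delta in contributions.items():
    print(f"{feature}: {delta:+.2f}")
```

The signed per-feature values are what a waterfall chart would stack, starting from the bias and ending at the prediction.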
CONCLUSION
In summary, this study believes that if more data become available for model validation and analysis in the future, the predicted results will better confirm the value of the forecast model and provide agricultural enterprise managers with a management tool that improves the quality of sweet potato cultivation.