| Graduate student: | 卓冠廷 Chuo, Kuang-Ting |
|---|---|
| Thesis title: | 以機器學習開發事業廢水未妥善處理排放潛勢之預測模型 Potential prediction models for the discharge of industrial wastewater without treatment based on machine learning |
| Advisor: | 陳必晟 Chen, Pi-Cheng |
| Degree: | Master |
| Department: | College of Engineering - Department of Environmental Engineering |
| Year of publication: | 2021 |
| Graduation academic year: | 109 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 93 |
| Chinese keywords: | 工業廢水, 犯罪預測, 機器學習 |
| Foreign-language keywords: | industrial wastewater, crime forecasting, machine learning |
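Directly discharging untreated industrial wastewater pollutes the environment and can harm human health. To save the cost of operating water pollution control equipment, some operators discharge industrial wastewater through bypass flows, concealed pipes, and dilution. This wastewater may contain heavy metals, strong acids, strong bases, and other hazardous substances, so illegal discharge has become an important issue. To curb illegal discharge, the Environmental Protection Administration (EPA) has replaced its earlier end-of-pipe control approach with in-depth inspections, hoping to achieve a deterrent effect. However, inspection staff are limited and the proportion of inspections that result in penalties is low, so improving inspection efficiency is important. A prediction model for illegal water pollution can provide early warning before an event occurs and support inspectors' decisions. Crime prediction usually takes a retrospective approach, using historical data to predict future events. Earlier studies built crime prediction models with statistical methods, but because crime events are non-linear and heterogeneous, and the relative importance of the predictive factors is unknown, those models could not predict accurately. Building the model with machine learning algorithms can address these problems. This study assumes that illegal discharge may be taking place when the amount of sludge changes unreasonably, and combines demographic, geographic, environmental, and meteorological information to build prediction models with machine learning algorithms. From waste records, the study selects data on sludge produced by treating process wastewater, groups sludge amounts by industry, and performs a binary transformation against each industry's production index to identify the months in which sludge decreased while the production index rose. Because February falls in Taiwan's Lunar New Year period, total sludge drops sharply compared with other months, which distorts the transformed results. The study therefore designs two scenarios, one including and one excluding February data. Rainfall, river water quality, county/city, and township-level population density serve as predictive factors. After preprocessing, Synthetic Minority Oversampling Technique (SMOTE) and Random Undersampling are used to address data imbalance, and four algorithms are trained: Random Forest (RF), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). The best result comes from excluding February data, balancing the data with SMOTE, and training with RF. After the prediction models were built, the three most suitable industries were selected: printed circuit board manufacturing (Area Under Curve (AUC) 0.834, Recall 0.88), metal surface treatment (AUC 0.765, Recall 0.85), and integrated circuit manufacturing (AUC 0.75, Recall 0.85). The models are intended as a decision-support tool for inspection agencies.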
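The binary transformation described above (flagging months in which sludge decreased while the production index rose) can be sketched as follows. This is a minimal illustration, not the thesis's code: the abstract does not state which periods are compared, so the sketch assumes a month-over-month comparison, and the function names and sample figures are hypothetical.

```python
from typing import Dict, Tuple

# (year, month) -> value. All names and numbers here are hypothetical.
Monthly = Dict[Tuple[int, int], float]


def previous_month(year: int, month: int) -> Tuple[int, int]:
    """Return the (year, month) key of the preceding calendar month."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def flag_suspicious_months(
    sludge: Monthly,
    production_index: Monthly,
    exclude_february: bool = False,
) -> Dict[Tuple[int, int], int]:
    """Label a month 1 when sludge fell while the production index rose.

    Assumption: each month is compared with the month before it.
    Months lacking a preceding observation in either series are skipped.
    """
    labels: Dict[Tuple[int, int], int] = {}
    for key in sorted(sludge):
        year, month = key
        if exclude_february and month == 2:
            continue  # scenario without February: do not label February
        prev = previous_month(year, month)
        if prev not in sludge or key not in production_index or prev not in production_index:
            continue
        sludge_fell = sludge[key] < sludge[prev]
        output_rose = production_index[key] > production_index[prev]
        labels[key] = int(sludge_fell and output_rose)
    return labels


# Made-up monthly values for one industry.
sludge = {(2020, 1): 100.0, (2020, 2): 60.0, (2020, 3): 95.0, (2020, 4): 80.0}
index = {(2020, 1): 98.0, (2020, 2): 99.0, (2020, 3): 101.0, (2020, 4): 104.0}

with_feb = flag_suspicious_months(sludge, index)
without_feb = flag_suspicious_months(sludge, index, exclude_february=True)
```

In this sketch the February-free scenario only drops February from the labels; March is still compared against February's values, which is one of several possible readings of the abstract.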
Discharging wastewater without treatment causes environmental pollution. Industrial wastewater contains pollutants that can be hazardous to human health. To reduce the cost of water treatment, some factories illegally discharge industrial wastewater directly or through underground pipelines. Environmental inspection is one way to prevent illicit wastewater discharge. However, inspection staff are limited and inspection efficiency is low. A crime forecasting system used as a decision-support tool can therefore predict illegal events and help environmental agencies deploy staff appropriately. Since crime is a complex social problem, developing a crime forecasting system with machine learning can overcome the problems of non-linearity, heterogeneity, and unknown feature importance. By observing changes in the amount of waste sludge, we transform the data into a binary format to indicate potential illegal discharges. The research is divided into two scenarios because waste sludge decreases drastically during the Lunar New Year. Scenario 1 retains the original data, and Scenario 2 excludes February data. We selected precipitation, river water quality, city, and population density as the predictive features, used Synthetic Minority Oversampling Technique (SMOTE) and Random Undersampling to address the data imbalance problem, and trained models with Random Forest (RF), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). The best result came from Scenario 2, which excludes February data, combined with SMOTE and the RF algorithm. The three industries most suitable for building prediction models were printed circuit boards with AUC 0.834, surface treatment with AUC 0.765, and IC manufacturing with AUC 0.75. When the models were validated against wastewater penalty records, the rates for all industries were above 15%, higher than the efficiency of environmental inspection.
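To make the oversampling step concrete, the sketch below shows the core idea behind SMOTE: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. It is a simplified, standard-library illustration, not the thesis's implementation; in practice a library such as imbalanced-learn would normally be used. The function name `smote_sketch`, its parameters, and the sample rows are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def smote_sketch(minority: List[Point], n_new: int, k: int = 2, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by interpolation.

    For each new sample: pick a minority point at random, pick one of its
    k nearest minority neighbours, and take a random point on the segment
    between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority points, nearest first (Euclidean distance).
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical, already-scaled two-feature rows (e.g. rainfall, population density)
# for months labeled as potential illegal discharge.
minority_rows = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
new_rows = smote_sketch(minority_rows, n_new=5)
```

Because the synthetic rows are interpolations, they always fall within the range spanned by the existing minority rows. Oversampling of this kind is usually applied to the training split only, so that evaluation metrics such as AUC and Recall are computed on unmodified data.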