
Graduate student: 卓冠廷
Chuo, Kuang-Ting
Thesis title: 以機器學習開發事業廢水未妥善處理排放潛勢之預測模型
Potential prediction models for the discharge of industrial wastewater without treatment based on machine learning
Advisor: 陳必晟
Chen, Pi-Cheng
Degree: Master
Department: College of Engineering - Department of Environmental Engineering
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 93
Keywords (Chinese): 工業廢水, 犯罪預測, 機器學習
Keywords (English): industrial wastewater, crime forecasting, machine learning
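  • Directly discharging untreated industrial wastewater pollutes the environment and can even harm human health. To save the cost of operating water pollution control equipment, some operators discharge industrial wastewater through bypass flows, concealed pipes, or dilution. This wastewater may contain heavy metals, strong acids, strong alkalis, and other hazardous substances, so illegal discharge has become an important issue. To curb it, the Environmental Protection Administration has replaced its earlier end-of-pipe control with in-depth inspections, hoping to achieve a deterrent effect. However, inspection manpower is limited and the rate of successful penalties is low, so improving inspection efficiency is important. A prediction model for illegal water pollution can give early warning before an event occurs and support inspectors' decisions. Crime prediction usually relies on retrospective forecasting, that is, using historical data to predict future events. Earlier studies built crime prediction models with statistical methods, but because crime events are non-linear and heterogeneous, and the relative importance of the predictors is unknown, those models could not predict accurately. Building the model with machine learning algorithms can address these problems. This study assumes that an unreasonable change in sludge quantity may indicate illegal discharge, and combines demographic, geographic, environmental, and meteorological information to build prediction models with machine learning algorithms. From the waste data, the study selects records of sludge generated by treating process wastewater, groups the sludge quantities by industry, and applies a binary transformation against each industry's production index to identify the months in which sludge quantity decreased while the production index rose. Because February falls in the Lunar New Year period in Taiwan, total sludge drops sharply compared with other months and distorts the transformed results. The study therefore designs two scenarios, one including and one excluding February data, and uses rainfall, river water quality, county/city, and township-level population density as predictors. After preprocessing, the Synthetic Minority Oversampling Technique (SMOTE) and Random Undersampling are used to address class imbalance, and four algorithms are used for machine learning: Random Forest (RF), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). The best result was obtained by excluding February data, balancing the data with SMOTE, and training with RF. After the prediction models were built, the three most suitable industries were selected as decision-support tools for inspection agencies: printed circuit board manufacturing, with an Area Under Curve (AUC) of 0.834 and a Recall of 0.88; metal surface treatment, with an AUC of 0.765 and a Recall of 0.85; and integrated circuit manufacturing, with an AUC of 0.75 and a Recall of 0.85.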
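The binary transformation described in this abstract can be illustrated with a minimal sketch. The abstract does not say which baseline each month is compared against, so the code below assumes a comparison with the previous month in the series. The function name `label_months`, the data layout, and the numbers are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, Tuple

Month = Tuple[int, int]  # (year, month)


def label_months(sludge: Dict[Month, float],
                 production_index: Dict[Month, float],
                 exclude_february: bool = False) -> Dict[Month, int]:
    """Label a month 1 when sludge fell while the production index rose.

    Assumption (not stated in the abstract): each month is compared with
    the previous month present in the series. When February is excluded,
    March is therefore compared with January.
    """
    months = sorted(set(sludge) & set(production_index))
    if exclude_february:
        months = [m for m in months if m[1] != 2]
    labels: Dict[Month, int] = {}
    for prev, cur in zip(months, months[1:]):
        sludge_down = sludge[cur] < sludge[prev]
        index_up = production_index[cur] > production_index[prev]
        labels[cur] = int(sludge_down and index_up)
    return labels


# Made-up monthly values for a single industry, for illustration only.
sludge = {(2020, 1): 100.0, (2020, 2): 60.0, (2020, 3): 90.0, (2020, 4): 80.0}
index = {(2020, 1): 100.0, (2020, 2): 101.0, (2020, 3): 98.0, (2020, 4): 102.0}

with_feb = label_months(sludge, index)
without_feb = label_months(sludge, index, exclude_february=True)
```

In this toy series the sharp February drop in sludge produces a positive label in the first scenario, and that label disappears once February is excluded, which is the kind of distortion the two-scenario design is meant to separate out.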

    Discharging wastewater without treatment causes environmental pollution, and industrial wastewater contains pollutants that can be hazardous to human health. To reduce the cost of water treatment, some factories illegally discharge industrial wastewater directly or through underground pipelines. Environmental inspection is one way to prevent illicit wastewater discharge, but inspection agencies are short-staffed and inspection efficiency is low. A crime forecasting system that predicts illegal events can therefore serve as a decision-support tool and help environmental agencies deploy their staff appropriately. Because crime is a complex social problem, building the forecasting system with machine learning can overcome the problems of non-linearity, heterogeneity, and unknown feature importance. Observing the changes in the amount of waste sludge, we transform the data into a binary format that indicates the potential for illegal discharge. The research is divided into two scenarios because of the drastic decrease in waste sludge during the Lunar New Year: Scenario 1 retains the original data, and Scenario 2 excludes February data. We selected precipitation, river water quality, city, and population density as the predictive features, used the Synthetic Minority Oversampling Technique (SMOTE) and Random Undersampling to address the imbalanced data problem, and performed machine learning with Random Forest (RF), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). The best result came from Scenario 2, which excludes February data, combined with SMOTE and the RF algorithm. The three industries suitable for building prediction models were printed circuit boards with an AUC of 0.83, surface treatment with an AUC of 0.765, and IC manufacturing with an AUC of 0.75. When the models were validated against wastewater penalty records, the rates for all industries were above 15%, which is higher than the efficiency of environmental inspection.
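For readers unfamiliar with the two metrics reported above, the following is a minimal pure-Python sketch of how Recall and AUC are commonly computed for a binary classifier. It is not the thesis's evaluation code; the function names and the sample labels and scores are hypothetical, and the 0.5 decision threshold is an assumption.

```python
from typing import List, Sequence


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual positives that the model flagged: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative; ties count as half.
    """
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

example_recall = recall(y_true, y_pred)
example_auc = auc(y_true, scores)
```

Recall matters in an inspection setting because a missed positive is an illegal discharge that goes uninspected, while AUC summarizes ranking quality independently of the threshold chosen.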

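    Chinese Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Objectives
    Chapter 2 Literature Review
        2.1 Research Issues in Illegal Water Pollution Discharge
            2.1.1 Water Pollution Inspection
            2.1.2 Studies on Illegal Water Pollution Discharge
            2.1.3 Industrial Waste
        2.2 Crime Prediction
        2.3 Machine Learning
            2.3.1 Introduction to Machine Learning
            2.3.2 Machine Learning Algorithms
            2.3.3 Cross-Validation
            2.3.4 Model Evaluation
    Chapter 3 Research Methods
        3.1 Research Design
        3.2 Data Selection and Sources
            3.2.1 Waste Data
            3.2.2 Source of Basic Data on Regulated Enterprises
            3.2.3 Industrial Production, Shipment, and Inventory Production Index
            3.2.4 Meteorological Data
            3.2.5 River Water Quality Data
            3.2.6 Township-Level Population Density Data
            3.2.7 County/City Code Data
        3.3 Data Collection and Integration
            3.3.1 Linking Sludge Data to the Geographic Addresses of Regulated Enterprises
            3.3.2 Integrating Meteorological Data
            3.3.3 Integrating River Water Quality Data
            3.3.4 Integrating Township-Level Population Density Data
        3.4 Data Preprocessing
            3.4.1 Data Cleaning and Filtering
            3.4.2 Data Transformation
        3.5 Exploratory Data Analysis
            3.5.1 Sludge Data Mining
        3.6 Machine Learning
            3.6.1 Machine Learning Algorithms
    Chapter 4 Results and Discussion
        4.1 Relationship Between Predictors and Categorical Data
        4.2 Comparison and Selection of Machine Learning Algorithms
        4.3 Comparison and Selection of Data Balancing Methods
        4.4 Scenario Comparison
        4.5 Screening of Industries Suited to the Prediction Model
        4.6 Discussion by Industry
            4.6.1 Printed Circuit Board Manufacturing
            4.6.2 Metal Surface Treatment
            4.6.3 Integrated Circuit Manufacturing
        4.7 Validation Against Penalty Records
    Chapter 5 Conclusions and Recommendations
        5.1 Conclusions
        5.2 Recommendations
    References
    Appendix A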


    Full-text availability: on campus, public from 2023-08-11
    Off campus, public from 2023-08-11