| Graduate Student: | 吳姍珊 Wu, Shan-Shan |
|---|---|
| Thesis Title: | 基於駕駛行為大數據辨識國道事故熱點 Identifying Highway Crash Hotspot Based on Driving Behavior Data Analyzing |
| Advisor: | 李威勳 Lee, Wei-Hsun |
| Degree: | Master |
| Department: | College of Management - Department of Transportation and Communication Management Science |
| Year of Publication: | 2021 |
| Graduation Academic Year: | 109 (2020-2021) |
| Language: | Chinese |
| Pages: | 115 |
| Chinese Keywords: | 時空關聯駕駛行為資料, 多樣態事故預測, 機器學習方法, 資料不平衡 |
| Keywords: | Spatial-temporal driving behavior data, multiple crash type and severity prediction, machine learning, data imbalance |
According to the 2018 National Freeway Accident Review and Analysis Report, human factors accounted for more than 90% of freeway crashes in 2018 (Freeway Bureau, MOTC, 2019). Driver behavior and decision-making are therefore key causes of crashes. However, existing crash data are static records, which makes it difficult to incorporate dynamic driving-behavior factors into current freeway traffic-safety improvement efforts, and academic studies that collect naturalistic driving data likewise lack matching real-world crash records. Thus, although the driver is the critical factor in traffic crashes, data-collection difficulties have kept driving behavior out of both academic and practical analyses.
To address this difficulty, this study combines intercity-bus dangerous-driving-behavior data, national freeway crash data, and freeway electronic toll collection (eTag) data to build models that predict the spatial-temporal distribution of multiple crash impact types and injury severities. First, driving-behavior and real crash records are linked in space and time, and each real crash is reverse-labeled onto the driving behaviors observed in the same space-time cell as a subsequent "Possible Crash", which serves as the training target. Four models are compared: multinomial logistic regression, k-nearest neighbors (kNN), random forest, and CatBoost, together with several over-sampling and under-sampling methods for mitigating class imbalance. Experimental results show that random forest classifies best, reaching an AUC of 0.99 for predicting whether a crash occurs; for the two four-class tasks (crash impact type and injury severity), F1-scores are at least 0.40 and reach up to 0.95, demonstrating that the proposed method has meaningful predictive power for multi-class crash outcomes. kNN performs comparably to random forest; CatBoost has high precision but low recall; multinomial logistic regression has almost no discriminative ability and is the most time-consuming. Differences among over-sampling methods are small, with SMOTE performing best: it improves classification of minority classes to some degree but markedly increases computation time. Under-sampling instead degrades model performance because of the reduced sample size.
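The space-time reverse-labeling step described above can be sketched as follows. This is an illustrative assumption of how such a join might work, not the thesis's actual implementation: the cell sizes (5 km, 60 min), the `label_possible_crash` helper, and the tuple record layout are all hypothetical.

```python
# Hypothetical sketch of the space-time labeling step: each driving-behavior
# event is assigned the type of any real crash that occurred in the same
# road-segment / time-window cell ("Possible Crash"). Cell sizes are
# illustrative assumptions, not the thesis's parameters.

SEG_KM = 5.0   # assumed spatial cell size (km of freeway mileage)
WIN_MIN = 60   # assumed temporal cell size (minutes)

def cell(km_marker: float, minute_of_day: int) -> tuple:
    """Map an event to its discrete space-time cell."""
    return (int(km_marker // SEG_KM), minute_of_day // WIN_MIN)

def label_possible_crash(behaviors, crashes):
    """behaviors: list of (km, minute) tuples; crashes: list of
    (km, minute, crash_type) tuples. Returns one label per behavior
    record: the crash type observed in the same cell, else 'no_crash'."""
    crash_by_cell = {}
    for km, minute, ctype in crashes:
        crash_by_cell[cell(km, minute)] = ctype
    return [crash_by_cell.get(cell(km, minute), "no_crash")
            for km, minute in behaviors]
```

For example, a harsh-braking event at km 12.3 at minute 610 falls in the same cell as a rear-end crash at km 14.0 at minute 630, so it would be labeled as a possible rear-end crash and used as a positive training sample.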
According to the Highway Crash Review Report of 2018, human factors account for over 90% of crashes, which shows that driving behavior is a critical and influential cause. However, crash records are static, so it is hard to capture the dynamics preceding a crash, and the lack of real crash data is a problem that related studies have always faced. Even though the driver is the core factor in crashes, data-collection difficulties make driving behavior hard to take into consideration in both academia and practice.
To break this data-collection dilemma, this study builds and compares machine learning models that predict the spatial-temporal distribution of multiple crash types and severities by combining highway-bus driving behavior data, highway crash data, and eTag data. The proposed models include multinomial logistic regression, k-nearest neighbors, random forest, and CatBoost. In addition, this study compares various over-sampling and under-sampling methods for handling data imbalance.
The results show that random forest has the best performance, with an AUC of 0.99 for predicting crash occurrence; the F1-score is at least 0.40 and at most 0.95 for both four-class tasks, which validates the proposed methodology. Furthermore, kNN performs similarly to random forest, CatBoost has high precision but low recall, and multinomial logistic regression is time-consuming with poor performance. Differences among the over-sampling methods are small; SMOTE is the best, but its extra computing time outweighs the performance it gains. The under-sampling methods make the models worse because of the reduced number of samples.
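The over-sampling idea that SMOTE implements can be sketched minimally as below: synthesize new minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors (Chawla et al., 2002). The thesis compared library implementations; this small NumPy version, including the `smote` function and its parameters, is only an illustrative assumption.

```python
import numpy as np

# Minimal SMOTE-style over-sampling sketch: generate synthetic minority
# samples by linear interpolation toward nearest minority neighbors.
# This is an illustrative re-implementation, not the library code the
# thesis evaluated.

def smote(X_min: np.ndarray, n_new: int, k: int = 3, seed: int = 0) -> np.ndarray:
    """X_min: minority-class feature matrix (n, d). Returns n_new
    synthetic samples, each interpolated between a random minority
    sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # exclude self as a neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]     # k nearest minority neighbors
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))        # pick a random minority sample
        j = nbrs[i][rng.integers(k)]        # pick one of its neighbors
        gap = rng.random()                  # interpolation factor in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)
```

Because each synthetic point lies on a segment between two existing minority samples, the method enlarges the minority class without duplicating records, which is why it tends to help minority-class recall at the cost of extra computation.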
1. Freeway Bureau, MOTC, 2018 National Freeway Accident Review and Analysis Report. 2019: https://www.freeway.gov.tw/Publish.aspx?cnid=516&p=2849.
2. World Health Organization, Global status report on road safety 2018: Summary. 2018: World Health Organization.
3. Department of Statistics, Ministry of Health and Welfare. 2018 Statistics on Causes of Death. 2019 [cited 2019.11.04]; Available from: https://www.mohw.gov.tw/cp-16-48057-1.html.
4. Ministry of Transportation and Communications, Executive Yuan, 13th Road Traffic Order and Traffic Safety Improvement Program. 2018.
5. National Highway Traffic Safety Administration, National motor vehicle crash causation survey: Report to Congress. Technical Report DOT HS 811 059, 2008.
6. Institute of Transportation, MOTC, Construction and Application of a Traffic Accident Injury Data Collection System. 2019.
7. Ministry of Transportation and Communications, Intelligent Transportation System Development and Construction Plan (2017-2020). 2016.
8. Klauer, S.G., et al., The impact of driver inattention on near-crash/crash risk: An analysis using the 100-car naturalistic driving study data. 2006.
9. Eboli, L., G. Mazzulla, and G. Pungillo, Combining speed and acceleration to define car users’ safe or unsafe driving behaviour. Transportation Research Part C: Emerging Technologies, 2016. 68: p. 113-125.
10. Arvin, R., M. Kamrani, and A.J. Khattak, The role of pre-crash driving instability in contributing to crash intensity using naturalistic driving data. Accident Analysis & Prevention, 2019. 132: p. 105226.
11. Xie, J., A.R. Hilal, and D. Kulić, Driving maneuver classification: A comparison of feature extraction methods. IEEE Sensors Journal, 2017. 18(12): p. 4777-4784.
12. Wang, J., et al., Driving risk assessment using near-crash database through data mining of tree-based model. Accident Analysis & Prevention, 2015. 84: p. 54-64.
13. Jonasson, J.K. and H. Rootzén, Internal validation of near-crashes in naturalistic driving studies: A continuous and multivariate approach. Accident Analysis & Prevention, 2014. 62: p. 102-109.
14. Wu, J., et al., A novel method of vehicle-pedestrian near-crash identification with roadside LiDAR data. Accident Analysis & Prevention, 2018. 121: p. 238-249.
15. Ito, D., et al., Difference between car-to-cyclist crash and near crash in a perpendicular crash configuration based on driving recorder analysis. Accident Analysis & Prevention, 2018. 117: p. 1-9.
16. Cheng, Z., J. Lu, and Y. Li, Freeway crash risks evaluation by variable speed limit strategy using real-world traffic flow data. Accident Analysis & Prevention, 2018. 119: p. 176-187.
17. Kuang, Y., et al., A tree-structured crash surrogate measure for freeways. Accident Analysis & Prevention, 2015. 77: p. 137-148.
18. Liu, J., et al., Do safety performance functions used for predicting crash frequency vary across space? Applying geographically weighted regressions to account for spatial heterogeneity. Accident Analysis & Prevention, 2017. 109: p. 132-142.
19. Nitsche, P., et al., Pre-crash scenarios at road junctions: A clustering method for car crash data. Accident Analysis & Prevention, 2017. 107: p. 137-151.
20. Ulak, M.B., et al., Exploring alternative spatial weights to detect crash hotspots. Computers, Environment and Urban Systems, 2019. 78: p. 101398.
21. Thakali, L., T.J. Kwon, and L. Fu, Identification of crash hotspots using kernel density estimation and kriging methods: a comparison. Journal of Modern Transportation, 2015. 23(2): p. 93-106.
22. Wali, B. and A.J. Khattak, Harnessing ambient sensing & naturalistic driving systems to understand links between driving volatility and crash propensity in school zones–A generalized hierarchical mixed logit framework. Transportation Research Part C: Emerging Technologies, 2020. 114: p. 405-424.
23. Li, P., M. Abdel-Aty, and J. Yuan, Real-time crash risk prediction on arterials based on LSTM-CNN. Accident Analysis & Prevention, 2020. 135: p. 105371.
24. Perez, M.A., et al., Performance of basic kinematic thresholds in the identification of crash and near-crash events within naturalistic driving data. Accident Analysis & Prevention, 2017. 103: p. 10-19.
25. Xiong, X., et al., A forward collision avoidance algorithm based on driver braking behavior. Accident Analysis & Prevention, 2019. 129: p. 30-43.
26. Kim, S., et al., Exploring the association of rear-end crash propensity and micro-scale driver behavior. Safety science, 2016. 89: p. 45-54.
27. Basso, F., et al., Real-time crash prediction in an urban expressway using disaggregated data. Transportation research part C: emerging technologies, 2018. 86: p. 202-219.
28. Bao, J., P. Liu, and S.V. Ukkusuri, A spatiotemporal deep learning approach for citywide short-term crash risk prediction with multi-source data. Accident Analysis & Prevention, 2019. 122: p. 239-254.
29. Lee, W.-H., Research and Development on Vehicle Safe Driving and Big Data Analysis of Driving Behavior. 2017.
30. Hosmer Jr, D.W., S. Lemeshow, and R.X. Sturdivant, Applied logistic regression. Vol. 398. 2013: John Wiley & Sons.
31. Kleinbaum, D.G., et al., Logistic regression. 2002: Springer.
32. Abdel-Aty, M., et al., Predicting freeway crashes from loop detector data by matched case-control logistic regression. Transportation Research Record, 2004. 1897(1): p. 88-95.
33. Böhning, D., Multinomial logistic regression algorithm. Annals of the institute of Statistical Mathematics, 1992. 44(1): p. 197-200.
34. Starkweather, J. and A.K. Moske, Multinomial logistic regression. 2011. Available from: http://www.unt.edu/rss/class/Jon/Benchmarks/MLR_JDS_Aug2011.pdf.
35. Deng, Z., et al., Efficient kNN classification algorithm for big data. Neurocomputing, 2016. 195: p. 143-148.
36. Mani, I. and I. Zhang. kNN approach to unbalanced data distributions: a case study involving information extraction. in Proceedings of workshop on learning from imbalanced datasets. 2003.
37. Breiman, L., Random forests. Machine learning, 2001. 45(1): p. 5-32.
38. Prokhorenkova, L., et al. CatBoost: unbiased boosting with categorical features. in Advances in neural information processing systems. 2018.
39. Chawla, N.V., et al., SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 2002. 16: p. 321-357.
40. Han, H., W.-Y. Wang, and B.-H. Mao. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. in International conference on intelligent computing. 2005. Springer.
41. He, H., et al. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. in 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). 2008. IEEE.
42. Liu, A., J. Ghosh, and C.E. Martin. Generative Oversampling for Mining Imbalanced Datasets. in DMIN. 2007.
43. Tahir, M.A., J. Kittler, and F. Yan, Inverse random under sampling for class imbalance problem and its application to multi-label classification. Pattern Recognition, 2012. 45(10): p. 3738-3750.
44. More, A., Survey of resampling techniques for improving classification performance in unbalanced datasets. arXiv preprint arXiv:1608.06048, 2016.