
Graduate student: Wu, Shan-Shan (吳姍珊)
Thesis title: Identifying Highway Crash Hotspot Based on Driving Behavior Data Analyzing (基於駕駛行為大數據辨識國道事故熱點)
Advisor: Lee, Wei-Hsun (李威勳)
Degree: Master
Department: College of Management - Department of Transportation and Communication Management Science
Year of publication: 2021
Graduation academic year: 109 (ROC calendar; 2020-2021)
Language: Chinese
Number of pages: 115
Keywords (Chinese): 時空關聯駕駛行為資料, 多樣態事故預測, 機器學習方法, 資料不平衡
Keywords (English): Spatial-temporal driving behavior data, Multiple crash type and severity prediction, Machine learning, Data imbalance
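  • According to the 2018 (ROC year 107) National Freeway Crash Review and Analysis Report, human factors accounted for more than 90% of the causes of national freeway crashes in 2018 (Freeway Bureau, MOTC [交通部高速公路局], 2019). Driver behavior and decision-making are therefore key contributors to crashes. Existing crash data, however, are static records, so current efforts to improve freeway traffic order can hardly incorporate dynamic driving behavior, and academic studies that collect naturalistic driving data face a shortage of actual road crashes. Although the driver is the key factor in traffic crashes, both research and practice have had difficulty taking driving behavior into account because the data are hard to collect.
    To overcome this difficulty, this study draws on dangerous driving behavior data from freeway intercity buses, freeway crash data, and freeway electronic toll collection data to build models that predict the spatio-temporal distribution of freeway crashes across multiple collision types and injury severities. Driving behavior records are first linked in space and time to real crash records, and each real crash is labeled back onto the driving behavior observed in the same space and time as a crash that may subsequently occur (Possible Crash); these labels serve as the basis for model training. Four models are used: Multinomial Logistic Regression, k-Nearest Neighbors (kNN), Random Forest, and CatBoost, together with several methods for mitigating data imbalance, including over-sampling and under-sampling. The experiments show that Random Forest classifies best: the AUC for predicting whether a crash occurs reaches 0.99, and for the four-class problems of collision type and injury severity the lowest F1-score is at least 0.40 and the highest reaches 0.95, which indicates that the proposed approach has a degree of predictive ability for multi-class crash patterns. kNN performs on par with Random Forest, CatBoost has high precision but low recall, and Multinomial Logistic Regression has almost no classification ability and is the most time-consuming. The over-sampling methods differ little from one another; SMOTE performs best and improves classification of small-sample classes to some extent, but at a markedly higher computing time. Under-sampling instead degrades model performance because it reduces the number of samples.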
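    The labeling step described in the abstract (attaching a real crash to driving behavior observed in the same space and time) can be illustrated with a minimal sketch. The abstract does not state the matching rule, so everything here is an assumption for illustration: the record fields, the 1 km distance tolerance, the one-hour look-ahead window, and the function name `label_possible_crash` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BehaviorEvent:
    route: str          # freeway identifier (hypothetical encoding)
    direction: str      # travel direction, e.g. "N" or "S"
    milepost_km: float  # location along the route
    time: datetime


@dataclass(frozen=True)
class Crash:
    route: str
    direction: str
    milepost_km: float
    time: datetime
    crash_type: str     # e.g. "rear-end"
    severity: str       # e.g. "A1", "A2", "A3"


def label_possible_crash(event, crashes, max_km=1.0, max_gap=timedelta(hours=1)):
    """Return the earliest crash that follows `event` on the same route and
    direction, within `max_km` and `max_gap`; return None if there is none.

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    candidates = [
        c for c in crashes
        if c.route == event.route
        and c.direction == event.direction
        and abs(c.milepost_km - event.milepost_km) <= max_km
        and timedelta(0) <= c.time - event.time <= max_gap
    ]
    # Among matching crashes, pick the one closest in time after the event.
    return min(candidates, key=lambda c: c.time - event.time, default=None)
```

    An event with a returned crash would carry that crash's type and severity as its Possible Crash label; an event with `None` would be a no-crash sample. The thesis may instead match on fixed road segments and time slots, which this sketch does not attempt to reproduce.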

    According to the 2018 Highway Crash Review Report, human factors account for over 90% of crash causes, which shows that driving behavior is a critical influence. However, because crash records are static, it is hard to represent the dynamics before a crash, and related studies consistently face a lack of real crash data. Even though the driver is the core factor in crashes, both academia and practice find it hard to take driving behavior into consideration.
    To break this data collection dilemma, this study builds and compares machine learning models that predict the spatial-temporal distribution of multiple crash types and severities by combining highway bus driving behavior data, highway crash data, and eTag data. The models are Multinomial Logistic Regression (MLR), k-Nearest Neighbors (kNN), Random Forest, and CatBoost. The study also compares various over-sampling and under-sampling methods for handling data imbalance.
    The results show that Random Forest performs best, with an AUC of 0.99 for predicting crash occurrence and F1-scores between 0.40 and 0.95 for both four-class classification tasks, which supports the proposed methodology. kNN performs similarly to Random Forest, CatBoost has high precision but low recall, and MLR is time-consuming and performs poorly. The over-sampling methods differ little; SMOTE is the best, but its added computing time outweighs the performance it gains. The under-sampling methods make the models worse because they leave fewer samples.
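    To make the imbalance handling and the per-class F1-score concrete, here is a minimal pure-Python sketch. It is not the thesis code: `smote_like` is a simplified, hypothetical stand-in for the SMOTE idea (a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours), and `f1_per_class` computes one F1-score per class, which is the kind of figure the abstract reports for the four-class tasks. The function names, the default `k=3`, and the fixed seed are assumptions.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def f1_per_class(y_true, y_pred):
    """Return {label: F1}; F1 is 0.0 for a class with no true positives."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        scores[label] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return scores
```

    Over-sampling would be applied to the training split only, so that synthetic points never leak into the evaluation data; reporting F1 per class rather than overall accuracy keeps the small classes visible.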

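    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives
        1.4 Research Limitations
        1.5 Research Process
    Chapter 2 Literature Review
        2.1 Driving Behavior
        2.2 Near Crash
        2.3 Road Risk Identification
        2.4 Driving Behavior Applied to Road Risk Identification
    Chapter 3 Methodology
        3.1 Research Process
        3.2 Phase 0: Data Preprocessing
            3.2.1 Dangerous Driving Behavior and Near Crash Data
            3.2.2 Road Traffic Crash Data
            3.2.3 Freeway Electronic Toll Collection Traffic Data
        3.3 Phase 1: Historical Crash Data Analysis
            3.3.1 Spatial Distribution
            3.3.2 Temporal Distribution
            3.3.3 Linking Traffic Crash Data with Driving Behavior Data
        3.4 Phase 2: Predicting the Spatio-temporal Distribution of Freeway Crash Patterns
            3.4.1 Model Selection
            3.4.2 Experimental Design
        3.5 Phase 3: Recommendations for Improving Freeway Traffic Order
    Chapter 4 Experimental Results
        4.1 Data Processing
            4.1.1 Raw Data
            4.1.2 Experimental Data
        4.2 Crash Injury Severity
            4.2.1 Four-class Classification of Crash Injury Severity
            4.2.2 Three-class Classification of Crash Injury Severity
            4.2.3 Binary Classification of Crash Injury Severity (A3 vs. A1+A2)
            4.2.4 Binary Classification of Crash Injury Severity (A1 vs. A2)
        4.3 Collision Type
            4.3.1 Four-class Classification of Collision Type
            4.3.2 Three-class Classification of Collision Type
            4.3.3 Binary Classification of Collision Type (Rear-end vs. Non-rear-end)
            4.3.4 Binary Classification of Collision Type (Single-vehicle vs. Sideswipe)
        4.4 Crash Occurrence
        4.5 Practical Application: Recommendations for Improving Freeway Traffic Order
    Chapter 5 Conclusions and Recommendations
    Chapter 6 References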

    1. 交通部高速公路局 (Freeway Bureau, MOTC), 107年國道事故檢討分析報告 [2018 National Freeway Crash Review and Analysis Report]. 2019: https://www.freeway.gov.tw/Publish.aspx?cnid=516&p=2849.
    2. World Health Organization, Global status report on road safety 2018: Summary. 2018, World Health Organization.
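    3. 衛生福利部統計處 (Department of Statistics, Ministry of Health and Welfare). 107年國人死因統計結果 [2018 Cause of Death Statistics]. 2019 [cited 2019.11.04]; Available from: https://www.mohw.gov.tw/cp-16-48057-1.html.
    4. 行政院交通部 (Ministry of Transportation and Communications, Executive Yuan), 第13期院頒「道路交通秩序與交通安全改進方案」 [13th-Term Executive Yuan "Road Traffic Order and Traffic Safety Improvement Program"]. 2018.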
    5. National Highway Traffic Safety Administration, National motor vehicle crash causation survey: Report to congress. National Highway Traffic Safety Administration Technical Report DOT HS 811 059, 2008.
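    6. 交通部運輸研究所 (Institute of Transportation, MOTC), 交通事故傷害資料蒐集體系建構及應用 [Construction and Application of a Traffic Crash Injury Data Collection System]. 2019.
    7. 交通部 (Ministry of Transportation and Communications), 智慧運輸系統發展建設計畫(106-109年) [Intelligent Transportation System Development and Construction Plan (2017-2020)]. 2016.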
    8. Klauer, S.G., et al., The impact of driver inattention on near-crash/crash risk: An analysis using the 100-car naturalistic driving study data. 2006.
    9. Eboli, L., G. Mazzulla, and G. Pungillo, Combining speed and acceleration to define car users’ safe or unsafe driving behaviour. Transportation Research Part C: Emerging Technologies, 2016. 68: p. 113-125.
    10. Arvin, R., M. Kamrani, and A.J. Khattak, The role of pre-crash driving instability in contributing to crash intensity using naturalistic driving data. Accident Analysis & Prevention, 2019. 132: p. 105226.
    11. Xie, J., A.R. Hilal, and D. Kulić, Driving maneuver classification: A comparison of feature extraction methods. IEEE Sensors Journal, 2017. 18(12): p. 4777-4784.
    12. Wang, J., et al., Driving risk assessment using near-crash database through data mining of tree-based model. 2015. 84: p. 54-64.
    13. Jonasson, J.K. and H. Rootzén, Internal validation of near-crashes in naturalistic driving studies: A continuous and multivariate approach. Accident Analysis & Prevention, 2014. 62: p. 102-109.
    14. Wu, J., et al., A novel method of vehicle-pedestrian near-crash identification with roadside LiDAR data. 2018. 121: p. 238-249.
    15. Ito, D., et al., Difference between car-to-cyclist crash and near crash in a perpendicular crash configuration based on driving recorder analysis. Accident Analysis & Prevention, 2018. 117: p. 1-9.
    16. Cheng, Z., J. Lu, and Y. Li, Freeway crash risks evaluation by variable speed limit strategy using real-world traffic flow data. Accident Analysis & Prevention, 2018. 119: p. 176-187.
    17. Kuang, Y., et al., A tree-structured crash surrogate measure for freeways. 2015. 77: p. 137-148.
    18. Liu, J., et al., Do safety performance functions used for predicting crash frequency vary across space? Applying geographically weighted regressions to account for spatial heterogeneity. 2017. 109: p. 132-142.
    19. Nitsche, P., et al., Pre-crash scenarios at road junctions: A clustering method for car crash data. Accident Analysis & Prevention, 2017. 107: p. 137-151.
    20. Ulak, M.B., et al., Exploring alternative spatial weights to detect crash hotspots. Computers, Environment and Urban Systems, 2019. 78: p. 101398.
    21. Thakali, L., T.J. Kwon, and L. Fu, Identification of crash hotspots using kernel density estimation and kriging methods: a comparison. Journal of Modern Transportation, 2015. 23(2): p. 93-106.
    22. Wali, B. and A.J. Khattak, Harnessing ambient sensing & naturalistic driving systems to understand links between driving volatility and crash propensity in school zones–A generalized hierarchical mixed logit framework. Transportation Research Part C: Emerging Technologies, 2020. 114: p. 405-424.
    23. Li, P., M. Abdel-Aty, and J. Yuan, Real-time crash risk prediction on arterials based on LSTM-CNN. Accident Analysis & Prevention, 2020. 135: p. 105371.
    24. Perez, M.A., et al., Performance of basic kinematic thresholds in the identification of crash and near-crash events within naturalistic driving data. Accident Analysis & Prevention, 2017. 103: p. 10-19.
    25. Xiong, X., et al., A forward collision avoidance algorithm based on driver braking behavior. Accident Analysis & Prevention, 2019. 129: p. 30-43.
    26. Kim, S., et al., Exploring the association of rear-end crash propensity and micro-scale driver behavior. Safety Science, 2016. 89: p. 45-54.
    27. Basso, F., et al., Real-time crash prediction in an urban expressway using disaggregated data. Transportation Research Part C: Emerging Technologies, 2018. 86: p. 202-219.
    28. Bao, J., P. Liu, and S.V. Ukkusuri, A spatiotemporal deep learning approach for citywide short-term crash risk prediction with multi-source data. Accident Analysis & Prevention, 2019. 122: p. 239-254.
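    29. 李威勳 (Lee, Wei-Hsun), 車輛安全駕駛與駕駛行為巨量資料分析之研發 [Research and Development of Big Data Analysis for Safe Vehicle Driving and Driving Behavior]. 2017.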
    30. Hosmer Jr, D.W., S. Lemeshow, and R.X. Sturdivant, Applied logistic regression. Vol. 398. 2013: John Wiley & Sons.
    31. Kleinbaum, D.G., et al., Logistic regression. 2002: Springer.
    32. Abdel-Aty, M., et al., Predicting freeway crashes from loop detector data by matched case-control logistic regression. Transportation Research Record, 2004. 1897(1): p. 88-95.
    33. Böhning, D., Multinomial logistic regression algorithm. Annals of the Institute of Statistical Mathematics, 1992. 44(1): p. 197-200.
    34. Starkweather, J. and A.K. Moske, Multinomial logistic regression. Consulted page at September 10th: http://www.unt.edu/rss/class/Jon/Benchmarks/MLR_JDS_Aug2011.pdf, 2011. 29: p. 2825-2830.
    35. Deng, Z., et al., Efficient kNN classification algorithm for big data. Neurocomputing, 2016. 195: p. 143-148.
    36. Mani, I. and I. Zhang. kNN approach to unbalanced data distributions: a case study involving information extraction. in Proceedings of workshop on learning from imbalanced datasets. 2003.
    37. Breiman, L., Random forests. Machine Learning, 2001. 45(1): p. 5-32.
    38. Prokhorenkova, L., et al. CatBoost: unbiased boosting with categorical features. in Advances in neural information processing systems. 2018.
    39. Chawla, N.V., et al., SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 2002. 16: p. 321-357.
    40. Han, H., W.-Y. Wang, and B.-H. Mao. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. in International conference on intelligent computing. 2005. Springer.
    41. He, H., et al. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. in 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). 2008. IEEE.
    42. Liu, A., J. Ghosh, and C.E. Martin. Generative Oversampling for Mining Imbalanced Datasets. in DMIN. 2007.
    43. Tahir, M.A., J. Kittler, and F. Yan, Inverse random under sampling for class imbalance problem and its application to multi-label classification. Pattern Recognition, 2012. 45(10): p. 3738-3750.
    44. More, A., Survey of resampling techniques for improving classification performance in unbalanced datasets. arXiv preprint arXiv:1608.06048, 2016.

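    Off-campus access: not publicly available
    Neither the electronic thesis nor the print thesis has been authorized for public release.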