
Student: Huang, Wei-Min (黃偉閔)
Thesis title: Anomaly Detection in Smart Manufacturing Using Hierarchical Logistic Regression (original title: 階層式邏輯斯迴歸於智能製造異常偵測之研究)
Advisor: Li, Sheng-Tun (李昇暾)
Degree: Master
Department: College of Management, Advanced Master of Business Administration (AMBA)
Year of publication: 2025
Academic year of graduation: 113 (ROC calendar; academic year 2024-2025)
Language: Chinese
Number of pages: 59
Keywords: Anomaly detection, imbalanced data, nested structure, hierarchical logistic regression, cost-sensitive learning
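    As smart manufacturing advances, anomaly detection plays a key role in quality control. In practice, however, highly imbalanced data and nested structures among samples often cause traditional classification models to predict poorly. To address this problem, this study proposes an anomaly detection model that combines data balancing techniques with Hierarchical Logistic Regression (HLR) and compares it with conventional Logistic Regression (LR) to verify the advantages and practicality of HLR on imbalanced, nested data.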
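To make the LR/HLR contrast concrete, the sketch below assumes the common random-intercept form of HLR: the log-odds for unit i in group j are beta0 + beta·x_ij + u_j, where u_j ~ N(0, tau^2) is a group-specific shift of the intercept. Setting u_j = 0 for every group recovers ordinary LR. The grouping variable (for example, a production lot), the function names, and all parameter values are illustrative assumptions, not taken from the thesis.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function: log-odds -> probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(x: list[float], beta0: float, beta: list[float]) -> float:
    """Fixed-effect part shared by LR and HLR: beta0 + sum_k beta_k * x_k."""
    return beta0 + sum(b * xk for b, xk in zip(beta, x))


def lr_probability(x: list[float], beta0: float, beta: list[float]) -> float:
    """Ordinary LR: every group shares the same intercept."""
    return sigmoid(linear_predictor(x, beta0, beta))


def hlr_probability(x: list[float], beta0: float, beta: list[float], u_j: float) -> float:
    """Random-intercept HLR: group j shifts the shared intercept by u_j."""
    return sigmoid(linear_predictor(x, beta0, beta) + u_j)


def draw_group_effects(n_groups: int, tau: float, seed: int = 0) -> list[float]:
    """Draw u_j ~ N(0, tau^2), one per hypothetical group (e.g. a lot)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, tau) for _ in range(n_groups)]


if __name__ == "__main__":
    x = [0.8, -1.2]                 # hypothetical features of one unit
    beta0, beta = -3.0, [1.5, 0.4]  # hypothetical fixed effects
    print(f"LR:           {lr_probability(x, beta0, beta):.4f}")
    for j, u in enumerate(draw_group_effects(3, tau=1.0)):
        print(f"HLR, group {j}: {hlr_probability(x, beta0, beta, u):.4f}")
```

The same unit receives different anomaly probabilities depending on its group, which is the within-group correlation that a single-intercept LR cannot represent.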
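    The study uses actual process data from a Taiwanese network communications manufacturer. Categorical SMOTE is applied to balance the classes, and the Intraclass Correlation Coefficient (ICC) is used to verify the nested structure of the data. LR and HLR models are then built and evaluated through significance tests, classification performance (PR-AUC) analysis, and cost-sensitive learning experiments. Under five-fold cross-validation, HLR achieves higher PR-AUC than LR on the training sets, but both models perform poorly on the imbalanced validation sets, with PR-AUC values generally below 0.5. The study therefore turns to cost analysis to complement the performance evaluation.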
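One widely used way to compute the ICC for a binary outcome is the latent-variable ICC of a random-intercept null model, ICC = tau^2 / (tau^2 + pi^2/3), where tau^2 is the between-group variance of the random intercept and pi^2/3 (about 3.29) is the variance of the standard logistic distribution. This page does not state the thesis's exact ICC definition, so the sketch below assumes this standard form and takes tau^2 as an input, for example from a null model fitted with a mixed-model package such as R's lme4::glmer. The constant and function names are illustrative.

```python
import math

# Variance of the standard logistic distribution; in the latent-variable
# formulation it plays the role of the fixed within-group residual variance.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3  # about 3.29


def icc_from_null_model(tau_squared: float) -> float:
    """ICC of a random-intercept (null) logistic model.

    tau_squared: estimated between-group variance of the random intercept.
    Returns the share of latent-scale variance attributable to groups.
    """
    if tau_squared < 0:
        raise ValueError("variance must be non-negative")
    return tau_squared / (tau_squared + LOGISTIC_RESIDUAL_VARIANCE)


if __name__ == "__main__":
    for tau2 in (0.0, 0.5, 1.0, 3.29):
        print(f"tau^2 = {tau2:4.2f} -> ICC = {icc_from_null_model(tau2):.3f}")
```

An ICC near zero means group membership explains almost none of the latent variance and plain LR loses little; a clearly non-zero ICC is the usual justification for adding the random intercept.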
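    The cost-sensitive analysis shows that when false positives (FP) and false negatives (FN) in the confusion matrix carry different cost ratios, HLR effectively reduces the total prediction cost, with the difference most pronounced when FN costs are high. Paired-sample t-tests on the validation sets also show that HLR yields a statistically significant cost advantage in high-risk cost scenarios, supporting its potential in practice. Overall, by combining group structure with cost sensitivity, HLR offers a more economically valuable prediction tool for anomaly detection in manufacturing.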
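The cost comparison can be sketched as follows. The sketch assumes each fold's total cost is a weighted sum of its errors, cost = c_FN * FN + c_FP * FP, with c_FP fixed at 1 and c_FN set by the FN:FP ratio. It also assumes LR and HLR are compared fold by fold with a paired-sample t-test on their total costs. The Confusion class, the function names, and the fold counts are hypothetical; the counts are invented for illustration and are not the thesis's results.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Confusion:
    """Error counts from one validation fold (only FN and FP carry a cost)."""
    fn: int
    fp: int


def total_cost(c: Confusion, fn_to_fp_ratio: float, fp_cost: float = 1.0) -> float:
    """Total misclassification cost; correct predictions are assumed free."""
    return fn_to_fp_ratio * fp_cost * c.fn + fp_cost * c.fp


def paired_t_statistic(a: list[float], b: list[float]) -> float:
    """Paired-sample t statistic on per-fold differences a_k - b_k (df = n - 1)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Toy per-fold error counts, invented for illustration only.
    lr_folds = [Confusion(9, 20), Confusion(8, 25), Confusion(10, 18),
                Confusion(7, 22), Confusion(9, 21)]
    hlr_folds = [Confusion(6, 24), Confusion(6, 27), Confusion(7, 23),
                 Confusion(5, 25), Confusion(7, 22)]
    for ratio in (1, 5, 10):
        lr_cost = [total_cost(c, ratio) for c in lr_folds]
        hlr_cost = [total_cost(c, ratio) for c in hlr_folds]
        t = paired_t_statistic(lr_cost, hlr_cost)
        print(f"FN:FP = {ratio:>2}:1  mean LR = {statistics.mean(lr_cost):.1f}  "
              f"mean HLR = {statistics.mean(hlr_cost):.1f}  t = {t:.2f}")
```

For a p-value, scipy.stats.ttest_rel(lr_cost, hlr_cost) computes the same statistic on the differences lr_cost - hlr_cost. Pairing by fold asks whether HLR is cheaper on the same validation split, rather than comparing two independent samples.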

    With the rise of smart manufacturing, anomaly detection has become integral to quality control. However, highly imbalanced datasets and nested data structures frequently compromise the predictive performance of traditional classification models such as Logistic Regression (LR). To overcome these limitations, this study proposes an anomaly detection framework that integrates data balancing techniques with Hierarchical Logistic Regression (HLR) and assesses its effectiveness on complex manufacturing data.
    The research uses real production data from a Taiwanese network communications manufacturer. Categorical SMOTE is employed to address class imbalance, and five-fold cross-validation is used to evaluate model stability. Intraclass Correlation Coefficient (ICC) analysis verifies the presence of a nested data structure, which supports the use of hierarchical modeling. The study conducts statistical significance tests, evaluates model performance via PR-AUC, and performs cost-sensitive analyses under multiple false negative (FN) to false positive (FP) cost ratios. Although HLR achieved higher PR-AUC on the training sets, both LR and HLR performed poorly on the imbalanced validation sets, with PR-AUC values consistently below 0.5. This limitation prompted a shift to cost-sensitive analysis as a more reliable performance measure under real-world conditions.
    Experimental results indicate that HLR consistently achieves lower total cost than LR, particularly under high FN cost scenarios. Paired-sample t-tests further confirm that the cost reductions associated with HLR are statistically significant in these scenarios. In conclusion, integrating hierarchical modeling with cost-sensitive learning shows that HLR is a more economically advantageous predictive tool for anomaly detection in smart manufacturing applications.
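PR-AUC is commonly computed as average precision: rank samples by predicted anomaly score, then sum, over score thresholds, the gain in recall multiplied by the precision at that threshold. The pure-Python sketch below assumes that step-wise definition, which is also the summary used by scikit-learn's sklearn.metrics.average_precision_score. Tied scores are treated as a single threshold. The function name and the example fold are illustrative.

```python
def average_precision(y_true: list[int], scores: list[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    y_true: 1 for an anomaly, 0 for a normal unit.
    scores: predicted anomaly scores; higher means more anomalous.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("PR-AUC is undefined without positive samples")
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample whose score equals the current threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Hypothetical validation fold: 5 anomalies among 100 units.
    y = [1] * 5 + [0] * 95
    # A no-skill model that gives every unit the same score.
    print(average_precision(y, [0.5] * 100))  # equals the anomaly rate, 0.05
```

Because a constant-score model attains a PR-AUC equal to the anomaly rate, the no-skill reference for the validation PR-AUC values is the defect prevalence of each validation fold rather than 0.5 as with ROC-AUC.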

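    Abstract (Chinese); English Abstract; Acknowledgments; Table of Contents; List of Figures; List of Tables
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Framework
    Chapter 2 Literature Review
      2.1 Anomaly Detection
      2.2 Imbalanced Datasets
      2.3 Hierarchical Logistic Regression
        2.3.1 Nested-Structure Data
        2.3.2 Logistic Regression
        2.3.3 Hierarchical Logistic Regression
      2.4 Chapter Summary
    Chapter 3 Research Methods
      3.1 Problem Definition
      3.2 Model Framework
      3.3 Data Balancing
      3.4 Null Model Construction
      3.5 LR and HLR Models
      3.6 Performance Evaluation Metrics
      3.7 Cost-Sensitive Strategy
    Chapter 4 Experiments and Analysis
      4.1 Datasets and Data Preprocessing
      4.2 Null Model Analysis and ICC Test
        4.2.1 Random-Intercept Estimates of the Null Model
        4.2.2 Intraclass Correlation Coefficient (ICC) Values
      4.3 Significance Tests Comparing LR and HLR
        4.3.1 Dataset A
        4.3.2 Dataset B
      4.4 Performance Evaluation Comparison
      4.5 Cost-Sensitive Analysis Comparison
        4.5.1 Total Cost Comparison for Dataset A
        4.5.2 Total Cost Comparison for Dataset B
        4.5.3 Summary of Cost Analysis for Datasets A and B
      4.6 Chapter Summary
    Chapter 5 Conclusions and Future Prospects
      5.1 Research Conclusions and Contributions
      5.2 Managerial Implications
      5.3 Research Limitations and Future Directions
    References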

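    Full-text availability: on campus from 2030-06-30; off campus from 2030-06-30.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.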