
Author: Peng, Zi-Xuan (彭梓瑄)
Title: Optimization of Multi-Machine Maintenance under Unknown Machine States (考量機台狀態未知情況下的多機台維護最佳化問題)
Advisor: Chuang, Ya-Tang (莊雅棠)
Degree: Master
Department: Department of Industrial and Information Management, College of Management
Year of Publication: 2025
Graduation Academic Year: 113 (2024-2025)
Language: Chinese
Number of Pages: 64
Keywords: Machine Maintenance, Partially Observable Markov Decision Process, Dynamic Programming, Restless Bandits Problem
Abstract:
    Machine maintenance is often accompanied by a high degree of uncertainty, especially when the actual condition of equipment cannot be observed directly or when its state transitions are driven by latent factors, leaving managers unable to track equipment health in real time. Under such incomplete information, maintenance decisions are rarely optimal, which in turn raises long-term operating and maintenance costs. For high-value equipment, any erroneous or delayed repair can cause substantial losses, making maintenance decision-making a major managerial challenge. This study addresses a multi-machine decision environment with incomplete information and aims to construct an effective maintenance policy that helps decision-makers reduce long-term costs under uncertainty. Unlike the traditional literature, which assumes fully observable states, this study assumes that each machine's state transition probabilities are known but its current state cannot be observed directly. The manager must rely on observations and Bayesian updating to revise the belief about the system state period by period. Given the problem's high dimensionality, incomplete information, and resource competition among machines, this study adopts the multi-armed bandit (MAB) framework and applies the Whittle index concept to solve the resource allocation problem. The problem is further modeled as a partially observable Markov decision process (POMDP), and dynamic programming is used to characterize the structure of the value function and the optimal policy so as to minimize total cost.
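
    As a sketch of the Bayesian updating step described above (the notation here is assumed for illustration and is not taken from the thesis): if P(s' | s, a) denotes the known transition probability under action a and O(o | s') the likelihood of observing o when the machine occupies state s', each period's belief b_t over the hidden state is revised by

        b_{t+1}(s') = \frac{O(o_{t+1} \mid s') \sum_{s} P(s' \mid s, a_t)\, b_t(s)}{\sum_{s''} O(o_{t+1} \mid s'') \sum_{s} P(s'' \mid s, a_t)\, b_t(s)}

    The numerator pushes the current belief through the known dynamics and reweights it by the observation likelihood; the denominator normalizes the result back to a probability distribution.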

English Abstract:
    Machine maintenance often involves uncertainty, such as unknown machine states and stochastic transitions. In such cases, decision-makers must operate with incomplete information, making it difficult to minimize long-term operational and maintenance costs. For high-value equipment, even minor maintenance delays or errors can result in significant losses, making the optimization of maintenance policies a crucial challenge.
    This study focuses on a multi-machine maintenance problem under partial observability. Unlike traditional models that assume fully observable system states, we assume that the transition matrix is known but current states are hidden. A Bayesian belief update process is used to estimate each machine’s state based on noisy observations.
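
    A minimal sketch of this belief update, assuming a generic n-state machine with a known transition matrix and a discrete noisy observation channel (the array shapes and names below are illustrative, not the thesis's notation):

        import numpy as np

        def update_belief(belief, P, O, obs):
            # belief: (n,) prior over the hidden machine states
            # P:      (n, n) known transition matrix, P[s, t] = Pr(t | s)
            # O:      (n, m) observation likelihoods, O[t, o] = Pr(o | t)
            # obs:    index of the noisy observation received this period
            predicted = belief @ P            # push the prior through the dynamics
            unnorm = predicted * O[:, obs]    # reweight by the observation likelihood
            return unnorm / unnorm.sum()      # normalize to the posterior belief

        # Example: two states (0 = healthy, 1 = degraded) and an imperfect sensor.
        P = np.array([[0.9, 0.1],
                      [0.0, 1.0]])
        O = np.array([[0.8, 0.2],
                      [0.3, 0.7]])
        b = update_belief(np.array([0.5, 0.5]), P, O, obs=1)   # posterior over states
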
    To efficiently allocate limited maintenance resources, we model the problem using a Restless Multi-Armed Bandit (RMAB) framework. Each machine is treated as an arm, and the Whittle index is used to prioritize which machines should be repaired. A dynamic programming approach is employed to compute the Whittle index offline under a Partially Observable Markov Decision Process (POMDP) model. This enables near-optimal decisions that remain computationally tractable even with many machines.
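
    A minimal sketch of the offline index computation, under strong simplifications relative to the thesis (a single arm whose belief is the scalar probability of being degraded, no informative observation while passive, and illustrative parameter values): the Whittle index at a belief is the passivity subsidy at which repairing and staying passive are equally attractive, found here by bisection, with each candidate subsidy evaluated by value iteration on a discretized belief grid.

        import numpy as np

        # Illustrative single-arm model: belief b = Pr(machine is degraded).
        RHO, C_RUN, C_REP, BETA = 0.3, 1.0, 2.5, 0.95
        GRID = np.linspace(0.0, 1.0, 201)                 # discretized belief space
        DRIFT = GRID + (1.0 - GRID) * RHO                 # passive one-step belief drift
        NEXT = np.abs(GRID[:, None] - DRIFT[None, :]).argmin(axis=0)  # nearest grid point

        def solve_arm(lam, sweeps=400):
            # Value iteration for one arm when staying passive earns a subsidy lam.
            V = np.zeros(len(GRID))
            for _ in range(sweeps):
                passive = GRID * C_RUN - lam + BETA * V[NEXT]  # keep running as-is
                active = C_REP + BETA * V[0]                   # repair: belief resets to 0
                V = np.minimum(passive, active)
            return passive, active

        def whittle_index(b, lo=-5.0, hi=5.0, tol=1e-3):
            # Bisect on the subsidy at which repair and passivity are indifferent
            # at belief b (indexability is assumed, as in Whittle's heuristic).
            i = np.abs(GRID - b).argmin()
            while hi - lo > tol:
                lam = 0.5 * (lo + hi)
                passive, active = solve_arm(lam)
                if active < passive[i]:    # repair still strictly cheaper: raise subsidy
                    lo = lam
                else:
                    hi = lam
            return 0.5 * (lo + hi)

    With the indices precomputed on the grid, the per-period allocation rule is simply to repair the k machines whose current beliefs carry the largest indices, in line with the restless-bandit policy described above.
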
    Simulation results show that the Whittle index policy significantly reduces long-term costs compared to baseline strategies such as immediate repair or health-based sorting. The findings demonstrate the value of belief-based decision-making and highlight the practical potential of Whittle index methods in large-scale, uncertain maintenance environments.

Table of Contents:
    Abstract
    Extended Abstract in English
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1: Introduction
        1.1 Research Background and Motivation
        1.2 Research Objectives
        1.3 Thesis Organization
    Chapter 2: Literature Review
        2.1 Machine Maintenance
            2.1.1 Applications of Markov Decision Processes (MDP)
        2.2 Partially Observable Markov Decision Processes
        2.3 Reinforcement Learning
        2.4 Bandit Problems
        2.5 Summary
    Chapter 3: Methodology
        3.1 POMDP Model
            3.1.1 Scenario Assumptions
            3.1.2 Model Formulation
            3.1.3 Numerical Analysis
    Chapter 4: State-Space Extension
        4.1 Phase-Type Distributions
            4.1.1 Scenario Assumptions
        4.2 Spherical Coordinate Transformation
            4.2.1 Motivation
            4.2.2 Model Formulation
    Chapter 5: Multi-Machine Maintenance
        5.1 Multi-Machine Whittle Index Model
            5.1.1 Scenario Assumptions
            5.1.2 Numerical Analysis
            5.1.3 Further Comparison with the Myopic Policy
    Chapter 6: Sensitivity Analysis
        6.1 Policy Performance under Different Maintenance Workforce Levels
            6.1.1 Parameter Settings
            6.1.2 Simulation Results and Analysis
        6.2 Proportional Stability across Different System Scales
            6.2.1 Parameter Settings
            6.2.2 Simulation Results and Analysis
    Chapter 7: Conclusions
        7.1 Conclusions
        7.2 Future Research Directions
    References

References:
    Abbou, A., & Makis, V. (2019). Group maintenance: A restless bandits approach. INFORMS Journal on Computing, 31(4), 719–731.
    Andriotis, C. P., & Papakonstantinou, K. G. (2019). Managing engineering systems with large state and action spaces through deep reinforcement learning. Reliability Engineering & System Safety, 191, 106483.
    Åström, K. J., & Bohlin, T. (1965). Numerical identification of linear dynamic systems from normal operating records. IFAC Proceedings Volumes, 2(2), 96–111.
    Aven, T., & Castro, I. (2009). A delay-time model with safety constraint. Reliability Engineering & System Safety, 94(2), 261–267.
    Cassandra, A. R., Littman, M. L., & Zhang, N. L. (1997). Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (pp. 54–61). Morgan Kaufmann Publishers Inc.
    Chan, C. W., & Farias, V. F. (2009). Stochastic depletion problems: Effective myopic policies for a class of dynamic optimization problems. Mathematics of Operations Research, 34(2), 333–350.
    Cheng, J., Liu, Y., Cheng, M., Li, W., & Li, T. (2022). Optimum condition-based maintenance policy with dynamic inspections based on reinforcement learning. Ocean Engineering, 261, 112058.
    Chuang, Y.-T. (2025). Sequential Bayesian replacement with unknown transition probabilities. Naval Research Logistics (NRL).
    Eggertsson, R., Eruguz, A. S., Basten, R., & Maillart, L. M. (2025). Maintenance optimization for multi-component systems with a single sensor. European Journal of Operational Research, 320(3), 559–569.
    Feinberg, E. A., & Shwartz, A. (2012). Handbook of Markov Decision Processes: Methods and Applications, vol. 40. Springer Science & Business Media.
    Grosfeld-Nir, A. (2007). Control limits for two-state partially observable Markov decision processes. European Journal of Operational Research, 182(1), 300–304.
    Howard, R. A. (1960). Dynamic Programming and Markov Processes. Cambridge, MA: MIT Press.
    Huynh, K. T., Barros, A., & Bérenguer, C. (2012). Maintenance decision-making for systems operating under indirect condition monitoring: Value of online information and impact of measurement uncertainty. IEEE Transactions on Reliability, 61(2), 410–425.
    Katehakis, M. N., & Veinott Jr, A. F. (1987). The multi-armed bandit problem: decomposition and computation. Mathematics of Operations Research, 12(2), 262–268.
    Khan, I., Noor-ul-Amin, M., Khan, D. M., Ismail, E. A., & Sumelka, W. (2023). Monitoring of manufacturing process using Bayesian EWMA control chart under ranked based sampling designs. Scientific Reports, 13(1), 18240.
    Kharoufeh, J. P., Finkelstein, D. E., & Mixon, D. G. (2006). Availability of periodically inspected systems with Markovian wear and shocks. Journal of Applied Probability, 43(2), 303–317.
    Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data. Wiley.
    Lefebvre, M., & Yaghoubi, R. (2024). Optimal inspection and maintenance policy: Integrating a continuous-time Markov chain into a homing problem. Machines, 12(11), 795.
    Ma, S., Ma, X., & Xia, L. (2023). A unified algorithm framework for mean-variance optimization in discounted Markov decision processes. European Journal of Operational Research, 311(3), 1057–1067.
    Osband, I., Russo, D., & Van Roy, B. (2013). (more) efficient reinforcement learning via posterior sampling. Advances in Neural Information Processing Systems, 26.
    Paraschos, P. D., Koulinas, G. K., & Koulouriotis, D. E. (2020). Reinforcement learning for combined production-maintenance and quality control of a manufacturing system with deterioration failures. Journal of Manufacturing Systems, 56, 470–483.
    Robbins, H. (1952). A note on gambling systems and birth statistics. The American Mathematical Monthly, 59(10), 685–686.
    Ross, S. M. (1971). Quality control under Markovian deterioration. Management Science, 17(9), 587–596.
    Russo, D., & Van Roy, B. (2014). Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4), 1221–1243.
    Sondik, E. J. (1978). The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research, 26(2), 282–304.
    Song, H., Liu, C.-C., Lawarrée, J., & Dahlgren, R. W. (2000). Optimal electricity supply bidding by Markov decision process. IEEE Transactions on Power Systems, 15(2), 618–624.
    Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
    Tien, K.-W., & Prabhu, V. (2024). Phase-type distribution models for performance evaluation of condition-based maintenance. Production & Manufacturing Research, 12(1), 2380723.
    van Oosterom, C., Maillart, L. M., & Kharoufeh, J. P. (2017). Optimal maintenance policies for a safety-critical system and its deteriorating sensor. Naval Research Logistics (NRL), 64(5), 399–417.
    van Oosterom, C. D., Elwany, A. H., Çelebi, D., & van Houtum, G.-J. (2014). Optimal policies for a delay time model with postponed replacement. European Journal of Operational Research, 232(1), 186–197.
    Vora, M., Grussing, M. N., & Ornik, M. (2024). Solving truly massive budgeted monotonic POMDPs with oracle-guided meta-reinforcement learning. arXiv preprint arXiv:2408.07192.
    Wang, J., & Lee, C.-G. (2015). Multistate Bayesian control chart over a finite horizon. Operations Research, 63(4), 949–964.
    Whittle, P. (1988). Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25(A), 287–298.
    Zhang, P., Zhu, X., & Xie, M. (2021). A model-based reinforcement learning approach for maintenance optimization of degrading systems in a large state space. Computers & Industrial Engineering, 161, 107622.
