| Graduate Student: | 彭梓瑄 Peng, Zi-Xuan |
|---|---|
| Thesis Title: | 考量機台狀態未知情況下的多機台維護最佳化問題 Optimization of Multi-Machine Maintenance under Unknown Machine States |
| Advisor: | 莊雅棠 Chuang, Ya-Tang |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 工業與資訊管理學系 College of Management, Department of Industrial and Information Management |
| Publication Year: | 2025 |
| Graduation Academic Year: | 113 (ROC calendar) |
| Language: | Chinese |
| Pages: | 64 |
| Chinese Keywords: | 機台維修、部分可觀察馬可夫決策過程、動態規劃、Restless Bandits Problem |
| English Keywords: | Restless Bandits Problem, Machine Maintenance, Partially Observable Markov Decision Process, Dynamic Programming |
Machine maintenance is often accompanied by a high degree of uncertainty, especially when the actual state of the equipment cannot be observed directly or when its state transitions are driven by hidden factors, leaving managers unable to track its real-time health. Under incomplete information, maintenance decisions are rarely optimal, which drives up long-term operating and maintenance costs. For high-value equipment, any mistaken or delayed repair can cause substantial losses, making machine maintenance a major managerial challenge. This study targets a multi-machine decision environment with incomplete information and develops an effective maintenance policy that helps decision-makers reduce long-term costs under uncertainty. Unlike the traditional literature, which assumes full observability, we assume that each machine's state transition probabilities are known while its current state is not directly observable. The manager must therefore rely on Bayesian updating of observed outcomes to revise the belief about the system state period by period. Given the problem's high dimensionality, incomplete information, and competition for resources among machines, this study adopts the Multi-Armed Bandit (MAB) framework and applies the Whittle index to solve the resource allocation problem. The problem is further formulated as a Partially Observable Markov Decision Process (POMDP), and dynamic programming is used to characterize the structure of the value function and the optimal policy so as to minimize total cost.
Machine maintenance often involves uncertainty, such as unknown machine states and stochastic transitions. In such cases, decision-makers must operate with incomplete information, making it difficult to minimize long-term operational and maintenance costs. For high-value equipment, even minor maintenance delays or errors can result in significant losses, making the optimization of maintenance policies a crucial challenge.
This study focuses on a multi-machine maintenance problem under partial observability. Unlike traditional models that assume fully observable system states, we assume that the transition matrix is known but current states are hidden. A Bayesian belief update process is used to estimate each machine’s state based on noisy observations.
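As a minimal sketch of this belief update, consider a two-state machine (healthy/degraded) with a known transition matrix `P` and observation matrix `O`; all numerical values below are hypothetical placeholders, not parameters from the thesis:

```python
import numpy as np

# Hypothetical two-state model: state 0 = healthy, state 1 = degraded.
P = np.array([[0.9, 0.1],    # P[s, s'] = prob. of moving from state s to s'
              [0.0, 1.0]])   # degradation is absorbing in this toy model
O = np.array([[0.8, 0.2],    # O[s', o] = prob. of seeing signal o in state s'
              [0.3, 0.7]])

def belief_update(belief, obs):
    """One Bayesian step: predict through P, then correct with O."""
    predicted = belief @ P                # prior over the next state
    posterior = predicted * O[:, obs]     # weight by observation likelihood
    return posterior / posterior.sum()    # renormalize to a distribution

b = np.array([1.0, 0.0])      # start fully confident the machine is healthy
for obs in [0, 0, 1, 1]:      # a hypothetical stream of noisy signals
    b = belief_update(b, obs)
    print(b)                  # belief mass shifts toward "degraded"
```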
To efficiently allocate limited maintenance resources, we model the problem using a Restless Multi-Armed Bandit (RMAB) framework. Each machine is treated as an arm, and the Whittle index is used to prioritize which machines should be repaired. A dynamic programming approach is employed to compute the Whittle index offline under a Partially Observable Markov Decision Process (POMDP) model. This enables near-optimal decisions that remain computationally tractable even with many machines.
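The offline index computation can be illustrated with a deliberately stripped-down single-arm model, not the full POMDP of the thesis: beliefs drift deterministically toward the degraded state while waiting (noisy observations are omitted to keep the sketch short), repair resets the belief to healthy, and the Whittle index at a given belief is found by bisecting on the passivity subsidy that makes waiting and repairing equally attractive. Indexability is taken for granted, and all parameters (`BETA`, `Q`, `C_OP`, `C_REP`) are hypothetical:

```python
import numpy as np

BETA, Q, C_OP, C_REP = 0.95, 0.1, 1.0, 5.0   # hypothetical parameters
GRID = np.linspace(0.0, 1.0, 201)            # discretized belief in "degraded"

def drift(p):
    """Belief after one passive period: mass leaks toward degradation."""
    return p + (1.0 - p) * Q

def solve_single_arm(lam, iters=500):
    """Value iteration for one arm when the passive action earns subsidy lam."""
    V = np.zeros_like(GRID)
    for _ in range(iters):
        V_next = np.interp(drift(GRID), GRID, V)      # passive continuation
        passive = C_OP * GRID - lam + BETA * V_next   # wait, collect subsidy
        active = C_REP + BETA * V[0]                  # repair resets belief to 0
        V = np.minimum(passive, active)
    return V

def whittle_index(p, lo=-10.0, hi=10.0, tol=1e-3):
    """Smallest subsidy making 'wait' as attractive as 'repair' at belief p."""
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        V = solve_single_arm(lam)
        passive = C_OP * p - lam + BETA * np.interp(drift(p), GRID, V)
        active = C_REP + BETA * V[0]
        if passive <= active:
            hi = lam     # subsidy already large enough to prefer waiting
        else:
            lo = lam
    return 0.5 * (lo + hi)

for p in [0.1, 0.5, 0.9]:
    print(p, round(whittle_index(p), 3))
```

Because a higher index means a larger subsidy is needed before leaving the machine alone becomes acceptable, machines with the highest indices are the most urgent repair candidates.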
Simulation results show that the Whittle index policy significantly reduces long-term costs compared to baseline strategies such as immediate repair or health-based sorting. The findings demonstrate the value of belief-based decision-making and highlight the practical potential of Whittle index methods in large-scale, uncertain maintenance environments.
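As a usage note, the index policy in the multi-machine setting amounts to ranking machines by their current indices each period and repairing only the top K, where K is the per-period maintenance capacity. The sketch below reuses `whittle_index` and `drift` from above, with `N`, `K`, and the starting beliefs as hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 20, 3                              # machines and repair capacity
beliefs = rng.uniform(0.0, 1.0, size=N)   # hypothetical starting beliefs

# One decision epoch: score every machine, repair the K most urgent ones.
indices = np.array([whittle_index(p) for p in beliefs])
repair_set = np.argsort(indices)[-K:]     # top-K indices get repaired
beliefs[repair_set] = 0.0                 # repair resets belief to healthy
beliefs = drift(beliefs)                  # the rest continue to degrade
print("repaired machines:", sorted(repair_set.tolist()))
```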