
Author: 陳彥彣 (Chen, Yen-Wen)
Thesis Title: 資料包絡分析與條件風險價值強化學習於最佳化產能與生產力決策 (Data Envelopment Analysis and Reinforcement Learning with Conditional Value-at-Risk for Productivity Optimization)
Advisor: 王宏鍇 (Wang, Hung-Kai)
Co-Advisor: 李家岩 (Lee, Chia-Yen)
Degree: Master
Department: Institute of Manufacturing Information and Systems, College of Electrical Engineering and Computer Science
Publication Year: 2021
Graduating Academic Year: 109
Language: English
Number of Pages: 89
Keywords (Chinese): 資料包絡分析 (Data Envelopment Analysis), 強化學習 (Reinforcement Learning), 條件風險價值 (Conditional Value-at-Risk), 生產力最佳化 (Productivity Optimization)
Keywords (English): Data Envelopment Analysis (DEA), Reinforcement Learning (RL), Conditional Value-at-Risk (CVaR), Productivity Optimization
    This study proposes a hybrid framework that combines Data Envelopment Analysis (DEA) with Reinforcement Learning (RL). By learning the optimal resource-allocation policy, RL serves as a guiding principle for improving productive capacity.
    For decision-makers, understanding their own performance relative to other reference points is essential. However, when multiple dimensions of inputs and outputs are considered at the same time, the weights of the variables are difficult to determine. To overcome this problem, this study uses DEA, a nonparametric optimization method, to evaluate productive efficiency. DEA constructs an efficient frontier from the production possibility set and evaluates productive efficiency by optimizing the weights, which serves as the performance measure.
    However, DEA focuses on ex-post performance evaluation and offers little help for ex-ante planning. To support future decisions with accumulated experience, we apply RL to establish a policy for future improvement. During learning, the RL agent derives the optimal policy by observing the interactions between the agent and the environment, so reinforcement learning and productivity analysis can be regarded as complementary.
    In this study, besides ex-post efficiency analysis, we also emphasize ex-ante policy planning. In the empirical study, we collect two datasets, measure their productivity, and examine the effect of applying RL. RL proposes improvement policies for resource utilization based on historical experience; in our experiments, the results with RL are better than those without RL in both the mean and the variance of productivity, so we conclude that RL can improve the optimal resource policy.

    This study proposes a hybrid framework that combines Data Envelopment Analysis (DEA) with Reinforcement Learning (RL). By acquiring the optimal resource-reallocation policy, RL provides a guiding principle for productivity improvement.
    For the responsible decision-maker, it is critical to understand performance relative to other benchmarks. However, when multiple dimensions of inputs and outputs are considered, how to weight the variables becomes an issue. To overcome this issue, this study uses DEA, a nonparametric optimization approach, to evaluate productive performance. DEA forms a frontier over the production possibility set as a reference and evaluates performance by optimizing a set of weights for each decision-making unit; here, performance refers to productive efficiency and effectiveness. However, DEA only provides an ex-post analysis and by itself lacks the ability to improve decisions in advance. To establish an improvement policy based on historical experience, an adequate guiding principle for future improvement is needed, and this is where RL comes in. The RL agent learns an optimal policy by observing the interactions between the agent and the environment. The RL technique therefore complements productivity analysis.
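    To make the DEA step concrete, the following is a minimal sketch of the standard input-oriented CCR (constant-returns-to-scale) envelopment model solved as a linear program; the toy data, the function name ccr_efficiency, and the use of SciPy are illustrative assumptions, not the exact formulation or implementation used in the thesis.

```python
# Minimal sketch: input-oriented CCR DEA envelopment model as a linear program.
# Illustrative only; the data and solver choice are assumptions, not the thesis' code.
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Efficiency score of DMU `o`, given inputs X (n_dmu x m) and outputs Y (n_dmu x s)."""
    n, m = X.shape
    _, s = Y.shape
    # Decision variables: [theta, lambda_1, ..., lambda_n]; minimize theta.
    c = np.zeros(n + 1)
    c[0] = 1.0
    A_ub, b_ub = [], []
    for i in range(m):   # sum_j lambda_j * x_ij <= theta * x_io  (input constraints)
        A_ub.append(np.concatenate(([-X[o, i]], X[:, i])))
        b_ub.append(0.0)
    for r in range(s):   # sum_j lambda_j * y_rj >= y_ro  (output constraints)
        A_ub.append(np.concatenate(([0.0], -Y[:, r])))
        b_ub.append(-Y[o, r])
    bounds = [(0, None)] * (n + 1)   # theta and all lambdas non-negative
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return res.x[0]      # theta*: 1 places the DMU on the frontier

# Toy example: 5 DMUs, 2 inputs, 1 output.
X = np.array([[2., 3.], [4., 1.], [3., 3.], [5., 2.], [4., 4.]])
Y = np.ones((5, 1))
print([round(ccr_efficiency(X, Y, o), 3) for o in range(len(X))])
```

    A score of 1 indicates a unit on the efficient frontier, while a score below 1 measures the proportional input contraction the frontier suggests is feasible.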
    In this study, we emphasize planning over evaluation. We collect empirical datasets to evaluate productivity and to validate the influence of RL. Based on the results, the policies learned by RL achieve better performance in both the mean and the variance of productivity; hence, we find that RL can enhance the optimal resource policy.
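    The table of contents below lists QR-DQN with CVaR as the RL method. As a rough illustration of how a Conditional Value-at-Risk criterion changes behaviour relative to a risk-neutral one, the sketch below scores each action by the CVaR of its predicted return quantiles instead of their mean; the quantile values, the alpha level, and the helper names are hypothetical placeholders rather than the thesis' trained model or code.

```python
# Minimal sketch: risk-averse action selection from quantile-based return estimates
# (QR-DQN style). Illustrative placeholders only, not the thesis' implementation.
import numpy as np

def cvar_from_quantiles(quantiles, alpha):
    """Approximate CVaR_alpha as the mean of the lowest alpha-fraction of return quantiles."""
    q = np.sort(quantiles)                      # ascending return quantiles
    k = max(1, int(np.ceil(alpha * len(q))))    # number of lower-tail quantiles to average
    return q[:k].mean()

def select_action(quantiles_per_action, alpha=0.25):
    """Pick the action whose predicted return distribution has the highest CVaR_alpha."""
    scores = [cvar_from_quantiles(q, alpha) for q in quantiles_per_action]
    return int(np.argmax(scores))

# Toy example: two actions, each described by 8 return quantiles.
# Action 0 has the higher mean but a heavier downside tail.
q_a0 = np.array([-4.0, -2.0, 1.0, 3.0, 5.0, 6.0, 7.0, 8.0])
q_a1 = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
print("risk-neutral choice:", int(np.argmax([q_a0.mean(), q_a1.mean()])))   # action 0
print("CVaR_0.25 choice:   ", select_action([q_a0, q_a1], alpha=0.25))      # action 1
```

    The risk-neutral rule prefers the action with the higher expected return, while the CVaR rule prefers the action with the better lower tail, which is consistent with the abstract's emphasis on improving the variance of productivity as well as its mean.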

    Table of Contents:
    Chinese Abstract II; Abstract III; Acknowledgements IV; Table of Contents V; List of Tables VII; List of Figures VIII; Terminology and Notation XI
    Chapter 1. Introduction 1
        1.1 Background and Motivation 1
        1.2 Problem Description and Research Overview 2
    Chapter 2. Literature Review 4
        2.1 Data Envelopment Analysis (DEA) 4
        2.2 Reinforcement Learning (RL) 6
        2.3 Summary and Discussion 9
    Chapter 3. Operational Model with Reinforcement Learning 10
        3.1 Research Framework 10
        3.2 Problem Definition 12
        3.3 Data Collection 13
            3.3.1 Case 1: Labour Productivity 13
            3.3.2 Case 2: Power Generation 14
        3.4 Data Envelopment Analysis (DEA) 15
            3.4.1 Operational Efficiency Model 15
            3.4.2 Productivity Index 17
        3.5 Reinforcement Learning (RL) 18
            3.5.1 Element Definitions 18
            3.5.2 QR-DQN with CVaR 21
        3.6 Empirical Study 24
            3.6.1 Case 1: Labour Productivity 25
            3.6.2 Case 2: Power Generation 35
        3.7 Summary and Discussion 45
    Chapter 4. Environmental Model with Reinforcement Learning 46
        4.1 Problem Definition 46
        4.2 Data Collection 47
        4.3 Data Envelopment Analysis (DEA) 48
            4.3.1 Environmental Efficiency Model 49
            4.3.2 Environmental Effectiveness Model 51
            4.3.3 Productivity Index 54
        4.4 Reinforcement Learning (RL) 55
            4.4.1 Element Definitions 56
            4.4.2 Mixed Strategy 58
        4.5 Empirical Study 59
            4.5.1 Case 1: Labour Productivity 59
            4.5.2 Case 2: Power Generation 66
        4.6 Summary and Discussion 82
    Chapter 5. Conclusion and Future Research 83
    References 85


    Full-text availability: on campus 2024-08-01; off campus 2024-08-01