
Graduate Student: 楊婷芸 (YANG, TING-YUN)
Thesis Title: Reinforcement Learning for Semiconductor Ion Implantation Scheduling Problem (強化學習應用於半導體離子植入排程問題)
Advisors: 楊大和 (Yang, Ta-Ho); 王宏鍇 (Wang, Hung-Kai)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Manufacturing Information and Systems
Year of Publication: 2025
Academic Year of Graduation: 113 (ROC calendar)
Language: English
Number of Pages: 70
Keywords (Chinese): 強化學習、半導體排程、永續製造、無效行動遮罩、雙 Q-learning、離子佈植
Keywords (English): Reinforcement Learning, Semiconductor Scheduling, Sustainable Manufacturing, Ion Implantation, Invalid Action Masking, Double Q-learning

Abstract:
    As semiconductor manufacturing continues to advance, the progressive shrinking of critical dimensions (CD) has made process stages increasingly complex, imposing ever stricter demands on scheduling efficiency and resource utilization. At the same time, environmental sustainability has become a core concern that the manufacturing industry can no longer overlook: beyond improving production efficiency, manufacturers are now expected to conserve resources and control carbon emissions. This study focuses on the unrelated parallel-machine scheduling problem in the ion implantation stage of semiconductor fabrication. It jointly considers multiple constraints, including setup times, machine availability, and job priorities, and proposes an intelligent scheduling framework based on reinforcement learning.
    The proposed DIQRL model combines Double Q-learning with an Invalid Action Masking (IAM) mechanism, using a composite reward function that jointly minimizes makespan, setup count, and scrap quantity to improve scheduling performance and production flexibility. To evaluate stability and efficiency, this research compares DIQRL with the Shortest Processing Time (SPT) dispatching rule, a Genetic Algorithm (GA), Particle Swarm Optimization (PSO), standard Q-learning (QRL), and IQRL. The results show that DIQRL outperforms all benchmark methods on the key performance indicators, achieving shorter completion times, fewer setup operations, and lower scrap quantities while providing stable and adaptive scheduling decisions.
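    The abstract names the main ingredients of DIQRL without showing how they interact. The following minimal Python sketch illustrates the general pattern of tabular Double Q-learning with invalid action masking and a composite reward; the function names, reward weights, hyperparameters, and state/action handling are illustrative assumptions for exposition, not the implementation described in the thesis.

    import random
    from collections import defaultdict

    # Hedged sketch of DIQRL-style learning: tabular Double Q-learning with
    # invalid action masking (IAM) and a composite reward. All names, weights,
    # and the validity predicate are assumptions, not the thesis's code.

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1          # assumed hyperparameters
    W_MAKESPAN, W_SETUP, W_SCRAP = 1.0, 0.5, 0.5    # assumed reward weights

    Q_A = defaultdict(float)   # first action-value table
    Q_B = defaultdict(float)   # second action-value table

    def composite_reward(d_makespan, d_setups, d_scrap):
        # Negative weighted sum: the agent earns more reward the smaller the
        # makespan increment, extra setups, and scrap caused by a dispatch.
        return -(W_MAKESPAN * d_makespan + W_SETUP * d_setups + W_SCRAP * d_scrap)

    def masked_actions(state, all_actions, is_valid):
        # IAM: filter out dispatches that violate constraints (machine down,
        # job already assigned, ...), so invalid actions are never explored.
        return [a for a in all_actions if is_valid(state, a)]

    def choose_action(state, valid_actions):
        # Epsilon-greedy over the masked action set, acting on Q_A + Q_B.
        if random.random() < EPSILON:
            return random.choice(valid_actions)
        return max(valid_actions, key=lambda a: Q_A[(state, a)] + Q_B[(state, a)])

    def update(state, action, reward, next_state, next_valid_actions):
        # Double Q-learning: one table picks the greedy next action, the other
        # evaluates it, reducing the overestimation bias of plain Q-learning.
        if random.random() < 0.5:
            select, evaluate = Q_A, Q_B
        else:
            select, evaluate = Q_B, Q_A
        if next_valid_actions:  # non-terminal: jobs remain to be dispatched
            a_star = max(next_valid_actions, key=lambda a: select[(next_state, a)])
            target = reward + GAMMA * evaluate[(next_state, a_star)]
        else:                   # terminal: every job has been dispatched
            target = reward
        select[(state, action)] += ALPHA * (target - select[(state, action)])

    In an episode loop one would repeatedly mask the action set, pick a dispatch, compute the composite reward from the resulting schedule deltas, and call update; masking keeps the effective action space small, which is what makes tabular learning tractable in a constrained scheduling environment.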
    This study demonstrates that reinforcement learning can be applied effectively to the unrelated parallel-machine scheduling problem in semiconductor manufacturing and can incorporate sustainability objectives. Future extensions could take carbon emissions, energy costs, and environmental regulations into account to develop an intelligent scheduling system that balances economic efficiency with environmental responsibility, thereby advancing green manufacturing and smart-factory initiatives.

    Table of Contents
        Chinese Abstract
        Abstract
        List of Tables
        List of Figures
        Chapter 1. Introduction
            1.1 Background and Motivation
            1.2 Research Purpose
            1.3 Research Overview
        Chapter 2. Literature Review
            2.1 Semiconductor Manufacturing Scheduling Problem
            2.2 Parallel Machine Scheduling
                2.2.1 Classification and Characteristics of Parallel Machines
            2.3 Reinforcement Learning
            2.4 Reinforcement Learning for Scheduling
        Chapter 3. Research Method
            3.1 Problem Description
                3.1.1 Characteristics of the Ion Implantation Process
                3.1.2 Problem Constraints and Assumptions
            3.2 Performance Indicators of Scheduling System
            3.3 Methodology
                3.3.1 Reinforcement Learning
                3.3.2 Q-learning and Motivation for Double Q-learning
                3.3.3 Double Q-learning Framework
                3.3.4 Training Mechanism of Double Q-learning
                3.3.5 Composite Reward Design
        Chapter 4. Experiment
            4.1 Data Collection
            4.2 Experimental Design and Parameter Settings
                4.2.1 Dispatching Rule
                4.2.2 Particle Swarm Optimization
                4.2.3 Genetic Algorithm
                4.2.4 Reinforcement Learning
            4.3 Experimental Results
        Chapter 5. Conclusion and Future Research
            5.1 Conclusion
            5.2 Future Research
        References


    Full-text access: available immediately on campus and off campus.