
Author: Wu, Jia-Hwa (吳家驊)
Title: Improvement of Data Dependence Testing in Parallel Compilers
Advisor: Chu, Chih-Ping (朱治平)
Degree: Doctoral
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2007
Academic year of graduation: 95 (2006-2007)
Language: English
Pages: 99
Keywords: data dependence test, loop parallelization, parallelizing compilers

    For the past several decades, parallel processing has been an important research subject in computer science. Statistics show that, when executing a program, most of the time is spent in loops. If a parallelizing compiler restructures the loops of a conventional sequential program so that its execution can exploit the characteristics of a vector or parallel machine, execution efficiency is greatly improved. In a parallelizing compiler, data dependence analysis is essential because it provides the information needed for loop restructuring. The first task of a restructuring compiler is to perform data dependence analysis and detect the segments of code that can be executed concurrently. Essentially this means analyzing the memory access pattern, mainly by inspecting the array subscript expressions. The next task of a parallelizing compiler is parallelism exploitation, or optimization: using the data and control dependence information previously gathered to schedule the work on the target architecture.
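    To make the notion of a data dependence concrete, the following sketch (illustrative only, not an algorithm from this thesis; `has_dependence` is a hypothetical helper) decides by brute force whether a written reference A[f(i)] and a read reference A[g(j)] can ever touch the same element within the loop bounds. This is the question that the subscript-based tests discussed below answer symbolically.

```python
def has_dependence(f, g, lower, upper):
    """Brute-force check: do any two iterations i, j in [lower, upper]
    access the same array element, i.e. f(i) == g(j)?"""
    for i in range(lower, upper + 1):
        for j in range(lower, upper + 1):
            if f(i) == g(j):
                return True
    return False

# A[2*i] is written and A[2*i + 1] is read: the subscripts always differ
# in parity, so the iterations are independent and may run in parallel.
print(has_dependence(lambda i: 2 * i, lambda j: 2 * j + 1, 1, 100))  # False
# A[i] is written and A[i - 1] is read: a loop-carried dependence exists.
print(has_dependence(lambda i: i, lambda j: j - 1, 1, 100))  # True
```

A real compiler cannot afford this enumeration, which is why the symbolic tests below operate directly on the subscript expressions and loop bounds.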

    Data dependence analysis is necessary to determine whether a loop can be vectorized or parallelized. It determines whether the same array element or variable is accessed more than once in a loop (i.e., whether the same memory location is accessed more than once during loop execution). In recent years there has been considerable research on parallelizing compilers, but data dependence analysis remains a bottleneck. Many data dependence tests, such as the Banerjee test, the Lambda test, the Omega test, the I test, the Power test, and the IR test, have been used in the design of parallelizing compilers. In this thesis, the Quadratic test is applied to determine whether integer solutions exist for one-dimensional array references with constant bounds and quadratic subscripts, improving on the applicable range of the Range test. Experiments with benchmarks cited from EISPACK, LINPACK, Parallel Loops, Livermore Loops, and Vector Loops showed that, among 303 pairs of one-dimensional array references with constant bounds and quadratic subscripts, the Quadratic test accurately found all integer solutions, whereas the Range test was not able to identify integer solutions for the same problems. The Quadratic test is therefore superior to the Range test in terms of analysis accuracy.
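    The question the Quadratic test decides can be stated as a small search problem. The sketch below (an illustration under assumed names, not the thesis's symbolic algorithm) enumerates the integer solutions of a quadratic dependence equation within constant loop bounds:

```python
def quadratic_dependence(a2, a1, a0, b2, b1, b0, lo, hi):
    """Exhaustively test whether the dependence equation
       a2*i^2 + a1*i + a0 = b2*j^2 + b1*j + b0
    has an integer solution with lo <= i, j <= hi.
    The Quadratic test answers this symbolically; this brute force
    only illustrates the question being asked."""
    for i in range(lo, hi + 1):
        lhs = a2 * i * i + a1 * i + a0
        for j in range(lo, hi + 1):
            if lhs == b2 * j * j + b1 * j + b0:
                return (i, j)  # a dependence witness
    return None  # no integer solution: the references are independent

# A[i*i] written and A[j + 2] read, for 1 <= i, j <= 10:
# i^2 = j + 2 has the solution i = 2, j = 2, so a dependence exists.
print(quadratic_dependence(1, 0, 0, 0, 1, 2, 1, 10))  # (2, 2)
```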

    Next, the multi-dimensional IR (Interval Reduction) test is applied to determine whether integer solutions exist for multi-dimensional array references with linear subscripts and constant bounds, improving on the testing precision of the Lambda test. Experiments with benchmarks cited from SPEC77 covered 276 pairs of multi-dimensional array references; of these, 220, 7, and 49 pairs were observed to have constant bounds, variable bounds, and symbolic bounds, respectively. The significance of the multi-dimensional IR test is that it enhances testing precision, eliminates possible false dependences, and increases the degree of loop parallelization and vectorization.
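    Interval-based tests of this family build on a simple bound computation: the real-valued range that a linear subscript expression can take over the loop bounds. The sketch below shows that Banerjee-style range (a minimal illustration with assumed names; the IR test then tightens such intervals to rule out false dependences):

```python
def expr_range(coeffs, bounds):
    """Real-valued range of sum(a_k * x_k) given L_k <= x_k <= U_k.
    A positive coefficient is minimized at L_k and maximized at U_k;
    a negative coefficient is the other way around."""
    lo = sum(a * (L if a >= 0 else U) for a, (L, U) in zip(coeffs, bounds))
    hi = sum(a * (U if a >= 0 else L) for a, (L, U) in zip(coeffs, bounds))
    return lo, hi

# 2*x1 - 3*x2 with 1 <= x1, x2 <= 10 ranges over [-28, 17].
print(expr_range([2, -3], [(1, 10), (1, 10)]))  # (-28, 17)
# If the constant term of the dependence equation lies outside this
# interval, no real (hence no integer) solution exists, so the two
# references are independent.
```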

    The multi-dimensional I test is applied to determine whether integer solutions exist for multi-dimensional array references with linear subscripts and constant bounds. In experiments with benchmarks cited from SPEC77, covering 276 pairs of multi-dimensional array references, the Power test took 5.3 to 14.8 times longer to execute than the multi-dimensional I test, and the Omega test took 6.1 to 19.2 times longer, when testing the dependence of multi-dimensional arrays.
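    The I test and its relatives refine the classic GCD test, which supplies a cheap necessary condition for a linear dependence equation to have any integer solution at all. A minimal sketch of that first step (standard material, not this thesis's contribution; it assumes at least one nonzero coefficient):

```python
from functools import reduce
from math import gcd

def gcd_test(coeffs, constant):
    """Classic GCD test: a1*x1 + ... + an*xn = c can have an integer
    solution only if gcd(a1, ..., an) divides c.  This is necessary but
    not sufficient; tests like the I test add the loop-bound checks."""
    g = reduce(gcd, (abs(a) for a in coeffs))  # assumes some a != 0
    return constant % g == 0

# 2*x1 + 4*x2 = 3: gcd(2, 4) = 2 does not divide 3 -> independent.
print(gcd_test([2, 4], 3))  # False
# 2*x1 + 4*x2 = 6: integer solutions may exist, so test further.
print(gcd_test([2, 4], 6))  # True
```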

    Finally, the multi-dimensional generalized interval test is applied to determine whether integer solutions exist for multi-dimensional array references with linear subscripts, variable bounds, and any given direction vectors. Experiments with benchmarks cited from EISPACK and Parallel Loops covered 515 pairs of array references with linear subscripts and variable bounds; 6 of them were found to have definitive solutions by the proposed method but not by the generalized Lambda test. The improvement rate of the multi-dimensional generalized interval test over the generalized Lambda test was therefore about 1.2%.
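    A direction vector constrains which iteration pairs are considered when testing for a dependence. The brute-force sketch below (illustrative only; `dependence_with_direction` is a hypothetical helper, not the generalized interval test itself) shows what the constraint means for a one-dimensional reference pair:

```python
def dependence_with_direction(f, g, lo, hi, direction):
    """Brute-force dependence check under a direction constraint:
    '<' restricts to i < j, '=' to i == j, '>' to i > j."""
    cmp = {'<': lambda i, j: i < j,
           '=': lambda i, j: i == j,
           '>': lambda i, j: i > j}[direction]
    return any(f(i) == g(j)
               for i in range(lo, hi + 1)
               for j in range(lo, hi + 1)
               if cmp(i, j))

# A[i] written, A[j - 1] read: i = j - 1 forces i < j, so the dependence
# exists only under the '<' direction (a loop-carried flow dependence).
print(dependence_with_direction(lambda i: i, lambda j: j - 1, 1, 10, '<'))  # True
print(dependence_with_direction(lambda i: i, lambda j: j - 1, 1, 10, '>'))  # False
```

Testing each direction separately, as the generalized tests do symbolically, tells the compiler not just whether a dependence exists but how it is carried across iterations.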

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 High Performance Computing through Parallel Processing
      1.2 Automatic Parallelization
      1.3 The Need for Parallelism Detection and Parallelism Exploitation
      1.4 Organization of Dissertation
    Chapter 2 Background
      2.1 Data Dependence
      2.2 Data Dependence Equations
        2.2.1 One Dimensional Arrays
        2.2.2 One Dimensional Arrays with Direction Vectors
        2.2.3 Multi Dimensional Arrays with Variable Bounds
        2.2.4 Multi Dimensional Arrays with Symbolic Bounds
      2.3 Data Dependence Testing
        2.3.1 The GCD and Banerjee Tests
        2.3.2 The I Test
        2.3.3 The Direction Vector I Test
        2.3.4 The Lambda Test
        2.3.5 Banerjee's Infinity Test
        2.3.6 The Power Test
        2.3.7 The Omega Test
        2.3.8 The IR Test
    Chapter 3 The Quadratic Test
      3.1 Problem Definition
      3.2 Range Propagation Techniques
      3.3 The Quadratic Test
      3.4 Examples
      3.5 Experimental Results
    Chapter 4 A Multi-dimensional Interval Reduction Test
      4.1 The Multi-dimensional IR (Interval Reduction) Test
        4.1.1 Find General Solutions of Dependence Equations
        4.1.2 Find Exact Results for Dependence Equations with Loop Limits
      4.2 Time Complexity
      4.3 Comparison of Our Proposed Method and Interval Reduction Test
      4.4 Experimental Results
    Chapter 5 A Multi-dimensional I Test
      5.1 The Multi-dimensional I Test
        5.1.1 The Case of Two Dimensional Array References
        5.1.2 The Case of Multi-dimensional Array References
      5.2 Time Complexity
      5.3 Experimental Results
    Chapter 6 A Multi-dimensional Generalized I Test
      6.1 The Multi-dimensional Generalized Interval Test
        6.1.1 The Case of Two Dimensional Array References
        6.1.2 The Case of Multi-dimensional Array References
      6.2 The Algorithm
      6.3 Time Complexity
      6.4 Experimental Results
    Chapter 7 Conclusions
      7.1 Conclusions
    Bibliographies
    Vita
    Publication List


    Full text available on campus and off campus from 2008-07-25.