A Systolic Array Optimizing Compiler

Springer Science & Business Media, 31/01/1989 - 202 pages
This book is a revision of my Ph.D. dissertation, submitted to Carnegie Mellon University in 1987. It documents the research and results of the compiler technology developed for the Warp machine. Warp is a systolic array built out of custom, high-performance processors, each of which can execute up to 10 million floating-point operations per second (10 MFLOPS). Under the direction of H. T. Kung, the Warp machine matured from an academic, experimental prototype into a commercial product of General Electric. The Warp machine demonstrated that the scalable architecture of high-performance, programmable systolic arrays represents a practical, cost-effective solution to present and future computation-intensive applications. The success of Warp led to the follow-on iWarp project, a joint project with Intel to develop a single-chip 20 MFLOPS processor. The availability of the highly integrated iWarp processor will have a significant impact on parallel computing. One of the major challenges in the development of Warp was to build an optimizing compiler for the machine. First, because the processors in the array cooperate at a fine granularity of parallelism, interaction between processors must be considered in the generation of code for individual processors. Second, the individual processors themselves derive their performance from a VLIW (Very Long Instruction Word) instruction set and a high degree of internal pipelining and parallelism. The compiler contains optimizations pertaining to the array level of parallelism, as well as optimizations for the individual VLIW processors.
 

Contents

Introduction  1
1.1 Research approach  4
1.2 Overview of results  5
1.2.2 Cell level optimizations  6
1.3 This presentation  8
Architecture of Warp  11
2.1 The architecture  13
2.1.1 Warp cell  14
2.1.2 Interface unit  18
2.2 Application domain of Warp  19
2.3 Programming complexity  21
A Machine Abstraction  25
3.1 Previous systolic array synthesis techniques  26
3.2 Comparisons of machine abstractions  29
3.2.1 Programmability  30
3.2.1.1 Partitioning methods  31
3.2.1.2 Programmability of synchronous models  32
3.2.2 Efficiency  35
asynchronous communication  37
3.3.1 Effect of parallelism in cells  39
3.3.2 Scheduling constraints between receives and sends  43
3.3.2.1 The problem  44
3.3.2.2 The analysis  45
3.3.2.3 Implications  48
3.3.3 Discussion on the asynchronous communication model  50
3.4 Hardware and software support  51
queues  52
3.4.2 Compile-time flow control  53
3.4.2.1 The skewed computation model  54
3.4.2.2 Algorithm to find the minimum skew  58
3.4.2.3 Hardware design  64
3.5 Chapter summary  66
The W2 Language and Compiler  69
4.1 The W2 language  70
4.2 Compiler overview  73
4.3 Scheduling a basic block  77
4.3.2 List scheduling  79
4.3.3 Ordering and priority function  80
Software Pipelining  83
5.1 Introduction to software pipelining  87
5.2 The scheduling problem  91
5.2.2 Definition and complexity of problem  95
5.3 Scheduling algorithm  96
5.3.1 Bounds on the initiation interval  98
5.3.2 Scheduling an acyclic graph  100
5.3.3 Scheduling a cyclic graph  103
5.3.3.1 Combining strongly connected components  104
5.3.3.2 Scheduling a strongly connected component  106
5.3.3.3 Complete algorithm  113
5.4 Modulo variable expansion  114
5.5 Code size requirement  117
5.6 Comparison with previous work  120
5.6.2 The polycyclic machine  121
5.7 Chapter summary  123
Hierarchical Reduction  125
6.1 The iterative construct  128
6.2 The conditional construct  130
6.2.1 Branches taking different amounts of time  133
6.2.2 Code size  134
6.3 Global code motions  135
6.4 Comparison with previous work  139
6.4.1 Trace scheduling  140
6.4.1.1 Loop branches  141
6.4.1.2 Conditionals  143
6.4.2 Comparison with vector instructions  144
Evaluation  147
7.1 The experiment  148
7.1.1 Status of compiler  149
7.1.2 The programs  151
7.2 Performance analysis of global scheduling techniques  158
7.2.2 Efficiency of scheduler  161
7.2.2.1 Exclusive I/O time  163
7.2.2.2 Global resource use count  165
7.2.2.3 Data dependency  168
7.2.3 Discussion on effectiveness of the Warp architecture  169
7.3 Performance of software pipelining  170
7.3.2 Effectiveness of software pipelining  171
7.3.3 Feasibility of software pipelining  176
7.4 Livermore Loops  180
7.5 Summary and discussion  183
Conclusions  187
8.1 Machine abstraction for systolic arrays  188
8.2 Code scheduling techniques  190
References  193
Index
Copyright

