A Systolic Array Optimizing Compiler
Springer Science & Business Media, 31/01/1989 - 202 pages

This book is a revision of my Ph.D. dissertation submitted to Carnegie Mellon University in 1987. It documents the research and results of the compiler technology developed for the Warp machine. Warp is a systolic array built out of custom, high-performance processors, each of which can execute up to 10 million floating-point operations per second (10 MFLOPS). Under the direction of H. T. Kung, the Warp machine matured from an academic, experimental prototype to a commercial product of General Electric. The Warp machine demonstrated that the scalable architecture of high-performance, programmable systolic arrays represents a practical, cost-effective solution to present and future computation-intensive applications. The success of Warp led to the follow-on iWarp project, a joint project with Intel, to develop a single-chip 20 MFLOPS processor. The availability of the highly integrated iWarp processor will have a significant impact on parallel computing.

One of the major challenges in the development of Warp was to build an optimizing compiler for the machine. First, the processors in the array cooperate at a fine granularity of parallelism; interaction between processors must be considered in the generation of code for individual processors. Second, the individual processors themselves derive their performance from a VLIW (Very Long Instruction Word) instruction set and a high degree of internal pipelining and parallelism. The compiler contains optimizations pertaining to the array level of parallelism, as well as optimizations for the individual VLIW processors.
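The central cell-level technique in the book is software pipelining: overlapping successive loop iterations so that a new iteration starts every "initiation interval" cycles instead of waiting for the previous one to drain. The following is a toy sketch of the cycle-count arithmetic only, not code from the book; the three-stage loop body and the functions are hypothetical illustrations.

```python
# Toy illustration of why software pipelining pays off (not from the book).
# Assume a hypothetical 3-stage loop body: load, multiply, store.

STAGES = 3  # depth of the loop body in cycles (hypothetical)

def sequential_cycles(n_iters, stages=STAGES):
    """Each iteration runs to completion before the next one starts."""
    return n_iters * stages

def pipelined_cycles(n_iters, ii=1, stages=STAGES):
    """A new iteration is initiated every `ii` cycles; the final
    iteration still needs `stages` cycles to drain the pipeline."""
    return (n_iters - 1) * ii + stages

if __name__ == "__main__":
    n = 100
    print(sequential_cycles(n))   # 300 cycles without overlap
    print(pipelined_cycles(n))    # 102 cycles with initiation interval 1
```

With an initiation interval of 1, the steady state sustains one result per cycle; the chapter 5 material (bounds on the initiation interval, scheduling cyclic graphs, modulo variable expansion) is about achieving the smallest feasible interval under resource and dependence constraints.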
Contents
Introduction | 1 |
1.1 Research approach | 4 |
1.2 Overview of results | 5 |
1.2.2 Cell level optimizations | 6 |
1.3 This presentation | 8 |
Architecture of Warp | 11 |
2.1 The architecture | 13 |
2.1.1 Warp cell | 14 |
2.1.2 Interface unit | 18 |
2.2 Application domain of Warp | 19 |
2.3 Programming complexity | 21 |
A Machine Abstraction | 25 |
3.1 Previous systolic array synthesis techniques | 26 |
3.2 Comparisons of machine abstractions | 29 |
3.2.1 Programmability | 30 |
3.2.1.1 Partitioning methods | 31 |
3.2.1.2 Programmability of synchronous models | 32 |
3.2.2 Efficiency | 35 |
asynchronous communication | 37 |
3.3.1 Effect of parallelism in cells | 39 |
3.3.2 Scheduling constraints between receives and sends | 43 |
3.3.2.1 The problem | 44 |
3.3.2.2 The analysis | 45 |
3.3.2.3 Implications | 48 |
3.3.3 Discussion on the asynchronous communication model | 50 |
3.4 Hardware and software support | 51 |
queues | 52 |
3.4.2 Compile-time flow control | 53 |
3.4.2.1 The skewed computation model | 54 |
3.4.2.2 Algorithm to find the minimum skew | 58 |
3.4.2.3 Hardware design | 64 |
3.5 Chapter summary | 66 |
The W2 Language and Compiler | 69 |
4.1 The W2 language | 70 |
4.2 Compiler overview | 73 |
4.3 Scheduling a basic block | 77 |
4.3.2 List scheduling | 79 |
4.3.3 Ordering and priority function | 80 |
Software Pipelining | 83 |
5.1 Introduction to software pipelining | 87 |
5.2 The scheduling problem | 91 |
5.2.2 Definition and complexity of problem | 95 |
5.3 Scheduling algorithm | 96 |
5.3.1 Bounds on the initiation interval | 98 |
5.3.2 Scheduling an acyclic graph | 100 |
5.3.3 Scheduling a cyclic graph | 103 |
5.3.3.1 Combining strongly connected components | 104 |
5.3.3.2 Scheduling a strongly connected component | 106 |
5.3.3.3 Complete algorithm | 113 |
5.4 Modulo variable expansion | 114 |
5.5 Code size requirement | 117 |
5.6 Comparison with previous work | 120 |
5.6.2 The polycyclic machine | 121 |
5.7 Chapter summary | 123 |
Hierarchical Reduction | 125 |
6.1 The iterative construct | 128 |
6.2 The conditional construct | 130 |
6.2.1 Branches taking different amounts of time | 133 |
6.2.2 Code size | 134 |
6.3 Global code motions | 135 |
6.4 Comparison with previous work | 139 |
6.4.1 Trace scheduling | 140 |
6.4.1.1 Loop branches | 141 |
6.4.1.2 Conditionals | 143 |
6.4.2 Comparison with vector instructions | 144 |
Evaluation | 147 |
7.1 The experiment | 148 |
7.1.1 Status of compiler | 149 |
7.1.2 The programs | 151 |
7.2 Performance analysis of global scheduling techniques | 158 |
7.2.2 Efficiency of scheduler | 161 |
7.2.2.1 Exclusive IO time | 163 |
7.2.2.2 Global resource use count | 165 |
7.2.2.3 Data dependency | 168 |
7.2.3 Discussion on effectiveness of the Warp architecture | 169 |
7.3 Performance of software pipelining | 170 |
7.3.2 Effectiveness of software pipelining | 171 |
7.3.3 Feasibility of software pipelining | 176 |
7.4 Livermore Loops | 180 |
7.5 Summary and discussion | 183 |
Conclusions | 187 |
8.1 Machine abstraction for systolic arrays | 188 |
8.2 Code scheduling techniques | 190 |
Popular passages
Page 194 - Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 62-73, 1992.
References to this book
iWarp: Anatomy of a Parallel Computing System — Thomas Gross, David Richard O'Hallaron, 1998