Algorithms and Parallel ComputingISBN: 9780470902103
364 pages
April 2011

List of Acronyms.
1 Introduction.
1.1 Introduction.
1.2 Toward Automating Parallel Programming.
1.3 Algorithms.
1.4 Parallel Computing Design Considerations.
1.5 Parallel Algorithms and Parallel Architectures.
1.6 Relating Parallel Algorithm and Parallel Architecture.
1.7 Implementation of Algorithms: A TwoSided Problem.
1.8 Measuring Benefi ts of Parallel Computing.
1.9 Amdahl’s Law for Multiprocessor Systems.
1.10 Gustafson–Barsis's Law.
1.11 Applications of Parallel Computing.
2 Enhancing Uniprocessor Performance.
2.1 Introduction.
2.2 Increasing Processor Clock Frequency.
2.3 Parallelizing ALU Structure.
2.4 Using Memory Hierarchy.
2.5 Pipelining.
2.6 Very Long Instruction Word (VLIW) Processors.
2.7 InstructionLevel Parallelism (ILP) and Superscalar Processors.
2.8 Multithreaded Processor.
3 Parallel Computers.
3.1 Introduction.
3.2 Parallel Computing.
3.3 SharedMemory Multiprocessors (Uniform Memory Access [UMA]).
3.4 DistributedMemory Multiprocessor (Nonuniform Memory Access [NUMA]).
3.5 SIMD Processors.
3.6 Systolic Processors.
3.7 Cluster Computing.
3.8 Grid (Cloud) Computing.
3.9 Multicore Systems.
3.10 SM.
3.11 Communication Between Parallel Processors.
3.12 Summary of Parallel Architectures.
4 SharedMemory Multiprocessors.
4.1 Introduction.
4.2 Cache Coherence and Memory Consistency.
4.3 Synchronization and Mutual Exclusion.
5 Interconnection Networks.
5.1 Introduction.
5.2 Classification of Interconnection Networks by Logical Topologies.
5.3 Interconnection Network Switch Architecture.
6 Concurrency Platforms.
6.1 Introduction.
6.2 Concurrency Platforms.
6.3 Cilk++.
6.4 OpenMP.
6.5 Compute Unifi ed Device Architecture (CUDA).
7 Ad Hoc Techniques for Parallel Algorithms.
7.1 Introduction.
7.2 Defining Algorithm Variables.
7.3 Independent Loop Scheduling.
7.4 Dependent Loops.
7.5 Loop Spreading for Simple Dependent Loops.
7.6 Loop Unrolling.
7.7 Problem Partitioning.
7.8 DivideandConquer (Recursive Partitioning) Strategies.
7.9 Pipelining.
8 Nonserial–Parallel Algorithms.
8.1 Introduction.
8.2 Comparing DAG and DCG Algorithms.
8.3 Parallelizing NSPA Algorithms Represented by a DAG.
8.4 Formal Technique for Analyzing NSPAs.
8.5 Detecting Cycles in the Algorithm.
8.6 Extracting Serial and Parallel Algorithm Performance Parameters.
8.7 Useful Theorems.
8.8 Performance of Serial and Parallel Algorithms on Parallel Computers.
9 zTransform Analysis.
9.1 Introduction.
9.2 Definition of zTransform.
9.3 The 1D FIR Digital Filter Algorithm.
9.4 Software and Hardware Implementations of the zTransform.
9.5 Design 1: Using Horner’s Rule for Broadcast Input and Pipelined Output.
9.6 Design 2: Pipelined Input and Broadcast Output.
9.7 Design 3: Pipelined Input and Output.
10 Dependence Graph Analysis.
10.1 Introduction.
10.2 The 1D FIR Digital Filter Algorithm.
10.3 The Dependence Graph of an Algorithm.
10.4 Deriving the Dependence Graph for an Algorithm.
10.5 The Scheduling Function for the 1D FIR Filter.
10.6 Node Projection Operation.
10.7 Nonlinear Projection Operation.
10.8 Software and Hardware Implementations of the DAG Technique.
11 Computational Geometry Analysis.
11.1 Introduction.
11.2 Matrix Multiplication Algorithm.
11.3 The 3D Dependence Graph and Computation Domain D.
11.4 The Facets and Vertices of D.
11.5 The Dependence Matrices of the Algorithm Variables.
11.6 Nullspace of Dependence Matrix: The Broadcast Subdomain B.
11.7 Design Space Exploration: Choice of Broadcasting versus Pipelining Variables.
11.8 Data Scheduling.
11.9 Projection Operation Using the Linear Projection Operator.
11.10 Effect of Projection Operation on Data.
11.11 The Resulting Multithreaded/Multiprocessor Architecture.
11.12 Summary of Work Done in this Chapter.
12 Case Study: OneDimensional IIR Digital Filters.
12.1 Introduction.
12.2 The 1D IIR Digital Filter Algorithm.
12.3 The IIR Filter Dependence Graph.
12.4 zDomain Analysis of 1D IIR Digital Filter Algorithm.
13 Case Study: Two and ThreeDimensional Digital Filters.
13.1 Introduction.
13.2 Line and Frame Wraparound Problems.
13.3 2D Recursive Filters.
13.4 3D Digital Filters.
14 Case Study: Multirate Decimators and Interpolators.
14.1 Introduction.
14.2 Decimator Structures.
14.3 Decimator Dependence Graph.
14.4 Decimator Scheduling.
14.5 Decimator DAG for s1 = [1 0].
14.6 Decimator DAG for s2 = [1 1].
14.7 Decimator DAG for s3 = [1 1].
14.8 Polyphase Decimator Implementations.
14.9 Interpolator Structures.
14.10 Interpolator Dependence Graph.
14.11 Interpolator Scheduling.
14.12 Interpolator DAG for s1 = [1 0].
14.13 Interpolator DAG for s2 = [1 1].
14.14 Interpolator DAG for s3 = [1 1].
14.15 Polyphase Interpolator Implementations.
15 Case Study: Pattern Matching.
15.1 Introduction.
15.2 Expressing the Algorithm as a Regular Iterative Algorithm (RIA).
15.3 Obtaining the Algorithm Dependence Graph.
15.4 Data Scheduling.
15.5 DAG Node Projection.
15.6 DESIGN 1: Design Space Exploration When s = [1 1]t.
15.7 DESIGN 2: Design Space Exploration When s = [1 1]t.
15.8 DESIGN 3: Design Space Exploration When s = [1 0]t.
16 Case Study: Motion Estimation for Video Compression.
16.1 Introduction.
16.2 FBMAs.
16.3 Data Buffering Requirements.
16.4 Formulation of the FBMA.
16.5 Hierarchical Formulation of Motion Estimation.
16.6 Hardware Design of the Hierarchy Blocks.
17 Case Study: Multiplication over GF(2m).
17.1 Introduction.
17.2 The Multiplication Algorithm in GF(2m).
17.3 Expressing Field Multiplication as an RIA.
17.4 Field Multiplication Dependence Graph.
17.5 Data Scheduling.
17.6 DAG Node Projection.
17.7 Design 1: Using d1 = [1 0]t.
17.8 Design 2: Using d2 = [1 1]t.
17.9 Design 3: Using d3 = [1 1]t.
17.10 Applications of Finite Field Multipliers.
18 Case Study: Polynomial Division over GF(2).
18.1 Introduction.
18.2 The Polynomial Division Algorithm.
18.3 The LFSR Dependence Graph.
18.4 Data Scheduling.
18.5 DAG Node Projection.
18.6 Design 1: Design Space Exploration When s1 = [1 1].
18.7 Design 2: Design Space Exploration When s2 = [1 0].
18.8 Design 3: Design Space Exploration When s3 = [1 0.5].
18.9 Comparing the Three Designs.
19 The Fast Fourier Transform.
19.1 Introduction.
19.2 DecimationinTime FFT.
19.3 Pipeline Radix2 DecimationinTime FFT Processor.
19.4 DecimationinFrequency FFT.
19.5 Pipeline Radix2 DecimationinFrequency FFT Processor.
20 Solving Systems of Linear Equations.
20.1 Introduction.
20.2 Special Matrix Structures.
20.3 Forward Substitution (Direct Technique).
20.4 Back Substitution.
20.5 Matrix Triangularization Algorithm.
20.6 Successive over Relaxation (SOR) (Iterative Technique).
20.7 Problems.
21 Solving Partial Differential Equations Using Finite Difference Method.
21.1 Introduction.
21.2 FDM for 1D Systems.
References.
Index.
Algorithms and Parallel Computing (US $114.00)
and LargeScale Computing Techniques for Complex System Simulations (US $107.95)
Total List Price: US $221.95
Discounted Price: US $166.46 (Save: US $55.49)