Skip to main content

Modeling and Optimization of Parallel and Distributed Embedded Systems

Modeling and Optimization of Parallel and Distributed Embedded Systems

Arslan Munir , Ann Gordon-Ross , Sanjay Ranka

ISBN: 978-1-119-08639-0

Dec 2015, Wiley-IEEE Press

400 pages

$112.99

Description

This book introduces the state-of-the-art in research in parallel and distributed embedded systems, which have been enabled by developments in silicon technology, micro-electro-mechanical systems (MEMS), wireless communications, computer networking, and digital electronics. These systems have diverse applications in domains including military and defense, medical, automotive, and unmanned autonomous vehicles.

The emphasis of the book is on the modeling and optimization of emerging parallel and distributed embedded systems in relation to the three key design metrics of performance, power and dependability.

Key features:

  • Includes an embedded wireless sensor networks case study to help illustrate the modeling and optimization of distributed embedded systems.  
  • Provides an analysis of multi-core/many-core based embedded systems to explain the modeling and optimization of parallel embedded systems.
  • Features an application metrics estimation model; Markov modeling for fault tolerance and analysis; and queueing theoretic modeling for performance evaluation.
  • Discusses optimization approaches for distributed wireless sensor networks; high-performance and energy-efficient techniques at the architecture, middleware and software levels for parallel multicore-based embedded systems; and dynamic optimization methodologies.
  • Highlights research challenges and future research directions.

The book is primarily aimed at researchers in embedded systems; however, it will also serve as an invaluable reference to senior undergraduate and graduate students with an interest in embedded systems research.

Preface xv

Acknowledgment xxi

Part I OVERVIEW

1 Introduction 3

1.1 Embedded Systems Applications 6

1.1.1 Cyber-Physical Systems 6

1.1.2 Space 6

1.1.3 Medical 7

1.1.4 Automotive 8

1.2 Characteristics of Embedded Systems Applications 9

1.2.1 Throughput-Intensive 9

1.2.2 Thermal-Constrained 9

1.2.3 Reliability-Constrained 10

1.2.4 Real-Time 10

1.2.5 Parallel and Distributed 10

1.3 Embedded Systems—Hardware and Software 11

1.3.1 Embedded Systems Hardware 11

1.3.2 Embedded Systems Software 14

1.4 Modeling—An Integral Part of the Embedded Systems Design Flow 15

1.4.1 Modeling Objectives 16

1.4.2 Modeling Paradigms 18

1.4.3 Strategies for Integration of Modeling Paradigms 20

1.5 Optimization in Embedded Systems 21

1.5.1 Optimization of Embedded Systems Design Metrics 23

1.5.2 Multiobjective Optimization 26

1.6 Chapter Summary 27

2 Multicore-Based EWSNs—An Example of Parallel and Distributed Embedded Systems 29

2.1 Multicore Embedded Wireless Sensor Network Architecture 31

2.2 Multicore Embedded Sensor Node Architecture 33

2.2.1 Sensing Unit 34

2.2.2 Processing Unit 34

2.2.3 Storage Unit 34

2.2.4 Communication Unit 35

2.2.5 Power Unit 35

2.2.6 Actuator Unit 35

2.2.7 Location Finding Unit 36

2.3 Compute-Intensive Tasks Motivating the Emergence of MCEWSNs 36

2.3.1 Information Fusion 36

2.3.2 Encryption 38

2.3.3 Network Coding 38

2.3.4 Software-Defined Radio (SDR) 38

2.4 MCEWSN Application Domains 38

2.4.1 Wireless Video Sensor Networks (WVSNs) 39

2.4.2 Wireless Multimedia Sensor Networks (WMSNs) 39

2.4.3 Satellite-Based Wireless Sensor Networks (SBWSN) 40

2.4.4 Space Shuttle Sensor Networks (3SN) 41

2.4.5 Aerial–Terrestrial Hybrid Sensor Networks (ATHSNs) 42

2.4.6 Fault-Tolerant (FT) Sensor Networks 43

2.5 Multicore Embedded Sensor Nodes 43

2.5.1 InstraNode 43

2.5.2 Mars Rover Prototype Mote 43

2.5.3 Satellite-Based Sensor Node (SBSN) 44

2.5.4 Multi-CPU-Based Sensor Node Prototype 44

2.5.5 Smart Camera Mote 44

2.6 Research Challenges and Future Research Directions 45

2.7 Chapter Summary 47

Part II MODELING

3 An Application Metrics Estimation Model for Embedded Wireless Sensor Networks 51

3.1 Application Metrics Estimation Model 52

3.1.1 Lifetime Estimation 53

3.1.2 Throughput Estimation 56

3.1.3 Reliability Estimation 57

3.1.4 Models Validation 57

3.2 Experimental Results 58

3.2.1 Experimental Setup 58

3.2.2 Results 59

3.3 Chapter Summary 61

4 Modeling and Analysis of Fault Detection and Fault Tolerance in Embedded Wireless Sensor Networks 63

4.1 Related Work 67

4.1.1 Fault Detection 67

4.1.2 Fault Tolerance 68

4.1.3 WSN Reliability Modeling 69

4.2 Fault Diagnosis in WSNs 70

4.2.1 Sensor Faults 70

4.2.2 Taxonomy for Fault Diagnosis Techniques 72

4.3 Distributed Fault Detection Algorithms 74

4.3.1 Fault Detection Algorithm 1: The Chen Algorithm 74

4.3.2 Fault Detection Algorithm 2: The Ding Algorithm 76

4.4 Fault-Tolerant Markov Models 77

4.4.1 Fault-Tolerance Parameters 77

4.4.2 Fault-Tolerant Sensor Node Model 79

4.4.3 Fault-Tolerant WSN Cluster Model 81

4.4.4 Fault-Tolerant WSN Model 83

4.5 Simulation of Distributed Fault Detection Algorithms 85

4.5.1 Using ns−2 to Simulate Faulty Sensors 85

4.5.2 Experimental Setup for Simulated Data 86

4.5.3 Experiments Using Real-World Data 86

4.6 Numerical Results 91

4.6.1 Experimental Setup 91

4.6.2 Reliability and MTTF for an NFT and an FT Sensor Node 91

4.6.3 Reliability and MTTF for an NFT and an FT WSN Cluster 95

4.6.4 Reliability and MTTF for an NFT and an FT WSN 98

4.7 Research Challenges and Future Research Directions 101

4.7.1 Accurate Fault Detection 101

4.7.2 Benchmarks for Comparing Fault Detection Algorithms 101

4.7.3 Energy-Efficient Fault Detection and Tolerance 101

4.7.4 Machine-Learning-Inspired Fault Detection 102

4.7.5 FT in Multimedia Sensor Networks 102

4.7.6 Security 102

4.7.7 WSN Design and Tuning for Reliability 104

4.7.8 Novel WSN Architectures 104

4.8 Chapter Summary 105

5 A Queueing Theoretic Approach for Performance Evaluation of Low-Power Multicore-Based Parallel Embedded Systems 107

5.1 Related Work 110

5.2 Queueing Network Modeling of Multicore Embedded Architectures 112

5.2.1 Queueing Network Terminology 112

5.2.2 Modeling Approach 113

5.2.3 Assumptions 119

5.3 Queueing Network Model Validation 120

5.3.1 Theoretical Validation 120

5.3.2 Validation with a Multicore Simulator 120

5.3.3 Speedup 124

5.4 Queueing Theoretic Model Insights 125

5.4.1 Model Setup 125

5.4.2 The Effects of Cache Miss Rates on Performance 129

5.4.3 The Effects of Workloads on Performance 132

5.4.4 Performance per Watt and Performance per Unit Area Computations 135

5.5 Chapter Summary 139

Part III OPTIMIZATION

6 Optimization Approaches in Distributed Embedded Wireless Sensor Networks 143

6.1 Architecture-Level Optimizations 144

6.2 Sensor Node Component-Level Optimizations 146

6.2.1 Sensing Unit 146

6.2.2 Processing Unit 148

6.2.3 Transceiver Unit 148

6.2.4 Storage Unit 148

6.2.5 Actuator Unit 148

6.2.6 Location Finding Unit 149

6.2.7 Power Unit 149

6.3 Data Link-Level Medium Access Control Optimizations 149

6.3.1 Load Balancing and Throughput Optimizations 149

6.3.2 Power/Energy Optimizations 150

6.4 Network-Level Data Dissemination and Routing Protocol Optimizations 152

6.4.1 Query Dissemination Optimizations 152

6.4.2 Real-Time Constrained Optimizations 154

6.4.3 Network Topology Optimizations 154

6.4.4 Resource-Adaptive Optimizations 154

6.5 Operating System-Level Optimizations 155

6.5.1 Event-Driven Optimizations 155

6.5.2 Dynamic Power Management 155

6.5.3 Fault Tolerance 155

6.6 Dynamic Optimizations 156

6.6.1 Dynamic Voltage and Frequency Scaling 156

6.6.2 Software-Based Dynamic Optimizations 156

6.6.3 Dynamic Network Reprogramming 157

6.7 Chapter Summary 157

7 High-Performance Energy-Efficient Multicore-Based Parallel Embedded Computing 159

7.1 Characteristics of Embedded Systems Applications 163

7.1.1 Throughput-Intensive 163

7.1.2 Thermal-Constrained 165

7.1.3 Reliability-Constrained 165

7.1.4 Real-Time 165

7.1.5 Parallel and Distributed 165

7.2 Architectural Approaches 166

7.2.1 Core Layout 166

7.2.2 Memory Design 168

7.2.3 Interconnection Network 170

7.2.4 Reduction Techniques 172

7.3 Hardware-Assisted Middleware Approaches 173

7.3.1 Dynamic Voltage and Frequency Scaling 174

7.3.2 Advanced Configuration and Power Interface 174

7.3.3 Gating Techniques 175

7.3.4 Threading Techniques 176

7.3.5 Energy Monitoring and Management 177

7.3.6 Dynamic Thermal Management 178

7.3.7 Dependable Techniques 179

7.4 Software Approaches 180

7.4.1 Data Forwarding 180

7.4.2 Load Distribution 180

7.5 High-Performance Energy-Efficient Multicore Processors 182

7.5.1 ARM11 MPCore 183

7.5.2 ARM Cortex A-9 MPCore 184

7.5.3 MPC8572E PowerQUICC III 184

7.5.4 Tilera TILEPro64 and TILE-Gx 184

7.5.5 AMD Opteron Processor 185

7.5.6 Intel Xeon Processor 185

7.5.7 Intel Sandy Bridge Processor 185

7.5.8 Graphics Processing Units 186

7.6 Challenges and Future Research Directions 186

7.7 Chapter Summary 189

8 An MDP-Based Dynamic Optimization Methodology for Embedded Wireless Sensor Networks 191

8.1 Related Work 193

8.2 MDP-Based Tuning Overview 195

8.2.1 MDP-Based Tuning Methodology for Embedded Wireless Sensor Networks 195

8.2.2 MDP Overview with Respect to Embedded Wireless Sensor Networks 197

8.3 Application-Specific Embedded Sensor Node Tuning Formulation as an MDP 200

8.3.1 State Space 200

8.3.2 Decision Epochs and Actions 200

8.3.3 State Dynamics 201

8.3.4 Policy and Performance Criterion 201

8.3.5 Reward Function 202

8.3.6 Optimality Equation 204

8.3.7 Policy Iteration Algorithm 205

8.4 Implementation Guidelines and Complexity 205

8.4.1 Implementation Guidelines 205

8.4.2 Computational Complexity 206

8.4.3 Data Memory Analysis 207

8.5 Model Extensions 207

8.6 Numerical Results 210

8.6.1 Fixed Heuristic Policies for Performance Comparisons 210

8.6.2 MDP Specifications 210

8.6.3 Results for a Security/Defense System Application 213

8.6.4 Results for a Healthcare Application 216

8.6.5 Results for an Ambient Conditions Monitoring Application 220

8.6.6 Sensitivity Analysis 222

8.6.7 Number of Iterations for Convergence 223

8.7 Chapter Summary 223

9 Online Algorithms for Dynamic Optimization of Embedded Wireless Sensor Networks 225

9.1 Related Work 227

9.2 Dynamic Optimization Methodology 228

9.2.1 Methodology Overview 228

9.2.2 State Space 229

9.2.3 Objective Function 229

9.2.4 Online Optimization Algorithms 230

9.3 Experimental Results 233

9.3.1 Experimental Setup 233

9.3.2 Results 235

9.4 Chapter Summary 239

10 A Lightweight Dynamic Optimization Methodology for Embedded Wireless Sensor Networks 241

10.1 Related Work 243

10.2 Dynamic Optimization Methodology 244

10.2.1 Overview 244

10.2.2 State Space 246

10.2.3 Optimization Objection Function 246

10.3 Algorithms for Dynamic Optimization Methodology 248

10.3.1 Initial Tunable Parameter Value Settings and Exploration Order 248

10.3.2 Parameter Arrangement 249

10.3.3 Online Optimization Algorithm 251

10.3.4 Computational Complexity 252

10.4 Experimental Results 252

10.4.1 Experimental Setup 253

10.4.2 Results 255

10.5 Chapter Summary 266

11 Parallelized Benchmark-Driven Performance Evaluation of Symmetric Multiprocessors and Tiled Multicore Architectures for Parallel Embedded Systems 269

11.1 Related Work 271

11.2 Multicore Architectures and Benchmarks 272

11.2.1 Multicore Architectures 272

11.2.2 Benchmark Applications and Kernels 273

11.3 Parallel Computing Device Metrics 275

11.4 Results 277

11.4.1 Quantitative Comparison of SMPs and TMAs 277

11.4.2 Benchmark-Driven Results for SMPs 278

11.4.3 Benchmark-Driven Results for TMAs 280

11.4.4 Comparison of SMPs and TMAs 282

11.5 Chapter Summary 285

12 High-Performance Optimizations on Tiled Manycore Embedded Systems: A Matrix Multiplication Case Study 287

12.1 Related Work 290

12.1.1 Performance Analysis and Optimization 290

12.1.2 Parallelized MM Algorithms 290

12.1.3 Cache Blocking 291

12.1.4 Tiled Manycore Architectures 292

12.2 Tiled Manycore Architecture (TMA) Overview 293

12.2.1 Intel’s TeraFLOPS Research Chip 294

12.2.2 IBM’s Cyclops-64 (C64) 296

12.2.3 Tilera’s TILEPro64 297

12.2.4 Tilera’s TILE64 300

12.3 Parallel Computing Metrics and Matrix Multiplication (MM) Case Study 301

12.3.1 Parallel Computing Metrics for TMAs 301

12.3.2 Matrix Multiplication (MM) Case Study 302

12.4 Matrix Multiplication Algorithms’ Code Snippets for Tilera’s TILEPro64 303

12.4.1 Serial Non-blocked Matrix Multiplication Algorithm 303

12.4.2 Serial Blocked Matrix Multiplication Algorithm 304

12.4.3 Parallel Blocked Matrix Multiplication Algorithm 307

12.4.4 Parallel Blocked Cannon’s Algorithm for Matrix Multiplication 309

12.5 Performance Optimization on a Manycore Architecture 314

12.5.1 Performance Optimization on a Single Tile 314

12.5.2 Parallel Performance Optimizations 315

12.5.3 Compiler-Based Optimizations 319

12.6 Results 323

12.6.1 Data Allocation, Data Decomposition, Data Layout, and Communication 324

12.6.2 Performance Optimizations on a Single Tile 327

12.6.3 Parallel Performance Optimizations 332

12.7 Chapter Summary 339

13 Conclusions 343

References 349

Index 369