Wiley.com

Multi-Agent Machine Learning: A Reinforcement Approach

ISBN: 978-1-118-36208-2
256 pages
August 2014

Description

The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares learning, mean square error methods, and stochastic approximation. Chapter 2 covers single-agent reinforcement learning; topics include learning value functions, Markov decision processes, and TD learning with eligibility traces. Chapter 3 discusses two-player matrix games with both pure and mixed strategies, presenting numerous algorithms and examples. Chapter 4 covers learning in multiplayer stochastic and Markov games, focusing on multiplayer grid games: two-player grid games, Q-learning, and Nash Q-learning. Chapter 5 discusses differential games, including multiplayer differential games, the actor–critic structure, adaptive fuzzy control and fuzzy inference systems, the evader–pursuer game, and the game of guarding a territory. Chapter 6 discusses new ideas on learning within robotic swarms and the innovative idea of the evolution of personality traits.

• Provides a framework for understanding a variety of methods and approaches in multi-agent machine learning

• Discusses methods of reinforcement learning, including several forms of multi-agent Q-learning

• Applicable to research professors and graduate students studying electrical and computer engineering, computer science, and mechanical and aerospace engineering
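The single-agent Q-learning surveyed in Chapter 2 can be illustrated with a short sketch. The one-dimensional grid world, reward of 1 at the goal, and hyperparameters below are invented for this illustration and are not taken from the book.

```python
import random

# Minimal tabular Q-learning on a toy 1-D grid world (illustrative only).
N_STATES = 5            # states 0..4; state 4 is the goal
ACTIONS = [-1, +1]      # move left or right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic transition; reward 1 on reaching the goal state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

random.seed(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r = step(s, a)
        # off-policy TD(0) update toward r + gamma * max_b Q(s', b)
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Greedy policy for the non-goal states; it should move right everywhere.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The same update rule, with the max over the opponent's joint actions replaced by a Nash equilibrium value, is the starting point for the multi-agent extensions (minimax-Q, Nash Q-learning) treated in Chapter 4.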


Table of Contents

Preface ix

Chapter 1 A Brief Review of Supervised Learning 1

1.1 Least Squares Estimates 1

1.2 Recursive Least Squares 5

1.3 Least Mean Squares 6

1.4 Stochastic Approximation 10

References 11

Chapter 2 Single-Agent Reinforcement Learning 12

2.1 Introduction 12

2.2 n-Armed Bandit Problem 13

2.3 The Learning Structure 15

2.4 The Value Function 17

2.5 The Optimal Value Functions 18

2.5.1 The Grid World Example 20

2.6 Markov Decision Processes 23

2.7 Learning Value Functions 25

2.8 Policy Iteration 26

2.9 Temporal Difference Learning 28

2.10 TD Learning of the State-Action Function 30

2.11 Q-Learning 32

2.12 Eligibility Traces 33

References 37

Chapter 3 Learning in Two-Player Matrix Games 38

3.1 Matrix Games 38

3.2 Nash Equilibria in Two-Player Matrix Games 42

3.3 Linear Programming in Two-Player Zero-Sum Matrix Games 43

3.4 The Learning Algorithms 47

3.5 Gradient Ascent Algorithm 47

3.6 WoLF-IGA Algorithm 51

3.7 Policy Hill Climbing (PHC) 52

3.8 WoLF-PHC Algorithm 54

3.9 Decentralized Learning in Matrix Games 57

3.10 Learning Automata 59

3.11 Linear Reward–Inaction Algorithm 59

3.12 Linear Reward–Penalty Algorithm 60

3.13 The Lagging Anchor Algorithm 60

3.14 LR−I Lagging Anchor Algorithm 62

3.14.1 Simulation 68

References 70

Chapter 4 Learning in Multiplayer Stochastic Games 73

4.1 Introduction 73

4.2 Multiplayer Stochastic Games 75

4.3 Minimax-Q Algorithm 79

4.3.1 2 × 2 Grid Game 80

4.4 Nash Q-Learning 87

4.4.1 The Learning Process 95

4.5 The Simplex Algorithm 96

4.6 The Lemke–Howson Algorithm 100

4.7 Nash-Q Implementation 107

4.8 Friend-or-Foe Q-Learning 111

4.9 Infinite Gradient Ascent 112

4.10 Policy Hill Climbing 114

4.11 WoLF-PHC Algorithm 114

4.12 Guarding a Territory Problem in a Grid World 117

4.12.1 Simulation and Results 119

4.13 Extension of LR−I Lagging Anchor Algorithm to Stochastic Games 125

4.14 The Exponential Moving-Average Q-Learning (EMA Q-Learning) Algorithm 128

4.15 Simulation and Results Comparing EMA Q-Learning to Other Methods 131

4.15.1 Matrix Games 131

4.15.2 Stochastic Games 134

References 141

Chapter 5 Differential Games 144

5.1 Introduction 144

5.2 A Brief Tutorial on Fuzzy Systems 146

5.2.1 Fuzzy Sets and Fuzzy Rules 146

5.2.2 Fuzzy Inference Engine 148

5.2.3 Fuzzifier and Defuzzifier 151

5.2.4 Fuzzy Systems and Examples 152

5.3 Fuzzy Q-Learning 155

5.4 Fuzzy Actor–Critic Learning 159

5.5 Homicidal Chauffeur Differential Game 162

5.6 Fuzzy Controller Structure 165

5.7 Q(λ)-Learning Fuzzy Inference System 166

5.8 Simulation Results for the Homicidal Chauffeur 171

5.9 Learning in the Evader–Pursuer Game with Two Cars 174

5.10 Simulation of the Game of Two Cars 177

5.11 Differential Game of Guarding a Territory 180

5.12 Reward Shaping in the Differential Game of Guarding a Territory 184

5.13 Simulation Results 185

5.13.1 One Defender Versus One Invader 185

5.13.2 Two Defenders Versus One Invader 191

References 197

Chapter 6 Swarm Intelligence and the Evolution of Personality Traits 200

6.1 Introduction 200

6.2 The Evolution of Swarm Intelligence 200

6.3 Representation of the Environment 201

6.4 Swarm-Based Robotics in Terms of Personalities 203

6.5 Evolution of Personality Traits 206

6.6 Simulation Framework 207

6.7 A Zero-Sum Game Example 208

6.7.1 Convergence 208

6.7.2 Simulation Results 214

6.8 Implementation for Next Sections 216

6.9 Robots Leaving a Room 218

6.10 Tracking a Target 221

6.11 Conclusion 232

References 233

Index 237


Author Information

Howard M. Schwartz, PhD, received his B.Eng. degree from McGill University, Montreal, Canada, in June 1981, and his M.S. and PhD degrees from MIT, Cambridge, USA, in 1982 and 1987, respectively. He is currently a professor of systems and computer engineering at Carleton University, Canada. His research interests include adaptive and intelligent control systems, robotics, artificial intelligence, system modelling, system identification, and state estimation.

Reviews

“This is an interesting book both as research reference as well as teaching material for Master and PhD students.” (Zentralblatt MATH, 1 April 2015)

 
