Visual Statistics: Seeing Data with Dynamic Interactive Graphics
ISBN: 9780471681601
400 pages
August 2006

Visual Statistics brings the most complex and advanced statistical methods within reach of those with little statistical training by using animated graphics of the data. Using ViSta: The Visual Statistics System (developed by Forrest Young and Pedro Valero-Mora and available free of charge on the Internet), students can easily create fully interactive visualizations from relevant mathematical statistics, promoting perceptual and cognitive understanding of the data's story. An emphasis is placed on a paradigm for understanding data that is visual, intuitive, geometric, and active, rather than one that relies on convoluted logic, heavy mathematics, systems of algebraic equations, or passive acceptance of results.
A companion Web site complements the book by further demonstrating the concept of creating interactive and dynamic graphics. The book provides users with the opportunity to view the graphics in a dynamic way by illustrating how to analyze statistical data and explore the concepts of visual statistics.
Visual Statistics covers the following topics:
* Why use dynamic graphics?
* A history of statistical graphics
* Visual statistics and the graphical user interface
* Visual statistics and the scientific method
* Character-based statistical interface objects
* Graphics-based statistical interfaces
* Visualization for exploring univariate data
This is an excellent textbook for undergraduate courses in data analysis and regression, for students majoring or minoring in statistics, mathematics, science, engineering, and computer science, as well as for graduate-level courses in mathematics. The book is also ideal as a reference/self-study guide for engineers, scientists, and mathematicians.
With contributions by highly regarded professionals in the field, Visual Statistics not only improves a student's understanding of statistics, but also builds confidence to overcome problems that may have previously been intimidating.
Part I Introduction
1 Introduction
1.1 Visual Statistics
1.2 Dynamic Interactive Graphics
1.2.1 An Analogy
1.2.2 Why Use Dynamic Graphics?
1.2.3 The Four Respects
1.3 Three Examples
1.3.1 Nonrandom Numbers
1.3.2 Automobile Efficiency
1.3.3 Fidelity and Marriage
1.4 History of Statistical Graphics
1.4.1 1600–1699: Measurement and Theory
1.4.2 1700–1799: New Graphic Forms and Data
1.4.3 1800–1899: Modern Graphics and the Golden Age
1.4.4 1900–1950: The Dark Ages of Statistical Graphics – The Golden Age of Mathematical Statistics
1.4.5 1950–1975: Rebirth of Statistical Graphics
1.4.6 1975–2000: Statistical Graphics Comes of Age
1.5 About Software
1.5.1 XLisp-Stat
1.5.2 Commercial Systems
1.5.3 Noncommercial Systems
1.5.4 ViSta
1.6 About Data
1.6.1 Essential Characteristics
1.6.2 Datatypes
1.6.3 Datatype Examples
1.7 About This Book
1.7.1 What This Book Is – and Isn't
1.7.2 Organization
1.7.3 Who Our Audience Is – and Isn't
1.7.4 Comics
1.7.5 Thumb-Powered Dynamic Graphics
1.8 Visual Statistics and the Graphical User Interface
1.9 Visual Statistics and the Scientific Method
1.9.1 A Paradigm for Seeing Data
1.9.2 About Statistical Data Analysis: Visual or Otherwise
2 Examples
2.1 Random Numbers
2.2 Medical Diagnosis
2.3 Fidelity and Marriage
Part II See Data—The Process
3 Interfaces and Environments
3.1 Objects
3.2 User Interfaces for Seeing Data
3.3 Character-Based Statistical Interface Objects
3.3.1 Command Line
3.3.2 Calculator
3.3.3 Program Editor
3.3.4 Report Generator
3.4 Graphics-Based Statistical Interfaces
3.4.1 Datasheets
3.4.2 Variable Window
3.4.3 Desktop
3.4.4 Workmap
3.4.5 Selector
3.5 Plots
3.5.1 Look of Plots
3.5.2 Feel of Plots
3.5.3 Impact of Plot Look and Feel
3.6 Spreadplots
3.6.1 Layout
3.6.2 Coordination
3.6.3 SpreadPlots
3.6.4 Look of Spreadplots
3.6.5 Feel of Spreadplots
3.6.6 Look and Feel of Statistical Data Analysis
3.7 Environments for Seeing Data
3.8 Sessions and Projects
3.9 The Next Reality
3.9.1 The Fantasy
3.9.2 The Reality
3.9.3 Reality Check
4 Tools and Techniques
4.1 Types of Controls
4.1.1 Buttons
4.1.2 Palettes
4.1.3 Menus and Menu Items
4.1.4 Dialog Boxes
4.1.5 Sliders
4.1.6 Control Panels
4.1.7 The Plot Itself
4.1.8 Hyperlinking
4.2 Datasheets
4.3 Plots
4.3.1 Activating Plot Objects
4.3.2 Manipulating Plot Objects
4.3.3 Manipulating Plot Dimensions
4.3.4 Adding Graphical Elements
Part III Seeing Data—Objects
5 Seeing Frequency Data
5.1 Data
5.1.1 Automobile Efficiency:
5.1.2 Berkeley Admissions Data
5.1.3 Tables of Frequency Data
5.1.4 Working at the Categories Level
5.1.5 Working at the Variables Level
5.2 Frequency Plots
5.2.1 Mosaic Displays
5.2.2 Dynamic Mosaic Displays
5.3 Visual Fitting of Log-Linear Models
5.3.1 Log-Linear Spreadplot
5.3.2 Specifying Log-Linear Models and the Model Builder Window
5.3.3 Evaluating the Global Fit of Models and Their History
5.3.4 Visualizing Fitted and Residual Values with Mosaic Displays
5.3.5 Interpreting the Parameters of the Model
5.4 Conclusions
6 Seeing Univariate Data
6.1 Introduction
6.2 Data: Automobile Efficiency
6.2.1 Looking at the Numbers
6.2.2 What Can Unidimensional Methods Reveal?
6.3 Univariate Plots
6.3.1 Dotplots
6.3.2 Boxplots
6.3.3 Cumulative Distribution Plots
6.3.4 Histograms and Frequency Polygons
6.3.5 Ordered Series Plots
6.3.6 Namelists
6.4 Visualization for Exploring Univariate Data
6.5 What Do We See in MPG?
7 Seeing Bivariate Data
7.1 Introduction
7.1.1 Plots About Relationships
7.1.2 Chapter Preview
7.2 Data: Automobile Efficiency
7.2.1 What the Data Seem to Say
7.3 Bivariate Plots
7.3.1 Scatterplots
7.3.2 Distribution Comparison Plots
7.3.3 Parallel-Coordinates Plots and Parallel Boxplots
7.4 Multiple Bivariate Plots
7.4.1 Scatterplot Plot Matrix
7.4.2 Quantile Plot Matrix
7.4.3 Numerical Plotmatrix
7.4.4 BoxPlot Plot Matrix
7.5 Bivariate Visualization Methods
7.6 Visual Exploration
7.6.1 Two Bivariate Data Visualizations
7.6.2 Using These Visualizations
7.7 Visual Transformation: Box-Cox
7.7.1 The Transformation Visualization
7.7.2 Using Transformation Visualization
7.7.3 The Box-Cox Power Transformation
7.8 Visual Fitting: Simple Regression
7.9 Conclusions
8 Seeing Multivariate Data
8.1 Data: Medical Diagnosis
8.2 Three Families of Multivariate Plots
8.3 Parallel-Axes Plots
8.3.1 Parallel-Coordinates Plot
8.3.2 Parallel-Comparisons Plot
8.3.3 Parallel Univariate Plots
8.4 Orthogonal-Axes Plots
8.4.1 Spinplot
8.4.2 Orbitplot
8.4.3 BiPlot
8.4.4 Wiggle-Worm (Multivariable Comparison) Plot
8.5 Paired-Axes Plots
8.5.1 Spinplot Plot Matrix
8.5.2 Parallel-Coordinates Plot Matrix
8.6 Multivariate Visualization
8.6.1 Variable Visualization
8.6.2 Principal Components Analysis
8.6.3 Fit Visualization
8.6.4 Principal Components Visualization
8.6.5 One More Step – Discriminant Analysis
8.7 Summary
8.7.1 What Did We See? Clusters!
8.7.2 How Did We See It?
8.7.3 How Do We Interpret It? With Diagnostic Groups!
8.8 Conclusion
9 Seeing Missing Values
9.1 Introduction
9.2 Data: Sleep in Mammals
9.3 Missing Data Visualization Tools
9.3.1 Missing Values Bar Charts
9.3.2 Histograms and Bar Charts
9.3.3 Boxplots
9.3.4 Scatterplots
9.4 Visualizing Imputed Values
9.4.1 Marking the Imputed Values
9.4.2 Single Imputation
9.4.3 Multiple Imputation
9.4.4 Summary of Imputation
9.5 Missing Data Patterns
9.5.1 Patterns and Number of Cases
9.5.2 The Mechanisms Leading to Missing Data
9.5.3 Visualizing Dynamically the Patterns of Missing Data
9.6 Conclusions
References
Author Index
Subject Index
PEDRO M. VALERO-MORA, PhD, is Professor of Data Processing at the University of Valencia in Spain. He is the author of several research papers. He received his PhD in methodology in the behavioral sciences from the University of Valencia in Spain.
MICHAEL FRIENDLY, PhD, is Professor in the Department of Psychology at York University in Toronto, Ontario, Canada. He received his PhD in psychometrics and cognitive psychology from Princeton University. He is the author of two books and numerous research papers.
"…the book admirably guides readers through the process of exploring and analyzing a variety of types of data." (The American Statistician, August 2007)
"A technically well-produced book, significant in that it documents efforts these psychometrics faculty made in using visual approaches to analyze data." (CHOICE, March 2007)
"Forrest Young has been a leading light in developing dynamic interactive graphics for decades…so [I] am delighted to see this book appear." (Short Book Reviews, December 2006)
"...the book not only improves a student's understanding of statistics, but also builds confidence to overcome problems that may have previously been intimidating...an excellent textbook" (Computing Reviews.com, November 16, 2006)
"...covers its subject matter skillfully, and provides great insight into the world of the dynamic visualization of statistical data." (Journal of Applied Statistics, 2007)
"The people who would benefit most from this book are those who teach courses in statistics." (AStA Advances in Statistical Analysis, 2008)