Robust Correlation: Theory and Applications
ISBN: 9781118493458
352 pages
September 2016

Description
This book presents material on both the analysis of the classical concepts of correlation and the development of their robust versions, and discusses the related concepts of correlation matrices, partial correlation, canonical correlation, and rank correlations, together with the corresponding robust and nonrobust estimation procedures. Every chapter contains a set of examples with simulated and real-life data.
Key features:
 Makes modern and robust correlation methods readily available and understandable to practitioners, specialists, and consultants working in various fields.
 Focuses on implementation of methodology and application of robust correlation with R.
 Introduces the main approaches in robust statistics, such as Huber’s minimax approach and Hampel’s approach based on influence functions.
 Explores various robust estimates of the correlation coefficient, including the minimax variance and bias estimates as well as the most B- and V-robust estimates.
 Contains applications of robust correlation methods to exploratory data analysis, multivariate statistics, statistics of time series, and to real-life data.
 Includes an accompanying website featuring computer code and datasets.
 Features exercises and examples throughout the text using both small and large data sets.
Theoretical and applied statisticians, specialists in multivariate statistics, robust statistics, robust time series analysis, data analysis and signal processing will benefit from this book. Practitioners who use correlation-based methods in their work, as well as postgraduate students in statistics, will also find this book useful.
Table of Contents
Preface xv
Acknowledgements xvii
About the Companion Website xix
1 Introduction 1
1.1 Historical Remarks 1
1.2 Ontological Remarks 4
1.2.1 Forms of data representation 5
1.2.2 Types of data statistics 5
1.2.3 Principal aims of statistical data analysis 6
1.2.4 Prior information about data distributions and related approaches to statistical data analysis 6
References 8
2 Classical Measures of Correlation 10
2.1 Preliminaries 10
2.2 Pearson’s Correlation Coefficient: Definitions and Interpretations 12
2.2.1 Introductory remarks 13
2.2.2 Correlation via regression 13
2.2.3 Correlation via the coefficient of determination 16
2.2.4 Correlation via the variances of the principal components 18
2.2.5 Correlation via the cosine of the angle between the variable vectors 21
2.2.6 Correlation via the ratio of two means 22
2.2.7 Pearson’s correlation coefficient between random events 23
2.3 Nonparametric Measures of Correlation 24
2.3.1 Introductory remarks 24
2.3.2 The quadrant correlation coefficient 26
2.3.3 The Spearman rank correlation coefficient 27
2.3.4 The Kendall 𝜏-rank correlation coefficient 28
2.3.5 Concluding remark 29
2.4 Informational Measures of Correlation 29
2.5 Summary 31
References 31
3 Robust Estimation of Location 33
3.1 Preliminaries 33
3.2 Huber’s Minimax Approach 35
3.2.1 Introductory remarks 35
3.2.2 Minimax variance M-estimates of location 36
3.2.3 Minimax bias M-estimates of location 43
3.2.4 L-estimates of location 44
3.2.5 R-estimates of location 45
3.2.6 The relations between M-, L- and R-estimates of location 46
3.2.7 Concluding remarks 47
3.3 Hampel’s Approach Based on Influence Functions 47
3.3.1 Introductory remarks 47
3.3.2 Sensitivity curve 47
3.3.3 Influence function and its properties 49
3.3.4 Local measures of robustness 51
3.3.5 B- and V-robustness 52
3.3.6 Global measure of robustness: the breakdown point 52
3.3.7 Redescending M-estimates 53
3.3.8 Concluding remark 56
3.4 Robust Estimation of Location: A Sequel 56
3.4.1 Introductory remarks 56
3.4.2 Huber’s minimax variance approach in distribution density models of a non-neighborhood nature 57
3.4.3 Robust estimation of location in distribution models with a bounded variance 62
3.4.4 On the robustness of robust solutions: stability of least informative distributions 69
3.4.5 Concluding remark 73
3.5 Stable Estimation 73
3.5.1 Introductory remarks 73
3.5.2 Variance sensitivity 74
3.5.3 Estimation stability 76
3.5.4 Robustness of stable estimates 78
3.5.5 Maximin stable redescending M-estimates 83
3.5.6 Concluding remarks 84
3.6 Robustness Versus Gaussianity 85
3.6.1 Introductory remarks 85
3.6.2 Derivations of the Gaussian distribution 87
3.6.3 Properties of the Gaussian distribution 92
3.6.4 Huber’s minimax approach and Gaussianity 100
3.6.5 Concluding remarks 101
3.7 Summary 102
References 102
4 Robust Estimation of Scale 107
4.1 Preliminaries 107
4.1.1 Introductory remarks 107
4.1.2 Estimation of scale in data analysis 108
4.1.3 Measures of scale defined by functionals 110
4.2 M- and L-Estimates of Scale 111
4.2.1 M-estimates of scale 111
4.2.2 L-estimates of scale 115
4.3 Huber Minimax Variance Estimates of Scale 116
4.3.1 Introductory remarks 116
4.3.2 The least informative distribution 117
4.3.3 Minimax variance M- and L-estimates of scale 118
4.4 Highly Efficient Robust Estimates of Scale 119
4.4.1 Introductory remarks 119
4.4.2 The median of absolute deviations and its properties 120
4.4.3 The quartile of pairwise absolute differences Qn estimate and its properties 121
4.4.4 M-estimate approximations to the Qn estimate: MQ𝛼n, FQ𝛼n, and FQn estimates of scale 122
4.5 Monte Carlo Experiment 130
4.5.1 A remark on the Monte Carlo experiment accuracy 131
4.5.2 Monte Carlo experiment: distribution models 131
4.5.3 Monte Carlo experiment: estimates of scale 132
4.5.4 Monte Carlo experiment: characteristics of performance 133
4.5.5 Monte Carlo experiment: results 134
4.5.6 Monte Carlo experiment: discussion 136
4.5.7 Concluding remarks 138
4.6 Summary 138
References 139
5 Robust Estimation of Correlation Coefficients 140
5.1 Preliminaries 140
5.2 Main Groups of Robust Estimates of the Correlation Coefficient 141
5.2.1 Introductory remarks 141
5.2.2 Direct robust counterparts of Pearson’s correlation coefficient 142
5.2.3 Robust correlation via nonparametric measures of correlation 143
5.2.4 Robust correlation via robust regression 143
5.2.5 Robust correlation via robust principal component variances 145
5.2.6 Robust correlation via two-stage procedures 147
5.2.7 Concluding remarks 147
5.3 Asymptotic Properties of the Classical Estimates of the Correlation Coefficient 148
5.3.1 Pearson’s sample correlation coefficient 148
5.3.2 The maximum likelihood estimate of the correlation coefficient at the normal 149
5.4 Asymptotic Properties of Nonparametric Estimates of Correlation 151
5.4.1 Introductory remarks 151
5.4.2 The quadrant correlation coefficient 152
5.4.3 The Kendall rank correlation coefficient 152
5.4.4 The Spearman rank correlation coefficient 153
5.5 Bivariate Independent Component Distributions 155
5.5.1 Definition and properties 155
5.5.2 Independent component and Tukey gross-error distribution models 156
5.6 Robust Estimates of the Correlation Coefficient Based on Principal Component Variances 158
5.7 Robust Minimax Bias and Variance Estimates of the Correlation Coefficient 161
5.7.1 Introductory remarks 161
5.7.2 Minimax property 162
5.7.3 Concluding remarks 163
5.8 Robust Correlation via Highly Efficient Robust Estimates of Scale 163
5.8.1 Introductory remarks 163
5.8.2 Asymptotic bias and variance of generalized robust estimates of the correlation coefficient 164
5.8.3 Concluding remarks 165
5.9 Robust M-Estimates of the Correlation Coefficient in Independent Component Distribution Models 165
5.9.1 Introductory remarks 165
5.9.2 The maximum likelihood estimate of the correlation coefficient in independent component distribution models 165
5.9.3 M-estimates of the correlation coefficient 166
5.9.4 Asymptotic variance of M-estimators 166
5.9.5 Minimax variance M-estimates of the correlation coefficient 167
5.9.6 Concluding remarks 168
5.10 Monte Carlo Performance Evaluation 168
5.10.1 Introductory remarks 168
5.10.2 Monte Carlo experiment setup 168
5.10.3 Discussion 171
5.10.4 Concluding remarks 173
5.11 Robust Stable Radical M-Estimate of the Correlation Coefficient of the Bivariate Normal Distribution 173
5.11.1 Introductory remarks 173
5.11.2 Asymptotic characteristics of the stable radical estimate of the correlation coefficient 174
5.11.3 Concluding remarks 175
5.12 Summary 176
References 176
6 Classical Measures of Multivariate Correlation 178
6.1 Preliminaries 178
6.2 Covariance Matrix and Correlation Matrix 179
6.3 Sample Mean Vector and Sample Covariance Matrix 181
6.4 Families of Multivariate Distributions 182
6.4.1 Construction of multivariate location-scatter models 182
6.4.2 Multivariate symmetrical distributions 183
6.4.3 Multivariate normal distribution 184
6.4.4 Multivariate elliptical distributions 184
6.4.5 Independent component model 186
6.4.6 Copula models 186
6.5 Asymptotic Behavior of Sample Covariance Matrix and Sample Correlation Matrix 187
6.6 First Uses of Covariance and Correlation Matrices 189
6.7 Working with the Covariance Matrix–Principal Component Analysis 191
6.7.1 Principal variables 191
6.7.2 Interpretation of principal components 193
6.7.3 Asymptotic behavior of the eigenvectors and eigenvalues 194
6.8 Working with Correlations–Canonical Correlation Analysis 195
6.8.1 Canonical variates and canonical correlations 195
6.8.2 Testing for independence between subvectors 197
6.9 Conditionally Uncorrelated Components 199
6.10 Summary 200
References 200
7 Robust Estimation of Scatter and Correlation Matrices 202
7.1 Preliminaries 202
7.2 Multivariate Location and Scatter Functionals 202
7.3 Influence Functions and Asymptotics 205
7.4 M-functionals for Location and Scatter 208
7.5 Breakdown Point 210
7.6 Use of Robust Scatter Matrices 211
7.6.1 Ellipticity assumption 211
7.6.2 Robust correlation matrices 212
7.6.3 Principal component analysis 212
7.6.4 Canonical correlation analysis 213
7.7 Further Uses of Location and Scatter Functionals 213
7.8 Summary 215
References 215
8 Nonparametric Measures of Multivariate Correlation 217
8.1 Preliminaries 217
8.2 Univariate Signs and Ranks 218
8.3 Marginal Signs and Ranks 220
8.4 Spatial Signs and Ranks 222
8.5 Affine Equivariant Signs and Ranks 226
8.6 Summary 229
References 230
9 Applications to Exploratory Data Analysis: Detection of Outliers 231
9.1 Preliminaries 231
9.2 State of the Art 232
9.2.1 Univariate boxplots 232
9.2.2 Bivariate boxplots 234
9.3 Problem Setting 237
9.4 A New Measure of Outlier Detection Performance 239
9.4.1 Introductory remarks 240
9.4.2 H-mean: motivation, definition and properties 241
9.5 Robust Versions of the Tukey Boxplot with Their Application to Detection of Outliers 243
9.5.1 Data generation and performance measure 243
9.5.2 Scale and shift contamination 243
9.5.3 Real-life data results 244
9.5.4 Concluding remarks 245
9.6 Robust Bivariate Boxplots and Their Performance Evaluation 245
9.6.1 Bivariate FQ-boxplot 245
9.6.2 Bivariate FQ-boxplot performance 247
9.6.3 Measuring the elliptical deviation from the convex hull 249
9.7 Summary 253
References 253
10 Applications to Time Series Analysis: Robust Spectrum Estimation 255
10.1 Preliminaries 255
10.2 Classical Estimation of a Power Spectrum 256
10.2.1 Introductory remarks 256
10.2.2 Classical nonparametric estimation of a power spectrum 258
10.2.3 Parametric estimation of a power spectrum 259
10.3 Robust Estimation of a Power Spectrum 259
10.3.1 Introductory remarks 259
10.3.2 Robust analogs of the discrete Fourier transform 261
10.3.3 Robust nonparametric estimation 262
10.3.4 Robust estimation of power spectrum through the Yule–Walker equations 263
10.3.5 Robust estimation through robust filtering 263
10.4 Performance Evaluation 264
10.4.1 Robustness of the median Fourier transform power spectra 264
10.4.2 Additive outlier contamination model 264
10.4.3 Disorder contamination model 264
10.4.4 Concluding remarks 270
10.5 Summary 270
References 270
11 Applications to Signal Processing: Robust Detection 272
11.1 Preliminaries 272
11.1.1 Classical approach to detection 272
11.1.2 Robust minimax approach to hypothesis testing 273
11.1.3 Asymptotically optimal robust detection of a weak signal 274
11.2 Robust Minimax Detection Based on a Distance Rule 275
11.2.1 Introductory remarks 275
11.2.2 Asymptotic robust minimax detection of a known constant signal with the 𝜌-distance rule 276
11.2.3 Detection performance in asymptotics and on finite samples 278
11.2.4 Concluding remarks 283
11.3 Robust Detection of a Weak Signal with Redescending M-Estimates 285
11.3.1 Introductory remarks 285
11.3.2 Detection error sensitivity and stability 287
11.3.3 Performance evaluation: a comparative study 289
11.3.4 Concluding remarks 291
11.4 A Unified Neyman–Pearson Detection of Weak Signals in a Fusion Model with Fading Channels and Non-Gaussian Noises 296
11.4.1 Introductory remarks 296
11.4.2 Problem setting—an asymptotic fusion rule 298
11.4.3 Asymptotic performance analysis 299
11.4.4 Numerical results 303
11.4.5 Concluding remarks 305
11.5 Summary 306
References 306
12 Final Remarks 308
12.1 Points of Growth: Open Problems in Multivariate Statistics 308
12.2 Points of Growth: Open Problems in Applications 309
Index 311
Author Information
Georgy L. Shevlyakov, Department of Applied Mathematics, St. Petersburg State Polytechnic University, Russia
Hannu Oja, School of Health Sciences, University of Tampere, Finland