Forecast Verification: A Practitioner's Guide in Atmospheric Science, 2nd Edition
ISBN: 9780470660713
292 pages
December 2011

Reviews of first edition:
"This book will provide a good reference, and I recommend it especially for developers and evaluators of statistical forecast systems." (Bulletin of the American Meteorological Society; April 2004)
"...a good mixture of theory and practical applications...well organized and clearly written..." (Royal Statistical Society, Vol.168, No.1, January 2005)
NEW to the second edition:
 Completely updated chapter on the Verification of Spatial Forecasts taking account of the wealth of new research in the area
 New separate chapters on Probability Forecasts and Ensemble Forecasts
 Includes new chapter on Forecasts of Extreme Events and Warnings
 Includes new chapter on Seasonal and Climate Forecasts
 Includes new Appendix on Verification Software
Cover image credit: The triangle of barplots shows a novel use of colour for visualizing probability forecasts of ternary categories – see Fig 6b of Jupp et al. 2011, On the visualisation, verification and recalibration of ternary probabilistic forecasts, Phil. Trans. Roy. Soc. (in press).
Preface xiii
Preface to the First Edition xv
1 Introduction 1
Ian T. Jolliffe and David B. Stephenson
1.1 A brief history and current practice 1
1.1.1 History 1
1.1.2 Current practice 2
1.2 Reasons for forecast verification and its benefits 3
1.3 Types of forecast and verification data 4
1.4 Scores, skill and value 5
1.4.1 Skill scores 6
1.4.2 Artificial skill 6
1.4.3 Statistical significance 7
1.4.4 Value added 8
1.5 Data quality and other practical considerations 8
1.6 Summary 9
2 Basic concepts 11
Jacqueline M. Potts
2.1 Introduction 11
2.2 Types of predictand 11
2.3 Exploratory methods 12
2.4 Numerical descriptive measures 15
2.5 Probability, random variables and expectations 20
2.6 Joint, marginal and conditional distributions 20
2.7 Accuracy, association and skill 22
2.8 Properties of verification measures 22
2.9 Verification as a regression problem 23
2.10 The Murphy–Winkler framework 25
2.11 Dimensionality of the verification problem 28
3 Deterministic forecasts of binary events 31
Robin J. Hogan and Ian B. Mason
3.1 Introduction 31
3.2 Theoretical considerations 33
3.2.1 Some basic descriptive statistics 33
3.2.2 A general framework for verification: the distributions-oriented approach 34
3.2.3 Performance measures in terms of factorizations of the joint distribution 37
3.2.4 Diagrams for visualizing performance measures 38
3.2.5 Case study: verification of cloud-fraction forecasts 41
3.3 Signal detection theory and the ROC 42
3.3.1 The signal detection model 43
3.3.2 The relative operating characteristic (ROC) 44
3.4 Metaverification: criteria for assessing performance measures 45
3.4.1 Desirable properties 45
3.4.2 Other properties 49
3.5 Performance measures 50
3.5.1 Overview of performance measures 51
3.5.2 Sampling uncertainty and confidence intervals for performance measures 55
3.5.3 Optimal threshold probabilities 57
Acknowledgements 59
4 Deterministic forecasts of multicategory events 61
Robert E. Livezey
4.1 Introduction 61
4.2 The contingency table: notation, definitions, and measures of accuracy 62
4.2.1 Notation and definitions 62
4.2.2 Measures of accuracy 64
4.3 Skill scores 64
4.3.1 Desirable attributes 65
4.3.2 Gandin and Murphy equitable scores 66
4.3.3 Gerrity equitable scores 69
4.3.4 LEPSCAT 71
4.3.5 SEEPS 72
4.3.6 Summary remarks on scores 73
4.4 Sampling variability of the contingency table and skill scores 73
5 Deterministic forecasts of continuous variables 77
Michel Déqué
5.1 Introduction 77
5.2 Forecast examples 77
5.3 First-order moments 79
5.3.1 Bias 79
5.3.2 Mean Absolute Error 80
5.3.3 Bias correction and artificial skill 81
5.3.4 Mean absolute error and skill 81
5.4 Second- and higher-order moments 82
5.4.1 Mean Squared Error 82
5.4.2 MSE skill score 82
5.4.3 MSE of scaled forecasts 83
5.4.4 Correlation 84
5.4.5 An example: testing the ‘limit of predictability’ 86
5.4.6 Rank correlations 87
5.4.7 Comparison of moments of the marginal distributions 88
5.4.8 Graphical summaries 90
5.5 Scores based on cumulative frequency 91
5.5.1 Linear Error in Probability Space (LEPS) 91
5.5.2 Quantile-quantile plots 92
5.5.3 Conditional quantile plots 92
5.6 Summary and concluding remarks 94
6 Forecasts of spatial fields 95
Barbara G. Brown, Eric Gilleland and Elizabeth E. Ebert
6.1 Introduction 95
6.2 Matching methods 96
6.3 Traditional verification methods 97
6.3.1 Standard continuous and categorical approaches 97
6.3.2 S1 and anomaly correlation 98
6.3.3 Distributional methods 99
6.4 Motivation for alternative approaches 100
6.5 Neighbourhood methods 103
6.5.1 Comparing neighbourhoods of forecasts and observations 104
6.5.2 Comparing spatial forecasts with point observations 104
6.6 Scale separation methods 105
6.7 Feature-based methods 108
6.7.1 Feature-matching techniques 108
6.7.2 Structure-Amplitude-Location (SAL) technique 110
6.8 Field deformation methods 111
6.8.1 Location metrics 111
6.8.2 Field deformation 112
6.9 Comparison of approaches 113
6.10 New approaches and applications: the future 114
6.11 Summary 116
7 Probability forecasts 119
Jochen Broecker
7.1 Introduction 119
7.2 Probability theory 120
7.2.1 Basic concepts from probability theory 120
7.2.2 Probability forecasts, reliability and sufficiency 121
7.3 Probabilistic scoring rules 122
7.3.1 Definition and properties of scoring rules 122
7.3.2 Commonly used scoring rules 124
7.3.3 Decomposition of scoring rules 125
7.4 The relative operating characteristic (ROC) 126
7.5 Evaluation of probabilistic forecasting systems from data 128
7.5.1 Three examples 128
7.5.2 The empirical ROC 130
7.5.3 The empirical score as a measure of performance 130
7.5.4 Decomposition of the empirical score 131
7.5.5 Binning forecasts and the leave-one-out error 132
7.6 Testing reliability 134
7.6.1 Reliability analysis for forecast A: the reliability diagram 134
7.6.2 Reliability analysis for forecast B: the chi-squared test 136
7.6.3 Reliability analysis for forecast C: the PIT 138
Acknowledgements 139
8 Ensemble forecasts 141
Andreas P. Weigel
8.1 Introduction 141
8.2 Example data 142
8.3 Ensembles interpreted as discrete samples 143
8.3.1 Reliability of ensemble forecasts 144
8.3.2 Multidimensional reliability 152
8.3.3 Discrimination 157
8.4 Ensembles interpreted as probabilistic forecasts 159
8.4.1 Probabilistic interpretation of ensembles 159
8.4.2 Probabilistic skill metrics applied to ensembles 160
8.4.3 Effect of ensemble size on skill 163
8.5 Summary 166
9 Economic value and skill 167
David S. Richardson
9.1 Introduction 167
9.2 The cost/loss ratio decision model 168
9.2.1 Value of a deterministic binary forecast system 169
9.2.2 Probability forecasts 172
9.2.3 Comparison of deterministic and probabilistic binary forecasts 174
9.3 The relationship between value and the ROC 175
9.4 Overall value and the Brier Skill Score 178
9.5 Skill, value and ensemble size 180
9.6 Applications: value and forecast users 182
9.7 Summary 183
10 Deterministic forecasts of extreme events and warnings 185
Christopher A.T. Ferro and David B. Stephenson
10.1 Introduction 185
10.2 Forecasts of extreme events 186
10.2.1 Challenges 186
10.2.2 Previous studies 187
10.2.3 Verification measures for extreme events 189
10.2.4 Modelling performance for extreme events 191
10.2.5 Extreme events: summary 194
10.3 Warnings 195
10.3.1 Background 195
10.3.2 Format of warnings and observations for verification 196
10.3.3 Verification of warnings 197
10.3.4 Warnings: summary 200
Acknowledgements 201
11 Seasonal and longer-range forecasts 203
Simon J. Mason
11.1 Introduction 203
11.2 Forecast formats 204
11.2.1 Deterministic and probabilistic formats 204
11.2.2 Defining the predictand 206
11.2.3 Inclusion of climatological forecasts 206
11.3 Measuring attributes of forecast quality 207
11.3.1 Skill 207
11.3.2 Other attributes 215
11.3.3 Statistical significance and uncertainty estimates 216
11.4 Measuring the quality of individual forecasts 217
11.5 Decadal and longer-range forecast verification 218
11.6 Summary 220
12 Epilogue: new directions in forecast verification 221
Ian T. Jolliffe and David B. Stephenson
12.1 Introduction 221
12.2 Review of key concepts 221
12.3 Forecast evaluation in other disciplines 223
12.3.1 Statistics 223
12.3.2 Finance and economics 225
12.3.3 Medical and clinical studies 226
12.4 Current research and future directions 228
Acknowledgements 230
Appendix: Verification Software 231
Matthew Pocernich
A.1 What is good software? 231
A.1.1 Correctness 232
A.1.2 Documentation 232
A.1.3 Open source/closed source/commercial 232
A.1.4 Large user base 232
A.2 Types of verification users 232
A.2.1 Students 233
A.2.2 Researchers 233
A.2.3 Operational forecasters 233
A.2.4 Institutional use 233
A.3 Types of software and programming languages 233
A.3.1 Spreadsheets 235
A.3.2 Statistical programming languages 235
A.4 Institutional supported software 238
A.4.1 Model Evaluation Tool (MET) 238
A.4.2 Ensemble Verification System (EVS) 239
A.4.3 EUMETCAL Forecast Verification Training Module 239
A.5 Displays of verification information 239
A.5.1 National Weather Service Performance Management 240
A.5.2 Forecast Evaluation Tool 240
Glossary 241
References 251
Index 267