Handbook of Statistical Data Editing and Imputation

Ton de Waal, Jeroen Pannekoek, Sander Scholtus

ISBN: 978-0-470-90483-1

Mar 2011

432 pages



A practical, one-stop reference on the theory and applications of statistical data editing and imputation techniques

Collected survey data are vulnerable to error. In particular, the data collection stage is a potential source of errors and missing values. The important role of statistical data editing, and the amount of resources it involves, have therefore motivated considerable research to make the process more efficient and effective. Handbook of Statistical Data Editing and Imputation equips readers with the essential statistical procedures for detecting and correcting inconsistencies and filling in missing values with estimates. The authors supply an easily accessible treatment of the existing methodology in this field, featuring an overview of common errors encountered in practice and techniques for resolving them.

The book begins with an overview of methods and strategies for statistical data editing and imputation. Subsequent chapters provide detailed treatment of the central theoretical methods and modern applications, with topics including:

  • Localization of errors in continuous data, with an outline of selective editing strategies, automatic editing for systematic and random errors, and other relevant state-of-the-art methods

  • Extensions of automatic editing to categorical data and integer data

  • The basic framework for imputation, with a breakdown of key methods and models and a comparison of imputation with the weighting approach to correct for missing values

  • More advanced imputation methods, including imputation under edit constraints

Throughout the book, the treatment of each topic is presented in a uniform fashion. Following an introduction, each chapter presents the key theories and formulas underlying the topic and then illustrates common applications. The discussion concludes with a summary of the main concepts and a real-world example that incorporates realistic data along with professional insight into common challenges and best practices.

Handbook of Statistical Data Editing and Imputation is an essential reference for survey researchers working in the fields of business, economics, government, and the social sciences who gather, analyze, and draw conclusions from data. It is also a suitable supplement for courses on survey methods at the upper-undergraduate and graduate levels.


1 Introduction to statistical data editing and imputation.

1.1 Introduction.

1.2 Statistical data editing and imputation in the statistical process.

1.3 Data, errors, missing data and edits.

1.4 Basic methods for statistical data editing and imputation.

1.5 An edit and imputation strategy.

2 Methods for deductive correction.

2.1 Introduction.

2.2 Theory and applications.

2.3 Examples.

2.4 Summary.

3 Automatic editing of continuous data.

3.1 Introduction.

3.2 Automatic error localisation of random errors.

3.3 Aspects of the Fellegi-Holt paradigm.

3.4 Algorithms based on the Fellegi-Holt paradigm.

3.5 Summary.

4 Automatic editing: extensions to categorical data.

4.1 Introduction.

4.2 The error localisation problem for mixed data.

4.3 The Fellegi-Holt approach.

4.4 A branch-and-bound algorithm for automatic editing of mixed data.

4.5 The Nearest-neighbour Imputation Methodology.

5 Automatic editing: extensions to integer data.

5.1 Introduction.

5.2 An illustration of the error localisation problem for integer data.

5.3 Fourier-Motzkin elimination in integer data.

5.4 Error localisation in categorical, continuous and integer data.

5.5 A heuristic procedure.

5.6 Computational results.

5.7 Discussion.

6 Selective editing.

6.1 Introduction.

6.2 Historical notes.

6.3 Micro-selection: the score function approach.

6.4 Selection at macro-level.

6.5 Interactive editing.

6.6 Summary and conclusions.

7 Imputation.

7.1 Introduction.

7.2 General issues in applying imputation methods.

7.3 Regression imputation.

7.4 Ratio imputation.

7.5 (Group) mean imputation.

7.6 Hot deck donor imputation.

7.7 A general imputation model.

7.8 Imputation of longitudinal data.

7.9 Approaches to variance estimation with imputed data.

7.10 Fractional imputation.

8 Multivariate imputation.

8.1 Introduction.

8.2 Multivariate imputation models.

8.3 Maximum likelihood estimation in the presence of missing data.

8.4 Example: the public libraries.

9 Imputation under edit constraints.

9.1 Introduction.

9.2 Deductive imputation.

9.3 The ratio hot deck method.

9.4 Imputing from a Dirichlet distribution.

9.5 Imputing from a singular normal distribution.

9.6 An imputation approach based on Fourier-Motzkin elimination.

9.7 A sequential regression approach.

9.8 Calibrated imputation of numerical data under linear edit restrictions.

9.9 Calibrated hot deck imputation subject to edit restrictions.

10 Adjustment of imputed data.

10.1 Introduction.

10.2 Adjustment of numerical variables.

10.3 Adjustment of mixed continuous and categorical data.

11 Practical applications.

11.1 Introduction.

11.2 Automatic editing of environmental costs.

11.3 The EUREDIT project: an evaluation study.

11.4 Selective editing in the Dutch Agricultural Census.