Description
There is more statistical data produced in today’s society than ever before. This data is analysed and cross-referenced for innumerable reasons. However, many data sets share no common units, which makes them harder to combine and therefore harder to draw any meaningful inference from. Statistical matching addresses exactly this problem: it is the art of combining information from different sources (particularly sample surveys) that contain no common units. In response to the modern influx of data, it is an area of rapidly growing interest and complexity. Statistical Matching: Theory and Practice introduces the basics of statistical matching, before going on to offer a detailed, up-to-date overview of the methods used and an examination of their practical applications.
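To make the idea of combining sources with no common units concrete, here is a minimal sketch of one of the nonparametric micro methods listed in the contents, distance hot deck (Section 2.4.3). It assumes a donor file A that observes the matching variables X and a target variable Y, and a recipient file B that observes X and another variable Z; each record in B borrows Y from the donor in A closest on X. Using only X to link the files implicitly relies on the Conditional Independence Assumption (Y and Z independent given X). The function name `distance_hot_deck`, the choice of Euclidean distance and the toy data are all illustrative assumptions, not the book's own R code.

```python
import math

def distance_hot_deck(x_donor, y_donor, x_recipient):
    """Hypothetical helper: unconstrained nearest-neighbour distance hot deck.

    x_donor / x_recipient are sequences of tuples holding the matching
    variables X; y_donor holds Y for each donor. A donor may be reused.
    Returns (imputed Y values, index of the chosen donor) per recipient.
    """
    imputed, donor_idx = [], []
    for xr in x_recipient:
        # Pick the donor with the smallest Euclidean distance on X
        # (ties go to the first such donor, as min() does).
        j = min(range(len(x_donor)), key=lambda i: math.dist(x_donor[i], xr))
        imputed.append(y_donor[j])
        donor_idx.append(j)
    return imputed, donor_idx

# Toy example (made-up numbers): file A observes (X, Y), file B observes (X, Z).
x_a = [(1.0,), (2.0,), (3.0,), (4.0,)]
y_a = [10.0, 20.0, 30.0, 40.0]
x_b = [(1.1,), (3.9,), (2.6,)]
z_b = [5.0, 7.0, 6.0]

y_hat, donors = distance_hot_deck(x_a, y_a, x_b)
# Synthetic file: every record of B now carries X, the imputed Y, and Z.
synthetic = [(x[0], y, z) for x, y, z in zip(x_b, y_hat, z_b)]
```

Here each recipient takes Y from donors 0, 3 and 2 respectively. Constrained variants, which limit how often a donor may be reused, and the "matching noise" that such imputations introduce are among the refinements the book discusses.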
- Presents a unified framework for both theoretical and practical aspects of statistical matching.
- Provides a detailed description covering all the steps needed to perform statistical matching.
- Contains a critical overview of the available statistical matching methods.
- Discusses all the major issues in detail, such as the Conditional Independence Assumption and the assessment of uncertainty.
- Includes numerous examples and applications, enabling the reader to apply the methods in their own work.
- Features an appendix detailing algorithms written in the R language.
Statistical Matching: Theory and Practice presents a comprehensive exploration of an increasingly important area. Ideal for researchers in national statistics institutes and applied statisticians, it will also prove to be an invaluable text for scientists and researchers from all disciplines engaged in the multivariate analysis of data collected from different sources.
1 The Statistical Matching Problem.
1.2 The Statistical Framework.
1.3 The Missing Data Mechanism in the Statistical Matching Problem.
1.4 Accuracy of a Statistical Matching Procedure.
1.4.1 Model assumptions.
1.4.2 Accuracy of the estimator.
1.4.3 Representativeness of the synthetic file.
1.4.4 Accuracy of estimators applied on the synthetic data set.
1.5 Outline of the Book.
2 The Conditional Independence Assumption.
2.1 The Macro Approach in a Parametric Setting.
2.1.1 Univariate normal distributions case.
2.1.2 The multinormal case.
2.1.3 The multinomial case.
2.2 The Micro (Predictive) Approach in the Parametric Framework.
2.2.1 Conditional mean matching.
2.2.2 Draws based on conditional predictive distributions.
2.2.3 Representativeness of the predicted files.
2.3 Nonparametric Macro Methods.
2.4 The Nonparametric Micro Approach.
2.4.1 Random hot deck.
2.4.2 Rank hot deck.
2.4.3 Distance hot deck.
2.4.4 The matching noise.
2.5 Mixed Methods.
2.5.1 Continuous variables.
2.5.2 Categorical variables.
2.6 Comparison of Some Statistical Matching Procedures under the CIA.
2.7 The Bayesian Approach.
2.8 Other Identifiable Models.
2.8.1 The pairwise independence assumption.
2.8.2 Finite mixture models.
3 Auxiliary Information.
3.1 Different Kinds of Auxiliary Information.
3.2 Parametric Macro Methods.
3.2.1 The use of a complete third file.
3.2.2 The use of an incomplete third file.
3.2.3 The use of information on inestimable parameters.
3.2.4 The multinormal case.
3.2.5 Comparison of different regression parameter estimators through simulation.
3.2.6 The multinomial case.
3.3 Parametric Predictive Approaches.
3.4 Nonparametric Macro Methods.
3.5 The Nonparametric Micro Approach with Auxiliary Information.
3.6 Mixed Methods.
3.6.1 Continuous variables.
3.6.2 Comparison between some mixed methods.
3.6.3 Categorical variables.
3.7 Categorical Constrained Techniques.
3.7.1 Auxiliary micro information and categorical constraints.
3.7.2 Auxiliary information in the form of categorical constraints.
3.8 The Bayesian Approach.
4 Uncertainty in Statistical Matching.
4.2 A Formal Definition of Uncertainty.
4.3 Measures of Uncertainty.
4.3.1 Uncertainty in the normal case.
4.3.2 Uncertainty in the multinomial case.
4.4 Estimation of Uncertainty.
4.4.1 Maximum likelihood estimation of uncertainty in the multinormal case.
4.4.2 Maximum likelihood estimation of uncertainty in the multinomial case.
4.5 Reduction of Uncertainty: Use of Parameter Constraints.
4.5.1 The multinomial case.
4.6 Further Aspects of Maximum Likelihood Estimation of Uncertainty.
4.7 An Example with Real Data.
4.8 Other Approaches to the Assessment of Uncertainty.
4.8.1 The consistent approach.
4.8.2 The multiple imputation approach.
4.8.3 The de Finetti coherence approach.
5 Statistical Matching and Finite Populations.
5.1 Matching Two Archives.
5.1.1 Definition of the CIA.
5.2 Statistical Matching and Sampling from a Finite Population.
5.3 Parametric Methods under the CIA.
5.3.1 The macro approach when the CIA holds.
5.3.2 The predictive approach.
5.4 Parametric Methods when Auxiliary Information is Available.
5.4.1 The macro approach.
5.4.2 The predictive approach.
5.5 File Concatenation.
5.6 Nonparametric Methods.
6 Issues in Preparing for Statistical Matching.
6.1 Reconciliation of Concepts and Definitions of Two Sources.
6.1.1 Reconciliation of biased sources.
6.1.2 Reconciliation of inconsistent definitions.
6.2 How to Choose the Matching Variables.
7.2 Case Study: The Social Accounting Matrix.
7.2.1 Harmonization step.
7.2.2 Modelling the social accounting matrix.
7.2.3 Choosing the matching variables.
7.2.4 The SAM under the CIA.
7.2.5 The SAM and auxiliary information.
7.2.6 Assessment of uncertainty for the SAM.
A Statistical Methods for Partially Observed Data.
A.1 Maximum Likelihood Estimation with Missing Data.
A.1.1 Missing data mechanisms.
A.1.2 Maximum likelihood and ignorable nonresponse.
A.2 Bayesian Inference with Missing Data.
B Loglinear Models.
B.1 Maximum Likelihood Estimation of the Parameters.
C Distance Functions.
D Finite Population Sampling.
E R Code.
E.1 The R Environment.
E.2 R Code for Nonparametric Methods.
E.3 R Code for Parametric and Mixed Methods.
E.4 R Code for the Study of Uncertainty.
E.5 Other R Functions.
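The uncertainty theme of Chapter 4 can be illustrated with a small calculation for the normal case. This sketch assumes a trivariate normal (X, Y, Z) where file A observes (X, Y) and file B observes (X, Z). The correlations rho_XY and rho_XZ are then estimable, but rho_YZ is not. The only restriction the data impose is that the correlation matrix be positive semidefinite, which confines rho_YZ to the interval rho_XY*rho_XZ ± sqrt((1 - rho_XY^2)(1 - rho_XZ^2)). The CIA (zero partial correlation of Y and Z given X) selects the midpoint rho_XY*rho_XZ. The function name `rho_yz_bounds` is hypothetical, and the interval width is only one simple way to see what the data leave undetermined, not necessarily the measure of uncertainty adopted in Section 4.3.

```python
import math

def rho_yz_bounds(rho_xy, rho_xz):
    """Hypothetical helper: (lower, CIA value, upper) for rho_YZ.

    Derived from requiring det(correlation matrix) >= 0, i.e.
    1 - rho_xy^2 - rho_xz^2 - rho_yz^2 + 2*rho_xy*rho_xz*rho_yz >= 0.
    """
    if not (-1.0 <= rho_xy <= 1.0 and -1.0 <= rho_xz <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    centre = rho_xy * rho_xz  # value implied by the CIA
    half_width = math.sqrt((1 - rho_xy**2) * (1 - rho_xz**2))
    return centre - half_width, centre, centre + half_width

lower, cia_value, upper = rho_yz_bounds(0.8, 0.6)
```

With rho_XY = 0.8 and rho_XZ = 0.6, the result is approximately (0.0, 0.48, 0.96). Any correlation between Y and Z in that range is compatible with the two files, and the CIA simply picks 0.48. Strongly informative matching variables (correlations near ±1) shrink the interval, which is why the choice of matching variables matters.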
"My compliments to the authors for making these (seemingly) arcane ideas available to a whole new generation of statisticians and economists." (Journal of the American Statistical Association, September 2007)