Applied Spatial Statistics for Public Health Data

Lance A. Waller, Carol A. Gotway

ISBN: 978-0-471-66268-6

Aug 2004

520 pages


An application-based introduction to the statistical analysis of spatially referenced health data

Sparked by the growing interest in statistical methods for the analysis of spatially referenced data in the field of public health, Applied Spatial Statistics for Public Health Data fills the need for an introductory, application-oriented text on this timely subject. Written for practicing public health researchers as well as graduate students in related fields, the text provides a thorough introduction to basic concepts and methods in applied spatial statistics, as well as a detailed treatment of some of the more recent methods useful for public health studies that have not been covered elsewhere.

Assuming minimal knowledge of spatial statistics, the authors provide important statistical approaches for assessing such questions as:

  • Are newly occurring cases of a disease "clustered" in space?
  • Do the cases cluster around suspected sources of increased risk, such as toxic waste sites or other environmental hazards?
  • How do we take monitored pollution concentrations measured at specific locations and interpolate them to locations where no measurements were taken?
  • How do we quantify associations between local disease rates and local exposures?

After reviewing traditional statistical methods used in public health research, the text provides an overview of the basic features of spatial data, illustrates various geographic mapping and visualization tools, and describes the sources of publicly available spatial data that might be useful in public health applications.


1 Introduction.

1.1 Why Spatial Data in Public Health?

1.2 Why Statistical Methods for Spatial Data?

1.3 Intersection of Three Fields of Study.

1.4 Organization of the Book.

2 Analyzing Public Health Data.

2.1 Observational vs. Experimental Data.

2.2 Risk and Rates.

2.2.1 Incidence and Prevalence.

2.2.2 Risk.

2.2.3 Estimating Risk: Rates and Proportions.

2.2.4 Relative and Attributable Risks.

2.3 Making Rates Comparable: Standardized Rates.

2.3.1 Direct Standardization.

2.3.2 Indirect Standardization.

2.3.3 Direct or Indirect?

2.3.4 Standardizing to What Standard?

2.3.5 Cautions with Standardized Rates.

2.4 Basic Epidemiological Study Designs.

2.4.1 Prospective Cohort Studies.

2.4.2 Retrospective Case–Control Studies.

2.4.3 Other Types of Epidemiological Studies.

2.5 Basic Analytic Tool: The Odds Ratio.

2.6 Modeling Counts and Rates.

2.6.1 Generalized Linear Models.

2.6.2 Logistic Regression.

2.6.3 Poisson Regression.

2.7 Challenges in the Analysis of Observational Data.

2.7.1 Bias.

2.7.2 Confounding.

2.7.3 Effect Modification.

2.7.4 Ecological Inference and the Ecological Fallacy.

2.8 Additional Topics and Further Reading.

2.9 Exercises.

3 Spatial Data.

3.1 Components of Spatial Data.

3.2 An Odyssey into Geodesy.

3.2.1 Measuring Location: Geographical Coordinates.

3.2.2 Flattening the Globe: Map Projections and Coordinate Systems.

3.2.3 Mathematics of Location: Vector and Polygon Geometry.

3.3 Sources of Spatial Data.

3.3.1 Health Data.

3.3.2 Census-Related Data.

3.3.3 Geocoding.

3.3.4 Digital Cartographic Data.

3.3.5 Environmental and Natural Resource Data.

3.3.6 Remotely Sensed Data.

3.3.7 Digitizing.

3.3.8 Collect Your Own!

3.4 Geographic Information Systems.

3.4.1 Vector and Raster GISs.

3.4.2 Basic GIS Operations.

3.4.3 Spatial Analysis within GIS.

3.5 Problems with Spatial Data and GIS.

3.5.1 Inaccurate and Incomplete Databases.

3.5.2 Confidentiality.

3.5.3 Use of ZIP Codes.

3.5.4 Geocoding Issues.

3.5.5 Location Uncertainty.

4 Visualizing Spatial Data.

4.1 Cartography: The Art and Science of Mapmaking.

4.2 Types of Statistical Maps.

MAP STUDY: Very Low Birth Weights in Georgia Health Care District 9.

4.2.1 Maps for Point Features.

4.2.2 Maps for Areal Features.

4.3 Symbolization.

4.3.1 Map Generalization.

4.3.2 Visual Variables.

4.3.3 Color.

4.4 Mapping Smoothed Rates and Probabilities.

4.4.1 Locally Weighted Averages.

4.4.2 Nonparametric Regression.

4.4.3 Empirical Bayes Smoothing.

4.4.4 Probability Mapping.

4.4.5 Practical Notes and Recommendations.

CASE STUDY: Smoothing New York Leukemia Data.

4.5 Modifiable Areal Unit Problem.

4.6 Additional Topics and Further Reading.

4.6.1 Visualization.

4.6.2 Additional Types of Maps.

4.6.3 Exploratory Spatial Data Analysis.

4.6.4 Other Smoothing Approaches.

4.6.5 Edge Effects.

4.7 Exercises.

5 Analysis of Spatial Point Patterns.

5.1 Types of Patterns.

5.2 Spatial Point Processes.

5.2.1 Stationarity and Isotropy.

5.2.2 Spatial Poisson Processes and CSR.

5.2.3 Hypothesis Tests of CSR via Monte Carlo Methods.

5.2.4 Heterogeneous Poisson Processes.

5.2.5 Estimating Intensity Functions.

DATA BREAK: Early Medieval Grave Sites.

5.3 K Function.

5.3.1 Estimating the K Function.

5.3.2 Diagnostic Plots Based on the K Function.

5.3.3 Monte Carlo Assessments of CSR Based on the K Function.

DATA BREAK: Early Medieval Grave Sites.

5.3.4 Roles of First- and Second-Order Properties.

5.4 Other Spatial Point Processes.

5.4.1 Poisson Cluster Processes.

5.4.2 Contagion/Inhibition Processes.

5.4.3 Cox Processes.

5.4.4 Distinguishing Processes.

5.5 Additional Topics and Further Reading.

5.6 Exercises.

6 Spatial Clusters of Health Events: Point Data for Cases and Controls.

6.1 What Do We Have? Data Types and Related Issues.

6.2 What Do We Want? Null and Alternative Hypotheses.

6.3 Categorization of Methods.

6.4 Comparing Point Process Summaries.

6.4.1 Goals.

6.4.2 Assumptions and Typical Output.

6.4.3 Method: Ratio of Kernel Intensity Estimates.

DATA BREAK: Early Medieval Grave Sites.

6.4.4 Method: Difference between K Functions.

DATA BREAK: Early Medieval Grave Sites.

6.5 Scanning Local Rates.

6.5.1 Goals.

6.5.2 Assumptions and Typical Output.

6.5.3 Method: Geographical Analysis Machine.

6.5.4 Method: Overlapping Local Case Proportions.

DATA BREAK: Early Medieval Grave Sites.

6.5.5 Method: Spatial Scan Statistics.

DATA BREAK: Early Medieval Grave Sites.

6.6 Nearest-Neighbor Statistics.

6.6.1 Goals.

6.6.2 Assumptions and Typical Output.

6.6.3 Method: q Nearest Neighbors of Cases.

CASE STUDY: San Diego Asthma.

6.7 Further Reading.

6.8 Exercises.

7 Spatial Clustering of Health Events: Regional Count Data.

7.1 What Do We Have and What Do We Want?

7.1.1 Data Structure.

7.1.2 Null Hypotheses.

7.1.3 Alternative Hypotheses.

7.2 Categorization of Methods.

7.3 Scanning Local Rates.

7.3.1 Goals.

7.3.2 Assumptions.

7.3.3 Method: Overlapping Local Rates.

DATA BREAK: New York Leukemia Data.

7.3.4 Method: Turnbull et al.’s CEPP.

7.3.5 Method: Besag and Newell Approach.

7.3.6 Method: Spatial Scan Statistics.

7.4 Global Indexes of Spatial Autocorrelation.

7.4.1 Goals.

7.4.2 Assumptions and Typical Output.

7.4.3 Method: Moran’s I.

7.4.4 Method: Geary’s c.

7.5 Local Indicators of Spatial Association.

7.5.1 Goals.

7.5.2 Assumptions and Typical Output.

7.5.3 Method: Local Moran’s I.

7.6 Goodness-of-Fit Statistics.

7.6.1 Goals.

7.6.2 Assumptions and Typical Output.

7.6.3 Method: Pearson’s χ².

7.6.4 Method: Tango’s Index.

7.6.5 Method: Focused Score Tests of Trend.

7.7 Statistical Power and Related Considerations.

7.7.1 Power Depends on the Alternative Hypothesis.

7.7.2 Power Depends on the Data Structure.

7.7.3 Theoretical Assessment of Power.

7.7.4 Monte Carlo Assessment of Power.

7.7.5 Benchmark Data and Conditional Power Assessments.

7.8 Additional Topics and Further Reading.

7.8.1 Related Research Regarding Indexes of Spatial Association.

7.8.2 Additional Approaches for Detecting Clusters and/or Clustering.

7.8.3 Space–Time Clustering and Disease Surveillance.

7.9 Exercises.

8 Spatial Exposure Data.

8.1 Random Fields and Stationarity.

8.2 Semivariograms.

8.2.1 Relationship to Covariance Function and Correlogram.

8.2.2 Parametric Isotropic Semivariogram Models.

8.2.3 Estimating the Semivariogram.

DATA BREAK: Smoky Mountain pH Data.

8.2.4 Fitting Semivariogram Models.

8.2.5 Anisotropic Semivariogram Modeling.

8.3 Interpolation and Spatial Prediction.

8.3.1 Inverse-Distance Interpolation.

8.3.2 Kriging.

CASE STUDY: Hazardous Waste Site Remediation.

8.4 Additional Topics and Further Reading.

8.4.1 Erratic Experimental Semivariograms.

8.4.2 Sampling Distribution of the Classical Semivariogram Estimator.

8.4.3 Nonparametric Semivariogram Models.

8.4.4 Kriging Non-Gaussian Data.

8.4.5 Geostatistical Simulation.

8.4.6 Use of Non-Euclidean Distances in Geostatistics.

8.4.7 Spatial Sampling and Network Design.

8.5 Exercises.

9 Linking Spatial Exposure Data to Health Events.

9.1 Linear Regression Models for Independent Data.

9.1.1 Estimation and Inference.

9.1.2 Interpretation and Use with Spatial Data.

DATA BREAK: Raccoon Rabies in Connecticut.

9.2 Linear Regression Models for Spatially Autocorrelated Data.

9.2.1 Estimation and Inference.

9.2.2 Interpretation and Use with Spatial Data.

9.2.3 Predicting New Observations: Universal Kriging.

DATA BREAK: New York Leukemia Data.

9.3 Spatial Autoregressive Models.

9.3.1 Simultaneous Autoregressive Models.

9.3.2 Conditional Autoregressive Models.

9.3.3 Concluding Remarks on Conditional Autoregressions.

9.3.4 Concluding Remarks on Spatial Autoregressions.

9.4 Generalized Linear Models.

9.4.1 Fixed Effects and the Marginal Specification.

9.4.2 Mixed Models and Conditional Specification.

9.4.3 Estimation in Spatial GLMs and GLMMs.

DATA BREAK: Modeling Lip Cancer Morbidity in Scotland.

9.4.4 Additional Considerations in Spatial GLMs.

CASE STUDY: Very Low Birth Weights in Georgia Health Care District 9.

9.5 Bayesian Models for Disease Mapping.

9.5.1 Hierarchical Structure.

9.5.2 Estimation and Inference.

9.5.3 Interpretation and Use with Spatial Data.

9.6 Parting Thoughts.

9.7 Additional Topics and Further Reading.

9.7.1 General References.

9.7.2 Restricted Maximum Likelihood Estimation.

9.7.3 Residual Analysis with Spatially Correlated Error Terms.

9.7.4 Two-Parameter Autoregressive Models.

9.7.5 Non-Gaussian Spatial Autoregressive Models.

9.7.6 Classical/Bayesian GLMMs.

9.7.7 Prediction with GLMs.

9.7.8 Bayesian Hierarchical Models for Spatial Data.

9.8 Exercises.


Author Index.

Subject Index.

"…a fine textbook for a course on spatial statistics…easy to follow and agreeable to read…an excellent introduction and overview…" (Statistics in Medical Research, August 2006)

"…will be a successful addition to existing literature and foster the application of spatial statistical methods to topics in epidemiology and public health." (Biometrics, December 2005)

"…an interesting and worthwhile read for all practitioners of spatial statistics." (Computers & Geosciences, July 2005)

"…I am pleased to add it to my collection and feel sure that it will be widely read and appreciated." (Journal of the American Statistical Association, June 2005)