Quantitative Methods in Population Health: Extensions of Ordinary Regression
August 2003, ©2003
Explanations are designed to assume as little background in mathematics and statistical theory as possible, except that some knowledge of calculus is necessary for certain parts.
SAS commands (PROC REG, PROC MIXED, and PROC GENMOD) are provided for applying the methods.
All sections contain real-life examples, mostly from epidemiologic research.
The first chapter includes a SAS refresher.
I.1 Newborn Lung Project.
I.2 Wisconsin Diabetes Registry.
I.3 Wisconsin Sleep Cohort Study.
1 Review of Ordinary Linear Regression and Its Assumptions.
1.1 The Ordinary Linear Regression Equation and Its Assumptions.
1.1.1 Straight-Line Relationship.
1.1.2 Equal Variance Assumption.
1.1.3 Normality Assumption.
1.1.4 Independence Assumption.
1.2 A Note on How the Least-Squares Estimators are Obtained.
Output Packet I: Examples of Ordinary Regression Analyses.
2 The Maximum Likelihood Approach to Ordinary Regression.
2.1 Maximum Likelihood Estimation.
2.3 Properties of Maximum Likelihood Estimators.
2.4 How to Obtain a Residual Plot with PROC MIXED.
Output Packet II: Using PROC MIXED and Comparisons to PROC REG.
3 Reformulating Ordinary Regression Analysis in Matrix Notation.
3.1 Writing the Ordinary Regression Equation in Matrix Notation.
3.2 Obtaining the Least-Squares Estimator β in Matrix Notation.
3.2.1 Example: Matrices in Regression Analysis.
3.3 List of Matrix Operations to Know.
4 Variance Matrices and Linear Transformations.
4.1 Variance and Correlation Matrices.
4.2 How to Obtain the Variance of a Linear Transformation.
4.2.1 Two Variables.
4.2.2 Many Variables.
5 Variance Matrices of Estimators of Regression Coefficients.
5.1 Usual Standard Error of Least-Squares Estimator of Regression Slope in Nonmatrix Formulation.
5.2 Standard Errors of Least-Squares Regression Estimators in Matrix Notation.
5.3 The Large Sample Variance Matrix of Maximum Likelihood Estimators.
5.4 Tests and Confidence Intervals.
5.4.1 Example: Comparing PROC REG and PROC MIXED.
6 Dealing with Unequal Variance Around the Regression Line.
6.1 Ordinary Least Squares with Unequal Variance.
6.2 Analysis Taking Unequal Variance into Account.
6.2.1 The Functional Transformation Approach.
6.2.2 The Linear Transformation Approach.
6.2.3 Standard Errors of Weighted Regression Estimators.
Output Packet III: Applying the Empirical Option to Adjust Standard Errors.
Output Packet IV: Analyses with Transformation of the Outcome Variable to Equalize Residual Variance.
Output Packet V: Weighted Regression Analyses of GHb Data on Age.
7 Application of Weighting with Probability Sampling and Nonresponse.
7.1 Sample Surveys with Unequal Probability Sampling.
7.2 Examining the Impact of Nonresponse.
7.2.1 Example (of Reweighting as Well as Some SAS Manipulations).
7.2.2 A Few Comments on Weighting by a Variable Versus Including it in the Regression Model.
Output Packet VI: Survey and Missing Data Weights.
8 Principles in Dealing with Correlated Data.
8.1 Analysis of Correlated Data by Ordinary Unweighted Least-Squares Estimation.
8.1.2 Deriving the Variance Estimator.
8.2 Specifying Correlation and Variance Matrices.
8.3 The Least-Squares Equation Incorporating Correlation.
8.3.1 Another Application of the Spectral Theorem.
8.4 Applying the Spectral Theorem to the Regression Analysis of Correlated Data.
8.5 Analysis of Correlated Data by Maximum Likelihood.
8.5.1 Nonequal Variance.
8.5.2 Correlated Errors.
Output Packet VII: Analysis of Longitudinal Data in Wisconsin Sleep Cohort.
9 A Further Study of How the Transformation Works with Correlated Data.
9.1 Why Would βW and βB Differ?
9.2 How the Between- and Within-Individual Estimators are Combined.
9.3 How to Proceed in Practice.
Output Packet VIII: Investigating and Fitting Within- and Between-Individual Effects.
10 Random Effects.
10.1 Random Intercept.
10.2 Random Slopes.
10.3 Obtaining “The Best” Estimates of Individual Intercepts and Slopes.
Output Packet IX: Fitting Random Effects Models.
11 The Normal Distribution and Likelihood Revisited.
11.1 PROC GENMOD.
Output Packet X: Introducing PROC GENMOD.
12 The Generalization to Non-normal Distributions.
12.1 The Exponential Family.
12.1.1 The Binomial Distribution.
12.1.2 The Poisson Distribution.
12.2 Score Equations for the Exponential Family and the Canonical Link.
12.3 Other Link Functions.
13 Modeling Binomial and Binary Outcomes.
13.1 A Brief Review of Logistic Regression.
13.1.1 Example: Review of the Output from PROC LOGIST.
13.2 Analysis of Binomial Data in the Generalized Linear Models Framework.
13.2.1 Example of Logistic Regression with Binary Outcome.
13.2.2 Example with Binomial Outcome.
13.2.3 Some More Examples of Goodness-of-Fit Tests.
13.3 Other Links for Binary and Binomial Data.
Output Packet XI: Logistic Regression Analysis with PROC LOGIST and PROC GENMOD.
Output Packet XII: Analysis of Grouped Binomial Data.
Output Packet XIII: Some Goodness-of-Fit Tests for Binomial Outcome.
Output Packet XIV: Three Link Functions for Binary Outcome.
14 Modeling Poisson Outcomes—The Analysis of Rates.
14.1 Review of Rates.
14.1.1 Relationship Between Rate and Risk.
14.2 Regression Analysis.
14.3 Example with Cancer Mortality Rates.
14.3.1 Example with Hospitalization of Infants.
14.4.1 Fitting a Dispersion Parameter.
14.4.2 Fitting a Different Distribution.
14.4.3 Using Robust Standard Errors.
14.4.4 Applying Adjustments for Over Dispersion to the Examples.
Output Packet XV: Poisson Regression.
Output Packet XVI: Dealing with Overdispersion in Rates.
15 Modeling Correlated Outcomes with Generalized Estimating Equations.
15.1 A Brief Review and Reformulation of the Normal Distribution, Least Squares and Likelihood.
15.2 Further Developments for the Exponential Family.
15.3 How are the Generalized Estimating Equations Justified?
15.3.1 Analysis of Longitudinal Systolic Blood Pressure by PROC MIXED and GENMOD.
15.3.2 Analysis of Longitudinal Hypertension Data by PROC GENMOD.
15.3.3 Analysis of Hospitalizations Among VLBW Children Up to Age 5.
15.4 Another Way to Deal with Correlated Binary Data.
Output Packet XVII: Mixed Versus GENMOD for Longitudinal SBP and Hypertension Data.
Output Packet XVIII: Longitudinal Analysis of Rates.
Output Packet XIX: Conditional Logistic Regression of Hypertension Data.
Appendix: Matrix Operations.
A.1 Adding Matrices.
A.2 Multiplying Matrices by a Number.
A.3 Multiplying Matrices by Each Other.
A.4 The Inverse of a Matrix.
"The book is well written…a timely book that appears to cover a gap in existing literature." (Journal of the American Statistical Association, June 2005)
“…provides an accessible guide for students in an applied statistics sequence as well as for practising researchers and professionals...” (Zentralblatt Math, Vol. 1038, No. 13, 2004)
“It is highly recommended for academic and research libraries supporting programs of demography, public health, and other interdisciplinary programs related to population health.” (E-STREAMS, August 2004)
“...assembles the information...investigators need most often in the course of several long-term population-based observational studies.” (Quarterly of Applied Mathematics, Vol. LXII, No. 1, March 2004)
"...this book...provides the most pages of illustrations relative to pages of text of any book that I can recall...a fantastic book for practitioners..." (Technometrics, Vol. 46, No. 1, February 2004)