Statistical Rules of Thumb, 2nd Edition
"For a beginner [this book] is a treasure trove; for an experienced person it can provide new ideas on how better to pursue the subject of applied statistics."
—Journal of Quality Technology
Sensibly organized for quick reference, Statistical Rules of Thumb, Second Edition compiles simple rules that are widely applicable, robust, and elegant, and each captures key statistical concepts. This unique guide to the use of statistics for designing, conducting, and analyzing research studies illustrates real-world statistical applications through examples from fields such as public health and environmental studies. Along with an insightful discussion of the reasoning behind every technique, this easy-to-use handbook also conveys the various possibilities statisticians must think of when designing and conducting a study or analyzing its data.
Each chapter presents clearly defined rules related to inference, covariation, experimental design, consultation, and data representation, and each rule is organized and discussed under five succinct headings: introduction; statement and illustration of the rule; the derivation of the rule; a concluding discussion; and exploration of the concept's extensions. The author also introduces new rules of thumb for topics such as sample size for ratio analysis, absolute and relative risk, ANCOVA cautions, and dichotomization of continuous variables. Additional features of the Second Edition include:
- Additional rules on Bayesian topics
- New chapters on observational studies and Evidence-Based Medicine (EBM)
- Additional emphasis on variation and causation
- Updated material with new references, examples, and sources
A related website (www.vanbelle.org) provides a rich learning environment and contains additional rules, presentations by the author, and a message board where readers can share their own strategies and discoveries. Statistical Rules of Thumb, Second Edition is an ideal supplementary book for courses in experimental design and survey research methods at the upper-undergraduate and graduate levels. It also serves as an indispensable reference for statisticians, researchers, consultants, and scientists who would like to develop an understanding of the statistical foundations of their research efforts.
Preface to the First Edition.
1. The Basics.
1.1 Four Basic Questions.
1.2 Observation is Selection.
1.3 Replicate to Characterize Variability.
1.4 Variability Occurs at Multiple Levels.
1.5 Invalid Selection is the Primary Threat to Valid Inference.
1.6 There is Variation in Strength of Inference.
1.7 Distinguish Randomized and Observational Studies.
1.8 Beware of Linear Models.
1.9 Keep Models As Simple As Possible, But Not More Simple.
1.10 Understand Omnibus Quantities.
1.11 Do Not Multiply Probabilities More Than Necessary.
1.12 Use Two-sided p-Values.
1.13 p-Values for Sample Size, Confidence Intervals for Results.
1.14 At Least Twelve Observations for a Confidence Interval.
1.15 Estimate ± Two Standard Errors is Remarkably Robust.
1.16 Know the Unit of the Variable.
1.17 Be Flexible About Scale of Measurement Determining Analysis.
1.18 Be Eclectic and Ecumenical in Inference.
2. Sample Size.
2.1 Begin with a Basic Formula for Sample Size: Lehr’s Equation.
2.2 Calculating Sample Size Using the Coefficient of Variation.
2.3 No Finite Population Correction for Survey Sample Size.
2.4 Standard Deviation and Sample Range.
2.5 Do Not Formulate a Study Solely in Terms of Effect Size.
2.6 Overlapping Confidence Intervals Do Not Imply Nonsignificance.
2.7 Sample Size Calculation for the Poisson Distribution.
2.8 Sample Size for Poisson with Background Rate.
2.9 Sample Size Calculation for the Binomial Distribution.
2.10 When Unequal Sample Sizes Matter; When They Don’t.
2.11 Sample Size With Different Costs for the Two Samples.
2.12 The Rule of Threes for 95% Upper Bounds When There Are No Events.
2.13 Sample Size Calculations Are Determined by the Analysis.
3. Observational Studies.
3.1 The Model for an Observational Study is the Sample Survey.
3.2 Large Sample Size Does Not Guarantee Validity.
3.3 Good Observational Studies Are Designed.
3.4 To Establish Cause and Effect Requires Longitudinal Data.
3.5 Make Theories Elaborate to Establish Cause and Effect.
3.6 The Hill Guidelines Are a Useful Guide to Show Cause and Effect.
3.7 Sensitivity Analyses Assess Model Uncertainty and Missing Data.
4. Covariation.
4.1 Assessing and Describing Covariation.
4.2 Don’t Summarize Regression Sampling Schemes with Correlation.
4.3 Do Not Correlate Rates or Ratios Indiscriminately.
4.4 Determining Sample Size to Estimate a Correlation.
4.5 Pairing Data is not Always Good.
4.6 Go Beyond Correlation in Drawing Conclusions.
4.7 Agreement As Accuracy, Scale Differential, and Precision.
4.8 Assess Test Reliability by Means of Agreement.
4.9 Range of the Predictor Variable and Regression.
4.10 Measuring Change: Width More Important than Numbers.
5. Environmental Studies.
5.1 Begin with the Lognormal Distribution in Environmental Studies.
5.2 Differences Are More Symmetrical.
5.3 Know the Sample Space for Statements of Risk.
5.4 Beware of Pseudoreplication.
5.5 Think Beyond Simple Random Sampling.
5.6 The Size of the Population and Small Effects.
5.7 Models of Small Effects Are Sensitive to Assumptions.
5.8 Distinguish Between Variability and Uncertainty.
5.9 Description of the Database is As Important as Its Data.
5.10 Always Assess the Statistical Basis for an Environmental Standard.
5.11 Measurement of a Standard and Policy.
5.12 Parametric Analyses Make Maximum Use of the Data.
5.13 Confidence, Prediction, and Tolerance Intervals.
5.14 Statistics and Risk Assessment.
5.15 Exposure Assessment is the Weak Link in Assessing Health Effects of Pollutants.
5.16 Assess the Errors in Calibration Due to Inverse Regression.
6. Epidemiology.
6.1 Start with the Poisson to Model Incidence or Prevalence.
6.2 The Odds Ratio Approximates the Relative Risk Assuming the Disease is Rare.
6.3 The Number of Events is Crucial in Estimating Sample Size.
6.4 Use a Logarithmic Formulation to Calculate Sample Size.
6.5 Take No More than Four or Five Controls per Case.
6.6 Obtain at Least Ten Subjects for Every Variable Investigated.
6.7 Begin with the Exponential Distribution to Model Time to Event.
6.8 Begin with Two Exponentials for Comparing Survival Times.
6.9 Be Wary of Surrogates.
6.10 Prevalence Dominates in Screening Rare Diseases.
6.11 Do Not Dichotomize Unless Absolutely Necessary.
6.12 Additive and Multiplicative Models.
7. Evidence-Based Medicine.
7.1 Strength of Evidence.
7.2 Relevance of Information: POEM vs. DOE.
7.3 Begin with Absolute Risk Reduction, then follow with Relative Risk.
7.4 The Number Needed to Treat (NNT) is Clinically Useful.
7.5 Variability in Response to Treatment Needs to be Considered.
7.6 Safety is the Weak Component of EBM.
7.7 Intent to Treat is the Default Analysis.
7.8 Use Prior Information but not Priors.
7.9 The Four Key Questions for Meta-analysis.
8. Design, Conduct, and Analysis.
8.1 Randomization Puts Systematic Effects into the Error Term.
8.2 Blocking is the Key to Reducing Variability.
8.3 Factorial Designs and Joint Effects.
8.4 High-Order Interactions Occur Rarely.
8.5 Balanced Designs Allow Easy Assessment of Joint Effects.
8.6 Analysis Follows Designs.
8.7 Independence, Equal Variance, and Normality.
8.8 Plan to Graph the Results of an Analysis.
8.9 Distinguish Between Design Structure and Treatment Structure.
8.10 Make Hierarchical Analyses the Default Analysis.
8.11 Distinguish Between Nested and Crossed Designs: Not Always Easy.
8.12 Plan for Missing Data.
8.13 Address Multiple Comparisons Before Starting the Study.
8.14 Know Properties Preserved When Transforming Units.
8.15 Consider Bootstrapping for Complex Relationships.
9. Words, Tables, and Graphs.
9.1 Use Text for a Few Numbers, Tables for Many Numbers, Graphs for Complex Relationships.
9.2 Arrange Information in a Table to Drive Home the Message.
9.3 Always Graph the Data.
9.4 Always Graph Results of An Analysis of Variance.
9.5 Never Use a Pie Chart.
9.6 Bar Graphs Waste Ink; They Don’t Illuminate Complex Relationships.
9.7 Stacked Bar Graphs Are Worse Than Bar Graphs.
9.8 Three-Dimensional Bar Graphs Constitute Misdirected Artistry.
9.9 Identify Cross-sectional and Longitudinal Patterns in Longitudinal Data.
9.10 Use Rendering, Manipulation, and Linking in High-Dimensional Data.
10. Consulting.
10.1 Session Has Beginning, Middle, and End.
10.2 Ask Questions.
10.3 Make Distinctions.
10.4 Know Yourself, Know the Investigator.
10.5 Tailor Advice to the Level of the Investigator.
10.6 Use Units the Investigator is Comfortable With.
10.7 Agree on Assignment of Responsibilities.
10.8 Any Basic Statistical Computing Package Will Do.
10.9 Ethics Precedes, Guides, and Follows Consultation.
10.10 Be Proactive in Statistical Consulting.
10.11 Use the Web for Reference, Resource, and Education.
10.12 Listen to, and Heed, the Advice of Experts in the Field.
New rules have been incorporated into the new edition addressing topics such as absolute and relative risk; ANCOVA cautions; dichotomization issues; and observational studies and clinical trials.
A complete section on Bayesian topics has been added to the second edition.
The references have been completely updated and expanded.
Each rule of thumb has a brief presentation, a simple statement of the rule, illustrations, theoretical underpinnings, and extensions.
A related website (www.vanbelle.org) provides additional rules, author presentations and more.
Additional emphasis has been placed on variation and causation throughout the book.
"For the applied researcher who does much of her or his own data analysis, this book is a must-have. Even the applied statistician would benefit from owning a copy of this collection. It is certain that some 'rules' will be new, and the descriptions in the text can come in quite handy when one is trying to explain a concept to a non-statistician. In short, this collection of 'rules' is highly recommended." (MAA Reviews, December 10, 2008)
"The first edition was masterful, the second is beyond wonderful. First-edition topics have been updated; new chapters on observational studies and evidence-based medicine broaden and deepen impact. A must read for all who produce or read quantitative studies."
—Thomas A. Louis, PhD, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health