# Understanding Biostatistics

ISBN: 978-0-470-66636-4
388 pages
April 2011

## Description

Understanding Biostatistics looks at the fundamentals of biostatistics, using elementary statistics to explore the nature of statistical tests.

This book is intended to complement first-year statistics and biostatistics textbooks. The main focus here is on ideas, rather than on methodological details. Basic concepts are illustrated with representations from history, followed by technical discussions on what different statistical methods really mean. Graphics are used extensively throughout the book in order to introduce mathematical formulae in an accessible way.

Key features:

• Discusses confidence intervals and p-values in terms of confidence functions.
• Explains basic statistical methodology through graphics rather than mathematical formulae, whilst highlighting the mathematical basis of biostatistics.
• Examines the problems of estimating parameters in statistical models and the similarities between different models.
• Provides an extensive discussion on the position of statistics within the medical scientific process.
• Discusses distribution functions, including the Gaussian distribution and its importance in biostatistics.

This book will be useful for biostatisticians with little mathematical background, as well as for those who want to understand the connections between biostatistics and mathematical issues.

Preface ix

1 Statistics and medical science 1

1.1 Introduction 1

1.2 On the nature of science 3

1.3 How the scientific method uses statistics 5

1.4 Finding an outcome variable to assess your hypothesis 7

1.5 How we draw medical conclusions from statistical results 8

1.6 A few words about probabilities 13

1.7 The need for honesty: the multiplicity issue 16

1.8 Prespecification and p-value history 19

1.9 Adaptive designs: controlling the risks in an experiment 21

1.10 The elusive concept of probability 23

References 27

2 Observational studies and the need for clinical trials 29

2.1 Introduction 29

2.2 Investigations of medical interventions and risk factors 29

2.3 Observational studies and confounders 33

2.4 The experimental study 39

2.5 Population risks and individual risks 42

2.6 Confounders, Simpson’s paradox and stratification 44

2.7 On incidence and prevalence in epidemiology 51

References 54

3 Study design and the bias issue 57

3.1 Introduction 57

3.2 What bias is all about 58

3.3 The need for a representative sample: on selection bias 58

3.4 Group comparability and randomization 61

3.5 Information bias in a cohort study 65

3.6 The study, or placebo, effect 68

3.7 The curse of missing values 70

3.8 Approaches to data analysis: avoiding self-inflicted bias 75

3.9 On meta-analysis and publication bias 79

References 82

4 The anatomy of a statistical test 85

4.1 Introduction 85

4.2 Statistical tests, medical diagnosis and Roman law 85

4.3 The risks with medical diagnosis 87

4.3.1 Medical diagnosis based on a single test 87

4.3.2 Bayes’ theorem and the use and misuse of screening tests 89

4.4 The law: a non-quantitative analogue 91

4.5 Risks in statistical testing 93

4.5.1 Does tonsillectomy increase the risk of Hodgkin’s lymphoma? 93

4.5.2 General discussion about statistical tests 98

4.6 Making statements about a binomial parameter 101

4.6.1 The frequentist approach 101

4.6.2 The Bayesian approach 104

4.7 The bell-shaped error distribution 109

References 113

4.A Appendix: The evolution of the central limit theorem 115

5 Learning about parameters, and some notes on planning 119

5.1 Introduction 119

5.2 Test statistics described by parameters 120

5.3 How we describe our knowledge about a parameter from an experiment 122

5.4 Statistical analysis of two proportions 127

5.4.1 Some ways to compare two proportions 127

5.4.2 Analysis of the group difference 130

5.5 Adjusting for confounders in the analysis 133

5.6 The power curve of an experiment 138

5.7 Some confusing aspects of power calculations 143

References 145

5.A Appendix: Some technical comments 146

5.A.1 The non-central hypergeometric distribution and 2 × 2 tables 146

5.A.2 The gamma and χ² distributions 147

6 Empirical distribution functions 149

6.1 Introduction 149

6.2 How to describe the distribution of a sample 149

6.3 Describing the sample: descriptive statistics 153

6.4 Population distribution parameters 156

6.5 Confidence in the CDF and its parameters 158

6.6 Analysis of paired data 162

6.7 Bootstrapping 163

6.8 Meta-analysis and heterogeneity 166

References 170

6.A Appendix: Some technical comments 171

6.A.1 The extended family of the univariate Gaussian distributions 171

6.A.2 The Wiener process and its bridge 173

6.A.3 Confidence regions for the CDF and the Kolmogorov–Smirnov test 174

7 Correlation and regression in bivariate distributions 177

7.1 Introduction 177

7.2 Bivariate distributions and correlation 178

7.3 On baseline corrections and other covariates 183

7.4 Bivariate Gaussian distributions 186

7.5 Regression to the mean 189

7.6 Statistical analysis of bivariate Gaussian data 195

7.7 Simultaneous analysis of two binomial proportions 199

References 203

7.A Appendix: Some technical comments 205

7.A.1 The regression to the mode equation 205

7.A.2 Analysis of data from the multivariate Gaussian distribution 206

7.A.3 On the geometric approach to univariate confidence limits 207

8 How to compare the outcome in two groups 209

8.1 Introduction 209

8.2 Simple models that compare two distributions 210

8.3 Comparison done the horizontal way 212

8.4 Analysis done the vertical way 216

8.5 Some ways to compute p-values 224

8.6 The discrete Wilcoxon test 226

8.7 The two-period crossover trial 229

8.8 Multivariate analysis and analysis of covariance 232

References 240

9 Least squares, linear models and beyond 245

9.1 Introduction 245

9.2 The purpose of mathematical models 246

9.3 Different ways to do least squares 250

9.4 Logistic regression, with variations 252

9.5 The two-step modeling approach 257

9.6 The effect of missing covariates 260

9.7 The exponential family of distributions 263

9.8 Generalized linear models 269

References 270

10 Analysis of dose response 273

10.1 Introduction 273

10.2 Dose–response relationship 274

10.3 Relative dose potency and therapeutic ratio 278

10.4 Subject-specific and population averaged dose response 279

10.5 Estimation of the population averaged dose–response relationship 281

10.6 Estimating subject-specific dose responses 285

References 288

11 Hazards and censored data 289

11.1 Introduction 289

11.2 Censored observations: incomplete knowledge 290

11.3 Hazard models from a population perspective 291

11.4 The impact of competing risks 296

11.5 Heterogeneity in survival analysis 300

11.6 Recurrent events and frailty 304

11.7 The principles behind the analysis of censored data 306

11.8 The Kaplan–Meier estimator of the CDF 309

References 313

11.A Appendix: On the large-sample approximations of counting processes 314

12 From the log-rank test to the Cox proportional hazards model 317

12.1 Introduction 317

12.2 Comparing hazards between two groups 318

12.3 Nonparametric tests for hazards 319

12.4 Parameter estimation in hazard models 324

12.5 The accelerated failure time model 328

12.6 The Cox proportional hazards model 331

12.7 On omitted covariates and stratification in the log-rank test 336

References 339

12.A Appendix: Comments on interval-censored data 341

13 Remarks on some estimation methods 343

13.1 Introduction 343

13.2 Estimating equations and the robust variance estimate 344

13.3 From maximum likelihood theory to generalized estimating equations 351

13.4 The analysis of recurrent events 355

13.5 Defining and estimating mixed effects models 360

References 367

13.A Appendix: Formulas for first-order bias 368

Index 371

## Author Information

Anders Källén, Department of Biostatistics, AstraZeneca R&D, Sweden.

## Reviews

"Overall, the book is well-written . . . The topics are presented in a logical progression as is the level of their mathematical difficulty. Any biostatistician will find this a valuable complement to his/her favorite biostatistics textbook." (Journal of Biopharmaceutical Statistics, 2012)