# Clinical Trials with Missing Data: A Guide for Practitioners

ISBN: 978-1-118-76251-6 | February 2014 | 472 pages

## Description

This book provides practical guidance for statisticians, clinicians, and researchers involved in clinical trials in the biopharmaceutical industry and in medical and public health organisations. Academics and students needing an introduction to handling missing data will also find this book invaluable.

The authors describe how missing data can affect the outcome and credibility of a clinical trial, show through examples how a clinical team can work to prevent missing data, and present approaches for addressing missing data effectively.

The book is illustrated throughout with realistic case studies and worked examples, and presents clear and concise guidelines to enable good planning for missing data. The authors show how to handle missing data in a way that is transparent and easy to understand for clinicians, regulators and patients. New developments are presented to improve the choice and implementation of primary and sensitivity analyses for missing data. Many SAS code examples are included – the reader is given a toolbox for implementing analyses under a variety of assumptions.

## Table of contents

Preface xv

References xvii

Acknowledgments xix

Notation xxi

Table of SAS code fragments xxv

Contributors xxix

**1 What’s the problem with missing data? 1**

*Michael O’Kelly and Bohdana Ratitch*

1.1 What do we mean by missing data? 2

1.1.1 Monotone and non-monotone missing data 3

1.1.2 Modeling missingness, modeling the missing value and ignorability 4

1.1.3 Types of missingness (MCAR, MAR and MNAR) 4

1.1.4 Missing data and study objectives 5

1.2 An illustration 6

1.3 Why can’t I use only the available primary endpoint data? 7

1.4 What’s the problem with using last observation carried forward? 9

1.5 Can we just assume that data are missing at random? 11

1.6 What can be done if data may be missing not at random? 14

1.7 Stress-testing study results for robustness to missing data 15

1.8 How the pattern of dropouts can bias the outcome 15

1.9 How do we formulate a strategy for missing data? 16

1.10 Description of example datasets 18

1.10.1 Example dataset in Parkinson’s disease treatment 18

1.10.2 Example dataset in insomnia treatment 23

1.10.3 Example dataset in mania treatment 28

Appendix 1.A: Formal definitions of MCAR, MAR and MNAR 33

References 34

**2 The prevention of missing data 36**

*Sara Hughes*

2.1 Introduction 36

2.2 The impact of “too much” missing data 37

2.2.1 Example from human immunodeficiency virus 38

2.2.2 Example from acute coronary syndrome 38

2.2.3 Example from studies in pain 39

2.3 The role of the statistician in the prevention of missing data 39

2.3.1 Illustrative example from HIV 41

2.4 Methods for increasing subject retention 48

2.5 Improving understanding of reasons for subject withdrawal 49

Acknowledgments 49

Appendix 2.A: Example protocol text for missing data prevention 49

References 50

**3 Regulatory guidance – a quick tour 53**

*Michael O’Kelly*

3.1 International conference on harmonization guideline: Statistical principles for clinical trials: E9 54

3.2 The US and EU regulatory documents 55

3.3 Key points in the regulatory documents on missing data 55

3.4 Regulatory guidance on particular statistical approaches 57

3.4.1 Available cases 57

3.4.2 Single imputation methods 57

3.4.3 Methods that generally assume MAR 59

3.4.4 Methods that are used assuming MNAR 60

3.5 Guidance about how to plan for missing data in a study 62

3.6 Differences in emphasis between the NRC report and EU guidance documents 63

3.6.1 The term “conservative” 63

3.6.2 Last observation carried forward 63

3.6.3 Post hoc analyses 63

3.6.4 Non-monotone or intermittently missing data 63

3.6.5 Assumptions should be readily interpretable 65

3.6.6 Study report 65

3.6.7 Training 65

3.7 Other technical points from the NRC report 66

3.7.1 Time-to-event analyses 66

3.7.2 Tipping point sensitivity analyses 66

3.8 Other US/EU/international guidance documents that refer to missing data 66

3.8.1 Committee for medicinal products for human use guideline on anti-cancer products, recommendations on survival analysis 66

3.8.2 US guidance on considerations when research supported by office of human research protections is discontinued 67

3.8.3 FDA guidance on data retention 67

3.9 And in practice? 67

References 69

**4 A guide to planning for missing data 71**

*Michael O’Kelly and Bohdana Ratitch*

4.1 Introduction 72

4.1.1 Missing data may bias trial results or make them more difficult to generalize to subjects outside the trial 72

4.1.2 Credibility of trial results when there is missing data 74

4.1.3 Demand for better practice with regard to missing data 74

4.2 Planning for missing data 76

4.2.1 The case report form and non-statistical sections of the protocol 76

4.2.2 The statistical sections of the protocol and the statistical analysis plan 81

4.2.3 Using historic data to narrow the choice of primary analysis and sensitivity analyses 88

4.2.4 Key points in choosing an approach for missing data 108

4.3 Exploring and presenting missingness 113

4.4 Model checking 114

4.5 Interpreting model results when there is missing data 116

4.6 Sample size and missing data 117

Appendix 4.A: Sample protocol/SAP text for study in Parkinson’s disease 119

Appendix 4.B: A formal definition of a sensitivity parameter 125

References 126

**5 Mixed models for repeated measures using categorical time effects (MMRM) 130**

*Sonia Davis*

5.1 Introduction 131

5.2 Specifying the mixed model for repeated measures 132

5.2.1 The mixed model 132

5.2.2 Covariance structures 135

5.2.3 Mixed model for repeated measures versus generalized estimating equations 139

5.2.4 Mixed model for repeated measures versus last observation carried forward 140

5.3 Understanding the data 141

5.3.1 Parkinson’s disease example 141

5.3.2 A second example showing the usefulness of plots: The CATIE study 144

5.4 Applying the mixed model for repeated measures 145

5.4.1 Specifying the model 146

5.4.2 Interpreting and presenting results 150

5.5 Additional mixed model for repeated measures topics 162

5.5.1 Treatment by subgroup and treatment by site interactions 162

5.5.2 Calculating the effect size 164

5.5.3 Another strategy to model baseline 166

5.6 Logistic regression mixed model for repeated measures using the generalized linear mixed model 168

5.6.1 The generalized linear mixed model 168

5.6.2 Specifying the model 170

5.6.3 Interpreting and presenting results 173

5.6.4 Other modeling options 181

References 182

Table of SAS Code Fragments 183

**6 Multiple imputation 185**

*Bohdana Ratitch*

6.1 Introduction 185

6.1.1 How is multiple imputation different from single imputation? 186

6.1.2 How is multiple imputation different from maximum likelihood methods? 187

6.1.3 Multiple imputation’s assumptions about missingness mechanism 188

6.1.4 A general three-step process for multiple imputation and inference 189

6.1.5 Imputation versus analysis model 190

6.1.6 Note on notation use 192

6.2 Imputation phase 192

6.2.1 Missing patterns: Monotone and non-monotone 192

6.2.2 How do we get multiple imputations? 195

6.2.3 Imputation strategies: Sequential univariate versus joint multivariate 197

6.2.4 Overview of the imputation methods 199

6.2.5 Reusing the multiply-imputed dataset for different analyses or summary scales 212

6.3 Analysis phase: Analyzing multiple imputed datasets 213

6.4 Pooling phase: Combining results from multiple datasets 216

6.4.1 Combination rules 216

6.4.2 Pooling analyses of continuous outcomes 219

6.4.3 Pooling analyses of categorical outcomes 222

6.5 Required number of imputations 227

6.6 Some practical considerations 231

6.6.1 Choosing an imputation model 231

6.6.2 Multivariate normality 235

6.6.3 Rounding and restricting the range for the imputed values 238

6.6.4 Convergence of Markov chain Monte Carlo 240

6.7 Pre-specifying details of analysis with multiple imputation 244

Appendix 6.A: Additional methods for multiple imputation 245

References 251

Table of SAS Code Fragments 255

**7 Analyses under missing-not-at-random assumptions 257**

*Michael O’Kelly and Bohdana Ratitch*

7.1 Introduction 258

7.2 Background to sensitivity analyses and pattern-mixture models 259

7.2.1 The purpose of a sensitivity analysis 259

7.2.2 Pattern-mixture models as sensitivity analyses 261

7.3 Two methods of implementing sensitivity analyses via pattern-mixture models 264

7.3.1 A sequential method of implementing pattern-mixture models with multiple imputation 264

7.3.2 Providing stress-testing “what ifs” using pattern-mixture models 266

7.3.3 Two implementations of pattern-mixture models for sensitivity analyses 267

7.3.4 Characteristics and limitations of the sequential modeling method of implementing pattern-mixture models 268

7.3.5 Pattern-mixture models implemented using the joint modeling method 271

7.3.6 Characteristics of the joint modeling method of implementing pattern-mixture models 279

7.3.7 Summary of differences between the joint modeling and sequential modeling methods 281

7.4 A “toolkit”: Implementing sensitivity analyses via SAS 284

7.4.1 Reminder: General approach using multiple imputation with regression 284

7.4.2 Sensitivity analyses assuming withdrawals have trajectory of control arm 288

7.4.3 Sensitivity analyses assuming withdrawals have distribution of control arm 292

7.4.4 Baseline-observation-carried-forward-like and last-observation-carried-forward-like analyses 297

7.4.5 The general principle of using selected subsets of observed data as the basis to implement “what if” stress tests 306

7.4.6 Using a mixture of “what ifs,” depending on reason for discontinuation 306

7.4.7 Assuming trajectory of withdrawals is worse by some δ: Delta adjustment and tipping point analysis 308

7.5 Examples of realistic strategies and results for illustrative datasets of three indications 320

7.5.1 Parkinson’s disease 320

7.5.2 Insomnia 323

7.5.3 Mania 330

Appendix 7.A How one could implement the neighboring case missing value assumption using visit-by-visit multiple imputation 335

Appendix 7.B SAS code to model withdrawals from the experimental arm, using observed data from the control arm 336

Appendix 7.C SAS code to model early withdrawals from the experimental arm, using the last-observation-carried-forward-like values 342

Appendix 7.D SAS macro to impose delta adjustment on a responder variable in the mania dataset 345

Appendix 7.E SAS code to implement tipping point via exhaustive scenarios for withdrawals in the mania dataset 346

Appendix 7.F SAS code to perform sensitivity analyses for the Parkinson’s disease dataset 348

Appendix 7.G SAS code to perform sensitivity analyses for the insomnia dataset 351

Appendix 7.H SAS code to perform sensitivity analyses for the mania dataset 356

Appendix 7.I Selection models 358

Appendix 7.J Shared parameter models 362

References 365

Table of SAS Code Fragments 368

**8 Doubly robust estimation 369**

*Belinda Hernández, Ilya Lipkovich and Michael O’Kelly*

8.1 Introduction 370

8.2 Inverse probability weighted estimation 370

8.2.1 Inverse probability weighting estimators for estimating equations 372

8.2.2 Summary of inverse probability weighting advantages 373

8.2.3 Inverse probability weighting disadvantages 373

8.3 Doubly robust estimation 374

8.3.1 Doubly robust methods explained 375

8.3.2 Advantages of doubly robust methods 376

8.3.3 Limitations of doubly robust methods 376

8.4 Vansteelandt et al. method for doubly robust estimation 377

8.4.1 Theoretical justification for the Vansteelandt et al. method 378

8.4.2 Implementation of the Vansteelandt et al. method for doubly robust estimation 379

8.5 Implementing the Vansteelandt et al. method via SAS 383

8.5.1 Mania dataset 383

8.5.2 Insomnia dataset 390

Appendix 8.A How to implement Vansteelandt et al. method for mania dataset (binary response) 392

Appendix 8.B SAS code to calculate estimates from the bootstrapped datasets 400

Appendix 8.C How to implement Vansteelandt et al. method for insomnia dataset 401

References 408

Table of SAS Code Fragments 408

Bibliography 409

Index 423

## Reviews

“In summary, the book is a must-have tool for any biostatistician dealing with missing data. It is an excellent reference book for postgraduate students or researchers working in the area of missing data.” (*Biometrical Journal*, 1 June 2015)

“This is an excellent addition to the field, dealing with problems arising from missing data or unobserved data in clinical trials. It successfully bridges the gap between clinicians and statisticians using relatively common language to build common ground.” (*Doody’s*, 9 January 2015)