# Medical Statistics from Scratch: An Introduction for Health Professionals, 4th Edition

ISBN: 978-1-119-52388-8 | September 2019 | 440 Pages

## Description

Correctly understanding and using medical statistics is a key skill for all medical students and health professionals.

In an informal and friendly style, *Medical Statistics from Scratch* provides a practical foundation for everyone whose first interest is probably not medical statistics. Keeping the level of mathematics to a minimum, it clearly illustrates statistical concepts and practice with numerous real-world examples and cases drawn from current medical literature.

*Medical Statistics from Scratch* is an ideal learning partner for all medical students and health professionals needing an accessible introduction to, or a friendly refresher on, the fundamentals of medical statistics.

## Table of contents

Preface to the 4th Edition

Preface to the 3rd Edition

Preface to the 2nd Edition

Preface to the 1st Edition

Introduction

**I Some Fundamental Stuff**

**1 First things first – the nature of data**

Variables and data

The good, the bad, and the ugly – types of variables

Categorical data

Metric data

**II Descriptive Statistics**

**2 Describing data with tables**

Descriptive statistics. What can we do with raw data?

Frequency tables – nominal data

Frequency tables – ordinal data

Frequency tables – metric data

Cumulative frequency

Cross-tabulation – contingency tables

Ranking data

**3 Every picture tells a story – describing data with charts**

Picture it!

Charting nominal and ordinal data

Charting discrete metric data

Charting continuous metric data

Charting cumulative data

Charting time-based data – the time series chart

**4 Describing data from its shape**

The shape of things to come

Negative skew

Positive skew

Symmetric or mound-shaped distributions

Normal-ness – the Normal distribution

Bimodal distributions

Determining skew from a box plot

**5 Measures of location – Numbers R Us**

Numbers, percentages, and proportions

Summary measures of location

The mode

The median

The mean

Percentiles

**6 Measures of spread – Numbers R Us (again)**

The range

The interquartile range (IQR)

The boxplot (also known as the box and whisker plot)

Standard deviation

Standard deviation and the Normal distribution

Transforming data

**7 Incidence, prevalence, and standardisation**

The incidence rate

The incidence rate ratio (IRR)

Prevalence

Some other useful (and common) rates

Age-specific mortality rate

Standardisation – the standardised mortality rate

The direct method

The standard population

The indirect method

The standardised mortality ratio

**III The Confounding Problem**

**8 Confounding – like the poor, (nearly) always with us**

What is confounding?

Confounding by indication

Residual confounding

Detecting confounding

Dealing with confounding. If confounding is such a problem, what can we do about it?

**IV Design and Data**

**9 Research design – Part I: Observational study designs**

Hey ho! Hey ho! It’s off to work we go!

Types of study

Case reports and case series

Observational studies

From here to eternity – cohort studies

Confounding in cohort studies

Back to the future – case–control studies

Confounding in case–control studies

Comparing cohort and case–control designs

Ecological studies

The ecological fallacy

**10 Research design – Part II: Getting stuck in – experimental studies**

Clinical trials

Randomisation

Block randomisation

Stratification

Blinding

The cross-over randomised controlled trial

Selection of participants

Intention-to-treat analysis

**11 Getting the participants for your study: ways of sampling**

From populations to samples – statistical inference

The target and study populations, and the sample

Collecting the data – types of sample

The simple random sample

The systematic random sample

The stratified random sample

The cluster sample

Consecutive and convenience samples

How many participants should we have? Sample size

Inclusion and exclusion criteria

Getting the data

**V Chance Would be a Fine Thing**

**12 The idea of probability**

Calculating probability – proportional frequency

Two useful rules for simple probability

The multiplication rule for independent events

The addition rule for mutually exclusive events

Conditional and Bayesian statistics

Probability distributions

What is a probability distribution?

Discrete versus continuous probability distributions

The binomial probability distribution

The Poisson probability distribution

The Normal probability distribution

**13 Risk and odds**

Absolute risk and the absolute risk reduction

The risk ratio

The reduction in the risk ratio (or relative risk reduction) RRR

Reference value

Number needed to treat

What happens if the initial risk is small?

Confounding with the risk ratio

Odds

Why you can’t calculate risk in a case–control study

The odds ratio

Confounding with the odds ratio

Approximating the odds ratio with the risk ratio

**VI The Informed Guess – An Introduction to Confidence Intervals**

**14 Estimating the value of a single population parameter – the idea of confidence intervals**

Confidence interval estimation for a population mean

The standard error of the mean

How we use the standard error of the mean to calculate a confidence interval for a population mean

Confidence interval for a population proportion

Confidence interval for the median of a single population

**15 Using confidence intervals to compare two population parameters**

What’s the difference?

Comparing two independent population means

Assessing the confidence interval and the sample size

Comparing two paired population means

Within-subject and between-subject variation

Comparing two independent population proportions

Comparing two independent population medians – the Mann–Whitney rank sums method

Comparing two matched population medians – the Wilcoxon signed-ranks method

**16 Confidence intervals for the ratio of two population parameters**

Confidence interval for the ratio of two independent population means

Confidence interval for a population risk ratio

Confidence intervals for a population odds ratio

Confidence intervals for hazard ratios

**VII Putting it to the Test**

**17 Testing hypotheses about the difference between two population parameters**

Answering the question

The hypothesis

The null hypothesis

The hypothesis testing process

The p-value and the decision rule

A brief summary of a few of the most common tests

Using the p-value to compare the means of two independent populations

Interpreting computer hypothesis test results for the difference in two independent population means – the two-sample t test

Output from Minitab – two-sample t test of difference in mean birthweights of babies born to White mothers and to non-White mothers

Output from SPSS: two-sample t test of difference in mean birthweights of babies born to White mothers and to non-White mothers

Comparing the means of two paired populations – the matched-pairs t test

Using p-values to compare the medians of two independent populations

The Mann–Whitney rank-sums test

How the Mann–Whitney test works

Correction for multiple comparisons

The Bonferroni correction

Interpreting computer output for the Mann–Whitney test

With Minitab

With SPSS

Comparing two matched medians – the Wilcoxon signed-ranks test

Confidence intervals versus hypothesis test

What could possibly go wrong?

Types of error – type I and type II errors

The power of a test

Maximising power – calculating sample size

Sample size when comparing the means of two independent populations

Sample size when comparing the proportions of two independent populations

**18 The chi-squared (χ^{2}) test – what, why, and how?**

Of all the tests in all the world – you had to walk into my hypothesis testing procedure

Using chi-squared to test for related-ness or the equality of proportions

Calculating the chi-squared statistic

Using the chi-squared test

Yates’s correction (continuity correction)

Fisher’s exact test

The chi-squared test with Minitab

The chi-squared test with SPSS

The chi-squared test for trend

SPSS output for chi-squared trend test

**19 Testing hypotheses about the ratio of two population parameters**

The chi-squared test with the risk ratio

The chi-squared test with odds ratios

The chi-squared test with hazard ratios

**VIII Becoming Acquainted**

**20 Measuring the association between two variables**

Plotting data

Association

The scatterplot

The correlation coefficient

Pearson’s correlation coefficient

Is the correlation coefficient significant in the population?

Spearman’s rank correlation coefficient

**21 Measuring agreement**

To agree or not to agree: that is the question

Cohen’s kappa (κ)

Weighted kappa

Measuring the agreement between two metric continuous variables; Bland–Altman

**IX Getting into a Relationship**

**22 Straight line models: linear regression**

Relationship and association

Finding the equation of a straight line from a graph

A causal relationship – explaining variation

The linear regression model

Is the relationship linear?

Estimating the regression parameters – the method of ordinary least squares

Basic assumptions of the ordinary least squares procedure

Is the relationship statistically significant?

Estimating the regression parameters with SPSS and Minitab

Interpreting the regression coefficients

Goodness-of-fit, *R*^{2}

Multiple linear regression

Adjusted goodness-of-fit: *R*^{2}

Including nominal independent variables in the regression model: design variables and coding

Building your model. Which variables to include?

Automated variable selection methods

Manual variable selection methods

Adjustment and confounding

An example from practice

Diagnostics – checking the basic assumptions of the multiple linear regression model

Analysis of variance

**23 Curvy models: Logistic regression**

The binary outcome variable

Finding an appropriate model

The logistic regression model

Estimating the parameter values

Interpreting the regression coefficients

Is the model significant in the population?

Getting the odds ratio directly from the regression results

Is the odds ratio significant?

The multiple logistic regression model

Building the model – variable selection

Goodness-of-fit

Pearson's chi-squared; the Deviance statistic; the Hosmer–Lemeshow statistic

**24 Counting models: Poisson regression**

Poisson regression and the Poisson regression model

Interpreting the regression coefficients

When the outcome is a count

When the outcome is a rate

Building the model – variable selection

Goodness-of-fit

The zero-inflated Poisson regression model

Negative binomial regression

Zero-inflated negative binomial regression

**X Four More Chapters**

**25 Measuring survival**

Censored data

Calculating survival probabilities and the proportion surviving: the Kaplan–Meier table

The Kaplan–Meier curve

Determining median survival time

Comparing survival with two groups

The log-rank test

The hazard ratio

The proportional hazards (Cox’s) regression model

The proportional hazards (Cox’s) regression model – the detail

Checking the proportional hazards assumption

**26 Systematic review and meta-analysis**

Introduction

Systematic review – what it is

The forest plot – what does it show?

Publication and other biases

The funnel plot

Graphical interpretation of funnel plot for asymmetry

Significance test for asymmetry in funnel plot – Begg's and Egger's tests

Combining the studies: meta-analysis

The problem of heterogeneity

Testing for heterogeneity – Cochran's Q test; the *I*^{2} test

**27 Diagnostic testing**

Sensitivity, specificity

Positive predictive value (PPV)

Negative predictive value (NPV)

The sensitivity versus specificity trade-off

Using the ROC curve to find the optimum trade-off value (or cut-off)

**28 Missing data**

The missing data problem

Types of missingness: MCAR, MAR, MNAR

The consequences of missing data

Methods for dealing with missing data

List-wise deletion (or complete case analysis)

Pair-wise deletion (or available case analysis)

Simple imputation

Replacement by the mean

Last observation carried forward (LOCF)

Regression-based imputation

Multiple imputation (MI)

Other methods: FIML, EM, MissForest, Nearest Neighbour

Appendix: Table of random numbers

References

Solutions to Exercises

Index