Medical Statistics from Scratch: An Introduction for Health Professionals, 4th Edition





David Bowers

ISBN: 978-1-119-52388-8 | September 2019 | 440 Pages



Correctly understanding and using medical statistics is a key skill for all medical students and health professionals.

In an informal and friendly style, Medical Statistics from Scratch provides a practical foundation for everyone whose first interest is probably not medical statistics. Keeping the level of mathematics to a minimum, it clearly illustrates statistical concepts and practice with numerous real-world examples and cases drawn from current medical literature.

Medical Statistics from Scratch is an ideal learning partner for all medical students and health professionals needing an accessible introduction to, or a friendly refresher on, the fundamentals of medical statistics.

Preface to the 4th Edition

Preface to the 3rd Edition

Preface to the 2nd Edition

Preface to the 1st Edition

Introduction

I Some Fundamental Stuff

1 First things first – the nature of data

Variables and data

The good, the bad, and the ugly – types of variables

Categorical data

Metric data

II Descriptive Statistics

2 Describing data with tables

Descriptive statistics. What can we do with raw data?

Frequency tables – nominal data

Frequency tables – ordinal data

Frequency tables – metric data

Cumulative frequency

Cross-tabulation – contingency tables

Ranking data

3 Every picture tells a story – describing data with charts

Picture it!

Charting nominal and ordinal data

Charting discrete metric data

Charting continuous metric data

Charting cumulative data

Charting time-based data – the time series chart

4 Describing data from its shape

The shape of things to come

Negative skew

Positive skew

Symmetric or mound-shaped distributions

Normal-ness – the Normal distribution

Bimodal distributions

Determining skew from a box plot

5 Measures of location – Numbers R Us

Numbers, percentages, and proportions

Summary measures of location

The mode

The median

The mean


6 Measures of spread – Numbers R Us (again)

The range

The interquartile range (IQR)

The boxplot (also known as the box and whisker plot)

Standard deviation

Standard deviation and the Normal distribution

Transforming data

7 Incidence, prevalence, and standardisation

The incidence rate

The incidence rate ratio (IRR)


Some other useful (and common) rates

Age specific mortality rate

Standardisation – the standardised mortality rate

The direct method

The standard population

The indirect method

The standardised mortality ratio

III The Confounding Problem

8 Confounding – like the poor, (nearly) always with us

What is confounding?

Confounding by indication

Residual confounding

Detecting confounding

Dealing with confounding. If confounding is such a problem, what can we do about it?

IV Design and Data

9 Research design – Part I: Observational study designs

Hey ho! Hey ho! It’s off to work we go!

Types of study

Case reports and case series

Observational studies

From here to eternity – cohort studies

Confounding in cohort studies

Back to the future – case–control studies

Confounding in case–control studies

Comparing cohort and case–control designs

Ecological studies

The ecological fallacy

10 Research design – Part II: Getting stuck in – experimental studies

Clinical trials


Block randomisation



The cross-over randomised controlled trial

Selection of participants

Intention-to-treat analysis

11 Getting the participants for your study: ways of sampling

From populations to samples – statistical inference

The target and study populations, and the sample 

Collecting the data – types of sample

The simple random sample

The systematic random sample

The stratified random sample

The cluster sample

Consecutive and convenience samples

How many participants should we have? Sample size

Inclusion and exclusion criteria

Getting the data

V Chance Would be a Fine Thing

12 The idea of probability

Calculating probability – proportional frequency

Two useful rules for simple probability

The multiplication rule for independent events

The addition rule for mutually exclusive events

Conditional and Bayesian statistics

Probability distributions

What is a probability distribution?

Discrete versus continuous probability distributions

The binomial probability distribution

The Poisson probability distribution

The Normal probability distribution

13 Risk and odds

Absolute risk and the absolute risk reduction

The risk ratio

The reduction in the risk ratio (or relative risk reduction, RRR)

Reference value

Number needed to treat

What happens if the initial risk is small?

Confounding with the risk ratio


Why you can’t calculate risk in a case–control study

The odds ratio

Confounding with the odds ratio

Approximating the odds ratio with the risk ratio

VI The Informed Guess – An Introduction to Confidence Intervals

14 Estimating the value of a single population parameter – the idea of confidence intervals

Confidence interval estimation for a population mean

The standard error of the mean

How we use the standard error of the mean to calculate a confidence interval for a population mean

Confidence interval for a population proportion

Confidence interval for the median of a single population

15 Using confidence intervals to compare two population parameters

What’s the difference?

Comparing two independent population means

Assessing the confidence interval and the sample size

Comparing two paired population means

Within-subject and between-subject variation

Comparing two independent population proportions

Comparing two independent population medians – the Mann–Whitney rank sums method

Comparing two matched population medians – the Wilcoxon signed-ranks method

16 Confidence intervals for the ratio of two population parameters

Confidence interval for the ratio of two independent population means

Confidence interval for a population risk ratio

Confidence intervals for a population odds ratio

Confidence intervals for hazard ratios

VII Putting it to the Test

17 Testing hypotheses about the difference between two population parameters

Answering the question

The hypothesis

The null hypothesis

The hypothesis testing process

The p-value and the decision rule

A brief summary of a few of the most common tests

Using the p-value to compare the means of two independent populations

Interpreting computer hypothesis test results for the difference in two independent population means – the two-sample t test

Output from Minitab – two-sample t test of difference in mean birthweights of babies born to White mothers and to non-White mothers

Output from SPSS: two-sample t test of difference in mean birthweights of babies born to White mothers and to non-White mothers

Comparing the means of two paired populations – the matched-pairs t test

Using p-values to compare the medians of two independent populations

The Mann–Whitney rank-sums test

How the Mann–Whitney test works

Correction for multiple comparisons

The Bonferroni correction

Interpreting computer output for the Mann–Whitney test

With Minitab


Comparing two matched medians – the Wilcoxon signed-ranks test

Confidence intervals versus hypothesis test

What could possibly go wrong?

Types of error – type I and type II errors

The power of a test

Maximising power – calculating sample size

Sample size when comparing the means of two independent populations

Sample size when comparing the proportions of two independent populations

18 The chi-squared (χ²) test – what, why, and how?

Of all the tests in all the world – you had to walk into my hypothesis testing procedure

Using chi-squared to test for related-ness or the equality of proportions

Calculating the chi-squared statistic

Using the chi-squared test

Yates’s correction (continuity correction)

Fisher’s exact test

The chi-squared test with Minitab

The chi-squared test with SPSS

The chi-squared test for trend

SPSS output for chi-squared trend test

19 Testing hypotheses about the ratio of two population parameters

The chi-squared test with the risk ratio

The chi-squared test with odds ratios

The chi-squared test with hazard ratios

VIII Becoming Acquainted

20 Measuring the association between two variables

Plotting data


The scatterplot

The correlation coefficient

Pearson’s correlation coefficient

Is the correlation coefficient significant in the population?

Spearman’s rank correlation coefficient

21 Measuring agreement

To agree or not to agree: that is the question

Cohen’s kappa (κ)

Weighted kappa

Measuring the agreement between two metric continuous variables; Bland–Altman

IX Getting into a Relationship

22 Straight line models: linear regression

Relationship and association

Finding the equation of a straight line from a graph

A causal relationship – explaining variation

The linear regression model

Is the relationship linear?

Estimating the regression parameters – the method of ordinary least squares

Basic assumptions of the ordinary least squares procedure

Is the relationship statistically significant?

Estimating the regression parameters with SPSS and Minitab

Interpreting the regression coefficients

Goodness-of-fit, R²

Multiple linear regression

Adjusted goodness-of-fit: R²

Including nominal independent variables in the regression model: design variables and coding

Building your model. Which variables to include?

Automated variable selection methods

Manual variable selection methods

Adjustment and confounding

An example from practice

Diagnostics – checking the basic assumptions of the multiple linear regression model

Analysis of variance

23 Curvy models: Logistic regression

The binary outcome variable

Finding an appropriate model

The logistic regression model

Estimating the parameter values

Interpreting the regression coefficients

Is the model significant in the population?

Getting the odds ratio directly from the regression results

Is the odds ratio significant?

The multiple logistic regression model

Building the model – variable selection


Pearson's chi-squared; the Deviance statistic; the Hosmer–Lemeshow statistic 

24 Counting models: Poisson regression

Poisson regression and the Poisson regression model

Interpreting the regression coefficients

When the outcome is a count

When the outcome is a rate

Building the model – variable selection


The zero-inflated Poisson regression model

Negative binomial regression

Zero-inflated negative binomial regression

X Four More Chapters

25 Measuring survival

Censored data

Calculating survival probabilities and the proportion surviving: the Kaplan–Meier table

The Kaplan–Meier curve

Determining median survival time

Comparing survival with two groups

The log-rank test

The hazard ratio

The proportional hazards (Cox’s) regression model

The proportional hazards (Cox’s) regression model – the detail

Checking the proportional hazards assumption

26 Systematic review and meta-analysis


Systematic review – what it is

The forest plot – what does it show?

Publication and other biases

The funnel plot

Graphical interpretation of funnel plot for asymmetry

Significance test for asymmetry in funnel plot – Begg's and Egger's tests

Combining the studies: meta-analysis

The problem of heterogeneity

Testing for heterogeneity – Cochran's Q test; the I² test

27 Diagnostic testing

Sensitivity, specificity

Positive predictive value (PPV)

Negative predictive value (NPV)

The sensitivity versus specificity trade-off

Using the ROC curve to find the optimum trade-off value (or cut-off)

28 Missing data

The missing data problem

Types of missingness: MCAR, MAR, MNAR

The consequences of missing data

Methods for dealing with missing data

List-wise deletion (or complete case analysis)

Pair-wise deletion (or available case analysis)

Simple imputation

Replacement by the mean

Last observation carried forward (LOCF)

Regression-based imputation

Multiple imputation (MI)

Other methods: FIML, EM, MissForest, Nearest Neighbour

Appendix: Table of random numbers


Solutions to Exercises