# Handbook of Statistical Genetics, 3rd Edition

ISBN: 978-0-470-99762-8 June 2008 1616 Pages

**E-Book**

$356.99

## Description

The *Handbook of Statistical Genetics* is widely regarded as **the** reference work in the field. However, the field has developed considerably over the past three years. In particular, the modeling of genetic networks has advanced considerably via the evolution of microarray analysis. As a consequence, the 3rd edition of the handbook contains a much expanded section on Network Modeling, including 5 new chapters covering metabolic networks, graphical modeling, and inference and simulation of pedigrees and genealogies. Other chapters new to the 3rd edition include Human Population Genetics, Genome-wide Association Studies, Family-based Association Studies, Pharmacogenetics, Epigenetics, Ethics and Insurance.

As with the second edition, the Handbook includes a glossary of terms, acronyms and abbreviations, and features extensive cross-referencing between the chapters, tying the different areas together. With heavy use of up-to-date examples, real-life case studies and references to web-based resources, this continues to be a must-have reference in a vital area of research.

Edited by the leading international authorities in the field.

**David Balding – Department of Epidemiology & Public Health, Imperial College**

An advisor for our Probability & Statistics series, Professor Balding is also a previous Wiley author, having written *Weight-of-Evidence for Forensic DNA Profiles*, as well as having edited the two previous editions of HSG. With over 20 years' teaching experience, he’s also had dozens of articles published in numerous international journals.

**Martin Bishop – Head of the Bioinformatics Division at the HGMP Resource Centre**

As well as the first two editions of HSG, Dr Bishop has edited a number of introductory books on the application of informatics to molecular biology and genetics. He is the Associate Editor of the journal *Bioinformatics* and Managing Editor of *Briefings in Bioinformatics*.

**Chris Cannings – Division of Genomic Medicine, University of Sheffield**

With over 40 years' teaching in the area, Professor Cannings has published over 100 papers and is on the editorial board of many related journals. Co-editor of the two previous editions of HSG, he also authored a book on this topic.

**Volume 1.**

List of Contributors.

Editor’s Preface to the Third Edition.

Glossary of Terms.

Abbreviations and Acronyms.

**Part 1 GENOMES.**

**1 Chromosome Maps.**

*T.P. Speed and H. Zhao.*

1.1 Introduction.

1.2 Genetic Maps.

1.3 Physical Maps.

1.4 Radiation Hybrid Mapping.

1.5 Other Physical Mapping Approaches.

1.6 Gene Maps.

Acknowledgments.

References.

**2 Statistical Significance in Biological Sequence Comparison.**

*W.R. Pearson and T.C. Wood.*

2.1 Introduction.

2.2 Statistical Significance and Biological Significance.

2.3 Estimating Statistical Significance for Local Similarity Searches.

2.4 Summary: Exploiting Statistical Estimates.

Acknowledgments.

References.

**3 Bayesian Methods in Biological Sequence Analysis.**

*Jun S. Liu and T. Logvinenko.*

3.1 Introduction.

3.2 Overview of the Bayesian Methodology.

3.3 Hidden Markov Model: A General Introduction.

3.4 Pairwise Alignment of Biological Sequences.

3.5 Multiple Sequence Alignment.

3.6 Finding Recurring Patterns in Biological Sequences.

3.7 Joint Analysis of Sequence Motifs and Expression Microarrays.

3.8 Summary.

Acknowledgments.

Appendix A: Markov Chain Monte Carlo Methods.

References.

**4 Statistical Approaches in Eukaryotic Gene Prediction.**

*V. Solovyev.*

4.1 Structural Organization and Expression of Eukaryotic Genes.

4.2 Methods of Functional Signal Recognition.

4.3 Linear Discriminant Analysis.

4.4 Prediction of Donor and Acceptor Splice Junctions.

4.5 Identification of Promoter Regions in Human DNA.

4.6 Recognition of PolyA Sites.

4.7 Characteristics for Recognition of 3′-Processing Sites.

4.8 Identification of Multiple Genes in Genomic Sequences.

4.9 Discriminative and Probabilistic Approaches for Multiple Gene Prediction.

4.10 Internal Exon Recognition.

4.11 Recognition of Flanking Exons.

4.12 Performance of Gene Identification Programs.

4.13 Using Protein Similarity Information to Improve Gene Prediction.

4.14 Genome Annotation Assessment Project (EGASP).

4.15 Annotation of Sequences from Genome Sequencing Projects.

4.16 Characteristics and Computational Identification of miRNA genes.

4.17 Prediction of microRNA Targets.

4.18 Internet Resources for Gene Finding and Functional Site Prediction.

Acknowledgments.

References.

**5 Comparative Genomics.**

*J. Dicks and G. Savva.*

5.1 Introduction.

5.2 Homology.

5.3 Genomic Mutation.

5.4 Comparative Maps.

5.5 Gene Order and Content.

5.6 Whole Genome Sequences.

5.7 Conclusions and Future Research.

Acknowledgments.

References.

**Part 2 BEYOND THE GENOME.**

**6 Analysis of Microarray Gene Expression Data.**

*W. Huber, A. von Heydebreck and M. Vingron.*

6.1 Introduction.

6.2 Data Visualization and Quality Control.

6.3 Error Models, Calibration and Measures of Differential Expression.

6.4 Identification of Differentially Expressed Genes.

6.5 Pattern Discovery.

6.6 Conclusions.

Acknowledgments.

References.

**7 Statistical Inference for Microarray Studies.**

*S.B. Pounds, C. Cheng and A. Onar.*

7.1 Introduction.

7.2 Initial Data Processing.

7.3 Testing the Association of Phenotype with Expression.

7.4 Multiple Testing.

7.5 Annotation Analysis.

7.6 Validation Analysis.

7.7 Study Design and Sample Size.

7.8 Discussion.

Related Chapters.

References.

**8 Bayesian Methods for Microarray Data.**

*A. Lewin and S. Richardson.*

8.1 Introduction.

8.2 Extracting Signal From Observed Intensities.

8.3 Differential Expression.

8.4 Clustering Gene Expression Profiles.

8.5 Multivariate Gene Selection Models.

Acknowledgments.

Related Chapters.

References.

**9 Inferring Causal Associations between Genes and Disease via the Mapping of Expression Quantitative Trait Loci.**

*S.K. Sieberts and E.E. Schadt.*

9.1 Introduction.

9.2 An Overview of Transcription as a Complex Process.

9.3 Human Versus Experimental Models.

9.4 Heritability of Expression Traits.

9.5 Joint eQTL Mapping.

9.6 Multilocus Models and FDR.

9.7 eQTL and Clinical Trait Linkage Mapping to Infer Causal Associations.

9.8 Using eQTL Data to Reconstruct Coexpression Networks.

9.9 Using eQTL Data to Reconstruct Probabilistic Networks.

9.10 Conclusions.

9.11 Software.

References.

**10 Protein Structure Prediction.**

*D.P. Klose and W.R. Taylor.*

10.1 History.

10.2 Basic Structural Biology.

10.3 Protein Structure Prediction.

10.4 Model Evaluation.

10.5 Conclusions.

References.

**11 Statistical Techniques in Metabolic Profiling.**

*M. De Iorio, T.M.D. Ebbels and D.A. Stephens.*

11.1 Introduction.

11.2 Principal Components Analysis and Regression.

11.3 Partial Least Squares and Related Methods.

11.4 Clustering Procedures.

11.5 Neural Networks, Kernel Methods and Related Approaches.

11.6 Evolutionary Algorithms.

11.7 Conclusions.

Acknowledgments.

References.

**Part 3 EVOLUTIONARY GENETICS.**

**12 Adaptive Molecular Evolution.**

*Z. Yang.*

12.1 Introduction.

12.2 Markov Model of Codon Substitution.

12.3 Estimation of Synonymous (*dS*) and Nonsynonymous (*dN*) Substitution Rates Between Two Sequences.

12.4 Likelihood Calculation on a Phylogeny.

12.5 Detecting Adaptive Evolution Along Lineages.

12.6 Inferring Amino Acid Sites Under Positive Selection.

12.7 Testing Positive Selection Affecting Particular Sites and Lineages.

12.8 Limitations of Current Methods.

12.9 Computer Software.

Acknowledgments.

References.

**13 Genome Evolution.**

*J.F.Y. Brookfield.*

13.1 Introduction.

13.2 The Structure and Function of Genomes.

13.3 The Organisation of Genomes.

13.4 Population Genetics and the Genome.

13.5 Mobile DNAs.

13.6 Conclusions.

References.

**14 Probabilistic Models for the Study of Protein Evolution.**

*J.L. Thorne and N. Goldman.*

14.1 Introduction.

14.2 Empirically Derived Models of Amino Acid Replacement.

14.3 Amino Acid Composition.

14.4 Heterogeneity of Replacement Rates Among Sites.

14.5 Protein Structural Environments.

14.6 Variation of Preferred Residues Among Sites.

14.7 Models with a Physicochemical Basis.

14.8 Codon-Based Models.

14.9 Dependence Among Positions: Simulation.

14.10 Dependence Among Positions: Inference.

14.11 Conclusions.

Acknowledgments.

References.

**15 Application of the Likelihood Function in Phylogenetic Analysis.**

*J.P. Huelsenbeck and J.P. Bollback.*

15.1 Introduction.

15.2 History.

15.3 Likelihood Function.

15.4 Developing an Intuition of Likelihood.

15.5 Method of Maximum Likelihood.

15.6 Bayesian Inference.

15.7 Markov Chain Monte Carlo.

15.8 Assessing Uncertainty of Phylogenies.

15.9 Hypothesis Testing and Model Choice.

15.10 Comparative Analysis.

15.11 Conclusions.

References.

**16 Phylogenetics: Parsimony, Networks, and Distance Methods.**

*D. Penny, M.D. Hendy and B.R. Holland.*

16.1 Introduction.

16.2 Data.

16.3 Theoretical Background.

16.4 Methods for Inferring Evolutionary Trees.

16.5 Phylogenetic Networks.

16.6 Search Strategies.

16.7 Overview and Conclusions.

References.

**17 Evolutionary Quantitative Genetics.**

*B. Walsh.*

17.1 Introduction.

17.2 Selection Response Under the Infinitesimal Model.

17.3 Fitness.

17.4 Fitness Surfaces.

17.5 Measuring Multivariate Selection.

17.6 Multiple Trait Selection.

17.7 Phenotypic Evolution Models.

17.8 Theorems of Natural Selection: Fundamental and Otherwise.

17.9 Final Remarks.

Acknowledgments.

References.

**Part 4 ANIMAL AND PLANT BREEDING.**

**18 Quantitative Trait Loci in Inbred Lines.**

*R.C. Jansen.*

18.1 Introduction.

18.2 Segregation Analysis.

18.3 Dissecting Quantitative Variation With the Aid of Molecular Markers.

18.4 QTL Detection Strategies.

18.5 Bibliographic Notes.

Acknowledgments.

References.

**19 Mapping Quantitative Trait Loci in Outbred Pedigrees.**

*I. Höschele.*

19.1 Introduction.

19.2 Linkage Mapping via Least Squares or Maximum Likelihood and Fixed Effects Models.

19.3 Linkage Mapping via Residual Maximum Likelihood and Random Effects Models.

19.4 Linkage Mapping via Bayesian Methodology.

19.5 Deterministic Haplotyping in Complex Pedigrees.

19.6 Genotype Sampling in Complex Pedigrees.

19.7 Fine Mapping of Quantitative Trait Loci.

19.8 Concluding Remarks.

Acknowledgments.

References.

**20 Inferences from Mixed Models in Quantitative Genetics.**

*D. Gianola.*

20.1 Introduction.

20.2 Landmarks.

20.3 Future Developments.

Acknowledgments.

References.

**21 Marker-assisted Selection and Introgression.**

*L. Moreau, F. Hospital and J. Whittaker.*

21.1 Introduction.

21.2 Marker-assisted Selection: Inbred Line Crosses.

21.3 Marker-assisted Selection: Outbred Populations.

21.4 Marker-assisted Introgression.

21.5 Marker-assisted Gene Pyramiding.

21.6 Discussion.

Acknowledgments.

References.

**Reference Author Index.**

**Subject Index.**

**Volume 2.**

**List of Contributors.**

**Editor’s Preface to the Third Edition.**

**Glossary of Terms.**

**Abbreviations and Acronyms.**

**Part 5 POPULATION GENETICS.**

**22 Mathematical Models in Population Genetics.**

*C. Neuhauser.*

22.1 A Brief History of the Role of Selection.

22.2 Mutation, Random Genetic Drift, and Selection.

22.3 The Diffusion Approximation.

22.4 The Infinite Allele Model.

22.5 Other Models of Mutation and Selection.

22.6 Coalescent Theory.

22.7 Detecting Selection.

Acknowledgments.

References.

**23 Inference, Simulation and Enumeration of Genealogies.**

*C. Cannings and A. Thomas.*

23.1 Genealogies as Graphs.

23.2 Relationships.

23.3 The Identity Process Along a Chromosome.

23.4 State Space Enumeration.

23.5 Marriage Node Graphs.

23.6 Moral Graphs.

References.

**24 Graphical Models in Genetics.**

*S.L. Lauritzen and N.S. Sheehan.*

24.1 Introduction.

24.2 Bayesian Networks and Other Graphical Models.

24.3 Representation of Pedigree Information.

24.4 Peeling and Related Algorithms.

24.5 Pedigree Analysis and Beyond.

24.6 Causal Inference.

24.7 Other Applications.

References.

**25 Coalescent Theory.**

*M. Nordborg.*

25.1 Introduction.

25.2 The Coalescent.

25.3 Generalizing the Coalescent.

25.4 Geographical Structure.

25.5 Segregation.

25.6 Recombination.

25.7 Selection.

25.8 Neutral Mutations.

25.9 Conclusion.

Acknowledgments.

References.

**26 Inference Under the Coalescent.**

*M. Stephens.*

26.1 Introduction.

26.2 The Likelihood and the Coalescent.

26.3 Importance Sampling.

26.4 Markov Chain Monte Carlo.

26.5 Other Approaches.

26.6 Software and Web Resources.

Acknowledgments.

References.

**27 Linkage Disequilibrium, Recombination and Selection.**

*G. McVean.*

27.1 What Is Linkage Disequilibrium?

27.2 Measuring Linkage Disequilibrium.

27.3 Modelling LD and Genealogical History.

27.4 Inference.

27.5 Prospects.

Acknowledgments.

Related Chapters.

References.

**28 Inferences from Spatial Population Genetics.**

*F. Rousset.*

28.1 Introduction.

28.2 Neutral Models of Geographical Variation.

28.3 Methods of Inference.

28.4 Inference Under the Different Models.

28.5 Separation of Timescales.

28.6 Other Methods.

28.7 Integrating Statistical Techniques into the Analysis of Biological Processes.

Acknowledgments.

Related Chapters.

References.

Appendix A: Analysis of Variance and Probabilities of Identity.

Appendix B: Likelihood Analysis of the Island Model.

**29 Analysis of Population Subdivision.**

*L. Excoffier.*

29.1 Introduction.

29.2 The Fixation Index *F*.

29.3 Wright’s *F* Statistics in Hierarchic Subdivisions.

29.4 Analysis of Genetic Subdivision Under an Analysis of Variance Framework.

29.5 Relationship Between Different Definitions of Fixation Indexes.

29.6 *F* Statistics and Coalescence Times.

29.7 Analysis of Molecular Data: The AMOVA Framework.

29.8 Significance Testing.

29.9 Related and Remaining Problems.

Acknowledgments.

References.

**30 Conservation Genetics.**

*M.A. Beaumont.*

30.1 Introduction.

30.2 Estimating Effective Population Size.

30.3 Admixture.

30.4 Genotypic Modelling.

30.5 Relatedness and Pedigree Estimation.

Acknowledgments.

Related Chapters.

References.

**31 Human Genetic Diversity and its History.**

*G. Barbujani and L. Chikhi.*

31.1 Introduction.

31.2 Human Genetic Diversity: Historical Inferences.

31.3 Human Genetic Diversity: Geographical Structure.

31.4 Final Remarks.

Acknowledgments.

References.

**Part 6 GENETIC EPIDEMIOLOGY.**

**32 Epidemiology and Genetic Epidemiology.**

*P.R. Burton, J.M. Bowden and M.D. Tobin.*

32.1 Introduction.

32.2 Descriptive Epidemiology.

32.3 Descriptive Genetic Epidemiology.

32.4 Studies Investigating Specific Aetiological Determinants.

32.5 The Future.

Acknowledgments.

References.

**33 Linkage Analysis.**

*E.A. Thompson.*

33.1 Introduction.

33.2 The Early Years.

33.3 The Development of Human Genetic Linkage Analysis.

33.4 The Pedigree Years; Segregation and Linkage Analysis.

33.5 Likelihood and Location Score Computation.

33.6 Monte Carlo Multipoint Linkage Likelihoods.

33.7 Linkage Analysis of Complex Traits.

33.8 Map Estimation, Map Uncertainty, and the Meiosis Model.

33.9 The Future.

Acknowledgments.

References.

**34 Non-parametric Linkage.**

*P. Holmans.*

34.1 Introduction.

34.2 Pros and Cons of Model-free Methods.

34.3 Model-free Methods for Dichotomous Traits.

34.4 Model-free Methods for Analysing Quantitative Traits.

34.5 Conclusions.

Related Chapters.

References.

**35 Population Admixture and Stratification in Genetic Epidemiology.**

*P.M. McKeigue.*

35.1 Background.

35.2 Admixture Mapping.

35.3 Statistical Models.

35.4 Testing for Linkage with Locus Ancestry.

35.5 Conclusions.

References.

**36 Population Association.**

*D. Clayton.*

36.1 Introduction.

36.2 Measures of Association.

36.3 Case-Control Studies.

36.4 Tests for Association.

36.5 Logistic Regression and Log-Linear Models.

36.6 Stratification and Matching.

36.7 Unmeasured Confounding.

36.8 Multiple Alleles.

36.9 Multiple Loci.

36.10 Discussion.

Acknowledgments.

References.

**37 Whole Genome Association.**

*A.P. Morris and L.R. Cardon.*

37.1 Introduction.

37.2 Genotype Quality Control.

37.3 Single-Locus Analysis.

37.4 Population Structure.

37.5 Multi-Locus Analysis.

37.6 Epistasis.

37.7 Replication.

37.8 Prospects for Whole-Genome Association Studies.

References.

**38 Family-based Association.**

*F. Dudbridge.*

38.1 Introduction.

38.2 Transmission/Disequilibrium Test.

38.3 Logistic Regression Models.

38.4 Haplotype Analysis.

38.5 General Pedigree Structures.

38.6 Quantitative Traits.

38.7 Association in the Presence of Linkage.

38.8 Conclusions.

References.

**39 Cancer Genetics.**

*M.D. Teare.*

39.1 Introduction.

39.2 Armitage–Doll Models of Carcinogenesis.

Electronic Resources.

References.

**40 Epigenetics.**

*K.D. Siegmund and S. Lin.*

40.1 A Brief Introduction.

40.2 Technologies for CGI Methylation Interrogation.

40.3 Modeling Human Cell Populations.

40.4 Mixture Modeling.

40.5 Recapitulation of Tumor Progression Pathways.

40.6 Future Challenges.

Acknowledgments.

References.

**Part 7 SOCIAL AND ETHICAL ASPECTS.**

**41 Ethics Issues in Statistical Genetics.**

*R.E. Ashcroft.*

41.1 Introduction: Scope of This Chapter.

41.2 A Case Study in Ethical Regulation of Population Genetics Research: UK Biobank’s Ethics and Governance Framework.

41.3 Stewardship.

41.4 Wider Social Issues.

41.5 Conclusions.

Acknowledgments.

References.

**42 Insurance.**

*A.S. Macdonald.*

42.1 Principles of Insurance.

42.2 Actuarial Modelling.

42.3 Examples and Conclusions.

References.

**43 Forensics.**

*B.S. Weir.*

43.1 Introduction.

43.2 Principles of Interpretation.

43.3 Profile Probabilities.

43.4 Parentage Issues.

43.5 Identification of Remains.

43.6 Mixtures.

43.7 Sampling Issues.

43.8 Other Forensic Issues.

43.9 Conclusions.

References.

**Reference Author Index.**

**Subject Index.**

"A treasure house of information on genetics and statistical methods … .All approaches are fully described and evaluated, which make this handbook a work of scholarship and a valuable resource." (*Journal of Tropical Pediatrics,* February 2009)

"This handbook may be highly recommended as a reference book and as a comprehensive opportunity to get a broader overview of new research areas in the field of statistical genetics. It will be useful source especially for research groups in this area." (*Biometrics,* September 2008)

"This highly recommended set is critical investment for academic and special libraries, given rapid developments in statistical genetics." (*American Reference Books* Annual, March 2008)