Exploration and Analysis of DNA Microarray and Protein Array Data
Genomics is one of the major scientific revolutions of this century, and the use of microarrays to rapidly analyze numerous DNA samples has enabled scientists to make sense of mountains of genomic data through statistical analysis. Today, microarrays are used in biomedical research to study such vital areas as a drug’s therapeutic value (or toxicity) and the patterns of gene activity associated with the spread of cancer.
Exploration and Analysis of DNA Microarray and Protein Array Data answers the need for a comprehensive, cutting-edge overview of this important and emerging field. The authors, seasoned researchers with extensive experience in both industry and academia, effectively outline all phases of this revolutionary analytical technique, from the preprocessing to the analysis stage.
Highlights of the text include:
- A review of basic molecular biology, followed by an introduction to microarrays and their preparation
- Chapters on processing scanned images and preprocessing microarray data
- Methods for identifying differentially expressed genes in comparative microarray experiments
- Discussions of gene and sample clustering and class prediction
- Extension of analysis methods to protein array data
Numerous exercises for self-study as well as data sets and a useful collection of computational tools on the authors’ Web site make this important text a valuable resource for both students and professionals in the field.
1 A Brief Introduction.
1.1 A Note on Exploratory Data Analysis.
1.2 Computing Considerations and Software.
1.3 A Brief Outline of the Book.
2 Genomics Basics.
2.3 Gene Expression.
2.4 Hybridization Assays and Other Laboratory Techniques.
2.5 The Human Genome.
2.6 Genome Variations and Their Consequences.
2.8 The Role of Genomics in Pharmaceutical Research.
3.1 Types of Microarray Experiments.
3.1.1 Experiment Type 1: Tissue-Specific Gene Expression.
3.1.2 Experiment Type 2: Developmental Genetics.
3.1.3 Experiment Type 3: Genetic Diseases.
3.1.4 Experiment Type 4: Complex Diseases.
3.1.5 Experiment Type 5: Pharmacological Agents.
3.1.6 Experiment Type 6: Plant Breeding.
3.1.7 Experiment Type 7: Environmental Monitoring.
3.2 A Very Simple Hypothetical Microarray Experiment.
3.3 A Typical Microarray Experiment.
3.3.1 Microarray Preparation.
3.3.2 Sample Preparation.
3.3.3 The Hybridization Step.
3.3.4 Scanning the Microarray.
3.3.5 Interpreting the Scanned Image.
3.4 Multichannel cDNA Microarrays.
3.5 Oligonucleotide Arrays.
3.6 Bead-Based Arrays.
3.7 Confirmation of Microarray Results.
Supplementary Reading and Electronic References.
4 Processing the Scanned Image.
4.1 Converting the Scanned Image to the Spotted Image.
4.2 Quality Assessment.
4.2.1 Visualizing the Spotted Image.
4.2.2 Numerical Evaluation of Array Quality.
4.2.3 Spatial Problems.
4.2.4 Spatial Randomness.
4.2.5 Quality Control of Arrays.
4.2.6 Assessment of Spot Quality.
4.3 Adjusting for Background.
4.3.1 Estimating the Background.
4.3.2 Adjusting for the Estimated Background.
4.4 Expression Level Calculation for Two-Channel cDNA Microarrays.
4.5 Expression Level Calculation for Oligonucleotide Arrays.
4.5.1 The Average Difference.
4.5.2 A Weighted Average Difference.
4.5.3 Perfect Matches Only.
4.5.4 Background Adjustment Approach.
4.5.5 Model-Based Approach.
4.5.6 Absent-Present Calls.
5 Preprocessing Microarray Data.
5.1 Logarithmic Transformation.
5.2 Variance Stabilizing Transformations.
5.3 Sources of Bias.
5.5 Intensity-Dependent Normalization.
5.5.1 Smooth Function Normalization.
5.5.2 Quantile Normalization.
5.5.3 Normalization of Oligonucleotide Arrays.
5.5.4 Normalization of Two-Channel Arrays.
5.5.5 Spatial Normalization.
5.5.6 Stagewise Normalization.
5.6 Judging the Success of a Normalization.
5.7 Outlier Identification.
5.7.1 Nonresistant Rules for Outlier Identification.
5.7.2 Resistant Rules for Outlier Identification.
5.8 Assessing Replicate Array Quality.
6.2 Technical Replicates.
6.3 Biological Replicates.
6.4 Experiments with Both Technical and Biological Replicates.
6.5 Multiple Oligonucleotide Arrays.
6.6 Estimating Fold Change in Two-Channel Experiments.
6.7 Bayes Estimation of Fold Change.
7 Two-Group Comparative Experiments.
7.1 Basics of Statistical Hypothesis Testing.
7.2 Fold Changes.
7.3 The Two-Sample t Test.
7.4 Diagnostic Checks.
7.5 Robust t Tests.
7.6 Randomization Tests.
7.7 The Mann–Whitney–Wilcoxon Rank Sum Test.
7.8.1 A Pragmatic Approach to the Issue of Multiplicity.
7.8.2 Simple Multiplicity Adjustments.
7.8.3 Sequential Multiplicity Adjustments.
7.9 The False Discovery Rate.
7.9.1 The Positive False Discovery Rate.
7.10 Small Variance-Adjusted t Tests and SAM.
7.10.1 Modifying the t Statistic.
7.10.2 Assessing Significance with the SAM t Statistic.
7.10.3 Strategies for Using SAM.
7.10.4 An Empirical Bayes Framework.
7.10.5 Understanding the SAM Adjustment.
7.11 Conditional t.
7.12 Borrowing Strength across Genes.
7.12.1 Simple Methods.
7.12.2 A Bayesian Model.
7.13 Two-Channel Experiments.
7.13.1 The Paired Sample t Test and SAM.
7.13.2 Borrowing Strength via Hierarchical Modeling.
8 Model-Based Inference and Experimental Design Considerations.
8.1 The F Test.
8.2 The Basic Linear Model.
8.3 Fitting the Model in Two Stages.
8.4 Multichannel Experiments.
8.5 Experimental Design Considerations.
8.5.1 Comparing Two Varieties with Two-Channel Microarrays.
8.5.2 Comparing Multiple Varieties with Two-Channel Microarrays.
8.5.3 Single-Channel Microarray Experiments.
8.6 Miscellaneous Issues.
9 Pattern Discovery.
9.1 Initial Considerations.
9.2 Cluster Analysis.
9.2.1 Dissimilarity Measures and Similarity Measures.
9.2.2 Guilt by Association.
9.2.3 Hierarchical Clustering.
9.2.4 Partitioning Methods.
9.2.5 Model-Based Clustering.
9.2.6 Chinese Restaurant Clustering.
9.3 Seeking Patterns Visually.
9.3.1 Principal Components Analysis.
9.3.2 Factor Analysis.
9.3.4 Spectral Map Analysis.
9.3.5 Multidimensional Scaling.
9.3.6 Projection Pursuit.
9.3.7 Data Visualization with the Grand Tour and Projection Pursuit.
9.4 Two-Way Clustering.
9.4.1 Block Clustering.
9.4.2 Gene Shaving.
9.4.3 The Plaid Model.
10 Class Prediction.
10.1 Initial Considerations.
10.1.1 Misclassification Rates.
10.1.2 Reducing the Number of Classifiers.
10.2 Linear Discriminant Analysis.
10.3 Extensions of Fisher’s LDA.
10.4 Nearest Neighbors.
10.5 Recursive Partitioning.
10.5.1 Classification Trees.
10.5.2 Activity Region Finding.
10.6 Neural Networks.
10.7 Support Vector Machines.
10.8 Integration of Genomic Information.
10.8.1 Integration of Gene Expression Data and Molecular Structure Data.
10.8.2 Pathway Inference.
11 Protein Arrays.
11.2 Protein Array Experiments.
11.3 Special Issues with Protein Arrays.
11.5 Using Antibody Antigen Arrays to Measure Protein Concentrations.
JAVIER CABRERA, PhD, is an Associate Professor in the Department of Statistics at Rutgers University. He has a doctorate in statistics from Princeton University and has over fifty publications in applied statistics. His research interests include DNA microarrays, data mining of biopharmaceutical databases, computer vision, statistical computing and graphics, robustness, and biostatistics.
"…an extensive overview of current microarray data analysis…" (Clinical Chemistry, November 2004)
"The book would be useful to anyone studying or working with the DNA and protein arrays." (Annals of Biomedical Engineering, November 2004)
"…presents an extensive series of computational, visual, and statistical tools that are being used for exploring and analyzing microarray data…" (Quarterly of Applied Mathematics, Vol. LXII, No. 1, March 2004)
"…outlines methodologies for analyzing DNA microarrays and protein array data for industrial and academic applications…" (Genetic Engineering News, March 15, 2004)