
Bioinformatics For Dummies, 2nd Edition

Jean-Michel Claverie, Cedric Notredame

ISBN: 978-0-470-08985-9

Dec 2006

464 pages




Were you always curious about biology but were afraid to sit through long hours of dense reading? Did you like the subject when you were in high school but had other plans after you graduated? Now you can explore the human genome and analyze DNA without ever leaving your desktop!

Bioinformatics For Dummies is packed with valuable information that introduces you to this exciting new discipline. This easy-to-follow guide leads you step by step through every bioinformatics task that can be done over the Internet. Forget long equations, computer-geek gibberish, and installing bulky programs that slow down your computer. You’ll be amazed at all the things you can accomplish just by logging on and following these trusty directions. You get the tools you need to:

  • Analyze all types of sequences
  • Use all types of databases
  • Work with DNA and protein sequences
  • Conduct similarity searches
  • Build a multiple sequence alignment
  • Edit and publish alignments
  • Visualize protein 3-D structures
  • Construct phylogenetic trees

This up-to-date second edition includes newly created and popular databases and Internet programs as well as multiple new genomes. It provides tips for using servers and places to seek resources to find out about what’s going on in the bioinformatics world. Bioinformatics For Dummies will show you how to get the most out of your PC and the right Web tools so you’ll be searching databases and analyzing sequences like a pro!



Part I: Getting Started in Bioinformatics.

Chapter 1: Finding Out What Bioinformatics Can Do for You.

Chapter 2: How Most People Use Bioinformatics.

Part II: A Survival Guide to Bioinformatics.

Chapter 3: Using Nucleotide Sequence Databases.

Chapter 4: Using Protein and Specialized Sequence Databases.

Chapter 5: Working with a Single DNA Sequence.

Chapter 6: Working with a Single Protein Sequence.

Part III: Becoming a Pro in Sequence Analysis.

Chapter 7: Similarity Searches on Sequence Databases.

Chapter 8: Comparing Two Sequences.

Chapter 9: Building a Multiple Sequence Alignment.

Chapter 10: Editing and Publishing Alignments.

Part IV: Becoming a Specialist: Advanced Bioinformatics Techniques.

Chapter 11: Working with Protein 3-D Structures.

Chapter 12: Working with RNA.

Chapter 13: Building Phylogenetic Trees.

Part V: The Part of Tens.

Chapter 14: The Ten (Okay, Twelve) Commandments for Using Servers.

Chapter 15: Some Useful Bioinformatics Resources.


Test Questions
These test questions are in Microsoft Word format.
Chapter 1 PowerPoint files
These presentations are in Microsoft PowerPoint format. If you are unable to view PowerPoint files, you can download OpenOffice for free.

Chapter 2 PowerPoint files
Chapter 3 PowerPoint files
Chapter 4 PowerPoint files
Chapter 5 PowerPoint files
Chapter 6 PowerPoint files
Chapter 7 PowerPoint files
Chapter 8 PowerPoint files
Chapter 9 PowerPoint files
Chapter 10 PowerPoint files
Chapter 11 PowerPoint files
Chapter 12 PowerPoint files
Chapter 13 PowerPoint files
Chapter 14 PowerPoint files
Chapter 1 images
Chapter 2 images
Chapter 3 images
Chapter 4 images
Chapter 5 images
Chapter 6 images
Chapter 7 images
Chapter 8 images
Chapter 9 images
Chapter 10 images
Chapter 11 images
Chapter 12 images
Chapter 13 images

Bonus Material

For your convenience, we have listed the resources chapter by chapter, following the order in which they appear in the book. Along with the chapters, the authors have provided the images and diagrams used in the book. You may go to the corresponding chapter to download that specific chapter.
(All images are kept in .zip archives and are available on the download tab. You may download WinZip, a utility for opening the archives.)


Chapter 1 Finding Out What Bioinformatics Can Do for You
Chapter 2 How Most People Use Bioinformatics
Chapter 3 Using Nucleotide Sequence Databases
Chapter 4 Using Protein and Specialized Sequence Databases
Chapter 5 Working with a Single DNA Sequence
Chapter 6 Working with a Single Protein Sequence
Chapter 7 Similarity Searches on Sequence Databases
Chapter 8 Comparing Two Sequences
Chapter 9 Building a Multiple Sequence Alignment
Chapter 10 Editing and Publishing Alignments
Chapter 11 Working with Protein 3-D Structures
Chapter 12 Working with RNA Structures
Chapter 13 Building Phylogenetic Trees
Chapter 15 Some Useful Bioinformatics Resources


Chapter 1: Finding Out What Bioinformatics Can Do for You

Beyond the book: Finding out about DNA chips and micro-arrays

Address Description
A leading laboratory offering a complete "do-it-yourself" tutorial on micro-arrays
A great resource from the U.S. National Institutes of Health
The public repository for micro-array data from the European Bioinformatics Institute
The leading company in DNA chips
Nice pictures and animations from a leading provider of micro-array readers



Chapter 2: How Most People Use Bioinformatics

The sites everybody should know about

Address Description
The top site for bibliographic information in biomedical sciences
The best starting point for finding out about proteins and their genes
The US site of the joint international DNA sequence repository (GenBank)
Its counterpart in Europe (EMBL)
Its counterpart in Japan (DDBJ)
The main site to compare your sequence with all others
A user-friendly site for analyzing your protein sequence and trying your first multiple sequence alignment with CLUSTALW



Chapter 3: Using DNA databases

A few places for finding genomic information

Address Description
The US site of the joint international DNA sequence repository (GenBank)
The Institute for Genomic Research: microbial genomics
The place to find out about the human genome
Another user-friendly human genome browser



Chapter 4: Using Protein and Specialized Sequence Databases

The two main information resources about protein sequences

Address Description
The ExPASy/SWISS-PROT server
The Protein Information Resource server

Some good places for refreshing your biochemistry

Address Description
The glycan structure database
The ultimate lipid database
ChemIDplus: Identifying molecules by drawing them up!

The main resources for biochemical pathways and enzymes

Address Description
Find which metabolic pathway a molecule belongs to.
The famous Kyoto Encyclopedia of Genes and Genomes (KEGG). E.C. (Enzyme Commission) numbers or gene names are the best starting points for this resource.
The comprehensive enzyme information system BRENDA.
The official site for enzyme nomenclature of the International Union of Biochemistry and Molecular Biology (IUBMB).
The Encyclopedia of E. coli Genes and Metabolism. It is progressively extending to other bacteria.

Some great 3-D structure information resources

Address Description
PDB, the official repository database for protein 3-D structures.
MMDB, NCBI's database of macromolecular 3-D structures with visualization tools.
SCOP, a Structural Classification Of Proteins.
CATH (Class, Architecture, Topology, Homologous superfamily), a hierarchical classification of protein structures.
Swiss-Model, a fully automated protein structure homology-modeling server.

Some specialized protein databases

Address Description
IMGT, the International Immunogenetics database, specializes in proteins involved in the immune response.
Rebase, the reference restriction-modification enzyme database.
CAZy, an information resource on enzymes that degrade, modify, or create glycosidic bonds.
MEROPS, a database specializing in proteases.
PKR, the Protein Kinase Resource, focuses on the protein kinase family of enzymes.
NRR, the Nuclear Receptor Resource, is a collection of individual databases on the steroid and thyroid hormone receptors.
The Human Brain Database provides information on the proteins involved in neural processes, such as ion channels, membrane receptors of neurotransmitters and neuromodulators, as well as olfactory receptors (ORDB).
The COG (Clusters of Orthologous Groups) database regroups proteins shared by at least three major phylogenetic lineages (ancient conserved domains).



Chapter 5: Working with a Single DNA Sequence

Some sites for performing DNA analysis

Address Description
VecScreen_docs Screen your sequence for vector contamination
Tools for detecting and masking repeats
Sites to compute restriction maps for your sequence
Designing PCR primers
Tools for various DNA composition analyses
Two sites for interactive dot-plot analysis
A basic ORF finder
Gene prediction in prokaryotes using GeneMark
Various sites for predicting protein-coding genes in eukaryote DNA sequences
For predicting complete gene structures from vertebrate DNA sequences
A straightforward Web service for small-scale gene assembly
Popular software for assembling and managing DNA sequences (you need to install them on your computer)
Main commercial sequence assembly software

Web Sites for searching motifs in DNA sequences

Address Description
Search for potential transcriptional elements using the TRANSFAC database
Search for transcriptional elements using the IMD database
Predict putative eukaryotic promoter regions
Detect distance correlations between sequence elements
Detect regulatory signals in plant sequences
Discover motifs in groups of related DNA or protein sequences
Tools to analyze regulatory sequences



Chapter 6: Working with a Single Protein Sequence

The Main Domain Collections

Name Address Number of Domains Generation 616 Manual 7973 Manual 1900 Manual 736000 Manual 685 Manual 4852 Manual 2453 Manual 12542 Manual

Protein sequence analysis over the Internet

Name Site Description
ExPASy Proteins
Pbil Proteins
PIR Proteins
CBS Proteins
Hits Proteins
InterPro Domains
CD search Domains



Chapter 7: Similarity Searches on Sequence Databases

A few BLAST and PSI-BLAST servers around the world

Country or Continent Program URL
Europe BLAST
Europe BLAST
Europe BLAST


Address Description The Home of WU-BLAST (no online server) Program Program Program Program

Alternative Methods for Homology Searches

Country/Continent Program Address



Chapter 8: Comparing Two Sequences

Various flavors of dot-plot programs

Name Used For Range URL Platforms
Dotlet Proteins, DNA 10,000 All (Java)
Dnadot Proteins, DNA 100,000 All (Java)
Dotter Proteins, DNA 100,000 Unix, Linux, Windows
Dottup Complete genomes, DNA >100,000 Unix, Linux
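
To see what a dot plot actually computes, here is a minimal Python sketch. It marks every pair of positions where a short window of one sequence matches a window of the other. The function names (`dot_plot`, `render`) and the `window` and `min_matches` parameters are assumptions made for illustration; they are not the interface or the exact filtering rule of Dotlet, Dnadot, Dotter, or Dottup.

```python
def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of (i, j) start positions whose windows are similar.

    A dot is placed at (i, j) when the window starting at seq_a[i] and the
    window starting at seq_b[j] share at least `min_matches` identical
    characters. Both parameters are illustrative defaults.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: one row per position of seq_a."""
    rows = []
    for i in range(len(seq_a)):
        rows.append(
            "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        )
    return "\n".join(rows)


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACGTTACGT"
    print(render(a, b, dot_plot(a, b, window=3, min_matches=3)))
```

Similar regions show up as diagonal runs of stars. Raising `window` or `min_matches` removes background noise at the cost of missing weaker similarities.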

Online pairwise alignment programs

Name Address Alignment Type
lalign Global/Local
lalign Global/Local
USC Global/Local
alion Global/Local
align Global/Local
align Global/Local
xenAliTwo Local for DNA
Blast2seqs Local BLAST
Protal2dna Protein against DNA
Pal2nal Protein against DNA
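
The Global/Local distinction in the Alignment Type column can be illustrated with a small Python sketch. A global alignment scores both sequences end to end (Needleman-Wunsch style), while a local alignment looks for the best-scoring pair of subsequences (Smith-Waterman style). The function name `alignment_score`, the linear gap penalty, and the scoring values are assumptions chosen for illustration; the servers listed above use their own substitution matrices and gap models.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of sequences a and b.

    local=False: global score (both sequences aligned end to end).
    local=True:  local score (best-scoring pair of subsequences).
    Uses a simple linear gap penalty; all scoring values are illustrative.
    """
    # First row of the dynamic-programming table.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"))                       # global
    print(alignment_score("TTACGTTT", "GGACGGG", local=True))   # local
```

The sketch returns only the score, not the alignment itself; recovering the aligned residues would require keeping the full table and tracing back through it.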

Online pairwise alignment analyses

Name Address Function
lalnview Visualization
prss Evaluation
prss Evaluation
graph-align Evaluation



Chapter 9: Building a Multiple Sequence Alignment

Application Procedure
Extrapolation A good multiple alignment can help convince you that an uncharacterized sequence is really a member of a protein family. Alignments that include SWISS-PROT sequences are the most informative. Use the ExPASy BLAST server to gather and align them.
Phylogenetic analysis If you carefully choose the sequences you include in your multiple alignment, you can reconstruct the history of these proteins. Use the Pasteur Phylip server at
Pattern identification By discovering very conserved positions, you can identify a region that is characteristic of a function (in proteins or in nucleic-acid sequences). Use the logo server for that purpose:
Domain identification It is possible to turn a multiple sequence alignment into a profile that describes a protein family or a protein domain (PSSM). You can use this profile to scan databases for new members of the family. Use NCBI-BLAST to produce and analyze PSSMs:
DNA regulatory elements You can turn a DNA multiple alignment of a binding site into a weight matrix and scan other DNA sequences for potentially similar binding sites. Use the Gibbs sampler to identify these sites:
Structure prediction A good multiple alignment can give you an almost perfect prediction of secondary structure for both proteins and RNA. Sometimes it can also help in the building of a 3-D model.
nsSNP analysis Various gene alleles often have different amino-acid sequences. Multiple alignments can help you predict whether a Non-Synonymous Single-Nucleotide Polymorphism is likely to be harmful. See the SIFT site for more details:
PCR Analysis A good multiple alignment can help you identify the less-degenerated portions of a protein family, in order to fish out new members by PCR (polymerase chain reaction). If this is what you want to do, you can use the following site:
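
The "DNA regulatory elements" entry above describes turning an alignment of binding sites into a weight matrix and scanning other sequences with it. The Python sketch below shows one common way to do that, assuming ungapped sites of equal length, a uniform background (0.25 per base), and a pseudocount of 1. The function names (`build_pwm`, `scan`, `best_hit`) and these choices are illustrative assumptions; this is not the Gibbs sampler, which also has to discover the sites in the first place.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Build a log-odds weight matrix from aligned, equal-length DNA sites.

    Each column becomes a dict mapping a base to log2(frequency / 0.25).
    The uniform background and the pseudocount are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for col in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm


def scan(pwm, sequence):
    """Score every window of the sequence; return a list of (start, score)."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((start, sum(pwm[k][window[k]] for k in range(width))))
    return hits


def best_hit(pwm, sequence):
    """Return the (start, score) pair with the highest score."""
    return max(scan(pwm, sequence), key=lambda hit: hit[1])


if __name__ == "__main__":
    pwm = build_pwm(["TATA", "TATA", "TACA"])
    print(best_hit(pwm, "GGTATAGG"))
```

The sketch accepts only the uppercase letters A, C, G, and T; ambiguous bases such as N would need explicit handling before scanning real sequences.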

BLAST servers integrating multiple alignment methods

Address What You Can Do There Extract entire sequences,
Export sequences in FASTA,
Submit sequences to ClustalW, Tcoffee or MAFFT.
Turn the list of Hits into a non-redundant collection of sequences Extract entire sequences;
Extract sequence fragments;
Export sequences in FASTA;
Submit sequences to ClustalW Submit sequences to ClustalW

A List of ClustalW servers

Name Location URL
EBI Europe
EMBnet Europe
GenomeNet Japan
DDBJ Japan
Strasbourg Europe

Multiple Sequence Alignment Resources Over the Internet

Method Description Address
Tcoffee Accurate combination of sequences and structures
Probcons A Bayesian version of Tcoffee
MUSCLE A fast and accurate sequence cruncher
Kalign A fast sequence aligner
MAFFT A fast and accurate sequence cruncher using Fast Fourier Transforms
Dialign Ideal for Sequences With Local Homology

Motif-finding methods available online

Method Address
Gibbs Sampler



Chapter 10: Editing and Publishing Alignments

Packages for Editing Multiple Sequence Alignments

Name Address Description
Jalview Java package, available online
Kalignview Nice online alignment viewer
CINEMA A very complete Java package
Seaview A beautiful editor, very easy to install
Belvu Useful for removing redundancy
Bioedit Adapted for RNA
RALEE An RNA viewer
Review A very complete list of viewers

Extracting information from a multiple sequence alignment

Name URL Description
Logo Logos
Blocks Identifies blocks
Lama Compares your multiple alignment with the BLOCKs database
Amas Identifies important features in the multiple alignment

Multiple alignment beautifying tools

Name URL Description
ESPript A very powerful shading-and-coloring tool
Boxshade Shading in black and white
Mview Can process BLAST alignments



Chapter 11: Working with Protein 3-D Structures

Predicting secondary structures

URL Description
PsiPred for predicting protein secondary structures
PredictProtein for predicting protein secondary structures
The Protein Data Bank, containing every publicly available protein structure
The NCBI section dedicated to structure analysis
Two very popular PDB viewers (you must install them on your machine)
Popular structure classification collections
Homology modeling
Threading sequences onto PDB structures
Ab initio folding
MolMovDB Molecular dynamics
Protein interaction



Chapter 12: Working with RNA

Hunting Micro RNAs (miRNAs) over the Web

Address Description
An extensive collection of resources on silencing RNAs
A database of all known human silencing RNAs
The home of miRNAs at the Sanger Center in the UK. Probably one of the most extensive resources on micro-RNAs.
A resource for predicting miRNAs using probabilistic methods.
Prediction of the potential target of your miRNA on complete genomes.
A resource for predicting the potential target of your miRNA on a user-provided genomic sequence.
Runs your genomic sequence against an exhaustive database of miRNAs

Ribosomal RNA resources on the Internet

URL Description A European database on the larger of the two ribosomal subunits. It contains predicted structures. It is possible to query the database online. Features lots of online software. The other European database, this time dedicated to the small ribosomal subunit.

Some non-coding RNA resources

URL Description
Dedicated to small non-coding RNAs.
Dedicated to tRNAs.
Dedicated to the untranslated regions of genes.
Dedicated to the recently discovered tmRNA that are both transfer and messenger RNAs. (If you don't yet know what this is, you MUST take a look at this fascinating Web site!)

A list of generic RNA resources

URL Description
A site dedicated to the detection of non-coding RNAs.
RNA World, one of the most complete sites currently available.
Another very complete list of sites.



Chapter 13: Building Phylogenetic Trees

Online sites for making phylogenetic trees

Address Description
You can use ClustalW to build multiple alignments and compute NJ trees. Remember: You cannot do both at the same time!
The Genebee server can produce genuine phylogenetic trees in one step.
Tcoffee computes a genuine NJ phylogenetic tree in one step.
You can use Jalview to produce NJ trees. It's a very powerful tool that combines alignment editing with tree computation.
A powerful method to compute maximum likelihood trees from Gascuel and his team.
An interface to BioNJ, a novel NJ method.
A powerful Java tool to gather members of a protein family and build the associated tree.
A Web interface for Phylip.
Very powerful interface for a new tree reconstruction method.

Generic phylogenetic resources on the Internet

Address Description
Joe Felsenstein's pages, where Phylip lives; it's also one of the most extensive collections of resources available. Truly a legendary site!
A very complete list of phylogeny resources.
The home of PAUP, a legendary phylogeny package using parsimony. Although PAUP is a commercial package, it's reasonably priced and worth every penny, according to specialists.
The NCBI primer on phylogeny.
A high-quality course on tree reconstruction methods.

Collections of Orthologous Sequences

Address Description
Clusters of orthologous sequences maintained by the NCBI. Each cluster contains proteins from bacterial genomes.
A collection of orthologous vertebrate genes.
A collection of orthologous bacterial genes.
Another collection of homologous sequences.
Three extensive collections of ribosomal RNA sequences, which are very useful for classifying new organisms, and come with appropriate phylogenetic tools.



Chapter 15: Some Useful Bioinformatics Resources

Ten important bioinformatics databases

Name URL Description
GenBank/DDBJ/EMBL Nucleotide sequences
Ensembl Human/mouse genome
PubMed Literature references
NR Non redundant Protein sequences
SWISS-PROT Protein sequences
InterPro Protein domains
OMIM Genetic diseases
Enzymes Enzymes
PDB Protein structures
KEGG Metabolic pathways

Twelve important software programs in bioinformatics

Category Name URL Description
Database Search SRS Database search
  Entrez Database search (Chapter 3)
  BLAST Homology search (Chapter 7)
  DALI Structure database search (Chapter 11)
Multiple alignment ClustalW Multiple sequence alignment (Chapter 9)
  MUSCLE Multiple sequence alignment (Chapter 9)
  Tcoffee Multiple Sequence Alignment (Chapter 9)
Prediction GenScan Gene prediction (Chapter 5)
  PsiPred Protein structure prediction (Chapter 11)
  Mfold RNA structure prediction (Chapter 12)
Phylogenetics Phylip Tree reconstruction (Chapter 13)
  PhyML Tree reconstruction (Chapter 13)
Edition/Visualization Jalview Alignment editor (Chapter 10)
  Logos A MSA Visualization Tool (Chapter 10).
  Trees Tree Visualization (Chapter 13).
  Rasmol Structure visualization (Chapter 11)

Ten bioinformatics resource locators

Name Address Description
ExPASy Dedicated to proteins
ArrayExpress DNA chips
Swbic Miscellaneous links
Pasteur Miscellaneous links; many online tools
RNA World RNA-related links
miRNAs Extensive Resources on miRNA
Phylip Everything on phylogeny
NCBI primers Very good primers on many subjects
Bielefeld Awesome online course
Bio-informer The EBI online news
Coffee Corner NCBI Online News.

Ten Places to Go Farther

Name Address Description
Nucleic Acids Research Once a year, NAR publishes both a database issue and a Web-server issue. These are available for free and contain the state of the art in bioinformatics.
Bioinformatics Bioinformatics contains articles describing the most recent methods in bioinformatics.
ISCB An exhaustive list of major conferences in the field of bioinformatics, provided by the International Society for Computational Biology.
