INTRODUCTION
For many years, sequencing the human genome was seen as the Holy Grail of biological
science and medicine. But the simultaneous completion of a draft sequence by two
groups, one public and one private, in February of 2001 ushered in a new “post-genomic”
era as scientists began to contemplate the next step in understanding genomic
function.
Sequencing all the nucleotides in the human genome does not mean that
everything is known about our DNA. Although approximately 30,000-40,000 genes
have been identified within our 23 pairs of chromosomes, the functions of only
about half of them are known. Perhaps more surprisingly, all those
genes account for only about 2% of the DNA in our cells. Some of the other 98%
is involved in regulating the expression of genes, that is, turning them on and
off as necessary. However, most of the genome consists of stretches of DNA that
do not appear to be needed for anything. These sequences may be the inactive remnants
of genes that have accumulated over evolutionary timescales, that is, genes discarded
by the primitive creatures from which we evolved. Or, these sequences may be involved
in some very intricate mechanism of genomic function and gene regulation that
we have yet to discern. Clearly there is much we do not understand about this
so-called junk DNA; this and other mysteries of our genome will keep scientists
busy for decades to come.
Nevertheless, while the research into gene regulation and chromosomal structure
continues, other avenues of research have branched off to investigate the end
products of gene expression—the proteins of the cell. DNA is the blueprint
of life, no question, but the proteins it encodes are the real cellular workhorses.
Structural proteins, such as the collagen found in tendons and ligaments, and
the keratin found in hair and nails, provide our bodies with strength and shape.
Enzymes are catalytic proteins that facilitate the biochemical reactions of our
metabolism. The antibodies produced by the immune system are proteins, as are
many hormones, the contractile filaments of muscle fibers, and the hemoglobin that carries oxygen from the
lungs to tissues throughout the body. The various types of proteins in our cells
are essential for life.
Human cells contain 100,000 or more different proteins, so scientists
were surprised to find that the human genome appears to house only about 40,000
genes. How can this be, when every protein sequence is necessarily encoded by
our genes? A new science of proteins, called proteomics, has recently blossomed
to answer this and other questions about how all the proteins in our cells function
to sustain life. The study of individual proteins has been the bread-and-butter
of biochemistry research for many years, but the focus of proteomics is a bit
different. Proteomics is aimed at examining the whole set of proteins contained
in a cell at any one time. This set of proteins is known as the proteome.
As daunting as it has been (and continues to be) to study the genome of a cell,
it will be that much more difficult to study the cell's proteome. Whereas DNA
is assembled from four nucleotide building blocks, commonly called G, A, T, and
C, proteins are built from 20 different amino acids. The shapes
of proteins are also more complicated than the shape of DNA. DNA twists into
a spiral-shaped “double helix” independent of its sequence, but the
shape of each folded protein is unique. Since a protein's shape is critical to
its function, understanding the three-dimensional structures of proteins is important
to proteomics scientists.
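A quick count shows why the larger alphabet matters even before shape is considered: a chain of n units drawn from k building blocks can have k^n distinct sequences. The minimal sketch below compares DNA (4 nucleotides) with protein (20 amino acids); the chain lengths are arbitrary illustrative choices, and the count says nothing about folding, only about sequence variety.

```python
def sequence_space(alphabet_size: int, length: int) -> int:
    """Number of distinct linear sequences of a given length."""
    return alphabet_size ** length


DNA_ALPHABET = 4       # G, A, T, C
PROTEIN_ALPHABET = 20  # the 20 standard amino acids

for n in (1, 5, 10):
    dna = sequence_space(DNA_ALPHABET, n)
    protein = sequence_space(PROTEIN_ALPHABET, n)
    print(f"length {n:2d}: DNA {dna:,} vs protein {protein:,}")
```

For a 10-unit chain this gives 1,048,576 possible DNA sequences versus 10,240,000,000,000 possible peptides.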
Another level of complexity in studying proteomes has to do with how much
the proteome itself can vary, even from cell to cell within an organism. Although
each cell has an identical copy of the genome, the differences between cells reflect
which genes are actually expressed. In other words, each type of cell has a different
proteome, or set of proteins, that gives that cell its unique characteristics.
This is further complicated by the fact that even within a single cell, a proteome
can change over time as the cell develops and matures or becomes diseased (which
is of particular interest to medical researchers and pharmaceutical companies).
DEFINING THE PROTEOME: PROTEIN PROFILING
To get a better grasp on the functions of cellular proteins, proteomics scientists
are focusing their efforts in three major areas: identifying proteins, predicting
their structures, and understanding how proteins interact. The task of determining
which proteins make up a given proteome is often referred to as “protein
profiling.” This is not as simple as just looking at the organism’s
genomic sequences and identifying the open reading frames. Even today’s
powerful genome-scanning computer algorithms are not perfect at detecting genes
for very small (but biologically important) proteins. Furthermore, not all proteins
are synthesized in every cell, and some proteins are produced in great amounts
while others are rare.
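At its simplest, finding open reading frames means scanning for a start codon (ATG) followed in the same frame by a stop codon (TAA, TAG, or TGA). The minimal sketch below scans only the three forward frames of a DNA string; real gene finders also scan the reverse strand and use statistical models, so this is illustrative only. The hypothetical `min_codons` cutoff shows one reason small proteins are missed: any length threshold that filters out short random ORFs also discards genuine short genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs on the forward strand.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is
    exclusive and includes the stop codon. ORFs with fewer than
    `min_codons` codons (counting the start codon, not the stop) are dropped.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


seq = "CCATGAAATTTGGGTAACC"  # made-up sequence with one 4-codon ORF
print(find_orfs(seq, min_codons=3))  # [(2, 17)]
print(find_orfs(seq))                # [] because the default cutoff skips it
```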
To complicate matters, a gene often serves as a blueprint for more than one
protein. Sometimes several different protein-encoding mRNAs are created from one
gene through alternative splicing mechanisms. It is also possible for newly made
polypeptide strands to be cleaved and rejoined in different ways as they fold
into three-dimensional proteins. And once made, many proteins undergo further
modifications in the cell; chemical groups (such as phosphate or methyl groups)
and biological molecules (such as fats or sugars) can be covalently attached to
the protein. Because of such alterations, some scientists believe that the human
genome can potentially express close to one million different proteins.
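Estimates like this grow quickly because the sources of variation multiply. As a purely hypothetical illustration (the counts are invented, not measured), suppose a gene has a few optional exons that can each be included or skipped, and its protein has several sites that can each be phosphorylated or not. Every combination is potentially a distinct protein, so the sketch below gives an upper bound, not a prediction of what a cell actually makes.

```python
from itertools import product


def count_variants(optional_exons: int, modifiable_sites: int) -> int:
    """Upper bound on distinct proteins from one gene.

    Each optional exon is included or skipped (2 choices) and each
    modification site is modified or not (2 choices), so choices multiply.
    """
    return 2 ** optional_exons * 2 ** modifiable_sites


def enumerate_variants(optional_exons: int, modifiable_sites: int):
    """List every (exon pattern, modification pattern) combination."""
    exon_patterns = list(product((0, 1), repeat=optional_exons))
    site_patterns = list(product((0, 1), repeat=modifiable_sites))
    return [(e, s) for e in exon_patterns for s in site_patterns]


# Hypothetical gene: 3 optional exons, 4 phosphorylation sites.
print(count_variants(3, 4))           # 128
print(len(enumerate_variants(3, 4)))  # 128
```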
Examining proteins in a cell has traditionally meant laborious experiments
using one- or two-dimensional gel electrophoresis to isolate individual proteins,
followed by chemical sequencing. But such processes are both time-consuming and
expensive and are likely to miss very small proteins as well as proteins present
in tiny amounts. Because effective proteomics research means identifying thousands
of proteins quickly and accurately, scientists are beginning to look at new techniques
that will provide them with the data they need. One trend, still in its infancy,
is a move to mass spectrometry (in which proteins or their peptide fragments are ionized and broken apart, and
the fragments are identified by their mass). Already, peptide sequences that took
hours to determine chemically can now be read in seconds. In addition to saving
time, such systems are extremely sensitive to rare proteins and can be more
readily automated.
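A mass spectrometer measures mass (strictly, mass-to-charge ratio), and a peptide's mass follows from its composition: the sum of its amino acid residue masses plus one water molecule for the free ends. The sketch below computes monoisotopic peptide masses and performs an in-silico trypsin digest (trypsin cuts after lysine or arginine, except before proline), a common first step in such workflows. It ignores modifications and missed cleavages, and the example sequence is made up.

```python
import re

# Monoisotopic residue masses in daltons (each amino acid minus one water,
# as it sits inside a peptide chain).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the peptide's free N- and C-termini
PROTON = 1.007276  # mass added per positive charge during ionization


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(mass: float, charge: int) -> float:
    """Mass-to-charge ratio of a peptide carrying `charge` extra protons."""
    return (mass + charge * PROTON) / charge


def trypsin_digest(protein: str) -> list[str]:
    """Cut after K or R unless followed by P; no missed cleavages."""
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", protein) if piece]


for pep in trypsin_digest("AKPLRGSKR"):  # made-up sequence
    m = peptide_mass(pep)
    print(pep, round(m, 4), round(mz(m, 2), 4))
```

The digest of this example yields `AKPLR`, `GSK`, and `R`; the K in `AKP` is not cut because proline follows it.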
Scientists hope that comparing the proteomes of different types of cells may
provide insight into how the genome is utilized by specific tissues. In addition,
differences between the proteomes of healthy and abnormal cells can help pinpoint the
causative factor in disease, leading to diagnostic and hopefully therapeutic advances
in the form of rationally designed drugs.
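In its simplest form, comparing proteomes means comparing the sets of proteins detected in each sample. The sketch below treats each proteome as a set of protein identifiers and reports which proteins every sample shares and which are unique to one sample; the sample names and the IDs P1 to P6 are hypothetical placeholders, not experimental data.

```python
def tissue_specific(proteomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each sample, the proteins detected in it and in no other sample."""
    specific = {}
    for name, proteins in proteomes.items():
        others = set().union(*(p for n, p in proteomes.items() if n != name))
        specific[name] = proteins - others
    return specific


def shared_by_all(proteomes: dict[str, set[str]]) -> set[str]:
    """Proteins detected in every sample."""
    return set.intersection(*proteomes.values())


# Hypothetical detection results; P1..P6 are placeholder protein IDs.
proteomes = {
    "liver": {"P1", "P2", "P6"},
    "muscle": {"P1", "P2", "P4"},
    "liver_tumor": {"P1", "P3", "P5"},
}
print(shared_by_all(proteomes))    # {'P1'}
print(tissue_specific(proteomes))  # each sample's unique proteins
```

Real comparisons also weigh how much of each protein is present, not just whether it is detected at all.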
APPLYING THE LESSONS OF THE PROTEOME: RATIONAL DRUG DESIGN
Figure: Drug targeting. Rational drug design of a chemotherapeutic agent.
The best drugs are those that can perform the desired function at the lowest
dose possible, with the fewest side effects. Rational drug design depends on identifying
a biomolecule (such as a protein) that causes disease and then tailoring a drug
to alter or inhibit the function of that protein (see figure at right).
After a particular protein has been implicated in causing a disease, it is
studied in detail. Information about the three-dimensional shape of the protein
is used to design drugs that can specifically inhibit the function of the target
protein and thus halt progress of the disease.
The figure at right describes one scenario for the rational design of a chemotherapeutic
agent. First, tissue samples from a healthy brain and from a cancerous brain tumor are
collected, and the proteins from each sample are extracted. The proteomes of both
samples are then analyzed via two-dimensional gel electrophoresis, which separates
the proteins by electrical charge in one dimension and by size in the other.
Comparing the resulting patterns of protein spots identifies a protein that is
present only in the cancerous tissue, or present there in much larger amounts.
This protein is carefully collected and purified,
and its three-dimensional structure is then determined via X-ray crystallography.
The structural information is used to design compounds that will bind to the protein’s
active site. In the final steps of the design, the most promising compounds are
synthesized and tested to ensure that they have the desired effect of halting
the growth of the cancerous tissue, without unacceptable side effects.
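The spot-comparison step can be sketched in code. Here each gel is reduced to a list of spots, each with a position (isoelectric point on the charge axis, apparent molecular weight on the size axis) and an intensity. A cancer-gel spot is flagged if no healthy-gel spot lies within a positional tolerance, or if it is much more intense than its match. The tolerances, the 5-fold threshold, and all spot values are hypothetical; real gel-analysis software must also correct for gel distortion and normalize intensities.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    pi: float         # isoelectric point (charge dimension)
    mw_kda: float     # apparent molecular weight in kDa (size dimension)
    intensity: float  # integrated spot intensity, arbitrary units


def find_match(spot, gel, pi_tol=0.1, mw_tol=2.0):
    """Return the closest spot in `gel` within tolerance, or None."""
    candidates = [s for s in gel
                  if abs(s.pi - spot.pi) <= pi_tol
                  and abs(s.mw_kda - spot.mw_kda) <= mw_tol]
    if not candidates:
        return None
    # Distance in tolerance-scaled units so both axes count equally.
    return min(candidates, key=lambda s: abs(s.pi - spot.pi) / pi_tol
               + abs(s.mw_kda - spot.mw_kda) / mw_tol)


def flag_candidates(healthy, cancer, min_ratio=5.0):
    """Cancer-gel spots absent from, or min_ratio times stronger than, healthy."""
    flagged = []
    for spot in cancer:
        match = find_match(spot, healthy)
        if match is None:
            flagged.append((spot, "absent in healthy"))
        elif spot.intensity / match.intensity >= min_ratio:
            flagged.append((spot, f"{spot.intensity / match.intensity:.1f}x higher"))
    return flagged


healthy = [Spot(5.2, 45.0, 100.0), Spot(6.8, 30.0, 40.0)]
cancer = [Spot(5.25, 45.5, 110.0), Spot(6.75, 29.0, 400.0), Spot(8.1, 60.0, 50.0)]
for spot, reason in flag_candidates(healthy, cancer):
    print(spot, reason)
```

With these made-up values the second cancer spot is flagged as 10.0x higher and the third as absent from the healthy gel.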
PREDICTING PROTEIN STRUCTURE
A second focus of proteomics research is protein structure determination.
Knowing the folded conformation of a protein is important because a protein’s
function depends largely on its shape. Therefore, in order to understand how all
the proteins in a given proteome are able to do their jobs in the cell, scientists
need fast, reliable ways to determine protein shape. X-ray crystallography has
long been the standard method for doing this, but it continues to be a difficult
and time-consuming process, and not all proteins crystallize well. But since many
thousands of protein structures are already known, it is becoming possible to
use this information to develop methods for predicting the three-dimensional shapes
of proteins from their amino acid sequences.
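One family of prediction methods, often called homology or template-based modeling, starts by finding an already-solved structure whose amino acid sequence resembles the new protein's, on the reasoning that similar sequences tend to fold into similar shapes. The minimal sketch below scores sequence similarity with a Needleman-Wunsch global alignment using simple match, mismatch, and gap scores (real tools use substitution matrices such as BLOSUM62 and affine gap penalties) and picks the best-scoring template. The "known structures" are short made-up sequences, not database entries.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def best_template(query: str, templates: dict[str, str]) -> tuple[str, int]:
    """Name and score of the template sequence most similar to `query`."""
    return max(((name, global_alignment_score(query, seq))
                for name, seq in templates.items()), key=lambda t: t[1])


# Hypothetical sequences of proteins with "solved" structures.
templates = {"template_1": "MSDLRKTGHY", "template_2": "GAVNPEQFIC"}
print(best_template("MSDLRKTGHW", templates))  # ('template_1', 8)
```

Choosing a template is only the first step; the model must then be built and refined, which is where prediction accuracy is usually lost.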
Figure: Three-dimensional structure of a protein whose activity is implicated in certain cancers. At right is a close-up view of the structure with a bound inhibitory drug (a potential chemotherapeutic).
Although this branch of proteomics holds great promise, the computer algorithms
that have been developed are not yet sophisticated enough to determine the shape
of a protein with great accuracy. Still, pharmaceutical companies continue to
be keenly interested in the advances in computer modeling of proteins, since rational
drug design requires knowledge of the three-dimensional shape of the protein of
interest.
NO PROTEIN IS AN ISLAND: PROTEIN NETWORKS
Figure: Isolation of protein complexes.
A third area of proteomics research is focused on determining how proteins
work in networks. Scientists have known for years that some proteins interact
with others in the cell, joining forces to get a particular job done. However,
the extent of this kind of protein cooperation wasn’t well understood until
two studies published early in 2002 gave scientists a better feel for the importance
of cellular protein interactions. In separate studies using the yeast Saccharomyces
cerevisiae, researchers used some of the yeast proteins as “bait”
to see what other proteins they could fish out (see figure at right).
In this approach, genetic engineering techniques are used to place a molecular
“tag” on a “bait” protein synthesized by growing yeast
cells. The yeast cells are then harvested and gently broken open, and the cellular
material is poured over a special column designed to catch the tagged bait protein.
Any proteins associated with the tagged protein are also retained in clusters,
while unassociated proteins are rinsed away. The protein cluster is then collected,
broken apart into individual proteins on a denaturing gel, and each protein strand
is sequenced by mass spectrometry. Proteins in the cluster can then be identified
by matching their protein sequence to protein sequences in public databases (a
process that relies on the science of bioinformatics).
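The database-matching step can be illustrated with exact substring search: each peptide read by the mass spectrometer is looked up in a collection of known protein sequences, and proteins are ranked by how many of the peptides they contain. The sketch assumes a tiny, made-up database; real search engines compare observed and predicted spectra, allow for modifications and sequencing errors, and estimate false-discovery rates.

```python
from collections import Counter


def identify(peptides: list[str], database: dict[str, str]) -> list[tuple[str, int]]:
    """Rank database proteins by how many query peptides they contain."""
    hits = Counter()
    for pep in peptides:
        for name, seq in database.items():
            if pep in seq:
                hits[name] += 1
    return hits.most_common()


# Hypothetical database entries and peptide reads.
database = {
    "protein_A": "MAGLKEWTRSPQDNYHKLFAGR",
    "protein_B": "MKVLTGHRAEQWPSDNFYKLTR",
}
peptides = ["EWTR", "SPQDNYHK", "AEQWPSDNFYK", "LFAGR"]
print(identify(peptides, database))  # [('protein_A', 3), ('protein_B', 1)]
```

Finding peptides from two different proteins in one cluster is exactly what signals that those proteins were bound together.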
Such “protein fishing” experiments were repeated many times, with
hundreds of different yeast proteins used as “bait.” When the proteins
caught by the column were examined, it was found that they were usually attached
to one or more other proteins. Scientists now believe that at least 80% of proteins
interact with other proteins to form complexes. Even more intriguing, many proteins
are found in more than one protein complex. This suggests that the regulation
of cellular activities depends not just on signal cascades (with one protein activating
the next, and so on), but on an intricate network of protein interactions that
works as a system of checks and balances to keep the cell running smoothly. Compiling
the data from many “protein fishing” experiments will allow scientists
to construct detailed maps of all the interactions in a proteome.
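Compiling "protein fishing" results into a map amounts to building a graph: each protein is a node, and each pairing of a bait with a protein it captured is an edge (this bait-to-prey convention is sometimes called the spoke model; other conventions exist). The sketch below uses hypothetical pull-down results with placeholder names to build the map and to find proteins that turn up in more than one captured complex.

```python
from collections import defaultdict

# Hypothetical pull-down results: bait -> proteins retained with it.
pulldowns = {
    "Bait1": {"P2", "P3"},
    "Bait2": {"P3", "P4"},
    "Bait3": {"P5"},
}


def build_network(pulldowns):
    """Undirected interaction map using bait-to-prey edges."""
    network = defaultdict(set)
    for bait, preys in pulldowns.items():
        for prey in preys:
            network[bait].add(prey)
            network[prey].add(bait)
    return network


def multi_complex_proteins(pulldowns):
    """Proteins found in more than one captured complex (bait included)."""
    counts = defaultdict(int)
    for bait, preys in pulldowns.items():
        for protein in preys | {bait}:
            counts[protein] += 1
    return {p for p, c in counts.items() if c > 1}


network = build_network(pulldowns)
print(sorted(network["P3"]))              # ['Bait1', 'Bait2']
print(multi_complex_proteins(pulldowns))  # {'P3'}
```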
Approaches such as these are not perfect: some well-known protein interactions
are frequently not detected, and false-positive interactions are common. However,
these forays into examining the cellular machinery as a whole (rather than studying
isolated proteins) are furthering our understanding of life as a complex system.
The data generated by these types of experiments will allow scientists to clarify
the roles of proteins involved in metabolic pathways, regulatory cascades, and
other functions critical for cell survival.
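One common way to reduce false positives is to trust only interactions that are seen repeatedly, for example across independent replicates or in reciprocal experiments where each partner, used as bait, captures the other. The sketch below keeps a protein pair only if it appears in at least `min_support` experiments; the threshold and the observations are hypothetical. Raising the threshold removes more false positives but also misses more real interactions.

```python
from collections import Counter


def filter_interactions(experiments, min_support=2):
    """Keep protein pairs observed in at least `min_support` experiments.

    Each experiment is a collection of (protein, protein) pairs. Pairs are
    unordered, so ("A", "B") and ("B", "A") count as the same interaction.
    """
    support = Counter()
    for observed in experiments:
        # A set of frozensets counts each pair at most once per experiment.
        support.update({frozenset(pair) for pair in observed})
    return {tuple(sorted(pair)) for pair, n in support.items() if n >= min_support}


experiments = [
    [("A", "B"), ("A", "C")],
    [("B", "A"), ("D", "E")],
    [("A", "B"), ("C", "D")],
]
print(filter_interactions(experiments))  # {('A', 'B')}
```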
THE FUTURE OF PROTEOMICS
In this post-genomic era, the new field of proteomics shows great promise for
revolutionizing our understanding of biological processes. But it must overcome daunting
technical challenges to fulfill that promise. New technologies to rapidly sequence
proteins, to determine the cellular locations of proteins, and to analyze genomic
and protein sequencing data are still being developed and improved. Multidisciplinary
collaborations will also need to be forged between computer scientists, geneticists,
and protein chemists, since proteomics draws on expertise from all of these fields.
But if preliminary results are any indication, all this effort will yield rich
rewards.
For example, in a small but highly promising study from the National Cancer
Institute published early in 2002, analysis of the blood proteins of women with
and without ovarian cancer allowed researchers to correctly identify each woman
who had the disease. This news was especially welcome because ovarian cancer is
often not detected until the late stages of the disease, when chances for a cure
are small. The power of proteomics may help medical researchers develop simple
blood tests for other difficult-to-detect cancers, giving doctors and patients
the one thing they need most: time, gained through early detection, to treat the disease while a cure is still likely.
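Correctly identifying every woman who had the disease corresponds to a sensitivity of 100%. A useful screening test must also avoid flagging healthy people, which is measured separately as specificity. The sketch below computes both from true and predicted labels; the labels are made up for illustration and are not the study's data.

```python
def sensitivity_specificity(truth: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: True means the person has the disease.
truth = [True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, True, False]
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=1.00, specificity=0.75
```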
Thus, proteomics is likely to be a part of standard medical diagnostics in
the future. And perhaps many diseases will soon be treated with efficacious drugs
developed through rational design with the aid of proteomics.
