GENOME DIVERSIFICATION AND THE TREE OF LIFE

The success of living organisms based on DNA, RNA, and protein has been spectacular. Through its billions of years of proliferation, life has populated the oceans, covered the land, penetrated deep into Earth’s crust, and molded the surface of our planet. Our oxygen-rich atmosphere, the deposits of coal and oil, the layers of iron ores, the cliffs of chalk and limestone and marble—all these are products, directly or indirectly, of past biological activity on Earth.

Living things are not confined to the familiar temperate realm of land, water, and sunlight inhabited by plants and animals. They are found in the darkest depths of the ocean, in hot volcanic mud, in pools beneath the frozen surface of the Antarctic, and buried kilometers deep in Earth’s crust. The creatures that live in these extreme environments are generally unfamiliar, not only because they are inaccessible, but also because they are mostly microscopic and cannot be maintained in a laboratory. Even in more familiar habitats, most organisms are too small for us to see without special equipment: they tend to go unnoticed, unless they cause a disease or rot the timbers of our houses. Yet such microorganisms (microbes) are by far the most numerous living organisms on our planet. Only recently, through new methods of molecular analysis including rapid DNA sequencing, have we begun to get a picture of life on Earth that is not grossly distorted by our biased perspective as large animals living on dry land.

In this section, we consider—in very broad terms—the diversity of organisms on our planet and the relationships among them. Because the genetic information for every organism is written in the universal language of DNA sequences, and because the DNA sequence of any organism’s genome can be readily determined, it is now possible to characterize, catalog, and compare any set of living organisms with reference to these sequences. From such comparisons we can specify the place of each organism in the family tree of living species—the “tree of life.”

The Tree of Life Has Three Major Domains: Eukaryotes, Bacteria, and Archaea

The classification of living things has traditionally depended on comparisons of their outward appearances: we can see that a fish has eyes, jaws, backbone, brain, and so on, just as humans do, and that a worm does not—just as we can see that a rosebush is more similar to an apple tree than to grass. As Darwin showed, we can readily interpret such close family resemblances in terms of an evolution from common ancestors, and we can find the remains of many of these ancestors preserved in the fossil record. In this way, it became possible to draw a family tree of living organisms, showing the various lines of descent, as well as branch points in evolutionary history, where the ancestors of one group of species became different from those of another.

When the disparities between organisms become very great, however, these methods begin to fail. How do we decide whether a fungus is more closely related to a plant or to an animal? When it comes to microscopic organisms such as bacteria, the task becomes harder still: one tiny rod or sphere can look much like another. Moreover, much of our knowledge of the microbial world was traditionally restricted to those species that can be isolated and cultured in the laboratory. But direct DNA sequencing of populations of microbes in their natural habitats—such as soil, ocean water, or even the human mouth—has taught us that the vast majority of microbes cannot be easily cultured in the laboratory. Often, they thrive in the wild as components of complex ecosystems and—when separated from their natural surroundings—cannot survive. Until modern DNA sequencing was developed, these organisms were largely unknown to us, especially those that inhabit extreme environments such as the deep Earth’s crust or seawater miles below the ocean surface.

Genome analysis has now provided us with a simple, direct, and powerful way to determine evolutionary relationships. The complete DNA sequence of an organism defines its nature with almost perfect precision and in exhaustive detail. Moreover, this specification is in a digital form—a string of letters—that can be entered into a computer and compared with the corresponding information for any other organism. Because DNA is subject to random changes that accumulate over long periods of time (as we will see shortly), the number of differences between the DNA sequences of two organisms can provide a direct, objective, and quantitative indication of the evolutionary distance between them.
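As a rough illustration of how sequence differences can be turned into a number, the sketch below computes the fraction of positions at which two already-aligned sequences differ. The three short sequences and the function name `p_distance` are invented for this example. Real comparisons use far longer sequences, must align them first, and usually correct for repeated changes at the same site.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical 12-nucleotide segments, made up for illustration only.
reference = "ACGTACGTACGT"
close_relative = "ACGTACGTACGA"    # differs from the reference at 1 site
distant_relative = "ACGAACTTAGGA"  # differs from the reference at 4 sites

print(p_distance(reference, close_relative))    # 1 difference out of 12 sites
print(p_distance(reference, distant_relative))  # 4 differences out of 12 sites
```

Under this simple measure, the sequence with fewer differences is read as the closer relative of the reference.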

For constructing a comprehensive tree of life, it is necessary to begin with a segment of DNA that is easily recognized in the genomes of all organisms. We discussed earlier how all cells use the same fundamental mechanism to translate a nucleotide sequence into a protein sequence, and we saw that the ribosome is the “decoding machine” that carries this out. Ribosomes are fundamentally similar in all organisms, and an especially well-conserved part of them is the set of RNA molecules that make up their core. Although the exact sequence of these ribosomal RNAs (rRNAs) differs across organisms, the sequences are similar enough to serve as a ruler for judging how closely two species are related: the more similar the ribosomal RNA sequences, the more recently the two species diverged from a common ancestor and the more closely related they must be. Once a rough approximation of the tree of life has been obtained in this way, many additional DNA sequences in genomes (those that might not be identifiable in all organisms) can be used to determine relationships among more closely related species accurately.

This approach has revealed that the living world consists of three major divisions, or domains: eukaryotes, bacteria, and archaea, as illustrated in Figure 1–9; in the following paragraphs, we briefly introduce each in turn.

Figure 1–9 A global tree of life, based on genome comparisons, shows the three major divisions (domains) of the living world. The lengths of the branches are proportional to differences among genomes using common genes that can be recognized and compared across many different species. Some of the organisms discussed in this and later chapters are indicated. Of the three domains of life (bacteria, archaea, and eukaryotes), bacteria encompass by far the greatest diversity, commensurate with their ability to colonize nearly every ecological niche on the planet. So many new bacterial species are currently being identified through DNA sequencing of environmental samples that simply naming them has become a challenge. Although eukaryotes (and especially animals) are the main focus of this book, they comprise only a small slice of the global diversity. An expanded eukaryotic tree is shown in Figure 1–35, and an expanded tree of mammals is given in Figure 4–67. (Adapted from C.J. Castelle and J.F. Banfield, Cell 172:1181–1197, 2018.)

Eukaryotes Make Up the Domain of Life That Is Most Familiar to Us

The great variety of living creatures that we see around us are eukaryotes. The name is from the Greek, meaning “truly nucleated” (from the words eu, “well” or “truly,” and karyon, “kernel” or “nucleus”), reflecting the fact that the cells of these organisms have their DNA enclosed in a membrane-bound organelle called the nucleus. Visible by simple light microscopy, this feature was used in the early twentieth century to classify living organisms as either eukaryotes (those with a nucleus) or prokaryotes (those without a nucleus). We now know that prokaryotes comprise two of the three major domains of life, the bacteria and archaea. Eukaryotic cells are typically much larger than those of bacteria and archaea; in addition to a nucleus, they typically contain a variety of membrane-bound organelles that are also lacking in the prokaryotes. The genomes of eukaryotes also tend to be much larger, containing more than 20,000 genes in humans and corals, for example, compared with 4000–6000 genes in a typical bacterium or archaeon.

In addition to plants and animals, the eukaryotes include fungi (such as mushrooms or the yeasts used in beer- and bread-making), as well as an astonishing variety of single-celled, microscopic forms of life. Most of this book is focused on the cell biology of eukaryotic organisms (especially animals); in the final sections of this chapter, we shall return to eukaryotes and focus on the variety within this group.

On the Basis of Genome Analysis, Bacteria Are the Most Diverse Group of Organisms on the Planet

When modern trees of life were constructed using genome information, one of the big surprises was how much more evolutionarily diverse the bacterial world is compared with the eukaryotes; we now know that this great diversity reflects the much earlier appearance of bacteria in the evolutionary history of the planet. Bacteria are usually very small (and invisible to the unaided eye), and they generally live as independent individuals or in loosely organized communities, rather than as multicellular organisms. They are typically spherical or rod-shaped and measure a few micrometers (μm) in linear dimension (Figure 1–10). They often have a tough protective coat, called a cell wall, beneath which a plasma membrane encloses a single cytoplasmic compartment—the cytoplasm—containing DNA, RNA, proteins, and the many small molecules needed for life (Figure 1–11). Although difficult to discern in the light microscope, the interior of a bacterium is nevertheless highly organized, a topic we discuss in Chapter 16.

Figure 1–10 Shapes and sizes of some bacteria. Although most are small, as shown, measuring a few micrometers in linear dimension, there are also some giant species. An extreme example is the cigar-shaped bacterium Epulopiscium fishelsoni, which lives in the gut of a surgeonfish and can be up to 600 μm long (not shown).
Figure 1–11 Bacterial structure. (A) A drawing of the bacterium Vibrio cholerae, showing its simple internal organization. This species can infect the human small intestine to cause cholera; the severe diarrhea that accompanies this disease kills more than 100,000 people a year worldwide. Like many other bacteria, Vibrio has a helical appendage at one end—a flagellum—that rotates as a propeller to drive the cell forward. (B) An electron micrograph of a longitudinal section through the widely studied bacterium Escherichia coli (E. coli). E. coli is part of our normal intestinal microbiota, the complete collection of microbes in our gut. It has many flagella distributed over its surface, but they are not visible in this section. Both of the bacteria shown here are Gram negative, having both an outer and an inner (plasma) membrane. However, many bacterial species lack the outer membrane; these are classified as Gram positive. (B, courtesy of E. Kellenberger.)

Commensurate with the diversity of their genomes, bacteria live in an enormous variety of ecological niches, and they are astonishingly varied in their biochemical capabilities. There exist species that can utilize virtually any type of organic molecule as food, ranging from sugars and amino acids to hydrocarbons, including the simplest hydrocarbon, methane gas (CH4). Other species (Figure 1–12) harvest light energy in a variety of ways; some, like plants, carry out photosynthesis and generate oxygen as a by-product. Still others can feed on a plain diet of inorganic nutrients, getting their carbon from CO2, and relying on a host of other chemicals that occur in the environment to fuel their energy needs—including H2, Fe2+, H2S, and elemental sulfur (Figure 1–13).

Figure 1–12 Photosynthetic bacteria. (A) A light micrograph of the bacterium Anabaena cylindrica. Its cells form long chains, in which most of the cells (labeled V) perform photosynthesis (and thereby capture CO2 and incorporate C into organic compounds); others (labeled H) become specialized for fixing N from N2; and still others (labeled S) develop into spores, which can resist unfavorable conditions. (B) An electron micrograph of a related photosynthetic bacterium, Phormidium laminosum, which shows the intracellular membranes where photosynthesis occurs. As shown in these micrographs, some prokaryotes have intracellular membranes and form colonies that resemble simple multicellular organisms. (A, courtesy of David Adams; B, courtesy of D.P. Hill and C.J. Howe.)
Figure 1–13 The bacterium Beggiatoa. It lives in sulfurous environments (for example, see Figure 1–15) and gets its energy by oxidizing H2S; it can fix carbon even in the dark. Note the yellow deposits of sulfur inside the cells. (Courtesy of Ralph S. Wolfe.)

A wide range of bacteria directly affect human health. The bubonic plague of the Middle Ages (estimated to have killed half the population of Europe) and the current tuberculosis pandemic (more than a million deaths a year) are each due to a specific species of bacteria. And thousands of different bacterial species reside in our gut and on our skin, where they are often beneficial to us. We shall discuss bacteria throughout the book, as it is the study of these relatively simple cells that led to much of our understanding of basic biological processes—including DNA replication, transcription, and translation. We focus again on bacteria in Chapter 24 when we examine the cell biology of infectious disease. Finally, genetic engineering techniques allow bacteria to be put to use as small “factories” to produce human pharmaceuticals, biofuels, and other high-value chemical products, as we discuss in Chapter 8.

Archaea: The Most Mysterious Domain of Life

Of the three domains of life, the archaea remain the most poorly understood. Most of their members have been identified only by DNA sequencing of samples from the environment, and relatively few have been cultured and studied up close in the laboratory. Like bacteria, the archaea we know most about are small and lack the internal, membrane-bound organelles that distinguish the eukaryotes. But they differ from bacteria in many ways, including the chemistry of their cell walls, the kinds of lipids that make up their membranes, and the range of biochemical reactions that they can carry out. Another surprising conclusion came from genome comparisons: although archaea resemble bacteria in their outward appearances, their genomes are much more closely related to those of eukaryotes than to those of bacteria (see Figure 1–9). It has even been proposed that the tree of life should be considered to have only two principal domains, with the archaea and eukaryotes making up one domain and bacteria constituting the other. The close relationship of archaea and eukaryotes has also changed our views on how the earliest eukaryotic cell evolved, a topic addressed later in this chapter.

At first it was thought that archaea occupied only extreme environments such as volcanoes, salt lakes, acid hot springs, and the stomachs of cattle, but they are now recognized to be present also in more congenial surroundings such as soils, seawater, and our skin. Commensurate with the wide variety of ecological niches in which they have been found, different species of archaea have highly diverse chemistries. They are believed to be the predominant life-form in soil and seawater, and they play major roles in recycling nitrogen and carbon, two of the most important elements for all cells.

Organisms Occupy Most of Our Planet

To understand life on Earth, we need to understand more than its diversity; we also need to know where life is found on our planet and how various living species are distributed. Organisms inhabit nearly all of the planet, and we continue to discover new habitats. Amazingly, some bacteria and archaea even live miles down in Earth’s deep crust and in the deepest and most hostile parts of the oceans.

Figure 1–14 The distribution of living biomass on Earth. The total biomass on Earth expressed as gigatons of carbon (Gt C) is estimated to be 550 Gt C. In the graph shown, the area of each taxon represented is proportional to the taxon’s global biomass, so plants account for about 80% (450/550) of the total biomass, whereas animals account for 0.4% (2/550). These recent estimates are based on various advanced techniques, including DNA sequencing and remote sensing. (Adapted from Y.M. Bar-On et al., Proc. Natl. Acad. Sci. USA 115:6506–6511, 2018. With permission from the authors.)

How are the main groups of organisms distributed among different environments? DNA sequencing and other advanced technologies have been used recently to address this question. The total biomass on Earth is estimated to contain 550 gigatons of carbon (Gt C; 1 gigaton = 10^15 grams), of which 450 Gt C is plants, 70 Gt C is bacteria, 7 Gt C is archaea, and 2 Gt C is animals (Figure 1–14). The plants are mainly terrestrial; the bacteria and archaea are mainly in the soil and Earth’s crust. Total terrestrial biomass is 100 times greater than that in the oceans, although most of the animal mass is found in the oceans. The human biomass is 10 times greater than that of all measurable wild mammals together, and, while human biomass continues to increase, that of wild animals is falling, largely as a result of human activities.
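The shares quoted for Figure 1–14 follow from simple division, which the short sketch below reproduces. The numbers are those given in the text; the variable and function names are chosen here for illustration.

```python
# Carbon mass per group in gigatons of carbon (Gt C), as quoted in the text.
BIOMASS_GT_C = {"plants": 450, "bacteria": 70, "archaea": 7, "animals": 2}
TOTAL_GT_C = 550  # estimated total biomass on Earth, also from the text


def share_percent(group: str) -> float:
    """Return a group's share of the total biomass, in percent."""
    return 100 * BIOMASS_GT_C[group] / TOTAL_GT_C


# Biomass belonging to groups that the text does not itemize.
unlisted_gt_c = TOTAL_GT_C - sum(BIOMASS_GT_C.values())

for group in BIOMASS_GT_C:
    print(f"{group:8s} {share_percent(group):5.1f}%")
print(f"not itemized: {unlisted_gt_c} Gt C")
```

Plants come out at roughly 82% and animals at roughly 0.4%, consistent with the rounded values in the figure legend.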

Although humans and other animals make up a small fraction of Earth’s biomass, their existence depends completely on other forms of life. In the next section, we shall see some of the ways that these different life-forms work together to capture and recycle energy from Earth’s inanimate features.

Cells Can Be Powered by a Wide Variety of Free-Energy Sources

Organisms obtain the free energy needed for life in different ways. Some—such as animals, fungi, and the many different bacteria that live in the human gut—get it by feeding on other living things or the organic chemicals they produce; such organisms are called organotrophic (from the Greek word trophe, meaning “food”). Others derive their free energy directly from the nonliving world. These primary energy converters fall into two classes: those that harvest the energy of sunlight, and those that capture their energy from energy-rich systems of inorganic chemicals in the environment (chemical systems that are far from chemical equilibrium). Organisms of the former class are called phototrophic (feeding on sunlight); those of the latter are called lithotrophic (feeding on rock). The organotrophic organisms like ourselves could not exist without these primary energy converters, which are the most plentiful form of life.

The phototrophic organisms include many types of bacteria, as well as algae and plants, on which we—and virtually all the living things that we ordinarily see around us—depend. Phototrophic organisms have changed the whole chemistry of our environment: as a prime example, the oxygen in Earth’s atmosphere is a by-product of their biosynthetic activities.

Lithotrophic organisms are not such an obvious feature of our world, because they are microscopic and mostly live in habitats that humans do not frequent—deep in the ocean, buried in Earth’s crust, or in various other seemingly inhospitable environments. But they are a major part of the living world, and they are especially important in any consideration of the history of life on Earth.

Some lithotrophs get energy from aerobic reactions, which use molecular oxygen from the environment; because atmospheric O2 is ultimately the product of living phototrophic organisms, these aerobic lithotrophs are, in a sense, feeding on the products of past life. There are, however, many other lithotrophs that live anaerobically, in places where little or no molecular oxygen is present; these are circumstances similar to those that existed in the early days of life on Earth, before oxygen had accumulated.

The most dramatic of the anaerobic sites are the hot hydrothermal vents on the floor of the Pacific and Atlantic Oceans. They are located where the ocean floor is spreading as new portions of Earth’s crust form by a gradual upwelling of material from Earth’s interior (Figure 1–15). Downward-percolating seawater is heated and driven back upward as a submarine geyser, carrying with it a current of chemicals from the hot rocks below. A typical cocktail might include H2S, H2, CO, Mn2+, Fe2+, Ni2+, CH4, NH4+, and phosphorus-containing compounds. A dense population of microorganisms lives in the neighborhood of the vent, thriving on this austere diet and harvesting free energy from reactions between the available chemicals. Various invertebrate marine animals—clams, mussels, and giant marine worms—in turn, live off the microbes at the vent, forming an entire ecosystem analogous to the world of plants and animals that we belong to, but one powered by geochemical energy instead of light (Figure 1–16).

Figure 1–15 The geology of a hot hydrothermal vent in the ocean floor. As indicated, seawater percolates down toward the hot, molten volcanic rock (basalt) upwelling from Earth’s interior, where it is heated and driven back upward, carrying a mixture of minerals leached from the hot rock. A temperature gradient is set up, from more than 350°C near the core of the vent, down to 2–3°C in the surrounding ocean. Minerals precipitate from the water as it cools, forming a chimney. Different classes of organisms, thriving at different temperatures, live in different neighborhoods of the chimney. A typical chimney might be a few meters tall, spewing out hot, mineral-rich water. The locations of lithotrophic bacteria and the invertebrate marine animals that depend on them are also shown (see Figure 1–16).
Figure 1–16 Organisms living at a depth of 2500 meters near a vent in the ocean floor. Close to the vent, at temperatures up to about 120°C, various lithotrophic species of bacteria and archaea live, directly fueled by geochemical energy. A little further away, where the temperature is lower, various invertebrate animals live by feeding on these microorganisms. Most remarkable are the giant (2-meter-long) tube worms, Riftia pachyptila, which are shown in the photograph. Rather than feed on the lithotrophic microbes, these worms live in symbiosis with them: specialized organs in the worms harbor huge numbers of symbiotic sulfur-oxidizing bacteria, which harness geochemical energy and supply nourishment to their hosts, which have no mouth, gut, or anus. The tube worms are thought to have evolved from more conventional animals and to have become secondarily adapted to life at hydrothermal vents. (Science History Images/Alamy Stock Photo.)

Some Cells Fix Nitrogen and Carbon Dioxide for Other Cells

To make a living cell requires matter, as well as free energy. DNA, RNA, and protein are composed of just six elements: hydrogen, carbon, nitrogen, oxygen, sulfur, and phosphorus. These are all plentiful in the nonliving environment, in Earth’s rocks, water, and atmosphere. But they are not present in chemical forms that allow easy incorporation into biological molecules. Atmospheric N2 and CO2, in particular, are extremely unreactive. A large amount of free energy is required to drive the reactions that use these inorganic molecules to make the organic compounds needed for further biosynthesis; that is, to fix nitrogen and carbon dioxide, so as to make N and C available to living organisms. Many types of cells lack the biochemical machinery to achieve this fixation; they instead rely on other classes of cells to do the job for them. We animals depend on plants, directly or indirectly, for our supplies of carbon- and nitrogen-containing organic compounds. Plants in turn, although they can fix carbon dioxide from the atmosphere, lack the ability to fix atmospheric nitrogen; they depend in part on nitrogen-fixing bacteria to supply their need for nitrogen-containing organic compounds. Plants of the pea family, for example, harbor symbiotic nitrogen-fixing bacteria in nodules in their roots.

Because living cells can differ widely in some of the most basic aspects of their biochemistry, cells with complementary needs and capabilities have frequently developed close associations. Some of these symbiotic associations, as we will see later, have evolved to the point where the partners have lost their separate identities altogether: they have joined forces to form a single composite cell—an endosymbiotic association, as opposed to an ectosymbiotic one between separate organisms.

Genomes Diversify Over Evolutionary Time, Producing New Types of Organisms

Having discussed our current views on the diversity of life-forms, how they are distributed across Earth, and how they depend on one another, we now turn to the question of how this great diversity was generated. All life depends on the storage of genetic information in the form of each organism’s DNA genome, so our focus is on how genomes change over evolutionary time.

In storing and copying genetic information, random accidents and errors occur, altering the nucleotide sequence; that is, creating mutations. Therefore, when a cell divides, the genomes of its two daughters are often not quite identical to each other or to that of the parent cell. On rare occasions, the error may represent a change for the better; more probably, it will cause no significant difference in the cell’s prospects. But in some cases, the error will cause serious damage; for example, by disrupting the coding sequence for a key protein or RNA molecule. Changes due to mistakes of the first type will tend to be perpetuated, because the altered cell has an increased likelihood of surviving and reproducing itself. Changes due to mistakes of the second type—neutral changes—may be perpetuated or not: in the competition for limited resources, it is a matter of chance whether the altered cell or its cousins will succeed. But changes that cause serious damage lead nowhere: the cell that suffers them dies, leaving no progeny. Through endless repetition of this cycle of error and trial—of mutation and natural selection—organisms evolve: their genetic specifications change, sometimes giving organisms new ways to exploit the environment more effectively, to survive in competition with others, and to reproduce successfully.

Some parts of the genome will change more readily than others in the course of evolution. A segment of DNA that does not code for protein or RNA and has no significant regulatory role is free to change at a rate limited only by the frequency of random errors. In contrast, a gene that codes for a highly optimized, essential protein or RNA molecule cannot alter so easily: when mistakes occur, the faulty cells are almost always disabled and eliminated. Genes of this latter sort are therefore highly conserved. Through 3.5 billion years or more of evolutionary history, many DNA sequences have changed beyond all recognition, but the most highly conserved genes remain perfectly recognizable in all living species.
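The contrast between conserved and freely changing DNA can be explored with a toy simulation. The sketch below is only a cartoon under strong assumptions, all of them introduced here: each genome is an "essential" segment followed by an unconstrained one, every change in the essential segment is lethal, and the population size, error rate, and sequences are arbitrary.

```python
import random

BASES = "ACGT"


def copy_with_errors(genome: str, error_rate: float, rng: random.Random) -> str:
    """Copy a genome, substituting each base with probability error_rate."""
    copied = []
    for base in genome:
        if rng.random() < error_rate:
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def evolve(essential, free, generations=200, size=50, error_rate=0.01, seed=0):
    """Run repeated rounds of error-prone copying followed by selection."""
    rng = random.Random(seed)
    population = [essential + free] * size
    for _ in range(generations):
        # Every cell divides into two daughters, each copied with random errors.
        daughters = [copy_with_errors(cell, error_rate, rng)
                     for cell in population for _ in range(2)]
        # Assumed selection rule: any change in the essential segment is lethal.
        survivors = [d for d in daughters if d.startswith(essential)]
        # Limited resources: chance decides which survivors fill the population.
        population = rng.sample(survivors, min(size, len(survivors)))
    return population


ESSENTIAL = "ATGGCTAAGC"  # hypothetical essential segment
FREE = "TTTTTTTTTT"       # hypothetical unconstrained segment

final = evolve(ESSENTIAL, FREE)
changed_free_sites = sum(base != "T" for cell in final for base in cell[len(ESSENTIAL):])
print("essential segment intact in all survivors:",
      all(cell.startswith(ESSENTIAL) for cell in final))
print("changed sites in unconstrained segments:", changed_free_sites)
```

By construction, the essential segment stays identical in every surviving cell, while the unconstrained segment is free to accumulate substitutions at a rate set only by the copying error rate.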

These latter genes are the ones we must examine if we wish to trace family relationships between the most distantly related organisms in the tree of life. We discussed an example of one such gene—that for ribosomal RNA—when we introduced the classification of the living world into the three domains of eukaryotes, bacteria, and archaea. Because the production of proteins is fundamental to all living cells, this component of the ribosome has been highly conserved since early in the history of life on Earth (Figure 1–17).

Figure 1–17 Genetic information conserved since the days of the last universal common ancestor of all living things. A part of the gene that codes for the smaller of the two main ribosomal RNA (rRNA) molecules in the ribosome is shown. (The complete molecule is about 1500–1900 nucleotides long, depending on the species.) Corresponding segments of nucleotide sequences from an archaeon (Methanococcus jannaschii), a bacterium (Escherichia coli), and a eukaryote (Homo sapiens) are aligned. The red vertical lines indicate sites where the nucleotides are identical between the species; the human sequence is repeated at the bottom of the alignment so that all three two-way comparisons can be seen. The black dot halfway along the E. coli sequence denotes a site where a nucleotide has been either deleted from the bacterial lineage in the course of evolution or inserted in the other two lineages. Note that the sequences from these three organisms, representative of the three domains of the living world, still retain unmistakable similarities.
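The three two-way comparisons described for Figure 1–17 can be sketched as a count of identical sites for every pair of aligned sequences. The short sequences below are invented placeholders, not the real rRNA segments from the figure, and the "-" character is an assumed way of marking a site where one lineage has gained or lost a nucleotide.

```python
from itertools import combinations


def identical_sites(seq_a: str, seq_b: str, gap: str = "-") -> int:
    """Count aligned columns in which both sequences carry the same nucleotide."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)


# Hypothetical aligned segments, one per domain (not real rRNA sequences).
alignment = {
    "archaeon":  "GCCGGTAGCA",
    "bacterium": "GCAGG-AGCA",  # "-" marks an insertion or deletion site
    "eukaryote": "GCCGGTTGCA",
}

# All three two-way comparisons, as in the figure.
pairwise = {
    (name_1, name_2): identical_sites(alignment[name_1], alignment[name_2])
    for name_1, name_2 in combinations(alignment, 2)
}

for (name_1, name_2), count in pairwise.items():
    print(f"{name_1} vs {name_2}: {count} of {len(alignment[name_1])} sites identical")
```

A column containing a gap is never counted as identical here; other conventions, such as dropping gap columns from the total, are equally defensible.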

The ribosomal RNA genes are exceptional in being so well conserved, whereas most parts of genomes have diversified much more dramatically over evolutionary time. A complete DNA sequence for an organism—its genome sequence—reveals all the genes that an organism possesses, as well as those it lacks. When we compare the three domains of the living world, we can begin to see which genes are common to all of them—and must therefore have been present in the last universal common ancestral cell that was the founder of all present-day living things. We can also identify those genes that are peculiar to a single branch in the tree of life. To explain such findings, we need to consider how new genes arise and, more generally, how genomes evolve.

New Genes Are Generated from Preexisting Genes

The raw material of evolution is the DNA sequence that already exists: there is no natural mechanism for making long stretches of new, random DNA sequence. In this sense, no gene is ever entirely new. Innovation can, however, occur in several ways (Figure 1–18):

  1. Intragenic mutation: an existing gene can be randomly modified by changes in its DNA sequence, through various types of errors that occur in the process of DNA replication and DNA repair.
  2. Gene duplication: an existing gene can be accidentally duplicated, creating a pair of initially identical genes within a single cell; these two genes may then diverge in the course of evolution.
  3. DNA segment shuffling: two or more existing genes can break and rejoin to make a hybrid gene consisting of DNA segments that originally belonged to separate genes.
  4. Horizontal (intercellular) DNA transfer: a piece of DNA can be transferred from the genome of one cell to that of another—including between species. This process contrasts with the usual vertical transfer of genetic information from parent to progeny.
Figure 1–18 Four modes of genetic innovation and their effects on the DNA sequence of an organism. A special form of horizontal transfer occurs when cells of two different species enter into a permanent symbiotic association; genes from one of the cells may subsequently be transferred to the genome of the other, as we will see later when we discuss the likely evolutionary origins of mitochondria and chloroplasts.

Each of these types of change leaves a characteristic trace in the DNA sequence of the organism, and there is clear evidence that all four processes have occurred frequently during evolution. In Chapters 4 and 5, we discuss the mechanisms underlying these changes, but for the present we focus on the consequences.
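The four modes of innovation listed above can be caricatured as operations on strings. In this sketch a gene is a short string, a genome is a list of genes, and all sequences, positions, and function names are hypothetical. The molecular mechanisms are not modeled, only the kind of trace each process leaves in the sequence.

```python
def intragenic_mutation(gene: str, position: int, new_base: str) -> str:
    """Change a single nucleotide within an existing gene."""
    return gene[:position] + new_base + gene[position + 1:]


def gene_duplication(genome: list, index: int) -> list:
    """Insert a second, initially identical copy of a gene next to the original."""
    return genome[:index + 1] + [genome[index]] + genome[index + 1:]


def segment_shuffling(gene_a: str, gene_b: str, break_a: int, break_b: int) -> str:
    """Join the start of one gene to the end of another to form a hybrid gene."""
    return gene_a[:break_a] + gene_b[break_b:]


def horizontal_transfer(recipient: list, donor: list, donor_index: int) -> list:
    """Add a copy of one gene from a donor genome to a recipient genome."""
    return recipient + [donor[donor_index]]


# Hypothetical genomes, invented for illustration only.
genome_1 = ["AAACCC", "GGGTTT"]
genome_2 = ["CCCGGG", "TTTAAA"]

print(intragenic_mutation(genome_1[0], 3, "T"))               # one base changed
print(gene_duplication(genome_1, 0))                          # first gene now present twice
print(segment_shuffling(genome_1[0], genome_1[1], 3, 3))      # hybrid of two genes
print(horizontal_transfer(genome_1, genome_2, 1))             # gene gained from genome_2
```

None of the functions modifies its input, so each call shows the effect of a single event on the original genome.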

The Function of a Gene Can Often Be Deduced from Its Nucleotide Sequence

Family relationships among genes are important not just for their evolutionary interest, but also because they simplify the task of deciphering gene functions. Once the nucleotide sequence of a newly discovered gene has been determined, a scientist can tap a few keys on a computer to search large databases of known gene sequences for gene relatives. In many cases, the function of one or more of these homologs will have been already determined experimentally—generally in one of the model organisms described later in this chapter. Because gene sequence determines gene function, one can frequently make a good guess at the new gene’s function, as it is likely to be similar to that of the already known homologs. In this way, it is possible to decipher a great deal about the biology of an organism simply by analyzing the DNA sequence of its genome.
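A database search for gene relatives can be imitated on a very small scale. The sketch below ranks entries in a made-up database by the overlap of short subsequences (k-mers) with a query. This is a deliberately simple stand-in for the alignment-based search tools used in practice, and every gene name, sequence, and functional label here is hypothetical.

```python
def kmers(sequence: str, k: int = 3) -> set:
    """Return the set of all overlapping subsequences of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def similarity(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Jaccard similarity of two sequences' k-mer sets (0 = disjoint, 1 = identical sets)."""
    kmers_a, kmers_b = kmers(seq_a, k), kmers(seq_b, k)
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def best_homolog(query: str, database: dict, k: int = 3) -> tuple:
    """Return (name, annotated function, score) of the most similar database entry."""
    name = max(database, key=lambda n: similarity(query, database[n][0], k))
    sequence, function = database[name]
    return name, function, similarity(query, sequence, k)


# Hypothetical database: gene name -> (sequence, experimentally assigned function).
database = {
    "geneA": ("ATGGCGTACGTTAGC", "hypothetical kinase"),
    "geneB": ("ATGCCCTTTAAAGGG", "hypothetical transporter"),
}

new_gene = "ATGGCGTACGTAAGC"  # hypothetical newly sequenced gene
name, function, score = best_homolog(new_gene, database)
print(f"closest relative: {name} ({function}), similarity {score:.2f}")
```

The annotated function of the best match is only a guess at the function of the new gene, and a sensible search would also require the score to exceed some threshold before drawing any conclusion.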

More Than 200 Gene Families Are Common to All Three Domains of Life

Given the complete genome sequences of representative organisms from all three domains of life—eukaryotes, bacteria, and archaea—we can search systematically for homologies that span this enormous evolutionary divide. In this way, we can begin to take stock of the common inheritance of all living things. There are considerable difficulties in this enterprise. For example, individual species have often lost some of the ancestral genes, and other genes have almost certainly been acquired by horizontal transfer from another species and therefore are not truly ancestral. In fact, genome comparisons strongly suggest that both lineage-specific gene loss and horizontal gene transfer, in some cases between evolutionarily distant species, have been major factors in evolution, at least among bacteria and archaea. As an additional difficulty, in the course of 2 or 3 billion years, some genes that were initially shared will have changed beyond recognition through mutation.

Because of all these vagaries of the evolutionary process, it is difficult, if not impossible, to determine the ancestral gene set that diversified into the present-day variety of life. A crude approximation can be obtained by tallying the gene families that have representatives in multiple—but not necessarily all—species from the three major domains of life. One such analysis revealed 264 ancient conserved families, each of which could be assigned a function on the basis of the best-characterized family member. As shown in Table 1–1, the largest number of shared gene families were involved in translation and in amino acid metabolism and transport. However, it must be emphasized that this set of highly conserved gene families represents only a very rough sketch of the common inheritance of all modern life.
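
The tallying approach can be sketched as a simple presence-or-absence filter. The Python example below assumes the criterion given in the footnote to Table 1–1: a family counts as "universal" if it appears in at least two archaea, two bacteria, and one eukaryote among the species named there. The three gene families and their species lists are invented for illustration and are not real data.

```python
from collections import Counter

# Species used in the analysis and the domain each belongs to.
SPECIES_DOMAIN = {
    "Archaeoglobus fulgidus": "archaea",
    "Aeropyrum pernix": "archaea",
    "Escherichia coli": "bacteria",
    "Bacillus subtilis": "bacteria",
    "Saccharomyces cerevisiae": "eukaryotes",
}

# Minimum number of species per domain in which a family must appear.
REQUIRED = {"archaea": 2, "bacteria": 2, "eukaryotes": 1}


def is_universal(species_with_family: set[str]) -> bool:
    """True if the family is represented in enough species of every domain."""
    counts = Counter(
        SPECIES_DOMAIN[s] for s in species_with_family if s in SPECIES_DOMAIN
    )
    return all(counts[domain] >= n for domain, n in REQUIRED.items())


# Hypothetical families: which genomes contain at least one member.
families = {
    "family_1": set(SPECIES_DOMAIN),  # present in all five species
    "family_2": {"Escherichia coli", "Bacillus subtilis",
                 "Saccharomyces cerevisiae"},  # absent from both archaea
    "family_3": {"Archaeoglobus fulgidus", "Aeropyrum pernix",
                 "Escherichia coli", "Saccharomyces cerevisiae"},  # one bacterium only
}

universal = sorted(name for name, species in families.items() if is_universal(species))
print(universal)  # ['family_1']
```

A filter of this kind cannot tell whether a missing family was lost from a lineage or never present, nor whether a shared family was inherited or acquired by horizontal transfer, which is why the resulting count is only a rough approximation.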

TABLE 1–1 The Number of Gene Families, Classified by Function, Common to All Three Domains of the Living World

Information processing
  Translation: 63
  Transcription: 7
  DNA replication, recombination, and repair: 13

Cellular processes and signaling
  Cell-cycle control, mitosis, and meiosis: 2
  Defense mechanisms: 3
  Signal-transduction mechanisms: 1
  Cell wall/membrane biogenesis: 2
  Intracellular trafficking and secretion: 4
  Post-translational modification, protein turnover, chaperones: 8

Metabolism
  Energy production and conversion: 19
  Carbohydrate transport and metabolism: 16
  Amino acid transport and metabolism: 43
  Nucleotide transport and metabolism: 15
  Coenzyme transport and metabolism: 22
  Lipid transport and metabolism: 9
  Inorganic ion transport and metabolism: 8
  Secondary metabolite biosynthesis, transport, and catabolism: 5

Poorly characterized
  General biochemical function predicted; specific biological role unknown: 24

For the purpose of this analysis, gene families are defined as “universal” if they are represented in the genomes of at least two diverse archaea (Archaeoglobus fulgidus and Aeropyrum pernix), two evolutionarily distant bacteria (Escherichia coli and Bacillus subtilis), and one eukaryote (yeast, Saccharomyces cerevisiae). (Data from R.L. Tatusov et al., Science 278:631–637, 1997; R.L. Tatusov et al., BMC Bioinformatics 4:41, 2003; and the COGs database at the US National Library of Medicine.)
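
As a quick arithmetic check, the counts in Table 1–1 can be summed to confirm that they add up to the 264 families quoted in the text. In the Python sketch below, the numbers are those of the table; assigning each row to one of the four category headings is an assumption about the table's layout, and the variable names are arbitrary.

```python
# Counts from Table 1-1, grouped under its four category headings.
TABLE_1_1 = {
    "Information processing": {
        "Translation": 63,
        "Transcription": 7,
        "DNA replication, recombination, and repair": 13,
    },
    "Cellular processes and signaling": {
        "Cell-cycle control, mitosis, and meiosis": 2,
        "Defense mechanisms": 3,
        "Signal-transduction mechanisms": 1,
        "Cell wall/membrane biogenesis": 2,
        "Intracellular trafficking and secretion": 4,
        "Post-translational modification, protein turnover, chaperones": 8,
    },
    "Metabolism": {
        "Energy production and conversion": 19,
        "Carbohydrate transport and metabolism": 16,
        "Amino acid transport and metabolism": 43,
        "Nucleotide transport and metabolism": 15,
        "Coenzyme transport and metabolism": 22,
        "Lipid transport and metabolism": 9,
        "Inorganic ion transport and metabolism": 8,
        "Secondary metabolite biosynthesis, transport, and catabolism": 5,
    },
    "Poorly characterized": {
        "General biochemical function predicted; specific biological role unknown": 24,
    },
}

# Families per category, and the grand total across the whole table.
category_totals = {cat: sum(rows.values()) for cat, rows in TABLE_1_1.items()}
grand_total = sum(category_totals.values())

# The two individual functions with the most shared families.
all_rows = [(name, n) for rows in TABLE_1_1.values() for name, n in rows.items()]
top_two = sorted(all_rows, key=lambda row: row[1], reverse=True)[:2]

print(category_totals)
print(grand_total)  # 264
print(top_two)
```

The two largest entries are translation (63) and amino acid transport and metabolism (43), in agreement with the statement in the text.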

Summary

For most of human history, the living world around us was classified by what we could see. Genome sequencing has radically changed our view of life on the planet, and we now realize that living things fall into three broad domains: bacteria, archaea, and eukaryotes. The organisms in the first two domains are largely invisible to our naked eye, and many of them cannot yet be grown in a laboratory and are known only by their DNA sequences. But they make up the vast majority of life’s evolutionary diversity, including species that can obtain all their energy and nutrients from inorganic chemical sources, such as the reactive mixtures of minerals released at hydrothermal vents on the ocean floor. This is the sort of diet that may have nourished the first living cells more than 3.5 billion years ago. The eukaryotes (whose cells are larger and contain a variety of membrane-bound organelles) arose later and are consequently less diverse as a group than either the bacteria or archaea. Eukaryotes, which include all plants and animals, are the organisms most familiar to us, and they are the main focus of this textbook.

Many of the genes within a single organism or species show strong family resemblances in their DNA sequences, implying that they originated from the same ancestral gene through gene duplication and divergence. Family resemblances (homologies) are also clear when gene sequences are compared between different species, and more than 200 gene families have been so highly conserved that they can be recognized as common to most species from all three domains of the living world, suggesting they were present in the ancestral cell from which all life evolved. Given the DNA sequence of a newly discovered gene in any organism, it is therefore often possible to deduce the gene’s function from the known function of a homologous gene in a better-studied organism.

Glossary

eukaryote
Organism composed of one or more cells that have a distinct nucleus. Member of one of the three main divisions of the living world, the other two being bacteria and archaea.
prokaryote
Single-celled microorganism whose cells lack a well-defined, membrane-enclosed nucleus. Either a bacterium or an archaeon.
mutation
Heritable change in the nucleotide sequence of a chromosome.
gene family
The set of genes in an organism related in DNA sequence because of their derivation from the same ancestor.
orthologs
Genes or proteins from different species that are similar in sequence because they are descendants of the same gene in the last common ancestor of those species; orthologs often have the same or a very similar function in each organism.
paralogs
Genes or proteins that are similar in sequence because they are the result of a gene duplication event occurring in an ancestral organism. Those in two different organisms are less likely to have the same function than are orthologs. Compare orthologs.
homolog
One of two or more genes that are similar in sequence as a result of derivation from the same ancestral gene. The term covers both orthologs and paralogs.