Genes are material entities that parents pass to offspring during reproduction. These entities encode information essential for the construction and regulation of polypeptides, proteins and other molecules that determine the growth and functioning of the organism.
The word "gene" is shared by many disciplines, including classical genetics, molecular genetics, evolutionary biology and population genetics. Because each discipline models the biology of life differently, the material entity that supports the gene in one discipline is not necessarily the same as in another.
Following the discovery that DNA is the genetic material, and with the growth of biotechnology and the project to sequence the human genome, the common usage of the word "gene" has increasingly reflected its meaning in molecular biology. In the molecular-biological sense, genes are the segments of DNA which cells transcribe into RNAs and translate, at least in part, into proteins.
In common speech, "gene" is often used to refer to the hereditary cause of a trait, disease or condition, as in "the gene for obesity." A biologist, in contrast, might refer to an allele or a mutation that has been implicated in or is associated with obesity. This is because biologists know that many factors other than genes influence whether a person is obese or not: prenatal environment, upbringing, culture and the availability of food, for example.
Moreover, it is very unlikely that variations within a single gene (or single genetic locus) determine one's genetic predisposition for obesity. These aspects of inheritance, namely the interplay between genes and environment and the influence of many genes, appear to be the norm for many and perhaps most traits, which are termed "multifactorial". The term phenotype refers to the characteristics that result from this interplay (see genotype-phenotype distinction).
Overview
Properties of genes
In molecular biology, a gene encodes the chemical structure of a protein. The genetic code describes the way in which proteins are determined by the gene. This code is essentially the same for all known life, from bacteria to humans.
Through the proteins they encode, genes govern the cells in which they reside. In multicellular organisms they control the development of the individual from the fertilized egg and the day-to-day functions of the cells that make up tissues and organs. The instrumental roles of their protein products range from mechanical support of the cell structure to the transportation and manufacture of other molecules and to the regulation of other proteins' activities.
The genes that exist today are those that have reproduced successfully in the past. This is the basis of the selfish gene view, publicised by Richard Dawkins. He points out in his book, The Selfish Gene, that all DNA exists with no other purpose than to propagate itself, even at the expense of the host organism's welfare. According to Dawkins, the possibly disappointing answer to the question "what is the meaning of life?" may be "the survival and perpetuation of ribonucleic acids and their associated proteins".
Types of genes
Rare, spontaneous errors (e.g. in DNA replication) may give rise to mutations in the sequence of a gene. Once propagated to the next generation, such mutations may lead to variation within a species' population. Variants of a single gene are known as alleles, and differences in alleles may give rise to differences in traits, for example eye color. A gene's most common allele is called the wild type allele, and rare alleles are called mutants.
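To make the idea of alleles concrete, the following minimal sketch compares two variants of the same gene and reports where they differ. The sequences and the function name `differing_positions` are hypothetical and chosen only for illustration. Real alleles can also differ by insertions or deletions, which this equal-length comparison does not handle.

```python
def differing_positions(allele_a, allele_b):
    """Return (index, base_a, base_b) for each position where two equal-length alleles differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("this sketch only compares alleles of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]

wild_type = "ATGGCCAAATGG"  # hypothetical common (wild type) allele
mutant = "ATGGCCGAATGG"     # hypothetical rare allele with one substitution

print(differing_positions(wild_type, mutant))  # [(6, 'A', 'G')]
```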
Normally, RNA is an intermediate product in the expression of a molecular gene as a protein. However, for some gene sequences, RNA molecules are the functional end products. For example, RNAs known as ribozymes are capable of enzymatic function, and small interfering RNAs have a regulatory role. The DNA sequences from which such RNAs are transcribed are known as non-coding RNA genes, or RNA genes.
Although all cell-based organisms carry their genes and transmit them to offspring as DNA, some of the viruses that parasitize them carry only RNA. In some RNA viruses the genome can serve directly as a template for protein synthesis, so the host cell may begin making viral proteins as soon as it is infected, without the delay of waiting for transcription. Retroviruses, on the other hand, require reverse transcription of their genome from RNA into DNA.
Human gene nomenclature
For each known human gene, the HUGO Gene Nomenclature Committee (HGNC) approves a gene name and symbol (short-form abbreviation). All approved symbols are stored in Genew, the Human Gene Nomenclature Database. Each symbol is unique, and each gene is given only one approved symbol. A unique symbol for each gene allows people to refer to it unambiguously and facilitates electronic data retrieval from publications. Where possible, each symbol maintains parallel construction across the members of a gene family and can also be used in other species, especially the mouse.
Typical numbers of genes in an organism
The following table gives typical numbers of genes and genome size for some organisms. Estimates of the number of genes in an organism are somewhat controversial, because it is only possible to discover a gene, and no techniques currently exist to prove that a DNA sequence contains no gene. Nonetheless, estimates are made based on current knowledge.
Organism | Number of genes | Genome size (base pairs) |
---|---|---|
Plants | <50,000 | <10¹¹ |
Humans | 35,000 | 3×10⁹ |
Flies | 12,000 | 1.6×10⁸ |
Fungi | 6,000 | 1.3×10⁷ |
Bacteria | 500–6,000 | 5×10⁵–10⁷ |
Mycoplasma genitalium | 500 | 580,000 |
DNA viruses | 10–300 | 5,000–200,000 |
RNA viruses | 1–25 | 1,000–23,000 |
Viroids | 0–1 | ~500 |
Prions | 0 | 0 |
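As a rough illustration of what these figures imply, the sketch below divides genome size by gene count to obtain the average stretch of genome per gene. It uses only the rows of the table that give single values rather than ranges. The names `GENOMES` and `base_pairs_per_gene` are illustrative, and the results are only as reliable as the estimates in the table.

```python
# Rough estimates copied from the table above: (number of genes, base pairs).
GENOMES = {
    "Humans": (35_000, 3e9),
    "Flies": (12_000, 1.6e8),
    "Fungi": (6_000, 1.3e7),
    "Mycoplasma genitalium": (500, 580_000),
}

def base_pairs_per_gene(genes, base_pairs):
    """Average stretch of genome per gene, including any non-coding DNA."""
    return base_pairs / genes

for organism, (genes, bp) in GENOMES.items():
    print(f"{organism}: about {base_pairs_per_gene(genes, bp):,.0f} base pairs per gene")
```

On these figures the human genome averages roughly 86,000 base pairs per gene, against about 1,200 for Mycoplasma genitalium.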
Chemistry and function of genes
Chemical structure of a gene
Four kinds of sequentially linked nucleotides compose a DNA molecule or strand (more at DNA). These four nucleotides constitute the genetic alphabet. A sequence of three consecutive nucleotides, called a codon, is the protein-coding vocabulary. The sequence of codons in a gene specifies the amino-acid sequence of the protein it encodes.
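The way a sequence of codons specifies an amino-acid sequence can be sketched in a few lines. The example below is a minimal illustration, not a complete translator: `CODON_TABLE` holds only five of the 64 codons of the standard genetic code, and the input sequence is hypothetical.

```python
# A small excerpt of the standard genetic code (DNA codons -> amino acids).
# Only the codons needed for the example are listed; the full table has 64 entries.
CODON_TABLE = {
    "ATG": "Met",  # also the usual start codon
    "GCC": "Ala",
    "AAA": "Lys",
    "TGG": "Trp",
    "TAA": "Stop",
}

def translate(coding_dna):
    """Read a coding sequence three nucleotides at a time until a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        codon = coding_dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]  # raises KeyError for codons outside the excerpt
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

print(translate("ATGGCCAAATGGTAA"))  # ['Met', 'Ala', 'Lys', 'Trp']
```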
In most eukaryotic species, very little of the DNA in the genome encodes proteins, and the genes may be separated by vast sequences of so-called junk DNA. Moreover, the genes are often fragmented internally by non-coding sequences called introns, which can be many times longer than the coding sequences they separate. Introns are removed by splicing shortly after transcription. In the primary molecular sense, however, they are parts of the gene.
All the genes and intervening DNA together make up the genome of an organism, which in many species is divided among several chromosomes and typically present in two or more copies. The location (or locus) of a gene and the chromosome on which it is situated are in a sense arbitrary. Genes that appear together on the chromosomes of one species, such as humans, may appear on separate chromosomes in another species, such as mice. Two genes positioned near one another on a chromosome may encode proteins that figure in the same cellular process or in completely unrelated processes. As an example of the former, many of the genes involved in spermatogenesis reside together on the Y chromosome.
In the many species that carry more than one copy of their genome within each of their somatic cells, these copies are practically never identical. With respect to each gene, the copies that an individual possesses are liable to be distinct alleles, which may act synergistically or antagonistically to generate a trait or phenotype (more at genetics, allele).
Expression of molecular genes
For various reasons, the relationship between a DNA strand and a phenotypic trait is not direct. The same DNA strand in two different individuals may result in different traits because of the effect of other DNA strands or the environment.
- The DNA strand is expressed as a trait only if it is transcribed into RNA. Because transcription starts at a specific base-pair sequence (a promoter) and stops at another (a terminator), the DNA strand must lie between the two. If it does not, it is regarded as junk DNA and is not expressed.
- Cells regulate the activity of genes in part by increasing or decreasing their rate of transcription. Over the short term, this regulation occurs through the binding or unbinding of proteins, known as transcription factors, to specific non-coding DNA sequences called regulatory elements. So, to be expressed, the DNA strand must be properly regulated by other DNA sequences.
- The DNA strand may also be silenced through DNA methylation or by chemical changes to the protein components of chromosomes (see histone). This is a longer-lasting form of transcriptional regulation.
- The RNA is often edited before its translation into a protein. Eukaryotic cells splice the transcripts of a gene, keeping the exons and removing the introns. So the DNA strand must lie in an exon to be expressed. Because of the complexity of the splicing process, one pre-mRNA may be spliced in alternative ways to produce not one but a variety of proteins (alternative splicing). Prokaryotes produce a similar effect by shifting reading frames during translation.
- The translation of RNA into a protein likewise begins at a specific start sequence and ends at a specific stop sequence.
- Once produced, the protein interacts with many other proteins in the cell, according to the cell's metabolism. This interaction finally produces the trait.
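The transcription and splicing steps listed above can be sketched as follows. This is a toy model under stated assumptions: the gene sequence, the exon coordinates and the function names `transcribe` and `splice` are all hypothetical, and regulation, methylation and protein interactions are left out entirely.

```python
def transcribe(coding_strand):
    """Pre-mRNA carries the coding strand's sequence with U in place of T."""
    return coding_strand.replace("T", "U")

def splice(pre_mrna, exons):
    """Join the listed exon intervals (start, end), dropping everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical toy gene: three exons (upper case) separated by two introns (lower case).
gene = "ATGGCC" + "gtaaag" + "AAATGG" + "gtccag" + "GGTTAA"
pre_mrna = transcribe(gene.upper())

all_exons = [(0, 6), (12, 18), (24, 30)]
skip_middle = [(0, 6), (24, 30)]  # alternative splicing: the second exon is left out

print(splice(pre_mrna, all_exons))    # AUGGCCAAAUGGGGUUAA
print(splice(pre_mrna, skip_middle))  # AUGGCCGGUUAA
```

The same pre-mRNA yields two different mature transcripts, which is the sense in which one gene can give rise to more than one protein.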
This complex process helps explain the different meanings of "gene":
- a nucleotide sequence in a DNA strand;
- or the transcribed RNA, prior to splicing;
- or the transcribed RNA after splicing, i.e. without the introns.
The last of these meanings depends on more "material entities" than the first.
Mutations and evolution
Just as many factors influence the expression of a particular DNA strand, there are many ways in which genetic mutations can arise and take effect.
For example, natural variations within regulatory sequences appear to underlie many of the heritable characteristics seen in organisms. The influence of such variations on the trajectory of evolution through natural selection may be as large as or larger than variation in sequences that encode proteins. Thus, though regulatory elements are often distinguished from genes in molecular biology, in effect they satisfy the shared and historical sense of the word. Indeed, a breeder or geneticist, in following the inheritance pattern of a trait, has no immediate way to know whether this pattern arises from coding sequences or regulatory sequences. Typically, he or she will simply attribute it to variations within a gene.
Errors during DNA replication may lead to the duplication of a gene, and the two copies may diverge over time. Though the two sequences may remain the same or be only slightly altered, they are typically regarded as separate genes (i.e. not as alleles of the same gene). The same is true when duplicate sequences appear in different species. By contrast, though the alleles of a gene differ in sequence, they are regarded as a single gene (occupying a single locus).
History
The existence of genes was first suggested by Gregor Mendel, who studied inheritance in pea plants and hypothesized a factor that conveys traits from parent to offspring. Although he did not use the term gene, he explained his results in terms of inherited characteristics. Mendel was also the first to hypothesize independent assortment, the distinction between dominant and recessive traits, the distinction between a heterozygote and homozygote, and the difference between what would later be described as genotype and phenotype.
Wilhelm Johannsen coined the term "gene" in 1909, based on the work of Gregor Mendel.
In 1910, Thomas Hunt Morgan showed that genes reside on specific chromosomes. He later showed that genes occupy specific locations on the chromosome. With this knowledge, Morgan and his students began the first chromosomal map of Drosophila.
In 1941, George Wells Beadle and Edward Lawrie Tatum showed that mutations in genes caused errors in certain steps in metabolic pathways. This showed that specific genes code for specific proteins, leading to the "one gene, one enzyme" hypothesis.