In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protein-coding genes and non-coding genes.
The transmission of genes to an organism's offspring is the basis of the inheritance of phenotypic traits from one generation to the next. These genes make up different DNA sequences, together called a genotype, that is specific to every given individual within the gene pool of the population of a given species. The genotype, along with environmental and developmental factors, ultimately determines the phenotype of the individual.
Most biological traits occur under the combined influence of polygenes (a set of different genes) and gene–environment interactions. Some genetic traits are instantly visible, such as eye color or the number of limbs; others are not, such as blood type, the risk for specific diseases, or the thousands of basic biochemical processes that constitute life. A gene can acquire mutations in its sequence, leading to different variants, known as alleles, in the population. These alleles encode slightly different versions of a gene, which may cause different phenotypic traits.
The Mendelian gene is the classical gene of genetics and it refers to any heritable trait. This is the gene described in The Selfish Gene. More thorough discussions of this version of a gene can be found in the articles Genetics and Gene-centered view of evolution.
The molecular gene definition is more commonly used across biochemistry, molecular biology, and most of genetics—the gene that is described in terms of DNA sequence. There are many different definitions of this gene—some of which are misleading or incorrect.
Very early work in the field that became molecular genetics suggested the concept that one gene makes one protein (originally 'one gene – one enzyme'). However, genes that produce repressor RNAs were proposed in the 1950s and by the 1960s, textbooks were using molecular gene definitions that included those that specified functional RNA molecules such as ribosomal RNA and tRNA (noncoding genes) as well as protein-coding genes.
This idea of two kinds of genes is still part of the definition of a gene in most textbooks. For example,
The important parts of such definitions are: (1) that a gene corresponds to a transcription unit; (2) that genes produce both mRNA and noncoding RNAs; and (3) regulatory sequences control gene expression but are not part of the gene itself. However, there is one other important part of the definition and it is emphasized in Kostas Kampourakis' book Making Sense of Genes.
The emphasis on function is essential because there are stretches of DNA that produce non-functional transcripts and they do not qualify as genes. These include obvious examples such as transcribed pseudogenes as well as less obvious examples such as junk RNA produced as noise due to transcription errors. In order to qualify as a true gene, by this definition, one has to prove that the transcript has a biological function.
Early speculations on the size of a typical gene were based on high-resolution genetic mapping and on the size of proteins and RNA molecules. A length of 1500 base pairs seemed reasonable at the time (1965). This was based on the idea that the gene was the DNA that was directly responsible for production of the functional product. The discovery of introns in the 1970s meant that many eukaryotic genes were much larger than the size of the functional product would imply. Typical mammalian protein-coding genes, for example, are about 62,000 base pairs in length (transcribed region) and since there are about 20,000 of them they occupy about 35–40% of the mammalian genome (including the human genome).
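The 35–40% figure can be checked with a short calculation. The sketch below uses the gene length and gene count quoted above; the genome size of about 3.1 billion base pairs is an assumption added for illustration and is not stated in the text.

```python
# Figures quoted in the text.
TYPICAL_GENE_LENGTH_BP = 62_000   # transcribed region of a typical mammalian protein-coding gene
PROTEIN_CODING_GENES = 20_000

# Assumption, not from the text: a haploid genome of roughly 3.1 billion base pairs.
ASSUMED_GENOME_SIZE_BP = 3_100_000_000


def genome_fraction(gene_length_bp: int, gene_count: int, genome_size_bp: int) -> float:
    """Fraction of the genome covered by gene_count genes of the given length."""
    return gene_length_bp * gene_count / genome_size_bp


fraction = genome_fraction(TYPICAL_GENE_LENGTH_BP, PROTEIN_CODING_GENES, ASSUMED_GENOME_SIZE_BP)
print(f"Protein-coding genes cover about {fraction:.0%} of the genome")
```

Under that assumed genome size the result sits at the upper end of the quoted range; a larger assumed genome or a shorter typical gene moves it toward the lower end.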
In spite of the fact that both protein-coding genes and noncoding genes have been known for more than 50 years, there are still a number of textbooks, websites, and scientific publications that define a gene as a DNA sequence that specifies a protein. In other words, the definition is restricted to protein-coding genes. Here is an example from a 2021 article in American Scientist.
This restricted definition is so common that it has spawned many recent articles that criticize this "standard definition" and call for a new expanded definition that includes noncoding genes. However, some modern writers still do not acknowledge noncoding genes although this so-called "new" definition has been recognised for more than half a century.
Although some definitions can be more broadly applicable than others, the fundamental complexity of biology means that no definition of a gene can capture all aspects perfectly. Not all genomes are DNA (e.g. RNA viruses), bacterial operons are multiple protein-coding regions transcribed into single large mRNAs, alternative splicing enables a single genomic region to encode multiple distinct products, and trans-splicing concatenates mRNAs from shorter coding sequences across the genome. Since molecular definitions exclude elements such as introns, promoters, and other regulatory regions, these are instead thought of as "associated" with the gene and affect its function.
An even broader operational definition is sometimes used to encompass the complexity of these diverse phenomena, where a gene is defined as a union of genomic sequences encoding a coherent set of potentially overlapping functional products. This definition categorizes genes by their functional products (proteins or RNA) rather than their specific DNA loci, with regulatory elements classified as gene-associated regions.
Prior to Mendel's work, the dominant theory of heredity was one of blending inheritance, in which the traits of the two parents were thought to mix in their offspring.
Mendel's work went largely unnoticed after its first publication in 1866, but was rediscovered in the late 19th century by Hugo de Vries, Carl Correns, and Erich von Tschermak, who (claimed to have) reached similar conclusions in their own research. Specifically, in 1889, Hugo de Vries published his book Intracellular Pangenesis (translated from German to English and published by Open Court Publishing Co., Chicago, 1910), in which he postulated that different characters have individual hereditary carriers and that inheritance of specific traits in organisms comes in particles. De Vries called these units "pangenes" (Pangens in German), after Darwin's 1868 pangenesis theory.
Twenty years later, in 1909, Wilhelm Johannsen introduced the term "gene" (inspired by the ancient Greek γόνος, gonos, meaning offspring and procreation) and, in 1906, William Bateson introduced that of "genetics", while Eduard Strasburger, among others, still used the term "pangene" for the fundamental physical and functional unit of heredity.
Johannsen explained his choice of word as follows (from p. 124: "Dieses "etwas" in den Gameten bezw. in der Zygote, ... – kurz, was wir eben Gene nennen wollen – bedingt sind."): This "something" in the gametes or in the zygote, which has crucial importance for the character of the organism, is usually called by the quite ambiguous term Anlagen [primordium]. Many other terms have been suggested, mostly unfortunately in closer connection with certain hypothetical opinions. The word "pangene", which was introduced by Darwin, is perhaps used most frequently in place of Anlagen. However, the word "pangene" was not well chosen, as it is a compound word containing the roots pan (the neuter form of Πας all, every) and gen (from γί-γ(ε)ν-ομαι, to become). Only the meaning of this latter [i.e., gen] comes into consideration here; just the basic idea – namely, that a trait in the developing organism can be determined or is influenced by "something" in the gametes – should find expression. No hypothesis about the nature of this "something" should be postulated or supported by it. For that reason it seems simplest to use in isolation the last syllable gen from Darwin's well-known word, which alone is of interest to us, in order to replace, with it, the poor, ambiguous word Anlage. Thus we will say simply "gene" and "genes" for "pangene" and "pangenes". The word gene is completely free of any hypothesis; it expresses only the established fact that in any case many traits of the organism are determined by specific, separable, and thus independent "conditions", "foundations", "plans" – in short, precisely what we want to call genes.
In the early 1950s the prevailing view was that the genes in a chromosome acted like discrete entities arranged like beads on a string. The experiments of Seymour Benzer using mutants defective in the rII region of bacteriophage T4 (1955–1959) showed that individual genes have a simple linear structure and are likely to be equivalent to a linear section of DNA.
Collectively, this body of research established the central dogma of molecular biology, which states that proteins are translated from RNA, which is transcribed from DNA. This dogma has since been shown to have exceptions, such as reverse transcription in retroviruses. The modern study of genetics at the level of DNA is known as molecular genetics.
In 1972, Walter Fiers and his team were the first to determine the sequence of a gene: that of bacteriophage MS2 coat protein. The subsequent development of chain-termination DNA sequencing in 1977 by Frederick Sanger improved the efficiency of sequencing and turned it into a routine laboratory tool. An automated version of the Sanger method was used in early phases of the Human Genome Project.
This view of evolution was emphasized by George C. Williams' gene-centric view of evolution. He proposed that the Mendelian gene is a unit of natural selection with the definition: "that which segregates and recombines with appreciable frequency."
The development of the neutral theory of evolution in the late 1960s led to the recognition that random genetic drift is a major player in evolution and that neutral theory should be the null hypothesis of molecular evolution. This led to the construction of phylogenetic trees and the development of the molecular clock, which is the basis of all dating techniques using DNA sequences. These techniques are not confined to molecular gene sequences but can be used on all DNA segments in the genome.
Two chains of DNA twist around each other to form a DNA double helix with the phosphate–sugar backbone spiralling around the outside, and the bases pointing inward with adenine base pairing to thymine and guanine to cytosine. The specificity of base pairing occurs because adenine and thymine align to form two hydrogen bonds, whereas cytosine and guanine form three hydrogen bonds. The two strands in a double helix must, therefore, be complementary, with their sequence of bases matching such that the adenines of one strand are paired with the thymines of the other strand, and so on.
Due to the chemical composition of the pentose residues of the nucleotides, DNA strands have directionality. One end of a DNA polymer contains an exposed hydroxyl group on the deoxyribose; this is known as the 3' end of the molecule. The other end contains an exposed phosphate group; this is the 5' end. The two strands of a double helix run in opposite directions. Nucleic acid synthesis, including DNA replication and transcription, occurs in the 5'→3' direction, because new nucleotides are added via a dehydration reaction that uses the exposed 3' hydroxyl as a nucleophile.
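Complementary base pairing and the antiparallel orientation of the two strands can be illustrated with a small sketch. The function names and the example sequence are hypothetical and chosen only for illustration; sequences are written 5'→3', so the partner strand is the reverse complement.

```python
# Watson-Crick pairing partners and the number of hydrogen bonds per base pair.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5'->3'.

    The partner runs in the opposite direction, so the input is read
    backwards while each base is swapped for its pairing partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its complement."""
    return sum(HYDROGEN_BONDS[base] for base in strand)


top = "ATGCGT"                      # toy sequence, not a real gene
bottom = reverse_complement(top)
print(top, bottom, hydrogen_bonds(top))
```

Applying reverse_complement twice returns the original strand, which is a quick consistency check on the pairing table.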
The expression of genes encoded in DNA begins by transcribing the gene into RNA, a second type of nucleic acid that is very similar to DNA, but whose monomers contain the sugar ribose rather than deoxyribose. RNA also contains the base uracil in place of thymine. RNA molecules are less stable than DNA and are typically single-stranded. Genes that encode proteins are composed of a series of three-nucleotide sequences called codons, which serve as the "words" in the genetic "language". The genetic code specifies the correspondence during protein translation between codons and amino acids. The genetic code is nearly the same for all known organisms.
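As a minimal sketch of these two ideas, the snippet below writes an RNA copy of a DNA sequence by replacing thymine with uracil and then splits it into three-nucleotide codons. It assumes the input is the coding (sense) strand, whose sequence matches the RNA apart from the T/U difference; the function names and the example sequence are hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """RNA sequence corresponding to a DNA coding strand (T replaced by U)."""
    return coding_strand.replace("T", "U")


def split_codons(mrna: str) -> list[str]:
    """Split an RNA sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


mrna = transcribe("ATGGCCTAA")     # toy sequence, not a real gene
print(mrna, split_codons(mrna))
```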
The majority of eukaryotic genes are stored on a set of large, linear chromosomes. The chromosomes are packed within the cell nucleus in complex with storage proteins called histones to form a unit called a nucleosome. DNA packaged and condensed in this way is called chromatin. The manner in which DNA is stored on the histones, as well as chemical modifications of the histone itself, regulate whether a particular region of DNA is accessible for gene expression. In addition to genes, eukaryotic chromosomes contain sequences involved in ensuring that the DNA is copied without degradation of end regions and sorted into daughter cells during cell division: replication origins, telomeres, and the centromere. Replication origins are the sequence regions where DNA replication is initiated to make two copies of the chromosome. Telomeres are long stretches of repetitive sequences that cap the ends of the linear chromosomes and prevent degradation of coding and regulatory regions during DNA replication. The length of the telomeres decreases each time the genome is replicated and has been implicated in the aging process. The centromere is required for binding spindle fibres to separate sister chromatids into daughter cells during cell division.
Prokaryotes (bacteria and archaea) typically store their genomes on a single, large, circular chromosome. Similarly, some eukaryotic organelles contain a remnant circular chromosome with a small number of genes. Prokaryotes sometimes supplement their chromosome with additional small circles of DNA called plasmids, which usually encode only a few genes and are transferable between individuals. For example, the genes for antibiotic resistance are usually encoded on bacterial plasmids and can be passed between individual cells, even those of different species, via horizontal gene transfer.
Whereas the chromosomes of prokaryotes are relatively gene-dense, those of eukaryotes often contain regions of DNA that serve no obvious function. Simple single-celled eukaryotes have relatively small amounts of such DNA, whereas the genomes of complex multicellular organisms, including humans, contain an absolute majority of DNA without an identified function. This DNA has often been referred to as "junk DNA". However, more recent analyses suggest that, although protein-coding DNA makes up barely 2% of the human genome, about 80% of the bases in the genome may be expressed, so the term "junk DNA" may be a misnomer.
The structure of a protein-coding gene consists of many elements, of which the actual coding region is often only a small part. These include introns and untranslated regions of the mature mRNA. Noncoding genes can also contain introns that are removed during processing to produce the mature functional RNA.
All genes are associated with regulatory sequences that are required for their expression. First, genes require a promoter sequence. The promoter is recognized and bound by transcription factors that recruit and help RNA polymerase bind to the region to initiate transcription. The recognition typically occurs as a consensus sequence like the TATA box. A gene can have more than one promoter, resulting in messenger RNAs (mRNA) that differ in how far they extend in the 5' end. Highly transcribed genes have "strong" promoter sequences that form strong associations with transcription factors, thereby initiating transcription at a high rate. Other genes have "weak" promoters that form weak associations with transcription factors and initiate transcription less frequently. Eukaryotic promoter regions are much more complex and difficult to identify than prokaryotic promoters.
Additionally, genes can have regulatory regions many kilobases upstream or downstream of the gene that alter expression. These act by binding to transcription factors which then cause the DNA to loop so that the regulatory sequence (and bound transcription factor) become close to the RNA polymerase binding site. For example, enhancers increase transcription by binding an activator protein which then helps to recruit the RNA polymerase to the promoter; conversely silencers bind repressor proteins and make the DNA less available for RNA polymerase.
The mature messenger RNA produced from protein-coding genes contains untranslated regions at both ends, which contain binding sites for ribosomes, RNA-binding proteins, and microRNA, as well as the terminator, start codon, and stop codon. In addition, most eukaryotic open reading frames contain untranslated introns, which are removed, and exons, which are connected together in a process known as RNA splicing. Finally, the ends of gene transcripts are defined by cleavage and polyadenylation (CPA) sites, where newly produced pre-mRNA gets cleaved and a string of ~200 adenosine monophosphates is added at the 3' end. The poly(A) tail protects mature mRNA from degradation and has other functions, affecting translation, localization, and transport of the transcript from the nucleus. Splicing, followed by CPA, generates the final mature mRNA, which encodes the protein or RNA product.
Many noncoding genes in eukaryotes have different transcription termination mechanisms and they do not have poly(A) tails.
Many prokaryotic genes are organized into operons, with multiple protein-coding sequences that are transcribed as a unit. The genes in an operon are transcribed as a continuous messenger RNA, referred to as a polycistronic mRNA. The term cistron in this context is equivalent to gene. The transcription of an operon's mRNA is often controlled by a repressor that can occur in an active or inactive state depending on the presence of specific metabolites. When active, the repressor binds to a DNA sequence at the beginning of the operon, called the operator region, and represses transcription of the operon; when the repressor is inactive, transcription of the operon can occur (see e.g. Lac operon). The products of operon genes typically have related functions and are involved in the same regulatory network.
Additionally, a "start codon" and three "stop codons" indicate the beginning and end of the coding region. There are 64 possible codons (four possible nucleotides at each of three positions, hence 4³ possible codons) and only 20 standard amino acids; hence the code is redundant and multiple codons can specify the same amino acid. The correspondence between codons and amino acids is nearly universal among all known living organisms.
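The codon count and the redundancy of the code can be verified by enumeration. The sketch below builds the standard genetic code from its usual compact form (one-letter amino acid codes listed in U, C, A, G order for each codon position, with "*" marking a stop codon); the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid codes, ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every possible triplet to its amino acid (or "*" for stop).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_product = Counter(CODON_TABLE.values())
stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
amino_acid_count = len(codons_per_product) - 1   # exclude the "*" stop entry

print(len(CODON_TABLE), "codons,", amino_acid_count, "amino acids, stops:", stop_codons)
print("Codons for leucine (L):", codons_per_product["L"])
```

Because 61 sense codons are shared among 20 amino acids, most amino acids are specified by more than one codon, which is the redundancy described above.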
In prokaryotes, transcription occurs in the cytoplasm; for very long transcripts, translation may begin at the 5' end of the RNA while the 3' end is still being transcribed. In eukaryotes, transcription occurs in the nucleus, where the cell's DNA is stored. The RNA molecule produced by the polymerase is known as the primary transcript and undergoes post-transcriptional modifications before being exported to the cytoplasm for translation. One of the modifications performed is the splicing of introns, which are sequences in the transcribed region that do not encode a protein. Alternative splicing mechanisms can result in mature transcripts from the same gene having different sequences and thus coding for different proteins. This is a major form of regulation in eukaryotic cells and also occurs in some prokaryotes.
Some viruses store their entire genomes in the form of RNA, and contain no DNA at all.
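Splicing and alternative splicing can be sketched with a toy primary transcript represented as labelled segments. The segment sequences, the function name, and the exon-skipping option are hypothetical and serve only to show how one transcribed region can yield different mature transcripts; real splicing depends on sequence signals and the spliceosome, which are not modelled here.

```python
# Toy primary transcript: alternating exons and introns (invented sequences).
PRIMARY_TRANSCRIPT = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUCAG"),
    ("exon", "GAAUUU"),
    ("intron", "GUGAGUUAG"),
    ("exon", "UGGUAA"),
]


def splice(segments, skipped_exons=frozenset()):
    """Join exons in order, removing introns and any exons listed in skipped_exons.

    Exons are numbered from 0 in the order they appear in the transcript.
    """
    kept = []
    exon_index = 0
    for kind, sequence in segments:
        if kind != "exon":
            continue                      # introns are always removed
        if exon_index not in skipped_exons:
            kept.append(sequence)
        exon_index += 1
    return "".join(kept)


full_length = splice(PRIMARY_TRANSCRIPT)
exon_skipped = splice(PRIMARY_TRANSCRIPT, skipped_exons={1})
print(full_length, exon_skipped)
```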
Alleles at a locus may be dominant or recessive; dominant alleles give rise to their corresponding phenotypes when paired with any other allele for the same trait, whereas recessive alleles give rise to their corresponding phenotype only when paired with another copy of the same allele. If you know the genotypes of the organisms, you can determine which alleles are dominant and which are recessive. For example, if the allele specifying tall stems in pea plants is dominant over the allele specifying short stems, then pea plants that inherit one tall allele from one parent and one short allele from the other parent will also have tall stems. Mendel's work demonstrated that alleles assort independently in the production of gametes, or germ cells, ensuring variation in the next generation. Although Mendelian inheritance remains a good model for many traits determined by single genes (including a number of well-known genetic disorders), it does not include the physical processes of DNA replication and cell division.
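The pea-plant example can be worked through as a single-gene cross. In the sketch below, "T" stands for the dominant tall allele and "t" for the recessive short allele; these symbols and the function names are illustrative assumptions, and each parent is assumed to pass on either of its two alleles with equal probability.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes from all equally likely allele combinations."""
    return Counter(
        "".join(sorted(allele1 + allele2))   # sort so "tT" and "Tt" count as one genotype
        for allele1, allele2 in product(parent1, parent2)
    )


def phenotype(genotype: str) -> str:
    """One copy of the dominant allele is enough to give tall stems."""
    return "tall" if "T" in genotype else "short"


genotypes = cross("Tt", "Tt")
phenotypes = Counter()
for genotype, count in genotypes.items():
    phenotypes[phenotype(genotype)] += count

print(dict(genotypes), dict(phenotypes))
```

Crossing two heterozygous plants gives a 1:2:1 ratio of genotypes and a 3:1 ratio of tall to short plants.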
The rate of DNA replication in living cells was first measured as the rate of phage T4 DNA elongation in phage-infected E. coli and found to be impressively rapid. During the period of exponential DNA increase at 37 °C, the rate of elongation was 749 nucleotides per second.
After DNA replication, the cell must physically separate the two genome copies and divide into two distinct membrane-bound cells. In prokaryotes (bacteria and archaea) this usually occurs via a relatively simple process called binary fission, in which each circular genome attaches to the cell membrane and is separated into the daughter cells as the membrane invaginates to split the cytoplasm into two membrane-bound portions. Binary fission is extremely fast compared to the rates of cell division in eukaryotes. Eukaryotic cell division is a more complex process known as the cell cycle; DNA replication occurs during a phase of this cycle known as S phase, whereas the process of segregating chromosomes and splitting the cytoplasm occurs during M phase.
During the process of meiotic cell division, an event called genetic recombination or crossing-over can sometimes occur, in which a length of DNA on one chromatid is swapped with a length of DNA on the corresponding homologous non-sister chromatid. This can result in reassortment of otherwise linked alleles. The Mendelian principle of independent assortment asserts that each of a parent's two genes for each trait will sort independently into gametes; which allele an organism inherits for one trait is unrelated to which allele it inherits for another trait. This is in fact only true for genes that do not reside on the same chromosome or are located very far from one another on the same chromosome. The closer two genes lie on the same chromosome, the more closely they will be associated in gametes and the more often they will appear together (known as genetic linkage). Genes that are very close are essentially never separated because it is extremely unlikely that a crossover point will occur between them.
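The effect of linkage on gamete types can be illustrated with a two-locus sketch. It assumes a parent carrying alleles A and B on one chromosome and a and b on the homologous chromosome, and uses the standard recombination fraction r (the share of gametes that are recombinant, between 0 and 0.5); the function name and allele labels are hypothetical.

```python
def gamete_frequencies(recombination_fraction: float) -> dict[str, float]:
    """Expected gamete frequencies for a parent with haplotypes AB and ab.

    Parental gametes (AB, ab) share the non-recombinant fraction equally;
    recombinant gametes (Ab, aB) share the recombinant fraction equally.
    """
    r = recombination_fraction
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    return {
        "AB": (1 - r) / 2,
        "ab": (1 - r) / 2,
        "Ab": r / 2,
        "aB": r / 2,
    }


unlinked = gamete_frequencies(0.5)        # different chromosomes or very far apart
tightly_linked = gamete_frequencies(0.0)  # so close that crossovers essentially never separate them
print(unlinked, tightly_linked)
```

At r = 0.5 all four gamete types are equally frequent, which is independent assortment; as r approaches 0 only the two parental combinations appear.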
Although the number of base pairs of DNA in the human genome has been known since the 1950s, the estimated number of genes has changed over time as definitions of genes and methods of detecting them have been refined. Initial theoretical predictions of the number of human genes in the 1960s and 1970s were based on mutation load estimates and the numbers of mRNAs, and these estimates tended to be about 30,000 protein-coding genes. During the 1990s there were rough estimates of up to 100,000 genes, and early data on detection of mRNAs (expressed sequence tags) suggested more than the traditional value of 30,000 genes that had been reported in the textbooks during the 1980s.
The initial draft sequences of the human genome confirmed the earlier predictions of about 30,000 protein-coding genes; however, that estimate has fallen to about 19,000 with the ongoing GENCODE annotation project. The number of noncoding genes is not known with certainty, but the latest estimates from Ensembl suggest 26,000 noncoding genes.
Essential genes include housekeeping genes (critical for basic cell functions) as well as genes that are expressed at different times in the organism's development or life cycle. Housekeeping genes are used as experimental controls when analysing gene expression, since they are expressed at a relatively constant level.
Genetic engineering is now a routine research tool with model organisms. For example, genes are easily added to bacteria, and lineages of knockout mice with a specific gene's function disrupted are used to investigate that gene's function. Many organisms have been genetically modified for applications in agriculture, industrial biotechnology, and medicine.
For multicellular organisms, typically the embryo is engineered, which then grows into the adult genetically modified organism. However, the genomes of cells in an adult organism can be edited using gene therapy techniques to treat genetic diseases.
History
Discovery of discrete inherited units
Discovery of DNA
Modern synthesis and its successors
Molecular basis
DNA
Chromosomes
Structure and function
Structure
Complexity
Gene expression
Genetic code
Transcription
Translation
Regulation
RNA genes
Inheritance
Mendelian inheritance
DNA replication and cell division
Molecular inheritance
Genome
Number of genes
Essential genes
Genetic and genomic nomenclature
Genetic engineering