An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term exon refers to both the DNA sequence within a gene and the corresponding sequence in RNA transcripts. In RNA splicing, introns are removed and exons are covalently joined to one another as part of generating the mature RNA. Just as the entire set of genes for a species constitutes the genome, the entire set of exons constitutes the exome.
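The removal of introns and joining of exons can be illustrated with a minimal Python sketch. The toy sequence, the exon coordinates and the function name `splice` are hypothetical and chosen only for illustration; real splicing is carried out by the spliceosome, not by coordinate lookup.

```python
def splice(pre_rna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) in transcript order."""
    ordered = sorted(exons)
    # Exons of a single transcript must not overlap one another.
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 < end1:
            raise ValueError("exons overlap")
    # Everything outside the exon intervals (the introns) is discarded.
    return "".join(pre_rna[start:end] for start, end in ordered)


# Toy pre-RNA (hypothetical): exons in upper case, introns in lower case.
pre_rna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaugcccacag" + "UGA"
exons = [(0, 6), (18, 24), (36, 39)]

mature = splice(pre_rna, exons)
print(mature)
```

In this sketch the mature RNA contains only the upper-case exon segments, joined in their original order.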
History
The term exon is a shortening of the phrase expressed region and was coined by American biochemist Walter Gilbert in 1978:
"The notion of the cistron... must be replaced by that of a transcription unit containing regions which will be lost from the mature messenger – which I suggest we call introns (for intragenic regions) – alternating with regions which will be expressed – exons."
This definition was originally made for protein-coding transcripts that are spliced before being translated. The term later came to include sequences removed from rRNA, tRNA and other ncRNA, and it was also used later for RNA molecules originating from different parts of the genome that are then ligated by trans-splicing.
Contribution to genomes and size distribution
Although unicellular eukaryotes such as yeast have either no introns or very few, animal and especially vertebrate genomes have a large fraction of non-coding DNA. For instance, in the human genome only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. The small exonic fraction can provide a practical advantage in omics-aided health care (such as precision medicine) because it makes commercialized exome sequencing a smaller and less expensive challenge than commercialized whole genome sequencing. The large variation in genome size and C-value across organisms has posed an interesting challenge called the C-value enigma.
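The scale difference between exome and whole genome sequencing can be made concrete with a short back-of-the-envelope calculation. The percentages are those quoted above; the genome size of about 3.1 billion base pairs is an assumption introduced here for illustration and is not taken from this article.

```python
# Assumption (not from the article): approximate haploid human genome size.
GENOME_BP = 3_100_000_000

# Fractions of the human genome as quoted in the text.
fractions = {"exons": 0.011, "introns": 0.24, "intergenic": 0.75}

# Approximate number of base pairs in each category.
sizes = {name: round(GENOME_BP * frac) for name, frac in fractions.items()}

# How many times smaller the exome is than the whole genome.
fold = 1 / fractions["exons"]

for name, bp in sizes.items():
    print(f"{name:>10}: ~{bp / 1e6:,.0f} Mb")
print(f"exome is roughly {fold:.0f}-fold smaller than the genome")
```

Under that assumption the exome comes to a few tens of megabases. The three quoted percentages sum to slightly more than 100% because each is rounded.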
Across all eukaryotic genes in GenBank, there were (in 2002), on average, 5.48 exons per protein-coding gene. The average exon encoded 30-36 amino acids. While the longest exon in the human genome is 11,555 base pairs (bp) long, several exons have been found to be only 2 bp long. A single-nucleotide exon has been reported from the Arabidopsis genome. In humans, most non-coding RNAs, like protein-coding mRNAs, also contain multiple exons.
Structure and function
In protein-coding genes, the exons include both the protein-coding sequence and the 5′- and 3′-untranslated regions (UTR). Often the first exon includes both the 5′-UTR and the first part of the coding sequence, but exons containing only regions of 5′-UTR or (more rarely) 3′-UTR occur in some genes, i.e. the UTRs may contain introns.
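The relationship between exons, untranslated regions and coding sequence can be sketched in Python. The function name `annotate_exons`, the coordinates and the plus-strand, end-exclusive convention are all hypothetical choices made for this illustration.

```python
def annotate_exons(exons, cds_start, cds_end):
    """Split each exon (0-based, end-exclusive, plus strand) into UTR and CDS parts."""
    annotated = []
    for start, end in exons:
        segments = []
        # Portion of the exon upstream of the coding sequence.
        if start < cds_start:
            segments.append(("5'UTR", start, min(end, cds_start)))
        # Portion of the exon that overlaps the coding sequence.
        if end > cds_start and start < cds_end:
            segments.append(("CDS", max(start, cds_start), min(end, cds_end)))
        # Portion of the exon downstream of the coding sequence.
        if end > cds_end:
            segments.append(("3'UTR", max(start, cds_end), end))
        annotated.append(segments)
    return annotated


# Hypothetical gene with four exons; the coding sequence spans 230..600.
exons = [(0, 50), (200, 320), (500, 650), (900, 1000)]
annotated = annotate_exons(exons, cds_start=230, cds_end=600)

for exon, segments in zip(exons, annotated):
    print(exon, segments)
```

In this toy gene the first exon lies entirely within the 5′-UTR, the second contains both 5′-UTR and coding sequence, and the last lies entirely within the 3′-UTR, so each UTR is interrupted by an intron.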
Some non-coding RNA transcripts also have exons and introns.
Mature mRNAs originating from the same gene need not include the same exons, since different introns in the pre-mRNA can be removed by the process of alternative splicing.
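As a simplified illustration of how one gene can yield several mature mRNAs, the following Python sketch enumerates the isoforms produced when certain exons may be either included or skipped. The exon names, sequences and the function `isoforms` are hypothetical, and exon skipping is only one of several modes of alternative splicing.

```python
from itertools import product


def isoforms(exon_names, cassette):
    """Yield exon-name tuples for every include/skip combination of cassette exons."""
    # Constitutive exons are always kept; cassette exons may be kept or skipped.
    choices = [(True, False) if name in cassette else (True,) for name in exon_names]
    for keep in product(*choices):
        yield tuple(name for name, kept in zip(exon_names, keep) if kept)


# Hypothetical exon sequences for a four-exon gene.
exon_seqs = {"E1": "AUGGCC", "E2": "GAUUAC", "E3": "CCGGAA", "E4": "UGA"}
names = list(exon_seqs)

# E2 and E3 are treated as optional (cassette) exons in this sketch.
result = {
    iso: "".join(exon_seqs[name] for name in iso)
    for iso in isoforms(names, cassette={"E2", "E3"})
}

for iso, seq in result.items():
    print("-".join(iso), seq)
```

With two optional exons, the sketch produces four distinct mature transcripts from the same set of exons.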
Exonization is the creation of a new exon, as a result of mutations in introns.
Experimental approaches using exons
Exon trapping or 'gene trapping' is a molecular biology technique that exploits intron-exon RNA splicing to find new genes. The first exon of a 'trapped' gene splices into the exon that is contained in the insertional DNA. This new exon contains the open reading frame for a reporter gene that can now be expressed using the enhancers that control the target gene. A scientist knows that a new gene has been trapped when the reporter gene is expressed.
Splicing can be experimentally modified so that targeted exons are excluded from mature mRNA transcripts by using Morpholino oligos to block the access of splice-directing small nuclear ribonucleoproteins (snRNPs) to pre-mRNA. This has become a standard technique in developmental biology. Morpholino oligos can also be targeted to prevent molecules that regulate splicing (e.g. splice enhancers, splice suppressors) from binding to pre-mRNA, altering patterns of splicing.
Common misuse of the term
Common incorrect uses of the term exon are that 'exons code for protein', or 'exons code for amino-acids' or 'exons are translated'. However, these sorts of definitions only cover protein-coding genes, and omit those exons that become part of a non-coding RNA or the untranslated region of an mRNA. Such incorrect definitions still occur in overall reputable secondary sources.