The foresight in sequencing the complete genome, rather than only cataloging all the mRNAs (the expressed part of the genome), lay in recognizing that a genome is more than a bundle of genes and that the organization of genes in the context of the surrounding DNA is equally important. An enormous amount of functionally important information has been found in the genome in addition to the protein-coding sequences. The human genome sequence contains non-coding RNA genes, regulatory sequences and structural motifs. It also maintains short- and long-range spatial organization of sequences, and it contains important evolutionary information. Only by going systematically along each chromosome from end to end could every piece of information be captured with certainty. This realization has prompted the complete sequencing of the genomes of several other organisms as well.
The human genome contains over 10 million common polymorphic sites and an effectively unlimited number of rarer variants. More than 7 million single-nucleotide polymorphisms (SNPs) have been mapped on the genome sequence, and about 5 million have been validated [3-5]. This makes it possible to examine each gene for variants that alter the protein-coding sequence or splice sites and to test them for functional significance. One can also select polymorphisms for use as genetic markers, download the flanking sequence, determine the genotype of individual DNA samples and search for disease associations.
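
To make the idea of examining a gene for variants that alter the protein-coding sequence more concrete, here is a minimal Python sketch. It assumes the standard genetic code, a SNP that lies inside a coding sequence given on the coding strand with the reading frame starting at the first base, and a 0-based position. The function name `classify_snp` and the toy sequence are hypothetical and chosen only for illustration; real analyses would also have to deal with introns, strand orientation and splice sites.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snp(cds, pos, alt):
    """Classify a single-base substitution in a coding sequence.

    cds: coding sequence (assumed in frame, starting at index 0)
    pos: 0-based position of the SNP within cds
    alt: alternative base observed at that position
    Returns (effect, reference amino acid, alternative amino acid).
    """
    cds = cds.upper()
    alt = alt.upper()
    # Only positions inside a complete codon can be translated.
    if not 0 <= pos < len(cds) - len(cds) % 3:
        raise ValueError("position is outside the complete codons of cds")
    if alt not in BASES or alt == cds[pos]:
        raise ValueError("alt must be a valid base that differs from the reference")

    start = pos - pos % 3              # first base of the affected codon
    offset = pos - start               # position of the SNP within that codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        effect = "synonymous"          # protein sequence unchanged
    elif alt_aa == "*":
        effect = "nonsense"            # introduces a premature stop codon
    elif ref_aa == "*":
        effect = "stop_loss"           # removes the stop codon
    else:
        effect = "missense"            # changes one amino acid
    return effect, ref_aa, alt_aa


if __name__ == "__main__":
    toy_cds = "ATGGAAGTGTAA"           # hypothetical sequence: Met-Glu-Val-stop
    print(classify_snp(toy_cds, 4, "T"))
    print(classify_snp(toy_cds, 5, "G"))
```

In this toy sequence, changing the middle base of the second codon (GAA to GTA) replaces glutamate with valine, whereas changing its third base (GAA to GAG) leaves the protein unchanged. Only the first kind of variant would normally be carried forward for functional testing.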
The finite number of protein-coding genes hides a much greater diversity and extent of functional information in the human genome. For example, alternative splicing allows multiple functions encoded by the same gene to be selected in a cell-specific manner [6,7]. Multiple promoters can confer a diversity of inducible responses and substrate specificities on the same gene. The use of genomic information by each cell is governed by the interaction of multiple proteins with regulatory sequences that act as signal processors. As a result, the cell initiates a response that takes into account all the information received from inside or outside the cell. When analyzing the genome sequence, it is much harder to recognize regulatory sequences than protein-coding sequences, because the rules are more complex and less obvious. Like protein-coding regions, many regulatory sequences have been conserved during evolution, so information from other organisms can be used to try to find these functionally important elements of the human genome. Gene regulation is also governed by epigenetic changes: modifications to the DNA itself (methylation) and to the proteins that bind to it, such as histones (acetylation). Not much is known about the mechanisms and rules that govern this process or the replication of human chromosomes. Functional annotation of the protein-coding regions, by comparison, is relatively straightforward. Determining three-dimensional structures, disrupting a gene sequence and correlating the disruption with the resulting phenotype, and using model organisms for functional annotation are a few of the approaches that will enable one to harness the full potential of the human genome sequence.