Integrated Analysis of Metabolic Pathways, Sequence Evolution and Genome Organization
Hiroyuki Ogata, Wataru Fujibuchi, Susumu Goto, and Minoru Kanehisa
We introduce and discuss a new computational method for the
automatic extraction of functional units from genomic data and
biochemical pathway data in KEGG
(http://www.genome.ad.jp/kegg/). To obtain clues to the function
of a gene in a complete genome, it is customary to perform a
similarity search of the gene against database sequences. However,
the search alone leaves at least one third to one half of the genes in
a genome as hypothetical. To overcome this limitation, we have been
focusing on functional units, that is, sets of genes or gene products
that form the basic building blocks of cellular functions
(http://www.genome.ad.jp/dbget-bin/get_htext?Ortholog).
Although homology searches against the functional units
represented in the ortholog tables, followed by examination of the
completeness of each unit, are clearly useful for gene function
prediction, collecting and compiling the units is time-consuming
and needs to be automated.
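The completeness check mentioned above can be sketched in a few lines. This is a minimal illustration, assuming a functional unit is given as a list of member identifiers and that a prior homology search has already assigned genome genes to members; the function name `unit_completeness` and the identifiers `E1` to `E4`, `g10` and `g11` are hypothetical.

```python
def unit_completeness(unit_members, genome_hits):
    """Return the fraction of a functional unit found in a genome and the missing members.

    unit_members: list of member identifiers (hypothetical, e.g. enzyme names).
    genome_hits: dict mapping a member to the genome genes assigned to it by a
    prior homology search; an absent key or an empty list means "not found".
    """
    found = [m for m in unit_members if genome_hits.get(m)]
    missing = [m for m in unit_members if not genome_hits.get(m)]
    return len(found) / len(unit_members), missing


# Hypothetical unit of four members; the search found genes for two of them.
unit = ["E1", "E2", "E3", "E4"]
hits = {"E1": ["g10"], "E2": ["g11"], "E4": []}
print(unit_completeness(unit, hits))  # (0.5, ['E3', 'E4'])
```

Members whose search returned an empty list are treated exactly like members with no entry, so both are reported as gaps in the unit.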
To this end, we have recently developed a method that
automatically extracts functional units. The method is based on
graphs, in which a node is a gene or a gene product and an edge is a
link or relationship between genes or gene products. By comparing
two biological networks represented as undirected graphs, it detects
local clusters of corresponding nodes, that is, groups of genes
and/or gene products that lie close together in both graphs.
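The graph comparison described above can be illustrated with a simplified sketch; it is not the authors' published algorithm. It assumes each graph is an adjacency dict and that correspondences are given as a list of (node in graph 1, node in graph 2) pairs. Two pairs are linked when their members lie within hypothetical distance thresholds `gap1` and `gap2` in the respective graphs, and connected groups of linked pairs (single linkage) are reported as clusters. All function and parameter names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source, max_dist):
    """Shortest-path distances (in edges) from source, explored up to max_dist."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def correlated_clusters(adj1, adj2, correspondences, gap1=1, gap2=1, min_size=2):
    """Group corresponding node pairs that stay close in both graphs.

    Pairs (u, v) and (u2, v2) are linked when u2 is within gap1 edges of u in
    graph 1 AND v2 is within gap2 edges of v in graph 2. Clusters are the
    connected components of this pair-linkage graph (union-find).
    """
    pairs = list(correspondences)
    parent = list(range(len(pairs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    near1 = {u: bfs_distances(adj1, u, gap1) for u, _ in pairs}
    near2 = {v: bfs_distances(adj2, v, gap2) for _, v in pairs}
    for i, j in combinations(range(len(pairs)), 2):
        (u, v), (u2, v2) = pairs[i], pairs[j]
        if u2 in near1[u] and v2 in near2[v]:
            parent[find(i)] = find(j)

    groups = {}
    for i, pair in enumerate(pairs):
        groups.setdefault(find(i), []).append(pair)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]


# Hypothetical example: a five-gene linear genome and a small pathway graph.
genome = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
pathway = {"X": ["Y"], "Y": ["X", "Z"], "Z": ["Y"], "W": []}
corr = [("a", "X"), ("b", "Y"), ("c", "Z"), ("e", "W")]
print(correlated_clusters(genome, pathway, corr))
# [[('a', 'X'), ('b', 'Y'), ('c', 'Z')]]
```

Raising `gap1` or `gap2` tolerates intervening genes or pathway steps, at the cost of merging more loosely related nodes into one cluster.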
Different kinds of links give rise to different networks or graphs.
For example, a genome can be viewed as a set of genes that are
linked one-dimensionally, so it is represented by a linear graph. A
set of interacting gene products in a biochemical pathway forms
another type of graph. The utility of the method is demonstrated by
the following two comparisons.
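The two kinds of graph can be built as plain adjacency dicts. This is a minimal sketch under stated assumptions: the genome is given as an ordered gene list and treated as linear, as in the text, and the pathway is simplified to hypothetical enzymes linked whenever a product of one is a substrate of the other. The function names and example identifiers are hypothetical, and real KEGG pathway data would first have to be parsed into this form.

```python
def linear_genome_graph(gene_order):
    """Link each gene to its immediate neighbours on the chromosome."""
    adj = {g: set() for g in gene_order}
    for left, right in zip(gene_order, gene_order[1:]):
        adj[left].add(right)
        adj[right].add(left)
    return adj


def pathway_graph(reactions):
    """Link two enzymes when a product of one is a substrate of the other.

    reactions: dict mapping an enzyme to (set of substrates, set of products).
    Edges are stored in both directions, giving an undirected graph.
    """
    adj = {e: set() for e in reactions}
    for e1, (_, products) in reactions.items():
        for e2, (substrates, _) in reactions.items():
            if e1 != e2 and products & substrates:
                adj[e1].add(e2)
                adj[e2].add(e1)
    return adj


genes = linear_genome_graph(["g1", "g2", "g3"])
enzymes = pathway_graph({
    "E1": ({"A"}, {"B"}),
    "E2": ({"B"}, {"C"}),
    "E3": ({"X"}, {"Y"}),
})
print(sorted(genes["g2"]))    # ['g1', 'g3']
print(sorted(enzymes["E1"]))  # ['E2']
print(sorted(enzymes["E3"]))  # []
```

Under this shared-compound rule, ubiquitous compounds such as ATP or water would connect almost every enzyme, so a practical version would need to exclude them.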
When the method is used to compare a genome with a set of
known biochemical pathways, it extracts gene clusters whose
products act at nearby positions in the pathways. An analysis of
metabolic pathways showed that most of the gene clusters detected
in this way in E. coli corresponded to enzyme operons. By
comparing each known genome against the metabolic pathways, we
observed many gene clusters that were conserved among multiple
organisms, as well as many organism-specific gene clusters. This
type of analysis would be useful for reconstructing and
characterizing functional units of biochemical pathways.
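A strict special case of the genome-versus-pathway comparison can be sketched as a single scan along the chromosome: consecutive genes are grouped while the enzymes they encode are identical or directly linked in the pathway graph. This is an illustrative simplification that allows no gaps, not the method used for the E. coli analysis; the gene-to-enzyme assignment, the function name `operon_like_runs` and all identifiers are hypothetical.

```python
def operon_like_runs(gene_order, gene_to_enzyme, pathway_adj, min_len=2):
    """Runs of consecutive genes whose enzymes are neighbours in the pathway.

    A neighbouring gene pair extends the current run when both genes map to
    enzymes and those enzymes are identical (e.g. subunits of one enzyme) or
    directly linked in pathway_adj. Genes with no enzyme break the run.
    """
    runs, current = [], []
    for prev, curr in zip(gene_order, gene_order[1:]):
        e1, e2 = gene_to_enzyme.get(prev), gene_to_enzyme.get(curr)
        linked = (e1 is not None and e2 is not None
                  and (e1 == e2 or e2 in pathway_adj.get(e1, ())))
        if linked:
            if not current:
                current = [prev]
            current.append(curr)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = []
    if len(current) >= min_len:
        runs.append(current)
    return runs


# Hypothetical genome of six genes; g4 encodes no pathway enzyme.
order = ["g1", "g2", "g3", "g4", "g5", "g6"]
g2e = {"g1": "E1", "g2": "E2", "g3": "E3", "g5": "E4", "g6": "E1"}
adj = {"E1": {"E2"}, "E2": {"E1", "E3"}, "E3": {"E2"}, "E4": set()}
print(operon_like_runs(order, g2e, adj))  # [['g1', 'g2', 'g3']]
```

Requiring strict adjacency keeps the example simple, but it would miss clusters interrupted by an unrelated gene; the gap-tolerant comparison described in the text does not have that limitation.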
When the method is used to compare one genome with another,
using sequence similarity to establish correspondences between
genes, it extracts pairs of orthologous gene clusters. While it is
well known that the global arrangement of orthologous gene
clusters on the genome can be highly shuffled between two
distantly related bacterial lineages, we also observe gene shuffling
by translocations, inversions, insertions and deletions within some
of the orthologous gene clusters. In practice, the extraction of
orthologous gene clusters gives important clues for identifying
orthologs that have been missed by simple homology searches.
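The genome-versus-genome case, and the way clusters can point to missed orthologs, can be sketched as follows. This is a simplified single-linkage illustration under stated assumptions, not the authors' procedure: genomes are ordered gene lists, similarity-based ortholog pairs are given, and `gap` is a hypothetical distance threshold in gene positions. Because it compares absolute position differences, the clustering tolerates local inversions and small insertions or deletions. Genes inside a cluster's span that lack an ortholog assignment are returned only as candidates that would still need checking, for example with a more sensitive sequence comparison. All names are hypothetical.

```python
def ortholog_clusters(order_a, order_b, ortholog_pairs, gap=2):
    """Single-linkage clusters of ortholog pairs that stay close in both genomes."""
    pos_a = {g: i for i, g in enumerate(order_a)}
    pos_b = {g: i for i, g in enumerate(order_b)}
    pairs = sorted(ortholog_pairs, key=lambda p: pos_a[p[0]])
    parent = list(range(len(pairs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            (a1, b1), (a2, b2) = pairs[i], pairs[j]
            # Absolute distances, so inverted segments still count as close.
            if abs(pos_a[a1] - pos_a[a2]) <= gap and abs(pos_b[b1] - pos_b[b2]) <= gap:
                parent[find(i)] = find(j)

    groups = {}
    for i, pair in enumerate(pairs):
        groups.setdefault(find(i), []).append(pair)
    return [g for g in groups.values() if len(g) >= 2]


def unpaired_in_span(cluster, order_a, order_b, ortholog_pairs):
    """Genes inside a cluster's span in each genome that have no ortholog assigned."""
    paired_a = {a for a, _ in ortholog_pairs}
    paired_b = {b for _, b in ortholog_pairs}
    pos_a = {g: i for i, g in enumerate(order_a)}
    pos_b = {g: i for i, g in enumerate(order_b)}
    ia = [pos_a[a] for a, _ in cluster]
    ib = [pos_b[b] for _, b in cluster]
    left_a = [g for g in order_a[min(ia):max(ia) + 1] if g not in paired_a]
    left_b = [g for g in order_b[min(ib):max(ib) + 1] if g not in paired_b]
    return left_a, left_b


# Hypothetical genomes: the a1-a3 region is inverted in B, with bx inserted.
order_a = ["a1", "a2", "a4", "a3", "a5", "a9"]
order_b = ["b0", "b3", "bx", "b2", "b1", "b5"]
pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b5")]
clusters = ortholog_clusters(order_a, order_b, pairs)
print(clusters)  # [[('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]]
print(unpaired_in_span(clusters[0], order_a, order_b, pairs))  # (['a4'], ['bx'])
```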
In both examples, most of the genes in each cluster appear
to be functionally related to one another. The extracted clusters are
merged and represented as ortholog tables, which would be useful
for predicting gene functions. We believe that comparative analysis
of networks of biological entities at this level of abstraction would
be fruitful for developing practical tools, such as those for the
automatic annotation of gene functions.
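Merging extracted clusters into table-like functional units can be sketched with a union-find over shared reference nodes. This assumes each cluster has already been mapped to reference identifiers (here hypothetical enzyme labels) and that clusters sharing any reference node belong to the same unit. The output layout, one row per organism and one column per enzyme, is only loosely modeled on ortholog tables, and the function name, organisms and identifiers are all hypothetical.

```python
def merge_into_tables(clusters_by_org):
    """Merge clusters that share an enzyme into units and tabulate genes per organism.

    clusters_by_org: dict organism -> list of clusters, each a list of (gene, enzyme).
    Returns a list of (sorted enzymes, table), where table[organism][enzyme]
    is the list of that organism's genes assigned to the enzyme.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union all enzymes that co-occur in any cluster of any organism.
    for clusters in clusters_by_org.values():
        for cluster in clusters:
            enzymes = [e for _, e in cluster]
            for e in enzymes[1:]:
                parent[find(e)] = find(enzymes[0])

    units = {}
    for org, clusters in clusters_by_org.items():
        for cluster in clusters:
            for gene, enzyme in cluster:
                table = units.setdefault(find(enzyme), {})
                table.setdefault(org, {}).setdefault(enzyme, []).append(gene)

    return [(sorted({e for row in table.values() for e in row}), table)
            for table in units.values()]


clusters_by_org = {
    "orgA": [[("a1", "E1"), ("a2", "E2")]],
    "orgB": [[("b2", "E2"), ("b3", "E3")]],
    "orgC": [[("c7", "E7"), ("c8", "E8")]],
}
for enzymes, table in merge_into_tables(clusters_by_org):
    print(enzymes, table)
```

In this example the orgA and orgB clusters merge into one unit covering E1, E2 and E3 through the shared enzyme E2, while the orgC cluster forms a separate unit.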
Dagstuhl Seminar - Modeling and Simulation of Gene Regulation and Metabolic Pathways
(June, 21-26, 1998, Schloss Dagstuhl, Germany)