Integrated Analysis of Metabolic Pathways, Sequence Evolution and Genome Organization

Hiroyuki Ogata, Wataru Fujibuchi, Susumu Goto, and Minoru Kanehisa

We introduce and discuss a new computational method for the automatic extraction of functional units, making use of the genomic data and biochemical pathway data in KEGG (http://www.genome.ad.jp/kegg/). To obtain functional clues about a gene in a complete genome, it is customary to perform a similarity search of the gene against database sequences. However, such a search alone leaves at least one third to one half of the genes in a genome as hypothetical. To overcome this situation, we have been focusing on functional units: sets of genes or gene products that form the basic building blocks of cellular functions (http://www.genome.ad.jp/dbget-bin/get_htext?Ortholog). Although homology searches against the functional units represented in the ortholog tables, followed by examination of the completeness of the units, are clearly useful for gene function prediction, the collection and compilation of the units are time-consuming and need to be automated. To this end we have recently developed a method to extract the functional units automatically. The method is based on the concept of a graph, in which a node is a gene or a gene product and an edge is a link or relationship between genes or gene products. By comparing two biological networks represented as undirected graphs, it detects local clusters of corresponding nodes, which represent linked genes and/or gene products. Different kinds of links give rise to different networks, or graphs. For example, a genome can be seen as a set of genes that are linked one-dimensionally, so it is represented by a linear graph. A set of interacting gene products in a biochemical pathway is another type of graph. The utility of the method is demonstrated in the following two comparisons.
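
The graph representation described above can be illustrated with a minimal Python sketch. The function names, the gene and enzyme identifiers, and the correspondence list are hypothetical and chosen only for illustration; they are not taken from KEGG or from the authors' implementation.

```python
def linear_graph(genes):
    """Undirected linear graph: each gene is linked to its neighbours in the gene order."""
    adj = {g: set() for g in genes}
    for a, b in zip(genes, genes[1:]):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def graph_from_edges(edges):
    """Undirected graph built from an arbitrary list of (node, node) links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Hypothetical genome: four genes in chromosomal order.
genome = linear_graph(["g1", "g2", "g3", "g4"])

# Hypothetical pathway: enzymes linked when they act on successive steps.
pathway = graph_from_edges([("E1", "E2"), ("E2", "E3"), ("E2", "E4")])

# Correspondences between the two graphs (here: gene -> enzyme it encodes).
correspondence = [("g1", "E1"), ("g2", "E2"), ("g4", "E4")]
```

Both networks end up in the same adjacency form, so a comparison procedure only needs the two graphs and the list of corresponding node pairs.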

When the method is used to compare a genome with a set of known biochemical pathways, it extracts gene clusters whose products act at nearby positions in the biochemical pathways. An analysis of metabolic pathways showed that most of the gene clusters thus detected in E. coli corresponded to enzyme operons. By comparing each known genome against the metabolic pathways, we observed many common gene clusters that were conserved among multiple organisms, as well as many organism-specific gene clusters. This type of analysis would be useful for reconstructing and characterizing functional units of biochemical pathways.
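
One simple way to realise such a genome-versus-pathway comparison is sketched below in Python. It is an assumption-laden illustration, not necessarily the procedure used by the authors: two gene-enzyme correspondences are joined when the genes are within `max_dist` links on the genome and the enzymes are within `max_dist` links on the pathway, and clusters are the resulting single-linkage groups. The names `correlated_clusters`, `within`, `max_dist` and all identifiers are hypothetical.

```python
from collections import deque
from itertools import combinations


def undirected(edges):
    """Adjacency sets for an undirected graph given as (node, node) links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def within(adj, src, limit):
    """Nodes reachable from src in at most `limit` links (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)


def correlated_clusters(adj1, adj2, pairs, max_dist=1):
    """Group corresponding node pairs that are close to each other in both graphs."""
    near1 = {u: within(adj1, u, max_dist) for u, _ in pairs}
    near2 = {v: within(adj2, v, max_dist) for _, v in pairs}
    parent = list(range(len(pairs)))  # union-find forest over the pairs

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(pairs)), 2):
        (u1, v1), (u2, v2) = pairs[i], pairs[j]
        # Join two pairs only if they involve distinct, nearby nodes in both graphs.
        if u1 != u2 and v1 != v2 and u2 in near1[u1] and v2 in near2[v1]:
            parent[find(i)] = find(j)

    groups = {}
    for i, pair in enumerate(pairs):
        groups.setdefault(find(i), []).append(pair)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical genome (linear graph) and pathway (chain of six enzymes).
order = ["a", "b", "c", "d", "e", "f", "g"]
genome = undirected(list(zip(order, order[1:])))
pathway = undirected([("E1", "E2"), ("E2", "E3"), ("E3", "E4"),
                      ("E4", "E5"), ("E5", "E6")])
pairs = [("a", "E1"), ("b", "E2"), ("c", "E3"), ("g", "E6")]

clusters = correlated_clusters(genome, pathway, pairs, max_dist=1)
```

In this toy input, genes a, b and c are neighbours on the genome and encode enzymes on consecutive pathway steps, so they form one cluster, while gene g is left out because it is distant in both graphs. Raising `max_dist` tolerates gaps such as unrelated genes inserted within an operon.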

When the method is used to compare one genome with another, using sequence similarity as the correspondence information, it extracts pairs of orthologous gene clusters. While it is well known that the global arrangement of orthologous gene clusters on the genome can be highly shuffled between two distantly related bacterial lineages, we also observe gene shuffling by translocations, inversions, insertions and deletions within some of the orthologous gene clusters. In practice, the extraction of orthologous gene clusters gives important clues for identifying orthologs that have been missed by simple homology searches. In both examples, most of the genes in each cluster appear to be functionally related to each other. Extracted clusters are merged and represented as ortholog tables, which would be useful for predicting gene functions. We believe that comparative analysis of networks of biological entities at this level of abstraction would be fruitful for the development of practical tools, such as tools for automatic annotation of gene functions.
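
The remark about orthologs missed by simple homology searches can be made concrete with a small Python sketch. It assumes a cluster of ortholog pairs has already been extracted and simply lists, for each genome, the genes that lie inside the span of the cluster but are not members of it; such genes are candidates for closer inspection. The function names and gene identifiers are hypothetical, and this is an illustration rather than the authors' procedure.

```python
def unpaired_in_span(order, members):
    """Genes between the outermost cluster members that are not in the cluster."""
    index = {gene: i for i, gene in enumerate(order)}
    positions = [index[gene] for gene in members]
    lo, hi = min(positions), max(positions)
    member_set = set(members)
    return [gene for gene in order[lo:hi + 1] if gene not in member_set]


def candidate_missed_orthologs(order_a, order_b, cluster):
    """Return the genes of each genome that sit inside the cluster span without a partner in it."""
    genes_a = [a for a, _ in cluster]
    genes_b = [b for _, b in cluster]
    return unpaired_in_span(order_a, genes_a), unpaired_in_span(order_b, genes_b)


# Hypothetical gene orders of two genomes.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b1", "b2", "b3", "b4", "b5", "b6"]

# Hypothetical orthologous cluster; the order is inverted in genome B.
cluster = [("a1", "b4"), ("a2", "b3"), ("a4", "b1")]

left, right = candidate_missed_orthologs(order_a, order_b, cluster)
```

Here a3 and b2 are each flanked by cluster members in their own genome, which makes them a plausible pair to examine with a more sensitive comparison even though a plain similarity search did not link them.
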
Dagstuhl Seminar - Modeling and Simulation of Gene Regulation and Metabolic Pathways
(June 21-26, 1998, Schloss Dagstuhl, Germany)