GENE FUNCTION IDENTIFICATION SYSTEM BASED ON THE METABOLIC PATHWAY DATABASE

Hidemasa Bono, Hiroyuki Ogata, Susumu Goto, Wataru Fujibuchi and Minoru Kanehisa
Institute for Chemical Research, Kyoto University,
Gokasho, Uji, Kyoto 611, Japan

Analysis of gene function is the next target of genome projects after the completion of sequencing. One in silico approach to this analysis is being pursued by the KEGG (Kyoto Encyclopedia of Genes and Genomes) project. As pilot work in the KEGG project, we focus on the metabolic pathway. It is the best-studied pathway in biology, and its enzymatic functions are described by the EC numbering scheme, which integrates the different gene names used in different organisms.

The functional assignment of open reading frames (ORFs) is usually done by searching the databases for sequence similarities and/or sequence motifs. This assignment is done separately for each gene. As a result, known metabolic pathways may not be reconstructed simply by collecting the enzymes assigned in this way. The relations among ORFs should therefore be considered in the assignment procedure.

We developed an EC number assignment system for the set of amino acid sequences of ORFs in a given organism. By comparing this query set against the amino acid sequence sets of other organisms, orthologous gene relations are identified. Currently, the comparison is made against eight organisms in our database: E. coli, H. influenzae, B. subtilis, M. genitalium, M. pneumoniae, Synechocystis sp., M. jannaschii and S. cerevisiae. EC numbers are then assigned to the query sequences using the information on orthologous genes in these organisms. Under the KEGG project, all known metabolic pathways are computerized as graphical diagrams in the PATHWAY database. If the set of ORFs is complete for an organism, its organism-specific pathways can be reconstructed and visualized by marking the assigned enzymes on the PATHWAY diagrams.
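The abstract does not state how orthologs are identified or how EC numbers are transferred from them. Purely as an illustration, the following Python sketch assumes that orthologs are reciprocal best hits over precomputed similarity scores, and that each query ORF receives the EC number most often carried by its orthologs across the reference organisms. The function names, the score dictionaries and the simplification of one EC number per gene are hypothetical; this is not the KEGG implementation.

```python
from collections import Counter, defaultdict

def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    `scores` maps (query_gene, target_gene) -> similarity score, assumed to be
    computed beforehand (e.g. by a sequence similarity search).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}

def reciprocal_best_hits(forward, reverse):
    """Keep pairs (q, t) where t is q's best hit AND q is t's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {q: t for q, t in fwd.items() if rev.get(t) == q}

def assign_ec(query_genes, organisms):
    """Assign EC numbers to query genes by majority vote over their orthologs.

    `organisms` is a list of (forward_scores, reverse_scores, ec_table), where
    ec_table maps that organism's gene IDs to a known EC number (hypothetical:
    one EC number per gene).
    """
    votes = defaultdict(Counter)
    for forward, reverse, ec_table in organisms:
        for query, ortholog in reciprocal_best_hits(forward, reverse).items():
            if ortholog in ec_table:
                votes[query][ec_table[ortholog]] += 1
    # Genes with no annotated ortholog are left unassigned.
    return {q: votes[q].most_common(1)[0][0] for q in query_genes if votes[q]}
```

Voting across several reference organisms, rather than trusting a single best match, is one simple way to reduce the effect of an isolated misannotation in any one genome. A query ORF whose best hit prefers another query gene gets no ortholog from that organism and so casts no vote there.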

We often find missing enzymes, without which the reconstructed pathway is incomplete. Here we report the results of reconstructing the amino acid biosynthetic pathways by searching for missing enzymes and alternative reactions. With the exception of the Mycoplasmas, almost all pathways could be reconstructed in E. coli, H. influenzae and Synechocystis. The EC number assignment system, the visualization tool for reconstructed pathways, and the search for alternative reaction pathways are all provided by KEGG at http://www.genome.ad.jp/kegg/.
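Finding missing enzymes and alternative reactions can be pictured as simple set and graph operations. The Python sketch below is a hypothetical simplification, not the KEGG tool: each reaction is reduced to a single (EC number, substrate, product) triple, a missing enzyme is any EC number in a reference pathway with no assigned ORF, and an alternative route is found by breadth-first search over reactions whose enzymes are present in the organism. The labels E1 to E5 and the compounds A, B, C, D and X are placeholders.

```python
from collections import deque

def missing_enzymes(pathway, assigned_ecs):
    """Return EC numbers of a reference pathway that have no assigned ORF."""
    return [ec for ec, _substrate, _product in pathway if ec not in assigned_ecs]

def find_route(reactions, assigned_ecs, source, target):
    """Breadth-first search from source to target using only reactions whose
    enzyme is present; returns the EC numbers of a shortest route, or None."""
    graph = {}
    for ec, substrate, product in reactions:
        if ec in assigned_ecs:
            graph.setdefault(substrate, []).append((ec, product))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        compound, route = queue.popleft()
        if compound == target:
            return route
        for ec, product in graph.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, route + [ec]))
    return None

# Hypothetical reference pathway A -> B -> C -> D and an alternative B -> X -> C.
reference = [("E1", "A", "B"), ("E2", "B", "C"), ("E3", "C", "D")]
alternatives = [("E4", "B", "X"), ("E5", "X", "C")]
assigned = {"E1", "E3", "E4", "E5"}  # no ORF was assigned to E2

print(missing_enzymes(reference, assigned))                      # ['E2']
print(find_route(reference + alternatives, assigned, "A", "D"))  # ['E1', 'E4', 'E5', 'E3']
```

Here the gap left by the missing enzyme E2 is bridged by the alternative reactions E4 and E5. If no route exists using the enzymes present, the search returns None and the gap remains.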


Meeting on "Genome Mapping & Sequencing"
(May 14-18, 1997, Cold Spring Harbor Laboratory, New York)