Systematic prediction of enzyme genes by the metabolic pathway database

Hidemasa Bono, Susumu Goto, Hiroyuki Ogata, Minoru Kanehisa

Institute for Chemical Research, Kyoto University,
Gokanosho, Uji, Kyoto 611, Japan

Metabolic pathways are among the best-studied pathways in living organisms. Under the KEGG (Kyoto Encyclopedia of Genes and Genomes) project, we have constructed a database of metabolic pathways and catalogs of enzyme genes for several organisms. KEGG draws organism-specific pathways, each of which can be viewed as a series of boxes in which the enzymes known to be present in a given organism are colored. This paper presents a systematic method for predicting enzyme genes in an organism whose entire genome sequence is known, using the knowledge organized in KEGG.
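As an illustration of the coloring idea only, the sketch below models a reference pathway as a list of enzyme-catalyzed steps and marks a step as present when the organism's gene catalog contains a gene assigned that EC number. The `Reaction` class, the function `color_pathway`, and the gene identifiers are hypothetical; the pathway fragment (early glycolysis, simplified to one product per step) is toy data and does not reflect KEGG's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    ec: str         # EC number of the enzyme catalyzing this step
    substrate: str
    product: str


# Hypothetical reference pathway fragment (early glycolysis, one product per step).
REFERENCE_PATHWAY = [
    Reaction("2.7.1.1", "glucose", "glucose-6P"),
    Reaction("5.3.1.9", "glucose-6P", "fructose-6P"),
    Reaction("2.7.1.11", "fructose-6P", "fructose-1,6P2"),
    Reaction("4.1.2.13", "fructose-1,6P2", "glyceraldehyde-3P"),
]


def color_pathway(pathway, gene_catalog):
    """Pair each step with the organism's genes for its enzyme.

    gene_catalog maps a gene identifier to the set of EC numbers assigned to it.
    A box is "colored" when its gene list is non-empty.
    """
    genes_by_ec = {}
    for gene, ecs in gene_catalog.items():
        for ec in ecs:
            genes_by_ec.setdefault(ec, []).append(gene)
    return [(step, sorted(genes_by_ec.get(step.ec, []))) for step in pathway]


if __name__ == "__main__":
    # Hypothetical gene catalog: no gene has been assigned to EC 2.7.1.11,
    # so that box stays uncolored.
    catalog = {"geneA": {"2.7.1.1"}, "geneB": {"5.3.1.9"}, "geneC": {"4.1.2.13"}}
    for step, genes in color_pathway(REFERENCE_PATHWAY, catalog):
        box = "[x]" if genes else "[ ]"
        print(box, step.ec, step.substrate, "->", step.product, genes)
```

Keying the lookup on EC numbers rather than gene names lets one reference pathway be colored for any organism, which is the sense in which the pathway maps are organism-specific.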

Organism-specific pathway maps can contain gaps, or missing steps, because the genes encoding particular enzymes have not been assigned or predicted by analysis of the entire genome. If the end product of such a pathway is indispensable to the organism, the pathway must in fact be complete, which leaves two possibilities: either the assignment of enzyme genes must be re-examined, or the existence of alternative chemical reactions must be investigated. Here we focus on the first possibility.
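A gap of this kind can be detected mechanically. The following minimal sketch, assuming a pathway is given as (EC number, substrate, product) triples, treats only steps with an assigned gene as usable and runs a breadth-first search from the starting compound. If the end product cannot be reached, it reports the unassigned steps whose substrate is still reachable, that is, where the pathway breaks. The function `find_gaps` and its input layout are illustrative, not part of KEGG.

```python
from collections import deque


def find_gaps(reactions, assigned_ecs, source, target):
    """Check whether `target` can be produced from `source` using assigned enzymes.

    reactions: iterable of (ec, substrate, product) triples for the reference pathway.
    assigned_ecs: set of EC numbers that have at least one gene in the organism.
    Returns (complete, blocking): complete is True if target is reachable;
    blocking lists unassigned steps whose substrate is reachable, i.e. where
    the organism-specific pathway breaks.
    """
    reactions = list(reactions)
    edges = {}
    for ec, substrate, product in reactions:
        if ec in assigned_ecs:
            edges.setdefault(substrate, []).append(product)

    # Breadth-first search over compounds, following only assigned steps.
    reached = {source}
    queue = deque([source])
    while queue:
        compound = queue.popleft()
        for product in edges.get(compound, []):
            if product not in reached:
                reached.add(product)
                queue.append(product)

    complete = target in reached
    blocking = [] if complete else sorted(
        {ec for ec, substrate, _ in reactions
         if ec not in assigned_ecs and substrate in reached}
    )
    return complete, blocking
```

Each EC number in `blocking` is a step whose gene assignment would be re-examined under the first possibility described above.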

The functional assignment of predicted genes (open reading frames, ORFs) is usually made by searching the databases for sequence similarities and/or sequence motifs. Because each gene is assigned separately, many ORFs are left without any functional assignment, and for enzyme genes the assigned role may be wrong, since similar enzymes can catalyze different chemical reactions or have different ligand specificities. We are developing a system that assigns enzyme genes by comparing all the ORFs of a genome against all known enzymes from different organisms stored in KEGG. The system examines the completeness of the resulting organism-specific pathways as well as that of the derived enzyme catalog, a feature not incorporated in other gene prediction systems. It helps us reconsider previously determined functional assignments in sequenced genomes and correct the KEGG gene catalog accordingly. We will present the results of applying this system to the prediction and analysis of enzyme genes in Haemophilus influenzae and Synechocystis sp.
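The combination of a whole-genome comparison with a completeness check can be sketched roughly as follows. This is a toy illustration, not the authors' system: it uses a k-mer Jaccard index as a stand-in for a real alignment-based similarity search such as FASTA or BLAST, and the names `assign_orfs` and `candidates_for_gap`, the threshold of 0.3, and all sequences are hypothetical. ORFs first receive the EC number of their best hit above a global threshold; a pathway gap then triggers a second, targeted look at the ORFs that remain unassigned.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers (length-k substrings) of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard index of k-mer sets: a crude stand-in for an alignment score."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def assign_orfs(orfs, known_enzymes, threshold=0.3):
    """Give each ORF the EC number of its best-scoring known enzyme.

    orfs: dict of ORF id -> protein sequence for the target genome.
    known_enzymes: list of (organism, ec, sequence) from all reference organisms.
    Returns a dict of ORF id -> (ec, score) for hits at or above threshold.
    """
    assignments = {}
    for orf_id, seq in orfs.items():
        hits = [(similarity(seq, ref), ec) for _, ec, ref in known_enzymes]
        if not hits:
            continue
        score, ec = max(hits)
        if score >= threshold:
            assignments[orf_id] = (ec, score)
    return assignments


def candidates_for_gap(gap_ec, orfs, known_enzymes, assigned, top=3):
    """Rank still-unassigned ORFs by similarity to known enzymes of gap_ec.

    Hits below the global threshold are kept here on purpose: a weak match
    is worth re-examining when it would complete a pathway.
    """
    refs = [ref for _, ec, ref in known_enzymes if ec == gap_ec]
    if not refs:
        return []
    scored = []
    for orf_id, seq in orfs.items():
        if orf_id in assigned:
            continue
        score = max(similarity(seq, ref) for ref in refs)
        if score > 0:
            scored.append((score, orf_id))
    scored.sort(reverse=True)
    return [(orf_id, round(score, 3)) for score, orf_id in scored[:top]]
```

With toy data, an ORF whose best match to a reference enzyme for EC 2.7.1.11 scores below the global threshold is left unassigned by `assign_orfs`, yet is returned by `candidates_for_gap` once EC 2.7.1.11 is known to be the missing step. In practice the scores would come from sequence alignments, and any such candidate would still need to be checked before the gene catalog is changed.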


International Workshop on Recent Advance in Genome Biology of Micro-organisms
(October 28-29, 1996, at Makuhari, Chiba, Japan)