The functional assignment of open reading frames (ORFs) is usually done by searching the databases for sequence similarities and/or sequence motifs. Because this assignment is made separately for each gene, known metabolic pathways may not be reconstructed simply by collecting the enzymes assigned in this way. The relations among ORFs should therefore be taken into account in the assignment procedure.
We developed an EC number assignment system for the set of ORF amino acid sequences of a given organism. Orthologous gene relationships are identified by comparing this query set against the amino acid sequence sets of other organisms. Currently, the comparison is made against eight organisms in our database: E. coli, H. influenzae, B. subtilis, M. genitalium, M. pneumoniae, Synechocystis sp., M. jannaschii and S. cerevisiae. EC numbers are then assigned to the query sequences using the information on orthologous genes in these organisms. Under the KEGG project, all known metabolic pathways are computerized as graphical diagrams in the PATHWAY database. If the set of ORFs for an organism is complete, its organism-specific pathways should be reconstructible, and they can be visualized by marking the assigned enzymes on the PATHWAY diagrams.
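The ortholog-based assignment step can be illustrated with a small sketch. It assumes that pairwise similarity scores have already been computed, that orthologs are approximated by bidirectional best hits, and that EC numbers are transferred by a majority vote across reference organisms. These are common heuristics chosen for illustration; they are not necessarily the criteria used by the KEGG system. All gene identifiers and scores below are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical similarity scores (e.g. alignment scores); higher means more similar.
# Keys are (gene_in_first_genome, gene_in_second_genome).
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """For each gene of the first genome, return its highest-scoring partner."""
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), s in scores.items():
        if a not in best or s > best[a][1]:
            best[a] = (b, s)
    return {a: b for a, (b, _) in best.items()}


def bidirectional_best_hits(q_vs_r: Scores, r_vs_q: Scores) -> Dict[str, str]:
    """Query gene -> reference gene pairs that are each other's best hit
    (assumed here as a stand-in for ortholog identification)."""
    fwd = best_hits(q_vs_r)
    rev = best_hits(r_vs_q)
    return {q: r for q, r in fwd.items() if rev.get(r) == q}


def assign_ec(orthologs_by_org: Dict[str, Dict[str, str]],
              ec_by_org: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Transfer EC numbers from orthologs, taking the most frequent EC per query gene."""
    votes: Dict[str, Counter] = {}
    for org, pairs in orthologs_by_org.items():
        for q, r in pairs.items():
            ec = ec_by_org[org].get(r)
            if ec:
                votes.setdefault(q, Counter())[ec] += 1
    return {q: c.most_common(1)[0][0] for q, c in votes.items()}


# Hypothetical query genome compared with two hypothetical reference genomes "A" and "B".
q_vs_a = {("q1", "a1"): 950.0, ("q1", "a2"): 600.0, ("q2", "a3"): 410.0}
a_vs_q = {("a1", "q1"): 948.0, ("a2", "q1"): 590.0, ("a3", "q2"): 405.0}
q_vs_b = {("q1", "b1"): 700.0, ("q2", "b2"): 300.0}
b_vs_q = {("b1", "q1"): 690.0, ("b2", "q3"): 500.0, ("b2", "q2"): 295.0}

orthologs = {
    "A": bidirectional_best_hits(q_vs_a, a_vs_q),  # q1<->a1, q2<->a3
    "B": bidirectional_best_hits(q_vs_b, b_vs_q),  # q1<->b1 only; b2's best hit is q3
}
ec_by_org = {"A": {"a1": "2.7.2.4", "a3": "4.1.3.27"}, "B": {"b1": "2.7.2.4"}}
ec = assign_ec(orthologs, ec_by_org)
print(ec)
```

Requiring the hit to be reciprocal filters out paralogs such as `b2`, whose best match lies elsewhere in the query genome. A real system would also need score thresholds and a policy for multifunctional enzymes that carry several EC numbers.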
We often find missing enzymes, without which the reconstructed pathway is incomplete. Here we report the results of reconstructing the amino acid biosynthetic pathways by searching for missing enzymes and alternative reactions. Almost all pathways could be reconstructed in E. coli, H. influenzae and Synechocystis; the Mycoplasmas were the exception. The EC number assignment system, the visualization tool for reconstructed pathways, and the search for alternative reaction pathways are all provided by KEGG at http://www.genome.ad.jp/kegg/.
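The sketch below illustrates the two searches in a simplified form: it lists the enzymes of a reference pathway that were not assigned, and then runs a breadth-first search over reactions catalyzed by assigned enzymes to find an alternative route. It assumes a hypothetical toy network in which each reaction has a single substrate and a single product, with made-up enzyme and compound labels. Real pathways involve multi-substrate reactions, and the sketch does not reproduce KEGG's actual search.

```python
from collections import deque
from typing import Dict, List, NamedTuple, Optional, Set


class Reaction(NamedTuple):
    enzyme: str     # enzyme label (hypothetical; would be an EC number in practice)
    substrate: str  # simplification: one substrate per reaction
    product: str    # simplification: one product per reaction


# Hypothetical reference network: canonical route A -> B -> C -> D (E1, E2, E3)
# plus an alternative route B -> X -> C (E4, E5).
REFERENCE = [
    Reaction("E1", "A", "B"),
    Reaction("E2", "B", "C"),
    Reaction("E3", "C", "D"),
    Reaction("E4", "B", "X"),
    Reaction("E5", "X", "C"),
]
CANONICAL_PATHWAY = ["E1", "E2", "E3"]


def missing_enzymes(pathway: List[str], assigned: Set[str]) -> List[str]:
    """Enzymes of the reference pathway that have no assigned gene."""
    return [e for e in pathway if e not in assigned]


def find_route(reactions: List[Reaction], assigned: Set[str],
               start: str, target: str) -> Optional[List[str]]:
    """Breadth-first search from start to target using only reactions whose
    enzyme is assigned; returns the enzyme sequence, or None if unreachable."""
    graph: Dict[str, List[Reaction]] = {}
    for r in reactions:
        if r.enzyme in assigned:
            graph.setdefault(r.substrate, []).append(r)
    prev: Dict[str, Optional[Reaction]] = {start: None}
    queue = deque([start])
    while queue:
        compound = queue.popleft()
        if compound == target:
            # Walk back through the reactions that first reached each compound.
            path: List[str] = []
            step = prev[compound]
            while step is not None:
                path.append(step.enzyme)
                step = prev[step.substrate]
            return path[::-1]
        for r in graph.get(compound, []):
            if r.product not in prev:
                prev[r.product] = r
                queue.append(r.product)
    return None


assigned = {"E1", "E3", "E4", "E5"}  # hypothetical: no gene was assigned to E2
print(missing_enzymes(CANONICAL_PATHWAY, assigned))
print(find_route(REFERENCE, assigned, "A", "D"))
```

Here `E2` is reported as missing, but the search still connects `A` to `D` through the alternative reactions catalyzed by `E4` and `E5`.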