
dc.contributor.author: Jiang, Xiaoyu (en_US)
dc.contributor.author: Nariai, Naoki (en_US)
dc.contributor.author: Steffen, Martin (en_US)
dc.contributor.author: Kasif, Simon (en_US)
dc.contributor.author: Kolaczyk, Eric D (en_US)
dc.date.accessioned: 2012-01-11T00:37:40Z
dc.date.available: 2012-01-11T00:37:40Z
dc.date.copyright: 2008 (en_US)
dc.date.issued: 2008-8-22 (en_US)
dc.identifier.citation: Jiang, Xiaoyu, Naoki Nariai, Martin Steffen, Simon Kasif, Eric D Kolaczyk. "Integration of relational and hierarchical network information for protein function prediction" BMC Bioinformatics 9:350. (2008) (en_US)
dc.identifier.issn: 1471-2105 (en_US)
dc.identifier.uri: http://hdl.handle.net/2144/3004
dc.description.abstract: BACKGROUND. In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database, such as the Gene Ontology (GO) database, a popular, rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top-to-bottom annotation rules which protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is to apply transitive closure to the predictions. RESULTS. We propose a probabilistic framework to integrate information in relational data, in the form of a protein-protein interaction network, and a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with a computationally efficient implementation, that produces GO-term predictions that naturally obey a hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing. CONCLUSION. A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows our method offers substantial improvements over both standard 'guilt-by-association' (i.e., Nearest-Neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. Further analysis of the results indicates that these improvements are associated with increased predictive capabilities (i.e., increased positive predictive value), and that this increase is uniform across GO-term depths. Additional in silico validation on a collection of new annotations recently added to GO confirms the advantages suggested by the cross-validation study. Taken as a whole, our results show that a hierarchical approach to network-based protein function prediction that exploits the ontological structure of protein annotation databases in a principled manner can offer substantial advantages over the successive application of 'flat' network-based methods. (en_US)
dc.description.sponsorship: National Human Genome Research Institute (R01 HG003367-01A1); National Institutes of Health (GM078987); National Science Foundation (ITR-048715); Office of Naval Research (N00014-06-1-0096) (en_US)
dc.language.iso: en (en_US)
dc.publisher: BioMed Central (en_US)
dc.rights: Copyright 2008 Jiang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (en_US)
dc.rights.uri: http://creativecommons.org/licenses/by/2.0 (en_US)
dc.title: Integration of Relational and Hierarchical Network Information for Protein Function Prediction (en_US)
dc.type: article (en_US)
dc.identifier.doi: 10.1186/1471-2105-9-350 (en_US)
dc.identifier.pubmedid: 18721473 (en_US)
dc.identifier.pmcid: 2535605 (en_US)

