The Big Data Analytics Center (BIDAC)

Identifying Protein Complexes in Protein-Protein Interaction Data Using Graph Convolutional Network


Prof. Dr. NAZAR ZAKI, College of Information Technology, UAE University
Asst. Prof. Dr. Elfadil A. Mohamed, College of Engineering and Information Technology, Ajman University
HARSH SINGH, Big Data Analytics Center (BIDAC), UAE University


Protein complexes are groups of two or more polypeptide chains that bind to form noncovalent networks of protein interactions. Over the past decade, researchers have developed a number of computational methods for identifying protein complexes and their members from these interaction networks. Although most existing methods identify functional protein complexes from protein-protein interaction (PPI) networks reasonably well, the applicability of advanced graph network methods has not yet been adequately investigated.

This paper proposes several graph convolutional network (GCN) methods to improve the detection of protein complexes. We first formulate protein complex detection as a node classification problem. We then develop a Neural Overlapping Community Detection (NOCD) model that clusters the nodes (proteins) using a complex affiliation matrix. We also use a representation learning approach that combines a multi-class GCN feature extractor (to obtain the nodes’ features) with a mean shift clustering algorithm (to perform the clustering).
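
The idea of clustering proteins through an affiliation matrix can be illustrated with a small sketch. It is not the authors' implementation: the function names, the random (untrained) weights, the toy network, and the 0.5 threshold are all assumptions made for illustration. A two-layer GCN with a final ReLU maps each protein to a row of non-negative affiliation strengths, and a protein is assigned to every complex whose strength exceeds the threshold, so complexes may overlap.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_affiliations(adj, features, w1, w2):
    """Two-layer GCN; the final ReLU keeps affiliation strengths non-negative."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w1, 0.0)
    return np.maximum(a_norm @ hidden @ w2, 0.0)

def assign_complexes(affiliation, threshold=0.5):
    """A protein joins every complex whose affiliation exceeds the threshold."""
    return [set(np.flatnonzero(affiliation[:, k] > threshold).tolist())
            for k in range(affiliation.shape[1])]

# Toy PPI network with 5 proteins (hypothetical data).
adj = np.zeros((5, 5))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]:
    adj[u, v] = adj[v, u] = 1.0

# Untrained forward pass, shown only to illustrate the shapes involved.
rng = np.random.default_rng(0)
features = np.eye(5)                             # one-hot node features
f_untrained = gcn_affiliations(adj, features,
                               rng.normal(size=(5, 8)), rng.normal(size=(8, 2)))

# Hand-written affiliation matrix standing in for a trained model's output.
f_example = np.array([[0.9, 0.0],
                      [0.8, 0.1],
                      [0.7, 0.6],
                      [0.0, 0.9],
                      [0.1, 0.8]])
complexes = assign_complexes(f_example)
print(complexes)
```

In the hand-written example, protein 2 exceeds the threshold in both columns and therefore belongs to both complexes, which is the overlapping behaviour a hard partition cannot express.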

To improve the efficiency of the multi-class GCN, we convert dense-dense matrix operations into dense-sparse or sparse-sparse matrix operations, which reduces both space and time complexity. This significantly improves the scalability of the existing GCN. Finally, we apply clustering aggregation to find the best protein complexes: a grid search is performed over the complexes detected by three well-known protein complex detection methods, namely ClusterONE, CMC, and PEWCC, with the help of the Meta-Clustering Algorithm (MCLA) and the Hybrid Bipartite Graph Formulation (HBGF). We test the proposed GCN-based methods on various publicly available datasets and find that they perform significantly better than previous state-of-the-art methods.
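
To make the dense-to-sparse conversion concrete, here is a minimal sketch of a single GCN propagation step computed from an edge list rather than a full adjacency matrix. It is an illustration under stated assumptions, not the project's code: the function names and the toy network are hypothetical, and the edge list is assumed to contain each undirected interaction once, with no self-loops. In practice a sparse matrix library would usually handle this step.

```python
import numpy as np

def sparse_propagate(edges, num_nodes, features):
    """Compute D^{-1/2} (A + I) D^{-1/2} X without materializing A."""
    src, dst = np.asarray(edges).T
    loops = np.arange(num_nodes)
    # Both directions of every undirected edge, plus one self-loop per node.
    rows = np.concatenate([src, dst, loops])
    cols = np.concatenate([dst, src, loops])
    deg = np.bincount(rows, minlength=num_nodes).astype(float)
    weights = 1.0 / np.sqrt(deg[rows] * deg[cols])
    out = np.zeros((num_nodes, features.shape[1]))
    # Scatter-add weighted neighbour rows; work scales with the edge count.
    np.add.at(out, rows, weights[:, None] * features[cols])
    return out

def dense_propagate(edges, num_nodes, features):
    """Reference version that builds the full num_nodes x num_nodes matrix."""
    a_hat = np.eye(num_nodes)
    for u, v in edges:
        a_hat[u, v] = a_hat[v, u] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return (a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]) @ features

# Toy network (hypothetical data): 5 proteins, 5 interactions, 3 features each.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
features = np.random.default_rng(0).normal(size=(5, 3))
sparse_out = sparse_propagate(edges, 5, features)
dense_out = dense_propagate(edges, 5, features)
print(np.allclose(sparse_out, dense_out))
```

The dense version stores and multiplies all n × n entries, while the edge-list version touches only the existing interactions and the self-loops. PPI networks are typically sparse, which is where the space and time savings come from.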

Project GitHub Page


© BIDAC 2022 - All rights reserved.