The JDHMT model was published in the Journal of Biomedical Informatics (JBI): https://www.sciencedirect.com/science/article/pii/S1532046421003026
1. We propose a typical knowledge form for multi-relational heterogeneous graphs, i.e., (gene/disease, uni-relation, other heterogeneous entity) and (gene, multi-relation, disease), in which the relatively sparse multi-relational knowledge between genes and diseases is preserved.
2. We curate human genes and diseases from seven mainstream datasets and construct a massive heterogeneous gene-disease network, which consists of 163,024 nodes and 25,265,607 edges and covers 27,165 genes, 2,665 diseases, 15,067 chemicals, 108,023 mutations, 2,363 pathways, and 7,732 phenotypes.
3. We introduce a joint tensor/matrix decomposition that captures semantics from both uni-relations and multi-relations in a heterogeneous graph. Within the relation-learning paradigm for embedding generation, it offers a reliable way to capture both uni- and multi-relational knowledge.
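
The idea of a joint tensor/matrix decomposition in item 3 can be illustrated with a small NumPy sketch. This is not the published JDHMT objective or its optimizer; it is a generic coupled factorization written under stated assumptions: the gene-disease multi-relations are stored as a tensor `T` (one slice per relation type), the uni-relations to other entities are stored as matrices `M_gene` and `M_dis`, the tensor is factorized in a RESCAL-like bilinear form, and everything is fitted by plain gradient descent on squared error. The function name `joint_factorize`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np

def joint_factorize(T, M_gene, M_dis, rank=3, lr=0.01, epochs=1000, seed=0):
    """Illustrative coupled factorization (assumed form, not the paper's exact model).

    T      : (n_rel, n_gene, n_dis) multi-relational gene-disease tensor
    M_gene : (n_gene, n_other) uni-relational gene-to-other-entity matrix
    M_dis  : (n_dis, n_other) uni-relational disease-to-other-entity matrix
    The gene factors G and disease factors D are shared by all three terms.
    """
    rng = np.random.default_rng(seed)
    n_rel, n_gene, n_dis = T.shape
    n_other = M_gene.shape[1]
    G = 0.1 * rng.standard_normal((n_gene, rank))      # gene embeddings
    D = 0.1 * rng.standard_normal((n_dis, rank))       # disease embeddings
    R = 0.1 * rng.standard_normal((n_rel, rank, rank)) # one core matrix per relation
    O = 0.1 * rng.standard_normal((n_other, rank))     # other-entity embeddings
    losses = []
    for _ in range(epochs):
        # Residuals of the three reconstructions.
        E_T = np.einsum('ia,kab,jb->kij', G, R, D) - T  # T[k] ~ G R[k] D^T
        E_G = G @ O.T - M_gene                          # M_gene ~ G O^T
        E_D = D @ O.T - M_dis                           # M_dis ~ D O^T
        losses.append(0.5 * ((E_T ** 2).sum() + (E_G ** 2).sum() + (E_D ** 2).sum()))
        # Gradients of the summed squared error with respect to each factor.
        grad_G = np.einsum('kij,kab,jb->ia', E_T, R, D) + E_G @ O
        grad_D = np.einsum('kij,ia,kab->jb', E_T, G, R) + E_D @ O
        grad_R = np.einsum('kij,ia,jb->kab', E_T, G, D)
        grad_O = E_G.T @ G + E_D.T @ D
        G -= lr * grad_G
        D -= lr * grad_D
        R -= lr * grad_R
        O -= lr * grad_O
    return G, D, R, O, losses

# Toy data (randomly generated, for illustration only).
rng = np.random.default_rng(42)
T = (rng.random((2, 6, 5)) < 0.2).astype(float)    # sparse multi-relations
M_gene = (rng.random((6, 4)) < 0.6).astype(float)  # abundant uni-relations
M_dis = (rng.random((5, 4)) < 0.6).astype(float)

G, D, R, O, losses = joint_factorize(T, M_gene, M_dis)
print("initial loss:", losses[0], "final loss:", losses[-1])
```

Because `G` and `D` appear in both the tensor term and the matrix terms, the abundant uni-relational data shapes the same embeddings that must also explain the sparse multi-relational slices, which is the intuition behind fitting the two kinds of knowledge jointly.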
Node embedding of biological entity networks has been widely investigated for downstream applications. To embed the full semantics of genes and diseases, we consider a multi-relational heterogeneous graph in which uni-relations between genes/diseases and other heterogeneous entities are abundant, while multi-relations between genes and diseases are relatively sparse. This graph format calls for a dedicated data integration algorithm that fully captures the graph information and yields high-quality embeddings.
First, a typical multi-relational triple dataset was introduced, which carries significant associations between genes and diseases. Second, we curated all human genes and diseases from seven mainstream datasets and constructed a large-scale gene-disease network comprising 163,024 nodes and 25,265,607 edges and covering 27,165 genes, 2,665 diseases, 15,067 chemicals, 108,023 mutations, 2,363 pathways, and 7,732 phenotypes. Third, we proposed a Joint Decomposition of Heterogeneous Matrix and Tensor (JDHMT) model, which integrates all heterogeneous data resources and produces an embedding for each gene and disease. Fourth, a visualized intrinsic evaluation was performed, which examined the embeddings in terms of interpretable data clustering. Furthermore, an extrinsic evaluation was performed in the form of link prediction. Both the intrinsic and extrinsic evaluation results showed that the JDHMT model outperformed eleven other state-of-the-art (SOTA) methods from the relation-learning, proximity-preserving, and message-passing paradigms. Finally, the constructed gene-disease network, the embedding results, and the code were made available.
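
As a rough illustration of what an extrinsic link-prediction evaluation of gene/disease embeddings can look like, the sketch below scores gene-disease pairs and computes an AUC. The exact protocol, scoring function, and metrics used in the paper are not described here, so everything in this example is an assumption: pairs are scored by a plain dot product, the metric is the pairwise AUC, and the helper names `pair_scores` and `auc_score` as well as the toy embeddings are hypothetical.

```python
import numpy as np

def pair_scores(gene_emb, dis_emb, pairs):
    """Dot-product score for each (gene_index, disease_index) pair (assumed scorer)."""
    g_idx, d_idx = np.array(pairs).T
    return np.einsum('ij,ij->i', gene_emb[g_idx], dis_emb[d_idx])

def auc_score(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())

# Toy embeddings (hand-written, for illustration only).
gene_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dis_emb = np.array([[1.0, 0.0], [0.0, 1.0]])

held_out_links = [(0, 0), (1, 1)]  # gene-disease pairs treated as true links
sampled_non_links = [(0, 1), (1, 0)]  # pairs treated as negatives

auc = auc_score(pair_scores(gene_emb, dis_emb, held_out_links),
                pair_scores(gene_emb, dis_emb, sampled_non_links))
print("AUC:", auc)  # prints "AUC: 1.0" for this toy data
```

In a real evaluation the held-out links and sampled non-links would come from the gene-disease network itself, and the same scoring routine would be applied to the embeddings of every compared method.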
Data and Code Availability
The constructed gene-disease network is available at: https://hzaubionlp.com/heterogeneous-biological-network/. The code is available at: https://github.com/bionlp-hzau/JDHMT.
Kaiyin Zhou, Sheng Zhang, Yuxing Wang, Kevin Bretonnel Cohen, Jin-Dong Kim, Qi Luo, Xinzhi Yao, Xingyu Zhou, Jingbo Xia*. High-quality gene/disease embedding in a multi-relational heterogeneous graph after a joint matrix/tensor decomposition. Journal of Biomedical Informatics. 2022. 126:103973