CoIn: Correlation Induced Clustering for Cognition of High Dimensional Bioinformatics Data

Zeng Zeng, Ziyuan Zhao, Kaixin Xu, Yangfan Li*, Cen Chen*, Xiaofeng Zou, Yulan Wang, Wei Wei, Pierce K.H. Chow, Xiaoli Li

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Analysis of high-dimensional biomedical data, such as microarray gene expression data and mass spectrometry images, is crucial for providing better medical services, including cancer subtyping and protein homology detection. Clustering is a fundamental cognitive task that aims to group unlabeled data into multiple clusters based on their intrinsic similarities. However, most clustering methods, including the widely used K-means algorithm, treat all features of high-dimensional data as equally relevant, which degrades performance when the data contain many redundant and correlated variables. In this paper, we address the problem of clustering high-dimensional bioinformatics data and propose a new correlation induced clustering method, CoIn, which captures complex correlations among high-dimensional data and guarantees correlation consistency within each cluster. We evaluate the proposed method on a high-dimensional mass spectrometry dataset of liver cancer tumor to explore metabolic differences in tissues and discover intra-tumor heterogeneity (ITH). Compared with the baselines, our method produces more explainable and understandable results for clinical analysis, which demonstrates that the proposed clustering paradigm has potential for knowledge discovery in high-dimensional bioinformatics data.
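
The equal-weighting problem described in the abstract can be illustrated with a minimal sketch. This is not the CoIn method, whose details the abstract does not give; it only shows, under assumed toy values, how redundant correlated variables come to dominate the squared Euclidean distance that K-means relies on. The sample vectors, the function names `squared_distance` and `redundant_share`, and the number of duplicated features are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance, the dissimilarity K-means minimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def redundant_share(a, b, redundant_idx):
    """Fraction of the squared distance contributed by the given feature indices."""
    total = squared_distance(a, b)
    part = sum((a[i] - b[i]) ** 2 for i in redundant_idx)
    return part / total


# Hypothetical samples: feature 0 is informative, feature 1 is a nuisance variable.
a = [0.0, 0.0]
b = [1.0, 1.0]
share_before = redundant_share(a, b, [1])

# Duplicate the nuisance feature 9 times, mimicking perfectly correlated variables.
copies = 9
a_dup = [a[0]] + [a[1]] * copies
b_dup = [b[0]] + [b[1]] * copies
share_after = redundant_share(a_dup, b_dup, range(1, copies + 1))

print(share_before, share_after)
```

With every feature weighted equally, the duplicated variable's share of the distance grows from one half to nine tenths, even though it carries no additional information. This is the kind of distortion that motivates handling correlated features explicitly.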

Original language: English
Pages (from-to): 598-607
Number of pages: 10
Journal: IEEE Journal of Biomedical and Health Informatics
Volume: 27
Issue number: 2
Publication status: Published - Feb 1 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

ASJC Scopus Subject Areas

  • Computer Science Applications
  • Health Informatics
  • Electrical and Electronic Engineering
  • Health Information Management

Keywords

  • Clustering
  • correlation analysis
  • correlation induced clustering
