ROBUST PEARSON CORRELATION COEFFICIENT FOR IMBALANCED SAMPLE SIZE AND HIGH DIMENSIONAL DATA SET

Authors

  • Friday Zinzendoff Okwonu, Institute of Strategic Industrial Decision Modeling, School of Quantitative Science, Universiti Utara Malaysia, 06010 UUM Sintok, Kedah
  • Owoyi Mildred Chiyeaka, Department of Mathematics, Faculty of Science, Dennis Osadebay University, Asaba
  • Nor Aishah Ahad, Institute of Strategic Industrial Decision Modeling, School of Quantitative Science, Universiti Utara Malaysia, 06010 UUM Sintok, Kedah
  • Olimjon Sharipov, Department of Probability Theory and Mathematical Statistics, Institute of Mathematics, National University of Uzbekistan, Tashkent

Abstract

 Datasets from practical applications often differ in sample size and dimensionality. For example, undersampling and oversampling techniques are commonly applied to address problems caused by a minority class with too few samples. However, computing the Pearson correlation for data with imbalanced sample sizes and high dimensionality poses considerable challenges. This study addressed the imbalanced sample size problem and proposed a new method that can serve as a dual enabler for solving correlation problems in high-dimensional data sets. The mean variance cloning technique (MVCT) was applied to solve the imbalanced sample size problem, and the absolute variance variable selection technique (AVVS) was applied as a transpose enabler to facilitate the computation of the Pearson correlation. The study aimed to determine how strong or weak the relationship between two objects is when their data set has imbalanced sample sizes and high dimensionality. The comparative results showed that the MVCT and AVVS Pearson correlations performed robustly on the imbalanced sample size and high-dimensional data set. Overall, the simulation results showed that the two preprocessing techniques (MVCT and AVVS) act as enablers of robust Pearson correlation performance. This study concluded that the enhanced Pearson correlation coefficients (AVVS-PCC, MVCT-AVVS-PCC, MVCT-PCC) indicate robust association and are potentially suitable for a range of tasks aimed at solving complex practical problems.
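The abstract does not give formulas, so the following is only an illustrative sketch under stated assumptions. It computes the sample Pearson coefficient r = Σ(x_i − x̄)(y_i − ȳ) / √(Σ(x_i − x̄)² · Σ(y_i − ȳ)²). This coefficient is defined on paired observations (x_i, y_i), which is why unequal sample sizes block the computation. The function `pad_preserving_mean_variance` is a hypothetical stand-in and is NOT the authors' MVCT. It assumes, purely for illustration, that "mean variance cloning" means enlarging the smaller sample without changing its mean or variance. AVVS is not sketched, because the abstract does not describe how it works.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two paired samples."""
    if len(x) != len(y):
        # Pearson's r is defined on (x_i, y_i) pairs, so unequal sample
        # sizes leave some observations without a partner.
        raise ValueError(f"unequal sample sizes: {len(x)} vs {len(y)}")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pairs")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def pad_preserving_mean_variance(x: Sequence[float], target_len: int) -> List[float]:
    """HYPOTHETICAL padding step, NOT the authors' MVCT.

    Appends pairs (m + s, m - s), where m is the mean and s is the population
    standard deviation of x, so the padded sample keeps exactly the same mean
    and population variance. If an odd number of values is needed, one extra
    copy of m is appended: the mean is unchanged, but the variance shrinks
    slightly.
    """
    n = len(x)
    if n == 0 or target_len < n:
        raise ValueError("x must be non-empty and target_len >= len(x)")
    m = sum(x) / n
    s = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    padded = list(x)
    pairs, extra = divmod(target_len - n, 2)
    for _ in range(pairs):
        padded.extend([m + s, m - s])
    if extra:
        padded.append(m)
    return padded


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.0, 4.0, 5.0, 4.0]
    try:
        pearson_r(x, y)
    except ValueError as err:
        print(err)  # unequal sample sizes: 6 vs 4
    y_padded = pad_preserving_mean_variance(y, len(x))
    print(len(y_padded))  # 6
    print(pearson_r(x, y_padded))
```

The mean-plus-and-minus-one-standard-deviation pairs were chosen only because they leave the first two moments exactly unchanged. Any real cloning scheme would also have to decide where the new values sit relative to the other sample, and that ordering directly affects r.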

Published

2025-03-31

Section

ARTICLES