ROBUST PEARSON CORRELATION COEFFICIENT FOR IMBALANCED SAMPLE SIZE AND HIGH DIMENSIONAL DATA SET
Abstract
Datasets in practical applications often vary in sample size and dimension; for example, undersampling or oversampling techniques are often applied to address minority sample size problems. However, formulating the Pearson correlation for data with imbalanced sample sizes and high dimensionality poses practical challenges. This study addressed the imbalanced sample size problem and proposed a new method that serves as a dual enabler for solving correlation problems in high dimensional data sets. The mean variance cloning technique (MVCT) is applied to solve the imbalanced sample size problem, and the absolute variance variable selection technique (AVVS) is applied as a transpose enabler to enhance the computation of the Pearson correlation. The study aimed to determine how strong or weak the relationship between two objects is when the data set is high dimensional and the sample sizes are imbalanced. The comparative results showed that the Pearson correlation combined with MVCT and AVVS performed robustly on imbalanced sample size and high dimensional data sets. The simulation results therefore indicate that the two preprocessing techniques (MVCT and AVVS) enable robust performance of the Pearson correlation. The study concluded that the enhanced Pearson correlation coefficients (AVVS-PCC, MVCT-AVVS-PCC, MVCT-PCC) indicated robust association and are potentially suitable for practical tasks aimed at solving complex problems.
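For reference, the textbook Pearson coefficient that the proposed preprocessing is meant to support can be sketched as below. This is only a minimal illustration of the standard formula; it does not implement MVCT or AVVS, whose details are not given in this abstract, and the function name `pearson_r` is an assumption introduced here. The sketch shows why imbalanced sample sizes are a problem: the formula is defined only for paired samples of equal length.

```python
import math


def pearson_r(x, y):
    """Standard Pearson correlation for two equal-length samples.

    Hypothetical helper for illustration; not the authors' MVCT or AVVS method.
    """
    # The formula pairs x[i] with y[i], so unequal sample sizes are rejected.
    if len(x) != len(y):
        raise ValueError("Pearson correlation needs paired samples of equal length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two paired observations are required")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    # A constant sample has zero variance, so the coefficient is undefined.
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")

    return sxy / math.sqrt(sxx * syy)
```

Calling `pearson_r` on two samples of different lengths raises a `ValueError`, which is the situation a sample-size balancing step is meant to resolve before the coefficient is computed.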
License
Copyright (c) 2025 Science World Journal

This work is licensed under a Creative Commons Attribution 4.0 International License.