Improving the Performance of Feature Selection Methods with Low-Sample-Size Data

  • Wanwan Zheng
    Institute of Industrial Internet and Internet of Things, China Academy of Information and Communications Technology
  • Mingzhe Jin
    Graduate School of Culture and Information Science, Doshisha University

Description

Abstract

Feature selection is a critical preprocessing step in machine learning that removes irrelevant and redundant data. Feature selection methods usually require sufficient samples to select a reliable feature subset, especially in the presence of outliers. However, sufficient samples cannot always be ensured in many real-world applications (e.g. neuroscience, bioinformatics and psychology). This study proposes a method to improve the performance of feature selection methods on ultra-low-sample-size data, named feature selection based on data quality and variable training samples (QVT). Given that no single feature selection method performs optimally in all scenarios, QVT is primarily characterized by its versatility: it can be combined with any feature selection method. Furthermore, whereas existing approaches attempt to extract a stable feature subset from low-sample-size data by increasing the sample size or using more complicated algorithms, QVT seeks its improvement from the original data alone. An experiment using 20 benchmark datasets, three feature selection methods and three classifiers verified the feasibility of QVT; the results showed that features selected by QVT achieve higher classification accuracy than those selected by the underlying feature selection method alone, and the differences are significant.
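The abstract does not spell out the QVT algorithm, but its "variable training samples" idea can be illustrated as a wrapper around an arbitrary base selector: rank features repeatedly on random training subsamples of varying size, then aggregate the per-round ranks so no single small sample dominates. The sketch below is an assumption-laden illustration, not the authors' method; `rank_features` is a hypothetical stand-in (absolute Pearson correlation with the target) for whichever feature selection method is being wrapped.

```python
import numpy as np

def rank_features(X, y):
    """Toy base selector (hypothetical stand-in): score each feature by
    the absolute Pearson correlation with the target; higher is better."""
    return np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                     for j in range(X.shape[1])])

def variable_sample_selection(X, y, n_rounds=50, seed=0):
    """Sketch of the variable-training-samples idea: run the base
    selector on many random subsamples of varying size and sum the
    per-round feature ranks, so one unlucky small sample (or outlier)
    cannot dominate the final ordering. Returns features best-first."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rank_sum = np.zeros(p)
    for _ in range(n_rounds):
        m = int(rng.integers(max(3, n // 2), n + 1))  # variable sample size
        idx = rng.choice(n, size=m, replace=False)
        scores = rank_features(X[idx], y[idx])
        # rank 0 = best feature in this round
        rank_sum += np.argsort(np.argsort(-scores))
    return np.argsort(rank_sum)

# Usage: with a target driven by feature 0, that feature should rank first.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=20)
order = variable_sample_selection(X, y)
```

Summing ranks rather than raw scores makes the aggregation robust to rounds where the subsample inflates or deflates the base selector's score scale.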

Journal

  • The Computer Journal 66 (7), 1664–1686, 2022-04-09

    Oxford University Press (OUP)
