Investigation on particle swarm optimisation for feature selection on high-dimensional data: local search and selection bias Academic Article uri icon

abstract

  • Feature selection is an essential step in classification tasks with a large number of features, such as in gene expression data. Recent research has shown that particle swarm optimisation (PSO) is a promising approach to feature selection. However, it also has potential limitation to get stuck into local optima, especially for gene selection problems with a huge search space. Therefore, we developed a PSO algorithm (PSO-LSRG) with a fast “local search” combined with a gbest resetting mechanism as a way to improve the performance of PSO for feature selection. Furthermore, since many existing PSO-based feature selection approaches on the gene expression data have feature selection bias, i.e. no unseen test data is used, 2 sets of experiments on 10 gene expression datasets were designed: with and without feature selection bias. As compared to standard PSO, PSO with gbest resetting only, and PSO with local search only, PSO-LSRG obtained a substantial dimensionality reduction and a significant improvement on the classification performance in both sets of experiments. PSO-LSRG outperforms the other three algorithms when feature selection bias exists. When there is no feature selection bias, PSO-LSRG selects the smallest number of features in all cases, but the classification performance is slightly worse in a few cases, which may be caused by the overfitting problem. This shows that feature selection bias should be avoided when designing a feature selection algorithm to ensure its generalisation ability on unseen data.

publication date

  • 2016