i-Detect: Feature Selection for Unsupervised Learning through Local Learning

We consider the problem of feature selection for unsupervised learning and develop a new algorithm that identifies the informative features supporting complex structures embedded in a high-dimensional space. The algorithm is inspired by how humans learn to detect complex data structures. We formulate feature selection as an optimization problem with a well-defined objective function and solve it iteratively. The algorithm is easy to implement and computationally efficient. We use the gap statistic to estimate its parameters, so the proposed method requires no user-specified parameters. We also develop a permutation-test scheme to estimate the statistical significance of the presence of a data structure. We demonstrate the effectiveness and versatility of the algorithm by comparing it with seven existing methods on synthetic datasets with a wide variety of structures and on cancer microarray gene expression datasets.
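
To make the idea of a permutation test for the presence of a data structure concrete, here is a minimal sketch in Python. It is not the scheme developed in the paper, whose details the abstract does not give. The test statistic (mean nearest-neighbour distance), the null model (shuffling each feature independently), and the names mean_nn_distance, permute_columns and structure_p_value are all assumptions made for illustration.

```python
import math
import random


def mean_nn_distance(points):
    """Mean Euclidean distance from each point to its nearest neighbour."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def permute_columns(points, rng):
    """Shuffle every feature independently (assumed null model).

    Each marginal distribution is preserved, but any joint structure
    across features is destroyed.
    """
    columns = [list(col) for col in zip(*points)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(row) for row in zip(*columns)]


def structure_p_value(points, n_perm=99, seed=0):
    """One-sided permutation p-value for 'the data are more tightly packed
    than expected when the features are unrelated'."""
    rng = random.Random(seed)
    observed = mean_nn_distance(points)
    # Count permuted datasets that look at least as structured as the data.
    hits = sum(
        mean_nn_distance(permute_columns(points, rng)) <= observed
        for _ in range(n_perm)
    )
    # The +1 terms count the observed data as one draw from the null.
    return (1 + hits) / (1 + n_perm)


# Illustrative data: two Gaussian clusters lying on the diagonal of the plane.
data_rng = random.Random(1)
clustered = [
    (data_rng.gauss(c, 1.0), data_rng.gauss(c, 1.0))
    for c in (0.0, 6.0)
    for _ in range(40)
]
p_clustered = structure_p_value(clustered)
```

A small p-value suggests that the joint arrangement of the features carries structure beyond what the individual feature distributions explain. With 99 permutations the smallest attainable p-value is 0.01, so a finer resolution needs a larger n_perm.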


J. Yao, Q. Mao, V. Mai, S. Goodison, Y. Sun, Feature Selection for Unsupervised Learning through Local Learning, Pattern Recognition Letters, in press.