ESPRIT is a standard implementation of the complete-linkage hierarchical clustering method for sequence data analysis. It can process several tens of thousands of sequences on a desktop computer, and up to one million sequences on a small computer cluster [more]
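
As a rough illustration of complete-linkage clustering, the sketch below repeatedly merges the two closest clusters, measuring the distance between clusters by their most dissimilar pair of members, and stops at a distance cutoff. It is a naive toy, not the ESPRIT implementation: the function names, the `cutoff` parameter, and the mismatch-fraction distance on equal-length sequences are all assumptions made for the example.

```python
from itertools import combinations

def mismatch_distance(a, b):
    """Fraction of differing positions between two equal-length sequences.

    Assumption for this toy example; a real pipeline would use an
    alignment-based distance instead.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length in this toy example")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def complete_linkage(seqs, cutoff):
    """Merge clusters until the closest pair is farther apart than `cutoff`."""
    clusters = [[i] for i in range(len(seqs))]

    def linkage(c1, c2):
        # Complete linkage: the distance between two clusters is the
        # largest pairwise distance between their members.
        return max(mismatch_distance(seqs[i], seqs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters (ties broken by index order).
        dist, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p, q in combinations(range(len(clusters)), 2)
        )
        if dist > cutoff:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return [sorted(c) for c in clusters]

reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTGTACCC", "TTGTACCA"]
otus = complete_linkage(reads, cutoff=0.25)
print(otus)
```

Each merge step recomputes all pairwise distances, so this sketch is only practical for a handful of sequences.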

ESPRIT-Tree is an online-learning based algorithm for large-scale sequence data analysis. The algorithm exhibits a quasilinear computational complexity comparable to that of greedy heuristic clustering algorithms, while achieving accuracy similar to that of the standard method [more]


ESPRIT-Forest is a cluster version of ESPRIT-Tree. It allows users to leverage the power of hundreds of computer nodes to process tens of millions of sequences [more]


Hybrid-HC is a hierarchical clustering method for 16S rRNA sequence data analysis that achieves good clustering performance and high scalability on extremely large sequence datasets. It can process tens of millions of sequences with the power of parallel computing [more]


M-pick is a modularity-based clustering method for OTU picking of 16S rRNA sequences. The algorithm does not require a predetermined cut-off level for defining OTUs [more]


Logo is a feature-selection algorithm for high-dimensional data analysis. It can achieve a close-to-optimal solution for arbitrarily complex datasets containing millions of irrelevant features. The paper was featured as a Spotlight Paper in the September 2010 issue of the prestigious TPAMI journal [more]


i-Detect is a feature-selection algorithm for unsupervised learning. It can detect informative features supporting arbitrarily complex data structures embedded in a high-dimensional space [more]


I-RELIEF is an iterative version of the well-known RELIEF algorithm for feature selection in supervised learning, based on the expectation-maximization principle [code]
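
For readers unfamiliar with the underlying idea, here is a minimal sketch of the basic RELIEF weight update that I-RELIEF builds on. It is not I-RELIEF itself and omits the iterative expectation-maximization refinement. For each sample it finds the nearest neighbor of the same class (the hit) and of a different class (the miss), then rewards features that separate the sample from its miss more than from its hit. The function name, the Manhattan distance, and the toy data are assumptions for illustration.

```python
def relief_weights(X, y):
    """Basic RELIEF feature weights (one pass, nearest hit and nearest miss).

    Assumes every class has at least two samples and that at least two
    classes are present.
    """
    n, d = len(X), len(X[0])

    def dist(a, b):
        # Manhattan distance; an assumption made for this sketch.
        return sum(abs(u - v) for u, v in zip(a, b))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # A feature gains weight when it differs more from the nearest
            # miss than from the nearest hit.
            w[j] += abs(X[i][j] - X[miss][j]) - abs(X[i][j] - X[hit][j])
    return w

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.1], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
print(weights)
```

On this toy data the informative feature receives a positive weight and the noisy one a negative weight.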


Solving an L1-regularized learning problem has long been considered difficult. We demonstrate that it can be solved easily with a gradient descent-based approach. In a simulation study, DGM was able to solve a problem with more than two million features in 82 seconds [more]
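
To make the problem setting concrete, the sketch below solves a small L1-regularized least-squares problem with gradient steps. It uses proximal gradient descent (ISTA), a standard textbook technique swapped in for illustration. It is not the DGM algorithm described above, and the function names, step size, and toy data are assumptions.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_ista(X, y, lam, step, n_iter=500):
    """Minimise 0.5 * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient steps.

    `step` should be at most 1 / L, where L is the largest eigenvalue
    of X^T X, for the iteration to converge.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        # Residual r = Xw - y and gradient of the smooth part, X^T r.
        r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * r[i] for i in range(n)) for j in range(d)]
        # Gradient step on the smooth part, then shrinkage for the L1 term.
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w

# Toy problem with an identity design matrix, so the exact solution is
# the soft-thresholded target vector.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.5, -2.0]
w = lasso_ista(X, y, lam=1.0, step=0.5)
print(w)
```

The second coefficient is driven exactly to zero, which is the sparsity-inducing behavior that makes L1 regularization attractive for problems with many irrelevant features.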


LFE is a natural extension of the RELIEF algorithm to feature extraction. It collects discriminant information through local learning and can be solved as an eigenvalue decomposition problem with a closed-form solution [code]