This package contains the ESPRIT-Forest source code and a testing example. We provide two versions: an OpenMP code for single-node multi-core environments (shared-memory architecture), and a mixed MPI-OpenMP code for multi-node multi-core environments. You can follow the instructions below to run it.

1. Compile

1.1) For the OpenMP version, a GNU C++ compiler is sufficient to compile the code. Please make sure that gcc/g++ is installed and available in your path, then type:

    cd ESF-MP-src
    make

A compiled executable of the OpenMP version has already been released on our web site; it supports both Linux (64-bit) and Windows (32-bit, under a MinGW shell) environments.

1.2) For the MPI version, please ensure that an MPI compiler is installed and available in your path. We recommend Intel MPI, which is the default setting in the Makefile. To compile the source code, type:

    cd ESF-MPI-src
    make

For other compilers, please modify the Makefile accordingly. If you use a GNU C++ compiler as the MPI backend compiler, please use the -O2 flag to ensure the correctness of the code. If the compilation succeeds, you will get a set of binary files and can proceed to the next step.

2. Execution

2.1) OpenMP execution

For the OpenMP version, it is simple to run the code directly with the provided script esf-mp.sh:

    ./esf-mp <input-fasta-file> [<options>]

The resulting files are in the same format as those of ESPRIT and ESPRIT-Tree. Please refer to their user guide for details (URL: http://plaza.ufl.edu/sunyijun/Paper/ESTree_User.pdf). A user guide for ESPRIT-Forest will be available soon.

2.2) MPI execution

To run the MPI version, you should make sure that your HPC cluster supports mixed MPI-OpenMP programs. Please check this with your HPC administrator.

2.2.1) Pre-processing

Pre-processing is done in single-thread mode using the "preproc" code. You can either execute it locally or submit it to an HPC task queue. The command line is:

    ./preproc <input-fasta-file>

which merely merges redundant sequences but does not perform quality filtering. In order to remove low-quality reads, please type:

    ./preproc <input-fasta-file> [<-p primer-fasta-file>] -w -v 2

Supposing the input file is "example.fas", the above command will produce three processed files: "example_Clean.fas", "example_Clean.frq", and "example_Clean.map", which are used in the subsequent steps of execution.

2.2.2) Environment Setting

In order to run ESPRIT-Forest efficiently on an HPC cluster, you need to ensure that each HPC computing node enables multi-core execution, and execute the following command on EACH of the computing nodes (rather than on the task submission interface):

    export OMP_NUM_THREADS=[Num-of-threads-per-node]

To do so, please consult your HPC administrator. Generally, adding the above command to the HPC submission script will work.

2.2.3) Execution of ESPRIT-Forest

To carry out ESPRIT-Forest, the execution command is:

    mpirun -n [num-nodes] ./ESForest -f example_Clean.frq example_Clean.fas

The command may differ across HPC systems. For example, on SLURM systems 'srun' is used instead of 'mpirun'.

2.2.4) Post-processing of execution results

The following command maps the clustering result computed on the _Clean files back to the original sequences:

    perl invmap.pl example_Clean.Clusters example_Clean.map example.org.Clusters

NOTE: Generally, HPC systems won't allow you to execute the task directly, but will require you to put your command lines in a script and submit it. Please consult your HPC user guide for how to implement the above commands in a submission script; a sketch of such a script is given below.
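As a minimal sketch only (assuming a SLURM cluster with 2 nodes, one MPI rank per node, and 8 OpenMP threads per node; all directives and values are placeholders to adapt to your system), a submission script combining steps 2.2.2 through 2.2.4 might look like:

    #!/bin/bash
    #SBATCH --nodes=2               # [num-nodes], one MPI rank per node
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=8       # cores reserved on each node for OpenMP threads

    # Step 2.2.2: set the OpenMP thread count; SLURM exports the
    # environment to every computing node, not just the submission host.
    export OMP_NUM_THREADS=8

    # Step 2.2.3: on SLURM, srun takes the place of mpirun.
    srun ./ESForest -f example_Clean.frq example_Clean.fas

    # Step 2.2.4: map the clustering result back to the original sequences.
    perl invmap.pl example_Clean.Clusters example_Clean.map example.org.Clusters

This is only an illustration; prefer the ready-made SLURM script shipped with the package, described next.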
A script for the SLURM system is provided in the source code package.

3. Output Hints

Below is an example of the output hints generated by ESPRIT-Forest. If some of the hints are not shown, or the numbers are abnormal (e.g., the number of running threads differs from what you set), the code is not properly configured; please check your execution script.

    >> ESPRIT-Forest-MP execution path /projects/yijunsun/ES-Forest/./ESForest   [# execution path]
    Input GutV2_Clean.fas Freq GutV2_Clean.frq from 0.010 to 0.150   [# input files]
    Opening GutV2_Clean.fas   [# input statistics]
    .... Node 0 read 472923 seqs in 86.122365 secs
    .... Node 1 read 472923 seqs in 86.122593 secs
    .... Total 1113187 Reads Unique 472923 Average Len 233.5
    .... Total 1113187 Reads Unique 472923 Average Len 233.4
    .... Node 0 # 8 threads running   [# number of threads running on each node]
    Node 1 # 8 threads running
    .... 5.158776 secs in Building Kmer.   [# execution in progress]
    Starting PreCluster
    .... Node 1 PreClustering Finished with 65.046649 secs 265810 Clusters AL 207113 KM 0
    Starting Find NN
    .... Node 1 FindNN Finished with 50.172227 secs AL 724546 KM 80745496
    .... Node 0 Starting Clustering
    .... Node 0 Clustering Finished with 2082.536721 secs NumCls 1038 NumOL 609 AL 4831894 KM 1532445379
    .... Generating Outputs   [# summary of the clustering results]
    Level 0.010 OTUs 218201
    Level 0.020 OTUs 125654
    Level 0.030 OTUs 70044
    Level 0.040 OTUs 40818
    Level 0.050 OTUs 25156
    Level 0.060 OTUs 15913
    Level 0.070 OTUs 10592
    Level 0.080 OTUs 7150
    Level 0.090 OTUs 5038
    Level 0.100 OTUs 3607
    Level 0.110 OTUs 2681
    Level 0.120 OTUs 1940
    Level 0.130 OTUs 1557
    Level 0.140 OTUs 1234
    Level 0.150 OTUs 1038
    2426.178339 secs total in clustering.
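If you redirect these hints to a log file, a quick sanity check is to confirm that every node reports the thread count you exported (the log file name below is only an example):

    grep "threads running" esforest.log
    # expect one line per node, e.g. "Node 0 # 8 threads running",
    # each matching the OMP_NUM_THREADS value you set in step 2.2.2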