This package contains the ESPRIT-Forest source code and a testing example. We provide two versions: an OpenMP code for single-node multi-core environments (shared-memory architecture), and a mixed MPI-OpenMP code for multi-node multi-core environments. You can follow the instructions below to run it.

1. Compile

1.1) For the OpenMP version, a GNU C++ compiler is sufficient to compile the code. Please make sure that gcc/g++ is installed and available in your path, then type:

    cd ESF-MP-src
    make

A compiled executable of the OpenMP version has already been released on our web site; it supports both Linux (64-bit) and Windows (32-bit, under a MinGW shell) environments.

1.2) For the MPI version, please ensure that an MPI compiler is installed and available in your path. We recommend Intel MPI, which is the default setting in the Makefile. To compile the source code, type:

    cd ESF-MPI-src
    make

For other compilers, please modify the Makefile accordingly. If you use a GNU C++ compiler as the MPI backend compiler, please use the -O2 flag to ensure the correctness of the code. If the compilation succeeds, you will get a set of binary files and can proceed to the next step.

2. Execution

2.1) OpenMP execution

For the OpenMP version, it is simple to run the code directly with the provided script esf-mp.sh:

    ./esf-mp <input-fasta-file> [<options>]

The resulting files are in the same format as those of ESPRIT and ESPRIT-Tree. Please refer to their user guide for details (URL: http://plaza.ufl.edu/sunyijun/Paper/ESTree_User.pdf). A user guide for ESPRIT-Forest will be available soon.

2.2) MPI execution

To run the MPI version, you should make sure that your HPC cluster supports mixed MPI-OpenMP programs. Please check this with your HPC administrator.

2.2.1) Pre-processing

Pre-processing is done in single-thread mode using the "preproc" code. You can either execute it locally or submit it to an HPC task queue. The command line is:

    ./preproc <input-fasta-file>

which merely merges redundant sequences but does not perform quality filtering. In order to remove low-quality reads, please type:

    ./preproc <input-fasta-file> [<-p primer-fasta-file>] -w -v 2

Supposing the input file is "example.fas", the above command will produce three processed files: "example_Clean.fas", "example_Clean.frq", and "example_Clean.map", which are used in the subsequent steps of execution.

2.2.2) Environment Setting

In order to run ESPRIT-Forest efficiently on an HPC cluster, you need to ensure that each HPC computing node enables multi-core execution, and execute the following command on EACH of the computing nodes (rather than on the task submission interface):

    export OMP_NUM_THREADS=[Num-of-threads-per-node]

To do so, please consult your HPC administrator. Generally, adding the above command to the HPC submission script will work.

2.2.3) Execution of ESPRIT-Forest

To carry out ESPRIT-Forest, the execution command is:

    mpirun -n [num-nodes] ./ESForest -f example_Clean.frq example_Clean.fas

The command may differ across HPC systems. For example, on SLURM systems 'srun' is used instead of 'mpirun'.

2.2.4) Post-processing of execution results

The following command maps the clustering result computed on the _Clean files back to the original sequences:

    perl invmap.pl example_Clean.Clusters example_Clean.map example.org.Clusters

NOTE: Generally, HPC systems won't allow you to execute the task directly, but will require you to put your command lines in a script and submit it. Please consult your HPC user guide for how to implement the above commands in a submission script; a sketch of such a script is given below.
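As a minimal sketch only (assuming a SLURM cluster with 2 nodes, one MPI rank per node, and 8 OpenMP threads per node; all directives and values are placeholders to adapt to your system), a submission script combining steps 2.2.2 through 2.2.4 might look like:

    #!/bin/bash
    #SBATCH --nodes=2               # [num-nodes], one MPI rank per node
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=8       # cores reserved on each node for OpenMP threads

    # Step 2.2.2: set the OpenMP thread count; SLURM exports the
    # environment to every computing node, not just the submission host.
    export OMP_NUM_THREADS=8

    # Step 2.2.3: on SLURM, srun takes the place of mpirun.
    srun ./ESForest -f example_Clean.frq example_Clean.fas

    # Step 2.2.4: map the clustering result back to the original sequences.
    perl invmap.pl example_Clean.Clusters example_Clean.map example.org.Clusters

This is only an illustration; prefer the ready-made SLURM script shipped with the package, described next.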
A script for the SLURM system is provided in the source code package.

3. Output Hints

Below is an example of the output hints generated by ESPRIT-Forest. If some of the hints are not shown, or the numbers are abnormal (e.g., the number of running threads differs from what you set), the code is not properly configured; please check your execution script.

    >> ESPRIT-Forest-MP execution path /projects/yijunsun/ES-Forest/./ESForest   [# execution path]
    Input GutV2_Clean.fas Freq GutV2_Clean.frq from 0.010 to 0.150   [# input files]
    Opening GutV2_Clean.fas   [# input statistics]
    .... Node 0 read 472923 seqs in 86.122365 secs
    .... Node 1 read 472923 seqs in 86.122593 secs
    .... Total 1113187 Reads Unique 472923 Average Len 233.5
    .... Total 1113187 Reads Unique 472923 Average Len 233.4
    .... Node 0 # 8 threads running   [# number of threads running on each node]
    Node 1 # 8 threads running
    .... 5.158776 secs in Building Kmer.   [# execution in progress]
    Starting PreCluster
    .... Node 1 PreClustering Finished with 65.046649 secs 265810 Clusters AL 207113 KM 0
    Starting Find NN
    .... Node 1 FindNN Finished with 50.172227 secs AL 724546 KM 80745496
    .... Node 0 Starting Clustering
    .... Node 0 Clustering Finished with 2082.536721 secs NumCls 1038 NumOL 609 AL 4831894 KM 1532445379
    .... Generating Outputs   [# summary of the clustering results]
    Level 0.010 OTUs 218201
    Level 0.020 OTUs 125654
    Level 0.030 OTUs 70044
    Level 0.040 OTUs 40818
    Level 0.050 OTUs 25156
    Level 0.060 OTUs 15913
    Level 0.070 OTUs 10592
    Level 0.080 OTUs 7150
    Level 0.090 OTUs 5038
    Level 0.100 OTUs 3607
    Level 0.110 OTUs 2681
    Level 0.120 OTUs 1940
    Level 0.130 OTUs 1557
    Level 0.140 OTUs 1234
    Level 0.150 OTUs 1038
    2426.178339 secs total in clustering.
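If you redirect these hints to a log file, a quick sanity check is to confirm that every node reports the thread count you exported (the log file name below is only an example):

    grep "threads running" esforest.log
    # expect one line per node, e.g. "Node 0 # 8 threads running",
    # each matching the OMP_NUM_THREADS value you set in step 2.2.2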