This package contains the source code of HybridHC and a test example.

### Download
To download this package, type

```bash
git clone https://github.com/Wayne-Zen/HHC_EST.git
```

### Relevant Packages
We tested our C++ code on both macOS and Linux with GCC 4.8. The following are required:

1. GNU GCC with OpenMP
2. Python [https://www.python.org/downloads/](https://www.python.org/downloads/)
3. scikit-learn [http://scikit-learn.org/stable/](http://scikit-learn.org/stable/)
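The prerequisites above can be checked before building. The following is a minimal sketch, not part of this package: the function names `version_ge` and `check_prereqs` are hypothetical, and it assumes `gcc` and `python` are on your `PATH` and that `sort -V` (GNU version sort) is available. On macOS, `gcc` is often an alias for Apple Clang, which typically fails the OpenMP check.

```bash
#!/usr/bin/env bash
# Sketch of a prerequisite check for HybridHC (hypothetical helper, not part of this package).

# version_ge A B: succeed if version A >= version B (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

check_prereqs() {
    local gcc_version tmp status=0

    # 1. GCC 4.8 or higher
    if ! gcc_version="$(gcc -dumpversion 2>/dev/null)"; then
        echo "gcc: not found"
        return 1
    fi
    if version_ge "$gcc_version" 4.8; then
        echo "gcc $gcc_version: OK"
    else
        echo "gcc $gcc_version: older than 4.8"; status=1
    fi

    # 2. OpenMP: try to compile and link a trivial C program with -fopenmp
    tmp="$(mktemp)"
    if echo 'int main(void) { return 0; }' | gcc -fopenmp -x c - -o "$tmp" 2>/dev/null; then
        echo "OpenMP: OK"
    else
        echo "OpenMP: gcc -fopenmp failed"; status=1
    fi
    rm -f "$tmp"

    # 3. scikit-learn, importable by the `python` that will run nmi.py
    if python -c 'import sklearn' 2>/dev/null; then
        echo "scikit-learn: OK"
    else
        echo "scikit-learn: cannot be imported by 'python'"; status=1
    fi

    return "$status"
}
```

Save it to a file (for example, a hypothetical `check_prereqs.sh`) and run `source check_prereqs.sh && check_prereqs`. Version sort is used so that, for example, GCC 10.2.0 correctly compares as newer than 4.8.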

Some open source packages are incorporated into our source code:

1. armadillo [http://arma.sourceforge.net/](http://arma.sourceforge.net/)
2. seqan [http://www.seqan.de/](http://www.seqan.de/)
3. tclap [http://tclap.sourceforge.net/](http://tclap.sourceforge.net/)
4. boost [http://www.boost.org/](http://www.boost.org/) 

### Quick Demo
You can run a quick demo with the following script.
Please make sure you have __gcc 4.8__ (or higher) and __scikit-learn__ installed.

```bash
cd HHC_EST
bash -x demo.sh
```

### Detailed Steps
You can follow the instructions below to run the test step by step.
Each code block assumes you start from the directory that contains `HHC_EST`.

#### 1. Compilation
You have to compile both the ESPRIT-Tree and the HybridHC source code.
To compile the code, you need __gcc 4.8__ or higher in your system path.
To compile the ESPRIT-Tree source code, you can type:

```bash
cd HHC_EST/src/EST/
make
```

To compile the HybridHC source code, you can type:

```bash
cd HHC_EST/src/HybridHC/
make
```

After compilation, copy the binaries into the "EXE" folder:

```bash
cd HHC_EST/ 
cp src/EST/preproc src/EST/pbpcluster src/HybridHC/hybridhc EXE
```
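The three build steps above can also be chained into a single shell function. This is a minimal sketch, assuming it is run from the directory that contains `HHC_EST` and that each `Makefile` builds the binaries with its default target. `build_all` is a hypothetical name, and `mkdir -p` is only a safeguard in case `EXE` does not exist yet.

```bash
#!/usr/bin/env bash
# Sketch: compile ESPRIT-Tree and HybridHC, then copy the binaries into EXE/.
# $1 is the repository root (default: HHC_EST). build_all is a hypothetical name.
build_all() {
    local root="${1:-HHC_EST}"
    # Each step runs only if the previous one succeeded, so a failed build
    # never copies stale or missing binaries into EXE/.
    make -C "$root/src/EST" &&
        make -C "$root/src/HybridHC" &&
        mkdir -p "$root/EXE" &&
        cp "$root/src/EST/preproc" "$root/src/EST/pbpcluster" \
           "$root/src/HybridHC/hybridhc" "$root/EXE/"
}
```

Usage: `source build_hhc.sh && build_all` (file name hypothetical). Because `make -C` changes into each source directory itself, the function does not change your working directory.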
#### 2. Run HybridHC
To run HybridHC, you can type:

```bash
cd HHC_EST/example/
../EXE/hybridhc -p 4 -d 0.3 -s 10000 -e 0 -i test_clean.fa -o .
```
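To compare results for several cluster diameters (`-d`), the command above can be run in a loop. This is a minimal sketch, not part of this package: `run_sweep`, the `HHC_BIN` override and the `out_d<diameter>` directory names are hypothetical, and `mkdir -p` is a safeguard in case `-o` expects an existing directory. It uses only the flags documented in the help output.

```bash
#!/usr/bin/env bash
# Sketch: run HybridHC once per diameter (-d), each into its own output directory.
# run_sweep, HHC_BIN and the out_d<diameter> naming are hypothetical choices.
run_sweep() {
    local input="$1"; shift
    local bin="${HHC_BIN:-../EXE/hybridhc}"
    local threads d out
    # Use all online CPU cores for -p; fall back to 1 if getconf is unavailable.
    threads="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    for d in "$@"; do
        out="out_d$d"
        mkdir -p "$out"
        "$bin" -p "$threads" -d "$d" -s 10000 -e 0 -i "$input" -o "$out" || return 1
    done
}
```

For example, from `HHC_EST/example/`: `run_sweep test_clean.fa 0.1 0.2 0.3`. The fixed `-s 10000 -e 0` values mirror the command above; adjust them as needed.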

To display the help information, you can type:

```bash
cd HHC_EST/example/
../EXE/hybridhc -h

USAGE: 

   ../EXE/hybridhc  [-o <string>] -i <string> [-e <int>] [-d <float>] [-s
                    <int>] [-n <int>] [-r <float>] [-m <int>] [-k <int>]
                    [-p <int>] [--] [--version] [-h]


Where: 

   -o <string>,  --outputdirectory <string>
     the output directory with clusters

   -i <string>,  --inputfilename <string>
     (required)  the input filename with fasta format

   -e <int>,  --distance <int>
     the option of distance (default 0):

     0-kmer distance

     1-banded needleman wunsch


   -d <float>,  --diameter <float>
     the diameter of clusters for active hierarchical clustering (default
     0.1)

   -s <int>,  --sizecluster <int>
     the size of clusters for active hierarchical clustering (default
     10000)

   -p <int>,  --numofthreads <int>
     the number of threads in OpenMP (default 1)

   --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.


   Command description message
```


#### 3. Calculate NMI score
The "example" folder contains a Python script, "nmi.py", for calculating the normalized mutual information (NMI) score.
You need Python and __scikit-learn__ installed to run this script.
To calculate the NMI score, you can type:

```bash
cd HHC_EST/example/
python nmi.py test.tax 6 0.03
```

* Argument 1: the ground-truth taxonomy file for the input FASTA file.
* Argument 2: the taxonomy level, 6 for genus level and 7 for species level.
* Argument 3: the cutoff level of the hierarchical clustering.
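To get genus- and species-level scores in one go, the `nmi.py` call can be wrapped in a loop over both taxonomy levels. This is a minimal sketch, assuming `nmi.py` takes exactly the three arguments listed above; `nmi_report` and the `PYTHON` override are hypothetical, and which cutoffs are meaningful depends on your clustering output (0.03 is the value used above).

```bash
#!/usr/bin/env bash
# Sketch: run nmi.py at genus (6) and species (7) level for one or more cutoffs.
# nmi_report and the PYTHON override are hypothetical, not part of this package.
nmi_report() {
    local tax="$1"; shift
    local py="${PYTHON:-python}"
    local level cutoff
    for level in 6 7; do
        for cutoff in "$@"; do
            # Label each line, then append whatever nmi.py prints.
            printf 'level=%s cutoff=%s: ' "$level" "$cutoff"
            "$py" nmi.py "$tax" "$level" "$cutoff" || return 1
        done
    done
}
```

Run it from `HHC_EST/example/`, for example `nmi_report test.tax 0.03`. Set `PYTHON=python3` if `python` on your system is not the interpreter that has scikit-learn installed.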