Anacapa is a toolkit designed to construct reference databases and assign taxonomy from eDNA sequences.

For more details on Anacapa, please read through the Anacapa GitHub page:

Purpose of this guide

This guide is designed to step you through running the Anacapa toolkit on QUT’s HPC, as the published Anacapa documentation on GitHub can be a bit hard to follow and needs some modification to work on the HPC.

This guide was developed and written by QUT’s eResearch team. For information about this guide or other bioinformatic analyses, contact us at eresearch@qut.edu.au

Requirements

Your eDNA sample files, which should be demultiplexed Illumina sequences in fastq format. If they are not demultiplexed or not Illumina, contact us at eResearch: eresearch@qut.edu.au
A table of the barcodes and adapters used to amplify your sequences. If you don’t already have these, you can usually request them from the organisation that sequenced your samples.
A QUT HPC account.
A basic knowledge of Linux command line operation and usage of QUT’s HPC is strongly recommended, but not required, as all the command line instructions are explicitly explained and can usually simply be cut and pasted into your HPC command line.
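
Before starting, it can help to confirm that your sample files are readable and contain the number of reads you expect. The following is a minimal sketch, not part of Anacapa: count_reads is a hypothetical helper, and it assumes standard four-line FASTQ records, either plain or gzip-compressed.

```shell
# count_reads: hypothetical helper (not an Anacapa command).
# Prints the number of reads in a FASTQ file, assuming 4 lines per read.
count_reads() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;   # decompress gzipped files to stdout
    *)    cat "$1" ;;        # plain-text FASTQ
  esac | awk 'END { print NR / 4 }'
}
```

For example, 'count_reads sample_R1.fastq.gz' (the file name here is a placeholder) prints the read count for that file. A result that is not a whole number suggests the file is truncated or is not in four-line FASTQ format.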

An overview of HPC commands and usage, as well as a link for requesting access to the HPC (if you don’t currently have an HPC account), is here:

HPC

There are plenty of online guides that teach basic Linux command line usage, for example:

https://www.youtube.com/watch?v=cBokz0LTizk&t=1s

https://www.youtube.com/watch?v=s3ii48qYBxA

Step 1: initial setup

You will be running various processes on the HPC that require quite a lot of processing power. Do not run these commands on the 'head node' (the node you enter when you log on). Instead, run them either through a PBS script or in an interactive PBS session, both of which run your processes on another node.

The details of creating and submitting a PBS script can be found here:

HPC
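
As a rough illustration of the PBS script route, the sketch below writes a small job script to a file so you can inspect it before submitting. The file name anacapa_job.pbs and the job name are placeholders, and the resource values simply mirror the interactive session request used in this guide; adjust them to suit your data.

```shell
# Write a minimal PBS job script (anacapa_job.pbs is a hypothetical file name).
# The quoted 'EOF' stops the shell from expanding variables while writing the file.
cat > anacapa_job.pbs <<'EOF'
#!/bin/bash
#PBS -N anacapa_job
#PBS -l select=1:ncpus=8:mem=64gb
#PBS -l walltime=11:00:00

# Start in the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Place the commands from the later steps of this guide here
echo "Job running on $(hostname)"
EOF
```

Once you have added your commands to the file, submit it with 'qsub anacapa_job.pbs'.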

If you’re testing several tools or running multiple separate commands then an interactive PBS session may be preferable. Below is the command to create an interactive PBS session with 8 CPUs and 64GB memory.

*Note: In this guide, commands to be entered by the user will be in grey boxes like the one below. As with most commands, you can simply cut and paste this into your command line.

qsub -I -S /bin/bash -l walltime=11:00:00 -l select=1:ncpus=8:mem=64gb

This request gets put in the HPC queue until there is an available node with sufficient resources. This may take several minutes, or possibly longer.

From your home directory, create a subdirectory called ‘anacapa’ and enter this subdirectory.

cd ~
mkdir anacapa
cd anacapa

Step 2: Running anacapa on Singularity

Anacapa uses many tools, and installing all of them on the HPC would be difficult and time consuming. Fortunately, the developers of Anacapa have created a Singularity image that contains all the required tools. Once the image is downloaded, all the standard tools and commands in the Anacapa guide can be run by prefixing them with ‘singularity exec anacapa-1.5.0.img’, which runs the subsequent command in the Singularity container.

Information about running Anacapa in the Singularity container is found here:

GitHub - anacapa-container: A containerized way to run the Anacapa eDNA processing toolkit on your own machine or server.

Download the Anacapa Singularity container to your anacapa directory.

wget https://zenodo.org/record/2602180/files/anacapa-1.5.0.img
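
To save retyping the ‘singularity exec anacapa-1.5.0.img’ prefix, you could wrap it in a small shell function. This is only a convenience sketch: in_container is a hypothetical name, not part of Anacapa or Singularity, and it assumes the image file is in your current directory.

```shell
# in_container: hypothetical wrapper (not an Anacapa or Singularity command).
# Runs whatever command follows it inside the Anacapa container.
in_container() {
  singularity exec anacapa-1.5.0.img "$@"
}
```

For example, 'in_container ls /' lists the top-level directories inside the container, which is a quick way to confirm that the image downloaded correctly and can be run.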
