eResearch Installing Nextflow

Aims:

  • Install Nextflow in your HPC environment

  • Run a test Nextflow task to verify the installation process

Background

What is Nextflow?

Nextflow lets you write, deploy, and share data-intensive, highly scalable workflows on any infrastructure.

 

Nextflow is a pipeline engine that takes advantage of the batch nature of the HPC environment to run bioinformatics workflows quickly and efficiently.

For more information about Nextflow, please visit Nextflow - A DSL for parallel and scalable computational pipelines

Work in the HPC


Installing Nextflow

Nextflow is meant to run from your home folder on a Linux machine like the HPC.

Nextflow can be installed with a few commands. Copy and paste the following block of code into your terminal (e.g., a PuTTY session that is already connected to the HPC) and press 'enter'.

module load java
curl -s https://get.nextflow.io | bash
mkdir -p $HOME/bin && mv nextflow $HOME/bin/
# verify Nextflow is installed
mkdir $HOME/nftemp && cd $HOME/nftemp
nextflow run hello  # check the output of the short Nextflow hello pipeline
cd $HOME
rm -rf nftemp

  • Line 1: The module load command is necessary to ensure Java is available.

  • Line 2: This command downloads and assembles the parts of Nextflow. This step might take some time.

  • Line 3: When finished, the nextflow binary will be in the current folder, so it should be moved to your "bin" folder where it can be found later.

  • Line 5: Make a temporary folder for Nextflow to create files when it runs.

  • Line 6: Verify Nextflow is working.

  • Lines 7 and 8: Clean up.

If the installation worked, Nextflow downloads the hello pipeline, runs it, and prints a short greeting from each task.
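
If the test run instead reports that a command cannot be found, the likely causes are that Java is not loaded or that your "bin" folder is not on your PATH. The following is a minimal sketch of a check for both. The helper names have_cmd and on_path are hypothetical; they are not part of Nextflow or of the HPC environment.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the install; not part of Nextflow.

# have_cmd NAME: succeed if NAME resolves to a command in this shell.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# on_path DIR: succeed if DIR is one of the entries in $PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report whether the two tools the install depends on can be found.
for tool in java nextflow; do
    if have_cmd "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found"
    fi
done

# Report whether the folder that holds the nextflow binary is searched.
if on_path "$HOME/bin"; then
    echo "$HOME/bin is on PATH"
else
    echo "$HOME/bin is not on PATH"
fi
```

If java is reported as not found, run module load java again in the current session. If $HOME/bin is not on PATH, Nextflow can still be started with its full path, $HOME/bin/nextflow.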

Nextflow’s Default Configuration

Once Nextflow is installed, there are some settings that should be applied to take advantage of the HPC environment. Nextflow has a hierarchy of configuration files, where the base configuration that is applied to every workflow you run is here:

$HOME/.nextflow/config

You can create a suitable file for use on the QUT HPC by copying and pasting the following text into your Linux command line and pressing 'enter'. This makes a few small changes to your local account so that Nextflow can run correctly.

[[ -d $HOME/.nextflow ]] || mkdir -p $HOME/.nextflow
cat <<EOF > $HOME/.nextflow/config
singularity {
    cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
}
conda {
    cacheDir = '$HOME/.nextflow/NXF_CONDA_CACHEDIR'
}
process {
    executor = 'pbspro'
    scratch = false
    cleanup = false
}
includeConfig '/work/datasets/reference/nextflow/qutgenome.config'
EOF
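
Because the EOF delimiter in the command above is unquoted, the shell expands $HOME while it writes the file, so the saved config contains the absolute path to your home folder rather than the text "$HOME". A small sketch of the difference, using throwaway files created with mktemp (the variable names are illustrative only):

```shell
#!/usr/bin/env bash
# Throwaway files for the demonstration (illustrative names).
expanded=$(mktemp)
literal=$(mktemp)

# Unquoted delimiter: $HOME is replaced by its value as the file is written.
cat <<EOF > "$expanded"
cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'
EOF

# Quoted delimiter: the text is written exactly as typed.
cat <<'EOF' > "$literal"
cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'
EOF

expanded_line=$(cat "$expanded")
literal_line=$(cat "$literal")

echo "unquoted EOF: $expanded_line"
echo "quoted EOF:   $literal_line"

# Remove the throwaway files.
rm -f "$expanded" "$literal"
```

The closing EOF must start at the beginning of its line with no leading spaces, otherwise the shell keeps reading input and the file is not written.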

Check that the configuration file was created and contains the settings above:

cat $HOME/.nextflow/config
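
Beyond reading the file by eye, a short check can confirm that the expected settings are present. This is a sketch only: check_config is a hypothetical helper, not a Nextflow command, and the strings it searches for are taken from the configuration written above.

```shell
#!/usr/bin/env bash
# check_config FILE: report whether FILE exists and mentions the settings
# written by the configuration step. Hypothetical helper, not a Nextflow command.
check_config() {
    cfg=$1
    if [ ! -f "$cfg" ]; then
        echo "missing: $cfg"
        return 1
    fi
    status=0
    # Fixed-string search for each setting expected in the file.
    for setting in "executor = 'pbspro'" "autoMounts = true" "includeConfig"; do
        if grep -qF -- "$setting" "$cfg"; then
            echo "ok: $setting"
        else
            echo "not found: $setting"
            status=1
        fi
    done
    return "$status"
}

check_config "$HOME/.nextflow/config" || echo "Re-run the configuration step above."
```

The check only looks for the text of each setting. It does not validate the file as Nextflow configuration.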

Next exercise: eResearch nf-core-RNAseq pipeline