Prepared by the eResearch Office, QUT.
Nextflow is a pipeline engine that takes advantage of the batch nature of the HPC environment to run bioinformatics workflows quickly and efficiently.
For more information about Nextflow, please visit the Nextflow website, "Nextflow - A DSL for parallel and scalable computational pipelines" (https://www.nextflow.io).
Installing Nextflow
Nextflow is meant to run from your home folder on a Linux machine like the HPC.
A few commands are enough to install Nextflow:
module load java
curl -s https://get.nextflow.io | bash
mkdir -p $HOME/bin && mv nextflow $HOME/bin/
#verify Nextflow is installed
mkdir $HOME/nftemp && cd $HOME/nftemp
nextflow run hello #check for output of running the short nextflow hello pipeline
cd $HOME
rm -rf nftemp
Line 1: The module load command is necessary to ensure Java is available.
Line 2: This command downloads and assembles the parts of Nextflow. This step might take some time.
Line 3: When finished, the nextflow binary will be in the current folder, so it should be moved to your "bin" folder ($HOME/bin) so it can be found later.
Line 5: Make a temporary folder where Nextflow can create files when it runs.
Line 6: Verify Nextflow is working by running the short "hello" pipeline.
Lines 7 and 8: Clean up by returning to your home folder and deleting the temporary folder.
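Line 3 only helps if $HOME/bin is on your PATH. The following is a minimal sketch, not part of the original instructions, that adds the folder to PATH for the current session if it is missing and then reports whether the shell can find nextflow. The function name ensure_home_bin_on_path is hypothetical.

```shell
# Hypothetical helper: add $HOME/bin to PATH for this session if it is missing.
ensure_home_bin_on_path() {
    case ":$PATH:" in
        *":$HOME/bin:"*) ;;                        # already present, nothing to do
        *) PATH="$HOME/bin:$PATH"; export PATH ;;  # prepend so it is searched first
    esac
}

ensure_home_bin_on_path

# Report whether the shell can now find the nextflow launcher.
if command -v nextflow >/dev/null 2>&1; then
    echo "nextflow found at: $(command -v nextflow)"
else
    echo "nextflow not found on PATH; check that it was moved to $HOME/bin" >&2
fi
```

This change lasts only for the current session. To make it permanent you would add the PATH line to your shell start-up file, assuming your account does not already do so.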
Nextflow’s Default Configuration
Once Nextflow is installed, some settings should be applied to take advantage of the HPC environment. Nextflow has a hierarchy of configuration files. The base configuration, which is applied to every workflow you run, is stored here:
$HOME/.nextflow/config
You can create a suitable file for the QUT HPC with the commands below.
Copy and paste the following text into your Linux command line and press Enter. This creates the configuration file in your account so that Nextflow can run correctly.
[[ -d $HOME/.nextflow ]] || mkdir -p $HOME/.nextflow
cat <<EOF > $HOME/.nextflow/config
singularity {
    cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
}
conda {
    cacheDir = '$HOME/.nextflow/NXF_CONDA_CACHEDIR'
}
process {
    executor = 'pbspro'
    scratch = false
    cleanup = false
}
EOF
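One detail of the commands above is worth seeing in isolation: because the EOF delimiter is not quoted, the shell replaces $HOME with your actual home folder before the text is written, so the config file ends up containing absolute paths. The sketch below demonstrates this with a throwaway file (the variable name demo_config is illustrative) and does not touch your real configuration.

```shell
# Demonstration only: write to a temporary file, not $HOME/.nextflow/config.
demo_config="$(mktemp)"

# Unquoted EOF: the shell expands $HOME while writing the file.
cat <<EOF > "$demo_config"
singularity {
    cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'
}
EOF

# The file now holds the expanded path rather than the text "$HOME".
cat "$demo_config"
```

Running cat on your real $HOME/.nextflow/config in the same way is a quick check that the paths were written as expected.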
Preparing Data
Typically, the pipeline you want to run expects the data to be prepared in a particular way. You can check the pipeline's help or website for a guide. The help is typically accessed with the --help option, for example:
nextflow run nf-core/rnaseq --help
Some pipelines may need file names, and others may want a CSV file with file names, paths to raw data, and other information.
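As an illustration of the CSV case, the sketch below builds a sample sheet from paired FASTQ files. It rests on assumptions that are not from this guide: that files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz, and that the pipeline wants the columns sample, fastq_1 and fastq_2. The function name make_samplesheet is hypothetical. Check your pipeline's documentation for the columns it actually requires.

```shell
# Hypothetical helper: print a CSV sample sheet for the paired FASTQ files
# found in the folder given as the first argument.
# Assumed naming: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
# Assumed columns: sample,fastq_1,fastq_2 (adjust to suit your pipeline).
make_samplesheet() (
    data_dir="$1"
    echo "sample,fastq_1,fastq_2"
    for r1 in "$data_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                   # skip if nothing matched
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"        # matching second read file
        sample="$(basename "$r1" _R1.fastq.gz)"    # sample name from file name
        echo "$sample,$r1,$r2"
    done
)

# Example usage (the path is a placeholder):
#   make_samplesheet /path/to/fastq > samplesheet.csv
```

The helper does not check that each second read file exists, so inspect the resulting CSV before using it.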
Running Nextflow
When you run Nextflow, it is a good idea to create a folder for the run - this keeps all the files separate and easy to manage.
When Nextflow runs, it creates a work folder where all the temporary and work-in-progress files are stored and a results folder where the output of the pipeline run is typically stored.
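One way to keep runs separate is to give each run its own dated folder. The sketch below is only a suggestion: the helper name new_run_dir, the parent folder and the run label are all illustrative.

```shell
# Hypothetical helper: create a dated folder for one pipeline run and print its path.
# Arguments: a parent folder and a short label for the run.
new_run_dir() (
    base_dir="$1"
    label="$2"
    run_dir="$base_dir/${label}_$(date +%Y%m%d)"
    mkdir -p "$run_dir" && echo "$run_dir"
)

# Example usage (paths are placeholders):
#   cd "$(new_run_dir "$HOME/nextflow_runs" rnaseq)"
# Nextflow will then create its work folder inside that run folder.
```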
Once you have prepared your input data for the pipeline, you are ready to run the pipeline.
After changing to the run folder, run Nextflow with the command:
nextflow run {pipeline name} {options}
However, it is good practice and much safer to submit a job on the HPC to run Nextflow on your pipeline. A job file (called launch.pbs) might look like:
#!/bin/bash -l
#PBS -N MyNextflowRun
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
nextflow run nf-core/rnaseq
What do these lines mean?
Lines 1-5 are typical PBS job setup. Here the job is named MyNextflowRun, 2 CPUs and 4 GB of RAM are requested, and the job may run for up to 24 hours. Line 5 changes to the folder the job was submitted from. The walltime is the total time allowed for the whole pipeline run, which may take days or weeks depending on the pipeline and the amount of data.
Line 6 ensures the Java environment is available (Nextflow needs Java to run).
Line 7 tells Nextflow how much RAM to use: the Java process that runs Nextflow starts with 1 GB of memory and may use up to 4 GB.
Line 8 runs Nextflow.
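Once the job file is saved, it is submitted with the PBS qsub command. The sketch below wraps that step in a hypothetical helper, submit_run, which refuses to submit if the job file is missing or if qsub is not available on the current machine. The helper is an illustration only and not part of the QUT setup.

```shell
# Hypothetical wrapper around qsub: only submit when the job file exists
# and the PBS client is available on this machine.
submit_run() {
    job_file="${1:-launch.pbs}"
    if [ ! -f "$job_file" ]; then
        echo "job file not found: $job_file" >&2
        return 1
    fi
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on the HPC" >&2
        return 1
    fi
    qsub "$job_file"      # prints the job ID on success
}

# Example usage from inside the run folder:
#   submit_run launch.pbs
#   qstat -u "$USER"          # list your queued and running jobs
#   tail -n 20 .nextflow.log  # Nextflow's own log, written in the launch folder
```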
To see the output of Nextflow while running as a job, you can use the Nextflow Tower.
Using the Nextflow Tower
Nextflow Tower allows you to monitor Nextflow runs. To use Nextflow Tower, please visit
https://nftower.qut.edu.au or the BioCommons instance: https://tower.services.biocommons.org.au/
(Control-click a link to open it in a new window.)
There are no passwords for the Tower. Instead, you sign in with a link sent to your email.
Look for the Sign in button (top right), then provide your email address.
In the email that comes from eresearch@qut.edu.au, look for the “Access Nextflow Tower now!” option.