...
Nextflow is a free and open-source pipeline management software that enables scalable and reproducible scientific workflows. It allows the adaptation of pipelines written in the most common scripting languages.
Key features of Nextflow that simplify the development, monitoring, execution and sharing of pipelines:
Reproducible → version control and containers ensure the reproducibility of Nextflow pipelines
Portable → compute agnostic (e.g., HPC, cloud, desktop)
Time and resource management
Scalable → run from a single to thousands of samples
Continuous checkpoints & re-entrancy → allows you to resume a pipeline from the last successfully executed step
Minimal digital literacy required → accessible to anyone
Active global community → more and more Nextflow pipelines are available (e.g., https://nf-co.re/pipelines)
Nextflow is a pipeline engine that can take advantage of the batch nature of the HPC environment to efficiently and quickly run bioinformatic workflows.
For more information about Nextflow, please visit Nextflow - A DSL for parallel and scalable computational pipelines
Installing Nextflow
Connect to your Lyra account. Nextflow is meant to run from your home folder on a Linux machine like the HPC.
...
Code Block |
---|
ssh [username]@lyra.qut.edu.au |
Before we start using the HPC, let’s start an interactive session:
Code Block |
---|
qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=1:mem=4gb |
The session might take a few minutes to start.
You will see this message first:
...
Followed by:
...
You can check that your interactive window is active by running the command:
Code Block |
---|
qstat -u [username] |
...
Nextflow also requires Java 11 or later. To load Java, run the following command:
Code Block |
---|
module load java |
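After loading the module, you may want to confirm that the Java version meets the requirement. The following is a minimal sketch, not part of the official installation steps: `java_major_version` is a hypothetical helper that parses the first line printed by `java -version`, and it assumes the usual `version "x.y.z"` output format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major Java version from the first line
# of `java -version` output (e.g. 'openjdk version "11.0.2" 2019-01-15').
java_major_version() {
    local line="$1" version major
    # Keep only the text between the first pair of double quotes.
    version="${line#*\"}"
    version="${version%%\"*}"
    major="${version%%.*}"
    # Legacy scheme: Java 8 and earlier report themselves as 1.x.
    if [ "$major" = "1" ]; then
        version="${version#1.}"
        major="${version%%.*}"
    fi
    # Drop any non-numeric suffix such as "-ea".
    major="${major%%[!0-9]*}"
    echo "$major"
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr, so redirect it to stdout.
    first_line="$(java -version 2>&1 | head -n 1)"
    major="$(java_major_version "$first_line")"
    if [ "$major" -ge 11 ] 2>/dev/null; then
        echo "Java $major found: OK for Nextflow"
    else
        echo "Java '$major' found: Nextflow needs Java 11 or later"
    fi
else
    echo "java not found: run 'module load java' first"
fi
```

If the check reports an older version, the available Java modules on your system may need to be reviewed before continuing.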
...
Finally, make sure you are in your home directory. If unsure, you can run the following command:
Code Block |
---|
cd ~ |
Installing Nextflow for the first time
...
To install Nextflow for the first time, copy and paste the following block of code into your terminal (e.g., a PuTTY session that is already connected to the HPC) and hit 'enter':
Code Block |
---|
curl -s https://get.nextflow.io | bash
mv nextflow $HOME/bin |
Line 1: Download and assemble the parts of Nextflow. This step might take some time.
Line 2: When finished, the nextflow binary will be in the current folder, so move it to your “bin” folder so that it can be found later.
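Note that the `mv` step only behaves as described if `$HOME/bin` already exists as a folder; otherwise the nextflow file is simply renamed to `bin`. The sketch below is one way to guard against that before moving the binary. `ensure_bin_dir` is a hypothetical helper name, and the hint about `~/.bashrc` assumes bash is your login shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a bin directory if needed and report whether
# it is already listed on PATH. Takes the directory as its only argument.
ensure_bin_dir() {
    local bin_dir="$1"
    mkdir -p "$bin_dir"
    case ":$PATH:" in
        *":$bin_dir:"*) echo "on PATH" ;;
        *)              echo "not on PATH" ;;
    esac
}

status="$(ensure_bin_dir "$HOME/bin")"
echo "$HOME/bin is $status"
if [ "$status" = "not on PATH" ]; then
    # Assumption: bash is the login shell and reads ~/.bashrc.
    echo 'To add it, append this line to ~/.bashrc: export PATH="$HOME/bin:$PATH"'
fi
```

Running this before the `mv` command means the binary lands inside the folder, and the message tells you whether the shell will be able to find it there.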
...
Updating Nextflow
If you have already installed Nextflow on the HPC, update it to the latest version by running:
Code Block |
---|
nextflow self-update |
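If you are unsure whether Nextflow is already installed, a quick check of your PATH can tell you which of the two steps applies. This is a minimal sketch; `install_or_update` is a hypothetical helper name, and it only tests whether a command called nextflow can be found, not whether it works.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which step applies, based on whether a
# command with the given name can be found on PATH.
install_or_update() {
    local cmd="${1:-nextflow}"
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "update"
    else
        echo "install"
    fi
}

case "$(install_or_update nextflow)" in
    update)  echo "Nextflow found: run 'nextflow self-update'" ;;
    install) echo "Nextflow not found: follow the first-time installation steps" ;;
esac
```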
Check that your Nextflow installation worked
To verify that Nextflow is installed properly, you can run the following command:
Code Block |
---|
nextflow info |
We will now run your first Nextflow pipeline locally. It is called Hello:
Code Block |
---|
mkdir $HOME/nftemp && cd $HOME/nftemp
nextflow run hello |
Line 1: Make a temporary folder called nftemp for Nextflow to create files when it runs the hello pipeline; change directory to this newly created folder.
Line 2: Run the hello pipeline to verify that Nextflow is working.
You should see something like this:
...
If you got this output, well done! You have run your first Nextflow pipeline successfully.
Note | ||
---|---|---|
Troubleshooting:
|
If the error message says that Nextflow is unable to find a pipeline under the name provided, there is likely a typo in your command (e.g. in the pipeline name). Check your spelling and run the command again. |
Now that you have managed to run the hello pipeline, go back to your home directory and clean the test folder.
Code Block |
---|
cd $HOME
rm -rf nftemp |
Nextflow’s base configuration
...
Code Block |
---|
[[ -d $HOME/.nextflow ]] || mkdir -p $HOME/.nextflow

cat <<EOF > $HOME/.nextflow/config
singularity {
    cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
}
conda {
    cacheDir = '$HOME/.nextflow/NXF_CONDA_CACHEDIR'
}
process {
    executor = 'pbspro'
    scratch = false
    cleanup = false
}
includeConfig '/work/datasets/reference/nextflow/qutgenome.config'
EOF |
Line 1: Check whether a .nextflow folder already exists in your home directory. Create it if it does not exist.
Lines 2-15: Using the cat command, write the text to the .nextflow/config file in your home directory. This file specifies the cache locations for Singularity and Conda.
What are the parameters you are setting?
Lines 4-7 set the directory where remote Singularity images are stored and direct Nextflow to automatically mount host paths in the executed container.
Lines 8-10 set the directory where Conda environments are stored.
Lines 11-15 set default directives for processes in your pipeline. Note that the executor is set to pbspro on line 12.
Line 16 provides the local path to genome files required for pipelines such as nf-core/rnaseq.
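One detail of the cat command is easy to miss: because the EOF delimiter is not quoted, the shell replaces $HOME with your actual home path before the text is written, even though it sits inside single quotes. The sketch below illustrates the difference using a temporary folder rather than your real configuration; the file names `expanded.config` and `literal.config` are made up for this demonstration.

```shell
#!/usr/bin/env bash
# Write a config fragment to a temporary folder (not to ~/.nextflow/config)
# to show how the shell treats $HOME inside each kind of here-document.
demo_dir="$(mktemp -d)"

# Unquoted delimiter: $HOME is expanded, even inside the single quotes.
cat <<EOF > "$demo_dir/expanded.config"
conda { cacheDir = '$HOME/.nextflow/NXF_CONDA_CACHEDIR' }
EOF

# Quoted delimiter: the text is written literally, so $HOME stays as-is.
cat <<'EOF' > "$demo_dir/literal.config"
conda { cacheDir = '$HOME/.nextflow/NXF_CONDA_CACHEDIR' }
EOF

echo "Unquoted EOF:"
cat "$demo_dir/expanded.config"
echo "Quoted EOF:"
cat "$demo_dir/literal.config"
```

The first file contains an absolute path, which is the behaviour the configuration step relies on. After creating your real config file, opening it with `cat $HOME/.nextflow/config` should therefore show full paths rather than the text $HOME.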
Info |
---|
More in-depth information on Nextflow configuration is available here: https://www.nextflow.io/docs/latest/config.html. |