Table of Contents | ||
---|---|---|
|
...
Nextflow is a free and open-source pipeline management software that enables scalable and reproducible scientific workflows. It allows the adaptation of pipelines written in the most common scripting languages.
Key features of Nextflow:
Reproducible → version control and use of containers ensure the reproducibility of nextflow pipelines
Portable → compute agnostic (i.e., HPC, cloud, desktop)
Scalable → run from a single to thousands of samples
Minimal digital literacy → accessible to anyone
Active global community → more and more nextflow pipelines are available (i.e., https://nf-co.re/pipelines )
...
To install Nextflow, copy and paste the following block of code into your terminal (i.e., PuTTy that is already connected to the terminal) and hit 'enter':
Code Block |
---|
module load java curl -s https://get.nextflow.io | bash mv nextflow $HOME/bin |
Line 1: The module load command is necessary to ensure java is available
Line 2: This command downloads and assembles the parts of nextflow - this step might take some time.
Line 3: When finished, the nextflow binary will be in the current folder so it should be moved to your “bin” folder” so it can be found later.
Line 5: Make a temporary folder for Nextflow to create files when it runs.
Line 6: Verify Nextflow is working.
Lines 7 and 8: Clean up
To verify that Nextflow is installed properly, you can run locally a simple Nextflow pipeline called Hello:
...
Code Block |
---|
cd $HOME rm -rf nftemp |
...
Nextflow’s base configuration
A key Nextflow feature is the ability to decouple the workflow implementation, which describes the flow of data and operations to perform on that data, from the configuration settings required by the underlying execution platform. This enables the workflow to be portable, allowing it to run on different computational platforms such as an institutional HPC or cloud infrastructure, without needing to modify the workflow implementation.
...
Code Block |
---|
process { executor = 'pbspro' } |
Information on Nextflow configuration is described in details here: https://www.nextflow.io/docs/latest/config.html
The base configuration that is applied to every Nextflow workflow you run is located in $HOME/.nextflow/config
.
Nextflow’s Default Configuration
Once you have installed Nextflow on Lyra, there are some settings that should be applied to your $HOME/.nextflow/config
to take advantage of the HPC environment at QUT.
You can To create a suitable config file for use on the QUT HPC by copying , copy and pastingpaste the following text into your Linux command line and hit ‘enter’. This will make the necessary changes to your local account so that Nextflow can run correctly:
Code Block |
---|
[[ -d $HOME/.nextflow ]] || mkdir -p $HOME/.nextflow cat <<EOF > $HOME/.nextflow/config singularity { cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR' autoMounts = true } conda { cacheDir = '$HOME/.nextflow/NXF_CONDA_CACHEDIR' } process { executor = 'pbspro' scratch = false cleanup = false } EOF |
Line 1: Check if a
.nextflow/config
file already exists in your home directory. Create it if it does not existLine 2-15: Paste Using the cat command, paste text in the newly created
.nextflow/config
file which specifies the cache location for your singularity and conda.What are the parameters you are setting?
Line 3-6 set the directory where remote Singularity images are stored and direct Nextflow to automatically mount host paths in the executed container.
Line 7-9 set the directory where Conda environments are stored.
Line 10-14 sets default directives for processes in your pipeline. Note that the executor is set to pbspro on line 11.
More in depth information on Nextflow configuration is described here: https://www.nextflow.io/docs/latest/config.html.
Nextflow pipeline repositories
...