...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Aims:
Install Nextflow in your HPC environment
Run a test Nextflow task to verify the installation process
eResearch portal: https://eresearchqut.atlassian.net/servicedesk/customer/portals
Background
What is Nextflow?
Nextflow is free, open-source workflow management software that enables scalable and reproducible scientific workflows. It lets you build pipelines from steps written in the most common scripting languages.
Key features of Nextflow:
Reproducible → version control and the use of containers ensure the reproducibility of Nextflow pipelines
Portable → compute agnostic (e.g., HPC, cloud, desktop)
Scalable → run from a single sample to thousands of samples
Minimal digital literacy → accessible to anyone
Active global community → a growing number of Nextflow pipelines are available (e.g., https://nf-co.re/pipelines )
...
For more information about Nextflow, please visit Nextflow - A DSL for parallel and scalable computational pipelines
Work in the HPC |
---|
Installing Nextflow
Nextflow is meant to run from your home folder on a Linux machine like the HPC.
Nextflow can be installed with a few commands. Copy and paste the following block of code into your terminal (e.g., a PuTTY session that is already connected to the HPC) and press 'Enter'.
Code Block |
---|
module load java
curl -s https://get.nextflow.io | bash
mv nextflow $HOME/bin
#verify Nextflow is installed
mkdir $HOME/nftemp && cd $HOME/nftemp
nextflow run hello #check for output of running the short nextflow hello pipeline
cd $HOME
rm -rf nftemp
Line 1: The module load command is necessary to ensure Java is available.
Line 2: This command downloads and assembles the parts of Nextflow. This step might take some time.
Line 3: When finished, the nextflow binary will be in the current folder, so it should be moved to your "bin" folder so it can be found later.
Line 5: Make a temporary folder for Nextflow to create files when it runs.
Line 6: Verify Nextflow is working.
Lines 7 and 8: Clean up
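Line 3 only helps if `$HOME/bin` is on your `PATH`. The following is a minimal sketch of a check, assuming a bash shell. The helper name `dir_on_path` is hypothetical and is not part of Nextflow or the HPC environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nextflow): succeeds if the given
# directory is one of the colon-separated entries in PATH.
dir_on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if dir_on_path "$HOME/bin"; then
    echo "$HOME/bin is on PATH"
else
    echo "$HOME/bin is not on PATH; consider adding it in your shell startup file"
fi

# Show where the shell finds nextflow, if anywhere.
command -v nextflow || echo "nextflow not found on PATH"
```

If the last command prints a path ending in `bin/nextflow`, the shell can find the binary from any folder. The check compares entries literally, so a `PATH` entry written with a trailing slash would not match.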
You should see something like this:
...
Nextflow’s Default Configuration
Once Nextflow is installed, there are some settings that should be applied to take advantage of the HPC environment. Nextflow has a hierarchy of configuration files, where the base configuration that is applied to every workflow you run is here:
...
Code Block |
---|
[[ -d $HOME/.nextflow ]] || mkdir -p $HOME/.nextflow
cat <<EOF > $HOME/.nextflow/config
singularity {
    cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
    enabled = true
    runOptions = '-B /data1'
}
conda {
    cacheDir = '$HOME/.nextflow/NXF_CONDA_CACHEDIR'
}
process {
    executor = 'pbspro'
    scratch = true
    cleanup = false
}
includeConfig '/work/datasets/reference/nextflow/qutgenome.config'
plugins {
    id 'nf-validation@1.1.1' // Validation of pipeline parameters and creation of an input channel from a samplesheet
    id 'nf-prov@1.2.1' // Provenance reports for pipeline runs
}
EOF
Check that the configuration file was created and contains the expected settings:
Code Block |
---|
cat $HOME/.nextflow/config |
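Beyond reading the file by eye, a short script can confirm that the key settings were written. This is a minimal sketch, assuming bash and grep. The helper name `config_has` is hypothetical, and the patterns are only examples drawn from the configuration on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if file $1 has a line matching the
# extended regular expression $2.
config_has() {
    grep -Eq "$2" "$1"
}

cfg="$HOME/.nextflow/config"

if [ -f "$cfg" ]; then
    # Example patterns only; adjust them to the settings you care about.
    for pattern in "executor *= *'pbspro'" "enabled *= *true" "cacheDir"; do
        if config_has "$cfg" "$pattern"; then
            echo "found:   $pattern"
        else
            echo "missing: $pattern"
        fi
    done
else
    echo "no config file at $cfg"
fi
```

A "missing" line suggests the file was not written as expected and the previous step should be repeated. Running `nextflow config` should also print the configuration that Nextflow resolves, which is a useful cross-check that this file is being picked up.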
...