...
For more information about Nextflow, please visit Nextflow - A DSL for parallel and scalable computational pipelines
Installing Nextflow
Nextflow is meant to run from your home folder on a Linux machine like the HPC.
...
Line 1: The module load command is necessary to ensure Java is available.
Line 2: This command downloads and assembles the parts of Nextflow. This step might take some time.
Line 3: When finished, the nextflow binary will be in the current folder, so it should be moved to your “bin” folder so it can be found later.
Line 5: Make a temporary folder for Nextflow to create files when it runs.
Line 6: Verify Nextflow is working.
Lines 7 and 8: Clean up
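As a quick check after the steps above, the following sketch confirms that the tools the installation relies on can be found on your PATH. The helper name `check_tool` is hypothetical, and the sketch assumes the nextflow binary was moved to a folder that is already on your PATH.

```shell
#!/usr/bin/env bash
# check_tool is a hypothetical helper: report whether a command is on the PATH.
check_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool -> $(command -v "$tool")"
    else
        echo "missing: $tool" >&2
        return 1
    fi
}

# Java must be available (via module load) before Nextflow can start.
check_tool java || echo "Load the Java module first."
# The nextflow binary should be found once it is in your bin folder.
check_tool nextflow || echo "Check that your bin folder is on the PATH."
```

If either tool is reported as missing, repeat the corresponding installation step before continuing.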
Nextflow’s Default Configuration
Once Nextflow is installed, there are some settings that should be applied to take advantage of the HPC environment. Nextflow has a hierarchy of configuration files, where the base configuration that is applied to every workflow you run is here:
...
Copy and paste the following text into your Linux command line and press Enter. This will make a few small changes to your local account so that Nextflow can run correctly.
```shell
[[ -d ${HOME}/.nextflow ]] || mkdir -p ${HOME}/.nextflow

cat <<EOF > ${HOME}/.nextflow/config
singularity {
    cacheDir = '${HOME}/.nextflow/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
}
conda {
    cacheDir = '${HOME}/.nextflow/NXF_CONDA_CACHEDIR'
}
process {
    executor = 'pbspro'
    scratch = false
    cleanup = false
}
EOF
```
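To confirm that the configuration file was written as intended, a small check like the one below can help. It is only a sketch: the helper name `config_has_setting` is hypothetical, and it simply searches the file for a matching line rather than parsing the Nextflow configuration syntax.

```shell
#!/usr/bin/env bash
# config_has_setting is a hypothetical helper: succeed if the given
# Nextflow config file exists and contains a line matching the pattern.
config_has_setting() {
    local config_file="$1"
    local pattern="$2"
    [ -f "$config_file" ] && grep -Eq "$pattern" "$config_file"
}

config_file="${HOME}/.nextflow/config"
if config_has_setting "$config_file" "executor *= *'pbspro'"; then
    echo "PBS Pro executor is configured in $config_file"
else
    echo "No pbspro executor setting found in $config_file"
fi
```

If Nextflow is already on your PATH, running `nextflow config` prints the configuration that Nextflow resolves, which is a more complete check than a text search.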
Preparing Data
Typically, the pipeline you want to run expects the data to be prepared in a particular way. Check the pipeline’s help or website for a guide. Help is typically accessed with:
...
Some pipelines may need file names, and others may want a CSV file with file names, paths to raw data, and other information.
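For pipelines that take a CSV file, the file can often be generated from the data folder instead of being typed by hand. The sketch below is an illustration only: the helper name `make_samplesheet`, the `.fastq.gz` extension, and the column names `sample` and `fastq` are all assumptions, so replace them with whatever your pipeline's documentation specifies.

```shell
#!/usr/bin/env bash
# make_samplesheet is a hypothetical helper: write a CSV with one row per
# .fastq.gz file found in a data folder. The column names "sample" and
# "fastq" are assumptions; use the headers your pipeline documents.
make_samplesheet() {
    local data_dir="$1"
    local out_csv="$2"
    local f sample
    echo "sample,fastq" > "$out_csv"
    for f in "$data_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue              # skip when no files match
        sample="$(basename "$f" .fastq.gz)"  # file name without the extension
        echo "${sample},${f}" >> "$out_csv"
    done
}

# Example (hypothetical folder and file names):
# make_samplesheet ./data samplesheet.csv
```

Paths are written exactly as given, so pass an absolute data folder if the pipeline needs absolute paths.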
Running Nextflow
When you run Nextflow, it is a good idea to create a folder for the run - this keeps all the files separate and easy to manage.
...
Lines 1-5 are typical PBS directives: the job is named MyNextflowRun, 2 CPUs and 4 GB of RAM are requested, and the job may run for up to 24 hours. This time limit must cover the entire pipeline run, which may take days or weeks depending on the pipeline and the amount of data.
Line 6 ensures the Java environment is available (Nextflow needs Java to run).
Line 7 tells Nextflow how much RAM to use.
Line 8 runs Nextflow.
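As an illustration of the memory setting, one common way to limit the memory used by the Nextflow launcher is the `NXF_OPTS` environment variable, which passes options to the Java virtual machine. This is an assumption about the approach, not a copy of the script described above; the helper name `nxf_opts_for_mem` and the 1 GB initial heap are illustrative choices.

```shell
#!/usr/bin/env bash
# nxf_opts_for_mem is a hypothetical helper: print JVM heap options with a
# 1 GB initial heap and a maximum heap of the given number of gigabytes.
nxf_opts_for_mem() {
    local max_gb="$1"
    echo "-Xms1g -Xmx${max_gb}g"
}

# Assumption: keep the JVM heap within the 4 GB requested from PBS.
export NXF_OPTS="$(nxf_opts_for_mem 4)"
echo "NXF_OPTS=${NXF_OPTS}"
```

This setting only limits the Nextflow launcher itself; the individual pipeline tasks are submitted as separate jobs with their own resource requests.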
To see the output of Nextflow while running as a job, you can use the Nextflow Tower.
Using the Nextflow Tower
Nextflow Tower allows you to monitor Nextflow runs. To use the NFTower, please visit
https://nftower.qut.edu.au or the BioCommons: https://tower.services.biocommons.org.au/
(Control-Click this to open in a new Window)
The Tower does not use passwords. Instead, you sign in with a link sent to your email.
Look for the Sign in button (Top Right) then provide your email address.
...
Nextflow
...