...

For more information about Nextflow, visit the Nextflow website, "Nextflow - A DSL for parallel and scalable computational pipelines" (https://www.nextflow.io).

Installing Nextflow

Nextflow is meant to be run from your home folder on a Linux machine such as the HPC.

...

  • Line 1: The module load command makes Java available (Nextflow needs Java to run).

  • Line 2: This command downloads and assembles the parts of Nextflow; this step might take some time.

  • Line 3: When finished, the nextflow binary will be in the current folder, so move it to your “bin” folder where it can be found later.

  • Line 5: Make a temporary folder for Nextflow to use for the files it creates while running.

  • Line 6: Verify that Nextflow is working.

  • Lines 7 and 8: Clean up.
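Line 3 only works if your “bin” folder is on your PATH, because that is where the shell looks for commands such as nextflow. The following is a minimal sketch, assuming bash and a bin folder at ${HOME}/bin. It creates the folder if needed and reports whether it is on your PATH. The helper name dir_on_path is hypothetical and not part of any HPC tooling.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (returns 0) if directory $1 is an entry in $PATH.
dir_on_path() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Make sure the bin folder exists (mkdir -p does nothing if it already does).
mkdir -p "${HOME}/bin"

if dir_on_path "${HOME}/bin"; then
  echo "${HOME}/bin is on your PATH, so nextflow will be found there."
else
  echo "${HOME}/bin is not on your PATH. Add this line to e.g. ~/.bashrc:"
  # Single quotes print the line literally, ready to paste.
  echo '  export PATH="${HOME}/bin:${PATH}"'
fi
```

The check wraps PATH in colons so that a directory is matched only as a whole entry, not as part of a longer path.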

Nextflow’s Default Configuration

Once Nextflow is installed, apply a few settings so that it takes advantage of the HPC environment. Nextflow reads a hierarchy of configuration files; the base configuration, which applies to every workflow you run, is located here:

...

Copy and paste the following text into your Linux command line and press Enter. It creates the .nextflow folder in your home folder and writes the configuration file (overwriting any existing one) so that Nextflow can run correctly.

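# Create the Nextflow settings folder if it does not exist yet.
[[ -d "${HOME}/.nextflow" ]] || mkdir -p "${HOME}/.nextflow"
# Write the default config. EOF is unquoted, so ${HOME} is expanded now and
# the file contains your real home path. Any existing config is overwritten.
cat <<EOF > "${HOME}/.nextflow/config"
singularity {
    cacheDir = '${HOME}/.nextflow/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
}
conda {
    cacheDir = '${HOME}/.nextflow/NXF_CONDA_CACHEDIR'
}
process {
  // Submit each pipeline task to the cluster as a PBS Pro job.
  executor = 'pbspro'
  // Run tasks in the shared work folder rather than node-local scratch.
  scratch = false
  cleanup = false
}
EOF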

Preparing Data

Most pipelines expect the input data to be prepared in a particular way. Check the pipeline’s help or website for a guide. Help is usually accessed with:

...

Some pipelines take file names directly; others expect a CSV file listing file names, paths to raw data, and other information.
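As an illustration of the CSV style, here is a minimal sketch that builds a samplesheet from paired-end FASTQ files. Everything specific in it is an assumption: the function name make_samplesheet, the file naming <sample>_R1.fastq.gz / <sample>_R2.fastq.gz, and the column names sample,fastq_1,fastq_2 (a layout used by some pipelines). Check your pipeline’s documentation for the columns it actually requires.

```shell
#!/bin/bash
# Hypothetical helper: write a CSV samplesheet for paired FASTQ files.
# Usage: make_samplesheet <data_dir> <output.csv>
# Assumes reads are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
make_samplesheet() {
  local data_dir="$1" out="$2"
  local r1 r2 sample

  # Header row (assumed column names; adjust to your pipeline).
  echo "sample,fastq_1,fastq_2" > "$out"

  for r1 in "$data_dir"/*_R1.fastq.gz; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$r1" ] || continue
    sample=$(basename "$r1" _R1.fastq.gz)
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "warning: no R2 file for ${sample}, skipping" >&2
      continue
    fi
    # Absolute paths let the pipeline find the files from any folder.
    echo "${sample},$(realpath "$r1"),$(realpath "$r2")" >> "$out"
  done
}
```

For example, make_samplesheet raw_data samplesheet.csv writes one row per sample found in raw_data and warns about any sample that is missing its R2 file.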

Running Nextflow

When you run Nextflow, it is a good idea to create a folder for each run; this keeps the run’s files separate and easy to manage.

...

  • Lines 1-5 are typical PBS directives. Here the job is named MyNextflowRun, 2 CPUs and 4 GB of RAM are requested, and the job may run for up to 24 hours. This walltime covers the entire pipeline run, which may take days or weeks depending on the pipeline and the amount of data.

  • Line 6 ensures the Java environment is available (Nextflow needs Java to run).

  • Line 7 tells Nextflow how much RAM to use.

  • Line 8 runs Nextflow.
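The following is a minimal sketch of a run folder and job script along the lines described above, assuming bash and PBS Pro; its line positions do not match the numbered description. The folder name my_nextflow_run, the file name run_nextflow.pbs, the module name java, the NXF_OPTS heap sizes and the pipeline name hello are hypothetical placeholders. Because the default configuration sets executor = 'pbspro', this job only hosts the Nextflow process itself; each pipeline task is submitted as its own PBS job, which is why a small request such as 2 CPUs and 4 GB can be enough.

```shell
#!/bin/bash
# Create a dedicated folder for this run (hypothetical name).
mkdir -p my_nextflow_run

# Write the job script. The quoted 'EOF' stops the shell from expanding
# variables now, so $PBS_O_WORKDIR is expanded later, when the job runs.
cat <<'EOF' > my_nextflow_run/run_nextflow.pbs
#!/bin/bash -l
#PBS -N MyNextflowRun
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
cd "$PBS_O_WORKDIR"

# Make Java available (module name is an assumption; see `module avail`).
module load java

# Limit the Java heap used by Nextflow (placeholder values below the 4 GB request).
export NXF_OPTS='-Xms1g -Xmx3g'

# Run the pipeline; replace "hello" with the pipeline you want to run.
nextflow run hello
EOF

# Submit from inside the run folder so that $PBS_O_WORKDIR points there:
#   cd my_nextflow_run && qsub run_nextflow.pbs
```

Quoting the heredoc delimiter here is the opposite choice to the default-configuration step, where ${HOME} is meant to be expanded immediately.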

To follow Nextflow’s progress while it runs as a job, you can use Nextflow Tower.

Using the Nextflow Tower

Nextflow Tower allows monitoring of Nextflow runs. To use Nextflow Tower, visit either

https://nftower.qut.edu.au or the BioCommons instance: https://tower.services.biocommons.org.au/


Tower does not use passwords; instead, you sign in with a link sent to your email.

Click the Sign in button (top right), then enter your email address.
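Once you can sign in, a run can report to Tower through Nextflow’s tower configuration scope. The following is a minimal sketch, assuming bash, run inside your run folder. The file name tower.config is hypothetical, and the endpoint path /api is an assumption about these Tower instances, so confirm it before relying on it. Nextflow reads the access token from the TOWER_ACCESS_TOKEN environment variable; the token itself is created in the Tower web interface.

```shell
#!/bin/bash
# Write a per-run config that switches on Tower monitoring.
# Quoted 'EOF': the text is written exactly as shown.
cat <<'EOF' > tower.config
tower {
    enabled = true
    // Assumed API endpoint for the QUT instance. For BioCommons it would
    // presumably be https://tower.services.biocommons.org.au/api
    endpoint = 'https://nftower.qut.edu.au/api'
}
EOF

# Then, in the job script, before and on the `nextflow run` line
# (the token value is a placeholder; keep real tokens private):
#   export TOWER_ACCESS_TOKEN='<token from the Tower web interface>'
#   nextflow run hello -c tower.config
```

Keeping the Tower settings in a separate file passed with -c leaves the default configuration in your home folder unchanged.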

...

  • Nextflow

...
