2024-S2 eResearch - Session 3: Introduction to Nextflow

 

 

This instructional material was originally developed by Maely Gauthier in 2024 as part of the QUT eResearch infrastructure. It is free to distribute, provided that you acknowledge eResearch in any outputs (e.g. training, presentation slides, publications) that result from using this training material.

Some sections of this course were adapted from the Carpentries Incubator lesson: https://carpentries-incubator.github.io/workflows-nextflow/.

Aims

  • Learn what Nextflow is

  • Install and configure Nextflow

  • Find pipelines on repositories (e.g. nf-core and epi2me)

  • Run pipelines using either the command line or a PBS script

  • Understand input and parameter specifications

  • Understand the concept of caching and the resume function

  • Understand how Nextflow pipelines output results

What will be covered during the workshop

  • 1. Getting started with Nextflow

    • What is Nextflow?

    • Installing Nextflow

    • Nextflow’s base configuration

  • 2. Nextflow pipeline repositories

    • nf-core

      • What is nf-core?

      • What are nf-core pipelines?

      • Searching for available nf-core pipelines

      • nf-core support

    • epi2me workflows

  • 3. Running pipelines

    • Fetching pipeline code

    • Software requirements for pipelines

    • Install and test that the pipeline installed successfully

      • From the command line

      • Launching Nextflow using a PBS script

  • 4. Input specifications

    • Samplesheet input

      • Examples of samplesheets

      • Exercise 1

      • Exercise 2

    • Input folder

  • 5. Parameters

    • Finding list of parameters available

      • Exercise 1

    • Specifying parameters on the command line

  • 6. Nextflow caching

    • Resume option

    • Structure of work folder

      • Task execution directory

      • Specifying another work directory

      • Clean the work directory

  • 7. Nextflow pipeline outputs

    • Results folder

    • Nextflow log, metrics and reports

  • 8. Where to from now?

 

Prerequisites

You will need a basic knowledge of Linux/Unix commands to participate effectively in this workshop. We assume participants have either attended the first two workshops, reviewed the materials from those workshops (if unable to attend) and are comfortable with them, or are already using the HPC.

You can watch some videos that go over the basics: https://mediahub.qut.edu.au/media/t/0_d0bsv333

Initial requirements

 

To be able to run these exercises, you’ll need:

  1. An HPC account

  2. PuTTY installed on your local computer

  3. Access to your HPC home directory from your PC

 

Instructions for getting an HPC account are here: https://qutvirtual4.qut.edu.au/group/staff/research/conducting/facilities/advanced-research-computing-storage/supercomputing/getting-started-with-hpc

 

You’ll need PuTTY on your PC to access the HPC.

You can download PuTTY from here: https://the.earth.li/~sgtatham/putty/latest/w64/putty.exe

Open PuTTY, enter the HPC (Lyra) address, lyra.qut.edu.au, in the Host Name field, and then click ‘Open’.


 

Set up Windows File Explorer to access your HPC home directory. Follow the instructions here:

https://qutvirtual4.qut.edu.au/group/staff/research/conducting/facilities/advanced-research-computing-storage/supercomputing/using-hpc-filesystems