
...

Nextflow is free, open-source pipeline management software that enables scalable and reproducible scientific workflows. It allows the adaptation of pipelines written in the most common scripting languages.
Key features of Nextflow:
- Reproducible → version control and containers ensure the reproducibility of Nextflow pipelines
- Portable → compute agnostic (e.g., HPC, cloud, desktop)
- Scalable → run anything from a single sample to thousands of samples
- Minimal digital literacy → accessible to anyone
- Active global community → more and more Nextflow pipelines are available (e.g., https://nf-co.re/pipelines)

...

To install Nextflow, copy and paste the following block of code into your terminal (e.g., an already connected PuTTY session) and press Enter:

Code Block
module load java
curl -s https://get.nextflow.io | bash
mv nextflow $HOME/bin

Line 1: The module load command ensures that Java is available.
Line 2: This command downloads and assembles the parts of Nextflow. This step might take some time.
Line 3: When finished, the nextflow binary will be in the current folder, so it should be moved to your "bin" folder so that it can be found later.
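
The move in line 3 assumes that $HOME/bin already exists and is on your PATH. If the folder is missing, mv would instead rename the binary to a file called bin. The following is a minimal sketch of a check you could run first; it only touches the current session and is not part of the original instructions:

```shell
# Create the personal bin folder if it is missing.
mkdir -p "$HOME/bin"

# Add it to PATH for the current session only, unless it is already there.
case ":$PATH:" in
  *":$HOME/bin:"*) ;;                    # already on PATH, nothing to do
  *) export PATH="$HOME/bin:$PATH" ;;    # prepend for this session
esac

# Report where the shell finds nextflow, if anywhere.
command -v nextflow || echo "nextflow not found on PATH yet"
```

To make the PATH change permanent, the export line would need to go into your shell start-up file; which file that is depends on how your system is set up.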

To verify that Nextflow is installed properly, you can run a simple Nextflow pipeline called hello locally:

Code Block
mkdir $HOME/nftemp && cd $HOME/nftemp
nextflow run hello

Line 1: Make a temporary folder for Nextflow to create files when it runs.
Line 2: Verify Nextflow is working.

You should see something like this:

...

Code Block

[[ -d $HOME/.nextflow ]] || mkdir -p $HOME/.nextflow
cat <<EOF > $HOME/.nextflow/config
singularity {
    cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
}
conda {
    cacheDir = '$HOME/.nextflow/NXF_CONDA_CACHEDIR'
}
process {
  executor = 'pbspro'
  scratch = false
  cleanup = false
}
EOF

Line 1: Check whether a .nextflow directory already exists in your home directory, and create it if it does not.
Line 2-15: Using the cat command, write the text into the .nextflow/config file, which specifies the cache locations for Singularity and Conda.
What are the parameters you are setting?
Line 3-6 set the directory where remote Singularity images are stored and direct Nextflow to automatically mount host paths in the executed container.
Line 7-9 set the directory where Conda environments are stored.
Line 10-14 set default directives for the processes in your pipeline. Note that the executor is set to pbspro on line 11.
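
Because the EOF delimiter in the cat command is unquoted, the shell expands $HOME while the file is being written, so the saved config contains an absolute path rather than the literal text $HOME. A minimal sketch that demonstrates this on a throwaway file (the temporary directory is an assumption made for illustration, so your real config is left untouched):

```shell
# Write a cut-down version of the config to a temporary location.
tmpdir=$(mktemp -d)
cat <<EOF > "$tmpdir/config"
singularity {
    cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'
}
EOF

# The printed line shows your actual home directory, not the text $HOME:
# single quotes inside an unquoted heredoc do not prevent expansion.
grep cacheDir "$tmpdir/config"
```

Running the same grep against $HOME/.nextflow/config is a quick way to confirm what was actually written.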

More in-depth information on Nextflow configuration is available here: https://www.nextflow.io/docs/latest/config.html

...

Launching Nextflow using a PBS script

Input specifications

...

Samplesheet input

Nextflow pipelines generally need an input file, often referred to as a samplesheet, which contains information about the samples you would like to analyse.

...

Column names have to be specified in a header row, as shown in the samplesheet example below:

...

sample,fastq_1
Clone1_N1,s3://ngi-igenomes/test-data/smrnaseq/C1-N1-R1_S4_L001_R1_001.fastq.gz
Clone1_N3,s3://ngi-igenomes/test-data/smrnaseq/C1-N3-R1_S6_L001_R1_001.fastq.gz
Clone9_N1,s3://ngi-igenomes/test-data/smrnaseq/C9-N1-R1_S7_L001_R1_001.fastq.gz
Clone9_N2,s3://ngi-igenomes/test-data/smrnaseq/C9-N2-R1_S8_L001_R1_001.fastq.gz
Clone9_N3,s3://ngi-igenomes/test-data/smrnaseq/C9-N3-R1_S9_L001_R1_001.fastq.gz
Control_N1,s3://ngi-igenomes/test-data/smrnaseq/Ctl-N1-R1_S1_L001_R1_001.fastq.gz
Control_N2,s3://ngi-igenomes/test-data/smrnaseq/Ctl-N2-R1_S2_L001_R1_001.fastq.gz
Control_N3,s3://ngi-igenomes/test-data/smrnaseq/Ctl-N3-R1_S3_L001_R1_001.fastq.gz
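
Before launching a pipeline, it can help to confirm that the header is present and that every row has the same number of columns. The following is a minimal sketch of such a check; the temporary file and the shortened file paths are hypothetical, and each pipeline performs its own, stricter validation:

```shell
# Hypothetical two-column samplesheet written to a temporary file.
sheet=$(mktemp)
cat > "$sheet" <<'EOF'
sample,fastq_1
Clone1_N1,reads/C1-N1_R1.fastq.gz
Clone1_N3,reads/C1-N3_R1.fastq.gz
EOF

# Read the header, then list any row whose field count differs from it.
header=$(head -n 1 "$sheet")
bad_rows=$(awk -F',' 'NR==1 {n=NF; next} NF!=n {print NR}' "$sheet")

echo "header: $header"
if [ -z "$bad_rows" ]; then
  echo "all rows have the same number of columns as the header"
else
  echo "rows with a different column count: $bad_rows"
fi
```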

...

For the nf-core/rnaseq pipeline, the samplesheet has to be a comma-separated file with the following 4 columns:

...

Column names have to be specified in a header row, as shown in the samplesheet example below:

...

sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto

...

Please note that in this example, the same sample (CONTROL_REP1) was sequenced across 3 lanes. The nf-core/rnaseq pipeline will concatenate the raw reads before performing any downstream analysis.
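
To see how many rows belong to each sample, you can count the values in the first column. A minimal sketch using the three-lane example above, written to a temporary file for illustration:

```shell
# Copy of the example samplesheet in a temporary file.
sheet=$(mktemp)
cat > "$sheet" <<'EOF'
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto
EOF

# Skip the header, keep the sample column, and count the rows per sample.
tail -n +2 "$sheet" | cut -d',' -f1 | sort | uniq -c

# Number of data rows and number of distinct sample names.
n_rows=$(tail -n +2 "$sheet" | wc -l | tr -d '[:space:]')
n_samples=$(tail -n +2 "$sheet" | cut -d',' -f1 | sort -u | wc -l | tr -d '[:space:]')
echo "$n_rows rows, $n_samples distinct sample(s)"
```

A sample that appears on more than one row is one that was sequenced more than once, which is also a useful check when working through the exercises below.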

Exercise 1

...

The following samplesheet file for the nf-core/rnaseq pipeline consisting of both single- and paired-end data is ready for analysis.
...
Solution:
There are 6 samples in total, as `TREATMENT_REP3` has been sequenced twice. There are 3 single-end and 3 paired-end samples.

Exercise 2

...

Find the minimum set of columns required in the samplesheet to run nf-core/ampliseq.
Solution:
You will need to go to the usage page of nf-core/ampliseq, which can be found at https://nf-co.re/ampliseq/2.9.0/docs/usage#samplesheet-input (make sure you are viewing the page for the latest version of the pipeline).
The input specification section states that the samplesheet must contain at least 2 columns: `sampleID` and `forwardReads`.

Input folder

...

Some pipelines, like nf-core/ampliseq, let you specify the path to the folder that contains your input FASTQ files directly, as an alternative to using a samplesheet.
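
When pointing a pipeline at a folder, it is worth checking first what the folder actually contains. A minimal sketch, assuming a hypothetical folder filled with empty placeholder files and assuming that your reads use the .fastq.gz extension:

```shell
# Hypothetical input folder with placeholder files, for illustration only.
input_dir=$(mktemp -d)
touch "$input_dir/sampleA_R1.fastq.gz" "$input_dir/sampleA_R2.fastq.gz" "$input_dir/notes.txt"

# Count the files in the top level of the folder that look like gzipped FASTQ.
n_fastq=$(find "$input_dir" -maxdepth 1 -type f -name '*.fastq.gz' | wc -l | tr -d '[:space:]')
echo "$n_fastq FASTQ files found in $input_dir"
```

The file naming pattern a pipeline expects in an input folder is pipeline specific, so check its usage page before relying on a count like this.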
...

Version	Old Version 19	New Version 20
Changes made by	Marie-Emilie Gauthier	Marie-Emilie Gauthier
Saved on	Jun 24, 2024	Jun 24, 2024
