
...

As a first exercise we will download and run nf-core/smrnaseq, a bioinformatics best-practice analysis pipeline for Small RNA-Seq. We will use the test data provided by the developers to check that the pipeline installed successfully. This control dataset contains 8 samples.

Run the following commands:

Code Block
cd $HOME/workshop/2024-2/session3
mkdir smrnaseq_cl
cd smrnaseq_cl
export NXF_OPTS='-Xms1g -Xmx4g'
nextflow pull file:///work/training/smrnaseq
nextflow run file:///work/training/smrnaseq -profile test,singularity --outdir results -r 2.3.1
  • Line 1: Move to the directory created for this workshop.

  • Line 2: Make a temporary folder called smrnaseq_cl for Nextflow to test the smrnaseq pipeline.

  • Line 3: Change directory to the newly created folder smrnaseq_cl.

  • Line 4: In some cases the Nextflow Java virtual machine can request a large amount of memory. This line caps the JVM heap at 4 GB (with a 1 GB initial heap) to limit this.

  • Lines 5-6: Download the pipeline, then run it with the test profile.
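The export on line 4 matters because only exported variables are inherited by child processes such as the nextflow launcher. The sketch below illustrates this with a throwaway bash -c child standing in for nextflow. The names DEMO_UNEXPORTED, exported_view and unexported_view are illustrative only and are not part of Nextflow.

```shell
# Exported: visible to any child process started from this shell.
export NXF_OPTS='-Xms1g -Xmx4g'
exported_view=$(bash -c 'echo "${NXF_OPTS:-<unset>}"')

# Not exported (illustrative variable): stays local to the current shell.
DEMO_UNEXPORTED='-Xms1g -Xmx4g'
unexported_view=$(bash -c 'echo "${DEMO_UNEXPORTED:-<unset>}"')

echo "child sees NXF_OPTS:        $exported_view"
echo "child sees DEMO_UNEXPORTED: $unexported_view"
```

In the JVM options themselves, -Xms sets the initial heap size and -Xmx the maximum heap size.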

This will download the smrnaseq pipeline and then run the test code. It should take ~20-30 minutes to run to completion.

Nextflow will first download the pipeline:

...

It will then display the version of the pipeline which was downloaded: version 2.3.1.

It will also list all the parameters that differ from the pipeline defaults.

...

Before running a process, it will download the required Singularity images and the reference and input files needed for testing.

In the screenshot below, all the jobs which will be run are listed.

We can see that 7 jobs have started:

  • FASTQ_FASTQC_UMITOOLS_FASTP:FASTQC_RAW: 3 jobs are running; the latest job that started is for sample Clone9_N1

  • FASTQ_FASTQC_UMITOOLS_FASTP:FASTP: 3 jobs are running; the latest job that started is for sample Clone9_N1

  • INDEX_GENOME (genome.fa): 1 job has started

At the bottom you can see that 6 files (including test fastq.gz input files and reference files) have also been downloaded.

...

You can check the full list of jobs that have been submitted at any point in time by opening a separate terminal and using the command:

...

Going back to the terminal from which you launched the Nextflow analysis, you can check the nextflow log to see how the analysis is progressing.

For example, in the screenshot below, taken halfway through the Nextflow analysis, several processes have run to completion for all 8 samples tested.

...

At the bottom, the message ‘Pipeline completed successfully’ will be printed along with the duration, the CPU hours and the number of jobs that ran to completion.
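If you have closed the terminal, one way to check for this message afterwards is to search the Nextflow log. The sketch below assumes the message is also written to the .nextflow.log file that Nextflow creates in the launch directory. The helper name pipeline_succeeded is hypothetical and not a Nextflow command.

```shell
# Hypothetical helper: succeed only if the given log contains the success message.
# Assumes the message text matches what is printed on screen.
pipeline_succeeded() {
    log_file="${1:-.nextflow.log}"
    [ -f "$log_file" ] && grep -q 'Pipeline completed successfully' "$log_file"
}

if pipeline_succeeded .nextflow.log; then
    echo "Pipeline finished successfully"
else
    echo "No success message found (still running, failed, or no log in this folder)"
fi
```

Run it from the folder where you launched Nextflow (smrnaseq_cl in this exercise).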

...

If you run the command below, you will see that Nextflow created 2 folders (results and work):

Code Block
ls

You can inspect the results which have been output by typing:

...

You can browse a couple of results folders to check what sort of outputs were generated by the pipeline.
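For a quick overview before browsing, the sketch below prints each top-level subfolder of an output directory together with the number of files it contains. The helper name summarise_outputs is hypothetical, and the sketch makes no assumption about which subfolders the pipeline actually creates.

```shell
# Hypothetical helper: print "<subfolder><TAB><file count>" for each top-level subfolder.
summarise_outputs() {
    out_dir="$1"
    for sub in "$out_dir"/*/; do
        [ -d "$sub" ] || continue
        # Count regular files at any depth; strip padding that some wc versions add.
        n_files=$(find "$sub" -type f | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$(basename "$sub")" "$n_files"
    done
}

# Only summarise if a results folder exists in the current directory.
if [ -d results ]; then
    summarise_outputs results
fi
```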

...

Create a separate rnaseq_pbs folder in the workshop directory and move into it:

Code Block
mkdir -p $HOME/workshop/2024-2/session3/rnaseq_pbs
cd $HOME/workshop/2024-2/session3/rnaseq_pbs

Create the script file rnaseq_test.sh by running the following command:

Code Block
cat <<EOF > rnaseq_test.sh
#!/bin/bash -l
#PBS -N nfrnaseq_test
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=6:00:00

cd \$PBS_O_WORKDIR
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run file:///work/training/nextflow_intro/rnaseq -r 3.14.0 -profile test,singularity --outdir results
EOF
  • Line 3: Set your PBS job name to be nfrnaseq_test

  • Line 4: Specify the memory and CPU resources that you want to allocate to your job

  • Line 5: Specify that you want to allocate 6 hours for your job to run to completion

  • Line 7: Change directory to $PBS_O_WORKDIR, which is a special environment variable created by PBS. This will be the folder where you ran the qsub command

  • Line 8: Load java

  • Line 9: In some cases the Nextflow Java virtual machine can request a large amount of memory. This line caps the JVM heap at 4 GB (with a 1 GB initial heap) to limit this

  • Line 11: Run the nf-core/rnaseq pipeline using the test data provided
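Note the backslash in \$PBS_O_WORKDIR. Inside an unquoted here-document (<<EOF) the shell expands variables while writing the file, so the backslash is needed to keep the variable literal until the job runs. The sketch below shows the difference using a temporary folder. The file names and the placeholder value are illustrative only.

```shell
# Work in a throwaway folder so nothing in the workshop directory is touched.
demo_dir=$(mktemp -d)

# Placeholder value, used only to make early expansion visible.
PBS_O_WORKDIR="expanded-too-early"

# Escaped: the variable name is written to the file and expanded later, when the script runs.
cat <<EOF > "$demo_dir/escaped.sh"
cd \$PBS_O_WORKDIR
EOF

# Unescaped: the variable is expanded now, while the file is being written.
cat <<EOF > "$demo_dir/unescaped.sh"
cd $PBS_O_WORKDIR
EOF

escaped_line=$(cat "$demo_dir/escaped.sh")
unescaped_line=$(cat "$demo_dir/unescaped.sh")
echo "escaped:   $escaped_line"
echo "unescaped: $unescaped_line"
```

An alternative is to quote the delimiter (<<'EOF'), which disables expansion for the whole here-document so that no backslashes are needed.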

You can check the content of the PBS script you just created using the command:

Code Block
cat rnaseq_test.sh

Make the script executable and then submit your job to the PBS queue by running the following commands:

...

Once again you can monitor your jobs using the qstat -u $USER command.

The test should take ~30 minutes to run.

...