...
As a first exercise we will download and run nf-core/smrnaseq, a bioinformatics best-practice analysis pipeline for small RNA-Seq. We will use the test data provided by the developers to confirm that the pipeline is installed correctly. This control dataset contains 8 samples.
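Before launching the pipeline, it can be worth confirming that the tools it relies on are available in your session. The sketch below is not part of the original exercise: `missing_tools` is a hypothetical helper name, and it assumes that `nextflow` and `singularity` are expected on your `PATH` (on some systems they only become available after loading a module).

```shell
# Hypothetical helper: print every command name that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the command cannot be found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool names for this exercise; no output means both were found.
missing_tools nextflow singularity
```

Any name printed by the last line is a tool that could not be found in the current session.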
Run the following commands from your home directory:
Code Block |
---|
cd $HOME/workshop/2024-2/session3
mkdir smrnaseq_cl
cd smrnaseq_cl
export NXF_OPTS='-Xms1g -Xmx4g'
nextflow pull file:///work/training/smrnaseq
nextflow run file:///work/training/smrnaseq -profile test,singularity --outdir results -r 2.3.1 |
Line 1: Move to the directory created for this workshop.
Line 2: Make a temporary folder called smrnaseq_cl for Nextflow to test the smrnaseq pipeline.
Line 3: Change directory to the newly created folder smrnaseq_cl.
Line 4: Limit the memory that the Nextflow Java virtual machine can request, as in some cases it can start to request a large amount of memory. Here -Xms1g sets the initial Java heap size to 1 GB and -Xmx4g caps it at 4 GB.
Lines 5-6: Download the pipeline (nextflow pull) and run the test code (nextflow run).
This will download the smrnaseq pipeline and then run the test code. It should take ~20-30 minutes to run to completion.
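The `export` in front of `NXF_OPTS` matters because Nextflow is started as a child process of your shell, and only exported variables are passed on to child processes. A minimal sketch of the difference, using a throwaway `bash -c` child in place of Nextflow and a hypothetical variable name `NOT_EXPORTED` for contrast:

```shell
# Exported: visible to child processes such as nextflow.
export NXF_OPTS='-Xms1g -Xmx4g'
# Plain assignment (hypothetical variable, for contrast): stays in this shell only.
NOT_EXPORTED='-Xms1g -Xmx4g'

# bash -c starts a child process; each call prints what that child can see.
seen_exported=$(bash -c 'printf "%s" "${NXF_OPTS:-}"')
seen_plain=$(bash -c 'printf "%s" "${NOT_EXPORTED:-}"')

echo "exported: '$seen_exported'"
echo "plain:    '$seen_plain'"
```

The first line reports the value that was set, while the second is empty because the child process never received the plain variable.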
Nextflow will first download the pipeline:
...
It will then display the version of the pipeline that was downloaded: version 2.3.1.
It will also list all the parameters that differ from the pipeline defaults.
...
Before running a process, it will download the required Singularity images, along with the reference and input files required for testing.
In the screenshot below, all the jobs which will be run are listed.
We can see that 7 jobs have started:
FASTQ_FASTQC_UMITOOLS_FASTP:FASTQC_RAW: 3 jobs are running, the latest job that started is for sample Clone9_N1
FASTQ_FASTQC_UMITOOLS_FASTP:FASTP: 3 jobs are running, the latest job that started is for sample Clone9_N1
INDEX_GENOME (genome.fa): 1 job has started
At the bottom you can see that 6 files (including test fastq.gz input files and reference files) have also been downloaded.
...
You can check the full list of jobs that have been submitted at any point in time by opening a separate terminal and using the command:
...
Going back to the terminal from which you launched the Nextflow analysis, you can check the nextflow log to see how the analysis is progressing.
For example, in the screenshot below, taken halfway through the Nextflow analysis, several processes have run to completion for all 8 samples tested.
...
At the bottom, the message ‘Pipeline completed successfully’ will be printed along with the duration, the CPU hours and the number of jobs that ran to completion.
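If the terminal output has been saved to a file (for example by a job scheduler), the same check can be scripted. A minimal sketch, assuming the message text appears exactly as quoted above; `pipeline_succeeded` is a hypothetical helper and the log file name in the usage lines is a placeholder:

```shell
# Hypothetical helper: succeed if the given log file contains the completion message.
pipeline_succeeded() {
    # -q: no output, exit status only; -F: match the text literally
    grep -qF 'Pipeline completed successfully' "$1"
}

# Usage with a placeholder file name:
log_file="run_output.log"
if [ -f "$log_file" ] && pipeline_succeeded "$log_file"; then
    echo "pipeline finished successfully"
else
    echo "no success message found in $log_file"
fi
```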
...
You will see that Nextflow created 2 folders (results and work) if you run the command:
Code Block |
---|
ls |
You can inspect the results which have been output by typing:
...
You can browse a couple of results folders to check what sort of outputs were generated by the pipeline.
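A quick overview of what was generated can also be scripted. The sketch below counts the files under each top-level folder of an output directory; `summarise_outdir` is a hypothetical helper, and it assumes the `results` folder created by `--outdir results`.

```shell
# Hypothetical helper: print "<folder><TAB><file count>" for each
# top-level folder inside an output directory (default: results).
summarise_outdir() {
    local outdir="${1:-results}"
    local sub
    for sub in "$outdir"/*/; do
        # Skip the unexpanded pattern when the directory has no subfolders
        [ -d "$sub" ] || continue
        printf '%s\t%s\n' "$(basename "$sub")" "$(find "$sub" -type f | wc -l | tr -d ' ')"
    done
}

summarise_outdir results
```

Folders with a count of 0 contain only subfolders or links, so they may still be worth opening.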
...
Create a separate rnaseq_pbs folder in the workshop directory and move into it:
Code Block |
---|
mkdir -p $HOME/workshop/2024-2/session3/rnaseq_pbs
cd $HOME/workshop/2024-2/session3/rnaseq_pbs |
Create the script file rnaseq_test.sh by running the following command:
Code Block |
---|
cat <<EOF > rnaseq_test.sh
#!/bin/bash -l
#PBS -N nfrnaseq_test
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=6:00:00

cd \$PBS_O_WORKDIR
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
nextflow run file:///work/training/nextflow_intro/rnaseq -r 3.14.0 -profile test,singularity --outdir results
EOF |
Line 3: Set your PBS job name to nfrnaseq_test.
Line 4: Specify the memory and CPU resources that you want to allocate to your job.
Line 5: Specify that you want to allocate 6 h for your job to run to completion.
Line 7: Change directory to $PBS_O_WORKDIR, which is a special environment variable created by PBS. This will be the folder where you ran the qsub command
Line 8: Load java
Line 9: Limit the memory that the Nextflow Java virtual machine can request, as in some cases it can start to request a large amount of memory.
Line 10: Run the nf-core/rnaseq pipeline using the test data provided.
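The backslash in `\$PBS_O_WORKDIR` on line 7 is deliberate: with an unquoted `EOF` delimiter the shell expands variables while it writes the file, so an unescaped `$PBS_O_WORKDIR` would be replaced by its (usually empty) value at creation time rather than when the job runs. A minimal sketch of two ways to keep the variable literal; quoting the delimiter is shown as an alternative, not as what the script above does:

```shell
# Unquoted delimiter: the backslash stops the shell from expanding the variable.
escaped=$(cat <<EOF
cd \$PBS_O_WORKDIR
EOF
)

# Quoted delimiter: nothing inside the here-document is expanded.
quoted=$(cat <<'EOF'
cd $PBS_O_WORKDIR
EOF
)

echo "$escaped"
echo "$quoted"
```

Both `echo` commands print the literal text `cd $PBS_O_WORKDIR`, which is what needs to end up in the script file.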
You can check the content of the PBS script you just created using the command:
Code Block |
---|
cat rnaseq_test.sh |
Make the script executable and then submit your job to the PBS queue by running the following commands:
...
Once again, you can monitor your jobs using the qstat -u $USER command.
The test should take ~30 min to run.
...