Prior to running the nf-core/sarek pipeline with real data, we will first run a test with sample data to make sure the pipeline runs properly.
...
mkdir -p $HOME/workshop/sarek/scripts
cp /work/training/sarek/scripts/* $HOME/workshop/sarek/scripts/
ls -l $HOME/workshop/sarek/scripts/
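Line 1: The -p option tells mkdir to create parent directories as required. The command therefore creates $HOME/workshop/, $HOME/workshop/sarek/ and the subfolder $HOME/workshop/sarek/scripts/ in one step.
Line 2: Copies all files (as indicated by the asterisk) from /work/training/sarek/scripts/ to the newly created folder $HOME/workshop/sarek/scripts/
Line 3: Lists the contents of $HOME/workshop/sarek/scripts/ so you can confirm that the files were copied.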
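To see what the -p option does without touching your workshop folders, the following sketch repeats the same pattern in a throwaway directory. The variable names (`tmp_root`, `without_p`, `with_p`) and the `a/b/c` path are illustrative assumptions and are not part of the workshop material.

```shell
# Illustrative sketch only: tmp_root and the a/b/c path are made-up names.
tmp_root=$(mktemp -d)

# Without -p, mkdir fails because the parent folder a/b does not exist yet.
if mkdir "$tmp_root/a/b/c" 2>/dev/null; then
    without_p="created"
else
    without_p="failed"
fi

# With -p, the missing parents a/ and a/b/ are created along the way.
# Running the command a second time on an existing folder is not an error.
mkdir -p "$tmp_root/a/b/c"
mkdir -p "$tmp_root/a/b/c"

if [ -d "$tmp_root/a/b/c" ]; then
    with_p="created"
else
    with_p="failed"
fi

echo "without -p: $without_p"
echo "with -p:    $with_p"

# Remove the throwaway directory again.
rm -rf "$tmp_root"
```

Because -p does not complain about existing folders, it is safe to rerun the workshop's mkdir -p commands if you are unsure whether a folder has already been created.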
Copy public data to your $HOME
...
...
mkdir -p $HOME/workshop/sarek/data/WES/trio
mkdir -p $HOME/workshop/sarek/data/WES/liver
cp /work/training/sarek/data/WES/trio/* $HOME/workshop/sarek/data/WES/trio
cp /work/training/sarek/data/WES/liver/* $HOME/workshop/sarek/data/WES/liver
Lines 1-2: Create the destination folders for the data.
Line 3: Copies all files (as indicated by the asterisk) from the /work/training/sarek/data/WES/trio folder to the newly created $HOME/workshop/sarek/data/WES/trio folder.
Line 4: Copies all files (as indicated by the asterisk) from the /work/training/sarek/data/WES/liver folder to the newly created $HOME/workshop/sarek/data/WES/liver folder.
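After copying, it can be worth confirming that nothing was missed. The sketch below compares the number of files in a source folder and its copy. The helpers `count_files` and `same_file_count` are hypothetical names introduced here for illustration; they are not part of the workshop scripts, and they assume a `find` that supports -maxdepth (as on most Linux systems).

```shell
# count_files: hypothetical helper. Prints the number of regular, non-hidden
# files directly inside the folder given as $1 (the files a "*" glob matches).
count_files() {
    find "$1" -maxdepth 1 -type f ! -name '.*' | wc -l | tr -d ' '
}

# same_file_count: hypothetical helper. Succeeds when the two folders
# given as $1 and $2 contain the same number of files.
same_file_count() {
    [ "$(count_files "$1")" -eq "$(count_files "$2")" ]
}

# Check both data sets copied in the commands above.
for dataset in trio liver; do
    src=/work/training/sarek/data/WES/$dataset
    dest=$HOME/workshop/sarek/data/WES/$dataset
    if [ -d "$src" ] && [ -d "$dest" ]; then
        if same_file_count "$src" "$dest"; then
            echo "$dataset: file counts match"
        else
            echo "$dataset: file counts differ, consider copying again"
        fi
    else
        echo "$dataset: source or destination folder not found, skipping"
    fi
done
```

A matching count only shows that every file arrived; it does not prove that each file is complete.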
Create folders for running the nf-core/sarek pipeline
...
mkdir -p $HOME/workshop/sarek
mkdir $HOME/workshop/sarek/run1_test
mkdir $HOME/workshop/sarek/run2_trio
mkdir $HOME/workshop/sarek/run3_liver
cd $HOME/workshop/sarek/run1_test
pwd
Lines 1-4: Create the sarek folder (if it does not already exist) and a sub-folder for each exercise.
Line 5: Change the directory to the folder "run1_test".
Line 6: Print the current working directory.
Exercise 1: Running a test with nf-core sample data
First, let’s assess the execution of the nf-core/sarek pipeline by running a test using sample data.
...
#!/bin/bash -l
#PBS -N nfsarek_run1_test
#PBS -l walltime=48:00:00
#PBS -l select=1:ncpus=1:mem=5gb

cd $PBS_O_WORKDIR
#limit the memory used by the Nextflow Java process (exported so that nextflow can see it)
export NXF_OPTS='-Xms1g -Xmx4g'
module load java
#specify the nextflow version to use to run the workflow
export NXF_VER=23.10.1
#run the sarek pipeline
nextflow run nf-core/sarek \
    -r 3.3.2 \
    -profile test,singularity \
    --outdir ./results
nextflow command: nextflow run
pipeline name: nf-core/sarek
pipeline version: -r 3.3.2
container type and sample data: -profile test,singularity
output directory: --outdir ./results
Submitting the job
Submit the test job to the HPC cluster as follows:
...
Once the pipeline has finished running, assess the QC report:
NOTE: To proceed, you need to be on QUT’s WiFi network or signed in via VPN.
To browse the working folder on the HPC, type the following into the file finder:
...
Evaluate the nucleotide distributions in the 5'-end and 3'-end of the sequenced reads (Read1 and Read2). Look into the “MultiQC” folder and open the provided HTML report.
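If you prefer to locate the report from the command line first, a sketch like the following can help. It assumes the job was submitted from the run1_test folder, so that the results were written to ./results as set by --outdir. The helper name `find_multiqc_reports` is hypothetical, and the exact folder layout and file name of the report are not assumed: the search simply looks for HTML files with "multiqc" in their name.

```shell
# find_multiqc_reports: hypothetical helper, not part of nf-core/sarek.
# Lists HTML files whose name contains "multiqc" (case-insensitive)
# anywhere below the folder given as $1.
find_multiqc_reports() {
    find "$1" -type f -iname '*multiqc*.html' 2>/dev/null | sort
}

# Assumption: the current directory is the folder the job was submitted from.
results_dir=./results
if [ -d "$results_dir" ]; then
    find_multiqc_reports "$results_dir"
else
    echo "No results folder found at $results_dir"
fi
```

If nothing is listed, check that the job has finished and that you are in the folder the job was submitted from.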
Items to check:
The overall quality of the experiment and reads. Look at the “Sequence Quality Histogram” plot. For example, if Phred assigns a quality score of 30 to a base, the chances that this base is called incorrectly are 1 in 1000. Phred quality scores are logarithmically linked to error probabilities.
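The logarithmic link can be written as Q = -10 × log10(P), or equivalently P = 10^(-Q/10), where Q is the Phred score and P is the probability that the base call is wrong. As a minimal sketch, the conversion can be reproduced with awk. The function names `phred_to_error_prob` and `phred_to_accuracy` are hypothetical helpers introduced for this example only.

```shell
# phred_to_error_prob: hypothetical helper.
# Prints P = 10^(-Q/10) for the Phred score Q given as $1.
phred_to_error_prob() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# phred_to_accuracy: hypothetical helper.
# Prints the base call accuracy (1 - P) * 100 as a percentage.
phred_to_accuracy() {
    awk -v q="$1" 'BEGIN { printf "%.10g%%\n", (1 - 10 ^ (-q / 10)) * 100 }'
}

# Print error probability and accuracy for a few common scores.
for q in 10 20 30 40; do
    echo "Q$q: P(error) = $(phred_to_error_prob "$q"), accuracy = $(phred_to_accuracy "$q")"
done
```

Each increase of 10 in the Phred score corresponds to a tenfold drop in the error probability.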
| Phred Quality Score | Probability of incorrect base call | Base call accuracy |
|---|---|---|
| 10 | 1 in 10 | 90% |
| 20 | 1 in 100 | 99% |
| 30 | 1 in 1000 | 99.9% |
| 40 | 1 in 10,000 | 99.99% |
| 50 | 1 in 100,000 | 99.999% |
| 60 | 1 in 1,000,000 | 99.9999% |
...