Session 2: RNA-seq expression

This page gives a basic introduction to Unix commands for HPC users with no previous experience, then shows how to run the nf-core/rnaseq pipeline on the HPC.

Log into HPC

ssh userID@lyra.qut.edu.au

Brief recap of basic Unix commands

Once you log into the HPC, you will land in your personal home space (i.e. /home/myStudentID/). This space is only accessible to you. To work in collaboration with others we use workspaces (i.e. /work/myProjectName/).

To go to a shared directory for your project named “kenna_team” type the following command and hit enter:

cd /work/kenna_team/

Display list of files in a directory

ls -lh

Print working directory

pwd

Create a folder

mkdir myfolder

Enter new folder

cd myfolder

Move back to the previous folder

cd ..

Make a backup copy of the file

cp myfile.txt myfile_copy.txt

Move the copy of the file into the newly created folder. Note: it is recommended to make a copy of important files before modifying them or running commands on them.

mv myfile_copy.txt myfolder/
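
The copy and move steps above can be tried safely in a scratch directory before using them on real data. This is a minimal sketch: the temporary directory and the file contents are made up for illustration only.

```shell
# Work in a throwaway directory so no real files are touched (illustrative only)
scratch=$(mktemp -d)
cd "$scratch"

# Create a small example file and a destination folder
printf 'line 1\nline 2\n' > myfile.txt
mkdir -p myfolder

# cp takes the source first and the destination second
cp myfile.txt myfile_copy.txt

# Move the copy into the folder; the original stays where it was
mv myfile_copy.txt myfolder/

# Confirm that the original and the moved copy are identical
cmp myfile.txt myfolder/myfile_copy.txt && echo "backup is identical"
```

After the move, the copy is found at myfolder/myfile_copy.txt and no longer exists in the current folder.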

View the content of a file (note: a hash sign # at the start of a line marks a comment, which is used to describe the command underneath it)

#hash tags are used to add comments on what a command line does
#several commands can be used including cat, less, more, head and tail
cat myfile_copy.txt

#example: less -S stops long lines from wrapping, which helps when viewing very wide files
less -S myfile_copy.txt

#to stop viewing a file opened with less, press "q"
#to interrupt a command that is still running, press "Control" and "c" at the same time

#print the first 50 lines of a file
head -50 myfile_copy.txt

#print the last 20 lines of a file
tail -20 myfile_copy.txt
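
The head and tail commands can also be combined to print a range of lines from the middle of a file. The sketch below uses a generated 100-line file; the file name numbers.txt is hypothetical and only serves as test data.

```shell
# Generate a 100-line example file (hypothetical name) in a temporary directory
demo_file="$(mktemp -d)/numbers.txt"
seq 1 100 > "$demo_file"

# Count the lines before deciding how much to print
wc -l < "$demo_file"

# First 3 lines; -n is the portable way to give the line count
head -n 3 "$demo_file"

# Last 2 lines
tail -n 2 "$demo_file"

# Lines 10 to 12: take the first 12 lines, then keep the last 3 of those
head -n 12 "$demo_file" | tail -n 3
```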

Go back to your personal home space. Type 'cd' and hit enter; this moves you to /home/myStudentID/

cd

Running the nextflow nf-core/rnaseq pipeline

Requirements:

index.csv → a file that provides a list of sample IDs and their associated FASTQ files (read 1 and read 2)
launch.pbs → a script to submit the job to the HPC cluster

Example index.csv file for nf-core/rnaseq version 3.3:

sample,fastq_1,fastq_2,strandedness
control_r1,/work/kenna_team/data/raw_data/SRR1039508_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039508_2.fastq.gz,unstranded
dex_r1,/work/kenna_team/data/raw_data/SRR1039509_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039509_2.fastq.gz,unstranded
control_r2,/work/kenna_team/data/raw_data/SRR1039512_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039512_2.fastq.gz,unstranded
dex_r2,/work/kenna_team/data/raw_data/SRR1039513_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039513_2.fastq.gz,unstranded
control_r3,/work/kenna_team/data/raw_data/SRR1039516_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039516_2.fastq.gz,unstranded
dex_r3,/work/kenna_team/data/raw_data/SRR1039517_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039517_2.fastq.gz,unstranded
control_r4,/work/kenna_team/data/raw_data/SRR1039520_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039520_2.fastq.gz,unstranded
dex_r4,/work/kenna_team/data/raw_data/SRR1039521_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039521_2.fastq.gz,unstranded
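
A mistyped path in index.csv is a common reason for a run to stop early, so it can help to check the file before submitting the job. The sketch below is one possible check, not part of nf-core/rnaseq: check_samplesheet is a hypothetical helper that assumes a header line followed by rows of four comma-separated fields, as in the example above.

```shell
# check_samplesheet is a hypothetical helper, not part of nf-core/rnaseq.
# It reports rows that do not have 4 comma-separated fields and FASTQ paths
# that do not exist, and returns non-zero if any problem was found.
check_samplesheet() {
    local sheet="$1" problems=0 line_no=0
    local id fq1 fq2 strand extra fq

    # "|| [ -n "$id" ]" also handles a last line without a trailing newline
    while IFS=, read -r id fq1 fq2 strand extra || [ -n "$id" ]; do
        line_no=$((line_no + 1))
        # Skip the header line and any empty lines
        [ "$line_no" -eq 1 ] && continue
        [ -z "$id$fq1$fq2$strand" ] && continue
        # Every row needs exactly four fields
        if [ -z "$strand" ] || [ -n "$extra" ]; then
            echo "line $line_no: expected 4 fields" >&2
            problems=$((problems + 1))
            continue
        fi
        # Both FASTQ files must be present on disk
        for fq in "$fq1" "$fq2"; do
            if [ ! -f "$fq" ]; then
                echo "line $line_no: missing file $fq" >&2
                problems=$((problems + 1))
            fi
        done
    done < "$sheet"

    [ "$problems" -eq 0 ]
}

# Usage from the folder that holds the samplesheet:
#   check_samplesheet index.csv && echo "index.csv looks consistent"
```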

Example launch.pbs script:

#!/bin/bash -l
#PBS -N nfrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#Run the workflow from the directory where the job was submitted
cd "$PBS_O_WORKDIR"

module load java
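#set the Java memory limits for Nextflow; export makes the variable visible to the nextflow command
export NXF_OPTS='-Xms1g -Xmx4g'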

#run the nextflow RNA-seq pipeline:
nextflow run nf-core/rnaseq -profile singularity -r 3.3 --aligner star_salmon --input index.csv --genome GRCh38 -resume

where:

--aligner Specifies the alignment algorithm to use - available options are 'star_salmon', 'star_rsem', and 'hisat2'. The default option is 'star_salmon'.

More information at:

https://nf-co.re/rnaseq/3.3/usage

More advanced launch.pbs script example:

#!/bin/bash -l
#PBS -N nfrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#Run the workflow from the directory where the job was submitted
cd "$PBS_O_WORKDIR"

module load java
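#set the Java memory limits for Nextflow; export makes the variable visible to the nextflow command
export NXF_OPTS='-Xms1g -Xmx4g'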

#run the nextflow RNA-seq pipeline:
nextflow run nf-core/rnaseq -profile singularity -r 3.3 --aligner star_salmon --input index.csv --genome GRCh38 --clip_r1 10 --clip_r2 10 --three_prime_clip_r1 2 --three_prime_clip_r2 2 --save_trimmed
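
The long command above is easier to read and edit when the options are grouped and commented. The following sketch collects the same options in a bash array; the array name nf_args is an illustrative choice, and the comments summarise the trimming options as they are understood to be passed on to Trim Galore. The last line only prints the command so that it can be reviewed.

```shell
# Options copied from the command above, grouped in a bash array for readability.
# The array name nf_args is an illustrative choice.
nf_args=(
    -profile singularity
    -r 3.3
    --aligner star_salmon
    --input index.csv
    --genome GRCh38
    # Trim 10 bases from the 5' end of read 1 and of read 2
    --clip_r1 10
    --clip_r2 10
    # Trim 2 bases from the 3' end of read 1 and of read 2
    --three_prime_clip_r1 2
    --three_prime_clip_r2 2
    # Keep the trimmed FASTQ files in the results folder
    --save_trimmed
)

# Print the assembled command for review; drop the leading "echo" to run it
echo nextflow run nf-core/rnaseq "${nf_args[@]}"
```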

Session 2 exercises:

Run the nf-core/rnaseq pipeline using the Airway smooth muscle public data (PMID: 24926665. GEO: GSE52778) - aligner option set to ‘star_salmon’
Same as above but aligner option set to ‘star_rsem’

Create a new working folder:

mkdir run1_star_salmon

cd run1_star_salmon

Copy index.csv and launch.pbs files to the newly created folder

cp /work/kenna_team/scripts/star_salmon/* .

Check that files were copied into the new working folder

ls -a
./ ../ index.csv  launch.pbs

#verify the content of index.csv
cat index.csv

#also check the PBS Pro submission script
cat launch.pbs

Run the workflow:

qsub launch.pbs

Monitor the progress of the workflow:

qjobs

or

qstat -u userID
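
Instead of checking the queue by hand, the job status can be polled from a small shell function. This is a sketch under assumptions: wait_for_job is a hypothetical helper, and it assumes that qstat returns a non-zero exit status once the job is no longer listed, which may depend on how the scheduler is configured.

```shell
# wait_for_job is a hypothetical helper: it polls qstat until the job is no
# longer listed, which is assumed here to mean that the job has finished.
# $1 = job ID, $2 = seconds to wait between checks (default 60)
wait_for_job() {
    local job_id="$1" interval="${2:-60}"
    while qstat "$job_id" > /dev/null 2>&1; do
        sleep "$interval"
    done
    echo "job $job_id is no longer in the queue"
}

# Usage on the HPC (qsub prints the job ID when the job is accepted):
#   job_id=$(qsub launch.pbs)
#   wait_for_job "$job_id" 300
```

A finished job is not necessarily a successful one, so the output files and the Nextflow log still need to be checked afterwards.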

Repeat the above process for ‘star_rsem’

The only difference is the folder from which the index.csv and launch.pbs files are copied into the new working folder:

cp /work/kenna_team/scripts/star_rsem/* .
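
Because the steps are the same for each aligner, they can be wrapped in a small function. The sketch below is illustrative only: setup_run is a hypothetical helper and the run folder name run2_star_rsem is an example, while the scripts path is the one used above.

```shell
# setup_run is a hypothetical helper that creates a run folder and copies the
# prepared files for one aligner into it.
# $1 = folder holding one sub-folder of scripts per aligner
# $2 = aligner name, e.g. star_salmon or star_rsem
# $3 = name of the run folder to create
setup_run() {
    local scripts_dir="$1" aligner="$2" run_dir="$3"
    mkdir -p "$run_dir" || return 1
    cp "$scripts_dir/$aligner"/* "$run_dir"/ || return 1
    # List the copied files so they can be checked by eye
    ls "$run_dir"
}

# Usage on the HPC, followed by "cd run2_star_rsem" and "qsub launch.pbs":
#   setup_run /work/kenna_team/scripts star_rsem run2_star_rsem
```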

Visualizing results

The results generated by the pipeline are written to the ‘results’ folder.

#go to the results folder
#note: by default, nextflow pipelines write the key outputs to the 'results' folder, while the 'work' folder contains all intermediate files generated during execution
cd results

#list folders and files
ls -lh

Example output:

drwxrws---  2 barrero 4.0K Sep  7 20:05 fastqc/
drwxrws---  3 barrero 4.0K Sep  7 20:16 trimgalore/
drwxrws---  3 barrero   23 Sep  9 13:03 multiqc/
drwxrws---  2 barrero 4.0K Sep  9 13:03 pipeline_info/
drwxrws--- 20 barrero 4.0K Sep 14 23:23 star_rsem/
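
Several of these folders contain HTML reports that are meant to be opened in a web browser. The sketch below shows one way to locate them; list_reports is a hypothetical helper, and the exact report names and locations depend on the pipeline version and options.

```shell
# list_reports is a hypothetical helper that prints every HTML report found
# under a results folder, sorted by path.
# $1 = results folder (defaults to "results")
list_reports() {
    local results_dir="${1:-results}"
    find "$results_dir" -type f -name '*.html' | sort
}

# Usage from the run folder:
#   list_reports results
#   du -sh results/*     # disk space used by each output folder
```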

Access the HPC files from your laptop

Mac laptop (note: you need to be connected via VPN)

Open a ‘Finder’ window
Click on the search file tab and press the “Command + K” keys simultaneously
This will open a new ‘Connect to Server’ window:
In that window, type the server address for the shared ‘work’ space and connect. To access your personal space, replace ‘work’ with ‘home’ in the address.