This page provides a basic introduction to Unix commands for HPC users with no previous Unix experience.
Log into the HPC
Code Block |
---|
ssh userID@lyra.qut.edu.au |
...
To go to a shared directory for your project named “kenna_team” type the following command and hit enter:
Code Block |
---|
cd /work/kenna_team/ |
Display list of files in a directory
...
Make a backup copy of the file
Code Block |
---|
cp myfile.txt myfile_copy.txt |
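To check that a backup really matches the original, the copy can be compared against it. The following is a minimal sketch that works in a temporary directory; the file contents and the `workdir` and `copy_status` variable names are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir"

# Create a small example file (hypothetical content)
printf 'line one\nline two\n' > myfile.txt

# cp takes the source file first, then the destination
cp myfile.txt myfile_copy.txt

# cmp -s prints nothing and exits with status 0 when the files are identical
if cmp -s myfile.txt myfile_copy.txt; then
    copy_status="identical"
else
    copy_status="different"
fi
echo "$copy_status"
```

Note that `cp` takes two arguments and does not need the `>` redirection symbol.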
Move a copy of a file to a newly created folder. Note: it is recommended to make a copy of important files before modifying them or running commands on them.
...
Code Block |
---|
#hash symbols (#) are used to add comments describing what a command does
#several commands can be used to view a file, including cat, less, more, head and tail
cat myfile_copy.txt
#example: less -S displays very large (wide) files without wrapping long lines
less -S myfile_copy.txt
#to stop viewing a file opened with less, press "q"
#to interrupt a running command such as cat, press "Control" and "c" at the same time
#print the first 505 lines of a file
head -n 505 myfile_copy.txt
#print the last 205 lines of a file
tail -n 205 myfile_copy.txt |
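To see exactly which lines `head` and `tail` return, it can help to try them on a file whose content is predictable. The sketch below assumes a hypothetical 1000-line file generated with `seq`; the variable names are illustrative only.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# seq writes the numbers 1 to 1000, one per line, so line N contains the number N
seq 1 1000 > myfile_copy.txt

# Count the lines in the file (tr removes padding that some systems add)
total_lines=$(wc -l < myfile_copy.txt | tr -d ' ')

# The first 505 lines end at line 505
last_line_of_head=$(head -n 505 myfile_copy.txt | tail -n 1)

# The last 205 lines of a 1000-line file start at line 796
first_line_of_tail=$(tail -n 205 myfile_copy.txt | head -n 1)

echo "$total_lines $last_line_of_head $first_line_of_tail"
```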
Go back to my personal space. Type 'cd' and hit enter. This will move you to /home/mystudentID/
Code Block |
---|
cd |
Interactive session:
Code Block |
---|
qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=2:mem=4gb |
Running the nextflow nf-core/rnaseq pipeline
Requirements:
index.csv → a file that provides a list of sample IDs and their associated FASTQ files (read 1 and read 2)
launch.pbs → a script to submit the job to the HPC cluster
Example index.csv file for nf-core/rnaseq version 3.3:
Code Block |
---|
group,fastq_1,fastq_2,strandedness
control_r1,/work/kenna_team/data/raw_data/SRR1039508_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039508_2.fastq.gz,unstranded
dex_r1,/work/kenna_team/data/raw_data/SRR1039509_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039509_2.fastq.gz,unstranded
control_r2,/work/kenna_team/data/raw_data/SRR1039512_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039512_2.fastq.gz,unstranded
dex_r2,/work/kenna_team/data/raw_data/SRR1039513_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039513_2.fastq.gz,unstranded
control_r3,/work/kenna_team/data/raw_data/SRR1039516_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039516_2.fastq.gz,unstranded
dex_r3,/work/kenna_team/data/raw_data/SRR1039517_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039517_2.fastq.gz,unstranded
control_r4,/work/kenna_team/data/raw_data/SRR1039520_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039520_2.fastq.gz,unstranded
dex_r4,/work/kenna_team/data/raw_data/SRR1039521_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039521_2.fastq.gz,unstranded |
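A malformed index.csv is a common reason for a run to fail early, so a quick sanity check before submitting can save time. The sketch below is one possible check, not part of the pipeline itself: it writes a shortened two-sample copy of the example sheet into a temporary directory, then uses `awk` to count rows that do not have exactly four comma-separated fields. The variable names are hypothetical.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# Shortened copy of the example sample sheet (header plus two samples)
cat > index.csv <<'EOF'
group,fastq_1,fastq_2,strandedness
control_r1,/work/kenna_team/data/raw_data/SRR1039508_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039508_2.fastq.gz,unstranded
dex_r1,/work/kenna_team/data/raw_data/SRR1039509_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039509_2.fastq.gz,unstranded
EOF

# Count rows whose number of comma-separated fields is not 4
bad_rows=$(awk -F',' 'NF != 4 { n++ } END { print n + 0 }' index.csv)

# Count sample rows (every row except the header)
n_samples=$(awk -F',' 'NR > 1 { n++ } END { print n + 0 }' index.csv)

echo "samples: $n_samples, malformed rows: $bad_rows"

# Report any FASTQ path (columns 2 and 3) that does not exist on this machine
awk -F',' 'NR > 1 { print $2; print $3 }' index.csv | while read -r fastq; do
    [ -e "$fastq" ] || echo "missing: $fastq"
done
```

On the HPC the last loop should print nothing if all FASTQ files are in place; anywhere else it will list every path as missing.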
Example launch.pbs script:
Code Block |
---|
#!/bin/bash -l
#PBS -N nfrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#Use the current directory to run the workflow
cd $PBS_O_WORKDIR
module load java
#export NXF_OPTS so that the nextflow process can see it
export NXF_OPTS='-Xms1g -Xmx4g'
#run the nextflow RNA-seq pipeline:
nextflow run nf-core/rnaseq -profile singularity -r 3.3 --aligner star_salmon --input index.csv --genome GRCh38 -resume |
...
Code Block |
---|
#!/bin/bash -l
#PBS -N nfrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#Use the current directory to run the workflow
cd $PBS_O_WORKDIR
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
#run the nextflow RNA-seq pipeline:
nextflow run nf-core/rnaseq -profile singularity -r 3.3 --aligner star_salmon --input index.csv --genome GRCh38 --clip_r1 10 --clip_r2 10 --three_prime_clip_r1 2 --three_prime_clip_r2 2 --save_trimmed
#allow access to others in the group
chmod -R g+rwX results
chmod -R g+rwX work |
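The capital `X` in `g+rwX` is deliberate: it adds execute (search) permission for the group on directories, and on files only if they are already executable by someone, so ordinary data files do not become executable. The sketch below illustrates this on a hypothetical folder layout in a temporary directory; the folder and file names are made up.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical results folder containing one plain text file
mkdir -p results/multiqc
printf 'report\n' > results/multiqc/report.txt

# Start from owner-only permissions
chmod 700 results results/multiqc
chmod 600 results/multiqc/report.txt

# Give the group read/write everywhere, and execute only where it makes sense
chmod -R g+rwX results

# Keep only the 10-character permission string from the long listing
dir_perms=$(ls -ld results/multiqc | cut -c1-10)
file_perms=$(ls -l results/multiqc/report.txt | cut -c1-10)

echo "$dir_perms $file_perms"
```

The directory gains `rwx` for the group, while the text file gains only `rw`.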
Session 2 exercises:
Run the nf-core/rnaseq pipeline using the Airway smooth muscle public data (PMID: 24926665. GEO: GSE52778) - aligner option set to ‘star_salmon’
Same as above but aligner option set to ‘star_rsem’
Create a new working folder:
Code Block |
---|
mkdir session2
cd session2
mkdir run1_star_salmon
cd run1_star_salmon
cd .. |
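As an alternative sketch, `mkdir -p` creates a folder together with any missing parent folders in one step, and does not complain if the folder already exists. The example below runs in a temporary directory; the second folder name, `run2_star_rsem`, is a hypothetical name for the second exercise.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# Create session2 and both run folders in a single command
mkdir -p session2/run1_star_salmon session2/run2_star_rsem

# Move into the first run folder and record where we are
cd session2/run1_star_salmon
current=$(basename "$PWD")

# cd .. moves up one level, back to session2
cd ..
parent=$(basename "$PWD")

echo "$current $parent"
```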
Copy index.csv and launch.pbs files to the newly created folder
...
Code Block |
---|
cp /work/kenna_team/scripts/star_rsem/* . |
Visualizing results
The results generated in the pipeline can be visualized within the ‘results’ folder.
Code Block |
---|
#go to the results folder - note that by default all nextflow pipelines write the key outputs to the 'results' folder, while the 'work' folder contains all intermediate files generated during execution.
cd results
#list folders and files
ls |
Example output:
Code Block |
---|
drwxrws--- 2 barrero 4.0K Sep 7 20:05 fastqc/
drwxrws--- 3 barrero 4.0K Sep 7 20:16 trimgalore/
drwxrws--- 3 barrero 23 Sep 9 13:03 multiqc/
drwxrws--- 2 barrero 4.0K Sep 9 13:03 pipeline_info/
drwxrws--- 20 barrero 4.0K Sep 14 23:23 star_rsem/ |
Access the HPC files from your laptop
Mac laptop (note: you need to be connected via VPN)
Open the ‘Finder' window
Click on the search file tab and hit the “Command + K” keys simultaneously
This will open a new window:
Type the above to connect to the shared ‘work' space. To access your personal space replace ‘work’ with 'home’.
Next
Differential expression analysis using https://maayanlab.cloud/biojupies/analyze