Hands-on smRNAseq training

1 Public small RNA-seq data
2 Exercise 1: Running a test with nf-core sample data
3 Download Reference microRNA sequences from miRBase
4 Run a test
- 4.1 Submitting the job
- 4.2 Monitoring the Run
5 Preparing a sample metadata file
6 Run the nextflow nf-core/smRNAseq pipeline.
7 R differential expression script

Public small RNA-seq data

Public human small RNAseq data:

https://www.ebi.ac.uk/ena/browser/view/PRJNA861019 Integrative analysis of renal microRNA and mRNA to identify hub genes and pivotal pathways associated with Cyclosporine-induced acute kidney injury in mice

Work in the HPC

Before we start using the HPC, let’s start an interactive session:

qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=1:mem=4gb

Get a copy of the scripts to be used in this module

Use the terminal to log into the HPC and create a small_RNAseq folder for running the nf-core/smrnaseq pipeline, then copy the training scripts into it. For example:

mkdir -p $HOME/workshop/small_RNAseq/scripts
cp /work/training/small_rnaseq/scripts/* $HOME/workshop/small_RNAseq/scripts/
ls -l $HOME/workshop/small_RNAseq/scripts/

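Line 1: The -p option creates parent directories as required, so this command creates $HOME/workshop/, $HOME/workshop/small_RNAseq/ and the subfolder $HOME/workshop/small_RNAseq/scripts/ in one step.
Line 2: Copies all files (as indicated by the asterisk) from /work/training/small_rnaseq/scripts/ to the newly created folder $HOME/workshop/small_RNAseq/scripts/
Line 3: Lists the content of the scripts folder to confirm that the files were copied.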
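The effect of -p can be tried safely in a throwaway location before touching the workshop folders. The sketch below is only an illustration: it uses a temporary directory created with mktemp (the demo_root name is hypothetical) instead of the real $HOME paths.

```shell
# Create a scratch directory so the real workshop folders are not touched
demo_root=$(mktemp -d)

# -p creates workshop/, workshop/small_RNAseq/ and scripts/ in one command
mkdir -p "$demo_root/workshop/small_RNAseq/scripts"

# Running the same command again is harmless: -p does not fail if the folder exists
mkdir -p "$demo_root/workshop/small_RNAseq/scripts"

# List every directory that was created
find "$demo_root" -type d | sort
```

Without -p, mkdir stops with an error when a parent folder is missing or when the target folder already exists.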

Copy public data to your $HOME

mkdir -p $HOME/workshop/small_RNAseq/data
cp /work/training/smallRNAseq/data/* $HOME/workshop/small_RNAseq/data/
# list the content of the $HOME/workshop/small_RNAseq/data/

Line 1: Creates the folder $HOME/workshop/small_RNAseq/data/
Line 2: Copies all files (as indicated by the asterisk) from /work/training/smallRNAseq/data/ to the newly created $HOME/workshop/small_RNAseq/data/ folder
Line 3: A quick challenge: write the command that lists the content of the data folder (see the previous section for hints)

Create a folder for running the nf-core small RNA-seq pipeline

Let’s create the “run” folders in which we will run the nf-core/smrnaseq pipeline:

mkdir -p $HOME/workshop/small_RNAseq
mkdir $HOME/workshop/small_RNAseq/run1_test
mkdir $HOME/workshop/small_RNAseq/run2_smallRNAseq
cd $HOME/workshop/small_RNAseq/run1_test
pwd

Lines 1-3: create the sub-folders for each exercise
Line 4: change the directory to the folder “run1_test”
Line 5: print the current working directory

Exercise 1: Running a test with nf-core sample data

First, let’s check that the nf-core/smrnaseq pipeline executes correctly by running a test with sample data.

Copy the launch_nf-core_smallRNAseq_test.pbs script to the working directory:

cd $HOME/workshop/small_RNAseq/run1_test
cp $HOME/workshop/small_RNAseq/scripts/launch_nf-core_smallRNAseq_test.pbs .

View the content of the script as follows:

cat launch_nf-core_smallRNAseq_test.pbs

Download Reference microRNA sequences from miRBase

First, let’s download a copy of the miRBase reference sequences, including hairpin and mature microRNA sequences.

microRNA mature sequences:

wget https://mirbase.org/download/mature.fa

Hairpin sequences:

wget https://mirbase.org/download/hairpin.fa

Fetch the genomic coordinates for precursor and mature sequences:

wget https://mirbase.org/download/hsa.gff3

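Alternatively, submit the following PBS Pro script to the cluster. Before running the script, create a ‘reference’ folder (e.g., /myteam/data/reference/) and submit the job from inside it, because the script downloads the files into the directory it was submitted from.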

#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=2:00:00

cd $PBS_O_WORKDIR

wget https://www.mirbase.org/download/hairpin.fa
wget https://www.mirbase.org/download/mature.fa
wget https://www.mirbase.org/download/hsa.gff3
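
After the download it can be worth checking that the FASTA files are not empty or truncated. The sketch below counts header lines, assuming that miRBase headers begin with ">" followed by a three-letter species prefix such as hsa. The two helper functions and the tiny demo file are hypothetical, and the demo records are made up rather than taken from miRBase.

```shell
# count_fasta_records FILE: print the number of sequences (header lines) in a FASTA file
count_fasta_records() {
    grep -c '^>' "$1"
}

# count_species_records FILE PREFIX: print the number of headers starting with ">PREFIX-"
count_species_records() {
    grep -c "^>$2-" "$1"
}

# Tiny made-up FASTA file, used only to demonstrate the two functions
demo_fa=$(mktemp)
printf '%s\n' \
    '>hsa-demo-1 DEMO0000001 Homo sapiens demo-1' \
    'UGAGGUAGUAGGUUGUAUAGUU' \
    '>mmu-demo-2 DEMO0000002 Mus musculus demo-2' \
    'UGAGGUAGUAGUUUGUACAGUU' > "$demo_fa"

count_fasta_records "$demo_fa"        # prints 2
count_species_records "$demo_fa" hsa  # prints 1
```

On the real files, the same calls would be count_fasta_records mature.fa and count_species_records mature.fa hsa.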

Run a test

Before running the pipeline with real data, run the following test:

nextflow run nf-core/smrnaseq -profile test,singularity --outdir results -r 2.1.0

To submit the above command to the HPC cluster, prepare the following script:

#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# work in the directory the job was submitted from
cd $PBS_O_WORKDIR
# load java and export the memory settings so that nextflow can read them
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/smrnaseq -profile test,singularity --outdir results -r 2.1.0

Submitting the job

Once you have a copy of the launch_nf-core_smallRNAseq_test.pbs script (the test profile provides its own sample data, so no samplesheet.csv is needed yet), submit the job to the HPC as follows:

qsub launch_nf-core_smallRNAseq_test.pbs

Monitoring the Run

Use the command

qjobs

to check on the job that you are running. Note that Nextflow will launch additional jobs during the run.

You can also check the .nextflow.log file for details on what is going on.
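
As a rough way to scan the log, the sketch below counts warning and error lines. It assumes that each log line carries its level (WARN, ERROR) as a separate word, which may differ between Nextflow versions. The summarise_nextflow_log function and the demo log lines are hypothetical.

```shell
# summarise_nextflow_log [FILE]: count WARN and ERROR lines (default file: .nextflow.log)
summarise_nextflow_log() {
    log_file="${1:-.nextflow.log}"
    if [ ! -f "$log_file" ]; then
        echo "no log file found at $log_file" >&2
        return 1
    fi
    # grep -c exits with status 1 when the count is 0, hence the "|| true"
    warnings=$(grep -c ' WARN ' "$log_file" || true)
    errors=$(grep -c ' ERROR ' "$log_file" || true)
    echo "warnings=$warnings errors=$errors"
}

# Made-up log file, used only to demonstrate the function
demo_log=$(mktemp)
printf '%s\n' \
    'Jan-01 10:00:00.000 [main] INFO  demo.Class - pipeline started' \
    'Jan-01 10:00:01.000 [main] WARN  demo.Class - demo warning' \
    'Jan-01 10:00:02.000 [main] ERROR demo.Class - demo error' > "$demo_log"

summarise_nextflow_log "$demo_log"   # prints warnings=1 errors=1
```

Inside a run folder, calling summarise_nextflow_log without an argument reads .nextflow.log, and tail -n 20 .nextflow.log shows the most recent lines.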

Preparing a sample metadata file

Now let’s prepare a samplesheet.csv file that specifies the names of your samples and the location of the raw FASTQ files:

sample,fastq_1
SRR20753704,/work/training/smallRNAseq/data/SRR20753704.fastq.gz
SRR20753705,/work/training/smallRNAseq/data/SRR20753705.fastq.gz
SRR20753706,/work/training/smallRNAseq/data/SRR20753706.fastq.gz
SRR20753707,/work/training/smallRNAseq/data/SRR20753707.fastq.gz
SRR20753708,/work/training/smallRNAseq/data/SRR20753708.fastq.gz
SRR20753709,/work/training/smallRNAseq/data/SRR20753709.fastq.gz
SRR20753716,/work/training/smallRNAseq/data/SRR20753716.fastq.gz
SRR20753717,/work/training/smallRNAseq/data/SRR20753717.fastq.gz
SRR20753718,/work/training/smallRNAseq/data/SRR20753718.fastq.gz
SRR20753719,/work/training/smallRNAseq/data/SRR20753719.fastq.gz
SRR20753720,/work/training/smallRNAseq/data/SRR20753720.fastq.gz
SRR20753721,/work/training/smallRNAseq/data/SRR20753721.fastq.gz
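
A wrong path in the sample sheet is usually only discovered after the job has started, so a quick check beforehand can save a resubmission. The sketch below assumes the two-column layout shown above (sample,fastq_1) and reports every FASTQ file that does not exist. The check_samplesheet function and the demo files are hypothetical.

```shell
# check_samplesheet FILE: report rows whose fastq_1 path does not exist
check_samplesheet() {
    missing=0
    {
        read -r header   # skip the "sample,fastq_1" header line
        while IFS=, read -r sample fastq; do
            [ -n "$sample" ] || continue   # ignore empty lines
            if [ ! -f "$fastq" ]; then
                echo "missing: $sample -> $fastq"
                missing=$((missing + 1))
            fi
        done
    } < "$1"
    echo "missing files: $missing"
    [ "$missing" -eq 0 ]   # return success only if every file was found
}

# Demo: one FASTQ file exists, the other does not
sheet_demo=$(mktemp -d)
touch "$sheet_demo/sampleA.fastq.gz"
printf '%s\n' 'sample,fastq_1' \
    "sampleA,$sheet_demo/sampleA.fastq.gz" \
    "sampleB,$sheet_demo/sampleB.fastq.gz" > "$sheet_demo/samplesheet.csv"

if check_samplesheet "$sheet_demo/samplesheet.csv"; then
    echo "all FASTQ files found"
else
    echo "fix the paths listed above before launching the pipeline"
fi
```

The function returns a non-zero status when a file is missing, so it could also be placed in front of the nextflow command in a launch script.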

To generate the above file, let’s use the following shell script (called “create_nf-core_smallRNAseq_samplesheet.sh”):

#!/bin/bash -l

#User defined variables.
##########################################################
#DIR='/path/to/the/FASTQ/files'
DIR=$1
INDEX='samplesheet.csv'
##########################################################

#load python module
module load python/3.10.8-gcccore-12.2.0

#fetch the script to create the sample metadata table
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py

#generate initial sample metadata file
./fastq_dir_to_samplesheet.py "$DIR" index.csv \
        --strandedness auto \
        --read1_extension .fastq.gz

#format index file: keep only the sample and fastq_1 columns
awk -F "," '{print $1 "," $2}' index.csv > "${INDEX}"

#Remove intermediate files:
rm index.csv fastq_dir_to_samplesheet.py

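The script reads the location of the raw FASTQ files from its first argument (DIR=$1). To find the full path, change to the folder that contains the FASTQ files and run:

pwd

Copy the printed path and pass it to the script as its first argument. Alternatively, edit the script with VI or VIM (check prerequisites above) and assign the path directly to the “DIR” variable.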
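
For single-end data with the two-column layout used here, the same sample sheet can also be written with plain shell. This is only a sketch and not the nf-core helper script: it assumes that every file ends in .fastq.gz and that the file name without that extension is the sample name. The make_samplesheet function and the demo file names are hypothetical.

```shell
# make_samplesheet DIR: print a "sample,fastq_1" sheet for all *.fastq.gz files in DIR
make_samplesheet() {
    fastq_dir="${1%/}"   # drop a trailing slash, if any
    echo "sample,fastq_1"
    for fastq in "$fastq_dir"/*.fastq.gz; do
        [ -e "$fastq" ] || continue   # no matches: the pattern is left unexpanded
        echo "$(basename "$fastq" .fastq.gz),$fastq"
    done
}

# Demo with two empty placeholder files in a scratch directory
fastq_demo=$(mktemp -d)
touch "$fastq_demo/demo_1.fastq.gz" "$fastq_demo/demo_2.fastq.gz"

make_samplesheet "$fastq_demo" > "$fastq_demo/samplesheet.csv"
cat "$fastq_demo/samplesheet.csv"
```

Pass the full path of the FASTQ folder so that the sheet contains absolute paths, which remain valid whichever folder the pipeline is launched from.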

Run the nextflow nf-core/smRNAseq pipeline.

Create a launch_nfsmRNAseq.pbs file that has the following information:

#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/smrnaseq -r 2.1.0 \
	-profile singularity \
	--outdir outdir \
	--input samplesheet.csv \
	--genome GRCh38 \
	--three_prime_adapter 'AACTGTAGGCACCATCAAT' \
	--fastp_min_length 18 \
	--fastp_max_length 30 \
	--hairpin /work/trtp/data/mirbase/hairpin.fa \
	--mature /work/trtp/data/mirbase/mature.fa \
	--mirna_gtf /work/trtp/data/mirbase/hsa.gff3

Submit the job to the HPC cluster: