RNAseq - STAR 2-pass approach (Ronin)

Requirements:

Reference genome (FASTA) and annotation (GTF). If required, fetch the reference genome and annotation files from Ensembl: https://asia.ensembl.org/info/data/ftp/index.html
RNAseq data: paired-end FASTQ datasets
Conda installed. If it is not installed, follow the instructions here: https://docs.conda.io/projects/conda/en/latest/user-guide/install/
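The file names used on this page (Homo_sapiens.GRCh38.dna.primary_assembly.fa and Homo_sapiens.GRCh38.102.gtf) correspond to Ensembl release 102. The following is a minimal sketch for fetching them, assuming the standard Ensembl FTP directory layout and that wget and gunzip are available. The function names are hypothetical helpers, not part of any tool.

```shell
# Hypothetical helpers: build Ensembl download URLs for human GRCh38.
# Assumes the standard layout under https://ftp.ensembl.org/pub/release-<N>/.
ensembl_fasta_url() {
    echo "https://ftp.ensembl.org/pub/release-$1/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
}

ensembl_gtf_url() {
    echo "https://ftp.ensembl.org/pub/release-$1/gtf/homo_sapiens/Homo_sapiens.GRCh38.$1.gtf.gz"
}

# Download and decompress both files into a target directory.
# Usage: fetch_reference <release> <outdir>
fetch_reference() {
    mkdir -p "$2" &&
    wget -P "$2" "$(ensembl_fasta_url "$1")" "$(ensembl_gtf_url "$1")" &&
    gunzip "$2"/Homo_sapiens.GRCh38.*.gz
}
```

For example, `fetch_reference 102 /work/batra_lab/rnaseq/data/genome/GRCh38` would place both files in the genome directory used below.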

Install required tools (if necessary)

The simplest way to install the tools and their dependencies is with Conda (https://anaconda.org).

Search for the tool of interest, then copy and paste the provided command line to install it. For example, to install STAR:

conda install -c bioconda star

Other tools required are:

trim-galore
rsem
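As a convenience, all three tools can be installed into one Conda environment and then checked. The sketch below is one possible way to do this; the environment name "rnaseq" and the helper function names are assumptions, not part of the original setup.

```shell
# One way to install everything at once (environment name "rnaseq" is an assumption):
#   conda create -n rnaseq -c conda-forge -c bioconda star trim-galore rsem
#   conda activate rnaseq

# Print each command from the argument list that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Report missing executables; returns non-zero if any are missing.
check_rnaseq_tools() {
    missing=$(missing_tools STAR trim_galore rsem-prepare-reference rsem-calculate-expression)
    if [ -n "$missing" ]; then
        echo "Missing tools:" $missing >&2
        return 1
    fi
    echo "All required tools found."
}
```

Running `check_rnaseq_tools` after activating the environment gives a quick confirmation before submitting any jobs.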

Build STAR genome index

STAR --runMode genomeGenerate \
     --genomeDir /work/batra_lab/rnaseq/data/genome/GRCh38 \
     --genomeFastaFiles /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
     --sjdbGTFfile /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.102.gtf \
     --runThreadN 8 --sjdbOverhang 89
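The --sjdbOverhang value of 89 suits reads of 90 bases, since the STAR manual recommends the maximum read length minus 1. The sketch below estimates that value from the first 10,000 records of a FASTQ stream, in case the data have a different read length; the function names are hypothetical.

```shell
# Print the longest read among the first 10,000 records of a FASTQ stream on stdin.
max_read_length() {
    head -n 40000 | awk 'NR % 4 == 2 { if (length($0) > m) m = length($0) } END { print m + 0 }'
}

# Suggested --sjdbOverhang value: maximum read length minus 1.
sjdb_overhang() {
    echo $(( $(max_read_length) - 1 ))
}
```

For a gzipped file this could be run as `gzip -dc sample_R1.fastq.gz | sjdb_overhang`, where the file name is a placeholder.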

To create the genome index on the Lyra HPC, submit the following PBS Pro script:

#!/bin/bash -l
#PBS -N STARindex
#PBS -l select=1:ncpus=8:mem=64gb
#PBS -l walltime=48:00:00
cd $PBS_O_WORKDIR

#make genome index for STAR 1st pass mapping
STAR --runMode genomeGenerate \
     --genomeDir /work/batra_lab/rnaseq/data/genome/GRCh38 \
     --genomeFastaFiles /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
     --sjdbGTFfile /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.102.gtf \
     --runThreadN 8 --sjdbOverhang 89

To submit the above script, saved as ‘make_STAR_genome_index_1stPass.sh’, run:

qsub make_STAR_genome_index_1stPass.sh
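Once the job has finished (its state can be followed with `qstat -u $USER`), it is worth confirming that the index was actually written. The following is a minimal sketch that checks for a few of the core files STAR creates in the genome directory; the function name is hypothetical and the file list is not exhaustive.

```shell
# Return 0 if <dir> contains core files written by STAR --runMode genomeGenerate.
star_index_ok() {
    for f in Genome SA SAindex genomeParameters.txt chrName.txt; do
        if [ ! -s "$1/$f" ]; then
            echo "Missing or empty: $1/$f" >&2
            return 1
        fi
    done
    return 0
}
```

For example: `star_index_ok /work/batra_lab/rnaseq/data/genome/GRCh38 && echo "index looks complete"`.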

Note that a new STAR index needs to be created for the 2nd pass step. We will do this after running a handful of samples through the 1st mapping pass.
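In a 2-pass setup, the 2nd pass index is typically built from the splice junctions that STAR reports in the SJ.out.tab file of each 1st pass sample. The sketch below shows one possible way to pool those files before re-indexing. The function name, the output file name and the read-support threshold are assumptions, and the procedure actually used for this pipeline may differ.

```shell
# Merge 1st-pass SJ.out.tab files into one junction list on stdout.
# Usage: merge_junctions <min_unique_reads> <SJ.out.tab> [<SJ.out.tab> ...]
# Keeps junctions supported by at least <min_unique_reads> uniquely mapped reads
# (column 7 of SJ.out.tab) in some sample, and prints each
# chr/start/end/strand combination once, in unchanged SJ.out.tab format.
merge_junctions() {
    min_uniq=$1
    shift
    cat "$@" | awk -F '\t' -v min="$min_uniq" \
        '$7 >= min && !seen[$1, $2, $3, $4]++ { print $0 }'
}

# The merged file could then be supplied when regenerating the index, e.g.
# (paths and the new index directory are placeholders):
#   merge_junctions 1 pass1/*SJ.out.tab > SJ.merged.tab
#   STAR --runMode genomeGenerate --genomeDir <2nd_pass_index_dir> \
#        --genomeFastaFiles <genome.fa> --sjdbGTFfile <annotation.gtf> \
#        --sjdbFileChrStartEnd SJ.merged.tab --runThreadN 8 --sjdbOverhang 89
```

Raising the threshold trades sensitivity for fewer spurious junctions in the index; a value of 1 simply drops junctions with no uniquely mapped support.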

Build RSEM genome index

rsem-prepare-reference --star \
     --gtf /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.102.gtf \
     /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
     ./rsem-index

Again, to create the above index, prepare a PBS Pro script for the genome of interest. Example for the human GRCh38 genome:

#!/bin/bash -l
#PBS -N rsemindex
#PBS -l select=1:ncpus=8:mem=64gb
#PBS -l walltime=48:00:00
cd $PBS_O_WORKDIR

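# -p 8 matches the 8 CPUs requested above; it sets the threads used to build the STAR index
rsem-prepare-reference --star -p 8 \
     --gtf /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.102.gtf \
     /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
     ./rsem-index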

The above script, saved as make_RSEM_genome_index.sh, can then be submitted via the PBS Pro scheduler:

qsub make_RSEM_genome_index.sh

Run RNAseq pipeline

Two scripts are needed to run the RNAseq pipeline: 1) a bash script describing the individual processing steps (run_rnaseq_v0.1.sh), and 2) a PBS Pro submission script (submit_batch_rnaseq_pbs.sh) that handles multiple FASTQ pairs in the input directory. The submission script takes the input directory and the output directory as arguments:

sh submit_batch_rnaseq_pbs.sh /work/batra_lab/rnaseq/ronin/demo outdir
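To illustrate how a submission script can account for multiple FASTQ pairs, the sketch below pairs files by name and submits one job per pair. It is not the content of submit_batch_rnaseq_pbs.sh. The _R1/_R2 naming convention, the function names and the variable names passed with `qsub -v` are all assumptions, and they would need to match what run_rnaseq_v0.1.sh actually reads.

```shell
# Print "sample<TAB>R1<TAB>R2" for every complete FASTQ pair in a directory.
# Assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
list_fastq_pairs() {
    for r1 in "$1"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        sample=$(basename "$r1" _R1.fastq.gz)
        if [ -e "$r2" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
        else
            echo "Skipping $sample: no R2 file found" >&2
        fi
    done
}

# Submit one PBS job per pair.
# Usage: submit_pairs <input_dir> <output_dir>
# R1, R2 and OUTDIR are hypothetical variable names exported to the job.
submit_pairs() {
    list_fastq_pairs "$1" | while IFS="$(printf '\t')" read -r sample r1 r2; do
        qsub -N "rnaseq_$sample" -v "R1=$r1,R2=$r2,OUTDIR=$2" run_rnaseq_v0.1.sh < /dev/null
    done
}
```

Running `list_fastq_pairs /work/batra_lab/rnaseq/ronin/demo` first gives a dry-run view of which samples would be submitted.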