RNAseq - STAR 2-pass approach (Ronin)
Requirements:
Reference genome (FASTA) and annotation (GTF). If required, fetch the reference genome and annotation files from https://asia.ensembl.org/info/data/ftp/index.html
RNAseq data: paired-end FASTQ datasets
Conda installed. If it is not installed, follow the instructions here: https://docs.conda.io/projects/conda/en/latest/user-guide/install/
Install required tools (if necessary)
The simplest way to install tools and their dependencies is with conda (https://anaconda.org)
Search for the tool of interest and copy and paste the provided command line to install the tool. For example, to install STAR do the following:
conda install -c bioconda star
Other tools required are:
trim-galore
rsem
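Before moving on, it can help to confirm that the tools are actually on the PATH. The following is a minimal sketch, not part of the original pipeline: check_tools is a hypothetical helper, and the combined conda command in the comment assumes the conda-forge and bioconda channels are reachable.

```shell
#!/bin/bash
# Sketch: report which of the required executables are on PATH.
# check_tools is a hypothetical helper, not part of the pipeline scripts.

# All three packages could be installed in one go (run manually if needed):
#   conda install -c conda-forge -c bioconda star trim-galore rsem

# Print the name of every missing executable; return 1 if any is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executables provided by the star, trim-galore and rsem packages.
if check_tools STAR trim_galore rsem-prepare-reference rsem-calculate-expression; then
    echo "all required tools found"
fi
```

If anything is reported as missing, activate the conda environment the tools were installed into and run the check again.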
Build STAR genome index
STAR --runMode genomeGenerate \
--genomeDir /work/batra_lab/rnaseq/data/genome/GRCh38 \
--genomeFastaFiles /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--sjdbGTFfile /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.102.gtf \
--runThreadN 8 --sjdbOverhang 89
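The value 89 passed to --sjdbOverhang corresponds to read length minus 1, which is what the STAR manual recommends, so it matches 90 bp reads. For data with a different read length the value could be derived from a FASTQ file, roughly as sketched below. The function name sjdb_overhang, the example file name and the 10000-read sample size are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: derive a value for --sjdbOverhang (longest read length - 1)
# from a FASTQ file. Only the first 10000 reads are inspected, which is
# an arbitrary choice for this example.
sjdb_overhang() {
    local fastq="$1"
    local reader="cat"
    case "$fastq" in
        *.gz) reader="gzip -cd" ;;   # decompress gzipped input on the fly
    esac
    $reader "$fastq" | awk '
        NR % 4 == 2 { if (length($0) > max) max = length($0) }  # sequence lines
        NR >= 40000 { exit }                                    # 4 lines per read
        END { print max - 1 }'
}

# Example (hypothetical file name):
#   sjdb_overhang sample_R1.fastq.gz
```

If reads are trimmed before mapping they will vary in length; the longest read is used here.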
To create the genome index on the Lyra HPC, submit the following PBS Pro script:
#!/bin/bash -l
#PBS -N STARindex
#PBS -l select=1:ncpus=8:mem=64gb
#PBS -l walltime=48:00:00
cd $PBS_O_WORKDIR
#make genome index for STAR 1st pass mapping
STAR --runMode genomeGenerate \
--genomeDir /work/batra_lab/rnaseq/data/genome/GRCh38 \
--genomeFastaFiles /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--sjdbGTFfile /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.102.gtf \
--runThreadN 8 --sjdbOverhang 89
To submit the above script, called ‘make_STAR_genome_index_1stPass.sh’, run:
qsub make_STAR_genome_index_1stPass.sh
Note: a new STAR index needs to be created for the 2nd pass step. We will do this after running a handful of samples through the 1st mapping pass.
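One way the 2nd-pass index could be built is to pass the splice junction files (SJ.out.tab) from the 1st-pass runs to STAR with --sjdbFileChrStartEnd when regenerating the genome. The sketch below only prints the command (a dry run) so it can be checked before use. The function name and the directory layout in the usage comment are assumptions, and no junction filtering is applied.

```shell
#!/bin/bash
# Sketch: print the STAR command that rebuilds the genome index with the
# splice junctions found in the 1st mapping pass. Nothing is executed.
build_second_pass_index_cmd() {
    local pass1_dir="$1"    # directory holding the 1st-pass *SJ.out.tab files
    local genome_dir="$2"   # output directory for the new (2nd-pass) index
    local fasta="$3"
    local gtf="$4"
    local sj_files=()
    local f
    for f in "$pass1_dir"/*SJ.out.tab; do
        if [ -e "$f" ]; then
            sj_files+=("$f")
        fi
    done
    if [ "${#sj_files[@]}" -eq 0 ]; then
        echo "no SJ.out.tab files found in $pass1_dir" >&2
        return 1
    fi
    echo STAR --runMode genomeGenerate \
        --genomeDir "$genome_dir" \
        --genomeFastaFiles "$fasta" \
        --sjdbGTFfile "$gtf" \
        --sjdbFileChrStartEnd "${sj_files[@]}" \
        --runThreadN 8 --sjdbOverhang 89
}

# Example (both directory names are hypothetical):
#   build_second_pass_index_cmd pass1_output \
#       /work/batra_lab/rnaseq/data/genome/GRCh38_2ndPass \
#       /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
#       /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.102.gtf
```

The printed command can be pasted into a PBS Pro script in place of the 1st-pass genomeGenerate call.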
Build RSEM genome index
Again, to create the genome index, prepare a PBS Pro script for the genome of interest. Example for the human GRCh38 genome:
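A minimal sketch of what such a script might look like, using rsem-prepare-reference with the same FASTA and GTF files as the STAR index. The script name make_RSEM_genome_index.sh, the resource request (copied from the STAR job) and the output prefix under rsem/ are assumptions. The snippet writes the PBS Pro script to a file so it can be reviewed before submission.

```shell
#!/bin/bash
# Sketch: write a PBS Pro script that builds the RSEM reference.
# Script name, resources and the rsem/GRCh38 output prefix are assumptions.
cat > make_RSEM_genome_index.sh <<'EOF'
#!/bin/bash -l
#PBS -N RSEMindex
#PBS -l select=1:ncpus=8:mem=64gb
#PBS -l walltime=48:00:00
cd $PBS_O_WORKDIR
#make genome index for RSEM
mkdir -p /work/batra_lab/rnaseq/data/genome/GRCh38/rsem
rsem-prepare-reference \
--gtf /work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.102.gtf \
--num-threads 8 \
/work/batra_lab/rnaseq/data/genome/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
/work/batra_lab/rnaseq/data/genome/GRCh38/rsem/GRCh38
EOF
```

The last argument is the reference name that is later passed to rsem-calculate-expression.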
The above script can then be submitted via the PBS Pro scheduler.
Run RNAseq pipeline
Two scripts are necessary to run the RNAseq pipeline: 1) a bash script describing the individual processing steps (run_rnaseq_v0.1.sh), and 2) a PBS Pro submission script that handles multiple FASTQ pairs in the input directory.
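The sketch below illustrates the two ideas, not the actual contents of run_rnaseq_v0.1.sh: finding FASTQ pairs in a directory, and the trimming, mapping and quantification commands for one sample. It assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz. Both function names are hypothetical, and the commands are only printed (a dry run).

```shell
#!/bin/bash
# Sketch: pair FASTQ files and print the per-sample commands (dry run).
# Assumed naming: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.

# Print "sample<TAB>R1<TAB>R2" for every complete pair in a directory.
list_fastq_pairs() {
    local indir="$1"
    local r1 r2 sample
    for r1 in "$indir"/*_R1.fastq.gz; do
        if [ ! -e "$r1" ]; then continue; fi
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        sample="$(basename "$r1" _R1.fastq.gz)"
        if [ -e "$r2" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
        else
            echo "skipping $sample: no R2 file" >&2
        fi
    done
}

# Print the trim_galore, STAR and RSEM commands for one sample.
print_sample_cmds() {
    local sample="$1" r1="$2" r2="$3"
    local star_index="$4" rsem_ref="$5" outdir="$6"
    # trim_galore names its paired outputs *_val_1.fq.gz and *_val_2.fq.gz
    local t1="$outdir/${sample}_R1_val_1.fq.gz"
    local t2="$outdir/${sample}_R2_val_2.fq.gz"
    echo "trim_galore --paired -o $outdir $r1 $r2"
    echo "STAR --genomeDir $star_index --readFilesIn $t1 $t2" \
         "--readFilesCommand zcat --outSAMtype BAM Unsorted" \
         "--quantMode TranscriptomeSAM" \
         "--outFileNamePrefix $outdir/${sample}_ --runThreadN 8"
    echo "rsem-calculate-expression --paired-end --alignments -p 8" \
         "$outdir/${sample}_Aligned.toTranscriptome.out.bam $rsem_ref $outdir/$sample"
}

# Example (all paths are hypothetical):
#   list_fastq_pairs fastq_dir | while IFS=$'\t' read -r s r1 r2; do
#       print_sample_cmds "$s" "$r1" "$r2" star_index_2ndPass rsem/GRCh38 results
#   done
```

Printing the commands first makes it easy to check sample names and paths before submitting anything to the scheduler.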