RNA-seq using RASflow: genome- or transcriptome-based analysis
RASflow is a modular, flexible and user-friendly RNA-Seq analysis workflow.
RASflow can be applied to both model and non-model organisms. It supports mapping raw RNA-Seq reads to either a genome or a transcriptome (downloaded from a public database or assembled by the user), and it can perform both transcript- and gene-level Differential Expression Analysis (DEA) when a transcriptome is used as the mapping reference.
Further details are provided by the developer:
Source code:
https://github.com/zhxiaokang/RASflow
Publication:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3433-x
Installation
This section shows how to download and install the RASflow pipeline on QUT's HPC (or another server).
Clone the repository:
git clone https://github.com/zhxiaokang/RASflow.git
Move into the cloned directory (env.yaml lives there) and create the environment:
cd RASflow
conda env create -n rasflow -f env.yaml
Activate the environment:
conda activate rasflow
Deactivate the environment:
conda deactivate
List available conda environments:
conda env list
Set up configuration
Sample information
Modify the metadata file describing your data: configs/metadata.tsv
View the metadata file as follows:
cat configs/metadata.tsv
Example of a metadata file: prepare a tab-delimited file containing three columns: 1) sample ID; 2) treatment group; and 3) replicate number.
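As an illustration, a file of this shape could be written from the shell as sketched below. The sample IDs, group labels and the column names sample, group and subject are assumptions made for this sketch; check them against the configs/metadata.tsv shipped with the repository before relying on them.

```shell
# Hypothetical example: four samples, two groups, two replicates each.
# Column names and values are placeholders, not taken from RASflow.
metafile="${METAFILE:-metadata_example.tsv}"

# printf reuses its format for every group of three arguments,
# so each row is written as three tab-separated fields.
printf '%s\t%s\t%s\n' \
    sample group   subject \
    S1     control 1 \
    S2     control 2 \
    S3     treated 1 \
    S4     treated 2 \
    > "$metafile"

# Check that every row has exactly three tab-separated fields.
awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad + 0 }' "$metafile" && echo "metadata OK"
```

Once the layout looks right, the file can be copied to configs/metadata.tsv.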
Convert .csv file to .tsv
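One possible way to do this conversion from the shell is sketched below. It assumes a simple CSV with no quoted fields and no commas inside values; the file names are placeholders.

```shell
# Placeholder input so the sketch is self-contained; use your own CSV instead.
printf 'sample,group,subject\nS1,control,1\nS2,treated,1\n' > metadata.csv

# Strip Windows carriage returns, then turn every comma into a tab.
tr -d '\r' < metadata.csv | tr ',' '\t' > metadata.tsv

cat metadata.tsv
```

A spreadsheet export with quoted fields would need a CSV-aware tool rather than a plain character swap.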
Pipeline information
Customise the workflow to your needs in configs/config_main.yaml
The config file controls whether to run QC and trimming, which reference to use (genome or transcriptome), which statistical analysis to perform (edgeR or DESeq2), and whether to generate plots of the results (volcano plots and heatmaps). Modify the information in the config_main.yaml file as appropriate, for example with the vi editor (a tutorial for this editor can be found at https://www.washington.edu/computing/unix/vi.html):
vi configs/config_main.yaml
Note: RASflow only requires a FASTA file for the genome or transcriptome of interest, plus the FASTQ files of the data to be processed.
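Before editing, it can help to list only the active settings in the file. The sketch below assumes the usual YAML convention that comment lines start with #. The helper name show_settings is hypothetical, and the demo file with its made-up keys only stands in for configs/config_main.yaml.

```shell
# Print a YAML file without blank lines and comment-only lines.
show_settings() {
    grep -vE '^[[:space:]]*(#|$)' "$1"
}

# Demo on a stand-in file; the keys here are made up for illustration.
cat > demo_config.yaml <<'EOF'
# which reference to map against
reference: transcriptome

# run quality control first?
qc: yes
EOF

show_settings demo_config.yaml
```

In the RASflow directory the same helper would be called as show_settings configs/config_main.yaml.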
Run RASflow
Use the test data provided with RASflow to run a test. If you installed RASflow on your laptop/desktop or a virtual machine, simply run the following command:
python main.py
The above will prompt the user to confirm the analyses they want to run, as specified in the config_main.yaml file. To run on the Lyra HPC, we need to modify the main.py script so that it runs without asking the user to confirm the steps. Why? Because jobs on the HPC are run through a scheduler (PBS Pro), so interactive input from the user is not available while a job is executing.
What to do? We need to comment out the lines in main.py that request user confirmation. These code lines are the following:
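The lines themselves are not reproduced here. One way to locate them, assuming the confirmation is read with Python's built-in input() function, is to search the script for that call. The helper name find_prompts and the stand-in script are hypothetical.

```shell
# List, with line numbers, every line that calls input( in a Python script.
find_prompts() {
    grep -nF 'input(' "$1"
}

# Stand-in script for illustration only; this is not RASflow's main.py.
cat > demo_main.py <<'EOF'
print("Do you want to continue? (y/n)")
answer = input()
if answer != "y":
    raise SystemExit(0)
EOF

find_prompts demo_main.py
```

In the RASflow directory, find_prompts main.py would report the candidate lines. When commenting them out with #, also comment out any code that depends on the answer, so that the script remains valid Python.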
An example of a PBS Pro submission script for RASflow is the following:
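A minimal sketch of such a script is written out below. The job name, CPU, memory and walltime values are placeholders, and the conda initialisation line assumes a standard conda installation; adjust all of them to your own site and data.

```shell
# Write a hypothetical PBS Pro job script; resource values are placeholders.
cat > rasflow.pbs <<'EOF'
#!/bin/bash -l
#PBS -N rasflow
#PBS -l select=1:ncpus=8:mem=32gb
#PBS -l walltime=24:00:00

# Start in the directory the job was submitted from (the RASflow folder).
cd "$PBS_O_WORKDIR"

# Make "conda activate" available in a non-interactive shell.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate rasflow

# Run the workflow; main.py must no longer ask for confirmation.
python main.py
EOF

echo "wrote rasflow.pbs"
```

The script would then be submitted from inside the RASflow directory with qsub rasflow.pbs, and the job can be monitored with qstat.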
Reference Genomes and Transcriptomes
Genomes, transcriptomes and annotation information for human and other species are available from Ensembl:
https://asia.ensembl.org/info/data/ftp/index.html