Building offline Reference Genome

When running an nf-core RNASEQ analysis, the pipeline builds any indexes that are not provided. Building these indexes can take a long time.

Downloading the genome files can also be slow. Having pre-downloaded reference files shortens the time the pipeline takes to complete.

RNASEQ has a parameter, --save_reference, that saves the genome and indexes in the {outdir}/genome folder.

This folder can be transferred to a shared location on the HPC so others can speed up their RNASEQ analyses by not rebuilding the indexes.

 

Download Reference

For example, GRCh38.p14 from Ensembl:

Homo_sapiens - Ensembl genome browser 110

 

https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna_index/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

https://ftp.ensembl.org/pub/release-110/gtf/homo_sapiens/Homo_sapiens.GRCh38.110.gtf.gz
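The two files above could be fetched from the command line. The following is a minimal sketch, assuming wget and gzip are available on the node; the helper name download_reference and the destination path are hypothetical and not part of RNASEQ or Ensembl tooling.

```shell
# Sketch: build the Ensembl URLs for one release and fetch them on request.
ENSEMBL_RELEASE=110
ENSEMBL_BASE="https://ftp.ensembl.org/pub/release-${ENSEMBL_RELEASE}"
FASTA_URL="${ENSEMBL_BASE}/fasta/homo_sapiens/dna_index/Homo_sapiens.GRCh38.dna.toplevel.fa.gz"
GTF_URL="${ENSEMBL_BASE}/gtf/homo_sapiens/Homo_sapiens.GRCh38.${ENSEMBL_RELEASE}.gtf.gz"

# download_reference DEST_DIR  (hypothetical helper)
download_reference() {
    local dest="$1"
    local url
    mkdir -p "$dest" || return 1
    for url in "$FASTA_URL" "$GTF_URL"; do
        # -c resumes a partial download; -P sets the target directory
        wget -c -P "$dest" "$url" || return 1
        # gzip -t checks that the compressed file is complete and readable
        gzip -t "$dest/$(basename "$url")" || return 1
    done
}

# Example call (not run automatically; the destination is a placeholder):
# download_reference /path/to/reference
```

Checking the files with gzip -t before starting a long pipeline run is a cheap way to catch a truncated download early.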

Build the Genome and Index

If using an already downloaded genome and annotation, RNASEQ requires the --fasta and --gtf parameters as a minimum to start the analysis. Also include the --save_reference parameter.

nextflow run {rnaseq} \
    ... \
    --fasta {path/to/fasta}/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
    --gtf {path/to/gtf}/Homo_sapiens.GRCh38.110.gtf.gz \
    --save_reference \
    --outdir results \
    ...

 

Copy/Move the genome folder

Once the pipeline is finished, the genome files and indexes will be found in {outdir}/genome. Transfer this to a shared location.

Using /work/datasets/reference/nextflow as the base, follow a Species/Provider/Release folder structure.

cd {RNASEQ run folder}/results
cp -r genome /work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38

Adjust permissions so everyone has Read access to the files and folders.
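The copy and permission steps above can be sketched as a single shell function. This is only an illustration under stated assumptions: the function name publish_genome and its argument order are hypothetical, and your site may prefer group-based permissions over read access for all users.

```shell
# publish_genome SRC BASE SPECIES PROVIDER RELEASE  (hypothetical helper)
# Copies the contents of SRC into BASE/SPECIES/PROVIDER/RELEASE and
# makes the copy readable by everyone.
publish_genome() {
    local src="$1"
    local dest="$2/$3/$4/$5"
    [ -d "$src" ] || { echo "missing genome folder: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # Copy the folder contents (including sub-folders), preserving timestamps
    cp -rp "$src"/. "$dest"/ || return 1
    # a+rX: read for everyone; execute only on directories
    # (and on files that are already executable)
    chmod -R a+rX "$dest" || return 1
    echo "$dest"
}

# Example call (not run automatically; the run folder is a placeholder):
# publish_genome "{RNASEQ run folder}/results/genome" \
#     /work/datasets/reference/nextflow Homo_sapiens Ensembl GRCh38
```

The parent folders (Species and Provider) also need to be readable and traversable by other users; this sketch only adjusts the release folder it creates.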

Update the Genome Config file

Add a new section to the qutgenome.config file. Include -local in the name so there are no conflicts with the iGenomes references.

vi /work/datasets/reference/nextflow/qutgenome.config

For example, Ensembl Homo_sapiens GRCh38: