Building an Offline Reference Genome

When running an nf-core/rnaseq (RNASEQ) analysis, any index that is not provided is built during the run. Building these indexes can take a long time.

Downloading the genome files can also be slow. Having pre-downloaded reference files shortens the time the pipeline takes to complete.

RNASEQ has a parameter --save_reference that saves the genome files and indexes in the {outdir}/genome folder.

This folder can be transferred to a shared location on the HPC so that others can speed up their RNASEQ analyses by skipping the index builds.

Download Reference

For example, GRCh38.p14 from Ensembl:

Homo_sapiens - Ensembl genome browser 110

https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna_index/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

https://ftp.ensembl.org/pub/release-110/gtf/homo_sapiens/Homo_sapiens.GRCh38.110.gtf.gz
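
The two downloads above could be scripted along the following lines. This is a minimal sketch, assuming curl and gzip are available on the HPC; download_reference is a hypothetical helper name used only for illustration, and the destination directory is whatever you pass in.

```shell
# Ensembl release 110 URLs, copied from the links above.
FASTA_URL="https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna_index/Homo_sapiens.GRCh38.dna.toplevel.fa.gz"
GTF_URL="https://ftp.ensembl.org/pub/release-110/gtf/homo_sapiens/Homo_sapiens.GRCh38.110.gtf.gz"

# Download both files into the given directory (defaults to the current one).
# download_reference is a hypothetical helper name, not part of nf-core.
download_reference() {
  dest_dir="${1:-.}"
  mkdir -p "$dest_dir" || return 1
  for url in "$FASTA_URL" "$GTF_URL"; do
    target="$dest_dir/$(basename "$url")"
    # -f: fail on HTTP errors, -L: follow redirects
    curl -fL -o "$target" "$url" || return 1
    # Check that the compressed file is intact before using it.
    gzip -t "$target" || return 1
  done
}
```

Calling download_reference {path/to/reference} fetches both files; the function stops at the first failed download or corrupt archive.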

Build the Genome and Index

If using an already downloaded genome and annotation, RNASEQ requires the parameters --fasta and --gtf as a minimum to start the analysis. Also include the --save_reference parameter so the genome files and indexes are kept.

nextflow run {rnaseq} \
  ...
  --fasta {path/to/fasta}/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
  --gtf {path/to/gtf}/Homo_sapiens.GRCh38.110.gtf.gz \
  --save_reference \
  --outdir results \
  ...

Copy/Move the genome folder

Once the pipeline is finished, the genome files and indexes will be found in {outdir}/genome. Transfer this to a shared location.

Using /work/datasets/reference/nextflow as the base, follow a Species/Provider/Release folder structure, for example Homo_sapiens/Ensembl/GRCh38.

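cd {RNASEQ run folder}/results
# Create the destination first so the folder lands at .../GRCh38/genome
mkdir -p /work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38
# -r is required to copy a directory
cp -r genome /work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/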

Adjust permissions so everyone has Read access to the files and folders.
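
One way to do this is with a recursive chmod. The following is a minimal sketch; share_readable is a hypothetical helper name, and it assumes that read access for all users (rather than for a specific group only) matches your site's policy.

```shell
# Give everyone read access to a directory tree.
# share_readable is a hypothetical helper name used for illustration.
share_readable() {
  target_dir="$1"
  # a+r: read permission for user, group and others.
  # X (capital): add execute only to directories (and to files that are
  # already executable), so folders can be entered without marking
  # data files as executable.
  chmod -R a+rX "$target_dir"
}
```

For example: share_readable /work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38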

Update the Genome Config file

Add a new section to the qutgenome.config file. Include -local in the section name so that it does not conflict with the iGenomes references.

vi /work/datasets/reference/nextflow/qutgenome.config

E.g., Ensembl Homo_sapiens GRCh38:

'GRCh38-local' {
    fasta            = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/Homo_sapiens.GRCh38.dna.toplevel.fa'
    transcript_fasta = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/genome.transcripts.fa'
    gtf              = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/Homo_sapiens.GRCh38.110.gtf'
    bed12            = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/Homo_sapiens.GRCh38.110.bed'
    star             = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/index/star'
    rsem             = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/rsem/'
    salmon           = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/index/salmon/'
}
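
After editing the config, it may be worth confirming that every path it references actually exists, since a typo here only shows up when someone runs the pipeline. The following is a minimal sketch; check_genome_paths is a hypothetical helper name, and it assumes paths are written as single-quoted absolute paths, as in the example above.

```shell
# Report any path referenced in a genome config file that does not exist.
# check_genome_paths is a hypothetical helper name used for illustration.
check_genome_paths() {
  config_file="$1"
  # Extract every single-quoted value that starts with "/", strip the
  # quotes, and keep only the paths that are not present on disk.
  missing=$(grep -o "'/[^']*'" "$config_file" | tr -d "'" |
    while IFS= read -r path; do
      [ -e "$path" ] || echo "$path"
    done)
  if [ -n "$missing" ]; then
    echo "Missing paths:"
    echo "$missing"
    return 1
  fi
  echo "All referenced paths exist"
}
```

For example: check_genome_paths /work/datasets/reference/nextflow/qutgenome.config. The function returns a non-zero status and lists the offending paths if any are missing.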