Building an Offline Reference Genome
When running an nf-core RNASEQ analysis, the pipeline builds any indexes that are not provided, and building them can take a long time.
Downloading the genome files can also be slow. Having pre-downloaded reference files reduces the time the pipeline takes to complete.
RNASEQ has a parameter, --save_reference, that saves the genome and indexes in the {outdir}/genome folder.
This folder can be transferred to a shared location on the HPC so that others can speed up their RNASEQ analyses by not rebuilding the indexes.
Download Reference
For example, GRCh38.p14 from Ensembl:
Homo_sapiens - Ensembl genome browser 110
https://ftp.ensembl.org/pub/release-110/gtf/homo_sapiens/Homo_sapiens.GRCh38.110.gtf.gz
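A minimal sketch of how the download could be scripted. The GTF URL is the one listed above. The FASTA URL is an assumption, inferred from the usual Ensembl FTP layout and the FASTA file name used later on this page, so check it against the Ensembl site before relying on it. The helper name download_reference is hypothetical.

```shell
# Ensembl release to download; 110 matches the GTF link above.
RELEASE=110
BASE_URL="https://ftp.ensembl.org/pub/release-${RELEASE}"

# Annotation (GTF): same URL as listed on this page.
GTF_URL="${BASE_URL}/gtf/homo_sapiens/Homo_sapiens.GRCh38.${RELEASE}.gtf.gz"

# Genome (FASTA): ASSUMED path, inferred from the Ensembl FTP layout.
FASTA_URL="${BASE_URL}/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz"

# Hypothetical helper: download both files into the given directory
# (defaults to the current directory).
download_reference() {
    dest="${1:-.}"
    mkdir -p "$dest" || return 1
    # -c resumes a partial download, -P sets the output directory
    wget -c -P "$dest" "$GTF_URL" "$FASTA_URL"
}
```

Nothing is downloaded until the helper is called, for example: download_reference {path/to/reference}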
Build the Genome and Index
If using an already downloaded genome and annotation, RNASEQ requires at least the --fasta and --gtf parameters to start the analysis. Also include the --save_reference parameter so the genome and indexes are saved.
nextflow run {rnaseq} \
...
--fasta {path/to/fasta}/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
--gtf {path/to/gtf}/Homo_sapiens.GRCh38.110.gtf.gz \
--save_reference \
--outdir results \
...
Copy/Move the genome folder
Once the pipeline is finished, the genome files and indexes will be found in {outdir}/genome. Transfer this to a shared location.
Using /work/datasets/reference/nextflow as the base, use a Species/Provider/Release folder structure.
cd {RNASEQ run folder}/results
mkdir -p /work/datasets/reference/nextflow/Homo_sapiens/Ensembl
cp -r genome /work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38
Adjust permissions so everyone has Read access to the files and folders.
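One possible way to do this is a recursive chmod. The sketch below assumes the person running it owns the copied files. The helper name make_world_readable is hypothetical.

```shell
# Hypothetical helper: give everyone read access to a reference folder.
make_world_readable() {
    target="$1"
    # a+r : read permission for owner, group and others
    # a+X : execute/traverse permission on directories only
    #       (and on files that are already executable)
    chmod -R a+rX "$target"
}
```

For example: make_world_readable /work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38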
Update the Genome Config file
Add a new section to the qutgenome.config file. Include -local in the name so there are no conflicts with the iGenomes references.
vi /work/datasets/reference/nextflow/qutgenome.config
For example, Ensembl Homo_sapiens GRCh38: