When running an nf-core/rnaseq (RNASEQ) analysis, several indexes are built if they are not provided. Building these indexes can take a long time.
Downloading the genome files can also be slow. Having pre-downloaded reference files shortens the time the pipeline takes to complete.
RNASEQ has a parameter, --save_reference, that saves the genome and indexes in the {outdir}/genome folder.
This folder can be transferred to a shared location on the HPC so that others can speed up their RNASEQ analysis by not rebuilding the indexes.
Download Reference
For example, GRCh38.p14 from Ensembl release 110 (Homo_sapiens - Ensembl genome browser 110). The annotation (GTF) file is available at:
https://ftp.ensembl.org/pub/release-110/gtf/homo_sapiens/Homo_sapiens.GRCh38.110.gtf.gz
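The download step can be sketched as a small shell helper. This is only an illustration: the function name fetch_reference and the target folder are hypothetical, and the FASTA URL is not listed on this page; it is assumed to follow the standard Ensembl FTP layout for release 110 and matches the file name used in the build step.

```shell
#!/usr/bin/env bash
# Ensembl release used in this guide.
RELEASE=110
BASE_URL="https://ftp.ensembl.org/pub/release-${RELEASE}"

# GTF URL as given in this guide.
GTF_URL="${BASE_URL}/gtf/homo_sapiens/Homo_sapiens.GRCh38.${RELEASE}.gtf.gz"
# FASTA URL: an assumption based on the usual Ensembl FTP layout.
FASTA_URL="${BASE_URL}/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz"

# Download a URL into a directory, resuming a partial download if present.
# fetch_reference is a hypothetical helper name.
fetch_reference() {
    local url="$1"
    local dest_dir="$2"
    mkdir -p "$dest_dir"
    wget --continue --directory-prefix="$dest_dir" "$url"
}

# Example calls (hypothetical target folder):
# fetch_reference "$GTF_URL"   "$HOME/reference/ensembl_110"
# fetch_reference "$FASTA_URL" "$HOME/reference/ensembl_110"
```

The downloaded files can stay gzipped, since the build step below passes the .gz files directly to --fasta and --gtf.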
Build the Genome and Index
If using an already downloaded genome and annotation, RNASEQ requires at least the --fasta and --gtf parameters to start the analysis. Also include the --save_reference parameter so the generated indexes are kept.
nextflow run {rnaseq} \
    ... \
    --fasta {path/to/fasta}/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
    --gtf {path/to/gtf}/Homo_sapiens.GRCh38.110.gtf.gz \
    --save_reference \
    --outdir results \
    ...
Copy/Move the genome folder
Once the pipeline is finished, the genome files and indexes will be found in {outdir}/genome. Transfer this to a shared location.
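Using /work/datasets/reference/nextflow as the base, follow a Species/Provider/Release folder structure. The genome folder is a directory, so copy it recursively:
cd {RNASEQ run folder}/results
mkdir -p /work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38
cp -r genome /work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38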
Adjust permissions so everyone has Read access to the files and folders.
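One way to do this is a recursive chmod. The sketch below is an assumption about how permissions are managed on the shared area (there may be group or ACL conventions to follow instead), and share_readonly is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Grant read access to everyone under a folder.
# The capital X adds execute (traverse) permission to directories, and to
# files that are already executable, without making data files executable.
share_readonly() {
    local target="$1"
    chmod -R a+rX "$target"
}

# Example call using the shared location from this guide:
# share_readonly /work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38
```

Using a+rX rather than a+rx keeps the reference files themselves non-executable while still letting other users enter the directories.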
Update the Genome Config file
Add a new section to the qutgenome.config file. Include -local in the name so there are no conflicts with the iGenomes references.
vi /work/datasets/reference/nextflow/qutgenome.config
For example, Ensembl Homo_sapiens GRCh38:
'GRCh38-local' {
    fasta            = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/Homo_sapiens.GRCh38.dna.toplevel.fa'
    transcript_fasta = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/genome.transcripts.fa'
    gtf              = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/Homo_sapiens.GRCh38.110.gtf'
    bed12            = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/Homo_sapiens.GRCh38.110.bed'
    star             = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/index/star'
    rsem             = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/rsem/'
    salmon           = '/work/datasets/reference/nextflow/Homo_sapiens/Ensembl/GRCh38/genome/index/salmon/'
}
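Because a mistyped path in this section only shows up when someone runs the pipeline, it may be worth checking that every path in the config exists after editing it. The following is a minimal sketch; missing_reference_paths is a hypothetical helper, and it assumes paths are written in single quotes, start with /, and contain no whitespace.

```shell
#!/usr/bin/env bash
# Print every single-quoted absolute path in a config file that does not
# exist on disk. Returns non-zero if at least one path is missing.
# Assumes paths contain no whitespace.
missing_reference_paths() {
    local config="$1"
    local status=0
    local path
    for path in $(grep -o "'/[^']*'" "$config" | tr -d "'"); do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}

# Example call using the config file from this guide:
# missing_reference_paths /work/datasets/reference/nextflow/qutgenome.config
```

Assuming the config file is passed to Nextflow with the -c option and this entry sits inside a params.genomes block, the new reference would then be selected with --genome GRCh38-local.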