Run the RNA-seq pipeline using the latest Telomere-to-Telomere (T2T) human genome
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies (Nurk et al., Science, 2022 https://www.science.org/doi/10.1126/science.abj6987).
...
The latest T2T human genome and annotation have been downloaded from NCBI:
...
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/
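If you need to fetch the same files yourself (for example, on another system), the sketch below writes the download links to a text file. It assumes NCBI's usual naming pattern for assembly directories (`<accession>_<assembly name>_genomic.fna.gz`, `..._genomic.gtf.gz`, plus `md5checksums.txt`). Confirm the exact file names against the directory listing at the URL above.

```shell
#!/usr/bin/env bash
# Sketch: list the URLs of the T2T-CHM13v2.0 genome, annotation and checksums.
# ASSUMPTION: file names follow NCBI's <accession>_<assembly>_<suffix> pattern.
BASE="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0"
PREFIX="GCF_009914755.1_T2T-CHM13v2.0"

: > t2t_urls.txt   # create (or empty) the URL list
for f in "${PREFIX}_genomic.fna.gz" "${PREFIX}_genomic.gtf.gz" md5checksums.txt; do
  printf '%s/%s\n' "$BASE" "$f" >> t2t_urls.txt
done
cat t2t_urls.txt
```

The files can then be fetched with `wget -c -i t2t_urls.txt` (`-c` resumes interrupted downloads, `-i` reads the URLs from the file). You can verify them in the download folder with `md5sum -c --ignore-missing md5checksums.txt` (GNU coreutils) before decompressing with `gunzip`.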
You can access this genome at:
...
/work/training/references/ncbi/T2T
Check available files:
```shell
ls -l /work/training/references/ncbi/T2T/
```
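Before launching a run, it can be worth confirming that the FASTA and GTF use the same sequence names. Genes on sequences whose names differ between the two files (e.g., `chr1` in one and an accession such as `NC_...` in the other) are likely to be dropped or to cause errors. The helper functions below are a minimal sketch using standard `grep`, `sed`, `cut`, `sort` and `comm`. They assume uncompressed files, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sanity checks on a reference FASTA and GTF (hypothetical helpers).
# ASSUMPTION: both files are uncompressed plain text.

# Number of sequences (chromosomes/contigs) in a FASTA file
count_seqs() {
  grep -c '^>' "$1"
}

# Sequence names as they appear in FASTA headers (first word after '>')
fasta_names() {
  grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*//'
}

# Distinct sequence names in column 1 of a GTF (comment lines skipped)
gtf_seqnames() {
  grep -v '^#' "$1" | cut -f1 | sort -u
}

# GTF sequence names that are absent from the FASTA; empty output means all match
# Usage: missing_in_fasta genome.fasta annotation.gtf
missing_in_fasta() {
  names_tmp=$(mktemp)
  fasta_names "$1" | sort -u > "$names_tmp"
  gtf_seqnames "$2" | comm -23 - "$names_tmp"
  rm -f "$names_tmp"
}
```

For example, `count_seqs <genome>.fna` prints the number of sequences. An empty output from `missing_in_fasta <genome>.fna <annotation>.gtf` means every sequence name used in the GTF is present in the FASTA. Substitute the actual file names from the `ls` output.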
...
The nf-core/rnaseq pipeline can profile gene expression against any custom genome of interest (e.g., T2T, or any animal or plant genome), provided you have a reference genome sequence (FASTA file) and a genome annotation (GTF or GFF3 file).
Which parameters tell the pipeline to use a custom genome?
```shell
--fasta my_custom_genome.fasta   # de novo assembled genome, or a genome not available as an iGenomes reference
--gtf my_custom_genome.gtf       # genome annotation giving the locations of genes
```
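The same options can also be stored in a parameters file and passed to Nextflow with its `-params-file` option, which makes a run easier to repeat. In this minimal sketch the file names are placeholders. `input` and `outdir` are the pipeline's samplesheet and output-directory parameters.

```shell
#!/usr/bin/env bash
# Sketch: write the custom-genome options to a YAML params file for Nextflow.
# ASSUMPTION: the file names below are placeholders for your own data.
cat > params_custom_genome.yaml <<'EOF'
input: samplesheet.csv
outdir: results_custom_genome
fasta: my_custom_genome.fasta
gtf: my_custom_genome.gtf
EOF
cat params_custom_genome.yaml
```

A run would then look like `nextflow run nf-core/rnaseq -r 3.12.0 -profile <profile> -params-file params_custom_genome.yaml`, where `<profile>` (e.g., `singularity`) depends on your system. If your annotation is in GFF3 format, use `gff:` (the `--gff` parameter) instead of `gtf:`.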
...
```shell
mkdir -p "$HOME/workshop/2024-2/session4_RNAseq/runs/run4_RNAseq_T2T"
cp "$HOME/workshop/2024-2/session4_RNAseq/data/samplesheet.csv" "$HOME/workshop/2024-2/session4_RNAseq/runs/run4_RNAseq_T2T/"
cp "$HOME/workshop/2024-2/session4_RNAseq/scripts/launch_nf-core_RNAseq_pipeline_T2T.pbs" "$HOME/workshop/2024-2/session4_RNAseq/runs/run4_RNAseq_T2T/"
cd "$HOME/workshop/2024-2/session4_RNAseq/runs/run4_RNAseq_T2T"
```
Line 1: Create the working directory (no effect if it already exists)
Line 2: Copy the samplesheet.csv file to the working directory
Line 3: Copy the launch script that runs expression profiling using the T2T genome
Line 4: Change into the working directory
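A malformed samplesheet is a common reason for a run to stop at start-up. The function below is a minimal, hypothetical pre-flight check to run from the working directory. It assumes the nf-core/rnaseq 3.x layout `sample,fastq_1,fastq_2,strandedness` (check this against the pipeline's usage page) and only verifies the header and that each listed FASTQ file exists. The pipeline performs its own, more thorough validation.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight check of an nf-core/rnaseq samplesheet (hypothetical helper).
# ASSUMPTION: v3.x column layout sample,fastq_1,fastq_2,strandedness.
check_samplesheet() {
  sheet="$1"
  expected="sample,fastq_1,fastq_2,strandedness"
  header=$(head -n 1 "$sheet" | tr -d '\r')   # tolerate Windows line endings
  if [ "$header" != "$expected" ]; then
    echo "unexpected header: $header" >&2
    return 1
  fi
  n=0
  bad=0
  # strand is read so that fq2 holds only the fastq_2 field;
  # '|| [ -n "$sample" ]' keeps a last line that lacks a trailing newline
  while IFS=, read -r sample fq1 fq2 strand || [ -n "$sample" ]; do
    n=$((n + 1))
    [ "$n" -eq 1 ] && continue                # skip the header row
    for fq in "$fq1" "$fq2"; do               # fastq_2 is empty for single-end data
      if [ -n "$fq" ] && [ ! -f "$fq" ]; then
        echo "$sample: file not found: $fq" >&2
        bad=1
      fi
    done
  done < "$sheet"
  return "$bad"
}
```

Run `check_samplesheet samplesheet.csv` in the working directory. It prints any missing files to standard error and returns a non-zero status if something is wrong. Relative FASTQ paths are resolved from the current directory, so absolute paths avoid any ambiguity.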
Print the content of the “launch_nf_core_RNAseq_T2T.pbs” script:
...
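For orientation, a PBS launch script for this kind of run generally wraps a single `nextflow run` command. The sketch below is hypothetical and will differ from the workshop's script. The PBS resource requests, the `-profile singularity` choice and the FASTA/GTF file names are assumptions; adapt them to your cluster and to the `ls` listing of the T2T folder.

```shell
#!/bin/bash
#PBS -N nfRNAseq_T2T
#PBS -l select=1:ncpus=1:mem=4gb
#PBS -l walltime=48:00:00
# Sketch of a PBS launch script for nf-core/rnaseq with the T2T reference.
# ASSUMPTIONS (hypothetical): resource requests, -profile value, file names.

# PBS sets PBS_O_WORKDIR to the directory qsub was run from;
# fall back to the current directory when run outside PBS
cd "${PBS_O_WORKDIR:-$PWD}"

REF_DIR=/work/training/references/ncbi/T2T
FASTA="$REF_DIR/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna"   # hypothetical name
GTF="$REF_DIR/GCF_009914755.1_T2T-CHM13v2.0_genomic.gtf"     # hypothetical name

# Build the command in the positional parameters (a POSIX-friendly array)
set -- nextflow run nf-core/rnaseq -r 3.12.0 \
  -profile singularity \
  --input samplesheet.csv \
  --outdir results_T2T \
  --fasta "$FASTA" \
  --gtf "$GTF"

echo "Command: $*"
if command -v nextflow >/dev/null 2>&1; then
  "$@"
else
  echo "nextflow not found on PATH; load it (e.g., via your module system) first" >&2
fi
```

Keeping the reference paths in variables makes it easy to switch the same script to another genome by changing only `FASTA` and `GTF`.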
```shell
qsub launch_nf_core_RNAseq_T2T.pbs
```
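After submission, `qstat -u $USER` lists the state of your jobs. When the run finishes, nf-core/rnaseq 3.x (default STAR + Salmon route) writes merged count tables such as `<outdir>/star_salmon/salmon.merged.gene_counts.tsv`. Check the output documentation for your pipeline version. The hypothetical helper below gives a first look at such a table. It assumes tab-separated columns `gene_id` and `gene_name`, followed by one column per sample.

```shell
#!/usr/bin/env bash
# Sketch: summarise a gene-by-sample count matrix (hypothetical helper).
# ASSUMPTION: tab-separated, header row, columns gene_id, gene_name, then samples.
summarise_counts() {
  awk '
    BEGIN { FS = "\t" }
    NR == 1 { nsamp = NF - 2; next }          # header: remaining columns are samples
    {
      genes++
      total = 0
      for (i = 3; i <= NF; i++) total += $i   # sum counts across samples
      if (total > 0) expressed++              # gene detected in at least one sample
    }
    END { printf "samples=%d genes=%d expressed=%d\n", nsamp, genes, expressed }
  ' "$1"
}
```

For example, `summarise_counts <outdir>/star_salmon/salmon.merged.gene_counts.tsv` reports how many samples and genes the table contains and how many genes have a non-zero count in at least one sample.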
Tip: Read the help information for Nextflow pipelines
Information on how to run a Nextflow pipeline, including the additional parameters available, is provided on the pipeline website (https://nf-co.re/rnaseq/3.12.0/docs/usage/). You can also run the following command to get help information:
...