miRBase and MirGeneDB
Download Reference microRNA data from miRBase
First, let’s create a folder to store the reference datasets:
mkdir -p $HOME/workshop/2024-2/session6_smallRNAseq/data/references
Now move to the references folder and download the miRBase datasets with wget, either in an interactive session (Option #1) or by submitting a PBS Pro script (Option #2).
OPTION #1: Use an interactive session to run the following commands:
Fetch microRNA mature sequences:
wget https://mirbase.org/download/mature.fa
Fetch hairpin sequences:
wget https://mirbase.org/download/hairpin.fa
Fetch the genomic coordinates for precursor and mature sequences:
wget https://mirbase.org/download/hsa.gff3
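OPTION #2: Submit the following PBS Pro script to the cluster. Before running the script, create a ‘reference’ folder (e.g., /myteam/data/reference/ ). Save the script in that folder and submit it from there, because the script changes to the directory it was submitted from ($PBS_O_WORKDIR) and downloads the files into it.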
#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=2:00:00
cd $PBS_O_WORKDIR
wget https://www.mirbase.org/download/hairpin.fa
wget https://www.mirbase.org/download/mature.fa
wget https://www.mirbase.org/download/hsa.gff3
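Whichever option you use, it can be worth confirming that the downloads are complete before moving on. The following is a minimal sketch, not part of the workshop scripts: count_fasta_records is a hypothetical helper name, and it assumes that miRBase FASTA headers start with a species prefix such as "hsa-" for human.

```shell
# count_fasta_records FILE [PREFIX]
# Hypothetical helper (not part of the workshop material).
# Counts FASTA header lines in FILE. If PREFIX is given, only headers whose
# identifier starts with PREFIX (for example "hsa-") are counted.
count_fasta_records() {
    file="$1"
    prefix="${2:-}"
    # index(...) == 1 means the line starts with ">" followed by the prefix.
    awk -v p=">${prefix}" 'index($0, p) == 1 { n++ } END { print n + 0 }' "$file"
}
```

For example, count_fasta_records mature.fa reports the number of entries across all species, and count_fasta_records mature.fa hsa- reports only the human entries. A count of 0 suggests a failed or truncated download.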
Fetch public small RNA-seq data using SRA tools
For this approach you will need a list of SRA identifiers. For example, the identifiers for the human Huntington's disease study are:
ERR409878 ERR409879 ERR409880 ERR409881 ERR409882 ERR409883 ERR409884 ERR409885 ERR409886 ERR409887 ERR409888 ERR409889 ERR409890 ERR409891 ERR409892 ERR409893 ERR409894 ERR409895 ERR409896 ERR409897 ERR409898 ERR409899 ERR409900
The above list has already been prepared for you. Copy the list of IDs into the “my data” folder you created previously:
cp /work/training/2024/smallRNAseq/data/human_disease/SRA_Acc_List.txt $HOME/workshop/2024-2/session6_smallRNAseq/data/mydata
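Before launching the download, you may want to check that the list is well formed, since a stray blank line or Windows line ending can make a download fail for that entry. This is a minimal sketch under stated assumptions: check_accession_list is a hypothetical helper, and it assumes the file holds one run accession per line in the form SRR, ERR or DRR followed by digits.

```shell
# check_accession_list FILE
# Hypothetical helper (not part of the workshop material).
# Assumes one run accession per line (SRR/ERR/DRR followed by digits).
# Prints a one-line summary and returns non-zero if any line is invalid
# or if an accession is repeated.
check_accession_list() {
    file="$1"
    awk '
        /^[SED]RR[0-9]+$/ { valid++; if (seen[$0]++) dup++; next }
        { invalid++ }
        END {
            printf "valid=%d invalid=%d duplicated=%d\n", valid, invalid, dup
            exit (invalid || dup) ? 1 : 0
        }
    ' "$file"
}
```

Run it as check_accession_list SRA_Acc_List.txt from the “my data” folder. The list shown above contains 23 accessions, so a file matching it should report 23 valid entries and no invalid or duplicated ones.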
Now let’s also get a copy of the “launch_fetch_SRA.pbs” script into your “my data” folder:
cp /work/training/2024/smallRNAseq/data/human_disease/launch_fetch_SRA.pbs $HOME/workshop/2024-2/session6_smallRNAseq/data/mydata
Move to your “my data” folder and check the content of the script:
cd $HOME/workshop/2024-2/session6_smallRNAseq/data/mydata
cat launch_fetch_SRA.pbs
#!/bin/bash -l
#PBS -N rna
#PBS -l select=1:ncpus=1:mem=8gb
#PBS -l walltime=24:00:00
#Enable the container modules
source /pkg/shpc/enable
#Load the SRA-TOOLS module
module load sra-tools/3.0.5--h9f5acd7_1
#work on current directory (folder)
cd $PBS_O_WORKDIR
#Download and convert each run listed in SRA_Acc_List.txt
for i in $(cat SRA_Acc_List.txt);
do
  echo $i
  prefetch.3 $i
  fasterq-dump.3 --split-files $i
done
#Compress all FASTQ files
gzip *fastq
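The loop in the script keeps going when a download fails and fetches every run again if the job is resubmitted. A more defensive variant could look like the sketch below. It is an illustration only, not the workshop script: fetch_run and fetch_all are hypothetical names, the PREFETCH and FASTERQ_DUMP variables are introduced here and default to the command names used above, and the output file names (ACC.fastq for single-end runs, ACC_1.fastq and ACC_2.fastq for paired-end runs) are an assumption about how fasterq-dump names its output.

```shell
# Hypothetical variables: default to the commands used in launch_fetch_SRA.pbs.
PREFETCH="${PREFETCH:-prefetch.3}"
FASTERQ_DUMP="${FASTERQ_DUMP:-fasterq-dump.3}"

# fetch_run ACCESSION
# Downloads and converts one run, unless its gzipped FASTQ already exists.
fetch_run() {
    acc="$1"
    # Assumed output names: ACC.fastq.gz (single-end) or ACC_1.fastq.gz (paired).
    if [ -e "${acc}.fastq.gz" ] || [ -e "${acc}_1.fastq.gz" ]; then
        echo "skip ${acc}: already downloaded"
        return 0
    fi
    "$PREFETCH" "$acc" || { echo "prefetch failed for ${acc}" >&2; return 1; }
    "$FASTERQ_DUMP" --split-files "$acc" || { echo "fasterq-dump failed for ${acc}" >&2; return 1; }
    # Compress only the FASTQ files that belong to this run.
    for f in "${acc}.fastq" "${acc}"_*.fastq; do
        if [ -e "$f" ]; then gzip "$f"; fi
    done
}

# fetch_all LIST
# Runs fetch_run for every non-empty line of LIST and reports failures.
fetch_all() {
    list="$1"
    failed=0
    # Read the list on file descriptor 3 so the tools cannot consume it via stdin.
    while IFS= read -r acc <&3 || [ -n "$acc" ]; do
        [ -n "$acc" ] || continue
        fetch_run "$acc" || failed=$((failed + 1))
    done 3< "$list"
    echo "runs that failed: ${failed}"
    [ "$failed" -eq 0 ]
}
```

Inside the PBS script, the for loop and the final gzip could then be replaced by a single call such as fetch_all SRA_Acc_List.txt, placed after the module load and cd lines.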
Submit the PBS script to the HPC cluster:
qsub launch_fetch_SRA.pbs
Monitor job progress:
qjobs
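Once the job has finished, it is useful to confirm that every accession in the list produced a compressed FASTQ file. The sketch below is one way to do that. report_missing_fastq is a hypothetical helper, and it assumes the output is named ACC.fastq.gz for single-end runs or ACC_1.fastq.gz for paired-end runs.

```shell
# report_missing_fastq LIST [DIR]
# Hypothetical helper (not part of the workshop material).
# Prints every accession in LIST with no gzipped FASTQ file in DIR
# (default: current directory) and returns non-zero if any are missing.
report_missing_fastq() {
    list="$1"
    dir="${2:-.}"
    missing=0
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -n "$acc" ] || continue
        # Assumed output names: ACC.fastq.gz or ACC_1.fastq.gz.
        if [ ! -e "${dir}/${acc}.fastq.gz" ] && [ ! -e "${dir}/${acc}_1.fastq.gz" ]; then
            echo "missing: ${acc}"
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "missing runs: ${missing}"
    [ "$missing" -eq 0 ]
}
```

Run it from the “my data” folder as report_missing_fastq SRA_Acc_List.txt. Any accession reported as missing can be fetched again by resubmitting the job.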