...
...
...
...
...
...
...
...
miRBase and MirGeneDB
Download Reference microRNA
...
data from miRBase
First, let’s create a folder to store the reference datasets:
...
Now move to the reference folder and download the miRBase datasets with wget, either in an interactive session (Option #1) or by submitting a PBS Pro script (Option #2).
OPTION #1: Use an interactive session to run the following commands:
...
Code Block |
---|
wget https://mirbase.org/download/hsa.gff3 |
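OPTION #2: Submit the following PBS Pro script to the cluster. Before running the script, create a ‘reference’ folder (e.g., /myteam/data/reference/ ) and submit the script from inside it, because the script changes to the submission directory ($PBS_O_WORKDIR) before downloading.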
Code Block |
---|
#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=2:00:00
cd $PBS_O_WORKDIR
wget https://www.mirbase.org/download/hairpin.fa
wget https://www.mirbase.org/download/mature.fa
wget https://www.mirbase.org/download/hsa.gff3 |
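Once the downloads finish, it can help to confirm that the FASTA files are not empty or truncated. The following is a minimal sketch, not part of the workshop scripts: the helper name `count_fasta_records` is hypothetical, and it assumes human entries can be recognised by the `hsa-` prefix that miRBase uses in its sequence identifiers.

```shell
#!/bin/bash
# Hypothetical helper (not a miRBase or workshop tool):
# count FASTA records in FILE whose header starts with an optional PREFIX.
# Usage: count_fasta_records FILE [PREFIX]
count_fasta_records() {
    local file="$1"
    local prefix="${2:-}"
    # grep -c prints the number of matching header lines.
    # It exits with status 1 when the count is 0, so the status is ignored.
    grep -c -- "^>${prefix}" "$file" || true
}
```

For example, `count_fasta_records hairpin.fa` would report all hairpin records, and `count_fasta_records hairpin.fa hsa-` only the human ones. A count of 0 suggests the download should be repeated.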
Fetch public small RNA-seq data using SRA tools
For this approach you will need a list of SRA run identifiers. For example, the identifiers for the human Huntington's disease study are:
Code Block |
---|
ERR409878
ERR409879
ERR409880
ERR409881
ERR409882
ERR409883
ERR409884
ERR409885
ERR409886
ERR409887
ERR409888
ERR409889
ERR409890
ERR409891
ERR409892
ERR409893
ERR409894
ERR409895
ERR409896
ERR409897
ERR409898
ERR409899
ERR409900 |
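Before submitting a download job, a quick format check on the list can catch typos or stray characters. The sketch below assumes run accessions consist of SRR, ERR, or DRR followed by at least six digits; the function name `check_accessions` is hypothetical and not part of SRA tools.

```shell
#!/bin/bash
# Hypothetical helper: report lines in FILE that do not look like run accessions.
# Returns 0 when every non-blank line looks valid, 1 otherwise.
check_accessions() {
    local file="$1"
    local bad
    # Drop blank lines, then keep only the lines that do NOT match the pattern.
    # "|| true" keeps the pipeline from failing when nothing is left over.
    bad=$(grep -v '^[[:space:]]*$' "$file" | grep -Ev '^[SED]RR[0-9]{6,}$' || true)
    if [ -n "$bad" ]; then
        printf 'Unexpected entries:\n%s\n' "$bad" >&2
        return 1
    fi
    return 0
}
```

Running `check_accessions SRA_Acc_List.txt` prints nothing when the list is clean. Lines with trailing spaces or Windows line endings are reported as unexpected, which is useful because they would otherwise be passed to the download tools unchanged.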
The above list has already been prepared for you. Copy the list of IDs into the “my data” folder you created previously:
Code Block |
---|
cp /work/training/2024/smallRNAseq/data/human_disease/SRA_Acc_List.txt $HOME/workshop/2024-2/session6_smallRNAseq/data/mydata |
Now let’s also get a copy of the “launch_fetch_SRA.pbs” script into your “my data” folder:
Code Block |
---|
cp /work/training/2024/smallRNAseq/data/human_disease/launch_fetch_SRA.pbs $HOME/workshop/2024-2/session6_smallRNAseq/data/mydata |
Check the content of the script:
Code Block |
---|
cat launch_fetch_SRA.pbs |
Code Block |
---|
#!/bin/bash -l
#PBS -N rna
#PBS -l select=1:ncpus=1:mem=8gb
#PBS -l walltime=24:00:00
#Enable the container modules
source /pkg/shpc/enable
#Load the SRA-TOOLS module
module load sra-tools/3.0.5--h9f5acd7_1
#work on current directory (folder)
cd $PBS_O_WORKDIR
for i in $(cat SRA_Acc_List.txt);
do
echo $i
prefetch.3 $i
fasterq-dump.3 --split-files $i
done
gzip *fastq |
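After the job finishes, it is worth checking that every accession in the list produced at least one compressed FASTQ file. The following is a minimal sketch under stated assumptions: the helper name `list_missing_fastq` is hypothetical, and it assumes output file names start with the accession and end in `.fastq.gz`, as expected after the `gzip` step above.

```shell
#!/bin/bash
# Hypothetical helper: print every accession in LIST that has no
# <accession>*.fastq.gz file in DIR (default: current folder).
# Usage: list_missing_fastq LIST [DIR]
list_missing_fastq() {
    local list="$1"
    local dir="${2:-.}"
    local acc
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Skip blank lines
        [ -z "$acc" ] && continue
        # ls fails when the glob matches no file
        if ! ls "${dir}/${acc}"*.fastq.gz > /dev/null 2>&1; then
            echo "$acc"
        fi
    done < "$list"
}
```

An empty result from `list_missing_fastq SRA_Acc_List.txt` means every run has output. Any accession that is printed can be placed in a new list and fetched again.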
Submit the PBS script to the HPC cluster:
Code Block |
---|
qsub launch_fetch_SRA.pbs |
Monitor the progress of the job:
Code Block |
---|
qjobs |