Background and external resources
...
You can also use SRA Explorer to view all files in a project and download all or some of them: https://www.biostars.org/p/385930/
Goal
Download public data deposited in NCBI’s Sequence Read Archive (SRA) database.
Pre-requisites (if not available)
Installed conda3 or miniconda3 ( https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html )
Basic unix command line knowledge (example: https://researchcomputing.princeton.edu/education/external-online-resources/linux ; https://swcarpentry.github.io/shell-novice/ )
Familiarity with one Unix text editor (for example Vi/Vim or Nano)
Installing miniconda
https://docs.conda.io/en/latest/miniconda.html#linux-installers
Code Block |
---|
bash Miniconda3-latest-Linux-x86_64.sh |
Install sra-tools
Once conda is installed in the instance, go to https://anaconda.org and search for sra-tools. Copy the install command and run it in your HPC account:
Code Block |
---|
conda install -c bioconda sra-tools |
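Before writing the job script, it can help to confirm that the tools are reachable from your shell. The following is a minimal sketch, not part of the original instructions: `missing_tools` is a hypothetical helper name, and the check only tests whether `conda`, `prefetch` and `fastq-dump` are on `PATH`, not whether they work correctly.

```shell
# Hypothetical helper: print the name of every given command that is NOT on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the commands this page relies on.
missing=$(missing_tools conda prefetch fastq-dump)
if [ -z "$missing" ]; then
  echo "conda and sra-tools commands found"
else
  # Left unquoted on purpose so the names print on one line.
  echo "Not found on PATH:" $missing
fi
```

If `prefetch` or `fastq-dump` is reported as missing right after installation, the conda environment that holds sra-tools is probably not activated in the current shell.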
Download SRA files
Example: PBS script (launch_fetch_SRAfiles.pbs) to fetch multiple files from the SRA database
Code Block |
---|
#!/bin/bash -l |
#PBS -N SRAfiles |
#PBS -l walltime=2:00:00 |
#PBS -l mem=4gb |
#PBS -l ncpus=2 |
#PBS -m bae |
###PBS -M email@host |
#PBS -j oe |
cd $PBS_O_WORKDIR |
### User defined SRA accessions (space-separated: prefetch and fastq-dump take one accession per argument) |
ACCESSIONS="SRR1002659 SRR1002660 SRR1002661 SRR1002662" |
### Pipeline |
#Step1: Download SRA file(s) |
prefetch ${ACCESSIONS} |
#Step2: Extract FASTQ file(s) from SRA file(s) |
fastq-dump --split-files ${ACCESSIONS} |
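Accession lists are often pasted as a comma-separated string, while `prefetch` and `fastq-dump` expect each accession as a separate argument. The sketch below shows one way to split and sanity-check such a list before downloading. The function names and the `DRY_RUN` variable are hypothetical, and the check assumes run accessions of the form SRR, ERR or DRR followed by digits. With `DRY_RUN` left at its default of 1 the script only prints the commands it would run.

```shell
# Hypothetical helper: succeed only for run accessions like SRR1002659 (SRR/ERR/DRR + digits).
is_run_accession() {
  case "$1" in
    [SED]RR) return 1 ;;            # prefix without any digits
    [SED]RR*[!0-9]*) return 1 ;;    # non-digit character after the prefix
    [SED]RR*) return 0 ;;
    *) return 1 ;;
  esac
}

# Split a comma-separated list and print the valid accessions, one per line.
split_accessions() {
  for acc in $(echo "$1" | tr ',' ' '); do
    if is_run_accession "$acc"; then
      echo "$acc"
    else
      echo "Skipping malformed accession: $acc" >&2
    fi
  done
}

# Download and extract each accession; DRY_RUN=1 (the default here) only prints the commands.
fetch_accessions() {
  for run in $(split_accessions "$1"); do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "prefetch $run"
      echo "fastq-dump --split-files $run"
    else
      prefetch "$run" && fastq-dump --split-files "$run"
    fi
  done
}

fetch_accessions "SRR1002659,SRR1002660"
```

Setting `DRY_RUN=0` in the job script runs the real commands. Processing one accession at a time also means a failed download skips only that accession's extraction step.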
Submit the PBS script to the HPC cluster
Code Block |
---|
qsub launch_fetch_SRAfiles.pbs |
Monitor job progression
Code Block |
---|
qjobs |
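Once the job finishes, its messages end up in a log file in the submission directory. As a rough sketch, assuming the default PBS naming scheme and the `-N SRAfiles` and `-j oe` settings from the example script, the log name can be derived from the job ID that `qsub` prints. The function name and the job ID below are made up for illustration.

```shell
# Hypothetical helper: build the default PBS log file name.
# With "#PBS -j oe", stdout and stderr are merged into <jobname>.o<sequence number>.
pbs_log_name() {
  job_name="$1"
  job_id="$2"
  # qsub prints IDs like <number>.<server>; keep only the leading number.
  echo "${job_name}.o${job_id%%.*}"
}

# Example with a made-up job ID.
pbs_log_name SRAfiles 1234567.pbsserver
```

On clusters where `qjobs` is not available, the standard PBS command `qstat -u $USER` lists your queued and running jobs.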