Today we will learn how to download FASTQ files from a published paper:

...

  • Click on the link above and search for “accession”, “Data availability”, “BioProject ID” or “GEO accession code”.

  • If only a GEO accession code is available, go to the GEO database and look up the BioProject ID. Note: ENA (Step 2) requires this identifier to download the data.

Which BioProject ID hosts the data used in the above manuscript?

Solution:

PRJNA862097
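
Once the BioProject ID is known, the list of FASTQ files can also be requested from ENA on the command line instead of through the browser. The sketch below is one possible way to do this, not the method used in this session. It assumes the ENA Portal API `filereport` endpoint with the fields `run_accession` and `fastq_ftp`; the helper name `report_to_wget` and the output file name `download_fastq.sh` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read an ENA file report (TSV on stdin, with the columns
# run_accession and fastq_ftp) and print one wget command per FASTQ file.
report_to_wget() {
    # Skip the header line. For paired-end runs the fastq_ftp column holds
    # several files separated by ";", so split it and print each file.
    awk -F '\t' 'NR > 1 && $2 != "" {
        n = split($2, files, ";")
        for (i = 1; i <= n; i++) print "wget -nc ftp://" files[i]
    }'
}

# Assumed ENA Portal API query for the BioProject identified above.
report_url='https://www.ebi.ac.uk/ena/portal/api/filereport?accession=PRJNA862097&result=read_run&fields=run_accession,fastq_ftp&format=tsv'

# Uncomment on a machine with internet access to build a download script:
# wget -q -O - "$report_url" | report_to_wget > download_fastq.sh
```

The network call is left commented out so the helper can be inspected first. Compare the generated commands with the script downloaded from the ENA website before using them.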

...

Code Block
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/044/SRR20630344/SRR20630344.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/049/SRR20630349/SRR20630349.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/055/SRR20630355/SRR20630355.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/047/SRR20630347/SRR20630347.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/050/SRR20630350/SRR20630350.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/042/SRR20630342/SRR20630342.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/053/SRR20630353/SRR20630353.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/043/SRR20630343/SRR20630343.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/039/SRR20630339/SRR20630339.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/056/SRR20630356/SRR20630356.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/054/SRR20630354/SRR20630354.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/041/SRR20630341/SRR20630341.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/045/SRR20630345/SRR20630345.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/051/SRR20630351/SRR20630351.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/040/SRR20630340/SRR20630340.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/048/SRR20630348/SRR20630348.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/052/SRR20630352/SRR20630352.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/046/SRR20630346/SRR20630346.fastq.gz
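
The download commands above share one URL pattern, so they can be generated rather than copied by hand. The sketch below is an illustration only and does not replace the script produced by ENA. The helper name `ena_fastq_url` is hypothetical, and it assumes single-end runs with 11-character accessions in the range SRR20630339 to SRR20630356, stored under `<first 6 characters>/0<last 2 digits>/<accession>/`.

```shell
#!/bin/bash
# Hypothetical helper: build the ENA FTP URL for a single-end run whose
# accession is 11 characters long (for example SRR20630344).
ena_fastq_url() {
    acc="$1"
    prefix=$(printf '%s' "$acc" | cut -c1-6)    # first six characters, e.g. SRR206
    last2=$(printf '%s' "$acc" | cut -c10-11)   # last two digits, e.g. 44
    echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/${prefix}/0${last2}/${acc}/${acc}.fastq.gz"
}

# Print (do not run) one wget command per run in the assumed accession range.
n=20630339
while [ "$n" -le 20630356 ]; do
    echo "wget -nc $(ena_fastq_url "SRR${n}")"
    n=$((n + 1))
done
```

The loop only prints the commands, so the output can be checked against the ENA-generated list before anything is downloaded. Paired-end runs use `_1` and `_2` file suffixes and would need a different file name.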

Now, using the TextEdit or NotePad app, we will add the PBS header lines shown below to the top of the script. Copy and paste them above the first wget command:

...

Code Block
#!/bin/bash -l
#PBS -N nfrnaseq_test
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Work in the directory the job was submitted from
cd "$PBS_O_WORKDIR"

wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/044/SRR20630344/SRR20630344.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/049/SRR20630349/SRR20630349.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/055/SRR20630355/SRR20630355.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/047/SRR20630347/SRR20630347.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/050/SRR20630350/SRR20630350.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/042/SRR20630342/SRR20630342.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/053/SRR20630353/SRR20630353.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/043/SRR20630343/SRR20630343.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/039/SRR20630339/SRR20630339.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/056/SRR20630356/SRR20630356.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/054/SRR20630354/SRR20630354.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/041/SRR20630341/SRR20630341.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/045/SRR20630345/SRR20630345.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/051/SRR20630351/SRR20630351.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/040/SRR20630340/SRR20630340.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/048/SRR20630348/SRR20630348.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/052/SRR20630352/SRR20630352.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/046/SRR20630346/SRR20630346.fastq.gz

STEP 6: Save the file, then transfer it to the HPC as described below:

NOTE: To proceed, you need to be on QUT’s WiFi network or connected via VPN.

Windows PC: open File Explorer and type the address below to connect to your home directory on the HPC, then browse to the /workshop/2024-2/session4_RNAseq/data/mydata folder

...

Code Block
dos2unix -n ena-file-download-selected-files-20241009-0005.sh ena-file-download-selected-files-20241009-0005.txt
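  • Note: Files created or edited on Windows (for example with Microsoft Excel or NotePad) usually contain Windows-style line endings, which add a carriage return character at the end of each line and can break scripts on the HPC. Use dos2unix to remove these characters.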
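
To see whether a script actually needs converting, you can look for carriage return characters before running dos2unix. The following is a minimal sketch; the function name `has_crlf` is hypothetical and the demo file is a temporary file created only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) if the file contains at least
# one carriage return character, which indicates Windows line endings.
has_crlf() {
    cr=$(printf '\r')          # a literal carriage return character
    grep -q "$cr" "$1"
}

# Demo on a temporary file written with Windows line endings.
demo=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$demo"

if has_crlf "$demo"; then
    echo "Windows line endings found: convert the file with dos2unix"
else
    echo "Unix line endings: no conversion needed"
fi

rm -f "$demo"
```

Replace the temporary file with the name of your own download script to check it.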

Now we are ready to submit the FASTQ download script to the HPC cluster:

...

Monitor progress of job:

Code Block
qjobs
  • Note: Downloading the above datasets will take about 50 minutes.
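
After the job has finished, it is worth confirming that every file arrived intact, because an interrupted transfer can leave a truncated file behind. The sketch below is one possible check, assuming the FASTQ files are in the current folder; the function name `check_fastq_dir` is hypothetical. It relies on `gzip -t`, which tests a compressed file without unpacking it.

```shell
#!/bin/bash
# Hypothetical helper: test every *.fastq.gz file in a folder with `gzip -t`
# and report how many passed and how many failed.
check_fastq_dir() {
    dir="${1:-.}"
    ok=0
    bad=0
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue            # folder has no FASTQ files
        if gzip -t "$f" 2>/dev/null; then
            ok=$((ok + 1))
        else
            bad=$((bad + 1))
            echo "FAILED: $f"
        fi
    done
    echo "ok=$ok failed=$bad"
    [ "$bad" -eq 0 ]                       # non-zero exit status if any file failed
}

check_fastq_dir .
```

Compare the `ok` count with the number of wget lines in the script. Because the script uses `wget -nc`, which skips files that already exist, delete any file reported as FAILED before submitting the job again.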

The link below describes alternative approaches, such as downloading data from SRA or BaseSpace, or using the nf-core/fetchngs pipeline:

...