
Today we will learn to download FASTQ files from a published paper:

...

Click on the link above and search for “Accession”, “Data availability”, “BioProject ID” or “GEO accession code”.
If only a GEO accession code is available, go to the GEO database and look for the BioProject ID. Note: ENA (Step 2) requires this identifier to download the data.

Which BioProject ID hosts the data used in the above manuscript?

Solution:

PRJNA862097

...

STEP3: (if applicable) select one or more BioProject submission(s). Click on the first listed BioProject ID:

...

STEP4: Select the FASTQ files of interest (tick the boxes next to the file names) and click on “Get download script”

...

This will download a bash script (e.g., ena-file-download-selected-files-20241009-0005.sh).

...

Open the downloaded ENA script using TextEdit (Notepad or a similar text editor). The downloaded script looks like this:

Code Block

wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/001/SRR1039511/SRR1039511_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/000/SRR1039520/SRR1039520_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039519/SRR1039519_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/004/SRR1039514/SRR1039514_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/001/SRR1039521/SRR1039521_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/000/SRR1039520/SRR1039520_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/001/SRR1039521/SRR1039521_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/000/SRR1039510/SRR1039510_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/008/SRR1039508/SRR1039508_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/000/SRR1039510/SRR1039510_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/008/SRR1039518/SRR1039518_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/007/SRR1039517/SRR1039517_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039509/SRR1039509_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/004/SRR1039514/SRR1039514_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/001/SRR1039511/SRR1039511_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039519/SRR1039519_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/007/SRR1039517/SRR1039517_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/008/SRR1039508/SRR1039508_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039509/SRR1039509_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/008/SRR1039518/SRR1039518_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/000/SRR1039520/SRR1039520.fastq.gz
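Before running the script, it can help to check how many files and runs it will fetch. The following is a minimal sketch, assuming the script holds one `wget` line per file in the format shown above. The file name `demo-ena-download.sh` and the three sample lines are stand-ins for the real downloaded script.

```shell
#!/bin/bash
# Work in a throwaway folder so nothing in the real data folder is touched.
workdir=$(mktemp -d)
script="$workdir/demo-ena-download.sh"   # hypothetical name; point this at your downloaded script

# Three lines copied from the listing above, standing in for the full script.
cat > "$script" <<'EOF'
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/001/SRR1039511/SRR1039511_2.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/001/SRR1039511/SRR1039511_1.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/000/SRR1039520/SRR1039520.fastq.gz
EOF

# Number of files the script will download (one wget line per file).
n_files=$(grep -c '^wget ' "$script")

# Unique run accessions, taken from the file name at the end of each URL.
runs=$(grep '^wget ' "$script" \
  | sed -E 's#.*/(SRR[0-9]+)(_[12])?\.fastq\.gz$#\1#' \
  | sort -u)
n_runs=$(printf '%s\n' "$runs" | wc -l | tr -d ' ')

echo "Files to download: $n_files"
echo "Unique runs: $n_runs"
printf '%s\n' "$runs"
```

Comparing the number of files with the number of runs is a quick way to spot runs that have more or fewer files than the usual `_1`/`_2` pair.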

Now, using TextEdit or Notepad, add the following lines to the top of the script:

Code Block
#!/bin/bash -l
#PBS -N ENAdownload
#PBS -l walltime=72:00:00
#PBS -l mem=16gb
#PBS -l ncpus=8
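The same header can also be added from the command line instead of a text editor. This is a minimal sketch, not the workshop's prescribed method: the file names `ena-download.sh` and `ena-download.pbs.sh` are hypothetical, and a single `wget` line stands in for the downloaded script.

```shell
#!/bin/bash
workdir=$(mktemp -d)
orig="$workdir/ena-download.sh"        # hypothetical name for the downloaded ENA script
job="$workdir/ena-download.pbs.sh"     # hypothetical name for the PBS-ready copy

# Stand-in for the downloaded script: one line from the listing above.
echo 'wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/008/SRR1039508/SRR1039508_1.fastq.gz' > "$orig"

# Write the PBS header first, then append the original wget lines unchanged.
{
  cat <<'EOF'
#!/bin/bash -l
#PBS -N ENAdownload
#PBS -l walltime=72:00:00
#PBS -l mem=16gb
#PBS -l ncpus=8
EOF
  cat "$orig"
} > "$job"

# Show the first lines to confirm that each directive sits on its own line.
head -n 6 "$job"

# On the HPC, the job would then typically be submitted with: qsub "$job"
```

Each `#PBS` directive has to be on its own line to be read by the scheduler. PBS jobs usually start in the home directory, so adding `cd $PBS_O_WORKDIR` after the header is a common way to keep the downloads in the folder the job was submitted from; whether this is needed depends on the cluster setup.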

Copy the script to your HPC working folder: $HOME/workshop/2024-2/session4_RNAseq/data
See below for how to drag and drop the file using File Finder.

NOTE: To proceed, you need to be on QUT’s WiFi network or connected via VPN.

To browse the working folder on the HPC, type the following in the file finder:

...

First, prepare a file with the list of SRA IDs of interest to be downloaded:

Hint:

In the terminal, create a new folder called ‘fetchngs’. For example:

Code Block
mkdir $HOME/workshop/2024-2/session4_RNAseq/data/fetchngs
# then, move to the newly created folder
cd $HOME/workshop/2024-2/session4_RNAseq/data/fetchngs

Copy the following list of IDs. Hint: click on the top right corner of the block below to copy the text.

...

Alternatively, instead of a list of SRR identifiers, it is possible to download all data in a given BioProject by giving its ID:

Code Block
PRJNA862097

NOTE: Listing the individual IDs above or giving the BioProject ID in the ‘ids.csv’ file will download exactly the same data.

Create an ‘ids.csv’ file using nano and paste the list of IDs:

...
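As an alternative to nano, the file can be written directly from the command line and given a quick format check. This is a minimal sketch, assuming ‘ids.csv’ holds one identifier per line with no header, as the steps above suggest. It uses the BioProject ID from this tutorial, and a temporary folder stands in for the ‘fetchngs’ folder.

```shell
#!/bin/bash
workdir=$(mktemp -d)        # stand-in for the fetchngs folder created earlier
cd "$workdir" || exit 1

# One identifier per line; here the BioProject ID from this tutorial.
printf '%s\n' PRJNA862097 > ids.csv

# Count lines that do not look like a run (SRR/ERR/DRR) or BioProject (PRJ...) accession.
# "|| true" keeps the script going when grep finds no such lines (exit status 1).
bad=$(grep -vcE '^((SRR|ERR|DRR)[0-9]+|PRJ[EDN][A-Z][0-9]+)$' ids.csv || true)

echo "Lines in ids.csv: $(wc -l < ids.csv | tr -d ' ')"
echo "Lines that do not look like an accession: $bad"
```

A non-zero count in the last line usually points to stray spaces, blank lines or a typo in an identifier. The pattern only covers the accession types used in this session and is not a complete list of what the download tool may accept.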

Next, copy and paste the following PBS script to download the files specified in ‘ids.csv’.
NOTE: Instead of listing individual SRR identifiers, it is also possible to list a BioProject ID (e.g., PRJNA862107), which will fetch all of its SRR samples automatically.

Second, create a PBS launch script to download the data for the above IDs:

...

Version	Old Version 6	New Version 7
Changes made by	Roberto Barrero Gumiel	Roberto Barrero Gumiel
Saved on	Oct 11, 2024	Oct 11, 2024
