Today’s we will learn to download FASTQ files from a published paper:
...
Click on the link above and search for “Accession”, “Data availability”, “BioProject ID” or “GEO accession code”
If, only a GEO accession code is available, go to the GEO database and look for BioProject ID - Note, ENA (Step2) requires this identifier to download the data.
Which BioProject ID host the data used in the above manuscript?
Expand | ||
---|---|---|
| ||
...
STEP4: Select FASTQ files (tick boxes next to the file names) and click on “Get download script”. This will download a bash script (e.g.,
)View file name ena-file-download-selected-files-20241009-0005.sh
...
STEP 5: Open the downloaded ena file using TextEdit (NotePad or similar app). The downloaded script looks like this:
...
Code Block |
---|
#!/bin/bash -l #PBS -N nfrnaseq_test #PBS -l select=1:ncpus=2:mem=4gb #PBS -l walltime=24:00:00 #work on current directory (folder) cd $PBS_O_WORKDIR wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/001/SRR1039511/SRR1039511_2.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/000/SRR1039520/SRR1039520_2.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039519/SRR1039519_2.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/004/SRR1039514/SRR1039514_1.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/001/SRR1039521/SRR1039521_2.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/000/SRR1039520/SRR1039520_1.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/001/SRR1039521/SRR1039521_1.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/000/SRR1039510/SRR1039510_2.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/008/SRR1039508/SRR1039508_1.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/000/SRR1039510/SRR1039510_1.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/008/SRR1039518/SRR1039518_1.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/007/SRR1039517/SRR1039517_1.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039509/SRR1039509_1.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/004/SRR1039514/SRR1039514_2.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/001/SRR1039511/SRR1039511_1.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039519/SRR1039519_1.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/007/SRR1039517/SRR1039517_2.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/008/SRR1039508/SRR1039508_2.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039509/SRR1039509_2.fastq.gz wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/008/SRR1039518/SRR1039518_2.fastq.gz |
STEP 6: Save the file and now let’s transfer it to the HPC. See below:
NOTE: To proceed, you need to be on QUT’s WiFi network or signed via VPN.
Windows PC: open file finder and type the address below to connect to your home directory in the HPC, and then browse to the /workshop/2024-2/session4_RNAseq/data folder
...
Drag and drop the script into the /workshop/2024-2/session4_RNAseq/data folder
STEP 7: We will ensure the copied file from our laptop / desktop does not have unwanted characters. Let’s move to the data folder:
Code Block |
---|
cd #HOME/workshop/2024-2/session4_RNAseq/data |
How to use the dos2unix tool? Type:
Code Block |
---|
dos2unix --help |
Now let’s run dos2unix conversion. Note the filename may vary, so adjust the filename as appropriate.
Code Block |
---|
dos2unix -n ena-file-download-selected-files-20241009-0005.sh ena-file-download-selected-files-20241009-0005.txt |
eResearch Downloading public data
...