eResearch Downloading public data

Aims:

  • Use the European Nucleotide Archive (ENA) to search for and fetch data of interest.

  • Learn how to download public RNAseq data using the HPC.

Work in your Desktop / Laptop

ENA link: https://www.ebi.ac.uk/ena/browser/view/PRJNA862107

Search for data of interest

In the “view” search box, enter one of the following identifiers:

  • To fetch all or selected files in the project:

  • To fetch individual files:

Once there, select the files you need and click “Get download script” to download a script containing one wget command per file.

Mouse: Project PRJNA862107

wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/072/SRR20622172/SRR20622172.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/073/SRR20622173/SRR20622173.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/077/SRR20622177/SRR20622177.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/076/SRR20622176/SRR20622176.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/080/SRR20622180/SRR20622180.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/074/SRR20622174/SRR20622174.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/078/SRR20622178/SRR20622178.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/079/SRR20622179/SRR20622179.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/075/SRR20622175/SRR20622175.fastq.gz
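The URLs in the download script above follow a regular pattern, which makes them easier to check by eye. The sketch below rebuilds them from the run accession. The helper name `ena_fastq_url` is hypothetical, and the layout is only inferred from the URLs listed above (single-end runs with 11-character accessions), so the script generated by ENA remains the authoritative source.

```shell
# Build the ENA FTP URL for a single-end run accession such as SRR20622172.
# Assumption: 11-character accessions (3 letters + 8 digits), as in the URLs above.
ena_fastq_url() {
    acc="$1"
    prefix=$(printf '%s' "$acc" | cut -c1-6)    # first six characters, e.g. SRR206
    last2=$(printf '%s' "$acc" | cut -c10-11)   # last two digits, e.g. 72
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/0%s/%s/%s.fastq.gz\n' \
        "$prefix" "$last2" "$acc" "$acc"
}

# Print (do not run) one wget command per run, SRR20622172 to SRR20622180.
for n in 72 73 74 75 76 77 78 79 80; do
    echo "wget -nc $(ena_fastq_url "SRR206221${n}")"
done
```

The loop only prints the commands, so its output can be compared with the ENA script before anything is downloaded.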

Work in the HPC

Use the terminal to log into the HPC and create a data folder to hold the downloaded FASTQ files. For example:

mkdir -p $HOME/workshop
mkdir -p $HOME/workshop/data
cd $HOME/workshop/data

Work in your Desktop / Laptop

Copy the download script obtained from ENA to the newly created “data” folder on the HPC.

Use the ‘File Finder’ to connect to the HPC:

Windows (click on the tab of the file finder and type or copy-paste the following):

\\hpc-fs\work

Mac (Command + K):

Navigate to the /workshop/data/ folder, then drag and drop the script downloaded from ENA into it.

Work in the HPC

Now, let’s create a PBS Pro submission script to download the data. Two options are described below; use either one:

Option 1: Use a script to read the ENA file

Tip: this option does not require the use of a text editor like vi or nano.

First, let’s get a copy of a script called “launch_read_ENA_download.pbs” as follows:

List the files in the directory:

You should have the ENA file and the launch script. For example:

Now, let’s submit the job to the HPC cluster. We use the ‘qsub’ command to submit the launch script, and we pass the name of the ENA file as the variable “input_file” using the -v option. For example:
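As an illustration of the submission step, the sketch below assembles the qsub command and prints it for review instead of running it. The file name `ena-file-download.sh` is a hypothetical placeholder for whatever name your ENA script has; the launch script name is the one given earlier in this option.

```shell
# Hypothetical file name: replace with the name of the script downloaded from ENA.
input_file="ena-file-download.sh"

# -v passes input_file to the job as an environment variable.
submit_cmd="qsub -v input_file=${input_file} launch_read_ENA_download.pbs"

# Print the command for review; run the printed line on the HPC to submit the job.
echo "$submit_cmd"
```

Printing first is a small safeguard against submitting a job with a mistyped file name.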

You can monitor the progress of the job by running the following command:
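On a PBS Pro system, job status is typically listed with `qstat`. The sketch below is one possible way to do this, assuming `qstat` is on your path on the HPC; the function name `show_my_jobs` is hypothetical, and the fallback message only exists so the snippet also runs on a machine without PBS.

```shell
# List the current user's jobs, or explain why that is not possible here.
show_my_jobs() {
    if command -v qstat >/dev/null 2>&1; then
        qstat -u "${USER:-$(id -un)}"
    else
        echo "qstat not found: run this on an HPC login node"
    fi
}

show_my_jobs
```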

NOTE: The following code is for reference only; we will not run it on the HPC. The content of the launch_input_ENA_download.sh is:
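As a rough sketch only, and not the actual workshop file, a launch script of this kind could read the ENA file line by line and run each wget command it finds. The job name, the resource requests, and the function name `run_ena_downloads` are all assumptions made for illustration.

```shell
#!/bin/bash
#PBS -N read_ENA_download
#PBS -l select=1:ncpus=1:mem=4gb
#PBS -l walltime=04:00:00
# The resource values above are placeholders, not recommended settings.

# Start in the submission directory; fall back to the current directory outside PBS.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Run every line of the ENA file that starts with "wget ", one after another.
run_ena_downloads() {
    ena_file="$1"
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            "wget "*)
                set -f          # no filename expansion while splitting the line
                set -- $line    # split into words: wget -nc URL
                set +f
                "$@" || echo "FAILED: $line" >&2
                ;;
        esac
    done < "$ena_file"
}

# input_file is expected to arrive via: qsub -v input_file=<ENA file>
if [ -n "${input_file:-}" ]; then
    run_ena_downloads "$input_file"
fi
```

Lines that do not start with `wget` are skipped, so comments or blank lines in the ENA file are harmless.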

Option 2: Create a PBS Pro submission script

(For advanced users) Use a text editor such as vi or nano to create the following PBS Pro script. Copy the code and paste it into a new file called, for example, launch_ENA_download_SRR206.pbs
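A minimal sketch of what such a file might contain is shown below, written with a heredoc so that it can be created from the terminal; the same content can equally be typed in vi or nano. The job name and resource requests are assumptions, and only the first two wget lines from the ENA script are included.

```shell
# Write the PBS Pro script to disk; nothing is downloaded at this point.
cat > launch_ENA_download_SRR206.pbs <<'EOF'
#!/bin/bash
#PBS -N ENA_download_SRR206
#PBS -l select=1:ncpus=1:mem=4gb
#PBS -l walltime=04:00:00
# The resource values above are placeholders, not recommended settings.

# Assumed location: the data folder created earlier in this guide.
cd "$HOME/workshop/data" || exit 1

# wget lines copied from the ENA download script (first two shown; add the rest).
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/072/SRR20622172/SRR20622172.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/073/SRR20622173/SRR20622173.fastq.gz
EOF
```

The quoted `'EOF'` delimiter keeps `$HOME` unexpanded in the file, so it is resolved when the job runs rather than when the file is written.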

Once you have the launch PBS script ready, proceed to submit it to the HPC as follows:

You can monitor the progress of the job by running the following command:

Additional data download user guides

To find additional options for downloading public or private data to the HPC, see Data Download.

Tip: additional options include i) NCBI’s Sequence Read Archive (SRA), and ii) Illumina’s BaseSpace.
