eResearch Downloading public data

Aims:

  • Use the European Nucleotide Archive (ENA) to search for and fetch data of interest.

  • Learn how to download public RNA-seq data using the HPC.

Work in your Desktop / Laptop

ENA link: https://www.ebi.ac.uk/ena/browser/view/PRJNA862107

Search for data of interest

In the ENA ‘View’ search box, enter one of the following identifiers:

  • To fetch all or selected files in the project: the project accession (e.g. PRJNA862107).

  • To fetch individual files: a run accession (e.g. SRR20622172).

Once on the project or run page, select the files you need and click “Get download script”. This saves a shell script of wget commands, one per file, like the example below.
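If you prefer the command line, ENA's Portal API can list the same files. The following is a minimal sketch, assuming the filereport endpoint and the field names run_accession, fastq_ftp and fastq_md5 as we understand them from the ENA Portal API documentation; the helper name ena_filereport_url is hypothetical. It only builds the query URL, which you would then pass to curl or wget.

```shell
# Build an ENA Portal API "filereport" query listing the FASTQ FTP links
# (and their MD5 checksums) for every run under a project or for one run.
# Endpoint and field names are assumptions based on the ENA Portal API docs.
ena_filereport_url() {
  local accession="$1"
  local fields="run_accession,fastq_ftp,fastq_md5"
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=%s&format=tsv\n' \
    "$accession" "$fields"
}

# Usage (needs network access):
#   curl -s "$(ena_filereport_url PRJNA862107)"
```

If the API behaves as described, the curl call should return a tab-separated table with one row per run, including the FTP path of each FASTQ file.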

Mouse: Project PRJNA862107

wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/072/SRR20622172/SRR20622172.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/073/SRR20622173/SRR20622173.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/077/SRR20622177/SRR20622177.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/076/SRR20622176/SRR20622176.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/080/SRR20622180/SRR20622180.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/074/SRR20622174/SRR20622174.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/078/SRR20622178/SRR20622178.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/079/SRR20622179/SRR20622179.fastq.gz
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR206/075/SRR20622175/SRR20622175.fastq.gz
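The URLs in the ENA script follow a predictable pattern based on the run accession. The sketch below rebuilds a URL from an accession; it is based on our reading of ENA's FTP directory layout and assumes single-end runs (paired-end runs are stored as _1.fastq.gz and _2.fastq.gz instead). The function name ena_fastq_url is hypothetical, and the script downloaded from ENA remains the authoritative list of files.

```shell
# Rebuild the ENA FTP URL of a single-end FASTQ file from its run accession.
# Assumed layout (our reading of ENA's documentation):
#   vol1/fastq/<first 6 characters>/<subdir>/<accession>/<accession>.fastq.gz
# where <subdir> depends on how many digits the accession number has.
ena_fastq_url() {
  local acc="$1"
  local digits="${acc//[!0-9]/}"   # numeric part, e.g. 20622172
  local subdir=""
  case "${#digits}" in
    6) subdir="" ;;                  # no extra directory level
    7) subdir="00${digits: -1}" ;;   # 00 + last digit
    8) subdir="0${digits: -2}" ;;    # 0 + last two digits
    9) subdir="${digits: -3}" ;;     # last three digits
    *) echo "unsupported accession: $acc" >&2; return 1 ;;
  esac
  local path="vol1/fastq/${acc:0:6}"
  if [ -n "$subdir" ]; then
    path="${path}/${subdir}"
  fi
  printf 'ftp://ftp.sra.ebi.ac.uk/%s/%s/%s.fastq.gz\n' "$path" "$acc" "$acc"
}

# Example: reproduce the first command of the ENA script above.
#   wget -nc "$(ena_fastq_url SRR20622172)"
```

For SRR20622172 the number has eight digits, so the extra directory is 0 followed by the last two digits (072), which matches the first wget line above.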

Work in the HPC

Use the terminal to log in to the HPC and create a data folder for the FASTQ files. For example:

mkdir -p $HOME/workshop
mkdir -p $HOME/workshop/data
cd $HOME/workshop/data

Work in your Desktop / Laptop

Copy the script downloaded from ENA to the newly created “data” folder on the HPC.

Use your file manager (File Explorer on Windows, Finder on Mac) to connect to the HPC:

Windows (click in the File Explorer address bar and type or copy-paste the following):

\\hpc-fs\work

Mac (in Finder, press Command + K to open “Connect to Server”):

Navigate to the workshop/data folder and drag and drop the ENA download script into it.

Work in the HPC

Now, let’s create a PBS Pro submission script to download the data. Two options are described below; use either one.

Option 1: Use a script to read the ENA file

Tip: this option does not require the use of a text editor like vi or nano.

First, let’s get a copy of a script called “launch_read_ENA_download.pbs” as follows:

List the files in the directory:

ls -l

You should see both the ENA download script and launch_read_ENA_download.pbs.
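To confirm that the ENA file arrived intact, you could also count the download commands it contains. The following is a minimal sketch, assuming one “wget ... URL” command per line as in the ENA script shown earlier; the function name summarise_ena_script is hypothetical.

```shell
# Count the wget commands in an ENA download script and print the file
# name each one will create (the last path component of its URL).
summarise_ena_script() {
  local script="$1"
  local n
  n=$(grep -c '^wget ' "$script" || true)   # grep -c prints 0 when nothing matches
  echo "$n download command(s) in $script"
  grep '^wget ' "$script" | awk '{ k = split($NF, parts, "/"); print parts[k] }'
}

# Usage (replace with the name of your ENA file):
#   summarise_ena_script your_ENA_script.sh
```

For the mouse project above, the count should be 9, one per SRR run.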

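Now, submit the job to the HPC cluster. The qsub command submits the script, and the -v option passes the name of the ENA file to the script as the variable input_file. For example, replacing ENA_FILE_NAME.sh with the name of your ENA file:

qsub -v input_file=ENA_FILE_NAME.sh launch_read_ENA_download.pbs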

You can monitor the progress of the job with the PBS Pro qstat command, for example:

qstat -u $USER

NOTE: The following code is for reference only; we will not run it on the HPC. The content of launch_read_ENA_download.pbs is:
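The workshop script itself may differ. As an illustration only, here is a hypothetical minimal PBS Pro script that reads the ENA file passed in as input_file and runs its commands one at a time. The job name and resource requests (1 CPU, 4 GB of memory, 12 hours of walltime) are assumptions; adjust them to your HPC's limits.

```shell
#!/bin/bash -l
#PBS -N ENA_download
#PBS -l select=1:ncpus=1:mem=4gb
#PBS -l walltime=12:00:00

# Run every command in an ENA download script, one line at a time,
# printing each command to the job log before running it.
run_ena_script() {
  local script="$1"
  local line count=0
  # Read from file descriptor 3 so the commands cannot consume the loop's input.
  while IFS= read -r line <&3 || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blank lines, comments and a shebang
    esac
    echo "Running: $line"
    eval "$line"
    count=$((count + 1))
  done 3< "$script"
  echo "Finished $count command(s) from $script"
}

# Only start downloading when input_file was supplied (qsub -v input_file=...).
if [ -n "${input_file:-}" ]; then
  cd "${PBS_O_WORKDIR:-.}" || exit 1   # the folder qsub was run from
  run_ena_script "$input_file"
fi
```

Because the ENA file is itself a plain shell script, bash "$input_file" would also work; reading it line by line simply makes each download visible in the job's output log.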

Option 2: Create a PBS Pro submission script:

(For advanced users) Use a text editor such as vi or nano to create the following PBS Pro script. Copy the code and paste it into a new file called, for example, launch_ENA_download_SRR206.pbs.
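A hypothetical sketch of such a script for the mouse project is shown below. Instead of pasting the nine wget lines, it loops over the run accessions; the URL pattern is copied from the ENA script above and only holds for accessions with an eight-digit number. The job name and resource requests are assumptions.

```shell
#!/bin/bash -l
#PBS -N ENA_download_SRR206
#PBS -l select=1:ncpus=1:mem=4gb
#PBS -l walltime=12:00:00

# Run accessions from project PRJNA862107 (taken from the ENA script).
RUNS=(SRR20622172 SRR20622173 SRR20622174 SRR20622175 SRR20622176
      SRR20622177 SRR20622178 SRR20622179 SRR20622180)

# Download each run; e.g. SRR20622172 -> .../SRR206/072/SRR20622172/...
# (pattern valid only for accessions with an 8-digit number).
download_runs() {
  local acc
  for acc in "$@"; do
    wget -nc "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/${acc:0:6}/0${acc: -2}/${acc}/${acc}.fastq.gz"
  done
}

# PBS sets PBS_JOBID inside a running job; outside a job, do nothing.
if [ -n "${PBS_JOBID:-}" ]; then
  cd "$PBS_O_WORKDIR" || exit 1
  download_runs "${RUNS[@]}"
fi
```

The PBS_JOBID guard is a design choice: running or sourcing the file outside a job, for example to check it for typos, does not start any downloads.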

Once the PBS script is ready, submit it to the HPC:

qsub launch_ENA_download_SRR206.pbs

You can monitor the progress of the job with the PBS Pro qstat command, for example:

qstat -u $USER
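Whichever option you used, it is worth checking that the downloads are complete once the job has finished. The following is a minimal sketch using gzip -t, which tests each archive without writing the decompressed data to disk; the function name check_fastq_gz is hypothetical. A stricter check would compare each file against the MD5 checksum that ENA publishes for it.

```shell
# Check that each FASTQ file is a complete, readable gzip archive.
# A truncated download (e.g. a job that hit its walltime) fails this test.
check_fastq_gz() {
  local f bad=0
  for f in "$@"; do
    if gzip -t "$f" 2>/dev/null; then
      echo "OK      $f"
    else
      echo "FAILED  $f"
      bad=$((bad + 1))
    fi
  done
  [ "$bad" -eq 0 ]   # return success only if every file passed
}

# Usage, from the data folder:
#   check_fastq_gz *.fastq.gz
```

Any file reported as FAILED can be deleted and downloaded again; because the ENA commands use wget -nc, re-running the job skips files that already exist, so remove the broken ones first.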

Additional data download user guides

For additional ways to download public or private data to the HPC, see Data Download.

Tip: additional options include i) NCBI’s Sequence Read Archive (SRA) and ii) Illumina’s BaseSpace.