nf-core/fetchngs pipeline

Download data using the nf-core/fetchngs pipeline

Source: fetchngs: Introduction

As an alternative to the approach above, we can use the Nextflow nf-core/fetchngs pipeline to download the data.

To run this pipeline we need two inputs: 1) a list of SRA identifiers and 2) a PBS Pro script to fetch the data using sratools.

First, prepare a file with the list of SRA IDs of interest to be downloaded:

Hint:

  • In the terminal create a new folder called ‘fetchngs’. For example:

  • mkdir $HOME/workshop/2024-2/session4_RNAseq/data/fetchngs
    # then, move to the newly created folder
    cd $HOME/workshop/2024-2/session4_RNAseq/data/fetchngs
  • Copy the following list of IDs. Hint: click on the top right corner of the block below to copy the text.

SRR20622172
SRR20622173
SRR20622177
SRR20622176
SRR20622180
SRR20622174
SRR20622178
SRR20622179
SRR20622175

Alternatively, instead of a list of SRR identifiers, it is possible to download all the data in a given BioProject by providing its ID:

PRJNA862097

NOTE: Putting either the list of IDs above or the BioProject ID in the ‘ids.csv’ file will download exactly the same data.

  • Create an ‘ids.csv’ file using nano and paste the list of IDs, one ID per line:

  • Next, copy and paste the following PBS script to download the files specified in ‘ids.csv’.

  • NOTE: Instead of listing individual SRR identifiers, it is also possible to list the BioProject ID (e.g., PRJNA862107), which will fetch all of its SRR samples automatically.
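As a sketch of the ‘ids.csv’ step, the file can also be written directly from the terminal instead of pasting into nano. This assumes you are already inside the ‘fetchngs’ folder and that the pipeline reads one accession per line with no header row.

```shell
# Write the nine run accessions to ids.csv, one per line (no header row).
cat > ids.csv <<'EOF'
SRR20622172
SRR20622173
SRR20622177
SRR20622176
SRR20622180
SRR20622174
SRR20622178
SRR20622179
SRR20622175
EOF

# Sanity check: count the lines and show the file contents.
wc -l < ids.csv
cat ids.csv
```

To use the BioProject instead, the file would hold a single line with the BioProject ID in place of the nine run accessions.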

Second, create a PBS launch script to download the data for the above IDs:

  • Copy the block of code below. Hint: click on the top right corner of the block below to copy the text.

  • Use nano to create a launch script, for example:

  • Paste the block of code above and save the file.
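A minimal sketch of what such a launch script might look like is shown here, written to disk with a heredoc so it can be created without nano. Several details are assumptions rather than the workshop's actual settings: the file name ‘launch_fetchngs.pbs’, the job name, the CPU, memory, and walltime requests, the ‘results’ output folder, and the ‘singularity’ profile. The `--download_method sratools` option is available in recent fetchngs releases; older releases used a different flag, so check the documentation for the version you run. How Nextflow and Java are made available (for example via `module load`) is site-specific and is left as a comment.

```shell
# Create a PBS Pro launch script (hypothetical name: launch_fetchngs.pbs).
# The quoted 'EOF' keeps $PBS_O_WORKDIR literal so it is expanded at job run time.
cat > launch_fetchngs.pbs <<'EOF'
#!/bin/bash -l
#PBS -N fetchngs
#PBS -l select=1:ncpus=2:mem=8gb
#PBS -l walltime=12:00:00

# Run from the directory the job was submitted from (where ids.csv lives).
cd "$PBS_O_WORKDIR"

# Site-specific assumption: make Java and Nextflow available here,
# e.g. with the module commands your cluster documents.

# Download the FASTQ files and metadata for every ID listed in ids.csv.
nextflow run nf-core/fetchngs \
    -profile singularity \
    --input ids.csv \
    --outdir results \
    --download_method sratools
EOF

# Check the script for shell syntax errors without running it.
bash -n launch_fetchngs.pbs && echo "launch_fetchngs.pbs looks syntactically valid"
```

Resource requests are deliberately small because the head job mostly orchestrates downloads; adjust them to your cluster's queue limits.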

Submit the download job to the HPC cluster:
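On a PBS Pro cluster, submission is typically done with `qsub`, and `qstat` can be used to monitor the job. The sketch below assumes the launch script was saved as ‘launch_fetchngs.pbs’ (a hypothetical name; substitute your own) and guards against being run on a machine without PBS installed.

```shell
# Hypothetical script name; use whatever you saved the launch script as.
LAUNCH_SCRIPT=launch_fetchngs.pbs

if command -v qsub >/dev/null 2>&1; then
    # qsub prints the ID of the newly queued job on success.
    qsub "$LAUNCH_SCRIPT"
    # List your queued and running jobs.
    qstat -u "$USER"
else
    echo "qsub not found: run this on the HPC login node" >&2
fi
```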

Outputs: