Downloading data from BaseSpace

Aim:

Download sequencing data generated for your project from Illumina's BaseSpace to your HPC account.

Pre-requisites

If you do not yet have an account at QUT’s HPC, apply for a new account here.

Additionally, we recommend users be familiar with the following topics:

An installation of conda3 or Miniconda3 (https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)
Basic Unix command-line knowledge (for example: https://researchcomputing.princeton.edu/education/external-online-resources/linux; https://swcarpentry.github.io/shell-novice/)
Familiarity with a Unix text editor (for example, Vi/Vim or Nano):
- Vim (VIM Guide | Computational Biology Core; https://missing.csail.mit.edu/2020/editors/)
- Nano (https://engineering.purdue.edu/ECN/Support/KB/Docs/BasictutorialforNanou; https://www.howtogeek.com/howto/42980/the-beginners-guide-to-nano-the-linux-command-line-text-editor/)

Download the data from BaseSpace

Create a BaseSpace account
Ask the project owner for access to the project (a link to the project should be enough).

Install the BaseSpace Sequence Hub CLI

source: https://developer.basespace.illumina.com/docs/content/documentation/cli/cli-overview

Log into QUT’s HPC using your account credentials (see above).

Then run the following commands (the first creates $HOME/bin if it does not already exist):

mkdir -p $HOME/bin
wget "https://launch.basespace.illumina.com/CLI/latest/amd64-linux/bs" -O $HOME/bin/bs

Then change the file permissions to make the downloaded binary executable:

chmod u+x $HOME/bin/bs
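Many Linux systems add $HOME/bin to PATH at login only if the directory already existed, so bs may not be found by name straight after installing it. The sketch below is one way to handle that. The function name ensure_home_bin_on_path is hypothetical, and it assumes your shell reads ~/.bashrc; adjust it for your own setup.

```shell
# Hypothetical helper: make sure $HOME/bin is on PATH so that `bs` can be
# run by name. ASSUMPTION: your shell reads ~/.bashrc at start-up.
ensure_home_bin_on_path() {
  local bindir="$HOME/bin"
  # Single quotes: this exact text is written to ~/.bashrc unexpanded.
  local line='export PATH="$HOME/bin:$PATH"'
  # Already on PATH in this shell? Nothing to do.
  case ":$PATH:" in
    *":$bindir:"*) return 0 ;;
  esac
  # Make the change persistent for future shells, adding the line only once.
  if ! grep -qxF "$line" "$HOME/.bashrc" 2>/dev/null; then
    printf '%s\n' "$line" >> "$HOME/.bashrc"
  fi
  # Apply it to the current shell as well.
  export PATH="$bindir:$PATH"
}
```

Usage: run ensure_home_bin_on_path once, then confirm with command -v bs.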

Authenticate the connection to the BaseSpace server (follow the link the command prints to approve access in your browser):

bs auth
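The command bs whoami can be used to confirm which account the CLI is logged in as. For scripts (for example, before submitting a batch job), a minimal offline check is sketched below. It assumes the CLI stores its default credentials in ~/.basespace/default.cfg (verify this against the CLI documentation); the function name bs_config_present is hypothetical.

```shell
# Hypothetical helper: report whether `bs auth` appears to have been run.
# ASSUMPTION: the BaseSpace CLI saves its default credentials to
# ~/.basespace/default.cfg; pass another path as $1 if yours differs.
bs_config_present() {
  local cfg="${1:-$HOME/.basespace/default.cfg}"
  # -s: the file exists and is not empty.
  if [ -s "$cfg" ]; then
    echo "found: $cfg"
    return 0
  fi
  echo "missing: $cfg (run 'bs auth' first)" >&2
  return 1
}
```

Usage: bs_config_present || echo "authenticate before downloading"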

Find the ID of the project whose data you want to download. For example, you can read it from the project's URL when logged into BaseSpace.
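The BaseSpace CLI can also list the projects you have access to, together with their IDs: bs list projects. If you copy the project URL from your browser instead, the sketch below extracts the ID from it. It assumes project URLs contain /projects/<ID> (e.g. https://basespace.illumina.com/projects/357263934); the function name project_id_from_url is hypothetical.

```shell
# Hypothetical helper: pull the numeric project ID out of a BaseSpace URL.
# ASSUMPTION: project pages look like .../projects/<ID> or .../projects/<ID>/...
project_id_from_url() {
  local url="$1" id
  # Print only the digits that follow "/projects/" (nothing if no match).
  id=$(printf '%s\n' "$url" | sed -n 's#.*/projects/\([0-9][0-9]*\).*#\1#p')
  if [ -z "$id" ]; then
    echo "no project ID found in: $url" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

Usage: PROJECT_ID=$(project_id_from_url "https://basespace.illumina.com/projects/357263934")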

NOTE: If you have not yet created a folder to store raw data, you can run, for example, the following command:

mkdir $HOME/data/

Then create a subfolder for your project, for example:

mkdir $HOME/data/myProjectName/
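Sequencing projects can be large (the timing example at the end of this guide refers to ~170 GB), so it may be worth checking free space before downloading. The sketch below uses df; note that df reports free space on the whole filesystem, and any per-user quota on the HPC may be lower. The function name has_free_space is hypothetical.

```shell
# Hypothetical helper: check that the filesystem holding DIR has at least
# NEEDED_GB gigabytes free. Does NOT account for per-user quotas.
# Usage: has_free_space DIR NEEDED_GB
has_free_space() {
  local dir="$1" needed_gb="$2" avail_kb needed_kb
  # -P: POSIX output (one line per filesystem); -k: 1024-byte blocks.
  # Row 2, column 4 is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  needed_kb=$(( needed_gb * 1024 * 1024 ))
  if [ "$avail_kb" -ge "$needed_kb" ]; then
    echo "OK: $(( avail_kb / 1024 / 1024 )) GB free in $dir"
    return 0
  fi
  echo "Not enough space: $(( avail_kb / 1024 / 1024 )) GB free, ${needed_gb} GB needed" >&2
  return 1
}
```

Usage: has_free_space $HOME/data/myProjectName 200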

You are now ready to download the FASTQ files (.fastq.gz), for example (replace 357263934 with your own project ID):

bs download project -i 357263934 -o $HOME/data/myProjectName/ --extension=fastq.gz
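Once the download has finished, a quick integrity check can catch truncated or corrupted files. The sketch below runs gzip -t on every .fastq.gz file under a directory; it only confirms that each file is a valid gzip stream, not that its FASTQ content is correct. The function name check_fastq_gz is hypothetical. For ~170 GB of data the check reads every byte, so consider running it inside a PBS job.

```shell
# Hypothetical helper: test every *.fastq.gz under DIR for gzip integrity.
# Prints a summary on stdout, lists corrupt files on stderr, and returns
# non-zero if any file fails.
check_fastq_gz() {
  find "$1" -type f -name '*.fastq.gz' | {
    n=0
    bad=0
    while IFS= read -r f; do
      n=$((n + 1))
      # gzip -t decompresses without writing output and fails on bad files.
      if ! gzip -t "$f" 2>/dev/null; then
        echo "corrupt: $f" >&2
        bad=$((bad + 1))
      fi
    done
    echo "files checked: $n, corrupt: $bad"
    [ "$bad" -eq 0 ]
  }
}
```

Usage: check_fastq_gz $HOME/data/myProjectName/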

Alternatively, download the files as a batch job with a PBS Pro script (e.g., one named launch_fetch_BaseSpaceData.pbs):

#!/bin/bash -l
#PBS -N fetchBaseSpace
#PBS -l walltime=24:00:00
#PBS -l mem=32gb
#PBS -l ncpus=8
#PBS -m bae
#PBS -M email@host
#PBS -j oe

cd $PBS_O_WORKDIR

# Fetch data from BaseSpace by indicating the project ID (-i parameter)
bs download project -i 357263934 -o /my/project/raw_data --extension=fastq.gz

Submit the job to the PBS Pro scheduler (queue):

qsub launch_fetch_BaseSpaceData.pbs
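To reuse the job script for other projects without editing it by hand, you could generate it from a project ID and an output folder. The sketch below writes a script equivalent to the PBS Pro script shown earlier (the email directives -m and -M are omitted; add them back if you want notifications) and also creates the output folder when the job starts. The function name write_fetch_script is hypothetical.

```shell
# Hypothetical helper: write a PBS Pro job script for one BaseSpace project.
# Usage: write_fetch_script PROJECT_ID OUTPUT_DIR SCRIPT_FILE
write_fetch_script() {
  local id="$1" outdir="$2" file="$3"
  # Reject anything that is not a plain numeric project ID.
  case "$id" in
    ''|*[!0-9]*) echo "project ID must be numeric: $id" >&2; return 1 ;;
  esac
  # Unquoted EOF: $id and $outdir are expanded now, while \$PBS_O_WORKDIR
  # stays literal so that PBS fills it in when the job runs.
  cat > "$file" <<EOF
#!/bin/bash -l
#PBS -N fetchBaseSpace
#PBS -l walltime=24:00:00
#PBS -l mem=32gb
#PBS -l ncpus=8
#PBS -j oe

cd \$PBS_O_WORKDIR
mkdir -p "$outdir"
bs download project -i $id -o "$outdir" --extension=fastq.gz
EOF
}
```

Usage:

write_fetch_script 357263934 /my/project/raw_data launch_fetch_BaseSpaceData.pbs
qsub launch_fetch_BaseSpaceData.pbs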

Monitor the status of the job:

qjobs
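On PBS Pro systems, qstat -u $USER also lists your jobs. To see how far the download itself has got, you can look at the output folder; the sketch below counts the .fastq.gz files there and reports the folder's size in MB. The function name download_progress is hypothetical.

```shell
# Hypothetical helper: summarise what has been downloaded into DIR so far.
download_progress() {
  local dir="$1" n size_mb
  # Count FASTQ files (tr strips any padding some wc versions add).
  n=$(find "$dir" -type f -name '*.fastq.gz' | wc -l | tr -d ' ')
  # Total disk usage of the directory, in megabytes.
  size_mb=$(du -sm "$dir" | awk '{print $1}')
  echo "$n fastq.gz files, ${size_mb} MB in $dir"
}
```

Usage: download_progress /my/project/raw_data (re-run it periodically while the job is running).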

As a rough guide, downloading ~170 GB of data can take about 15-20 minutes.
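To get a first estimate for a different amount of data, the figure above can be scaled linearly (170 GB in 15-20 minutes is roughly 8.5-11 GB per minute). This is a rough sketch only: actual throughput depends on network load, storage and the number of files. The function name estimate_minutes is hypothetical.

```shell
# Hypothetical helper: scale the observed figure (~170 GB in ~15-20 min)
# linearly to SIZE_GB and print a "low-high" range in minutes.
# Usage: estimate_minutes SIZE_GB
estimate_minutes() {
  awk -v gb="$1" 'BEGIN { printf "%.1f-%.1f\n", gb * 15 / 170, gb * 20 / 170 }'
}
```

Usage: estimate_minutes 500 prints 44.1-58.8.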
