BaseSpace data download
Aim:
Download sequencing data generated for your project from Illumina's BaseSpace to your HPC account.
Download the data from BaseSpace
Create a BaseSpace account
Ask the project owner for access to the project (a link to the project should be enough).
Install the BaseSpace Sequence Hub command-line interface (CLI), bs
source: https://developer.basespace.illumina.com/docs/content/documentation/cli/cli-overview
Log into QUT’s HPC using your account credentials (see above)
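Run the following command to download the bs binary into $HOME/bin (if that folder does not exist yet, create it first with mkdir -p $HOME/bin, otherwise wget cannot write the file):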
wget "https://launch.basespace.illumina.com/CLI/latest/amd64-linux/bs" -O $HOME/bin/bs
Then change the file permissions to make the downloaded binary executable:
chmod u+x $HOME/bin/bs
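The install steps can be collected into a small script. This is a minimal sketch, assuming you want the binary in $HOME/bin; the function names install_bs and add_to_path are hypothetical and not part of the bs CLI. add_to_path puts $HOME/bin on the PATH of the current shell so that bs can be called by name.

```shell
#!/bin/bash
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
add_to_path() {
    local dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;              # already on PATH: nothing to do
        *) PATH="$dir:$PATH" ;;     # prepend so this copy is found first
    esac
    export PATH
}

# Hypothetical helper: download the bs CLI into $HOME/bin and make it executable.
install_bs() {
    mkdir -p "$HOME/bin" || return 1    # wget -O fails if the folder is missing
    wget "https://launch.basespace.illumina.com/CLI/latest/amd64-linux/bs" \
        -O "$HOME/bin/bs" || return 1
    chmod u+x "$HOME/bin/bs" || return 1
    add_to_path "$HOME/bin"
    command -v bs                       # prints the path of the bs that will be used
}
```

Run install_bs once. To keep $HOME/bin on your PATH in later sessions, add export PATH="$HOME/bin:$PATH" to your shell start-up file (for example ~/.bashrc).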
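Authenticate the connection to the BaseSpace server with the command below. It prints a URL; open it in a web browser and log in with your BaseSpace account to approve access.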
bs auth
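To avoid repeating the browser login every session, a script can check for an existing configuration first. This sketch assumes bs auth saves its credentials in $HOME/.basespace/default.cfg; has_bs_config and ensure_bs_auth are hypothetical names. bs whoami, a standard bs subcommand, then confirms which account is active.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a non-empty bs config file exists.
# ASSUMPTION: bs auth writes its credentials to $HOME/.basespace/default.cfg.
has_bs_config() {
    local cfg="${1:-$HOME/.basespace/default.cfg}"
    [ -s "$cfg" ]
}

# Hypothetical helper: authenticate only when no config is found, then show the account.
ensure_bs_auth() {
    if ! has_bs_config "$1"; then
        bs auth || return 1     # prints a URL to open in a browser
    fi
    bs whoami                   # reports the BaseSpace user the CLI is acting as
}
```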
Find the ID of the project you want to download. For example, you can read it from the project's URL when logged into BaseSpace, or list your projects and their IDs with bs list projects.
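If you have the project's URL, the ID can be extracted with plain shell parameter expansion. This is a sketch under an assumption: that project URLs contain a /projects/<ID> segment, as in https://basespace.illumina.com/projects/357263934 (using the ID from the download example below); check this against the URL you actually see. project_id_from_url is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print the numeric ID that follows "/projects/" in a BaseSpace URL.
# ASSUMPTION: project URLs look like https://basespace.illumina.com/projects/<ID>[/...]
project_id_from_url() {
    local url="$1" rest id
    rest="${url#*/projects/}"          # drop everything up to and including /projects/
    [ "$rest" != "$url" ] || return 1  # no /projects/ segment in the URL
    id="${rest%%[!0-9]*}"              # keep only the leading run of digits
    [ -n "$id" ] || return 1
    printf '%s\n' "$id"
}
```

For example, project_id_from_url "https://basespace.illumina.com/projects/357263934/about" prints 357263934.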
NOTE: If you have not yet created a folder to store raw data, you can, for example, run the following command:
mkdir $HOME/data/
Then create a subfolder for your project, for example:
mkdir $HOME/data/myProjectName/
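Both folders can also be created in one step with mkdir -p, which creates any missing parent folders and does not complain if the folder already exists. A minimal sketch; make_project_dir is a hypothetical helper that prints the resulting path so it can be reused as the -o argument.

```shell
#!/bin/bash
# Hypothetical helper: create <base>/<project> (plus missing parents) and print its path.
make_project_dir() {
    local base="$1" project="$2"
    if [ -z "$base" ] || [ -z "$project" ]; then
        echo "usage: make_project_dir BASE_DIR PROJECT_NAME" >&2
        return 1
    fi
    mkdir -p "$base/$project" || return 1   # -p: no error if it already exists
    printf '%s\n' "$base/$project"
}
```

For example: outdir=$(make_project_dir "$HOME/data" myProjectName), then pass "$outdir" to bs download with -o.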
You are now ready to download the FASTQ files (.fastq.gz), for example:
bs download project -i 357263934 -o $HOME/data/myProjectName/ --extension=fastq.gz
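Once the download finishes, it is worth checking that the files arrived intact. A sketch, assuming gzip is available (it is on standard Linux systems): gzip -t tests each archive's integrity without writing any output and fails on a truncated or corrupt file. The search is recursive because bs may place files in per-sample subfolders. check_fastq_gz is a hypothetical name; on ~170 GB it reads every byte, so run it inside a job rather than on a login node.

```shell
#!/bin/bash
# Hypothetical helper: integrity-check every *.fastq.gz under a folder.
# Prints the number of files found; fails if none were found or any file is damaged.
check_fastq_gz() {
    local dir="$1" n
    # Count the files (subfolders are searched too).
    n=$(find "$dir" -type f -name '*.fastq.gz' | wc -l | tr -d ' ')
    echo "$n"
    [ "$n" -gt 0 ] || return 1
    # gzip -t exits non-zero for a damaged archive; find then exits non-zero too.
    find "$dir" -type f -name '*.fastq.gz' -exec gzip -t {} +
}
```

For example: check_fastq_gz $HOME/data/myProjectName/ && echo "all files OK"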
For large downloads, it is better to run the command as a batch job rather than on the login node, using a PBS Pro script (e.g., one called launch_fetch_BaseSpaceData.pbs):
#!/bin/bash -l
#PBS -N fetchBaseSpace
#PBS -l walltime=24:00:00
#PBS -l mem=32gb
#PBS -l ncpus=8
#PBS -m bae
#PBS -M email@host
#PBS -j oe
cd $PBS_O_WORKDIR
#fetch data from BaseSpace by indicating the Project ID (-i parameter)
bs download project -i 357263934 -o /my/project/raw_data --extension=fastq.gz
Where: -i is the project ID number; -o is the path to the output folder where the data will be downloaded; --extension=fastq.gz restricts the download to files with that extension.
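If you download several projects, a small generator avoids editing the script by hand each time. This is a sketch that reproduces the directives above; write_fetch_script is a hypothetical helper, the numeric-ID check assumes project IDs are all digits (as in the example), and the mkdir -p line is an extra precaution not in the original script.

```shell
#!/bin/bash
# Hypothetical helper: print a PBS Pro job script that downloads one project's FASTQ files.
# Arguments: project ID, output folder, e-mail address for job notifications.
write_fetch_script() {
    local project_id="$1" outdir="$2" email="$3"
    # ASSUMPTION: BaseSpace project IDs are purely numeric.
    case "$project_id" in
        ''|*[!0-9]*) echo "invalid project ID: '$project_id'" >&2; return 1 ;;
    esac
    if [ -z "$outdir" ] || [ -z "$email" ]; then
        echo "usage: write_fetch_script PROJECT_ID OUTPUT_DIR EMAIL" >&2
        return 1
    fi
    # \$PBS_O_WORKDIR is escaped so it is expanded when the job runs, not now.
    cat <<EOF
#!/bin/bash -l
#PBS -N fetchBaseSpace
#PBS -l walltime=24:00:00
#PBS -l mem=32gb
#PBS -l ncpus=8
#PBS -m bae
#PBS -M $email
#PBS -j oe
cd \$PBS_O_WORKDIR
mkdir -p "$outdir"
bs download project -i $project_id -o "$outdir" --extension=fastq.gz
EOF
}
```

For example: write_fetch_script 357263934 /my/project/raw_data email@host > launch_fetch_BaseSpaceData.pbs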
Submit the job to the PBS Pro scheduler (queue):
qsub launch_fetch_BaseSpaceData.pbs
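qsub prints the ID of the submitted job (for example 12345.<server>), so capturing it makes the job easier to follow. With -j oe, PBS Pro joins the output and error streams into one file, by default named <job name>.o<job number> in the folder you submitted from. pbs_log_name is a hypothetical helper that builds that name; it assumes the default naming has not been overridden with #PBS -o.

```shell
#!/bin/bash
# Hypothetical helper: default PBS Pro log name for a job with joined output (-j oe).
# $1 = job name (#PBS -N), $2 = job ID as printed by qsub, e.g. 12345.server
pbs_log_name() {
    local name="$1" jobid="$2"
    printf '%s.o%s\n' "$name" "${jobid%%.*}"   # keep only the number before the first dot
}
```

For example:
jobid=$(qsub launch_fetch_BaseSpaceData.pbs)
qstat "$jobid"
tail "$(pbs_log_name fetchBaseSpace "$jobid")"    # once the job has finished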
Monitor progress:
qjobs
As a rough guide, ~170 GB of data can take ~15-20 min to download.
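That figure works out to roughly 140-190 MB/s (170,000 MB over 900-1,200 s), which gives a quick way to judge whether walltime=24:00:00 is enough for a bigger project. A back-of-the-envelope sketch, assuming the rate stays similar (it will vary with network and storage load) and taking 1 GB as 1,000 MB; estimate_minutes is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: estimated download time in whole minutes (rounded up).
# $1 = data size in GB (integer), $2 = throughput in MB/s (default 140,
#      the slower end of the ~170 GB in 15-20 min observation).
estimate_minutes() {
    local size_gb="$1" rate="${2:-140}"
    local total_mb=$((size_gb * 1000))
    local mb_per_min=$((rate * 60))
    echo $(( (total_mb + mb_per_min - 1) / mb_per_min ))   # ceiling division
}
```

For example, estimate_minutes 1000 prints 120, i.e. about 2 hours for 1 TB, well inside a 24-hour walltime.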