Uploading sequence data to SRA

When submitting a manuscript to a journal, you are typically required to upload your data (usually fastq files) to an online repository, so that other researchers can download your dataset and replicate your analysis.

SRA (Sequence Read Archive) is one of the largest of these repositories.

SRA website

In this guide, we step through the process of uploading data files that are stored on the HPC to SRA.

Getting started

NCBI provides a useful guide, with links, on uploading your data to SRA:

SRA Submission Quick Start

You should have your sequence data (usually fastq files, but it can also be bam or other formats) in one directory on the HPC. If the files are spread across a few different directories, or sit in a directory alongside other data files of the same type (e.g. other fastq files), create a new directory and copy them there.
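Gathering the files into one place can be sketched as a small shell function. This is only an illustration: the function name collect_for_sra and every path in the usage comment are hypothetical, so substitute your own directories and file names.

```shell
# Sketch: copy the files for one submission into a single, dedicated directory.
# collect_for_sra and all paths below are hypothetical names for illustration.
collect_for_sra() {
    local dest="$1"
    shift
    # Create the upload directory if it does not exist yet.
    mkdir -p "$dest"
    # Copy only the files named on the command line, so unrelated
    # fastq files in the same source directories are left behind.
    cp -- "$@" "$dest"/
}

# Example usage (hypothetical paths):
# collect_for_sra ~/sra_upload ~/run1/sampleA.fastq.gz ~/run2/sampleB.fastq.gz
```

Listing the files explicitly, rather than copying whole directories, keeps unrelated data out of the upload directory.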

The files should be gzipped. Use the command line gzip tool to create *.gz files, if they are not already gzipped.
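As a minimal sketch, the function below gzips any uncompressed fastq files in a directory. It assumes the files end in .fastq or .fq; the function name gzip_fastq_in and the example path are hypothetical.

```shell
# Sketch: gzip every uncompressed fastq file in one directory.
# Assumes files end in .fastq or .fq; gzip_fastq_in is a hypothetical name.
gzip_fastq_in() {
    local dir="$1" f
    for f in "$dir"/*.fastq "$dir"/*.fq; do
        # Skip patterns that matched no files.
        [ -e "$f" ] || continue
        # gzip replaces each file with a compressed copy named <file>.gz.
        gzip "$f"
    done
}

# Example usage (hypothetical path):
# gzip_fastq_in ~/sra_upload
```

Files that already end in .gz are not matched by the patterns, so they are left untouched.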

1. Log in to submission portal wizard

The portal wizard will step you through the upload process.

SRA Submission Portal

You need an NCBI account, and you must be logged in with it, to use the wizard. If you don’t have an account, create one now, log in with it, and then click on the above link.

2. Create new submission

Click on the ‘New Submission’ button.

In the ‘Submitter’ tab, enter or update your details (name, organization, etc.).

Click ‘continue’

In the ‘General Information’ tab, select ‘No’ for both the ‘Did you already register a BioProject for this research..?’ and the ‘Did you already register a BioSample for this sample’ questions. You will register your BioProject and BioSample as part of the submission process.

For the ‘When should this submission be released to the public?’ question, you can choose to release the data publicly immediately, or to hold it until a specific date, such as the date the associated paper is published. The choice depends on the sensitivity of your data.

Click ‘continue’

In the ‘Project Info’ tab, give your project a title and a description. Usually the data is attached to a draft manuscript, so you can use the manuscript title and the abstract.

Most of the other sections in this tab are optional, so fill them in as you see fit.

Click ‘continue’

In the ‘Sample Type’ tab you need to select a ‘package’ for the organism your samples are from. Choose one of plant, invertebrate, model organism, etc.

Click ‘continue’

In the ‘BioSample attributes’ tab you can choose to enter your sample information in a built-in table editor, or as a tab-delimited text file.

Some fields are individually required, while others form groups from which you must fill in at least one. For example, you must fill in the ‘Organism’ field, and you must fill in at least one of the ‘age’ or ‘development stage’ fields.

When you submit this table you will often get a ‘Your table upload failed because multiple BioSamples cannot have identical attributes’ error. The best way to fix this is to add a new column at the end of the table (‘Add column’), call it ‘Replicates’, and enter any replicate information you have there.
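If you prepare the sample table as a tab-delimited text file, you can look for this problem before uploading. The sketch below assumes the first column holds the sample name, that the first non-comment line is the header row, and that every remaining column counts as an attribute; the portal's own check may differ. The function name find_duplicate_samples and the file name in the usage comment are hypothetical.

```shell
# Sketch: report samples whose attribute columns are identical to an earlier sample.
# Assumptions: tab-delimited file, column 1 is the sample name, lines starting
# with '#' are comments, and the first remaining line is the header row.
find_duplicate_samples() {
    awk -F'\t' '
        /^#/ { next }                           # skip comment lines
        !seen_header { seen_header = 1; next }  # skip the header row
        {
            # Build a key from every column except the sample name.
            key = ""
            for (i = 2; i <= NF; i++) key = key FS $i
            if (key in first) print $1 " duplicates " first[key]
            else first[key] = $1
        }
    ' "$1"
}

# Example usage (hypothetical file name):
# find_duplicate_samples biosample_attributes.tsv
```

Any sample reported here needs a distinguishing value, for example in the ‘Replicates’ column.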

Click ‘continue’

In the ‘SRA metadata’ tab, you can again submit a tab-delimited text file or fill in a built-in table.

This is where you fill in your sequence data information: for each sample you provided in the previous section, you need to provide the data files (usually fastq.gz files) associated with that sample, and any library prep or sequencing platform information. Your sequencing provider should be able to give you this information.
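To help fill in the file names for each sample, the sketch below prints one tab-delimited line per sample with its read files. It assumes paired-end files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz, which is a common but not universal convention; the function name list_read_pairs is hypothetical, and the output is a helper list rather than the SRA metadata template itself.

```shell
# Sketch: list each sample with its read file names, tab-delimited.
# Assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
list_read_pairs() {
    local dir="$1" r1 r2 sample
    for r1 in "$dir"/*_R1.fastq.gz; do
        # Skip the pattern itself if no files matched.
        [ -e "$r1" ] || continue
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        sample="$(basename "${r1%_R1.fastq.gz}")"
        if [ -e "$r2" ]; then
            # Paired-end: sample, forward file, reverse file.
            printf '%s\t%s\t%s\n' "$sample" "$(basename "$r1")" "$(basename "$r2")"
        else
            # No matching R2 file: print the single file only.
            printf '%s\t%s\n' "$sample" "$(basename "$r1")"
        fi
    done
}

# Example usage (hypothetical path):
# list_read_pairs ~/sra_upload
```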

Click ‘continue’

3. Uploading files using FTP (Lyra compatible)

In the ‘Files’ tab you can choose your method of uploading. In this guide we assume your data is on the QUT HPC, so you’ll be uploading it via FTP from the Linux command line.

For the question ‘How do you want to provide files for this submission?’, choose ‘FTP or Aspera Command Line file preload’.

Click on ‘FTP upload instructions’ for details on setting up your upload directory on the SRA servers.

These instructions provide an FTP server name, user name and access password for your submission.

Usually the FTP server is called 'ftp-private.ncbi.nlm.nih.gov', so on the command line connect to this by typing:

ftp ftp-private.ncbi.nlm.nih.gov
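Once you are connected and logged in, the rest of the session usually consists of a few standard FTP client commands. The sketch below prints a list of commands you could type at the ftp> prompt. It assumes the upload instructions give you an account directory on the server and ask you to create a subfolder for the submission; follow the portal's instructions if they differ. The function name make_ftp_commands and both example names are hypothetical, and it assumes you started ftp from the directory that holds your fastq.gz files.

```shell
# Sketch: print the FTP client commands for a typical upload session.
# Assumptions: the portal's instructions supply an account directory, and the
# files to upload match *.fastq.gz in the local directory where ftp was started.
make_ftp_commands() {
    local account_dir="$1" subfolder="$2"
    cat <<EOF
cd $account_dir
mkdir $subfolder
cd $subfolder
binary
prompt
mput *.fastq.gz
bye
EOF
}

# Example usage (hypothetical directory and folder names):
# make_ftp_commands uploads/your_account_dir my_submission
```

Here binary switches to binary transfer mode, prompt turns off the per-file confirmation so that mput uploads every matching file without asking, and bye closes the session.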

3. Using rVDI to upload data via Aspera (Aqua compatible)

Connect to rVDI.
Connect to the Aqua work folder where the data to submit is located.
Download IBM Aspera Connect from this link: https://www.ibm.com/products/aspera/downloads#cds
If Aspera Connect is already on rVDI, select ‘repair’.
Download the add-on for Edge.
On the SRA submission portal, select ‘Web browser upload via HTTP or Aspera Connect plugin’.
Select the Aqua folder in which the fastq files are located.
Start the upload.