...

Code Block
awk 'NR > 1 {print $1}' metadata.txt > SraAccList.txt

Check the contents of SraAccList.txt (e.g., with cat SraAccList.txt):

Code Block
SRR1039508
SRR1039509
SRR1039512
SRR1039513
SRR1039516
SRR1039517
SRR1039520
SRR1039521
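Before fetching, it can be worth confirming that the list contains only well-formed, unique accession IDs (for example, that the metadata header line was really dropped). The sketch below is one way to do this. The function name check_acc_list is hypothetical, and the pattern assumes run accessions consist of SRR, ERR or DRR followed by digits.

```shell
# Hypothetical helper: sanity-check a list of SRA run accessions.
# Assumption: valid run accessions look like SRR/ERR/DRR followed by digits.
check_acc_list() {
  list="$1"
  # grep -v prints every line that does NOT match and succeeds if any exist
  if grep -vE '^[SED]RR[0-9]+$' "$list"; then
    echo "Malformed accession(s) found in $list" >&2
    return 1
  fi
  # uniq -d prints only lines that occur more than once in sorted input
  dups=$(sort "$list" | uniq -d)
  if [ -n "$dups" ]; then
    echo "Duplicate accession(s): $dups" >&2
    return 1
  fi
  echo "OK: $list"
}
```

Once the function is defined in the shell, running check_acc_list SraAccList.txt should print "OK: SraAccList.txt" for the list shown above, or print the offending lines and return a non-zero status otherwise.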

Once the list of SRA accession IDs is ready, use a PBS Pro submission script to fetch all the sequences. Note that the data will be downloaded to the directory from which the job is submitted. Example script (fetch_SraAccList.pbs):

Code Block
#!/bin/bash -l
#PBS -N sra_fetch
#PBS -l walltime=8:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4
#PBS -m bae
###PBS -M email@host
#PBS -j oe

#Usage: qsub fetch_SraAccList.pbs

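# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Download each accession, then convert it to FASTQ (one file per read mate)
for i in $(cat SraAccList.txt)
do
  echo "$i"
  prefetch "$i"
  fastq-dump --split-files "$i"
done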
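After the job finishes, a quick check can confirm that every accession produced both FASTQ files. The following is a minimal sketch, not part of the original workflow. The function name missing_fastq is hypothetical, and it assumes paired-end data written as <accession>_1.fastq and <accession>_2.fastq, the naming used elsewhere on this page.

```shell
# Hypothetical helper: list paired FASTQ files that are missing or empty.
# Assumption: files are named <accession>_1.fastq and <accession>_2.fastq.
# Usage: missing_fastq <accession list> [directory, default: current]
missing_fastq() {
  list="$1"
  dir="${2:-.}"
  while read -r acc; do
    # Skip blank lines in the accession list
    [ -z "$acc" ] && continue
    for mate in 1 2; do
      f="$dir/${acc}_${mate}.fastq"
      # -s is true only if the file exists and is not empty
      [ -s "$f" ] || echo "$f"
    done
  done < "$list"
}
```

Running missing_fastq SraAccList.txt in the download directory prints nothing when all files are present. Any path it prints points to an accession that may need to be fetched again.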

Pre-processing of public data

Downloaded public data for the airway smooth muscle project show size differences between ‘Read 1’ and ‘Read 2’ FASTQ files. Before running the Nextflow nf-core/rnaseq pipeline, the downloaded raw data are adapter- and quality-trimmed with Trim Galore, using a minimum quality score of 20 and a minimum read length of 50 bases, and the trimmed reads are checked with FastQC:

Code Block
#!/bin/bash -l
#PBS -N QC_P1-6
#PBS -l walltime=2:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4
#PBS -m bae
#PBS -M email@host
#PBS -j oe


#User-defined parameters:
SAMPLEID=SRR1039513
READ1=SRR1039513_1.fastq
READ2=SRR1039513_2.fastq

#Pipeline:

cd "$PBS_O_WORKDIR"

# Make the output folder
mkdir -p trimgalore

# Remove adapters and poor-quality bases/reads using Trim Galore: minimum quality score of 20 (-q 20) and minimum read length of 50 bases (--length 50)
trim_galore --length 50 --cores 4 --paired -q 20 --fastqc -o ./trimgalore "${READ1}" "${READ2}"
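Because the downloaded Read 1 and Read 2 files differ in size, it may help to compare read counts for the two mates, both before and after trimming. The sketch below assumes uncompressed FASTQ files with exactly four lines per record. The function names count_reads and same_read_count are hypothetical.

```shell
# Hypothetical helpers for comparing read counts between two FASTQ files.
# Assumption: uncompressed FASTQ with exactly four lines per record.
count_reads() {
  # Number of reads = number of lines / 4
  lines=$(wc -l < "$1")
  echo $((lines / 4))
}

same_read_count() {
  # Succeeds (exit status 0) only if both files hold the same number of reads
  [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}
```

For example, same_read_count SRR1039513_1.fastq SRR1039513_2.fastq || echo "read counts differ" reports a mismatch between the two mates. The same check can be pointed at the trimmed files written to the trimgalore folder.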