
...

Code Block
cat metadata.txt | awk '{print $1}' | sed 1d > SraAccList.txt

Check the contents of SraAccList.txt (e.g., with cat SraAccList.txt):

Code Block
SRR1039508
SRR1039509
SRR1039512
SRR1039513
SRR1039516
SRR1039517
SRR1039520
SRR1039521
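Before submitting the download job, it can help to confirm that the list contains only run accessions, since a stray header line or Windows line endings would make the fetch fail part-way through. The following is a minimal sketch, not part of the original workflow: the function name check_acc_list is hypothetical, and it assumes that run accessions consist of SRR, ERR or DRR followed by digits.

```shell
#!/bin/bash
# Hypothetical helper: report any line of an accession list that does not
# look like a run accession.
# Assumption: run accessions are SRR/ERR/DRR followed by digits only.
check_acc_list() {
  local list="$1"
  local bad n
  # -v: print non-matching lines, -n: prefix line numbers, -E: extended regex
  bad=$(grep -vnE '^[SED]RR[0-9]+$' "$list")
  if [ -n "$bad" ]; then
    echo "Unexpected entries in ${list}:" >&2
    echo "$bad" >&2
    return 1
  fi
  # Count non-empty lines
  n=$(grep -c . "$list")
  echo "${n} accessions look valid"
}
```

Calling check_acc_list SraAccList.txt prints the number of accessions when every line matches, and otherwise lists the offending lines with their line numbers and returns a non-zero status.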

Once the list of required SRA accession IDs is ready, use a PBS Pro submission script to fetch all the sequences. Note that the data are downloaded to the folder from which the job is submitted. Example script (fetch_SraAccList.pbs):

Code Block

#!/bin/bash -l
#PBS -N sra_fetch
#PBS -l walltime=8:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4
#PBS -m bae
###PBS -M email@host
#PBS -j oe

#Usage: qsub fetch_SraAccList.pbs

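# Run from the folder the job was submitted from
cd "$PBS_O_WORKDIR"

# Download each accession, then convert it to paired FASTQ files
for i in $(cat SraAccList.txt)
do
  echo "$i"
  prefetch "$i"
  fastq-dump --split-files "$i"
done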
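After the job finishes, a quick check that every accession produced both FASTQ files, and how many reads each contains, can catch incomplete downloads early. The sketch below is an illustration only: the function names count_reads and check_pairs are hypothetical, the files are assumed to be named <accession>_1.fastq and <accession>_2.fastq in the current folder, and each FASTQ record is assumed to span exactly four lines.

```shell
#!/bin/bash
# Hypothetical helper: count reads in an uncompressed FASTQ file.
# Assumption: every record spans exactly 4 lines (no wrapped sequences).
count_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Hypothetical helper: for each accession in the list, print
# "<accession> <reads in _1> <reads in _2>" (tab-separated),
# or "<accession> MISSING" if either file is absent or empty.
check_pairs() {
  local list="$1" acc r1 r2
  while IFS= read -r acc; do
    [ -z "$acc" ] && continue
    r1="${acc}_1.fastq"
    r2="${acc}_2.fastq"
    if [ ! -s "$r1" ] || [ ! -s "$r2" ]; then
      printf '%s\tMISSING\n' "$acc"
      continue
    fi
    printf '%s\t%s\t%s\n' "$acc" "$(count_reads "$r1")" "$(count_reads "$r2")"
  done < "$list"
}
```

Running check_pairs SraAccList.txt in the download folder gives one line per accession, so differing read counts between the two files of a pair are easy to spot.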

Pre-processing of public data

The public data downloaded for the airway smooth muscle project show size differences between the ‘Read 1’ and ‘Read 2’ FASTQ files. Before running the Nextflow nf-core/rnaseq pipeline, the downloaded raw data are quality-checked and trimmed using Trim Galore with the options set in the following script:

Code Block

#!/bin/bash -l
#PBS -N QC_P1-6
#PBS -l walltime=2:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4
#PBS -m bae
#PBS -M email@host
#PBS -j oe


#User-defined parameters:
SAMPLEID=SRR1039513
READ1=${SAMPLEID}_1.fastq
READ2=${SAMPLEID}_2.fastq

#Pipeline:

cd "$PBS_O_WORKDIR"

#make output folder
mkdir -p trimgalore

# Remove adaptors and poor-quality bases/reads using Trim Galore. Minimum quality score of 20 (-q 20) and minimum read length of 50 bases (--length 50)
trim_galore --length 50 --cores 4 --paired -q 20 --fastqc -o ./trimgalore "${READ1}" "${READ2}"
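The script above processes a single sample. One possible way to cover every accession in SraAccList.txt within one job is to loop over the list with the same Trim Galore options. This is a sketch under stated assumptions rather than the documented procedure: the function name trim_all is hypothetical, the FASTQ files are assumed to be named <accession>_1.fastq and <accession>_2.fastq in the current folder, and the requested walltime would likely need to be increased for several samples.

```shell
#!/bin/bash
# Hypothetical helper: run Trim Galore on every accession in a list.
# Arguments: <accession list> [output folder, default: trimgalore]
# Assumption: inputs are <accession>_1.fastq and <accession>_2.fastq.
trim_all() {
  local list="$1" outdir="${2:-trimgalore}" acc
  mkdir -p "$outdir"
  while IFS= read -r acc; do
    [ -z "$acc" ] && continue
    echo "Trimming ${acc}"
    # Same options as the single-sample script; stdin is redirected so the
    # command cannot consume the remaining lines of the accession list.
    trim_galore --length 50 --cores 4 --paired -q 20 --fastqc \
      -o "$outdir" "${acc}_1.fastq" "${acc}_2.fastq" < /dev/null
  done < "$list"
}
```

Inside a PBS script, the call would be trim_all SraAccList.txt after changing to the submission folder. Processing samples one after another keeps the resource request unchanged, at the cost of a longer run time.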
