...
Code Block |
---|
awk 'NR > 1 {print $1}' metadata.txt > SraAccList.txt |
Check the contents of SraAccList.txt (e.g., with cat SraAccList.txt):
Code Block |
---|
SRR1039508
SRR1039509
SRR1039512
SRR1039513
SRR1039516
SRR1039517
SRR1039520
SRR1039521 |
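Before submitting the download job, it can be worth checking the list automatically rather than by eye. The following is a minimal sketch, assuming one run accession per line with an SRR, ERR or DRR prefix; `check_acc_list` is a hypothetical helper name and is not part of the SRA Toolkit.

```shell
#!/bin/bash
# Sketch only: check_acc_list is a hypothetical helper, not an SRA Toolkit command.
# Assumes one run accession per line (SRR/ERR/DRR followed by digits).
check_acc_list() {
    local list="$1"
    local total bad dups
    # Total number of lines in the list
    total=$(grep -c '' "$list" || true)
    # Lines that do not look like a run accession (blank lines count as malformed)
    bad=$(grep -Evc '^[SED]RR[0-9]+$' "$list" || true)
    # Accessions that appear more than once
    dups=$(sort "$list" | uniq -d | grep -c '' || true)
    echo "total=${total} malformed=${bad} duplicated=${dups}"
    # Succeed only if the list is clean
    [ "$bad" -eq 0 ] && [ "$dups" -eq 0 ]
}
```

Usage: `check_acc_list SraAccList.txt && qsub fetch_SraAccList.pbs` submits the job only if no malformed or duplicated entries are found.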
Once the list of SRA accession IDs is ready, use a PBS Pro submission script to fetch all the sequences. Note that the data will be downloaded to the folder from which the job is submitted. Example script (fetch_SraAccList.pbs):
Code Block |
---|
#!/bin/bash -l
#PBS -N sra_fetch
#PBS -l walltime=8:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4
#PBS -m bae
###PBS -M email@host
#PBS -j oe
#Usage: qsub fetch_SraAccList.pbs
cd $PBS_O_WORKDIR
for i in $(cat SraAccList.txt); do
    echo "$i"
    prefetch "$i"
    fastq-dump --split-files "$i"
done |
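After the job finishes, a quick check that every accession produced both paired files can catch interrupted downloads. This is a sketch under the assumption that the runs are paired-end and that the output files are named `<accession>_1.fastq` and `<accession>_2.fastq`, as in the trimming script on this page; `missing_fastq` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: missing_fastq is a hypothetical helper.
# Prints every accession in the list that lacks a non-empty _1 or _2 FASTQ file.
missing_fastq() {
    local list="$1" dir="${2:-.}" acc
    # The extra test keeps the last line even if it has no trailing newline
    while read -r acc || [ -n "$acc" ]; do
        [ -n "$acc" ] || continue   # skip blank lines
        # -s is true only if the file exists and is not empty
        if [ ! -s "${dir}/${acc}_1.fastq" ] || [ ! -s "${dir}/${acc}_2.fastq" ]; then
            echo "$acc"
        fi
    done < "$list"
}
```

Usage: `missing_fastq SraAccList.txt .` prints nothing when all expected files are present in the current folder.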
Pre-processing of public data
Downloaded public data for the airway smooth muscle project show size differences between the ‘Read 1’ and ‘Read 2’ FASTQ files. Before running the Nextflow nf-core/rnaseq pipeline, the downloaded raw data are quality-checked and trimmed with Trim Galore, using a minimum quality score of 20 and a minimum read length of 50 bases:
Code Block |
---|
#!/bin/bash -l
#PBS -N QC_P1-6
#PBS -l walltime=2:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4
#PBS -m bae
#PBS -M email@host
#PBS -j oe
#User-defined parameters:
SAMPLEID=SRR1039513
READ1=SRR1039513_1.fastq
READ2=SRR1039513_2.fastq
#Pipeline:
cd $PBS_O_WORKDIR
#make output folder
mkdir -p trimgalore
# Remove adaptors and poor-quality bases/reads using Trim Galore: minimum quality score of 20 (-q 20) and minimum read length of 50 bases (--length 50). FastQC is run on the trimmed reads (--fastqc).
trim_galore --length 50 --cores 4 --paired -q 20 --fastqc -o ./trimgalore ${READ1} ${READ2} |
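The size difference between the ‘Read 1’ and ‘Read 2’ files noted above does not by itself show whether the two files hold the same number of reads. One possible way to check is to compare read counts, sketched below under the assumption that the FASTQ files are uncompressed and use exactly four lines per record; `count_reads` and `compare_pair` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch only: count_reads and compare_pair are hypothetical helpers.
# Assumes uncompressed FASTQ with exactly four lines per read.
count_reads() {
    # Number of reads = number of lines / 4
    echo $(( $(wc -l < "$1") / 4 ))
}

compare_pair() {
    local r1 r2
    r1=$(count_reads "$1")
    r2=$(count_reads "$2")
    echo "R1=${r1} R2=${r2}"
    # Succeed only if both files contain the same number of reads
    [ "$r1" -eq "$r2" ]
}
```

Usage: `compare_pair SRR1039513_1.fastq SRR1039513_2.fastq` prints both counts and returns a non-zero exit status if they differ. Equal counts with different file sizes would point to differences in read length rather than missing reads.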