Aim
Assess sequence polymorphism in five horse genes of interest by comparing amplicon sequencing data from healthy and unhealthy horses.
...
Code Block |
---|
>EquCab3.0_Glucagon_ADIPOQ_201000001|ADIPOQ-201 cds:protein_coding
ATGGGACAATGTGTCTCTGGTTGTCTGACTAGATCAAGGAAAGACTATGTGTGTGTGTGTGCTTGTGCGTACATGTGTGTGCAAGTATGTGTATGTATGTATATGTATGTGTGTGTTTGGGTTGGGTGTGCTGTTTGGGGTCTGCTCTCATGGCTGACAGTGCAGATTTGGATTCCAGGACTCAGGATGCTGTTGCTCCAAGCTGTTCTATTGCTACTAGTCCTGCCGAGTCCGGGTGAGGTTACCACGACTGAAGAGACTCTGCCCAAGGAGGGCTGCGCAGGTTGGATGGCAGGCATCCCAGGGCATCCTGGCCACAATGGGACCCCAGGCCGTGATGGCAGAGATGGCACCCCTGGCGAGAAGGGTGAGAAAGGAGATCCAGGTCTTGTTGGGCCTAAGGGTGATGCTGGTGAAACTGGAGTGCCTGGAGTTGAAGGTCCCAGAGGCTTTCCGGGAATCCCAGGCAGGAAAGGAGAACCTGGAGAAAGTTCCTATGTATACCGCTCAGCATTCAGTGTAGGATTGGAGACCCGAGTCACCGTCCCCAATGTTCCCATTCGTTTTACCAAGATCTTCTACAATCAGCAAAACCACTATGATGGCAGCACGGGCAAATTCCACTGCAACATTCCTGGGCTGTACTACTTCTCCTACCACATCACAGTCTACTTGAAGGATGTGAAGGTCAGCCTCTACAAGAAGGACAAGGCTGTGCTCTTCACCTATGACCAGTACCAGGACAAGAACTTGGACCAGGCCTCAGGCTCTGTTCTCCTCTATCTGGAGAAGGGCGACCAAGTCTGGCTCCAGGTGTATGGGGATGGAGATCATAATGGGCTCTATGCCGATAATGTCAATGACTCCACCTTCACAGGCTTCCTTCTCTACCACGACACCAACTGA
|
ConsGenome pipeline: Creating a conda environment
See this tutorial for additional information: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
The ConsGenome workflow requires the following tools specified in an ‘environment.yml’ file:
Code Block |
---|
name: ConsGenome
channels:
- bioconda
- conda-forge
dependencies:
- python=3.7.8
- bwa=0.7.17
- spades=3.15.2
- samtools=1.7
- bedtools=2.27.1
- bcftools=1.9
- blast=2.5.0
- seqtk=1.3
- trim-galore=0.6.2
|
Creating a conda environment called ‘ConsGenome’
Code Block |
---|
conda env create -f environment.yml |
Activate the environment. This makes the tools listed above available. NOTE: the ConsGenome environment must be activated before running Nextflow.
Code Block |
---|
conda activate ConsGenome |
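After activation, it can be useful to confirm that the tools from environment.yml are actually on the PATH before launching anything. The following is a minimal sketch, not part of the ConsGenome workflow itself: `check_tools` is a hypothetical helper name, and the executable names in the example call (`spades.py`, `blastn`, `trim_galore`) are the commands those conda packages are assumed to install.

```shell
#!/usr/bin/env bash
# Report which of the named executables are missing from PATH.
# Returns 0 if every tool is found, 1 if at least one is missing.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "MISSING: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (executable names assumed from the packages in environment.yml):
# check_tools bwa spades.py samtools bedtools bcftools blastn seqtk trim_galore
```

If any tool is reported as missing, the environment is probably not active or was not created from the environment.yml shown above.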
Deactivate the environment
Code Block |
---|
conda deactivate |
Nextflow - ConsGenome pipeline
...
To run the pipeline, prepare the following:
index.csv - a file listing, for each sample, the sample ID and the paths to read1 and, if applicable, read2
nextflow.config - a file to specify parameter options such as the genome/transcriptome/amplicon reference to use to map reads and predict a consensus sequence
launch.pbs - PBS Pro script to submit the ConsGenome job to the HPC cluster
Example index.csv file:
Code Block |
---|
sampleid,read1,read2
AP01,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP01_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP01_R2.fq.gz
AP02,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP02_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP02_R2.fq.gz
AP03,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP03_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP03_R2.fq.gz
AP04,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP04_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP04_R2.fq.gz
AP06,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP06_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP06_R2.fq.gz
AP07,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP07_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP07_R2.fq.gz
AP08,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP08_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP08_R2.fq.gz
AP10,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP10_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP10_R2.fq.gz
AP11,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP11_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP11_R2.fq.gz
AP12,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP12_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP12_R2.fq.gz
AP13,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP13_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP13_R2.fq.gz
AP14,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP14_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP14_R2.fq.gz
AP15,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP15_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP15_R2.fq.gz
AP17,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP17_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP17_R2.fq.gz
AP18,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP18_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP18_R2.fq.gz
AP19,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP19_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP19_R2.fq.gz
AP20,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP20_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP20_R2.fq.gz
MC01,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC01_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC01_R2.fq.gz
MC02,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC02_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC02_R2.fq.gz
MC03,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC03_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC03_R2.fq.gz
MC04,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC04_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC04_R2.fq.gz
MC05,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC05_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC05_R2.fq.gz
MC06,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC06_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC06_R2.fq.gz
|
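Writing the index by hand is error-prone when there are many samples. As a rough sketch, an index with the same three columns could be generated from a directory of paired reads, assuming the files follow the `<sampleid>_R1.fq.gz` / `<sampleid>_R2.fq.gz` naming seen in the example. `make_index` is a hypothetical helper, not something shipped with ConsGenome, and it only handles the paired-end case.

```shell
#!/usr/bin/env bash
# Print a sampleid,read1,read2 index for every *_R1.fq.gz file in a directory
# that has a matching *_R2.fq.gz mate. Samples without a mate are skipped
# with a warning on stderr.
make_index() {
  local fastq_dir="$1"
  local r1 r2 sample
  echo "sampleid,read1,read2"
  for r1 in "$fastq_dir"/*_R1.fq.gz; do
    # If the glob matched nothing, the literal pattern is returned; skip it.
    [ -e "$r1" ] || continue
    r2="${r1%_R1.fq.gz}_R2.fq.gz"
    sample="$(basename "$r1" _R1.fq.gz)"
    if [ -e "$r2" ]; then
      echo "${sample},${r1},${r2}"
    else
      echo "warning: no R2 file found for ${sample}, skipping" >&2
    fi
  done
}

# Example call (the directory path is taken from the index.csv example):
# make_index /work/APP_dp18app/nextflow/data/QCed_fastq_pairs > index.csv
```

Passing an absolute directory path keeps the paths in the index absolute, as in the example file.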
Example nextflow.config file:
Code Block |
---|
params {
    outdir = "results"
    indexfile = "index.csv"
    genome = "/work/APP_dp18app/nextflow/data/ref/Equus_caballus_EquCan3.0_GCG_MC2R_MC4R_POMC_ADIPOQ_genes.cds.fasta"
    paired = true
}
process {
    withLabel: mapping {
        memory = 32.GB
    }
}
|
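Before submitting a job, it may be worth checking that the reference FASTA named in `genome` is readable and contains the expected number of sequences. The sketch below is an optional sanity check under stated assumptions: `count_fasta_records` and `check_reference` are hypothetical helper names, and the expected count of 5 in the example call is inferred from the five genes named in the aim, not from the pipeline.

```shell
#!/usr/bin/env bash
# Count FASTA records by counting header lines (lines starting with '>').
count_fasta_records() {
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
  grep -c '^>' "$1" || true
}

# Check that a FASTA file is readable and holds the expected number of records.
# Returns 0 on success, 1 on a count mismatch, 2 if the file cannot be read.
check_reference() {
  local fasta="$1"
  local expected="$2"
  local n
  if [ ! -r "$fasta" ]; then
    echo "error: cannot read $fasta" >&2
    return 2
  fi
  n="$(count_fasta_records "$fasta")"
  if [ "$n" -ne "$expected" ]; then
    echo "error: $fasta has $n records, expected $expected" >&2
    return 1
  fi
  echo "ok: $fasta has $n records"
}

# Example call (expected count is an assumption based on the five genes of interest):
# check_reference /work/APP_dp18app/nextflow/data/ref/Equus_caballus_EquCan3.0_GCG_MC2R_MC4R_POMC_ADIPOQ_genes.cds.fasta 5
```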
...