Aim
Assess sequence polymorphism in five horse genes of interest by comparing amplicon sequencing data from healthy and unhealthy horses.
Genes of interest:
Gene Name | Ensembl | NCBI Gene ID |
---|---|---|
Proglucagon (GCG) | ENSECAG00000005660 | 100051551 |
Melanocortin 2 receptor (MC2R) | ENSECAG00000003841 | 100057018 |
Melanocortin 4 receptor (MC4R) | ENSECAG00000001712 | 100050469 |
Proopiomelanocortin (POMC) | ENSECAG00000016388 | 100071524 |
Adiponectin (ADIPOQ) | ENSECAG00000002962 | 100059500 |
Proglucagon (GCG) gene - ENSECAG00000005660 has two splice variants:
Name | Transcript ID | bp | Protein | Biotype | UniProt Match | Flags |
---|---|---|---|---|---|---|
GCG-201 | | 708 | | Protein coding | - | |
GCG-202 | | 705 | | Protein coding | - | |
>EquCab3.0_Glucagon_GCG000001|GCG-201 cds:protein_coding ATGAAAAGCATCTACTTTGTGGCCGGACTGTTTGTAATGCTGGTACAAAGCAGCTGGCAACGTTCGCTGCAGGACACAGAGGAGACATCCAGATCGTTCCCAGCTCCCCAGACAGACCCACTCAGTGATCCGGATCAGATGAATGAAGATAAGCGCCATTCTCAGGGCACGTTCACCAGTGACTACAGCAAGTATCTGGACTCCAGGCGTGCCCAGGATTTCGTGCAGTGGTTGATGAACACCAAGAGGAACAAGAATAACATTGCCAAACGTCATGATGAATTTGAGAGACATGCTGAAGGGACCTTTACCAGTGATGTAAGTTCTTATTTGGAAGGCCAAGCTGCCAAGGAATTCATTGCTTGGCTGGTGAAAGGCCGAGGAAGGCGAGATTTCCCAGAAGAAGTCACCATCGTTGAAGAACTCCGCCGCAGACATGCCGATGGTTCCTTCTCTGATGAGATGAACACAGTTCTTGATAATCTTGCCACCCGGGACTTTATAAACTGGTTGCTTCAGACCAAAATTACTGACAGAGTCAATAGAATACCTGAAAGAGACTGCAACAGAGTAGACTCCCATAATGAAGAGGGCATCTTCAGAGGTGAAGGGGAGCCCAAGTGTAACAGCTTTTCAAGTTCCCTCTCTTCAGTGAGGATCATAAGAGGCACTCCATTCAAGGGGAAGTGTGCAATCTGA >EquCab3.0_Glucagon_GCG_202000001|GCG-202 cds:protein_coding ATGAAAAGCATCTACTTTGTGGCCGGACTGTTTGTAATGCTGGTACAAAGCAGCTGGCAACGTTCGCTGCAGGACACAGAGGAGACATCCAGATCGTTCCCAGCTCCCCAGACAGACCCACTCAGTGATCCGGATCAGATGAATGAAGATAAGCGCCATTCTCAGGGCACGTTCACCAGTGACTACAGCAAGTATCTGGACTCCAGGCGTGCCCAGGATTTCGTGCAGTGGTTGATGAACACCAAGAGGAACAAGAATAACATTGCCAAACGTCATGATGAATTTGAGAGACATGCTGAAGGGACCTTTACCAGTGATGTAAGTTCTTATTTGGAAGGCCAAGCTGCCAAGGAATTCATTGCTTGGCTGGTGAAAGGCCGAGGAAGGCGAGATTTCCCAGAAGAAGTCACCATCGTTGAAGAACTCCGCCGCAGACATGCCGATGGTTCCTTCTCTGATGAGATGAACACAGTTCTTGATAATCTTGCCACCCGGGACTTTATAAACTGGTTGCTTCAGACCAAAATTACTGACAGGAAGTAA
Melanocortin 2 receptor (MC2R)
- ENSECAG00000003841 has a single splice variant:
Name | Transcript ID | bp | Protein | Biotype | UniProt Match | Flags |
---|---|---|---|---|---|---|
MC2R-201 | | 891 | | Protein coding | - | |
>EquCab3.0_Glucagon_MC2R_201000001|MC2R-201 cds:protein_coding ATGAAGCACATTGTCAATCTATATGAAAACATCAATGATACAGCAAGAAATAATTCAGACTGTCCTCTTGTGGTTTTGCCAGAAGAGATATTCTTCACAATATCCATCATTGGGGTTTTGGAGAATCTGATGATCCTTCTGGCTGTGATCAAGAATAAGAATCTCCAGTCACCAATGTACTTTTTCATTTGCAGCTTGGCCATTTCTGATATGTTGGGCAGCCTATATAAGATCCTGGAAAATATCCTGATCATGTTCAGAAACACAGGTTATCTCAAGCCTCGTAGCAATTTTGAAACCACAGCCGATGACATCATTGACTCTCTGTTCATCCTCTCCCTACTTGGGTCCATTTTCAGCCTGTCTGTGATCGCCGTTGACCGCTACATCACAATCTTCCATGCTCTGCAGTACCACAGCATTGTGACCATGCACCGTGCCATTGTTGTCCTGATAGTCATCTGGACGTGCTGCCTGGGCAGCGGCATCGCCATGGTGATCTTCTCCCATCACATCCCCACAGTGATCACCTTCACCTCGCTGTTCCCTCTCATGTTGGTCTTTATCCTATGCCTCTATGTGCACATGTTCTTGCTGGCCCGTTCCCATGCCAGGAAGATCTCAACCCTCTCTAGAGGCAACATGAAAGGGGCCATCACGCTGACCATCCTGCTCGGGGTGTTCATCTTCTGCTGGGCCCCTTTTGTCCTTCATGTCCTCTTAATGACATTCTGCCCAAATAACCCTTACTGTGTCTGCTACATGTCCCTCTTCCAGGTGAATGGCATGTTGATCATGTGCAATGCAGTCATCGACCCTTTTATATATGCCTTCCGGAACCCAGAGCTCAGGGAGGCATTCAAAAAGATGATCTTCTGCAACAGTTACCAG
Melanocortin 4 receptor (MC4R)
- ENSECAG00000001712 has a single splice variant:
Name | Transcript ID | bp | Protein | Biotype | UniProt Match | Flags |
---|---|---|---|---|---|---|
MC4R-201 | | 1002 | | Protein coding | - | |
>EquCab3.0_Glucagon_MC4R_201000001|MC4R-201 cds:protein_coding AGGATGGACTCTACTCACCGCCATGGAATGCACACTTCTCTCCACTTCTGGAACCGCAGCACCTACGGACTGCACAGCAATGCCAGTGAGTCCCTTGGAAAAGGCTACTCTGATGGAGGGTGCTACGAGCAACTTTTTGTCTCTCCTGAGGTGTTTGTGACTCTGGGTGTCATCAGCTTGTTGGAGAATATTCTGGTGATTGTGGCAATAGCCAAGAACAAGAATCTGCATTCACCCATGTACTTTTTCATCTGTAGCCTGGCTGTGGCTGACATGTTGGTGAGCGTTTCAAATGGATCAGAAACCATTGTCATCACCCTGTTAAACAGTACAGATACGGACGCACAGAGTTTCACGGTGAATATTGATAATGTCATTGACTCAGTGATCTGTAGCTCCTTACTTGCATCAATTTGCAGCCTGCTTTCAATTGCAGTGGACAGGTATTTTACTATCTTTTATGCTCTCCAGTATCATAACATTATGACGGTTAAGCGGGTCGGGATCATCATAAGTTGCATCTGGGCAGCTTGCACGGTTTCGGGCATTTTGTTCATCATTTACTCAGATAGTAGTGCTGTCATCATCTGCCTCATCACCATGTTCTTCACCATGCTGGCTCTCATGGCTTCTCTCTATGTCCACATGTTCCTCATGGCCAGACTTCACATTAAGAGGATCGCTGTCCTCCCAGGCACTGGCACCATCCGCCAAGGTGCCAACATGAAGGGGGCGATCACCTTGACCATATTGATTGGAGTCTTTGTGGTCTGCTGGGCCCCATTCTTCCTCCACTTAATATTCTACATCTCTTGTCCCCAGAATCCATACTGTGTGTGCTTCATGTCTCACTTTAACTTGTATCTCATACTGATCATGTGTAATTCCATCATCGATCCTCTGATCTATGCACTCCGGAGCCAAGAACTGAGGAAAACCTTCAAAGAGATCATCTGTTGCTACCCTCTGGGAGGCCTTTGTGATTTGTCTAGCAGATACTAA
Proopiomelanocortin (POMC)
- ENSECAG00000016388 has three splice variants:
Name | Transcript ID | bp | Protein | Biotype | UniProt Match | Flags |
---|---|---|---|---|---|---|
POMC-201 | | 1323 | | Protein coding | - | |
POMC-202 | | 1320 | | Protein coding | - | |
POMC-203 | | 1131 | | Protein coding | - | |
>EquCab3.0_Glucagon_POMC_201000001|POMC-201 cds:protein_coding ATGGCCAGAAGGAGCCAAGAGCCTCAGCCTGCCTGCAAGATGCCGAGATCGTGCGGCAGCCGCTCGGGGGCCCTGCTGCTGGCCCTGCTGCTTCAGGCTTCTGTGGAAGTGCGTGGCTGGTGCCTGGAGAGCAGCCAGTGTCAGGACCTCACCACGGAAAGTAACCTGCTGGCCTGCATCCGGGCCTGCAAGATCGACCTCTCCGCTGAGACGCCCGTGTTCCCCGGCAACGGCGAGGAGCAGCCGCTGACCGAGAACCCCCGGAAGTACGTCATGGGCCACTTCCGCTGGGACCGCTTCGGCCGCCGGAACAGCAGCAGCGGCGGCGGCGCGAGCCAGAAGCGCGAGGAGGAGGAGGTGGTGGTGCTGGGCGGCCCCGGGCCCCGCGGCGACGGCGGCGATGGCGGCGAGGCGGGCCCGCGCGAGGGCAAGCGCTCCTACTCCATGGAGCACTTCCGCTGGGGCAAGCCGGTGGGCAAGAAGCGGCGCCCGGTGAAGGTGTACCCCAACGGCGCCGAGGACGAGTCGGCCGAGGCCTTCCCCCTGGAGTTCAAGAGGGAGCTGGCCGGGGAGCGGCCCGAGGGCGCGGCGGCCCGCGCCGAGCTGGGGTACAGCCTGGTGGCGGAGGCCGAGGCGGCGGAGAAGAAGGACGAGGGGCCCTATAAAATGGAGCACTTCCGCTGGGGCAGCCCGCGCAAGGACAAGCGCTACGGCGGCTTCATGAGCTCCGAGAAGAGCCAGACGCCCCTGGTGACGCTGTTCAAAAACGCCATCATCAAGAACGCCCACAAGAAGGGCCAGTGA >EquCab3.0_Glucagon_POMC_202000001|POMC-202 cds:protein_coding ATGCCGAGATCGTGCGGCAGCCGCTCGGGGGCCCTGCTGCTGGCCCTGCTGCTTCAGGCTTCTGTGGAAGTGCGTGGCTGGTGCCTGGAGAGCAGCCAGTGTCAGGACCTCACCACGGAAAGTAACCTGCTGGCCTGCATCCGGGCCTGCAAGATCGACCTCTCCGCTGAGACGCCCGTGTTCCCCGGCAACGGCGAGGAGCAGCCGCTGACCGAGAACCCCCGGAAGTACGTCATGGGCCACTTCCGCTGGGACCGCTTCGGCCGCCGGAACAGCAGCAGCGGCGGCGGCGCGAGCCAGAAGCGCGAGGGAGGAGGTGGTGGTCTGGGCGGCCCCGGGCCCCGCGGCGACGGCGGCGATGGCGGCGAGGCGGGCCCGCGCGAGGGCAAGCGCTCCTACTCCATGGAGCACTTCCGCTGGGGCAAGCCGGTGGGCAAGAAGCGGCGCCCGGTGAAGGTGTACCCCAACGGCGCCGAGGACGAGTCGGCCGAGGCCTTCCCCCTGGAGTTCAAGAGGGAGCTGGCCGGGGAGCGGCCCGAGGGCGCGGCGGCCCGCGCCGAGCTGGGGTACAGCCTGGTGGCGGAGGCCGAGGCGGCGGAGAAGAAGGACGAGGGGCCCTATAAAATGGAGCACTTCCGCTGGGGCAGCCCGCGCAAGGACAAGCGCTACGGCGGCTTCATGAGCTCCGAGAAGAGCCAGACGCCCCTGGTGACGCTGTTCAAAAACGCCATCATCAAGAACGCCCACAAGAAGGGCCAGTGA >EquCab3.0_Glucagon_POMC_203000001|POMC-203 cds:protein_coding 
ATGGGGGAGTTGCTTGTTACGTTGCCCTGCCGGGAAGTTCTCCTCCAGAGCCCACACGGGCGAGCCCGGCAGATATATAAGGACCGGGAGAGCGACCAGGACCAGGCGGCGGCGAAGGAGGAGAAAGAGAGGAAGAAAAGTGACCAAGAGAGGCCCCCAGCATCCTCGCCCCGGCGCAGCGGGAGTCTCCCAGAGAGCAGCATCCCCGCGGCAGAGCCTCAGCCTGCCTGCAAGATGCCGAGATCGTGCGGCAGCCGCTCGGGGGCCCTGCTGCTGGCCCTGCTGCTTCAGGCTTCTGTGGAAGTGCGTGGCTGGTGCCTGGAGAGCAGCCAGTGTCAGGACCTCACCACGGAAAGTAACCTGCTGGCCTGCATCCGGGCCTGCAAGATCGACCTCTCCGCTGAGACGCCCGTGTTCCCCGGCAACGGCGAGGAGCAGCCGCTGACCGAGAACCCCCGGAAGTACGTCATGGGCCACTTCCGCTGGGACCGCTTCGGCCGCCGGAACAGCAGCAGCGGCGGCGGCGCGAGCCAGAAGCGCGAGGAGGAGGAGGTGGTGGTGCTGGGCGGCCCCGGGCCCCGCGGCGACGGCGGCGATGGCGGCGAGGCGGGCCCGCGCGAGGGCAAGCGCTCCTACTCCATGGAGCACTTCCGCTGGGGCAAGCCGGTGGGCAAGAAGCGGCGCCCGGTGAAGGTGTACCCCAACGGCGCCGAGGACGAGTCGGCCGAGGCCTTCCCCCTGGAGTTCAAGAGGGAGCTGGCCGGGGAGCGGCCCGAGGGCGCGGCGGCCCGCGCCGAGCTGGGGTACAGCCTGGTGGCGGAGGCCGAGGCGGCGGAGAAGAAGGACGAGGGGCCCTATAAAATGGAGCACTTCCGCTGGGGCAGCCCGCGCAAGGACAAGCGCTACGGCGGCTTCATGAGCTCCGAGAAGAGCCAGACGCCCCTGGTGACGCTGTTCAAAAACGCCATCATCAAGAACGCCCACAAGAAGGGCCAGTGA
Adiponectin (ADIPOQ)
- ENSECAG00000002962 has a single splice variant:
Name | Transcript ID | bp | Protein | Biotype | UniProt Match | Flags |
---|---|---|---|---|---|---|
ADIPOQ-201 | | 1264 | | Protein coding | - | |
>EquCab3.0_Glucagon_ADIPOQ_201000001|ADIPOQ-201 cds:protein_coding ATGGGACAATGTGTCTCTGGTTGTCTGACTAGATCAAGGAAAGACTATGTGTGTGTGTGTGCTTGTGCGTACATGTGTGTGCAAGTATGTGTATGTATGTATATGTATGTGTGTGTTTGGGTTGGGTGTGCTGTTTGGGGTCTGCTCTCATGGCTGACAGTGCAGATTTGGATTCCAGGACTCAGGATGCTGTTGCTCCAAGCTGTTCTATTGCTACTAGTCCTGCCGAGTCCGGGTGAGGTTACCACGACTGAAGAGACTCTGCCCAAGGAGGGCTGCGCAGGTTGGATGGCAGGCATCCCAGGGCATCCTGGCCACAATGGGACCCCAGGCCGTGATGGCAGAGATGGCACCCCTGGCGAGAAGGGTGAGAAAGGAGATCCAGGTCTTGTTGGGCCTAAGGGTGATGCTGGTGAAACTGGAGTGCCTGGAGTTGAAGGTCCCAGAGGCTTTCCGGGAATCCCAGGCAGGAAAGGAGAACCTGGAGAAAGTTCCTATGTATACCGCTCAGCATTCAGTGTAGGATTGGAGACCCGAGTCACCGTCCCCAATGTTCCCATTCGTTTTACCAAGATCTTCTACAATCAGCAAAACCACTATGATGGCAGCACGGGCAAATTCCACTGCAACATTCCTGGGCTGTACTACTTCTCCTACCACATCACAGTCTACTTGAAGGATGTGAAGGTCAGCCTCTACAAGAAGGACAAGGCTGTGCTCTTCACCTATGACCAGTACCAGGACAAGAACTTGGACCAGGCCTCAGGCTCTGTTCTCCTCTATCTGGAGAAGGGCGACCAAGTCTGGCTCCAGGTGTATGGGGATGGAGATCATAATGGGCTCTATGCCGATAATGTCAATGACTCCACCTTCACAGGCTTCCTTCTCTACCACGACACCAACTGA
Nextflow - ConsGenome pipeline
ConsGenome is a pipeline developed by QUT eResearch to 1) assemble reference sequences de novo (e.g., amplicon sequencing regions or metagenomic sequences); 2) map reads and derive a consensus genomic or transcript sequence from the HTS data collected for a given sample; and 3) identify nucleotide sequence polymorphisms.
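As a rough illustration of what the polymorphism step reports, the sketch below compares a reference with a consensus sequence and lists the substitutions. It is not the ConsGenome implementation: the function name `find_substitutions` is hypothetical, and it assumes the two sequences are already aligned to the same length, with `N` marking positions without coverage.

```python
from typing import List, Tuple


def find_substitutions(reference: str, consensus: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, consensus base) per mismatch.

    Assumption: both sequences are already aligned and of equal length.
    Positions where the consensus is 'N' (no coverage) are skipped.
    """
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for position, (ref_base, cons_base) in enumerate(
        zip(reference.upper(), consensus.upper()), start=1
    ):
        if cons_base != "N" and ref_base != cons_base:
            differences.append((position, ref_base, cons_base))
    return differences


# Toy example with made-up sequences (not taken from the horse data).
print(find_substitutions("ATGAAAAGC", "ATGAGAANC"))  # [(5, 'A', 'G')]
```

Insertions and deletions are deliberately ignored here; handling them requires a proper alignment or variant-calling step.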
To run the pipeline, prepare the following files:
index.csv - a file listing each sample ID and the path to read1 and, if applicable, read2
nextflow.config - a file specifying pipeline parameters, such as the genome, transcriptome, or amplicon reference used to map reads and predict a consensus sequence
launch.pbs - a PBS Pro script that submits the ConsGenome job to the HPC cluster
Example index.csv file:
sampleid,read1,read2
AP01,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP01_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP01_R2.fq.gz
AP02,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP02_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP02_R2.fq.gz
AP03,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP03_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP03_R2.fq.gz
AP04,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP04_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP04_R2.fq.gz
AP06,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP06_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP06_R2.fq.gz
AP07,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP07_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP07_R2.fq.gz
AP08,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP08_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP08_R2.fq.gz
AP10,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP10_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP10_R2.fq.gz
AP11,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP11_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP11_R2.fq.gz
AP12,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP12_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP12_R2.fq.gz
AP13,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP13_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP13_R2.fq.gz
AP14,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP14_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP14_R2.fq.gz
AP15,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP15_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP15_R2.fq.gz
AP17,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP17_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP17_R2.fq.gz
AP18,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP18_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP18_R2.fq.gz
AP19,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP19_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP19_R2.fq.gz
AP20,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP20_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP20_R2.fq.gz
MC01,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC01_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC01_R2.fq.gz
MC02,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC02_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC02_R2.fq.gz
MC03,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC03_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC03_R2.fq.gz
MC04,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC04_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC04_R2.fq.gz
MC05,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC05_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC05_R2.fq.gz
MC06,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC06_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC06_R2.fq.gz
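Writing this file by hand is error-prone once there are more than a few samples. The sketch below shows one way it could be generated, assuming every sample has a pair of files named `<sampleid>_R1.fq.gz` and `<sampleid>_R2.fq.gz` in a single directory, as in the example. The helper names `build_index_rows` and `write_index` are hypothetical and not part of the pipeline.

```python
import csv
from pathlib import Path

R1_SUFFIX = "_R1.fq.gz"  # naming convention assumed from the example index.csv


def build_index_rows(fastq_dir):
    """Pair each *_R1.fq.gz file with its *_R2.fq.gz mate, sorted by file name."""
    rows = []
    for read1 in sorted(Path(fastq_dir).glob("*" + R1_SUFFIX)):
        sample_id = read1.name[: -len(R1_SUFFIX)]
        read2 = read1.with_name(sample_id + "_R2.fq.gz")
        if not read2.exists():
            raise FileNotFoundError(f"missing read2 file for sample {sample_id}")
        rows.append((sample_id, str(read1), str(read2)))
    return rows


def write_index(rows, index_path="index.csv"):
    """Write the sampleid,read1,read2 header followed by one row per sample."""
    with open(index_path, "w", newline="") as handle:
        # Plain "\n" line endings avoid a stray carriage return in the read2 path.
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sampleid", "read1", "read2"])
        writer.writerows(rows)
```

For the layout above, `write_index(build_index_rows("/work/APP_dp18app/nextflow/data/QCed_fastq_pairs"))` would write one row per sample that has both read files, and raise an error for any sample missing its read2 file.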
Example nextflow.config file:
params {
    outdir = "results"
    indexfile = "index.csv"
    genome = "/work/APP_dp18app/nextflow/data/ref/Equus_caballus_EquCan3.0_GCG_MC2R_MC4R_POMC_ADIPOQ_genes.cds.fasta"
    paired = true
}

process {
    withLabel: mapping {
        memory = 32.GB
    }
}
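Before submitting a job, it can be worth checking that the reference FASTA given in `genome` contains the expected records. The sketch below is an optional sanity check and not part of ConsGenome: the helpers `read_fasta` and `summarise_cds` are hypothetical, and the toy record is made up. It reports the length of each record and whether it looks like a complete CDS (starts with ATG, ends with a stop codon, length divisible by three).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def summarise_cds(header, sequence):
    """Report the length and simple CDS sanity flags for one record."""
    return {
        "id": header.split()[0],
        "length": len(sequence),
        "starts_with_atg": sequence.startswith("ATG"),
        "ends_with_stop": sequence[-3:] in STOP_CODONS,
        "multiple_of_three": len(sequence) % 3 == 0,
    }


# Toy record (made up); for real use, pass an open file handle instead of a list.
example = [">toy|toy-201 cds:protein_coding", "ATGAAA", "TGA"]
for header, sequence in read_fasta(example):
    print(summarise_cds(header, sequence))
```

A record that fails one of these checks is not necessarily wrong (an annotated CDS can be incomplete), but it is worth a second look before it is used as a mapping reference.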
Example launch.pbs file:
#!/bin/bash -l
#PBS -N nfconsgenome
#PBS -l walltime=24:00:00
#PBS -l select=1:ncpus=1:mem=5gb

cd "$PBS_O_WORKDIR"
# export so the JVM memory options are visible to the nextflow process
export NXF_OPTS='-Xms1g -Xmx4g'
module load java

# run the Nextflow pipeline
nextflow run /work/eresearch_bio/nextflow/nf-eresearch/nfconsgenome/main.nf