Equus caballus Project

Aim

Assess sequence polymorphism in five horse genes of interest by comparing amplicon sequencing data from healthy and unhealthy horses.

Genes of interest:

Gene Name                         Ensembl              NCBI Gene ID
Proglucagon (GCG)                 ENSECAG00000005660   100051551
Melanocortin 2 receptor (MC2R)    ENSECAG00000003841   100057018
Melanocortin 4 receptor (MC4R)    ENSECAG00000001712   100050469
Proopiomelanocortin (POMC)        ENSECAG00000016388   100071524
Adiponectin (ADIPOQ)              ENSECAG00000002962   100059500

 

  1. Proglucagon (GCG) gene - ENSECAG00000005660 has two splice variants:

Name      Transcript ID          bp    Protein   Biotype          UniProt Match   Flags
GCG-201   ENSECAT00000066569.1   708   232aa     Protein coding   A0A5F5PSR3      -
GCG-202   ENSECAT00000005849.3   705   180aa     Protein coding   F7ABP1          -

>EquCab3.0_Glucagon_GCG000001|GCG-201 cds:protein_coding
ATGAAAAGCATCTACTTTGTGGCCGGACTGTTTGTAATGCTGGTACAAAGCAGCTGGCAACGTTCGCTGCAGGACACAGAGGAGACATCCAGATCGTTCCCAGCTCCCCAGACAGACCCACTCAGTGATCCGGATCAGATGAATGAAGATAAGCGCCATTCTCAGGGCACGTTCACCAGTGACTACAGCAAGTATCTGGACTCCAGGCGTGCCCAGGATTTCGTGCAGTGGTTGATGAACACCAAGAGGAACAAGAATAACATTGCCAAACGTCATGATGAATTTGAGAGACATGCTGAAGGGACCTTTACCAGTGATGTAAGTTCTTATTTGGAAGGCCAAGCTGCCAAGGAATTCATTGCTTGGCTGGTGAAAGGCCGAGGAAGGCGAGATTTCCCAGAAGAAGTCACCATCGTTGAAGAACTCCGCCGCAGACATGCCGATGGTTCCTTCTCTGATGAGATGAACACAGTTCTTGATAATCTTGCCACCCGGGACTTTATAAACTGGTTGCTTCAGACCAAAATTACTGACAGAGTCAATAGAATACCTGAAAGAGACTGCAACAGAGTAGACTCCCATAATGAAGAGGGCATCTTCAGAGGTGAAGGGGAGCCCAAGTGTAACAGCTTTTCAAGTTCCCTCTCTTCAGTGAGGATCATAAGAGGCACTCCATTCAAGGGGAAGTGTGCAATCTGA
>EquCab3.0_Glucagon_GCG_202000001|GCG-202 cds:protein_coding
ATGAAAAGCATCTACTTTGTGGCCGGACTGTTTGTAATGCTGGTACAAAGCAGCTGGCAACGTTCGCTGCAGGACACAGAGGAGACATCCAGATCGTTCCCAGCTCCCCAGACAGACCCACTCAGTGATCCGGATCAGATGAATGAAGATAAGCGCCATTCTCAGGGCACGTTCACCAGTGACTACAGCAAGTATCTGGACTCCAGGCGTGCCCAGGATTTCGTGCAGTGGTTGATGAACACCAAGAGGAACAAGAATAACATTGCCAAACGTCATGATGAATTTGAGAGACATGCTGAAGGGACCTTTACCAGTGATGTAAGTTCTTATTTGGAAGGCCAAGCTGCCAAGGAATTCATTGCTTGGCTGGTGAAAGGCCGAGGAAGGCGAGATTTCCCAGAAGAAGTCACCATCGTTGAAGAACTCCGCCGCAGACATGCCGATGGTTCCTTCTCTGATGAGATGAACACAGTTCTTGATAATCTTGCCACCCGGGACTTTATAAACTGGTTGCTTCAGACCAAAATTACTGACAGGAAGTAA
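Before using coding sequences like these as amplicon references, it may help to sanity-check them. The following is a minimal Python sketch, not part of the ConsGenome pipeline. It assumes standard FASTA layout (a header line starting with ">" followed by one or more sequence lines), and the function names read_fasta, cds_summary and first_divergence are hypothetical helpers introduced here for illustration only.

```python
from typing import Dict, Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def cds_summary(seq: str) -> Dict[str, object]:
    """Report simple properties expected of a complete coding sequence."""
    return {
        "length": len(seq),
        "multiple_of_3": len(seq) % 3 == 0,
        "starts_with_ATG": seq.startswith("ATG"),
        "ends_with_stop": seq[-3:] in STOP_CODONS,
    }


def first_divergence(a: str, b: str) -> int:
    """Return the 0-based index of the first differing base.

    If one sequence is a prefix of the other, the length of the
    shorter sequence is returned.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))
```

For example, passing the two GCG records to first_divergence would locate the position at which the two splice variants stop matching, and cds_summary would flag a record that lacks a start or stop codon.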

 

  2. Melanocortin 2 receptor (MC2R) gene - ENSECAG00000003841 has a single splice variant:

Name       Transcript ID          bp    Protein   Biotype          UniProt Match   Flags
MC2R-201   ENSECAT00000003809.3   891   297aa     Protein coding   F6Y3T7          -

>EquCab3.0_Glucagon_MC2R_201000001|MC2R-201 cds:protein_coding
ATGAAGCACATTGTCAATCTATATGAAAACATCAATGATACAGCAAGAAATAATTCAGACTGTCCTCTTGTGGTTTTGCCAGAAGAGATATTCTTCACAATATCCATCATTGGGGTTTTGGAGAATCTGATGATCCTTCTGGCTGTGATCAAGAATAAGAATCTCCAGTCACCAATGTACTTTTTCATTTGCAGCTTGGCCATTTCTGATATGTTGGGCAGCCTATATAAGATCCTGGAAAATATCCTGATCATGTTCAGAAACACAGGTTATCTCAAGCCTCGTAGCAATTTTGAAACCACAGCCGATGACATCATTGACTCTCTGTTCATCCTCTCCCTACTTGGGTCCATTTTCAGCCTGTCTGTGATCGCCGTTGACCGCTACATCACAATCTTCCATGCTCTGCAGTACCACAGCATTGTGACCATGCACCGTGCCATTGTTGTCCTGATAGTCATCTGGACGTGCTGCCTGGGCAGCGGCATCGCCATGGTGATCTTCTCCCATCACATCCCCACAGTGATCACCTTCACCTCGCTGTTCCCTCTCATGTTGGTCTTTATCCTATGCCTCTATGTGCACATGTTCTTGCTGGCCCGTTCCCATGCCAGGAAGATCTCAACCCTCTCTAGAGGCAACATGAAAGGGGCCATCACGCTGACCATCCTGCTCGGGGTGTTCATCTTCTGCTGGGCCCCTTTTGTCCTTCATGTCCTCTTAATGACATTCTGCCCAAATAACCCTTACTGTGTCTGCTACATGTCCCTCTTCCAGGTGAATGGCATGTTGATCATGTGCAATGCAGTCATCGACCCTTTTATATATGCCTTCCGGAACCCAGAGCTCAGGGAGGCATTCAAAAAGATGATCTTCTGCAACAGTTACCAG

 

  3. Melanocortin 4 receptor (MC4R) gene - ENSECAG00000001712 has a single splice variant:

Name       Transcript ID          bp     Protein   Biotype          UniProt Match   Flags
MC4R-201   ENSECAT00000001527.3   1002   333aa     Protein coding   F6TMM6          -

>EquCab3.0_Glucagon_MC4R_201000001|MC4R-201 cds:protein_coding
AGGATGGACTCTACTCACCGCCATGGAATGCACACTTCTCTCCACTTCTGGAACCGCAGCACCTACGGACTGCACAGCAATGCCAGTGAGTCCCTTGGAAAAGGCTACTCTGATGGAGGGTGCTACGAGCAACTTTTTGTCTCTCCTGAGGTGTTTGTGACTCTGGGTGTCATCAGCTTGTTGGAGAATATTCTGGTGATTGTGGCAATAGCCAAGAACAAGAATCTGCATTCACCCATGTACTTTTTCATCTGTAGCCTGGCTGTGGCTGACATGTTGGTGAGCGTTTCAAATGGATCAGAAACCATTGTCATCACCCTGTTAAACAGTACAGATACGGACGCACAGAGTTTCACGGTGAATATTGATAATGTCATTGACTCAGTGATCTGTAGCTCCTTACTTGCATCAATTTGCAGCCTGCTTTCAATTGCAGTGGACAGGTATTTTACTATCTTTTATGCTCTCCAGTATCATAACATTATGACGGTTAAGCGGGTCGGGATCATCATAAGTTGCATCTGGGCAGCTTGCACGGTTTCGGGCATTTTGTTCATCATTTACTCAGATAGTAGTGCTGTCATCATCTGCCTCATCACCATGTTCTTCACCATGCTGGCTCTCATGGCTTCTCTCTATGTCCACATGTTCCTCATGGCCAGACTTCACATTAAGAGGATCGCTGTCCTCCCAGGCACTGGCACCATCCGCCAAGGTGCCAACATGAAGGGGGCGATCACCTTGACCATATTGATTGGAGTCTTTGTGGTCTGCTGGGCCCCATTCTTCCTCCACTTAATATTCTACATCTCTTGTCCCCAGAATCCATACTGTGTGTGCTTCATGTCTCACTTTAACTTGTATCTCATACTGATCATGTGTAATTCCATCATCGATCCTCTGATCTATGCACTCCGGAGCCAAGAACTGAGGAAAACCTTCAAAGAGATCATCTGTTGCTACCCTCTGGGAGGCCTTTGTGATTTGTCTAGCAGATACTAA
  4. Proopiomelanocortin (POMC) gene - ENSECAG00000016388 has three splice variants:

Name       Transcript ID          bp     Protein   Biotype          UniProt Match   Flags
POMC-201   ENSECAT00000064027.2   1323   267aa     Protein coding   A0A3Q2IBW0      -
POMC-202   ENSECAT00000069674.1   1320   253aa     Protein coding   A0A5F5PNG1      -
POMC-203   ENSECAT00000017172.3   1131   332aa     Protein coding   F6W8H2          -

  5. Adiponectin (ADIPOQ) gene - ENSECAG00000002962 has a single splice variant:

Name         Transcript ID          bp     Protein   Biotype          UniProt Match   Flags
ADIPOQ-201   ENSECAT00000002827.3   1264   301aa     Protein coding   F7DZE7          -

ConsGenome pipeline: Creating a conda environment

See this tutorial for additional information: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

The ConsGenome workflow requires the following tools specified in an ‘environment.yml’ file:

Create a conda environment called ‘ConsGenome’:

Activate the environment; this makes the above tools available. NOTE: the ConsGenome environment must be activated before running Nextflow.

Deactivate the environment

Nextflow - ConsGenome pipeline

This pipeline, developed by QUT eResearch, 1) assembles reference sequences de novo (e.g., amplicon-seq regions, metagenomic sequences); 2) maps reads and derives a consensus genomic/transcript sequence from the HTS data collected for a given sample; and 3) identifies nucleotide sequence polymorphisms.
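To illustrate what step 3 means in the simplest case, the sketch below lists single-base substitutions between a reference and a sample consensus sequence. It is an illustration only and not the method ConsGenome uses. It assumes the two sequences are already aligned to the same length, and the names Substitution and list_substitutions are hypothetical.

```python
from typing import List, NamedTuple


class Substitution(NamedTuple):
    position: int  # 1-based position in the reference
    ref: str       # reference base
    alt: str       # consensus base


def list_substitutions(reference: str, consensus: str) -> List[Substitution]:
    """List positions where two pre-aligned sequences differ.

    Positions where either sequence has a non-ACGT symbol (for example
    N or a gap character) are skipped, because they carry no
    information about a substitution.
    """
    if len(reference) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    pairs = zip(reference.upper(), consensus.upper())
    for pos, (r, c) in enumerate(pairs, start=1):
        if r != c and r in "ACGT" and c in "ACGT":
            variants.append(Substitution(pos, r, c))
    return variants
```

Comparing the substitution lists obtained for healthy and unhealthy horses against the same reference would then highlight candidate polymorphic sites. Insertions and deletions are not handled by this sketch.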

To run the pipeline prepare the following:

  • index.csv - a file listing, for each sample, the sample ID and the paths to read1 and, if applicable, read2

  • nextflow.config - a file specifying parameter options, such as the genome/transcriptome/amplicon reference used to map reads and predict a consensus sequence

  • launch.pbs - PBS Pro script to submit the ConsGenome job to the HPC cluster
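When many samples are involved, index.csv can be generated rather than typed by hand. The following Python sketch shows one way to do this under stated assumptions: the column names (sample_id, read1, read2) and the file naming scheme (<sample>_R1.fastq.gz and <sample>_R2.fastq.gz) are hypothetical and must be adapted to the layout that ConsGenome actually expects.

```python
import csv
import re
from pathlib import Path
from typing import Dict, List

# Hypothetical naming scheme: <sample>_R1.fastq(.gz) and <sample>_R2.fastq(.gz)
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq(\.gz)?$")


def build_index_rows(fastq_dir: Path) -> List[Dict[str, str]]:
    """Collect one row per sample with absolute paths to its read files."""
    samples: Dict[str, Dict[str, str]] = {}
    for path in sorted(fastq_dir.iterdir()):
        match = READ_PATTERN.match(path.name)
        if match is None:
            continue  # ignore files that are not FASTQ reads
        entry = samples.setdefault(match["sample"], {"read1": "", "read2": ""})
        entry["read" + match["read"]] = str(path.resolve())
    # read2 stays empty for single-end samples
    return [{"sample_id": name, **reads} for name, reads in sorted(samples.items())]


def write_index(rows: List[Dict[str, str]], out_path: Path) -> None:
    """Write the rows to a CSV file with a header line."""
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sample_id", "read1", "read2"])
        writer.writeheader()
        writer.writerows(rows)
```

A typical call would be write_index(build_index_rows(Path("fastq")), Path("index.csv")), where "fastq" is a placeholder for the directory holding the read files.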

Example index.csv file:

Example nextflow.config file:

Example launch.pbs script: