Equus caballus Project
Aim
Assess sequence polymorphism in five horse genes of interest by comparing amplicon sequencing data from healthy and unhealthy horses.
Genes of interest:
Gene Name | Ensembl Gene ID | NCBI Gene ID |
---|---|---|
Proglucagon (GCG) | ENSECAG00000005660 | 100051551 |
Melanocortin 2 receptor (MC2R) | ENSECAG00000003841 | 100057018 |
Melanocortin 4 receptor (MC4R) | ENSECAG00000001712 | 100050469 |
Proopiomelanocortin (POMC) | ENSECAG00000016388 | 100071524 |
Adiponectin (ADIPOQ) | ENSECAG00000002962 | 100059500 |
Proglucagon (GCG)
- ENSECAG00000005660 has two splice variants:
Name | Transcript ID | bp | Protein | Biotype | UniProt Match | Flags |
---|---|---|---|---|---|---|
GCG-201 | | 708 | | Protein coding | - | |
GCG-202 | | 705 | | Protein coding | - | |
>EquCab3.0_Glucagon_GCG000001|GCG-201 cds:protein_coding
ATGAAAAGCATCTACTTTGTGGCCGGACTGTTTGTAATGCTGGTACAAAGCAGCTGGCAACGTTCGCTGCAGGACACAGAGGAGACATCCAGATCGTTCCCAGCTCCCCAGACAGACCCACTCAGTGATCCGGATCAGATGAATGAAGATAAGCGCCATTCTCAGGGCACGTTCACCAGTGACTACAGCAAGTATCTGGACTCCAGGCGTGCCCAGGATTTCGTGCAGTGGTTGATGAACACCAAGAGGAACAAGAATAACATTGCCAAACGTCATGATGAATTTGAGAGACATGCTGAAGGGACCTTTACCAGTGATGTAAGTTCTTATTTGGAAGGCCAAGCTGCCAAGGAATTCATTGCTTGGCTGGTGAAAGGCCGAGGAAGGCGAGATTTCCCAGAAGAAGTCACCATCGTTGAAGAACTCCGCCGCAGACATGCCGATGGTTCCTTCTCTGATGAGATGAACACAGTTCTTGATAATCTTGCCACCCGGGACTTTATAAACTGGTTGCTTCAGACCAAAATTACTGACAGAGTCAATAGAATACCTGAAAGAGACTGCAACAGAGTAGACTCCCATAATGAAGAGGGCATCTTCAGAGGTGAAGGGGAGCCCAAGTGTAACAGCTTTTCAAGTTCCCTCTCTTCAGTGAGGATCATAAGAGGCACTCCATTCAAGGGGAAGTGTGCAATCTGA
>EquCab3.0_Glucagon_GCG_202000001|GCG-202 cds:protein_coding
ATGAAAAGCATCTACTTTGTGGCCGGACTGTTTGTAATGCTGGTACAAAGCAGCTGGCAACGTTCGCTGCAGGACACAGAGGAGACATCCAGATCGTTCCCAGCTCCCCAGACAGACCCACTCAGTGATCCGGATCAGATGAATGAAGATAAGCGCCATTCTCAGGGCACGTTCACCAGTGACTACAGCAAGTATCTGGACTCCAGGCGTGCCCAGGATTTCGTGCAGTGGTTGATGAACACCAAGAGGAACAAGAATAACATTGCCAAACGTCATGATGAATTTGAGAGACATGCTGAAGGGACCTTTACCAGTGATGTAAGTTCTTATTTGGAAGGCCAAGCTGCCAAGGAATTCATTGCTTGGCTGGTGAAAGGCCGAGGAAGGCGAGATTTCCCAGAAGAAGTCACCATCGTTGAAGAACTCCGCCGCAGACATGCCGATGGTTCCTTCTCTGATGAGATGAACACAGTTCTTGATAATCTTGCCACCCGGGACTTTATAAACTGGTTGCTTCAGACCAAAATTACTGACAGGAAGTAA
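The two GCG coding sequences above appear to share a long common 5' region and to differ only toward the 3' end. A minimal Python sketch for locating the first position at which two such sequences diverge is given below. The function name `first_difference` and the toy sequences are illustrative assumptions, not part of the ConsGenome pipeline, and the real records would be loaded from the FASTA file instead.

```python
from itertools import zip_longest


def first_difference(seq_a: str, seq_b: str):
    """Return the 0-based index of the first position where the two
    sequences differ, or None if they are identical.

    A length difference counts as a difference at the first position
    that exists in only one of the sequences.
    """
    pairs = zip_longest(seq_a.upper(), seq_b.upper())
    for index, (base_a, base_b) in enumerate(pairs):
        if base_a != base_b:
            return index
    return None


# Toy sequences for illustration only (NOT the real GCG-201/GCG-202 CDS).
shared = "ATGAAAAGCATC"
variant_1 = shared + "AGTCAATAG"
variant_2 = shared + "GAAGTAA"

index = first_difference(variant_1, variant_2)
print("first differing position (0-based):", index)
```

This compares unaligned strings position by position, so it is only meaningful for sequences that share a common start. Insertions or deletions upstream would call for a proper pairwise alignment.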
Melanocortin 2 receptor (MC2R)
- ENSECAG00000003841 has a single splice variant:
Name | Transcript ID | bp | Protein | Biotype | UniProt Match | Flags |
---|---|---|---|---|---|---|
MC2R-201 | | 891 | | Protein coding | - | |
>EquCab3.0_Glucagon_MC2R_201000001|MC2R-201 cds:protein_coding
ATGAAGCACATTGTCAATCTATATGAAAACATCAATGATACAGCAAGAAATAATTCAGACTGTCCTCTTGTGGTTTTGCCAGAAGAGATATTCTTCACAATATCCATCATTGGGGTTTTGGAGAATCTGATGATCCTTCTGGCTGTGATCAAGAATAAGAATCTCCAGTCACCAATGTACTTTTTCATTTGCAGCTTGGCCATTTCTGATATGTTGGGCAGCCTATATAAGATCCTGGAAAATATCCTGATCATGTTCAGAAACACAGGTTATCTCAAGCCTCGTAGCAATTTTGAAACCACAGCCGATGACATCATTGACTCTCTGTTCATCCTCTCCCTACTTGGGTCCATTTTCAGCCTGTCTGTGATCGCCGTTGACCGCTACATCACAATCTTCCATGCTCTGCAGTACCACAGCATTGTGACCATGCACCGTGCCATTGTTGTCCTGATAGTCATCTGGACGTGCTGCCTGGGCAGCGGCATCGCCATGGTGATCTTCTCCCATCACATCCCCACAGTGATCACCTTCACCTCGCTGTTCCCTCTCATGTTGGTCTTTATCCTATGCCTCTATGTGCACATGTTCTTGCTGGCCCGTTCCCATGCCAGGAAGATCTCAACCCTCTCTAGAGGCAACATGAAAGGGGCCATCACGCTGACCATCCTGCTCGGGGTGTTCATCTTCTGCTGGGCCCCTTTTGTCCTTCATGTCCTCTTAATGACATTCTGCCCAAATAACCCTTACTGTGTCTGCTACATGTCCCTCTTCCAGGTGAATGGCATGTTGATCATGTGCAATGCAGTCATCGACCCTTTTATATATGCCTTCCGGAACCCAGAGCTCAGGGAGGCATTCAAAAAGATGATCTTCTGCAACAGTTACCAG
Melanocortin 4 receptor (MC4R)
- ENSECAG00000001712 has a single splice variant:
Name | Transcript ID | bp | Protein | Biotype | UniProt Match | Flags |
---|---|---|---|---|---|---|
MC4R-201 | | 1002 | | Protein coding | - | |
>EquCab3.0_Glucagon_MC4R_201000001|MC4R-201 cds:protein_coding
AGGATGGACTCTACTCACCGCCATGGAATGCACACTTCTCTCCACTTCTGGAACCGCAGCACCTACGGACTGCACAGCAATGCCAGTGAGTCCCTTGGAAAAGGCTACTCTGATGGAGGGTGCTACGAGCAACTTTTTGTCTCTCCTGAGGTGTTTGTGACTCTGGGTGTCATCAGCTTGTTGGAGAATATTCTGGTGATTGTGGCAATAGCCAAGAACAAGAATCTGCATTCACCCATGTACTTTTTCATCTGTAGCCTGGCTGTGGCTGACATGTTGGTGAGCGTTTCAAATGGATCAGAAACCATTGTCATCACCCTGTTAAACAGTACAGATACGGACGCACAGAGTTTCACGGTGAATATTGATAATGTCATTGACTCAGTGATCTGTAGCTCCTTACTTGCATCAATTTGCAGCCTGCTTTCAATTGCAGTGGACAGGTATTTTACTATCTTTTATGCTCTCCAGTATCATAACATTATGACGGTTAAGCGGGTCGGGATCATCATAAGTTGCATCTGGGCAGCTTGCACGGTTTCGGGCATTTTGTTCATCATTTACTCAGATAGTAGTGCTGTCATCATCTGCCTCATCACCATGTTCTTCACCATGCTGGCTCTCATGGCTTCTCTCTATGTCCACATGTTCCTCATGGCCAGACTTCACATTAAGAGGATCGCTGTCCTCCCAGGCACTGGCACCATCCGCCAAGGTGCCAACATGAAGGGGGCGATCACCTTGACCATATTGATTGGAGTCTTTGTGGTCTGCTGGGCCCCATTCTTCCTCCACTTAATATTCTACATCTCTTGTCCCCAGAATCCATACTGTGTGTGCTTCATGTCTCACTTTAACTTGTATCTCATACTGATCATGTGTAATTCCATCATCGATCCTCTGATCTATGCACTCCGGAGCCAAGAACTGAGGAAAACCTTCAAAGAGATCATCTGTTGCTACCCTCTGGGAGGCCTTTGTGATTTGTCTAGCAGATACTAA
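Before using reference records like the ones above for read mapping, it can help to check each coding sequence for basic features: an ATG at the start, a stop codon at the end, and a length divisible by three. The following Python sketch shows one way to do that. The helper names `read_fasta` and `cds_report` and the toy records are assumptions made for illustration; in practice the function would be pointed at the actual reference FASTA file.

```python
import io

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def cds_report(seq):
    """Summarise simple coding-sequence features of a nucleotide string."""
    return {
        "length": len(seq),
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": seq[-3:] in STOP_CODONS,
        "multiple_of_three": len(seq) % 3 == 0,
    }


# Toy records for illustration only (not the real reference sequences).
toy_fasta = io.StringIO(
    ">toy_complete\nATGGCCTAA\n"
    ">toy_partial\nAGGATGGCC\nCAG\n"
)
reports = {header: cds_report(seq) for header, seq in read_fasta(toy_fasta)}
for header, report in reports.items():
    print(header, report)
```

A record that fails one of these checks is not necessarily wrong; it may simply include flanking bases or omit the stop codon, which is worth knowing before interpreting variant positions.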
Proopiomelanocortin (POMC)
- ENSECAG00000016388 has three splice variants:
Name | Transcript ID | bp | Protein | Biotype | UniProt Match | Flags |
---|---|---|---|---|---|---|
POMC-201 | | 1323 | | Protein coding | - | |
POMC-202 | | 1320 | | Protein coding | - | |
POMC-203 | | 1131 | | Protein coding | - | |
Adiponectin (ADIPOQ)
- ENSECAG00000002962 has a single splice variant:
Name | Transcript ID | bp | Protein | Biotype | UniProt Match | Flags |
---|---|---|---|---|---|---|
ADIPOQ-201 | | 1264 | | Protein coding | - | |
ConsGenome pipeline: Creating a conda environment
See this tutorial for additional information: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
The ConsGenome workflow requires the following tools specified in an ‘environment.yml’ file:
Create a conda environment called ‘ConsGenome’
Activate the environment. This makes the tools listed above available. NOTE: the ConsGenome environment must be activated before running Nextflow.
Deactivate the environment
Nextflow - ConsGenome pipeline
This pipeline, developed by QUT eResearch, is designed to 1) assemble reference sequences de novo (e.g., amplicon-seq regions or metagenomic sequences); 2) map reads and deduce a consensus genomic/transcript sequence from the HTS data collected for a given sample; and 3) identify nucleotide sequence polymorphisms.
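To make steps 2 and 3 concrete, the sketch below derives a majority-rule consensus from a handful of pre-aligned, equal-length sequences and lists the positions at which more than one base is observed. This is a toy illustration of the idea only: it is not how the ConsGenome pipeline is implemented, and the function name `consensus_and_variable_sites` and the example reads are assumptions.

```python
from collections import Counter


def consensus_and_variable_sites(aligned):
    """Return (consensus, variable_sites) for pre-aligned sequences.

    variable_sites is a list of (1-based position, {base: count}) for
    every column in which more than one base is observed.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to the same length")

    consensus, variable_sites = [], []
    for position, column in enumerate(zip(*aligned), start=1):
        counts = Counter(column)
        # Most frequent base wins; ties go to the base seen first.
        base, _ = counts.most_common(1)[0]
        consensus.append(base)
        if len(counts) > 1:
            variable_sites.append((position, dict(counts)))
    return "".join(consensus), variable_sites


# Toy aligned reads for illustration only.
reads = ["ATGACC", "ATGACC", "ATGGCC", "ATGACT"]
consensus, sites = consensus_and_variable_sites(reads)
print(consensus)
for position, counts in sites:
    print(position, counts)
```

Real HTS data would additionally need read mapping, base-quality filtering, and depth thresholds, which dedicated tools handle.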
To run the pipeline, prepare the following files:
- index.csv - a file listing each sample ID and the path to read1 (and read2, if applicable)
- nextflow.config - a file specifying parameter options, such as the genome/transcriptome/amplicon reference used to map reads and predict a consensus sequence
- launch.pbs - a PBS Pro script that submits the ConsGenome job to the HPC cluster
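A sample sheet of the kind described for index.csv could be generated from a directory of FASTQ files instead of being typed by hand. The Python sketch below is one possible approach under stated assumptions: the column names `sampleid`, `read1`, and `read2`, and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` file naming are hypothetical, so check them against the header the pipeline actually expects before use.

```python
import csv
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def collect_samples(fastq_dir):
    """Map each sample ID to its read1/read2 paths (empty string if absent)."""
    samples = {}
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = READ_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        entry = samples.setdefault(match["sample"], {"read1": "", "read2": ""})
        entry["read" + match["read"]] = str(path.resolve())
    return samples


def write_index(samples, out_path):
    """Write the sample sheet; the header names are hypothetical."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sampleid", "read1", "read2"])
        for sample_id in sorted(samples):
            reads = samples[sample_id]
            writer.writerow([sample_id, reads["read1"], reads["read2"]])
```

For example, `write_index(collect_samples("fastq"), "index.csv")` would scan a (hypothetical) `fastq` directory and write one row per sample, leaving `read2` empty for single-end data.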
Example index.csv file:
Example nextflow.config file:
launch.pbs script: