
Aim

Assess sequence polymorphisms in five horse genes of interest by comparing amplicon sequencing data from healthy and unhealthy horses.

Genes of interest:

Gene name                      | Ensembl gene ID    | NCBI Gene ID
------------------------------ | ------------------ | ------------
Proglucagon (GCG)              | ENSECAG00000005660 | 100051551
Melanocortin 2 receptor (MC2R) | ENSECAG00000003841 | 100057018
Melanocortin 4 receptor (MC4R) | ENSECAG00000001712 | 100050469
Proopiomelanocortin (POMC)     | ENSECAG00000016388 | 100071524
Adiponectin (ADIPOQ)           | ENSECAG00000002962 | 100059500

  1. Proglucagon (GCG), ENSECAG00000005660, has two splice variants:

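Name       | Transcript ID        | Transcript length (bp) | Protein | Biotype        | UniProt match | Flags
---------- | -------------------- | ---------------------- | ------- | -------------- | ------------- | -----
GCG-201    | ENSECAT00000066569.1 | 708                    | 232 aa  | Protein coding | A0A5F5PSR3    | -
GCG-202    | ENSECAT00000005849.3 | 705                    | 180 aa  | Protein coding | F7ABP1        | -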

>EquCab3.0_Glucagon_GCG000001|GCG-201 cds:protein_coding
ATGAAAAGCATCTACTTTGTGGCCGGACTGTTTGTAATGCTGGTACAAAGCAGCTGGCAACGTTCGCTGCAGGACACAGAGGAGACATCCAGATCGTTCCCAGCTCCCCAGACAGACCCACTCAGTGATCCGGATCAGATGAATGAAGATAAGCGCCATTCTCAGGGCACGTTCACCAGTGACTACAGCAAGTATCTGGACTCCAGGCGTGCCCAGGATTTCGTGCAGTGGTTGATGAACACCAAGAGGAACAAGAATAACATTGCCAAACGTCATGATGAATTTGAGAGACATGCTGAAGGGACCTTTACCAGTGATGTAAGTTCTTATTTGGAAGGCCAAGCTGCCAAGGAATTCATTGCTTGGCTGGTGAAAGGCCGAGGAAGGCGAGATTTCCCAGAAGAAGTCACCATCGTTGAAGAACTCCGCCGCAGACATGCCGATGGTTCCTTCTCTGATGAGATGAACACAGTTCTTGATAATCTTGCCACCCGGGACTTTATAAACTGGTTGCTTCAGACCAAAATTACTGACAGAGTCAATAGAATACCTGAAAGAGACTGCAACAGAGTAGACTCCCATAATGAAGAGGGCATCTTCAGAGGTGAAGGGGAGCCCAAGTGTAACAGCTTTTCAAGTTCCCTCTCTTCAGTGAGGATCATAAGAGGCACTCCATTCAAGGGGAAGTGTGCAATCTGA
>EquCab3.0_Glucagon_GCG_202000001|GCG-202 cds:protein_coding
ATGAAAAGCATCTACTTTGTGGCCGGACTGTTTGTAATGCTGGTACAAAGCAGCTGGCAACGTTCGCTGCAGGACACAGAGGAGACATCCAGATCGTTCCCAGCTCCCCAGACAGACCCACTCAGTGATCCGGATCAGATGAATGAAGATAAGCGCCATTCTCAGGGCACGTTCACCAGTGACTACAGCAAGTATCTGGACTCCAGGCGTGCCCAGGATTTCGTGCAGTGGTTGATGAACACCAAGAGGAACAAGAATAACATTGCCAAACGTCATGATGAATTTGAGAGACATGCTGAAGGGACCTTTACCAGTGATGTAAGTTCTTATTTGGAAGGCCAAGCTGCCAAGGAATTCATTGCTTGGCTGGTGAAAGGCCGAGGAAGGCGAGATTTCCCAGAAGAAGTCACCATCGTTGAAGAACTCCGCCGCAGACATGCCGATGGTTCCTTCTCTGATGAGATGAACACAGTTCTTGATAATCTTGCCACCCGGGACTTTATAAACTGGTTGCTTCAGACCAAAATTACTGACAGGAAGTAA

  2. Melanocortin 2 receptor (MC2R), ENSECAG00000003841, has a single splice variant:

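Name       | Transcript ID        | Transcript length (bp) | Protein | Biotype        | UniProt match | Flags
---------- | -------------------- | ---------------------- | ------- | -------------- | ------------- | -----
MC2R-201   | ENSECAT00000003809.3 | 891                    | 297 aa  | Protein coding | F6Y3T7        | -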

>EquCab3.0_Glucagon_MC2R_201000001|MC2R-201 cds:protein_coding
ATGAAGCACATTGTCAATCTATATGAAAACATCAATGATACAGCAAGAAATAATTCAGACTGTCCTCTTGTGGTTTTGCCAGAAGAGATATTCTTCACAATATCCATCATTGGGGTTTTGGAGAATCTGATGATCCTTCTGGCTGTGATCAAGAATAAGAATCTCCAGTCACCAATGTACTTTTTCATTTGCAGCTTGGCCATTTCTGATATGTTGGGCAGCCTATATAAGATCCTGGAAAATATCCTGATCATGTTCAGAAACACAGGTTATCTCAAGCCTCGTAGCAATTTTGAAACCACAGCCGATGACATCATTGACTCTCTGTTCATCCTCTCCCTACTTGGGTCCATTTTCAGCCTGTCTGTGATCGCCGTTGACCGCTACATCACAATCTTCCATGCTCTGCAGTACCACAGCATTGTGACCATGCACCGTGCCATTGTTGTCCTGATAGTCATCTGGACGTGCTGCCTGGGCAGCGGCATCGCCATGGTGATCTTCTCCCATCACATCCCCACAGTGATCACCTTCACCTCGCTGTTCCCTCTCATGTTGGTCTTTATCCTATGCCTCTATGTGCACATGTTCTTGCTGGCCCGTTCCCATGCCAGGAAGATCTCAACCCTCTCTAGAGGCAACATGAAAGGGGCCATCACGCTGACCATCCTGCTCGGGGTGTTCATCTTCTGCTGGGCCCCTTTTGTCCTTCATGTCCTCTTAATGACATTCTGCCCAAATAACCCTTACTGTGTCTGCTACATGTCCCTCTTCCAGGTGAATGGCATGTTGATCATGTGCAATGCAGTCATCGACCCTTTTATATATGCCTTCCGGAACCCAGAGCTCAGGGAGGCATTCAAAAAGATGATCTTCTGCAACAGTTACCAG

  3. Melanocortin 4 receptor (MC4R), ENSECAG00000001712, has a single splice variant:

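Name       | Transcript ID        | Transcript length (bp) | Protein | Biotype        | UniProt match | Flags
---------- | -------------------- | ---------------------- | ------- | -------------- | ------------- | -----
MC4R-201   | ENSECAT00000001527.3 | 1002                   | 333 aa  | Protein coding | F6TMM6        | -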

>EquCab3.0_Glucagon_MC4R_201000001|MC4R-201 cds:protein_coding
AGGATGGACTCTACTCACCGCCATGGAATGCACACTTCTCTCCACTTCTGGAACCGCAGCACCTACGGACTGCACAGCAATGCCAGTGAGTCCCTTGGAAAAGGCTACTCTGATGGAGGGTGCTACGAGCAACTTTTTGTCTCTCCTGAGGTGTTTGTGACTCTGGGTGTCATCAGCTTGTTGGAGAATATTCTGGTGATTGTGGCAATAGCCAAGAACAAGAATCTGCATTCACCCATGTACTTTTTCATCTGTAGCCTGGCTGTGGCTGACATGTTGGTGAGCGTTTCAAATGGATCAGAAACCATTGTCATCACCCTGTTAAACAGTACAGATACGGACGCACAGAGTTTCACGGTGAATATTGATAATGTCATTGACTCAGTGATCTGTAGCTCCTTACTTGCATCAATTTGCAGCCTGCTTTCAATTGCAGTGGACAGGTATTTTACTATCTTTTATGCTCTCCAGTATCATAACATTATGACGGTTAAGCGGGTCGGGATCATCATAAGTTGCATCTGGGCAGCTTGCACGGTTTCGGGCATTTTGTTCATCATTTACTCAGATAGTAGTGCTGTCATCATCTGCCTCATCACCATGTTCTTCACCATGCTGGCTCTCATGGCTTCTCTCTATGTCCACATGTTCCTCATGGCCAGACTTCACATTAAGAGGATCGCTGTCCTCCCAGGCACTGGCACCATCCGCCAAGGTGCCAACATGAAGGGGGCGATCACCTTGACCATATTGATTGGAGTCTTTGTGGTCTGCTGGGCCCCATTCTTCCTCCACTTAATATTCTACATCTCTTGTCCCCAGAATCCATACTGTGTGTGCTTCATGTCTCACTTTAACTTGTATCTCATACTGATCATGTGTAATTCCATCATCGATCCTCTGATCTATGCACTCCGGAGCCAAGAACTGAGGAAAACCTTCAAAGAGATCATCTGTTGCTACCCTCTGGGAGGCCTTTGTGATTTGTCTAGCAGATACTAA

  4. Proopiomelanocortin (POMC), ENSECAG00000016388, has three splice variants:

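Name       | Transcript ID        | Transcript length (bp) | Protein | Biotype        | UniProt match | Flags
---------- | -------------------- | ---------------------- | ------- | -------------- | ------------- | -----
POMC-201   | ENSECAT00000064027.2 | 1323                   | 267 aa  | Protein coding | A0A3Q2IBW0    | -
POMC-202   | ENSECAT00000069674.1 | 1320                   | 253 aa  | Protein coding | A0A5F5PNG1    | -
POMC-203   | ENSECAT00000017172.3 | 1131                   | 332 aa  | Protein coding | F6W8H2        | -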

>EquCab3.0_Glucagon_POMC_201000001|POMC-201 cds:protein_coding
ATGGCCAGAAGGAGCCAAGAGCCTCAGCCTGCCTGCAAGATGCCGAGATCGTGCGGCAGCCGCTCGGGGGCCCTGCTGCTGGCCCTGCTGCTTCAGGCTTCTGTGGAAGTGCGTGGCTGGTGCCTGGAGAGCAGCCAGTGTCAGGACCTCACCACGGAAAGTAACCTGCTGGCCTGCATCCGGGCCTGCAAGATCGACCTCTCCGCTGAGACGCCCGTGTTCCCCGGCAACGGCGAGGAGCAGCCGCTGACCGAGAACCCCCGGAAGTACGTCATGGGCCACTTCCGCTGGGACCGCTTCGGCCGCCGGAACAGCAGCAGCGGCGGCGGCGCGAGCCAGAAGCGCGAGGAGGAGGAGGTGGTGGTGCTGGGCGGCCCCGGGCCCCGCGGCGACGGCGGCGATGGCGGCGAGGCGGGCCCGCGCGAGGGCAAGCGCTCCTACTCCATGGAGCACTTCCGCTGGGGCAAGCCGGTGGGCAAGAAGCGGCGCCCGGTGAAGGTGTACCCCAACGGCGCCGAGGACGAGTCGGCCGAGGCCTTCCCCCTGGAGTTCAAGAGGGAGCTGGCCGGGGAGCGGCCCGAGGGCGCGGCGGCCCGCGCCGAGCTGGGGTACAGCCTGGTGGCGGAGGCCGAGGCGGCGGAGAAGAAGGACGAGGGGCCCTATAAAATGGAGCACTTCCGCTGGGGCAGCCCGCGCAAGGACAAGCGCTACGGCGGCTTCATGAGCTCCGAGAAGAGCCAGACGCCCCTGGTGACGCTGTTCAAAAACGCCATCATCAAGAACGCCCACAAGAAGGGCCAGTGA
>EquCab3.0_Glucagon_POMC_202000001|POMC-202 cds:protein_coding
ATGCCGAGATCGTGCGGCAGCCGCTCGGGGGCCCTGCTGCTGGCCCTGCTGCTTCAGGCTTCTGTGGAAGTGCGTGGCTGGTGCCTGGAGAGCAGCCAGTGTCAGGACCTCACCACGGAAAGTAACCTGCTGGCCTGCATCCGGGCCTGCAAGATCGACCTCTCCGCTGAGACGCCCGTGTTCCCCGGCAACGGCGAGGAGCAGCCGCTGACCGAGAACCCCCGGAAGTACGTCATGGGCCACTTCCGCTGGGACCGCTTCGGCCGCCGGAACAGCAGCAGCGGCGGCGGCGCGAGCCAGAAGCGCGAGGGAGGAGGTGGTGGTCTGGGCGGCCCCGGGCCCCGCGGCGACGGCGGCGATGGCGGCGAGGCGGGCCCGCGCGAGGGCAAGCGCTCCTACTCCATGGAGCACTTCCGCTGGGGCAAGCCGGTGGGCAAGAAGCGGCGCCCGGTGAAGGTGTACCCCAACGGCGCCGAGGACGAGTCGGCCGAGGCCTTCCCCCTGGAGTTCAAGAGGGAGCTGGCCGGGGAGCGGCCCGAGGGCGCGGCGGCCCGCGCCGAGCTGGGGTACAGCCTGGTGGCGGAGGCCGAGGCGGCGGAGAAGAAGGACGAGGGGCCCTATAAAATGGAGCACTTCCGCTGGGGCAGCCCGCGCAAGGACAAGCGCTACGGCGGCTTCATGAGCTCCGAGAAGAGCCAGACGCCCCTGGTGACGCTGTTCAAAAACGCCATCATCAAGAACGCCCACAAGAAGGGCCAGTGA
>EquCab3.0_Glucagon_POMC_203000001|POMC-203 cds:protein_coding
ATGGGGGAGTTGCTTGTTACGTTGCCCTGCCGGGAAGTTCTCCTCCAGAGCCCACACGGGCGAGCCCGGCAGATATATAAGGACCGGGAGAGCGACCAGGACCAGGCGGCGGCGAAGGAGGAGAAAGAGAGGAAGAAAAGTGACCAAGAGAGGCCCCCAGCATCCTCGCCCCGGCGCAGCGGGAGTCTCCCAGAGAGCAGCATCCCCGCGGCAGAGCCTCAGCCTGCCTGCAAGATGCCGAGATCGTGCGGCAGCCGCTCGGGGGCCCTGCTGCTGGCCCTGCTGCTTCAGGCTTCTGTGGAAGTGCGTGGCTGGTGCCTGGAGAGCAGCCAGTGTCAGGACCTCACCACGGAAAGTAACCTGCTGGCCTGCATCCGGGCCTGCAAGATCGACCTCTCCGCTGAGACGCCCGTGTTCCCCGGCAACGGCGAGGAGCAGCCGCTGACCGAGAACCCCCGGAAGTACGTCATGGGCCACTTCCGCTGGGACCGCTTCGGCCGCCGGAACAGCAGCAGCGGCGGCGGCGCGAGCCAGAAGCGCGAGGAGGAGGAGGTGGTGGTGCTGGGCGGCCCCGGGCCCCGCGGCGACGGCGGCGATGGCGGCGAGGCGGGCCCGCGCGAGGGCAAGCGCTCCTACTCCATGGAGCACTTCCGCTGGGGCAAGCCGGTGGGCAAGAAGCGGCGCCCGGTGAAGGTGTACCCCAACGGCGCCGAGGACGAGTCGGCCGAGGCCTTCCCCCTGGAGTTCAAGAGGGAGCTGGCCGGGGAGCGGCCCGAGGGCGCGGCGGCCCGCGCCGAGCTGGGGTACAGCCTGGTGGCGGAGGCCGAGGCGGCGGAGAAGAAGGACGAGGGGCCCTATAAAATGGAGCACTTCCGCTGGGGCAGCCCGCGCAAGGACAAGCGCTACGGCGGCTTCATGAGCTCCGAGAAGAGCCAGACGCCCCTGGTGACGCTGTTCAAAAACGCCATCATCAAGAACGCCCACAAGAAGGGCCAGTGA

  5. Adiponectin (ADIPOQ), ENSECAG00000002962, has a single splice variant:

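Name       | Transcript ID        | Transcript length (bp) | Protein | Biotype        | UniProt match | Flags
---------- | -------------------- | ---------------------- | ------- | -------------- | ------------- | -----
ADIPOQ-201 | ENSECAT00000002827.3 | 1264                   | 301 aa  | Protein coding | F7DZE7        | -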

>EquCab3.0_Glucagon_ADIPOQ_201000001|ADIPOQ-201 cds:protein_coding
ATGGGACAATGTGTCTCTGGTTGTCTGACTAGATCAAGGAAAGACTATGTGTGTGTGTGTGCTTGTGCGTACATGTGTGTGCAAGTATGTGTATGTATGTATATGTATGTGTGTGTTTGGGTTGGGTGTGCTGTTTGGGGTCTGCTCTCATGGCTGACAGTGCAGATTTGGATTCCAGGACTCAGGATGCTGTTGCTCCAAGCTGTTCTATTGCTACTAGTCCTGCCGAGTCCGGGTGAGGTTACCACGACTGAAGAGACTCTGCCCAAGGAGGGCTGCGCAGGTTGGATGGCAGGCATCCCAGGGCATCCTGGCCACAATGGGACCCCAGGCCGTGATGGCAGAGATGGCACCCCTGGCGAGAAGGGTGAGAAAGGAGATCCAGGTCTTGTTGGGCCTAAGGGTGATGCTGGTGAAACTGGAGTGCCTGGAGTTGAAGGTCCCAGAGGCTTTCCGGGAATCCCAGGCAGGAAAGGAGAACCTGGAGAAAGTTCCTATGTATACCGCTCAGCATTCAGTGTAGGATTGGAGACCCGAGTCACCGTCCCCAATGTTCCCATTCGTTTTACCAAGATCTTCTACAATCAGCAAAACCACTATGATGGCAGCACGGGCAAATTCCACTGCAACATTCCTGGGCTGTACTACTTCTCCTACCACATCACAGTCTACTTGAAGGATGTGAAGGTCAGCCTCTACAAGAAGGACAAGGCTGTGCTCTTCACCTATGACCAGTACCAGGACAAGAACTTGGACCAGGCCTCAGGCTCTGTTCTCCTCTATCTGGAGAAGGGCGACCAAGTCTGGCTCCAGGTGTATGGGGATGGAGATCATAATGGGCTCTATGCCGATAATGTCAATGACTCCACCTTCACAGGCTTCCTTCTCTACCACGACACCAACTGA
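
Before these CDS records are used as a mapping reference, a quick consistency check against the tables can catch annotation surprises; for instance, the MC4R-201 record above begins with AGG rather than ATG. The sketch below is a hypothetical bash/awk helper, `cds_summary` (not part of ConsGenome), that prints one tab-separated line per FASTA record: its name, length, whether the length is a multiple of three, the first and last codons, and the implied protein length (codons, minus one if the last codon is a stop). It assumes headers shaped like the ones above (`>ID|NAME description`) and uses the text after the last `|` of the first word as the record name.

```bash
#!/usr/bin/env bash
# cds_summary FILE.fasta
# For each FASTA record print (tab-separated): name, CDS length (bp),
# in_frame/frame_error (is the length divisible by 3?), first codon,
# last codon, and the implied protein length (codons, minus one if the
# last codon is a stop). Hypothetical helper, not part of ConsGenome.
cds_summary() {
  awk '
    function report(   len, first, last, stop, aa) {
      if (name == "") return
      seq = toupper(seq)
      len = length(seq)
      first = substr(seq, 1, 3)
      last = substr(seq, len - 2, 3)
      stop = (last == "TAA" || last == "TAG" || last == "TGA")
      aa = int(len / 3) - stop
      printf "%s\t%d\t%s\t%s\t%s\t%d\n", name, len,
             (len % 3 == 0 ? "in_frame" : "frame_error"), first, last, aa
    }
    /^>/ {
      report()
      # Header ">ID|NAME description": keep the text after the last "|"
      # of the first word, e.g. GCG-201.
      n = split(substr($1, 2), parts, "|")
      name = parts[n]
      seq = ""
      next
    }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { report() }
  ' "$1"
}
```

Assuming the reference FASTA named in nextflow.config holds these same CDS records, `cds_summary Equus_caballus_EquCan3.0_GCG_MC2R_MC4R_POMC_ADIPOQ_genes.cds.fasta` would list every record, and its last column can be compared with the Protein column of the tables above.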

Nextflow - ConsGenome pipeline

ConsGenome is a Nextflow pipeline developed by QUT eResearch for 1) de novo assembly of reference sequences (e.g., amplicon-seq regions, metagenomic sequences); 2) mapping reads to a reference and deducing a consensus genomic or transcript sequence from the HTS data collected for a given sample; and 3) identifying nucleotide sequence polymorphisms.

To run the pipeline, prepare the following files:

  • index.csv - a file listing each sample ID with the path to its read 1 file and, for paired-end data, its read 2 file

  • nextflow.config - a file setting parameter options, such as the genome, transcriptome or amplicon reference against which reads are mapped and from which a consensus sequence is predicted

  • launch.pbs - a PBS Pro script that submits the ConsGenome job to the HPC cluster

Example index.csv file:

sampleid,read1,read2
AP01,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP01_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP01_R2.fq.gz
AP02,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP02_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP02_R2.fq.gz
AP03,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP03_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP03_R2.fq.gz
AP04,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP04_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP04_R2.fq.gz
AP06,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP06_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP06_R2.fq.gz
AP07,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP07_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP07_R2.fq.gz
AP08,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP08_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP08_R2.fq.gz
AP10,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP10_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP10_R2.fq.gz
AP11,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP11_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP11_R2.fq.gz
AP12,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP12_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP12_R2.fq.gz
AP13,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP13_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP13_R2.fq.gz
AP14,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP14_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP14_R2.fq.gz
AP15,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP15_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP15_R2.fq.gz
AP17,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP17_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP17_R2.fq.gz
AP18,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP18_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP18_R2.fq.gz
AP19,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP19_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP19_R2.fq.gz
AP20,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP20_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/AP20_R2.fq.gz
MC01,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC01_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC01_R2.fq.gz
MC02,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC02_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC02_R2.fq.gz
MC03,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC03_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC03_R2.fq.gz
MC04,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC04_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC04_R2.fq.gz
MC05,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC05_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC05_R2.fq.gz
MC06,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC06_R1.fq.gz,/work/APP_dp18app/nextflow/data/QCed_fastq_pairs/MC06_R2.fq.gz
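
With many samples, index.csv can be generated rather than typed. The sketch below assumes the naming convention used in the example (`<sampleid>_R1.fq.gz` and `<sampleid>_R2.fq.gz` in a single directory); `make_index` is a hypothetical helper, not part of ConsGenome. Samples without a matching R2 file are reported on stderr and left out.

```bash
#!/usr/bin/env bash
# make_index DIR > index.csv
# Hypothetical helper: writes a ConsGenome-style index.csv (sampleid,read1,read2)
# for paired FASTQ files named <sampleid>_R1.fq.gz / <sampleid>_R2.fq.gz in DIR.
make_index() {
  local dir=${1%/} r1 r2 id
  echo "sampleid,read1,read2"
  for r1 in "$dir"/*_R1.fq.gz; do
    [[ -e $r1 ]] || continue            # glob matched nothing
    id=$(basename "$r1" _R1.fq.gz)
    r2="$dir/${id}_R2.fq.gz"
    if [[ -e $r2 ]]; then
      echo "$id,$r1,$r2"
    else
      echo "warning: $id has no R2 file; skipped" >&2
    fi
  done
}
```

For example, `make_index /work/APP_dp18app/nextflow/data/QCed_fastq_pairs > index.csv`. Passing an absolute directory keeps the read paths absolute, as in the example above.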

Example nextflow.config file:

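params {
  outdir    = "results"     // output directory
  indexfile = "index.csv"   // sample sheet: sampleid,read1,read2
  genome    = "/work/APP_dp18app/nextflow/data/ref/Equus_caballus_EquCan3.0_GCG_MC2R_MC4R_POMC_ADIPOQ_genes.cds.fasta"   // reference used for mapping and consensus calling
  paired    = true          // reads are paired-end (read1 + read2)
}

process {
  // Memory for processes labelled 'mapping'
  withLabel: mapping {
    memory = 32.GB
  }
}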

Example launch.pbs script:

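#!/bin/bash -l
#PBS -N nfconsgenome
#PBS -l walltime=24:00:00
#PBS -l select=1:ncpus=1:mem=5gb

# Run from the submission directory, which should contain index.csv and nextflow.config
cd "$PBS_O_WORKDIR"

# JVM heap for the Nextflow head process; it must be exported for Nextflow to see it
export NXF_OPTS='-Xms1g -Xmx4g'
module load java

# Run the ConsGenome Nextflow pipeline
nextflow run /work/eresearch_bio/nextflow/nf-eresearch/nfconsgenome/main.nf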
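
Before submitting the job (for example with `qsub launch.pbs`), it can save a wait in the queue to check index.csv up front. The `check_index` function below is a hypothetical helper, not part of ConsGenome: it checks for the `sampleid,read1,read2` header, at most three fields per row, unique sample IDs and readable FASTQ files, and it allows an empty read2 field for single-end data. It needs bash 4 or later for the associative array.

```bash
#!/usr/bin/env bash
# check_index FILE
# Hypothetical pre-flight check of a ConsGenome-style index.csv.
# Prints one message per problem on stderr; returns 1 if any were found.
check_index() {
  local file=$1 status=0 header id r1 r2 extra f
  local -A seen=()

  if [[ ! -r $file ]]; then
    echo "cannot read $file" >&2
    return 1
  fi
  IFS= read -r header < "$file" || true
  if [[ $header != "sampleid,read1,read2" ]]; then
    echo "unexpected header: $header" >&2
    status=1
  fi

  # "|| [[ -n $id ]]" also handles a last line without a trailing newline
  while IFS=, read -r id r1 r2 extra || [[ -n $id ]]; do
    [[ -z $id ]] && continue                  # skip blank lines
    if [[ -n $extra || -z $r1 ]]; then
      echo "$id: expected sampleid,read1[,read2]" >&2
      status=1
      continue
    fi
    if [[ -n ${seen[$id]} ]]; then
      echo "$id: duplicate sample ID" >&2
      status=1
    fi
    seen[$id]=1
    # read2 may be empty for single-end data
    for f in "$r1" ${r2:+"$r2"}; do
      [[ -r $f ]] || { echo "$id: cannot read $f" >&2; status=1; }
    done
  done < <(tail -n +2 "$file")

  return "$status"
}
```

Used as `check_index index.csv && qsub launch.pbs`, the job is only submitted when the check passes.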
