Goals

  • Individual-level MAF: Assign minor allele frequency (MAF) information to a VCF file generated using the nf-core/sarek pipeline

  • Population-level MAF: Add population-based MAF to annotated variants using public databases such as MafDb.gnomAD and dbSNP

VCF files generated by the Nextflow nf-core/sarek pipeline

In the sarek pipeline, for example, VCF files generated by HaplotypeCaller can be found at:

results/VariantCalling/sampleID/HaplotypeCaller/HaplotypeCaller_sampleID.vcf

Annotated VCF files using Variant Effect Predictor (VEP) can be found at:

results/Annotation/sampleID/VEP/HaplotypeCaller_sampleID_VEP.vcf

Alternatively, if annotated using snpEff, they can be found at:

results/Annotation/sampleID/snpEff/HaplotypeCaller_sampleID_snpEff.vcf

Example of a VCF file annotated using snpEff. The metadata header lines are not shown, and the long INFO fields are truncated.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  FBS1-LNCAP-RNA
chr1    14542   rs1045833       A       G       71.84   .       AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=23.70;QD=23.95;SOR=2.833;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*133A>G|||||133|,G|downstream_gene_variant|MODIFIER|MIR68
chr1    14574   rs28503599      A       G       121.05  .       AC=2;AF=1.00;AN=2;DB;DP=10;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.52;QD=12.10;SOR=4.804;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*165A>G|||||165|,G|downstream_gene_variant|MODIFIER|MIR68
chr1    14599   rs531646671     T       A       301.02  .       AC=2;AF=1.00;AN=2;DB;DP=7;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.00;QD=25.36;SOR=4.174;ANN=A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*190T>A|||||190|,A|downstream_gene_variant|MODIFIER|MIR685
chr1    14604   rs541940975     A       G       356.05  .       AC=2;AF=1.00;AN=2;DB;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.23;QD=28.73;SOR=3.056;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*195A>G|||||195|,G|downstream_gene_variant|MODIFIER|MIR685
chr1    14610   .       T       C       356.05  .       AC=2;AF=1.00;AN=2;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.23;QD=30.97;SOR=3.056;ANN=C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*201T>C|||||201|,C|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG000
chr1    14653   rs62635297      C       T       1816.06 .       AC=2;AF=1.00;AN=2;DB;DP=67;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=34.03;QD=27.94;SOR=2.412;ANN=T|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*244C>T|||||244|,T|downstream_gene_variant|MODIFIER|MIR68
chr1    14677   rs201327123     G       A       926.64  .       AC=1;AF=0.500;AN=2;BaseQRankSum=2.83;DB;DP=113;ExcessHet=3.0103;FS=1.545;MLEAC=1;MLEAF=0.500;MQ=37.82;MQRankSum=-1.398e+00;QD=8.35;ReadPosRankSum=0.962;SOR=0.852;ANN=A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcrip
chr1    16257   rs11489794      G       C       49.64   .       AC=1;AF=0.500;AN=2;BaseQRankSum=-1.136e+00;DB;DP=23;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=34.10;MQRankSum=0.381;QD=2.16;ReadPosRankSum=0.00;SOR=1.022;ANN=C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript
chr1    16288   rs113141985     C       G       48.64   .       AC=1;AF=0.500;AN=2;BaseQRankSum=-1.150e+00;DB;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=32.24;MQRankSum=-1.350e-01;QD=6.08;ReadPosRankSum=-1.029e+00;SOR=0.307;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_
chr1    16298   rs62636498      C       T       112.14  .       AC=2;AF=1.00;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.25;QD=28.04;SOR=3.258;ANN=T|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*1889C>T|||||1889|,T|downstream_gene_variant|MODIFIER|MIR6
chr1    136048  rs371677125     C       T       125.14  .       AC=2;AF=1.00;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=40.00;QD=31.29;SOR=0.693;ANN=T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-153G>A|||||153|,T|downstream_gene_variant|MODIFIER|RP
chr1    136573  .       T       C       113.97  .       AC=2;AF=1.00;AN=2;DP=7;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=49.62;QD=19.00;SOR=1.329;ANN=C|TF_binding_site_variant|LOW|||Egr1|MA0162.2|||n.136573A>G||||||,C|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n
chr1    136962  rs373582709     C       T       118.68  .       AC=1;AF=0.500;AN=2;BaseQRankSum=-8.870e-01;DB;DP=8;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=50.36;MQRankSum=0.489;QD=16.95;ReadPosRankSum=0.489;SOR=0.330;ANN=T|TF_binding_site_variant|MODIFIER|||Egr1|MA0162.2|||n.136962G>A||||||,T|upstream_gene_variant|MODIFIER|RP1
chr1    137622  rs376555721     G       A       54.66   .       AC=1;AF=0.500;AN=2;BaseQRankSum=0.674;DB;DP=4;ExcessHet=3.0103;FS=6.021;MLEAC=1;MLEAF=0.500;MQ=20.00;MQRankSum=0.00;QD=13.67;ReadPosRankSum=-3.190e-01;SOR=2.788;ANN=A|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudo
chr1    137825  rs147252685     G       A       622.06  .       AC=2;AF=1.00;AN=2;DB;DP=26;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=36.46;QD=24.88;SOR=0.770;ANN=A|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-1930C>T|||||1930|,A|downstream_gene_variant|MODIFIER
chr1    138156  rs370691115     G       T       197.04  .       AC=2;AF=1.00;AN=2;DB;DP=8;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.00;QD=24.63;SOR=0.693;ANN=T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-2261C>A|||||2261|,T|upstream_gene_variant|MODIFIER|RP
chr1    138817  rs556938922     T       C       945.06  .       AC=2;AF=1.00;AN=2;DB;DP=45;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=35.16;QD=21.98;SOR=0.739;ANN=C|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-2922A>G|||||2922|,C|upstream_gene_variant|MODIFIER|R
chr1    184246  .       T       C       52.84   .       AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=41.80;QD=17.61;SOR=1.179;ANN=C|downstream_gene_variant|MODIFIER|FO538757.2|ENSG00000279928|transcript|ENST00000624431.1|protein_coding||c.*88T>C|||||88|,C|downstream_gene_variant|MODIFIER|FO538757.1|ENSG000002
chr1    185194  .       G       C       91.64   .       AC=1;AF=0.500;AN=2;BaseQRankSum=-1.345e+00;DP=10;ExcessHet=3.0103;FS=4.260;MLEAC=1;MLEAF=0.500;MQ=24.41;MQRankSum=-1.150e+00;QD=11.45;ReadPosRankSum=0.00;SOR=2.833;ANN=C|3_prime_UTR_variant|MODIFIER|FO538757.1|ENSG00000279457|transcript|ENST00000623083.3|protein_coding|11/11|c.*23
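
To read a single value such as AF out of the INFO column shown above, a small awk helper can be enough. This is a minimal sketch, not part of the sarek pipeline: the function name `info_field` is hypothetical, and it assumes a tab-separated VCF with INFO in column 8 and simple `KEY=value` entries.

```shell
# Print CHROM, POS and the value of one INFO key for every record.
# Hypothetical helper; usage: info_field AF < HaplotypeCaller_sampleID_snpEff.vcf
info_field() {
    awk -v key="$1" -F'\t' '
        /^#/ { next }                      # skip metadata and column header lines
        {
            n = split($8, kv, ";")         # column 8 is INFO
            val = "."                      # "." when the key is absent
            for (i = 1; i <= n; i++) {
                split(kv[i], pair, "=")
                if (pair[1] == key) { val = pair[2] }
            }
            print $1 "\t" $2 "\t" val
        }'
}
```

The key is matched exactly, so asking for `AC` does not pick up `MLEAC`.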

Adding MAF information using bcftools

The following commands use the bcftools fill-tags plugin to add MAF information to either annotated or non-annotated VCF files.

#non-annotated file:
bcftools +fill-tags HaplotypeCaller_sampleID.vcf > HaplotypeCaller_sampleID_tags.vcf

#VEP annotated file:
bcftools +fill-tags HaplotypeCaller_sampleID_VEP.vcf > HaplotypeCaller_sampleID_VEP_tags.vcf

#snpEff annotated file:
bcftools +fill-tags HaplotypeCaller_sampleID_snpEff.vcf > HaplotypeCaller_sampleID_snpEff_tags.vcf
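
If only the MAF tag is wanted rather than the plugin's full default tag set, the tag list can be restricted with the plugin option `-t`. The wrapper below is a sketch under assumptions: the function names `tagged_name` and `add_maf` are hypothetical, the input is assumed to end in `.vcf`, and bcftools with its plugins is assumed to be on the PATH.

```shell
# Derive the output name used on this page: foo.vcf -> foo_tags.vcf
tagged_name() {
    printf '%s\n' "${1%.vcf}_tags.vcf"
}

# Add only the MAF tag; plugin options go after the "--" separator.
add_maf() {
    in_vcf=$1
    out_vcf=$(tagged_name "$in_vcf")
    bcftools +fill-tags "$in_vcf" -o "$out_vcf" -- -t MAF
}

# Example call (not executed here):
# add_maf HaplotypeCaller_sampleID_snpEff.vcf
```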

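The output for the snpEff-annotated example shown earlier is listed below. As before, the metadata header is not shown and the long INFO fields are truncated, so any tags that follow ANN are not visible in this listing.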

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  FBS1-LNCAP-RNA
chr1    14542   rs1045833       A       G       71.84   .       AC=2;AF=1;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=23.7;QD=23.95;SOR=2.833;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*133A>G|||||133|,G|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG0
chr1    14574   rs28503599      A       G       121.05  .       AC=2;AF=1;AN=2;DB;DP=10;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22.52;QD=12.1;SOR=4.804;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*165A>G|||||165|,G|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG00
chr1    14599   rs531646671     T       A       301.02  .       AC=2;AF=1;AN=2;DB;DP=7;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22;QD=25.36;SOR=4.174;ANN=A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*190T>A|||||190|,A|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG00000
chr1    14604   rs541940975     A       G       356.05  .       AC=2;AF=1;AN=2;DB;DP=9;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22.23;QD=28.73;SOR=3.056;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*195A>G|||||195|,G|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG00
chr1    14610   .       T       C       356.05  .       AC=2;AF=1;AN=2;DP=9;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22.23;QD=30.97;SOR=3.056;ANN=C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*201T>C|||||201|,C|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG00000278267|t
chr1    14653   rs62635297      C       T       1816.06 .       AC=2;AF=1;AN=2;DB;DP=67;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=34.03;QD=27.94;SOR=2.412;ANN=T|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*244C>T|||||244|,T|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG0
chr1    14677   rs201327123     G       A       926.64  .       AC=1;AF=0.5;AN=2;BaseQRankSum=2.83;DB;DP=113;ExcessHet=3.0103;FS=1.545;MLEAC=1;MLEAF=0.5;MQ=37.82;MQRankSum=-1.398;QD=8.35;ReadPosRankSum=0.962;SOR=0.852;ANN=A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*26
chr1    16257   rs11489794      G       C       49.64   .       AC=1;AF=0.5;AN=2;BaseQRankSum=-1.136;DB;DP=23;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=34.1;MQRankSum=0.381;QD=2.16;ReadPosRankSum=0;SOR=1.022;ANN=C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*1848G>C||||
chr1    16288   rs113141985     C       G       48.64   .       AC=1;AF=0.5;AN=2;BaseQRankSum=-1.15;DB;DP=9;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=32.24;MQRankSum=-0.135;QD=6.08;ReadPosRankSum=-1.029;SOR=0.307;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*1879C>
chr1    16298   rs62636498      C       T       112.14  .       AC=2;AF=1;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22.25;QD=28.04;SOR=3.258;ANN=T|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*1889C>T|||||1889|,T|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG
chr1    136048  rs371677125     C       T       125.14  .       AC=2;AF=1;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=40;QD=31.29;SOR=0.693;ANN=T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-153G>A|||||153|,T|downstream_gene_variant|MODIFIER|RP11-34P13.14|E
chr1    136573  .       T       C       113.97  .       AC=2;AF=1;AN=2;DP=7;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=49.62;QD=19;SOR=1.329;ANN=C|TF_binding_site_variant|LOW|||Egr1|MA0162.2|||n.136573A>G||||||,C|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-678A>G|||||
chr1    136962  rs373582709     C       T       118.68  .       AC=1;AF=0.5;AN=2;BaseQRankSum=-0.887;DB;DP=8;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=50.36;MQRankSum=0.489;QD=16.95;ReadPosRankSum=0.489;SOR=0.33;ANN=T|TF_binding_site_variant|MODIFIER|||Egr1|MA0162.2|||n.136962G>A||||||,T|upstream_gene_variant|MODIFIER|RP11-34P13.15|EN
chr1    137622  rs376555721     G       A       54.66   .       AC=1;AF=0.5;AN=2;BaseQRankSum=0.674;DB;DP=4;ExcessHet=3.0103;FS=6.021;MLEAC=1;MLEAF=0.5;MQ=20;MQRankSum=0;QD=13.67;ReadPosRankSum=-0.319;SOR=2.788;ANN=A|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-1727C
chr1    137825  rs147252685     G       A       622.06  .       AC=2;AF=1;AN=2;DB;DP=26;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=36.46;QD=24.88;SOR=0.77;ANN=A|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-1930C>T|||||1930|,A|downstream_gene_variant|MODIFIER|RP11-34P13
chr1    138156  rs370691115     G       T       197.04  .       AC=2;AF=1;AN=2;DB;DP=8;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22;QD=24.63;SOR=0.693;ANN=T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-2261C>A|||||2261|,T|upstream_gene_variant|MODIFIER|RP11-34P13.16|E
chr1    138817  rs556938922     T       C       945.06  .       AC=2;AF=1;AN=2;DB;DP=45;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=35.16;QD=21.98;SOR=0.739;ANN=C|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-2922A>G|||||2922|,C|upstream_gene_variant|MODIFIER|RP11-34P13.
chr1    184246  .       T       C       52.84   .       AC=2;AF=1;AN=2;DP=4;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=41.8;QD=17.61;SOR=1.179;ANN=C|downstream_gene_variant|MODIFIER|FO538757.2|ENSG00000279928|transcript|ENST00000624431.1|protein_coding||c.*88T>C|||||88|,C|downstream_gene_variant|MODIFIER|FO538757.1|ENSG00000279457|tran
chr1    185194  .       G       C       91.64   .       AC=1;AF=0.5;AN=2;BaseQRankSum=-1.345;DP=10;ExcessHet=3.0103;FS=4.26;MLEAC=1;MLEAF=0.5;MQ=24.41;MQRankSum=-1.15;QD=11.45;ReadPosRankSum=0;SOR=2.833;ANN=C|3_prime_UTR_variant|MODIFIER|FO538757.1|ENSG00000279457|transcript|ENST00000623083.3|protein_coding|11/11|c.*23C>G|||||23|,C|dow
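
As a cross-check on the tagged file, the individual-level MAF of a biallelic site can be recomputed from the AC and AN values in the listing: AF = AC / AN and MAF = min(AF, 1 - AF). The awk sketch below assumes this biallelic definition and a tab-separated VCF with INFO in column 8; the function name `maf_from_counts` is hypothetical, and it is not a description of how the plugin computes its tag internally.

```shell
# Recompute MAF per record from AC and AN in the INFO column.
# Hypothetical helper; usage: maf_from_counts < HaplotypeCaller_sampleID_snpEff_tags.vcf
maf_from_counts() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        {
            ac = ""; an = ""
            n = split($8, kv, ";")             # column 8 is INFO
            for (i = 1; i <= n; i++) {
                split(kv[i], pair, "=")
                if (pair[1] == "AC") ac = pair[2]
                if (pair[1] == "AN") an = pair[2]
            }
            # skip records without counts and multiallelic records (AC=1,1)
            if (ac == "" || an == "" || ac ~ /,/ || an + 0 == 0) next
            af = ac / an
            maf = (af > 0.5) ? 1 - af : af     # min(AF, 1 - AF)
            printf "%s\t%s\t%g\n", $1, $2, maf
        }'
}
```

With a single diploid sample (AN=2), this can only yield 0 (AC=2) or 0.5 (AC=1), which is why population-level MAF from public databases is the second goal of this page.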
