Goals
Individual-level MAF: Assign minor allele frequency (MAF) information to a VCF file generated using the nf-core/sarek pipeline.
Population-level MAF: Add population-based MAF to annotated variants using public resources such as MafDb.gnomAD and dbSNP.
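For a biallelic site, the alternate allele frequency is AF = AC/AN, and the minor allele frequency is the smaller of AF and 1 - AF. The sketch below illustrates this arithmetic with awk. The function name maf_from_counts is hypothetical and is not part of bcftools or sarek. It also shows why, for a single-sample VCF, the individual-level MAF can only take a few values: a heterozygous call (AC=1, AN=2) gives 0.5 and a homozygous-alternate call (AC=2, AN=2) gives 0.

```shell
# maf_from_counts AC AN
# Hypothetical helper (not part of bcftools): prints the minor allele
# frequency of a biallelic site from its alternate allele count (AC)
# and total number of called alleles (AN).
maf_from_counts() {
    awk -v ac="$1" -v an="$2" 'BEGIN {
        if (an == 0) { print "."; exit }   # no called alleles: MAF undefined
        af = ac / an                       # alternate allele frequency
        maf = (af <= 0.5) ? af : 1 - af    # minor allele = the rarer of REF/ALT
        print maf
    }'
}

# Single-sample examples, matching genotypes seen in the example VCF records:
maf_from_counts 1 2   # heterozygous call: prints 0.5
maf_from_counts 2 2   # homozygous-alternate call: prints 0
```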
VCF files generated by the Nextflow nf-core/sarek pipeline
In the sarek pipeline, for example, VCF files generated by HaplotypeCaller can be found at:
results/VariantCalling/sampleID/HaplotypeCaller/HaplotypeCaller_sampleID.vcf
VCF files annotated with Variant Effect Predictor (VEP) can be found at:
results/Annotation/sampleID/VEP/HaplotypeCaller_sampleID_VEP.vcf
Alternatively, VCF files annotated with snpEff can be found at:
results/Annotation/sampleID/snpEff/HaplotypeCaller_sampleID_snpEff.vcf
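When processing several samples, it can help to build these paths programmatically. The function below is a hypothetical helper (sarek_vcf_path is not part of sarek) that assumes exactly the results/ layout shown above; other sarek versions may organize their output directory differently, so check your own results tree first.

```shell
# sarek_vcf_path SAMPLE [VEP|snpEff]
# Hypothetical helper: prints the expected HaplotypeCaller VCF path for a
# sample, assuming the results/ layout described above.
sarek_vcf_path() {
    case "${2:-}" in
        "")          # no annotation tool given: raw HaplotypeCaller output
            echo "results/VariantCalling/${1}/HaplotypeCaller/HaplotypeCaller_${1}.vcf" ;;
        VEP|snpEff)  # annotated output lives under results/Annotation/
            echo "results/Annotation/${1}/${2}/HaplotypeCaller_${1}_${2}.vcf" ;;
        *)
            echo "unknown annotation tool: ${2}" >&2
            return 1 ;;
    esac
}

# Example: path of the snpEff-annotated VCF for sample "sampleID"
sarek_vcf_path sampleID snpEff
```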
Example of a VCF file annotated using snpEff. The meta-information header lines are not shown, and long lines are truncated.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT FBS1-LNCAP-RNA
chr1 14542 rs1045833 A G 71.84 . AC=2;AF=1.00;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=23.70;QD=23.95;SOR=2.833;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*133A>G|||||133|,G|downstream_gene_variant|MODIFIER|MIR68
chr1 14574 rs28503599 A G 121.05 . AC=2;AF=1.00;AN=2;DB;DP=10;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.52;QD=12.10;SOR=4.804;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*165A>G|||||165|,G|downstream_gene_variant|MODIFIER|MIR68
chr1 14599 rs531646671 T A 301.02 . AC=2;AF=1.00;AN=2;DB;DP=7;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.00;QD=25.36;SOR=4.174;ANN=A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*190T>A|||||190|,A|downstream_gene_variant|MODIFIER|MIR685
chr1 14604 rs541940975 A G 356.05 . AC=2;AF=1.00;AN=2;DB;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.23;QD=28.73;SOR=3.056;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*195A>G|||||195|,G|downstream_gene_variant|MODIFIER|MIR685
chr1 14610 . T C 356.05 . AC=2;AF=1.00;AN=2;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.23;QD=30.97;SOR=3.056;ANN=C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*201T>C|||||201|,C|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG000
chr1 14653 rs62635297 C T 1816.06 . AC=2;AF=1.00;AN=2;DB;DP=67;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=34.03;QD=27.94;SOR=2.412;ANN=T|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*244C>T|||||244|,T|downstream_gene_variant|MODIFIER|MIR68
chr1 14677 rs201327123 G A 926.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.83;DB;DP=113;ExcessHet=3.0103;FS=1.545;MLEAC=1;MLEAF=0.500;MQ=37.82;MQRankSum=-1.398e+00;QD=8.35;ReadPosRankSum=0.962;SOR=0.852;ANN=A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcrip
chr1 16257 rs11489794 G C 49.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.136e+00;DB;DP=23;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=34.10;MQRankSum=0.381;QD=2.16;ReadPosRankSum=0.00;SOR=1.022;ANN=C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript
chr1 16288 rs113141985 C G 48.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.150e+00;DB;DP=9;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=32.24;MQRankSum=-1.350e-01;QD=6.08;ReadPosRankSum=-1.029e+00;SOR=0.307;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_
chr1 16298 rs62636498 C T 112.14 . AC=2;AF=1.00;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.25;QD=28.04;SOR=3.258;ANN=T|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*1889C>T|||||1889|,T|downstream_gene_variant|MODIFIER|MIR6
chr1 136048 rs371677125 C T 125.14 . AC=2;AF=1.00;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=40.00;QD=31.29;SOR=0.693;ANN=T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-153G>A|||||153|,T|downstream_gene_variant|MODIFIER|RP
chr1 136573 . T C 113.97 . AC=2;AF=1.00;AN=2;DP=7;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=49.62;QD=19.00;SOR=1.329;ANN=C|TF_binding_site_variant|LOW|||Egr1|MA0162.2|||n.136573A>G||||||,C|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n
chr1 136962 rs373582709 C T 118.68 . AC=1;AF=0.500;AN=2;BaseQRankSum=-8.870e-01;DB;DP=8;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=50.36;MQRankSum=0.489;QD=16.95;ReadPosRankSum=0.489;SOR=0.330;ANN=T|TF_binding_site_variant|MODIFIER|||Egr1|MA0162.2|||n.136962G>A||||||,T|upstream_gene_variant|MODIFIER|RP1
chr1 137622 rs376555721 G A 54.66 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.674;DB;DP=4;ExcessHet=3.0103;FS=6.021;MLEAC=1;MLEAF=0.500;MQ=20.00;MQRankSum=0.00;QD=13.67;ReadPosRankSum=-3.190e-01;SOR=2.788;ANN=A|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudo
chr1 137825 rs147252685 G A 622.06 . AC=2;AF=1.00;AN=2;DB;DP=26;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=36.46;QD=24.88;SOR=0.770;ANN=A|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-1930C>T|||||1930|,A|downstream_gene_variant|MODIFIER
chr1 138156 rs370691115 G T 197.04 . AC=2;AF=1.00;AN=2;DB;DP=8;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=22.00;QD=24.63;SOR=0.693;ANN=T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-2261C>A|||||2261|,T|upstream_gene_variant|MODIFIER|RP
chr1 138817 rs556938922 T C 945.06 . AC=2;AF=1.00;AN=2;DB;DP=45;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=35.16;QD=21.98;SOR=0.739;ANN=C|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-2922A>G|||||2922|,C|upstream_gene_variant|MODIFIER|R
chr1 184246 . T C 52.84 . AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=41.80;QD=17.61;SOR=1.179;ANN=C|downstream_gene_variant|MODIFIER|FO538757.2|ENSG00000279928|transcript|ENST00000624431.1|protein_coding||c.*88T>C|||||88|,C|downstream_gene_variant|MODIFIER|FO538757.1|ENSG000002
chr1 185194 . G C 91.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.345e+00;DP=10;ExcessHet=3.0103;FS=4.260;MLEAC=1;MLEAF=0.500;MQ=24.41;MQRankSum=-1.150e+00;QD=11.45;ReadPosRankSum=0.00;SOR=2.833;ANN=C|3_prime_UTR_variant|MODIFIER|FO538757.1|ENSG00000279457|transcript|ENST00000623083.3|protein_coding|11/11|c.*23
Adding MAF information using bcftools
The following bcftools +fill-tags commands add MAF information, together with other allele-count tags such as AC, AN and AF, to either annotated or non-annotated VCF files.
# Non-annotated file:
bcftools +fill-tags HaplotypeCaller_sampleID.vcf > HaplotypeCaller_sampleID_tags.vcf
# VEP-annotated file:
bcftools +fill-tags HaplotypeCaller_sampleID_VEP.vcf > HaplotypeCaller_sampleID_VEP_tags.vcf
# snpEff-annotated file:
bcftools +fill-tags HaplotypeCaller_sampleID_snpEff.vcf > HaplotypeCaller_sampleID_snpEff_tags.vcf
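Called without plugin options, +fill-tags fills a default set of tags that includes AC, AN, AF and MAF (the exact set depends on the bcftools version). If only allele frequency and MAF are wanted, plugin options can be passed after the "--" separator; for example, -t AF,MAF limits the output to those two tags. The sketch below wraps this in a hypothetical function, add_maf_tags, that derives the same _tags.vcf output name used above; it assumes plain (uncompressed) .vcf input.

```shell
# add_maf_tags INPUT.vcf
# Hypothetical wrapper: writes INPUT_tags.vcf next to INPUT.vcf, filling only
# the AF and MAF INFO tags. Plugin options (-t) must follow the "--" separator;
# general bcftools options (-O, -o) come before it.
add_maf_tags() {
    input=$1
    output=${input%.vcf}_tags.vcf      # e.g. X_snpEff.vcf -> X_snpEff_tags.vcf
    bcftools +fill-tags "$input" -Ov -o "$output" -- -t AF,MAF
}

# Usage (uncomment to process the three files from the commands above):
# for f in HaplotypeCaller_sampleID.vcf \
#          HaplotypeCaller_sampleID_VEP.vcf \
#          HaplotypeCaller_sampleID_snpEff.vcf; do
#     add_maf_tags "$f"
# done
```

Replacing -Ov with -Oz writes bgzip-compressed output, in which case the output name should end in .vcf.gz instead.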
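For the snpEff-annotated example shown above, the output of +fill-tags is shown below. Note that bcftools rewrites existing numeric values in a compact form (for example, AF=1.00 becomes AF=1 and MQRankSum=-1.398e+00 becomes MQRankSum=-1.398), and that newly added tags such as MAF are appended after the existing INFO fields, so they fall within the truncated part of these lines.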
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT FBS1-LNCAP-RNA
chr1 14542 rs1045833 A G 71.84 . AC=2;AF=1;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=23.7;QD=23.95;SOR=2.833;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*133A>G|||||133|,G|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG0
chr1 14574 rs28503599 A G 121.05 . AC=2;AF=1;AN=2;DB;DP=10;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22.52;QD=12.1;SOR=4.804;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*165A>G|||||165|,G|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG00
chr1 14599 rs531646671 T A 301.02 . AC=2;AF=1;AN=2;DB;DP=7;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22;QD=25.36;SOR=4.174;ANN=A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*190T>A|||||190|,A|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG00000
chr1 14604 rs541940975 A G 356.05 . AC=2;AF=1;AN=2;DB;DP=9;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22.23;QD=28.73;SOR=3.056;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*195A>G|||||195|,G|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG00
chr1 14610 . T C 356.05 . AC=2;AF=1;AN=2;DP=9;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22.23;QD=30.97;SOR=3.056;ANN=C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*201T>C|||||201|,C|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG00000278267|t
chr1 14653 rs62635297 C T 1816.06 . AC=2;AF=1;AN=2;DB;DP=67;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=34.03;QD=27.94;SOR=2.412;ANN=T|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*244C>T|||||244|,T|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG0
chr1 14677 rs201327123 G A 926.64 . AC=1;AF=0.5;AN=2;BaseQRankSum=2.83;DB;DP=113;ExcessHet=3.0103;FS=1.545;MLEAC=1;MLEAF=0.5;MQ=37.82;MQRankSum=-1.398;QD=8.35;ReadPosRankSum=0.962;SOR=0.852;ANN=A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*26
chr1 16257 rs11489794 G C 49.64 . AC=1;AF=0.5;AN=2;BaseQRankSum=-1.136;DB;DP=23;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=34.1;MQRankSum=0.381;QD=2.16;ReadPosRankSum=0;SOR=1.022;ANN=C|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*1848G>C||||
chr1 16288 rs113141985 C G 48.64 . AC=1;AF=0.5;AN=2;BaseQRankSum=-1.15;DB;DP=9;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=32.24;MQRankSum=-0.135;QD=6.08;ReadPosRankSum=-1.029;SOR=0.307;ANN=G|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*1879C>
chr1 16298 rs62636498 C T 112.14 . AC=2;AF=1;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22.25;QD=28.04;SOR=3.258;ANN=T|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*1889C>T|||||1889|,T|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG
chr1 136048 rs371677125 C T 125.14 . AC=2;AF=1;AN=2;DB;DP=4;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=40;QD=31.29;SOR=0.693;ANN=T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-153G>A|||||153|,T|downstream_gene_variant|MODIFIER|RP11-34P13.14|E
chr1 136573 . T C 113.97 . AC=2;AF=1;AN=2;DP=7;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=49.62;QD=19;SOR=1.329;ANN=C|TF_binding_site_variant|LOW|||Egr1|MA0162.2|||n.136573A>G||||||,C|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-678A>G|||||
chr1 136962 rs373582709 C T 118.68 . AC=1;AF=0.5;AN=2;BaseQRankSum=-0.887;DB;DP=8;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=50.36;MQRankSum=0.489;QD=16.95;ReadPosRankSum=0.489;SOR=0.33;ANN=T|TF_binding_site_variant|MODIFIER|||Egr1|MA0162.2|||n.136962G>A||||||,T|upstream_gene_variant|MODIFIER|RP11-34P13.15|EN
chr1 137622 rs376555721 G A 54.66 . AC=1;AF=0.5;AN=2;BaseQRankSum=0.674;DB;DP=4;ExcessHet=3.0103;FS=6.021;MLEAC=1;MLEAF=0.5;MQ=20;MQRankSum=0;QD=13.67;ReadPosRankSum=-0.319;SOR=2.788;ANN=A|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-1727C
chr1 137825 rs147252685 G A 622.06 . AC=2;AF=1;AN=2;DB;DP=26;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=36.46;QD=24.88;SOR=0.77;ANN=A|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-1930C>T|||||1930|,A|downstream_gene_variant|MODIFIER|RP11-34P13
chr1 138156 rs370691115 G T 197.04 . AC=2;AF=1;AN=2;DB;DP=8;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22;QD=24.63;SOR=0.693;ANN=T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-2261C>A|||||2261|,T|upstream_gene_variant|MODIFIER|RP11-34P13.16|E
chr1 138817 rs556938922 T C 945.06 . AC=2;AF=1;AN=2;DB;DP=45;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=35.16;QD=21.98;SOR=0.739;ANN=C|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149.2|processed_pseudogene||n.-2922A>G|||||2922|,C|upstream_gene_variant|MODIFIER|RP11-34P13.
chr1 184246 . T C 52.84 . AC=2;AF=1;AN=2;DP=4;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=41.8;QD=17.61;SOR=1.179;ANN=C|downstream_gene_variant|MODIFIER|FO538757.2|ENSG00000279928|transcript|ENST00000624431.1|protein_coding||c.*88T>C|||||88|,C|downstream_gene_variant|MODIFIER|FO538757.1|ENSG00000279457|tran
chr1 185194 . G C 91.64 . AC=1;AF=0.5;AN=2;BaseQRankSum=-1.345;DP=10;ExcessHet=3.0103;FS=4.26;MLEAC=1;MLEAF=0.5;MQ=24.41;MQRankSum=-1.15;QD=11.45;ReadPosRankSum=0;SOR=2.833;ANN=C|3_prime_UTR_variant|MODIFIER|FO538757.1|ENSG00000279457|transcript|ENST00000623083.3|protein_coding|11/11|c.*23C>G|||||23|,C|dow
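Because the added tags sit at the end of long INFO fields, they are easiest to inspect by extracting them. With bcftools installed, bcftools query -f '%CHROM\t%POS\t%ID\t%INFO/MAF\n' HaplotypeCaller_sampleID_snpEff_tags.vcf prints one line per record. Where bcftools is not at hand, the awk sketch below does the same for an uncompressed VCF; the function name extract_maf is hypothetical, and it assumes the tag appears in the INFO column (column 8) exactly as MAF=value.

```shell
# extract_maf FILE.vcf
# Hypothetical helper: prints CHROM, POS, ID and the INFO/MAF value of each
# record as tab-separated columns; "." is printed when MAF is absent.
extract_maf() {
    awk 'BEGIN { FS = OFS = "\t" }
        /^#/ { next }                       # skip meta-information and header lines
        {
            maf = "."
            n = split($8, fields, ";")      # INFO is column 8, keys separated by ";"
            for (i = 1; i <= n; i++)
                if (fields[i] ~ /^MAF=/) {  # exact key match, so AF= and MLEAF= are ignored
                    maf = substr(fields[i], 5)
                    break
                }
            print $1, $2, $3, maf
        }' "$1"
}

# Example, using the output file from the snpEff command above:
# extract_maf HaplotypeCaller_sampleID_snpEff_tags.vcf
```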