Variant Call Format (VCF) is a text format used to store variant information relative to a reference sequence. It can represent single-nucleotide variants, insertions/deletions, copy number variants, and structural variants. BCF is the binary counterpart of VCF.
CHROM [chromosome]: chromosome name.
POS [position]: the position of the variant on the reference genome, counted from 1; for an indel (insertion/deletion), POS is the position of the first base of the indel as written in REF.
ID [identifier]: the name of the variant. If no name is available, '.' is used, indicating a novel variant.
REF [reference base(s)]: the base(s) on the reference chromosome; each must be one of A, C, G, T, N, where N denotes an undetermined base.
ALT [alternate base(s)]: the variant base(s) relative to the reference sequence; multiple alleles are separated by ','. Allowed symbols are A, C, G, T, N and *, case insensitive.
QUAL [quality]: Phred-scaled quality score indicating how likely it is that a variant exists at this site; the higher the value, the more likely the variant. Calculation: Phred value = -10 * log10(1 - p), where p is the probability that the variant exists.
FILTER [filter status]: the result of filtering with GATK or other methods. If the site passed, the value is "PASS"; if the variant is unreliable, the value is something other than "PASS" or "." (the names of the filters that failed).
INFO [additional information]: details about the variant, given as the keys listed below.
DP [read depth]: sample coverage at this position after some reads have been filtered out
DP4: number of high-quality bases supporting REF on the forward strand, REF on the reverse strand, ALT on the forward strand, and ALT on the reverse strand
MQ [mapping quality]: root-mean-square mapping quality of the reads covering the site
FQ: Phred-scaled probability of all samples being the same
AF1 [allele frequency]: maximum-likelihood estimate of the frequency of the first ALT allele
AC1 [allele count]: maximum-likelihood estimate of the count of the first ALT allele
AN [allele number]: total number of alleles in the called genotypes
IS: maximum number of reads supporting an indel, and the fraction of reads supporting the indel
AC [allele count]: number of times each ALT allele occurs in the called genotypes
G3: maximum-likelihood estimate of the genotype frequencies
HWE: P-value of the chi^2-based Hardy-Weinberg equilibrium test, computed from G3
CLR: log ratio of the genotype likelihoods with and without the constraint
UGT: the most probable unconstrained genotype configuration in the trio
CGT: the most probable constrained genotype configuration in the trio
PV4: four P-values, for strand bias, base quality (baseQ) bias, mapping quality (mapQ) bias, and tail distance bias respectively
INDEL: indicates that the variant at this position is an insertion/deletion
PC2: Phred-scaled probability that the non-reference allele frequency is larger (or smaller) in the samples of group 1 than in those of group 2
PCHI2: posterior-weighted chi^2 P-value for testing the association between the two groups of samples
QCHI2: Phred-scaled PCHI2
PR: number of permutations yielding a smaller PCHI2
QBD [quality by depth]: the variant quality relative to the sequencing depth
RPB [read position bias]: indicates bias in the position of the variant within the reads
MDV: maximum number of high-quality non-reference reads in a sample
VDB [variant distance bias]: variant distance bias, used to filter splice-site artefacts in RNA-seq data
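The INFO keys above are stored in a single column as 'key=value' pairs separated by ';', with flag keys such as INDEL appearing without a value. A minimal Python sketch of how such a column could be split is shown here; the function name parse_info and the example string are hypothetical and chosen for illustration only, and all values are left as strings rather than converted to numbers.

```python
def parse_info(info_field):
    """Split an INFO string such as 'DP=14;MQ=50;INDEL' into a dict."""
    info = {}
    if info_field in (".", ""):
        # '.' means that no INFO entries are present.
        return info
    for entry in info_field.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            # Comma-separated values (e.g. DP4, PV4) become lists of strings.
            info[key] = value.split(",") if "," in value else value
        else:
            # Keys without a value, such as INDEL, act as boolean flags.
            info[entry] = True
    return info


# Invented example string, not taken from a real file.
example = parse_info("INDEL;DP=14;DP4=3,2,5,4;MQ=50")
print(example)
```

Keeping the values as strings avoids guessing types; a real parser would normally take the type of each key from the header of the VCF file.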
GT [genotype]: the genotype of the sample. Two numbers are separated by '/', and together they represent the genotype of a diploid sample.
0 indicates that the sample carries the REF allele
1 indicates that the sample carries the first ALT allele
2 indicates that the sample carries the second ALT allele
0/0 indicates that the site is homozygous in the sample and identical to REF.
0/1 indicates that the site is heterozygous in the sample, carrying both the REF and the ALT allele.
1/1 indicates that the site is homozygous in the sample and identical to ALT.
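The genotype encoding above can be sketched in Python as follows. This is an illustrative sketch only: describe_genotype is a hypothetical helper name, and the handling of '|' (the separator VCF uses for phased genotypes) and of '.' (a missing call) goes slightly beyond the cases listed above.

```python
import re


def describe_genotype(gt, ref, alts):
    """Translate a GT string such as '0/1' into alleles and zygosity."""
    # '/' separates unphased alleles; '|' separates phased alleles.
    indices = re.split(r"[/|]", gt)
    if "." in indices:
        return None, "missing"
    # Index 0 is REF, index 1 the first ALT allele, index 2 the second, ...
    alleles_by_index = [ref] + list(alts)
    alleles = [alleles_by_index[int(i)] for i in indices]
    if len(set(indices)) > 1:
        zygosity = "heterozygous"
    elif indices[0] == "0":
        zygosity = "homozygous reference"
    else:
        zygosity = "homozygous variant"
    return alleles, zygosity


print(describe_genotype("0/1", "A", ["G"]))
print(describe_genotype("1/2", "A", ["G", "T"]))
```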
GQ [genotype quality]: the quality of the genotype call, as a Phred-scaled value indicating the probability that the called genotype is correct at this site; the higher the value, the more likely the genotype. Calculation: Phred value = -10 * log10(1 - p), where p is the probability that the genotype is correct.
GL: likelihoods of the three genotypes (RR, RA, AA), where R is the reference base and A is the variant base
DV: number of high-quality non-reference bases
SP: Phred-scaled strand bias P-value
PL [Phred-scaled likelihoods of the given genotypes]: the quality values of the three specified genotypes (0/0, 0/1, 1/1), whose probabilities sum to 1. The larger the value, the less likely the sample has that genotype. Calculation: Phred value = -10 * log10(p), where p is the probability of that genotype.
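The Phred formulas used for QUAL, GQ and PL can be checked with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, and pl_to_probabilities assumes that simply rescaling the three values so that they sum to 1 is an acceptable way to read PL as probabilities.

```python
import math


def phred_from_probability(p):
    """Phred value = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def probability_from_phred(q):
    """Inverse of the Phred formula: p = 10 ** (-q / 10)."""
    return 10.0 ** (-q / 10.0)


def pl_to_probabilities(pl):
    """Convert PL values for (0/0, 0/1, 1/1) into probabilities summing to 1.

    Assumption: the values are rescaled after conversion so the sum is 1.
    """
    raw = [probability_from_phred(value) for value in pl]
    total = sum(raw)
    return [r / total for r in raw]


# For QUAL and GQ the Phred value is computed from 1 - p, so a value of 30
# corresponds to a probability of 1 - 10 ** (-3) that the call is correct.
p_correct = 1.0 - probability_from_phred(30)
print(p_correct)

# Invented PL values: the genotype with the smallest PL is the most likely.
print(pl_to_probabilities([30, 0, 40]))
```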
FORMAT: an (optional) extensible list of the fields used to describe each sample
SAMPLEs: for each (optional) sample described in the file, the values of the fields listed in FORMAT
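Putting the columns together, one data line of a VCF file could be read as in the following Python sketch. It assumes tab-separated columns in the order described above, with the FORMAT keys and the per-sample values separated by ':'. The names FIXED_COLUMNS and parse_record, the sample name, and the example line are all invented for illustration and do not come from a real file.

```python
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_record(line, sample_names):
    """Parse one tab-separated VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    # Multiple ALT alleles are separated by ','.
    record["ALT"] = record["ALT"].split(",")
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    record["samples"] = {}
    if len(fields) > 8:
        # FORMAT lists the keys; each sample column lists the matching values.
        keys = fields[8].split(":")
        for name, column in zip(sample_names, fields[9:]):
            record["samples"][name] = dict(zip(keys, column.split(":")))
    return record


# Invented example line with one sample.
line = "chr1\t12345\t.\tA\tG\t48.5\tPASS\tDP=20;MQ=50\tGT:PL:GQ\t0/1:60,0,75:60"
record = parse_record(line, ["sample1"])
print(record["samples"]["sample1"]["GT"])
```

The INFO column is kept as a raw string here so that the sketch stays short; for real data an established VCF library is usually a safer choice than a hand-written parser.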
VCF File Details