VCF File Details

Source: Internet
Author: User

Variant Call Format (VCF) is a text format for storing sequence variation relative to a reference genome. It can represent single-nucleotide variants, insertions/deletions (indels), copy number variation, and structural variation. BCF is the binary version of the VCF format.

CHROM [chromosome]: chromosome name.

POS [position]: 1-based position of the variant on the reference genome. For an indel (insertion/deletion), POS is the position of the first base of the REF string.

ID [identifier]: the name of the variant. If there is none, '.' is used, which usually indicates a novel variant.

REF [reference base(s)]: the base(s) on the reference sequence. Each base must be one of A, C, G, T, N, where N denotes an undetermined base.

ALT [alternate base(s)]: the base(s) that differ from the reference sequence. Multiple alternate alleles are separated by ','. Allowed symbols are A, C, G, T, N, and *; letter case is not significant.

QUAL [quality]: Phred-scaled quality score indicating how likely it is that a variant exists at this site; the higher the value, the more likely the variant. Calculation: Phred value = -10 * log10(1 - P), where P is the probability that the variant exists.
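The Phred formula for QUAL can be checked with a few lines of Python. This is only a sketch: the function names `prob_to_phred` and `phred_to_prob` are made up for illustration, and the logarithm is taken to be base 10, as is usual for Phred scores.

```python
import math

def prob_to_phred(p_variant: float) -> float:
    """Return -10 * log10(1 - P), where P is the probability that the variant exists."""
    p_error = 1.0 - p_variant
    if p_error <= 0.0:
        return math.inf  # zero error probability gives an unbounded score
    return -10.0 * math.log10(p_error)

def phred_to_prob(qual: float) -> float:
    """Invert the formula: P = 1 - 10 ** (-QUAL / 10)."""
    return 1.0 - 10.0 ** (-qual / 10.0)

for p in (0.9, 0.99, 0.999):
    print(f"P={p}: QUAL={prob_to_phred(p):.1f}")
```

Each additional 10 points of QUAL corresponds to a tenfold drop in the probability that the call is wrong.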

FILTER [filter status]: the result of filtering with GATK or another tool. If the site passes all filters, the value is "PASS"; if the variant is unreliable, the value is something other than "PASS" or ".". A "." means that no filtering has been applied.

INFO [additional information]: details about the variant, given as key=value pairs separated by ';'.

DP [read depth]: coverage at this position after some reads have been filtered out.

DP4: the number of high-quality bases supporting REF on the forward strand, REF on the reverse strand, ALT on the forward strand, and ALT on the reverse strand.

MQ [mapping quality]: root-mean-square (RMS) mapping quality of the reads covering the site.

FQ: Phred-scaled probability that all samples are the same.

AF1 [allele frequency]: AF is the frequency of an allele; AF1 is the maximum-likelihood estimate of the frequency of the first ALT allele.

AC1 [allele count]: AC is the count of an allele; AC1 is the maximum-likelihood estimate of the count of the first ALT allele.

AN [allele number]: the total number of alleles.

IS: the maximum number of reads supporting an indel, and the fraction of reads that support it.

AC [allele count]: the number of times the ALT allele occurs in the called genotypes.

G3: maximum-likelihood (ML) estimate of the genotype frequencies.

HWE: P-value of the chi^2-based Hardy-Weinberg equilibrium (HWE) test, computed from G3.

CLR: log ratio of the genotype likelihoods with and without the constraint.

UGT: the most probable unconstrained genotype configuration in the trio.

CGT: the most probable constrained genotype configuration in the trio.

PV4: four P-values, for strand bias, base quality (baseQ) bias, mapping quality (mapQ) bias, and tail distance bias, respectively.

INDEL: indicates that the variant at this position is an insertion/deletion.

PC2: Phred-scaled probability that the non-reference allele frequency is larger (or smaller) in group 1 than in group 2.

PCHI2: posterior-weighted chi^2 P-value for testing the association between the two groups of samples.

QCHI2: Phred-scaled PCHI2.

PR: the number of permutations that yield a smaller PCHI2.

QBD [quality by depth]: quality normalized by sequencing depth (QUAL divided by the number of reads).

RPB [read position bias]: a measure of bias in where the variant falls within the reads.

MDV: the maximum number of high-quality non-reference reads in any sample.

VDB [variant distance bias]: a bias measure used to filter splice-site artefacts in RNA-seq data.
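As a rough illustration of how the INFO keys above are read, the sketch below splits an INFO string into a dictionary and derives the fraction of ALT-supporting bases from DP4. The helper `parse_info` and the example INFO value are invented for this illustration, and the DP4 order is assumed to be ref-forward, ref-reverse, alt-forward, alt-reverse, as written by samtools.

```python
def parse_info(info: str) -> dict:
    """Split an INFO string into a dict; keys without '=' are stored as True flags."""
    fields = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

# Made-up INFO value, for illustration only
info = parse_info("DP=14;AF1=0.5;AC1=1;DP4=3,4,2,5;MQ=20;FQ=23;INDEL")

# Assumed DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse
ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
alt_fraction = (alt_fwd + alt_rev) / (ref_fwd + ref_rev + alt_fwd + alt_rev)

print(info["DP"], info["INDEL"], round(alt_fraction, 2))
```

Values are kept as strings here; a fuller parser would use the Type declared in the file's ##INFO header lines to convert them.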

GT [genotype]: the genotype of the sample. Two numbers separated by '/' give the two alleles of a diploid sample.

0 indicates the REF allele.

1 indicates the first ALT allele.

2 indicates the second ALT allele.

0/0 indicates that the site is homozygous in the sample and matches REF.

0/1 indicates that the site is heterozygous in the sample, carrying both the REF allele and the variant allele.

1/1 indicates that the site is homozygous in the sample and matches the variant.
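The genotype codes above can be turned into labels with a small helper. This is a minimal sketch: `classify_genotype` is a hypothetical name, only diploid genotypes are handled, and the '|' separator (used in VCF for phased genotypes) and '.' (a missing allele) are accepted in addition to what the text describes.

```python
def classify_genotype(gt: str) -> str:
    """Label a diploid GT string as homozygous reference, heterozygous, homozygous variant, or missing."""
    # '/' separates unphased alleles; '|' (phased) is treated the same way here
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected a diploid genotype, got {gt!r}")
    if "." in alleles:
        return "missing"
    a, b = (int(x) for x in alleles)
    if a != b:
        return "heterozygous"
    return "homozygous reference" if a == 0 else "homozygous variant"

for gt in ("0/0", "0/1", "1/1", "1/2"):
    print(gt, "->", classify_genotype(gt))
```

Note that 1/2 is reported as heterozygous: the sample carries two different ALT alleles and no REF allele.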

GQ [genotype quality]: Phred-scaled quality of the genotype call, indicating how likely the called genotype is at this site; the higher the value, the more likely the genotype. Calculation: Phred value = -10 * log10(1 - P), where P is the probability that the genotype is correct.

GL: likelihoods of the three genotypes (RR, RA, AA), where R is the reference base and A is the variant base.

DV: the number of high-quality non-reference bases.

SP: Phred-scaled strand bias P-value.

PL [provides the likelihoods of the given genotypes]: Phred-scaled likelihoods of the three genotypes (0/0, 0/1, 1/1); the underlying probabilities of the three genotypes sum to 1. The larger the value, the less likely it is that the sample has that genotype. Phred value = -10 * log10(P), where P is the probability of the genotype.
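To make the PL scale concrete, the sketch below converts a PL triple back to probabilities by inverting the Phred formula and normalizing so that the three values sum to 1. The function name `pl_to_probabilities` and the example PL values are made up, and the normalization assumes no prior preference among the genotypes.

```python
def pl_to_probabilities(pl_values):
    """Convert Phred-scaled likelihoods to probabilities that sum to 1."""
    raw = [10.0 ** (-pl / 10.0) for pl in pl_values]
    total = sum(raw)
    return [r / total for r in raw]

# Made-up PL values for the genotypes 0/0, 0/1, 1/1
pl = [30, 0, 40]
probs = pl_to_probabilities(pl)

# The genotype with the smallest PL gets the largest probability
best = max(range(len(probs)), key=probs.__getitem__)
print(["0/0", "0/1", "1/1"][best], [round(p, 4) for p in probs])
```

In this example the smallest PL belongs to 0/1, so the heterozygous genotype is the most likely call.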

FORMAT: an (optional) extensible list of fields used to describe each sample.

SAMPLEs: for each (optional) sample described in the file, the values of the fields listed in FORMAT.
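Putting the columns together, one data line of a VCF file can be split roughly as follows. This is a sketch rather than a full parser: `parse_record`, the example line, and the sample name `sample1` are invented for illustration. It relies on VCF data lines being tab-separated and on the FORMAT and sample columns being ':'-separated.

```python
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

def parse_record(line: str, sample_names):
    """Split one tab-separated VCF data line into fixed columns and per-sample fields."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, parts[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # several ALT alleles are comma-separated
    record["SAMPLES"] = {}
    if len(parts) > 8:
        # FORMAT names the fields; each sample column lists values in the same order
        format_keys = parts[8].split(":")
        for name, column in zip(sample_names, parts[9:]):
            record["SAMPLES"][name] = dict(zip(format_keys, column.split(":")))
    return record

# Made-up record and sample name, for illustration only
line = "chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=14\tGT:GQ:PL\t0/1:45:60,0,80"
rec = parse_record(line, ["sample1"])
print(rec["POS"], rec["ALT"], rec["SAMPLES"]["sample1"]["GT"])
```

In a real file the sample names come from the #CHROM header line; for production work an existing library is usually a better choice than hand-written parsing.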
