GFF3 is the current standard for GFF annotation files. Each line in the file describes one genomic feature and is divided into 9 tab-separated columns.
In order:
1. Reference sequence
The sequence that the annotation refers to, such as a chromosome, clone, or fragment. A file can contain multiple reference sequences.
The ID must not begin with '>' and must not contain spaces.
2. Source
The source of the annotation. If unknown, use a dot (.) instead.
3. Type
The type of the feature. It is recommended to use a name that follows the SO convention (Sequence Ontology, see [[Sequence Ontology Project]]), such as gene, repeat_region, exon, or CDS.
4. Start position
The position on the reference sequence where the feature begins. Counting starts from 1.
5. End position
The position on the reference sequence where the feature ends. It is greater than or equal to the start value.
6. Score
For features that can be quantified, a numeric value can be set here to indicate their degree or strength. If empty, use a dot (.) instead.
7. Strand
"+" means the forward strand, "-" means the reverse strand, and "." indicates that no strand needs to be specified.
8. Phase
For CDS features that encode protein, this column specifies where the next codon begins. It can be 0, 1, or 2, which is the number of bases to skip from the start of the feature before reaching the first base of the next codon.
For other feature types, use a dot (.) instead.
9. Attributes
A list of attributes for the feature. The format is "tag=value". Different attributes are separated by semicolons. Spaces are allowed, but the characters "," "=" and ";" must be URL-escaped (following the URL escaping rules), and a tab must be written as "%09". All tags that start with an uppercase letter are reserved for predefined use, while tags that start with a lowercase letter can be used freely by each application.
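The nine columns described above can be split apart with a few lines of Python. This is only a minimal sketch: the names `Gff3Record` and `parse_gff3_line` and the sample line are made up for illustration, and the attributes column is left as a raw string.

```python
from typing import NamedTuple, Optional


class Gff3Record(NamedTuple):
    """One feature line; field names are illustrative, not from a library."""
    seqid: str
    source: str
    feature_type: str
    start: int              # 1-based start position
    end: int                # end position, expected to be >= start
    score: Optional[float]  # None when the column holds "."
    strand: str             # "+", "-" or "."
    phase: Optional[int]    # 0, 1 or 2 for CDS features; None when "."
    attributes: str         # raw column 9, left unparsed here


def parse_gff3_line(line: str) -> Gff3Record:
    """Split one feature line into its nine tab-separated columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    return Gff3Record(
        seqid=seqid,
        source=source,
        feature_type=ftype,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attrs,
    )


# Hypothetical example line, not taken from a real annotation file.
line = "chr1\texample\tCDS\t1201\t1500\t.\t+\t0\tID=cds00001;Parent=mrna0001"
rec = parse_gff3_line(line)
print(rec.seqid, rec.feature_type, rec.start, rec.end, rec.phase)
```

Dots in the score and phase columns are mapped to `None` so that a missing value is not confused with a real 0.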
The following tags are predefined:
ID
Specifies a unique identifier for the feature. It is very useful for grouping features (for example, to find the exons in a transcription unit).
Name
Specifies the name of the feature. This is the name displayed to the user, for example in visualization tools. Name can therefore be set freely according to what should be shown.
Alias
A secondary or alternative name for the feature. Use this tag when the feature has other names.
Parent
Gives the ID of the feature one level up that this feature belongs to. Used to group exons into transcripts and transcripts into genes.
Target
Specifies the target region of an alignment; generally used to represent sequence alignment results. The format is "target_id start end [strand]", where strand is optional ("+" or "-"). If the target_id contains spaces, they must be escaped as "%20".
Gap
The gap information of the alignment. It is used together with Target to represent sequence alignment results.
Note
A free-text description of the feature.
Is_circular
Indicates whether the feature is circular. Used for circular genome sequences.
If the same tag has multiple values, separate the values with commas, for example:
Parent=af2312,ab2812,abc-3
Alias=m19211,gna-12,gamma-globulin
The tags that can use multiple values are: Parent, Alias, Note, Dbxref and Ontology_term.
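The attribute rules above (semicolon-separated "tag=value" pairs, comma-separated multiple values, URL escaping) can be sketched in Python as follows. The function name `parse_attributes` and the sample string are hypothetical. Only the five tags listed above are treated as multi-valued, and values are split on commas before unescaping so that an escaped comma stays inside its value.

```python
from urllib.parse import unquote

# Tags that may carry several comma-separated values, as listed in the text.
MULTI_VALUE_TAGS = {"Parent", "Alias", "Note", "Dbxref", "Ontology_term"}


def parse_attributes(column9: str) -> dict:
    """Parse the ninth column into a dict of tag -> value (or list of values)."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        tag = unquote(tag.strip())
        if tag in MULTI_VALUE_TAGS:
            # Split first, then unescape, so "%2C" is not mistaken for a separator.
            attrs[tag] = [unquote(v) for v in value.split(",")]
        else:
            attrs[tag] = unquote(value)
    return attrs


# Hypothetical attribute string for illustration.
attrs = parse_attributes("ID=exon00001;Parent=af2312,ab2812,abc-3;Note=first%3B exon")
print(attrs["ID"])
print(attrs["Parent"])
print(attrs["Note"])
```

`unquote` is used rather than `unquote_plus` so that a literal "+" in a value is not turned into a space.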
Reference: http://blog.sina.com.cn/s/blog_670445240102uxh2.html