Authoritative release: Long non-coding RNA naming conventions

Source: Internet
Author: User

Reposted from: http://blog.sina.com.cn/s/blog_8088f3700101pab7.html




The HUGO Gene Nomenclature Committee (HGNC) is the only body officially authorized to set naming standards for human genes. The HGNC database contains 38,000 gene symbols, most of which belong to protein-coding genes. However, the HGNC has also named more than 8,500 human non-coding RNA genes and pseudogenes, and by working with specialists in various fields it has named most small non-coding RNAs.

Small non-coding RNAs can generally be classified by sequence homology and shared function. Long non-coding RNAs (lncRNAs), in contrast, have a quite different set of characteristics: they are longer than 200 nucleotides, usually lack conserved sequence homology, and have diverse functions. As with protein-coding genes, lncRNAs are named according to the known function of their products whenever possible.

To help researchers name lncRNAs effectively, so that names are standardized and reflect function, the HGNC has produced the following naming guidelines for reference.

Before publishing a new lncRNA, researchers should first have its name approved by the HGNC.

The human genome is predicted to contain a large number of lncRNAs (at least several thousand), but little is known about the function of most of them. lncRNAs of unknown function are therefore usually named according to their genomic context. The HGNC hopes to work with researchers to name lncRNAs well; its goal is for every lncRNA name to be unique and accurate, and to convey as much information as possible.

lncRNA naming guidelines
Naming a lncRNA accurately requires following certain principles and paying attention to several factors. The detailed naming principles and considerations are as follows:
The name of each lncRNA should be unique

The principle of unique names is important and must not be violated. It lets us study and analyze a gene without ambiguity, avoiding both one gene having several names and several genes sharing one name; such problems also make it harder for the HGNC to manage and maintain the nomenclature. If an author proposes a lncRNA name that is already in use elsewhere, the HGNC will suggest new names to choose from. For example, a new lncRNA that keeps epithelial cells in an undifferentiated state was originally going to be named ANCR, but this symbol was already used for the "Angelman syndrome chromosome region". In agreement with the authors, the lncRNA was instead named DANCR, "differentiation antagonizing non-protein coding RNA".

The symbol should be an abbreviation describing the lncRNA
Each lncRNA symbol should be an abbreviation, or the initials, of a description of the gene.
For example, BANCR is formed from the initials of "BRAF-activated non-protein coding RNA". This makes the meaning of a symbol easy to understand.

The symbol should contain only Latin letters and Arabic numerals
A lncRNA symbol should contain no punctuation; use letters or numbers in place of punctuation marks.
Hyphens are used only in special cases, for example in the symbol of a lncRNA that is antisense to a protein-coding gene (BACE1-AS is the symbol for "BACE1 antisense RNA").
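The character rules above can be expressed as a simple pattern check. The sketch below is a hypothetical helper, not an HGNC tool. It also assumes two things the article does not state as rules: a symbol starts with a letter, and it contains at most one hyphen (which matches every example given here). Letter case is deliberately left to a separate check.

```python
import re

# Hypothetical helper (not an HGNC tool). Assumptions beyond the article's rules:
# the symbol starts with a letter and has at most one hyphen before a suffix.
_SYMBOL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)?")


def uses_allowed_characters(symbol: str) -> bool:
    """True if the symbol is Latin letters and digits, optionally with one hyphen."""
    return _SYMBOL_PATTERN.fullmatch(symbol) is not None


if __name__ == "__main__":
    for candidate in ["BANCR", "BACE1-AS", "SPRY4-IT1", "DANCR_1", "XIST!"]:
        print(candidate, uses_allowed_characters(candidate))
```

Here "DANCR_1" and "XIST!" are invented counter-examples that fail because they contain punctuation.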

The letters in a human lncRNA symbol should be uppercase
Letters in human gene symbols are capitalized to distinguish them from the genes of other species (for example, rodent gene symbols capitalize only the first letter and use lowercase for the rest).
For example, the HOX transcript antisense RNA gene is written HOTAIR in human and Hotair in mouse.
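A minimal sketch of this capitalization convention follows. It assumes the only difference between the human and rodent forms is letter case. The mouse and rat nomenclature committees may have further conventions of their own, which this ignores. Both function names are hypothetical.

```python
def is_human_style(symbol: str) -> bool:
    """True if every letter is uppercase, as required for human symbols."""
    return symbol == symbol.upper()


def to_rodent_style(symbol: str) -> str:
    """Capitalize only the first letter, as described for rodent symbols."""
    return symbol[:1].upper() + symbol[1:].lower()


if __name__ == "__main__":
    print(is_human_style("HOTAIR"), is_human_style("Hotair"))  # True False
    print(to_rodent_style("HOTAIR"))                             # Hotair
```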

The symbol should not refer to a species
For example, including "h" or "H" (for human) in a symbol can cause confusion and be misleading, because homologous genes exist in other species.

The symbol should avoid common words
Common words in gene symbols cause confusion and create many problems in data analysis, so they should be avoided when naming.
For example, the Airn gene was originally published as "AIR". Searching public databases for "AIR" returns 220,000 irrelevant results, whereas a search for "Airn" returns only 10, so searching for "Airn" is far more efficient. There are many similar examples.

The symbol should reflect the lncRNA's function where possible
For example, XIST is an abbreviation of "X inactive specific transcript"; it is involved in transcriptionally silencing one of the two X chromosomes in females.
When naming, try to reflect the normal function of the gene rather than a mutant phenotype. Gene names should be concise and should not carry too much information.
    • Gene symbols should not be offensive or pejorative.
    • Gene symbols should not contain personal or regional references.
    • Gene symbols should not contain the names of mythological, fictional, or historical figures.
    • Gene symbols should not be whimsical or meaningless.
Functional transcribed pseudogenes should keep their pseudogene names
A small number of transcribed pseudogenes have been found to be functional, such as PTENP1, which associates with PTEN-targeting miRNAs and thereby modulates PTEN expression levels.
Functional transcribed pseudogenes should keep their pseudogene symbols rather than be renamed according to their function. To aid searching, "functional" should be added at the end of the gene name. PTENP1 is an example: its name is "phosphatase and tensin homolog pseudogene 1 (functional)".

Naming lncRNAs of unknown function
lncRNAs of unknown function should be named according to their genomic context; the systematic naming rules are shown in Figure A.

Figure A. Systematic naming of lncRNAs of unknown function based on genomic context.
If a lncRNA lies very close to a protein-coding gene, its symbol should begin with the symbol of that protein-coding gene, followed by a suffix from one of these categories:
    • antisense (-AS), for example BACE1-AS;
    • intronic (-IT), for example SPRY4-IT1;
    • overlapping (-OT), for example SOX2-OT.
Long intergenic non-coding RNAs (lincRNAs) are given the prefix LINC followed by a number, for example LINC00485. In essence, this nomenclature follows the GENCODE annotation categories: antisense, sense intronic, sense overlapping, and lincRNA. Some new categories should also be considered, in particular lncRNAs that are transcribed head to head with a protein-coding gene and are therefore inferred to share a bidirectional promoter. The HGNC recommends naming these "antisense upstream" (-AU), for example GENE2-AU1. Note also that the HGNC does not name individual splice variants, so two splice variants of the same lncRNA gene share one symbol, for example GENE2-AS1. If a lncRNA gene has transcripts spanning more than one protein-coding gene, it is named after the protein-coding gene at the 5' end of the lncRNA, for example GENE-AS2.
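The genomic-context scheme above can be sketched as a small symbol builder. This is an illustrative helper, not an HGNC API, and real symbols are assigned by the HGNC. The enum and function names are hypothetical. The index is optional because the examples mix numbered symbols (SPRY4-IT1) and unnumbered ones (BACE1-AS). The five-digit zero padding of LINC numbers is inferred from the single example LINC00485.

```python
from enum import Enum
from typing import Optional


class Category(Enum):
    """Suffixes for lncRNAs named after a nearby protein-coding gene."""
    ANTISENSE = "AS"
    INTRONIC = "IT"
    OVERLAPPING = "OT"
    ANTISENSE_UPSTREAM = "AU"  # head to head, likely a shared bidirectional promoter


def genic_symbol(parent: str, category: Category, index: Optional[int] = 1) -> str:
    """Build PARENT-SUFFIX[index], e.g. SPRY4-IT1; index=None omits the number."""
    if index is not None and index < 1:
        raise ValueError("index must be a positive integer")
    suffix = category.value + ("" if index is None else str(index))
    return f"{parent.upper()}-{suffix}"


def linc_symbol(number: int) -> str:
    """Build a lincRNA symbol: LINC plus a zero-padded five-digit number (assumed width)."""
    if not 0 < number < 100_000:
        raise ValueError("number must fit in five digits")
    return f"LINC{number:05d}"


if __name__ == "__main__":
    print(genic_symbol("BACE1", Category.ANTISENSE, index=None))   # BACE1-AS
    print(genic_symbol("SPRY4", Category.INTRONIC))                # SPRY4-IT1
    print(genic_symbol("SOX2", Category.OVERLAPPING, index=None))  # SOX2-OT
    print(genic_symbol("GENE2", Category.ANTISENSE_UPSTREAM))      # GENE2-AU1
    print(linc_symbol(485))                                        # LINC00485
```

The builder only formats symbols. Choosing the category and the parent gene still depends on the genomic context, and gene-dense regions may need manual resolution with the HGNC.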
The basic scheme above works for most lncRNAs but may not be appropriate for lncRNAs in gene-dense regions; in such cases, contact the HGNC to resolve the name.

The HGNC is committed to making the naming of human lncRNAs effective and consistent. For more information, visit
www.genenames.org/rna/LNCRNA or contact the HGNC by email.

