Common databases for the reference transcriptome

Last Update: 2016-01-04 Source: Internet

Author: User


NR (non-redundant) database

Literature: Deng YY, Li JQ, Wu SF, Zhu YP, et al. Integrated NR Database in Protein Annotation System and its Localization. Computer Engineering 2006, 32(5): 71-74.

Characteristics:

1. For each known or possible coding sequence, NR gives the corresponding amino acid sequence, together with the accession numbers of the protein databases that supplied it;

2. Query sequences can be aligned against it with BLAST.
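As an illustration of working with such alignments, the sketch below reads BLAST tabular output (the default twelve columns of `-outfmt 6`) and keeps the highest-scoring hit per query. The function name `best_hits`, the E-value cutoff of 1e-5, and the transcript and protein identifiers are assumptions made for this example, not values from the original article.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query_id: row}, keeping the highest-bitscore hit per query.

    The cutoff is an illustrative assumption; choose one that suits the data.
    """
    best = {}
    for values in csv.reader(handle, delimiter="\t"):
        if not values or values[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, values))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        if row["evalue"] > max_evalue:
            continue  # hit is too weak to use for annotation
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


# Toy input with made-up identifiers, for illustration only.
example = (
    "tx1\tprotA\t98.0\t200\t4\t0\t1\t600\t1\t200\t1e-100\t400\n"
    "tx1\tprotB\t60.0\t180\t70\t2\t1\t540\t5\t184\t1e-20\t95.5\n"
    "tx2\tprotC\t35.0\t90\t55\t3\t10\t279\t1\t90\t0.5\t30.1\n"
)
hits = best_hits(io.StringIO(example))
for query, row in hits.items():
    print(query, row["sseqid"], row["bitscore"])
```

In this toy input, `tx2` receives no annotation because its only hit is above the E-value cutoff.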

Swiss-Prot database

Literature: Apweiler R, Bairoch A, Wu CH, Barker WC, et al. UniProt: the Universal Protein knowledgebase. Nucleic Acids Research 2004 Jan 1; 32(Database issue): D115-9.

Characteristics:

Swiss-Prot is an annotated database of protein sequences maintained by the European Bioinformatics Institute (EBI). The database consists of protein sequence entries, each containing a protein sequence, citations, taxonomy information, and annotations. The annotations cover protein function, post-translational modifications, special sites and regions, secondary structure, quaternary structure, similarity to other sequences, diseases associated with sequence defects, and sequence variants and conflicts. Swiss-Prot minimizes redundant sequences and maintains cross-references to more than 30 other databases, including nucleic acid sequence databases, protein sequence databases, and protein structure databases. The Sequence Retrieval System (SRS) makes it easy to search Swiss-Prot and other EBI databases. Swiss-Prot only accepts protein sequences obtained directly by sequencing, and sequences can be submitted through its web pages.
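Some of the per-entry information described above (accession, entry name, organism, gene name) also appears in the header line of UniProtKB FASTA files. The sketch below parses one such header with a regular expression. The function name `parse_uniprot_header` and the returned field names are assumptions for this example, and the sample header is written in the UniProtKB FASTA style for illustration.

```python
import re

# Pattern for UniProtKB-style FASTA headers:
# >db|Accession|EntryName Description OS=Organism OX=TaxId [GN=Gene] PE=n SV=n
HEADER = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<description>.*?)\s+OS=(?P<organism>.*?)"
    r"(?:\s+OX=(?P<taxid>\d+))?"
    r"(?:\s+GN=(?P<gene>\S+))?"
    r"(?:\s+PE=(?P<pe>\d))?"
    r"(?:\s+SV=(?P<sv>\d+))?\s*$"
)


def parse_uniprot_header(line):
    """Return a dict of header fields, or None if the line does not match."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    # "sp" marks reviewed (Swiss-Prot) entries, "tr" unreviewed (TrEMBL) ones.
    info["reviewed"] = info["db"] == "sp"
    return info


header = (">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
          "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2")
info = parse_uniprot_header(header)
print(info["accession"], info["organism"], info["gene"], info["reviewed"])
```

Filtering on the `sp` prefix is one simple way to restrict annotation to reviewed Swiss-Prot entries when a FASTA file mixes both sections.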

GO database

Literature: Ashburner M, Ball CA, Blake JA, Botstein D, et al. Gene Ontology: tool for the unification of biology. Nature Genetics 2000, 25(1): 25-29.

Characteristics:

1. GO is not a database of gene sequences or gene products; instead, GO describes the functions of gene products in cells.
2. GO is not a way to consolidate databases (such as a federated database). It does not attempt this because:
A. Updates would be slow.
B. Different groups define their data in different ways, so a consistent standard is difficult to achieve.
C. GO does not describe every aspect of biology, such as the structure of functional domains, 3D structure, or evolution.
3. GO annotates the function of a gene, but it has limitations. For example, GO does not reflect gene expression: whether a gene is expressed in a particular cell or tissue, at a particular stage of development, or in connection with a disease. GO does not cover these aspects, but it supports other OBO (Open Biological Ontologies) members in building other types of ontology databases (such as developmental, proteome, and gene chip ontologies).

COG database (Clusters of Orthologous Groups of proteins)

Literature: Tatusov RL, Galperin MY, Natale DA. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Research 2000, 28(1): 33-36.

Characteristics:

1. Protein annotation. The known function (and the two- or three-dimensional structure) of one COG member can be applied directly to the other members of that COG. A caution applies, however: some COGs contain paralogs, whose functions may not correspond to those of the known proteins.

2. Species pattern of occurrence. This shows whether proteins from a given species are present in a particular COG. Used systematically, these patterns can help determine whether a particular metabolic pathway is present in a species.

3. Multiple alignment. Each COG page includes a link to a multiple alignment of the COG members, which can be used to identify conserved sequence residues and to analyze the evolutionary relationships among the member proteins.
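The use of occurrence patterns to check for a pathway can be sketched as a simple set lookup. In the example below, the COG identifiers, species names, and pathway definition are placeholders invented for illustration, not real COG data, and `pathway_coverage` is a hypothetical helper name.

```python
# Toy occurrence patterns: the set of species with at least one protein
# in each COG. All identifiers are placeholders, not real COG data.
cog_members = {
    "COG_A": {"species1", "species2", "species3"},
    "COG_B": {"species1", "species3"},
    "COG_C": {"species1", "species2"},
}

# A hypothetical pathway, defined as the COGs whose functions it requires.
pathway = ["COG_A", "COG_B", "COG_C"]


def pathway_coverage(cog_members, pathway):
    """For each species, list the pathway COGs in which it has no member."""
    species = set().union(*cog_members.values())
    report = {}
    for sp in sorted(species):
        report[sp] = [cog for cog in pathway
                      if sp not in cog_members.get(cog, set())]
    return report


report = pathway_coverage(cog_members, pathway)
for sp, missing in report.items():
    status = "complete" if not missing else "missing " + ", ".join(missing)
    print(sp, status)
```

A missing COG only suggests that a step is absent. The function may be carried out by an unrelated protein, so such results are best treated as hypotheses to check.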

KOG database

Literature: Koonin EV, Fedorova ND, Jackson JD, et al. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biology, 2004, 5(2): R7.

COG is an NCBI database whose name refers to clusters of orthologous proteins. It is divided into two categories, one for prokaryotes and one for eukaryotes. The prokaryotic part is generally called the COG database; the eukaryotic part is generally called the KOG database.

Pfam

Literature: Finn RD, Bateman A, Clements J, et al. Pfam: the protein families database. Nucleic Acids Research, 2013: gkt1223.

A protein family database, built from multiple sequence alignments and profile hidden Markov models (HMMs).

KEGG database

Literature: Kanehisa M, Goto S, Kawashima S, Okuno Y, et al. The KEGG resource for deciphering the genome. Nucleic Acids Research, 2004, 32(Database issue): D277-D280.

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database for deciphering genomes. Given the complete set of genes in a genome, it can predict the role of protein interaction networks in various cellular activities. KEGG's PATHWAY database integrates current knowledge of molecular interaction networks (such as pathways and complexes); the GENES/SSDB/KO databases provide knowledge about the genes and proteins found in genome projects; and the COMPOUND/GLYCAN/REACTION databases provide knowledge of biochemical compounds and reactions.
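In transcriptome annotation, one common way to use these resources is to assign a KO identifier to each transcript and then count how many transcripts fall into each pathway. The sketch below shows that bookkeeping on toy data. The transcript names, KO identifiers, pathway names, and the helper `pathway_counts` are placeholders invented for this example and are not taken from KEGG.

```python
from collections import Counter

# Toy annotation tables with placeholder identifiers (not real KEGG data).
# A value of None marks a transcript that received no KO assignment.
transcript_to_ko = {"tx1": "KO_1", "tx2": "KO_2", "tx3": "KO_1", "tx4": None}
ko_to_pathways = {"KO_1": ["pathway_X", "pathway_Y"], "KO_2": ["pathway_X"]}


def pathway_counts(transcript_to_ko, ko_to_pathways):
    """Count the annotated transcripts that map to each pathway."""
    counts = Counter()
    for transcript, ko in transcript_to_ko.items():
        # A KO can belong to several pathways; unannotated transcripts
        # and unknown KOs contribute nothing.
        for pathway in ko_to_pathways.get(ko, []):
            counts[pathway] += 1
    return counts


counts = pathway_counts(transcript_to_ko, ko_to_pathways)
for pathway, n in counts.most_common():
    print(pathway, n)
```

Because one KO can belong to several pathways, the per-pathway counts can add up to more than the number of annotated transcripts.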
