NR (non-redundant protein) Database
Literature: Deng YY, Li JQ, Wu SF, Zhu YP, et al. Integrated NR Database in Protein Annotation System and its Localization. Computer Engineering, 2006, 32 (5): 71-74.
Characteristics:
1. For known or predicted coding sequences, it gives the corresponding amino acid sequence, together with the accession numbers of the protein databases that supplied it;
2. Query sequences can be aligned against it with the BLAST software.
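As a minimal sketch of what typically follows the BLAST step: when a search against NR is run with tabular output (for example `blastx -query ... -db nr -outfmt 6`), each result line has twelve tab-separated columns. The snippet below keeps the highest-scoring hit per query. The function name `best_hits`, the E-value cutoff of 1e-5, and the example rows are assumptions for illustration, not taken from the text above.

```python
import csv
import io

# Column names of BLAST's default tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query_id: row}, keeping the highest-bitscore hit per query.

    max_evalue is an assumed cutoff; adjust it to your own analysis.
    """
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not fields or fields[0].startswith("#"):
            continue
        row = dict(zip(COLUMNS, fields))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        if row["evalue"] > max_evalue:
            continue
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


# Made-up example rows standing in for a real BLAST result file.
example = io.StringIO(
    "unigene1\tsubjA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\n"
    "unigene1\tsubjB\t80.0\t150\t30\t0\t1\t150\t1\t150\t1e-40\t160\n"
    "unigene2\tsubjC\t35.0\t60\t39\t0\t1\t60\t1\t60\t0.5\t25.0\n"
)
hits = best_hits(example)
for query, row in hits.items():
    print(query, row["sseqid"], row["evalue"], row["bitscore"])
```

With a real result file, replace the `io.StringIO` object with `open("result.tsv")`; the file name is a placeholder.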
Swiss-Prot Database
Literature: Apweiler R, Bairoch A, Wu CH, Barker WC, et al. UniProt: the Universal Protein Knowledgebase. Nucleic Acids Research, 2004, 32 (Database issue): D115-D119.
Characteristics:
Swiss-Prot is an annotated database of protein sequences maintained by the European Bioinformatics Institute (EBI). The database consists of protein sequence entries, each containing a protein sequence, citations, taxonomy information, annotations, and so on. The annotations cover the function of the protein, post-translational modifications, special sites and regions, secondary structure, quaternary structure, similarity to other sequences, the relationship between sequence variation and disease, and sequence variants and conflicts. Swiss-Prot minimizes redundant sequences and provides cross-references to more than 30 other databases, including nucleic acid sequence databases, protein sequence databases, and protein structure databases. The Sequence Retrieval System (SRS) makes it easy to search Swiss-Prot and other EBI databases. Swiss-Prot only accepts protein sequences obtained directly by sequencing, and sequences can be submitted through its web pages.
GO Database
Literature: Ashburner M, Ball CA, Blake JA, Botstein D, et al. Gene Ontology: tool for the unification of biology. Nature Genetics, 2000, 25 (1): 25-29.
Characteristics:
1. GO is not a database of gene sequences or gene products; instead, GO describes the function of gene products in cells.
2. GO is not a way to unify databases (it is not, for example, a federated database solution), and it cannot play that role because:
A. Knowledge changes, and updates lag behind.
B. Different people define and evaluate data in different ways, so a consistent standard is difficult to achieve.
C. GO does not describe every aspect of biology, such as the structure of functional domains, 3D structure, evolution, and so on.
3. GO annotates the function of genes, but it has its limitations. For example, GO does not reflect the expression of a gene: whether it is expressed in a particular cell, in a particular tissue, at a particular stage of development, or in connection with a disease. GO does not cover these aspects itself, but it supports other OBO (Open Biology Ontologies) members in building other types of ontology databases (such as developmental ontologies, proteomics ontologies, gene chip ontologies, etc.).
COG Database (Clusters of Orthologous Groups of proteins)
Literature: Tatusov RL, Galperin MY, Natale DA. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Research, 2000, 28 (1): 33-36.
Characteristics:
1. Protein annotation. The known function (as well as the two-dimensional or three-dimensional structure) of one COG member can be applied directly to the other members of that COG. A word of caution is needed, however: some COGs contain paralogs whose function may not correspond exactly to that of the known protein.
2. Phylogenetic patterns. These show whether proteins from a given species are present in a particular COG. Used systematically, these patterns can help determine whether a particular metabolic pathway is present in a species.
3. Multiple alignments. Each COG page includes a link to a multiple alignment of the COG members, which can be used to identify conserved sequence residues and to analyze the evolutionary relationships between member proteins.
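The phylogenetic-pattern idea in point 2 can be sketched as a presence/absence lookup. This is a toy illustration only: the COG identifiers, species names, and the pathway definition are all placeholders, and the helper names `phyletic_pattern` and `missing_steps` are hypothetical rather than part of any COG tool.

```python
# Toy presence/absence table: COG id -> set of species with at least one member.
# All identifiers and species names below are placeholders, not real COG data.
cog_members = {
    "COG_A": {"species1", "species2", "species3"},
    "COG_B": {"species1", "species3"},
    "COG_C": {"species1"},
}

# Hypothetical pathway, defined as the list of COGs that carry out its steps.
pathway_cogs = ["COG_A", "COG_B", "COG_C"]
species_list = ["species1", "species2", "species3"]


def phyletic_pattern(cog_id):
    """Return a '+'/'-' string, one character per species in species_list."""
    members = cog_members[cog_id]
    return "".join("+" if s in members else "-" for s in species_list)


def missing_steps(species):
    """Return the pathway COGs that have no member in the given species."""
    return [c for c in pathway_cogs if species not in cog_members[c]]


for cog_id in pathway_cogs:
    print(cog_id, phyletic_pattern(cog_id))

for species in species_list:
    gaps = missing_steps(species)
    status = "complete" if not gaps else "missing " + ", ".join(gaps)
    print(species, status)
```

A species with no missing steps is a candidate for having the pathway; a gap may also reflect an unannotated or divergent gene, so the result is a hint rather than proof.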
KOG Database
Literature: Koonin EV, Fedorova ND, Jackson JD, et al. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biology, 2004, 5 (2): R7.
COG is an NCBI database whose name stands for "clusters of orthologous groups of proteins". It is divided into two parts, one for prokaryotes and one for eukaryotes. The prokaryotic part is generally called the COG database; the eukaryotic part is generally called the KOG database.
Pfam
Literature: Finn RD, Bateman A, Clements J, et al. Pfam: the protein families database. Nucleic Acids Research, 2013: gkt1223.
A database of protein families, built from multiple sequence alignments and profile hidden Markov models (HMMs).
KEGG Database
Literature: Kanehisa M, Goto S, Kawashima S, Okuno Y, et al. The KEGG resource for deciphering the genome. Nucleic Acids Research, 2004 (Database issue): D277-D280.
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource for deciphering genomes. Given the complete set of genes in a genome, it can be used to predict the role of protein interaction networks in various cellular activities. KEGG's PATHWAY database integrates current knowledge of molecular interaction networks (such as pathways and complexes); the GENES/SSDB/KO databases provide knowledge about the genes and proteins found in genome projects; and the COMPOUND/GLYCAN/REACTION databases provide knowledge about biochemical compounds and reactions.
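One common way the KO and PATHWAY databases are combined in transcriptome annotation is to assign each gene a KO identifier and then count how many annotated genes fall into each pathway. The sketch below assumes both mappings are already available as dictionaries; all identifiers are placeholders, and the function `genes_per_pathway` is hypothetical rather than part of any KEGG tool.

```python
from collections import Counter

# Placeholder mappings; real data would come from a KEGG annotation step.
ko_to_pathways = {
    "K_0001": ["pathway_X"],
    "K_0002": ["pathway_X", "pathway_Y"],
    "K_0003": ["pathway_Y"],
}
gene_to_ko = {
    "gene1": "K_0001",
    "gene2": "K_0002",
    "gene3": "K_0002",
    "gene4": None,  # gene without a KO assignment
}


def genes_per_pathway(gene_to_ko, ko_to_pathways):
    """Count annotated genes per pathway; genes without a KO are ignored."""
    counts = Counter()
    for gene, ko in gene_to_ko.items():
        for pathway in ko_to_pathways.get(ko, []):
            counts[pathway] += 1
    return counts


counts = genes_per_pathway(gene_to_ko, ko_to_pathways)
for pathway, n in sorted(counts.items()):
    print(pathway, n)
```

A gene whose KO maps to several pathways is counted once in each of them, which is a deliberate simplification here.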
Common database for the reference transcriptome