Reading Notes -- A Report from the trenches

Source: Internet
Author: User

Building, maintaining, and using knowledge bases: A Report from the trenches

Abstract

A knowledge base (KB) is a collection of concepts, instances, and relationships. This paper describes a knowledge base built for industrial use and covers the whole process, from construction through maintenance to use. In particular, it explains how a large knowledge base is created, updated, and curated, and how a large number of applications use it.

1. Introduction

Applications of knowledge bases and knowledge graphs: DBLP, Google Scholar, Internet Movie Database, YAGO, DBpedia, Wolfram Alpha, and Freebase.

2. Preliminaries

A typical knowledge base includes a set of concepts C1, C2, C3, ..., an instance set Ii for each concept Ci, and a set of relations Ri that express the relationships between concepts. The concepts are organized into a classification tree. The relationship to emphasize here is "is-a": a child node is a kind of its parent node. Other relationships may hold between nodes that are not parent and child.

In some knowledge bases a parent node contains all the instances that belong to its child nodes, but that is not required here.

Domain-specific KBs vs. global KBs:
  • Domain-specific KBs: DBLP, Google Scholar, DBLife, echonest.
  • Global KBs: Freebase, Google's knowledge graph, YAGO, DBpedia, and the collection of Wikipedia infoboxes.
Although global KBs are important, domain-specific KBs are just as important within their own fields.

Ontology-like KBs vs. source-specific KBs:
  • Ontology-like KB: covers a target domain, but only its important parts, not everything in it. The difficulty is how to obtain all the information about a given entity.
  • Source-specific KB: contains all the data from a given source. The main task is organizing the varied information.
The weaknesses of the two kinds are complementary, and it is easy to build a source-specific KB on top of an ontology-like KB. The KB built here is a global, ontology-like KB.

3. Building the knowledge base

Converting Wikipedia into a KB:

(1) Construct a classification tree based on Wikipedia.
  • Create a local copy by crawling Wikipedia.
  • Build a wiki graph
There are two main kinds of Wiki pages: article pages (representing instances) and category pages (representing concepts). In the resulting graph, a node represents a category or an instance, and an edge represents a wiki link, either from a parent category to a subcategory or from a concept to an instance. Ideally, the articles and categories would form a classification hierarchy, but in practice they do not: the generated graph contains cycles. Another problem is that Wikipedia's native classification is not as clean as we need. It contains a lot of noise, and the useful categories do not sit at a convenient depth. The remedy is to manually define the high-level categories directly under the root, which both makes the classification clearer and shortens the distance from the root to the meaningful categories.
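To make the cycle problem concrete, here is a minimal Python sketch. It assumes the wiki graph is given as (parent, child, weight) triples, where the weights and the example node names are hypothetical. It drops the back edges found by a depth-first search from the root, then keeps the heaviest remaining incoming edge of each node as its primary parent. This is a simplified stand-in for illustration, not the algorithm used in the paper.

```python
from collections import defaultdict

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished


def break_cycles(edges, root):
    """Return the edges reachable from root, minus DFS back edges.

    Recursive for brevity; a real crawl would need an iterative DFS.
    """
    children = defaultdict(list)
    for parent, child, weight in edges:
        children[parent].append((child, weight))

    color = defaultdict(int)
    kept = []

    def visit(node):
        color[node] = GRAY
        for child, weight in children[node]:
            if color[child] == GRAY:
                continue  # back edge: it would close a cycle, so drop it
            kept.append((node, child, weight))
            if color[child] == WHITE:
                visit(child)
        color[node] = BLACK

    visit(root)
    return kept


def pick_primary_parents(dag_edges):
    """For each node, keep the parent with the heaviest incoming edge."""
    best = {}
    for parent, child, weight in dag_edges:
        if child not in best or weight > best[child][1]:
            best[child] = (parent, weight)
    return {child: parent for child, (parent, _) in best.items()}


# Hypothetical miniature wiki graph containing one cycle.
edges = [
    ("Root", "Science", 1.0),
    ("Science", "Physics", 0.9),
    ("Physics", "Science", 0.2),  # closes the cycle Science -> Physics -> Science
    ("Root", "Physics", 0.3),
    ("Physics", "Albert Einstein", 0.8),
    ("Science", "Albert Einstein", 0.4),
]

dag = break_cycles(edges, "Root")
parents = pick_primary_parents(dag)
```

The edges that survive cycle breaking but lose the parent selection are the natural candidates for secondary paths when the tree is later extended to a DAG.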
  • Construct a classification tree
The first problem above is how to derive a classification tree from the directed wiki graph. The paper uses an existing method, Edmonds' algorithm (in Tarjan's formulation), which prunes edges according to their weights. For the detailed steps, see the paper.

(2) Construct a DAG on top of the classification tree. There may be more than one path from the root to a node, which makes processing more complicated. First, a primary classification tree is extracted, and the secondary paths are retained alongside it (distinguished by weight). Specifically, a depth-first search is run over the original wiki graph and cycles are broken one at a time until none remain, while the alternative paths are kept.

(3) Extract relations from Wikipedia. A relation is stored as a triple, for example <name of concept instance 1, name of concept instance 2, some text indicating a relationship between them>.

(4) Add metadata. This mainly means defining the ambiguous concept nodes and defining the metadata itself.

(5) Add other data. External data is added as instances and links: add the links first, then the instances. Each incoming instance is a pair (instance name, category). Match the name first, then the category, and depending on the outcome add a new instance, add metadata to an existing one, or do nothing.

4. Maintaining the knowledge base

1) Updating the knowledge base
  1. Re-crawl and rebuild. Because of the cycle-breaking operation defined earlier, the build is rerun on the re-crawled data.
  2. Update only the changed files.

2) Curating the knowledge base
  1. Evaluating the quality: manually inspect randomly sampled paths and randomly sampled nodes.
  2. Curating by writing commands: manual intervention in the KB is expressed as commands, such as:
  • adding/deleting nodes and edges
  • changing edge weights
  • changing the assignment of an instance-of or an is-a relationship
  • recommending an ancestor to a node
  • assigning a preference to a subtree in the graph
  3. Managing commands: because the KB is updated, manual edits are reapplied after each crawl. When newly crawled content conflicts with an edit, a rollback policy is applied.
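One way to picture command management is a log of edit commands that is replayed on every freshly built KB. The Python sketch below is an illustration under stated assumptions: the `Command` type, the three operations, and the policy of skipping a conflicting command and reporting it are all hypothetical, since the notes do not say what the rollback policy actually is. Each applied command returns its inverse, so a replay can be undone.

```python
from dataclasses import dataclass
from typing import Optional


class Conflict(Exception):
    """Raised when a command no longer fits the freshly built KB."""


@dataclass(frozen=True)
class Command:
    op: str                        # "add_edge", "delete_edge" or "set_weight"
    parent: str
    child: str
    weight: Optional[float] = None


def apply(edges, cmd):
    """Apply cmd to edges in place and return the command that undoes it."""
    key = (cmd.parent, cmd.child)
    if cmd.op == "add_edge":
        if key in edges:
            raise Conflict(f"edge {key} already exists")
        edges[key] = cmd.weight
        return Command("delete_edge", *key)
    if cmd.op == "delete_edge":
        if key not in edges:
            raise Conflict(f"edge {key} is missing")
        return Command("add_edge", *key, edges.pop(key))
    if cmd.op == "set_weight":
        if key not in edges:
            raise Conflict(f"edge {key} is missing")
        old_weight = edges[key]
        edges[key] = cmd.weight
        return Command("set_weight", *key, old_weight)
    raise ValueError(f"unknown op {cmd.op!r}")


def replay(fresh_edges, commands):
    """Replay the command log on a copy of a fresh build.

    Assumed policy: a conflicting command is skipped and reported.
    """
    edges = dict(fresh_edges)
    undo_log, conflicts = [], []
    for cmd in commands:
        try:
            undo_log.append(apply(edges, cmd))
        except Conflict as exc:
            conflicts.append((cmd, str(exc)))
    return edges, undo_log, conflicts


def rollback(edges, undo_log):
    """Undo every applied command, newest first."""
    for inverse in reversed(undo_log):
        apply(edges, inverse)
    undo_log.clear()


# Hypothetical fresh build and curation log.
fresh = {("Root", "Science"): 1.0, ("Science", "Physics"): 0.9}
log = [
    Command("set_weight", "Science", "Physics", 0.5),
    Command("delete_edge", "Science", "Alchemy"),  # target vanished upstream
    Command("add_edge", "Science", "Chemistry", 0.7),
]
curated, undo_log, conflicts = replay(fresh, log)
```

Keeping edits as data rather than as direct mutations is what makes them repeatable: the same log can be replayed after every crawl, and the reported conflicts tell a curator which edits need a second look.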
If the KB changes dynamically, manual operations must be packaged as commands so that they can be reapplied after each update and rolled back when necessary. This is very important.

5. Using the knowledge base

Query understanding, Deep Web search, in-context advertising, event monitoring in social media, product search, social gifting, and social mining.

Personal conclusion

For a domain-specific KB there are many data sources, but the data is clearly structured and there are few abstract relationships; most relationships are concrete. Start with the records of one data source, give each record its own ID, and store its remaining information as metadata fields. Then crawl other data sources to add instances and fields. For maintenance, updates from the data sources can run automatically in batches. The most important point is that manual curation should be encapsulated into tasks with a defined editing policy. That way, conflicts between automated updates and human edits can be resolved, and rollback and batch tasks are supported. Domain-specific KBs are very important for personalized recommendation and learning algorithms. A global KB is a sea, but what is truly useful is a boat, and a domain-specific KB is that boat.
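The incremental approach in the conclusion, combined with the name-then-category matching from step (5) of section 3, can be sketched in Python as follows. Everything here is an assumption made for illustration: the `Instance` fields, the `source:id` format of the unique ID, the case-insensitive name matching, and the rule that an existing metadata field is never overwritten.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    uid: str        # unique ID, here "<source>:<id in that source>"
    name: str
    category: str
    meta: dict = field(default_factory=dict)


def merge_record(kb, source, record):
    """Merge one crawled record into kb (normalized name -> list of Instance).

    Match the name first, then the category. Returns what happened:
    "instance added", "metadata added" or "unchanged".
    """
    key = record["name"].strip().lower()
    candidates = kb.setdefault(key, [])
    incoming_meta = record.get("meta", {})

    for inst in candidates:
        if inst.category == record["category"]:
            # Same name and category: treat as the same instance and
            # only fill in metadata fields that are not set yet.
            new_fields = {k: v for k, v in incoming_meta.items()
                          if k not in inst.meta}
            if not new_fields:
                return "unchanged"
            inst.meta.update(new_fields)
            return "metadata added"

    # Unknown name, or a known name in a different category: new instance.
    candidates.append(Instance(
        uid=f"{source}:{record['id']}",
        name=record["name"],
        category=record["category"],
        meta=dict(incoming_meta),
    ))
    return "instance added"


# Hypothetical records from three hypothetical sources.
kb = {}
results = [
    merge_record(kb, "source_a", {"id": "1", "name": "Jaguar",
                                  "category": "Animal",
                                  "meta": {"habitat": "rainforest"}}),
    merge_record(kb, "source_b", {"id": "7", "name": "Jaguar",
                                  "category": "Car brand",
                                  "meta": {"country": "UK"}}),
    merge_record(kb, "source_c", {"id": "x", "name": "jaguar",
                                  "category": "Animal",
                                  "meta": {"diet": "carnivore"}}),
    merge_record(kb, "source_c", {"id": "x", "name": "jaguar",
                                  "category": "Animal",
                                  "meta": {"diet": "carnivore"}}),
]
```

Because each merge reports its outcome, a batch job can log what every crawl changed, which is the kind of record a rollback or review step needs.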
