Introduction to Knowledge Graphs (6): Knowledge Fusion

Source: Internet
Author: User

Welcome to my blog, http://pelhans.com/. All articles are published there first.

This section introduces techniques related to knowledge fusion. It first explains what knowledge fusion is, then walks through the technical workflow of knowledge fusion, and finally gives a brief introduction to common knowledge fusion tools.

Introduction to knowledge fusion

Knowledge fusion means merging two knowledge graphs (ontologies). The basic problem is how to combine the descriptive information about the same entity or concept that comes from multiple sources. What has to be determined includes: equivalent classes/subclasses and equivalent properties/sub-properties.

As shown in the illustration above, circles of different colors represent knowledge graphs from different sources. Roma in dbpedia.org and the corresponding entry in geoname.org are the same entity, connected via two sameAs links. Entity alignment between different knowledge graphs is the main work of KG fusion.

In addition to entity alignment, there is also knowledge fusion at the concept layer, cross-language knowledge fusion, and other related work.

It is worth mentioning that knowledge fusion goes by different names in different literature, such as ontology alignment, ontology matching, record linkage, entity resolution, and entity alignment, but the essential work is the same.

Knowledge fusion faces two main technical challenges:
Data quality challenges: ambiguous naming, data entry errors, missing data, inconsistent data formats, abbreviations, and so on.
Data scale challenges: large data volumes (requiring parallel computing), diverse data types, matching that is no longer based on names alone, multiple relationships, more links, and so on.

The basic technical process of knowledge fusion

Knowledge fusion is generally divided into two steps, ontology alignment and entity matching. The basic process of the two is similar, as follows:

Data preprocessing

In the data preprocessing phase, the quality of the raw data directly affects the final linking results. Different data sets often describe the same entity differently, so normalizing these data is an important step for improving the accuracy of the subsequent linking.
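As a rough illustration of this kind of normalization, here is a minimal Python sketch. The function names, the symbol list, and the ALIASES table are all hypothetical and chosen only for this example; a real pipeline would tune them to its own data.

```python
import re
import unicodedata

# Hypothetical alias table (nickname or abbreviation -> official name).
ALIASES = {
    "ibm": "international business machines",
    "nyc": "new york city",
}

# Symbols treated as noise in this sketch: quotes, book-title marks, dashes.
SYMBOLS = "\"'\u201c\u201d\u2018\u2019\u300a\u300b<>-\u2013\u2014_"
SYMBOL_RE = re.compile("[" + re.escape(SYMBOLS) + "]+")


def normalize_text(value: str) -> str:
    """Normalize a free-text attribute value before linking."""
    # Unify full-width/half-width forms and other compatibility characters.
    value = unicodedata.normalize("NFKC", value)
    value = value.lower()
    # Replace noisy symbols with a space, then collapse repeated whitespace.
    value = SYMBOL_RE.sub(" ", value)
    value = " ".join(value.split())
    # Replace a known nickname or abbreviation with the official name.
    return ALIASES.get(value, value)


def normalize_phone(value: str) -> str:
    """Keep digits only so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", value)
```

With these helpers, two differently written phone numbers such as "+86 (10) 1234-5678" and "86-10-12345678" reduce to the same digit string, which makes a later exact comparison meaningful.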

Common data preprocessing steps are:

Syntax normalization:
    Syntax matching: for example, how a contact phone number is represented.
    Composite attributes: for example, how a home address is expressed.

Data normalization:
    Remove spaces and symbols such as quotation marks and dashes.
    Correct input errors such as typographical errors.
    Replace nicknames and abbreviations with official names, and so on.

Record linkage

Suppose two entity records $x$ and $y$ take the values $x_i$ and $y_i$ on the $i$-th attribute. Record linkage then proceeds in two steps:
Attribute similarity: combine the individual attribute similarities into an attribute similarity vector:
$[sim(x_1, y_1), sim(x_2, y_2), \ldots, sim(x_n, y_n)]$
Entity similarity: compute the similarity of the two entities from the attribute similarity vector.

Calculation of attribute similarity
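The two steps of record linkage can be sketched in Python as follows. This is only an illustration under stated assumptions: the token-based Jaccard and Dice functions stand in for whichever attribute similarity is actually used, and the weighted average in entity_similarity is just one simple way to aggregate the vector, not a method prescribed by the article. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

Record = Dict[str, str]


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient |A & B| / |A | B| on whitespace token sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty values are treated as identical
    return len(sa & sb) / len(sa | sb)


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) on whitespace token sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def similarity_vector(
    x: Record,
    y: Record,
    attributes: Sequence[str],
    sim: Callable[[str, str], float] = jaccard,
) -> List[float]:
    """Step 1: [sim(x_1, y_1), ..., sim(x_n, y_n)] over the given attributes."""
    return [sim(x.get(attr, ""), y.get(attr, "")) for attr in attributes]


def entity_similarity(
    vector: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Step 2 (assumed aggregation): weighted average of the vector."""
    if not vector:
        raise ValueError("similarity vector must not be empty")
    if weights is None:
        weights = [1.0] * len(vector)
    return sum(w * s for w, s in zip(weights, vector)) / sum(weights)


x = {"name": "new york city", "country": "united states"}
y = {"name": "new york", "country": "united states"}
vec = similarity_vector(x, y, ["name", "country"])
score = entity_similarity(vec)
```

In practice the aggregation step is often replaced by a learned classifier or a clustering method over the similarity vectors; the average here only keeps the sketch short.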

There are many methods for computing attribute similarity, such as edit distance, set-based similarity, and vector-based similarity:
Edit distance: Levenshtein, Wagner and Fischer, edit distance with affine gaps.
Set similarity: Jaccard coefficient, Dice.
Vector-based similarity: cosine similarity, TF-IDF similarity.
......

Computing attribute similarity with edit distance

Levenshtein distance

Levenshtein distance, also called minimum edit distance, is the minimum number of single-character edits needed to convert one string into another. For example, to compute the edit distance between Lvensshtain and Levenshtein:

Lvensshtain → insert "e" → Levensshtain
Levensshtain → delete "s" → Levenshtain
Levenshtain → substitute "a" with "e" → Levenshtein

Three edits are needed, so the edit distance is 3.
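The edit distance for the example strings above can be checked with a short dynamic-programming sketch in the style of the Wagner-Fischer algorithm. The function names are hypothetical, and converting the distance into a similarity by dividing by the longer string's length is an assumption made here for illustration, not something the article specifies.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the source prefix processed
    # so far and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i source characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(
                min(
                    previous[j] + 1,         # delete s_char
                    current[j - 1] + 1,      # insert t_char
                    previous[j - 1] + cost,  # substitute, or keep on a match
                )
            )
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


distance = levenshtein("Lvensshtain", "Levenshtein")  # 3
```

Keeping only two rows of the table instead of the full matrix reduces memory use to the length of the target string, which matters when many attribute pairs are compared.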
