What is information integration?

Source: Internet
Author: User

Wikipedia's description of information integration is not very satisfying (I think much of it is wrong or very one-sided):

Information integration (II) (also called information fusion, deduplication and referential integrity) is the merging of information from disparate sources with differing conceptual, contextual and typographical representations. It is used in data mining and consolidation of data from unstructured or semi-structured resources. Typically, information integration refers to textual representations of knowledge but is sometimes applied to rich media content.

Among the technologies available to integrate information are string metrics that allow detection of similar text in different data sources by fuzzy matching.
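
To make the idea of a string metric concrete, here is a minimal sketch that uses Levenshtein edit distance, one common string metric, to pair similar strings from two sources. The function names, the normalization, and the 0.8 threshold are illustrative assumptions, not something the quoted description specifies.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical
    after trimming whitespace and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def fuzzy_match(left, right, threshold=0.8):
    """Pair each string in left with its most similar string in right,
    keeping only pairs at or above the (assumed) threshold."""
    matches = []
    for item in left:
        best = max(right, key=lambda r: similarity(item, r), default=None)
        if best is not None and similarity(item, best) >= threshold:
            matches.append((item, best))
    return matches
```

Calling `fuzzy_match(["Jon Smith"], ["John Smith", "Jane Doe"])` pairs "Jon Smith" with "John Smith", since the two differ by a single inserted character. A purely character-based metric like this only catches typographical differences; it does nothing for conceptual or contextual ones.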

I personally think that information integration has not yet become a mature discipline, so it has no rigorous definition, methodology, or system. My personal opinions are as follows:

First, to integrate, we must be clear about the purpose. The end goal should be a knowledge base that integrates heterogeneous information rather than merely collecting it in one place. Heterogeneity is the key and the most challenging part.

Second, consider the form in which the information is stored: structured database > XML > ontology. Of these, the most mature solutions are at the database level, a typical example being the data warehouse. However, integration in a data warehouse requires a great deal of human involvement in formulating the integration rules, so the degree of automation is very low, as in the ETL process.
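
As a rough illustration of why rule-based ETL is so manual, the sketch below integrates two sources into one target schema. Every source name, field name, and conversion function here is hypothetical; the point is only that each rule in `RULES` has to be written by a person who understands both sources.

```python
# Hypothetical rows from two sources with different field names and formats.
CRM_ROWS = [{"cust_name": "Ada Lovelace", "phone": "555-0100"}]
BILLING_ROWS = [{"FullName": "LOVELACE, ADA", "Tel": "(555) 0100"}]


def digits_only(value: str) -> str:
    """Normalize a phone number by dropping everything but digits."""
    return "".join(ch for ch in value if ch.isdigit())


def flip_surname_first(value: str) -> str:
    """Turn 'SURNAME, GIVEN' into 'Given Surname'."""
    surname, _, given = value.partition(",")
    return f"{given.strip()} {surname.strip()}".title()


# Hand-written integration rules:
# target field -> (source field, conversion function), per source.
RULES = {
    "crm": {
        "name": ("cust_name", str.strip),
        "phone": ("phone", digits_only),
    },
    "billing": {
        "name": ("FullName", flip_surname_first),
        "phone": ("Tel", digits_only),
    },
}


def transform(source: str, row: dict) -> dict:
    """Apply the rules for one source to one extracted row."""
    return {target: convert(row[field])
            for target, (field, convert) in RULES[source].items()}


def load(sources: dict) -> list:
    """Extract rows from every source, transform them, and collect
    them into a single list standing in for the warehouse table."""
    warehouse = []
    for source, rows in sources.items():
        for row in rows:
            warehouse.append(transform(source, row))
    return warehouse
```

Here `load({"crm": CRM_ROWS, "billing": BILLING_ROWS})` yields two rows in the same shape, but only because someone knew in advance that `FullName` stores the surname first and that `Tel` and `phone` mean the same thing. Adding a third source means writing a third set of rules by hand.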

The core of integration is information mapping, or matching. This is currently a hot research topic, in the form of schema matching and ontology matching.
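
A very simple form of schema matching can be sketched as follows: attribute names from two schemas are split into word tokens and paired greedily by Jaccard overlap. This is only a name-based toy under assumed names and an assumed 0.5 threshold; real schema and ontology matchers also draw on data types, instance values, and structure.

```python
import re


def tokens(name: str) -> set:
    """Split a snake_case or camelCase attribute name into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return {t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t}


def jaccard(a: str, b: str) -> float:
    """Share of name tokens the two attributes have in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_schemas(source, target, threshold=0.5):
    """Greedy one-to-one matching: take the highest-scoring attribute
    pairs first and stop once scores drop below the threshold."""
    candidates = sorted(
        ((jaccard(s, t), s, t) for s in source for t in target),
        reverse=True,
    )
    mapping, used = {}, set()
    for score, s, t in candidates:
        if score < threshold:
            break
        if s not in mapping and t not in used:
            mapping[s] = t
            used.add(t)
    return mapping
```

With source attributes `customer_name`, `birth_date`, `zip` and target attributes `customerName`, `dateOfBirth`, `postalCode`, this matches the first two pairs but leaves `zip` unmatched, because `zip` and `postalCode` share no tokens even though they mean the same thing. That gap between names and meaning is the kind of heterogeneity that makes matching hard.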

 

To be continued

 
