I am not impressed by Wikipedia's description of information integration (I think many of its points are wrong or very one-sided):
Information Integration (II) (also called Information Fusion, Deduplication and Referential Integrity) is the merging of information from disparate sources with differing conceptual, contextual and typographical representations. It is used in data mining and consolidation of data from unstructured or semi-structured resources. Typically, information integration refers to textual representations of knowledge but is sometimes applied to rich media content. Among the technologies available to integrate information are string metrics that allow detection of similar text in different data sources by fuzzy matching.
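To make the "string metrics" and "fuzzy matching" in that description concrete, here is a minimal sketch using Levenshtein edit distance, which is one common string metric among many. The function names, the sample values, and the 0.8 threshold are my own assumptions for illustration, not something the Wikipedia text prescribes.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Edit distance normalized to [0, 1]; 1.0 means identical
    after trimming whitespace and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def fuzzy_match(left, right, threshold=0.8):
    """Pair each value in `left` with its most similar value in `right`,
    keeping only pairs at or above the (assumed) threshold."""
    matches = []
    for value in left:
        best = max(right, key=lambda candidate: similarity(value, candidate),
                   default=None)
        if best is not None and similarity(value, best) >= threshold:
            matches.append((value, best))
    return matches


# Hypothetical names from two different data sources.
pairs = fuzzy_match(["Jon Smith"], ["John Smith", "Jane Doe"])
```

Here "Jon Smith" and "John Smith" differ by one edit over ten characters, so they pair up, while "Jane Doe" is too far away. A metric like this only catches typographical differences; it says nothing about the conceptual or contextual differences the same description mentions.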
Personally, I think information integration has not yet become a mature discipline, so it has no strict definition, methodology, or system. My own views are as follows:
First, to integrate, we must be clear about the purpose. The end goal should be a knowledge base that integrates heterogeneous information, rather than simply collecting it in one place. Heterogeneity is the key and the most challenging part.
Second, consider how the information is stored: structured database > XML > ontology. The most mature solutions are at the database level, and the typical one is the data warehouse. However, integration in a data warehouse requires a lot of human involvement in formulating the integration rules, so the degree of automation is very low; the ETL process is an example.
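A toy sketch may show why I call the automation low. In the example below every field mapping and every cleaning function in `RULES` has to be written by a person who understands both sources. The source names, field names, and records are all hypothetical, and a real ETL tool is of course far more elaborate than this.

```python
def normalize_phone(raw):
    """Keep digits only, so '555-0100' and '555 0100' compare equal."""
    return "".join(ch for ch in raw if ch.isdigit())


def clean_name(raw):
    """Trim the value and collapse repeated whitespace."""
    return " ".join(raw.split())


# Hand-written integration rules (hypothetical):
# source -> {warehouse field: (source field, conversion function)}
RULES = {
    "crm": {"name": ("cust_name", clean_name),
            "phone": ("phone", normalize_phone)},
    "shop": {"name": ("fullName", clean_name),
             "phone": ("tel", normalize_phone)},
}


def transform(source, row):
    """Apply one source's rules to one record."""
    return {target: convert(row[field])
            for target, (field, convert) in RULES[source].items()}


def run_etl(extracted):
    """Transform every extracted record and load it into one list,
    which stands in for the warehouse table."""
    warehouse = []
    for source, rows in extracted.items():
        for row in rows:
            warehouse.append(transform(source, row))
    return warehouse


# "Extract" step: records as they arrive from two hypothetical systems.
extracted = {
    "crm": [{"cust_name": "  Ada  Lovelace ", "phone": "555-0100"}],
    "shop": [{"fullName": "Alan Turing", "tel": "555 0199"}],
}
warehouse = run_etl(extracted)
```

Adding a third source means someone has to sit down and write a third entry in `RULES`; nothing in the process discovers the mapping by itself.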
Third, the core of integration is information mapping, or matching. This is a recent hot research topic, in the form of schema matching and ontology matching.
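As a rough illustration of what schema matching tries to automate, the sketch below proposes column correspondences between two schemas by comparing the word tokens in the column names. This is only a naive name-based matcher under my own assumptions: the synonym table, the column names, and the 0.5 threshold are all made up for the example, and real matchers also look at data types, instance values, and structure.

```python
import re

# Hypothetical abbreviation table; a real matcher might use a thesaurus.
SYNONYMS = {"cust": "customer", "tel": "phone", "addr": "address"}


def tokens(column):
    """Split a snake_case or camelCase column name into lowercase
    word tokens, expanding known abbreviations."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Za-z][a-z]*|\d+", column)
    return {SYNONYMS.get(part.lower(), part.lower()) for part in parts}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_schemas(source_columns, target_columns, threshold=0.5):
    """For each source column, propose the most similar target column,
    keeping only proposals at or above the (assumed) threshold."""
    mapping = {}
    for source in source_columns:
        best = max(target_columns,
                   key=lambda target: jaccard(tokens(source), tokens(target)),
                   default=None)
        if best is not None and jaccard(tokens(source), tokens(best)) >= threshold:
            mapping[source] = best
    return mapping


mapping = match_schemas(
    ["cust_name", "tel", "zip_code"],
    ["customerName", "phoneNumber", "postalCode"],
)
```

With these inputs `cust_name` is mapped to `customerName` and `tel` to `phoneNumber`, but `zip_code` is left unmatched: it shares only the token "code" with `postalCode`, and the matcher has no way of knowing that "zip" and "postal" mean the same thing. That gap between names and meaning is where the hard part of the research lies.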
To be continued