Enterprise Heterogeneous Data Source Integration

Source: Internet
Author: User
Background

Today, more and more applications need to access various heterogeneous data sources. For enterprise applications, this need arises both from the enterprise's own internal development and from the need to adapt to the external environment.

Enterprises accumulate large amounts of data as they develop, and they keep investing in data storage and management. However, because data management systems are deployed in phases and under varying technical, economic, and human constraints, even a single enterprise often runs quite different systems, ranging from simple file-based databases to complex network databases. Together these form the enterprise's heterogeneous data sources. Although each system meets its own storage and management requirements, enterprise applications often need to access data held in several systems at different locations on the network. For example, if a company needs to know how a set of tooling was produced and used, the relevant application must first access the separate database systems of the production preparation department, the tooling institute, and the workshop, and then extract and process the relevant data. The original data management systems clearly provide no such support, so a more capable system is required to integrate data spread across distributed data sources.

Moreover, as the business environment keeps changing, enterprises face many challenges as well as opportunities. With the growth of networks, enterprises have gradually evolved from isolated nodes into entities that continuously exchange information and conduct business transactions over the network, and enterprise data integration has extended from internal integration to integration between enterprises. Today, enterprises need to publish and exchange internal data more than ever before. This inevitably leads more and more enterprise applications to access heterogeneous data sources that may be located anywhere on the network, which in turn requires a system that supports integrating data from such sources.

Therefore, whether for an enterprise's own development or for data integration between enterprises, more and more enterprise applications require a heterogeneous data integration system to support access to heterogeneous data sources. Under current conditions, what problems must be solved when building such a system for an enterprise?
Problems

Heterogeneous data source integration is a classic problem in the database field, and with the rise of XML technology it has again become a hot topic. From the integration standpoint, integrating enterprise heterogeneous data sources is not fundamentally different from integrating heterogeneous data sources in general, and it shares the same common problems. From the standpoint of building a system that supports enterprise applications, however, we must also consider the special problems that arise when integrating enterprise heterogeneous data. In summary, the author believes that building an enterprise heterogeneous data source integration system mainly involves the following problems:

A. Heterogeneity

Heterogeneity is the primary issue in enterprise heterogeneous data integration. It shows up in two main aspects:

System heterogeneity: the data sources depend on different application systems, database management systems, and operating systems.
Schema heterogeneity: data sources differ in their storage models. Common storage models include the relational model, the object model, the object-relational model, and the nested document model, with the relational model being the mainstream. Note that even sources using the same storage model may have different schema structures; for example, Oracle and SQL Server do not use exactly the same data types.
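To make the Oracle versus SQL Server point concrete, the sketch below shows one way an integration layer might normalize source-specific column types into a single canonical type system. The canonical type names and the mapping table are hypothetical and deliberately incomplete; a real mapping would also have to handle precision, scale, and version differences.

```python
# Hypothetical canonical types used by an integration layer (illustrative, not exhaustive).
CANONICAL_TYPES = {
    "oracle": {
        "VARCHAR2": "string",
        "NVARCHAR2": "string",
        "NUMBER": "decimal",
        "DATE": "timestamp",   # Oracle DATE also stores a time of day
        "CLOB": "text",
    },
    "sqlserver": {
        "VARCHAR": "string",
        "NVARCHAR": "string",
        "INT": "integer",
        "DECIMAL": "decimal",
        "DATETIME": "timestamp",
        "DATE": "date",        # SQL Server DATE has no time component
    },
}


def to_canonical(dialect: str, source_type: str) -> str:
    """Map a source column type such as 'VARCHAR2(40)' to a canonical type name."""
    base = source_type.split("(", 1)[0].strip().upper()
    try:
        return CANONICAL_TYPES[dialect.lower()][base]
    except KeyError:
        raise ValueError(f"no mapping for {dialect}:{source_type}") from None


def unify_schema(dialect: str, columns: dict) -> dict:
    """Translate a {column: source_type} dict into {column: canonical_type}."""
    return {name: to_canonical(dialect, t) for name, t in columns.items()}


# The same type name, DATE, maps to different canonical types depending on the source.
oracle_tools = unify_schema("oracle", {"TOOL_ID": "NUMBER(10)", "MADE_ON": "DATE"})
sqlserver_tools = unify_schema("sqlserver", {"tool_id": "INT", "made_on": "DATE"})
```

Raising an error on unknown types, rather than guessing, keeps silent type drift out of the integrated schema.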

B. Integrity

The purpose of integrating heterogeneous data sources is to provide unified access for applications. To meet the requirements of various applications for processing (including publishing) data, the integrated data must preserve a certain degree of integrity, covering both data integrity and constraint integrity.
Data integrity: the data itself is extracted completely, which is generally easy to achieve.
Constraint integrity: constraints are the associations between data items, and they are what captures the logical relationships among data. Preserving constraint integrity is a prerequisite for good data publishing and exchange; it also simplifies data processing and improves efficiency.
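The difference between the two kinds of integrity can be illustrated with a small sketch: it extracts both the rows of a source and its foreign-key constraints, then checks whether the extracted data still satisfies those constraints. SQLite stands in for a real source here, and the tables `tooling` and `usage_log` are hypothetical.

```python
import sqlite3


def build_source():
    """Create a small in-memory source with one foreign key (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tooling (tool_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE usage_log (
            log_id INTEGER PRIMARY KEY,
            tool_id INTEGER REFERENCES tooling(tool_id),
            hours REAL
        );
        INSERT INTO tooling VALUES (1, 'fixture A'), (2, 'die B');
        INSERT INTO usage_log VALUES (10, 1, 3.5), (11, 2, 1.0);
    """)
    return conn


def extract_rows(conn, table):
    """Data integrity: copy every row of the table as a dict.
    Table names are assumed to come from trusted configuration."""
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY 1")
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


def extract_foreign_keys(conn, table):
    """Constraint integrity: carry the table's foreign keys along with its data."""
    # Each PRAGMA row is (id, seq, table, from, to, on_update, on_delete, match).
    return [(table, fk[3], fk[2], fk[4])
            for fk in conn.execute(f"PRAGMA foreign_key_list({table})")]


def fk_violations(data, fks):
    """List child rows whose foreign-key value has no matching parent row."""
    bad = []
    for child, child_col, parent, parent_col in fks:
        parent_keys = {r[parent_col] for r in data[parent]}
        bad += [(child, r) for r in data[child]
                if r[child_col] is not None and r[child_col] not in parent_keys]
    return bad


conn = build_source()
data = {t: extract_rows(conn, t) for t in ("tooling", "usage_log")}
fks = extract_foreign_keys(conn, "usage_log")
```

If only the rows were carried over, a partial extraction (say, dropping tool 2 from `tooling`) would go unnoticed; keeping the constraint list lets `fk_violations` flag the orphaned usage record.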

C. Performance

Applications in the Internet age challenge traditional data integration methods and set higher standards. In general, a system responsible for integration must now be lightweight and quick to deploy; that is, it must adapt quickly to changes in the data sources and keep costs low.
Note: performance is really a requirement on the integration system itself; the author lists it among the problems to emphasize its importance.

D. Semantic conflicts

Information resources differ in their semantics. These differences can cause various conflicts, from simple naming conflicts (different names for the same concept) to complex structural conflicts (different models expressing the same information) [14]. Semantic conflicts can introduce redundancy into integration results and interfere with data processing, publishing, and exchange. How to minimize semantic conflicts is therefore also an active topic in data integration.
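Both kinds of conflict can be sketched as a mapping step applied before records enter the integrated view. In this hypothetical example, the production and workshop sources name the same attributes differently (a naming conflict), and one source packs a tool's size into a single `"LxW"` string while the global model uses two numeric fields (a structural conflict). All source names, attribute names, and the global schema are assumptions for illustration.

```python
# Hypothetical per-source synonyms for global attribute names (naming conflicts).
NAME_MAP = {
    "production": {"tool_no": "tool_id", "tool_name": "name"},
    "workshop": {"ToolID": "tool_id", "Designation": "name"},
}


def resolve_names(source: str, record: dict) -> dict:
    """Rename source attributes to global names; unknown attributes pass through."""
    mapping = NAME_MAP.get(source, {})
    return {mapping.get(k, k): v for k, v in record.items()}


def resolve_structure(record: dict) -> dict:
    """Structural conflict: split a 'size' string like '120x40' into two numeric fields."""
    rec = dict(record)
    if "size" in rec:
        length, width = rec.pop("size").lower().split("x")
        rec["length_mm"], rec["width_mm"] = float(length), float(width)
    return rec


def to_global(source: str, record: dict) -> dict:
    """Apply name resolution first, then structural resolution."""
    return resolve_structure(resolve_names(source, record))


a = to_global("production", {"tool_no": 7, "tool_name": "fixture A", "size": "120x40"})
b = to_global("workshop", {"ToolID": 7, "Designation": "fixture A",
                           "length_mm": 120.0, "width_mm": 40.0})
```

After resolution the two records are identical, so the integrated result holds one tool rather than two apparently different ones.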

E. Permission bottleneck

Because database resources may belong to different organizational units, accessing data across heterogeneous sources must not infringe on the permissions of the original databases, and access rights to each original source must be isolated and controlled. This is a problem that must be solved before a heterogeneous set of databases can be connected. The author calls this issue the permission bottleneck.
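One way to picture the permission bottleneck is a guard inside the middleware that checks every request against rules set by the unit that owns each source, before anything is forwarded. The policy model below (per-user, per-table column grants) is a hypothetical sketch; a real system would also connect to each source with its own restricted credentials.

```python
from dataclasses import dataclass, field


@dataclass
class SourcePolicy:
    """Access rules granted by the unit that owns one source (hypothetical model)."""
    readable: dict = field(default_factory=dict)  # user -> {table: set of columns}

    def allowed_columns(self, user: str, table: str) -> set:
        return self.readable.get(user, {}).get(table, set())


class AccessGuard:
    """Checks each request against the owning source's policy before forwarding it."""

    def __init__(self, policies: dict):
        self.policies = policies  # source name -> SourcePolicy

    def authorize(self, user: str, source: str, table: str, columns: list) -> list:
        policy = self.policies.get(source)
        if policy is None:
            raise PermissionError(f"unknown source {source!r}")
        denied = set(columns) - policy.allowed_columns(user, table)
        if denied:
            raise PermissionError(
                f"{user} may not read {sorted(denied)} from {source}.{table}")
        return list(columns)


guard = AccessGuard({
    "tooling_institute": SourcePolicy({"planner": {"tooling": {"tool_id", "name"}}}),
})
```

Keeping the policy per source, rather than one global permission table, leaves each owning unit in control of its own data, which is the isolation the text asks for.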

F. Additional constraints

When two or more data sources are integrated, the data in those sources may be related. In the tooling example above, the information about the same set of tooling stored in different databases is clearly logically connected. The author calls the process of attaching such logical connections to the integration result an additional constraint.
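Using the tooling example, the sketch below groups records from several sources by a shared key, which records the cross-source link, and then checks one illustrative rule on top of it. The shared key `tool_id`, the source names, and the rule "a tool used in the workshop must be known to the tooling institute" are all hypothetical.

```python
from collections import defaultdict


def integrate_with_links(sources: dict, key: str) -> dict:
    """Group records from several sources by a shared key, keeping the cross-source link."""
    merged = defaultdict(dict)
    for source, records in sources.items():
        for rec in records:
            merged[rec[key]].setdefault(source, []).append(rec)
    return dict(merged)


def check_additional_constraint(merged: dict, required: str, dependent: str) -> list:
    """Hypothetical rule: every key present in `dependent` must also appear in `required`.
    Returns the keys that violate it."""
    return sorted(k for k, parts in merged.items()
                  if dependent in parts and required not in parts)


sources = {
    "production": [{"tool_id": 1, "batch": "P-01"}],
    "tooling_institute": [{"tool_id": 1, "design": "D-7"}, {"tool_id": 2, "design": "D-9"}],
    "workshop": [{"tool_id": 1, "hours": 3.5}, {"tool_id": 3, "hours": 2.0}],
}
merged = integrate_with_links(sources, "tool_id")
```

Here tool 1 links records from all three sources, while tool 3 appears only in the workshop and is reported as violating the rule.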

G. Integration content limitation

Integrating multiple data sources does not usually require integrating all of their data. Defining the scope of integration, that is, deciding which data to include, is therefore what the author calls the integration content limitation.
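A scope definition can be expressed declaratively: for each source table, which columns to take and which rows qualify. The sketch below assumes a hypothetical `ScopeRule` structure and a caller-supplied `fetch(source, table)` callback; neither comes from any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ScopeRule:
    """One entry of a hypothetical integration scope: what to take from one source table."""
    source: str
    table: str
    columns: tuple
    row_filter: Optional[Callable[[dict], bool]] = None


def apply_scope(rules, fetch):
    """Pull only in-scope data; `fetch(source, table)` is an assumed callback returning rows."""
    result = {}
    for rule in rules:
        rows = fetch(rule.source, rule.table)
        if rule.row_filter is not None:
            rows = [r for r in rows if rule.row_filter(r)]
        # Project each row onto the columns that are in scope.
        result[(rule.source, rule.table)] = [{c: r[c] for c in rule.columns} for r in rows]
    return result


rows_by_table = {
    ("workshop", "usage_log"): [
        {"tool_id": 1, "hours": 3.5, "operator": "Li"},
        {"tool_id": 2, "hours": 0.0, "operator": "Wang"},
    ],
}
rules = [ScopeRule("workshop", "usage_log", ("tool_id", "hours"), lambda r: r["hours"] > 0)]
scoped = apply_scope(rules, lambda s, t: rows_by_table[(s, t)])
```

Keeping the scope as data rather than code makes it easy to review and to change when the integration requirements change.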

The above are the main problems to be faced when building an enterprise heterogeneous data source integration system. Heterogeneity, integrity, performance, and semantic conflicts are common to heterogeneous data integration in general, whereas the permission bottleneck, additional constraints, and integration content limitation are specific to enterprise heterogeneous data integration. Note that although the author classifies these problems separately, they are interrelated and constrain one another, so they should not be considered in isolation.

Integration of Heterogeneous Data Sources

Currently, there are two approaches to integrating heterogeneous databases. The first is to migrate the original data into a new data management system. To integrate different types of data, some non-traditional data types must be converted into the new system's data types; many relational database vendors provide such migration features. The drawback of this approach is that when the data management system is replaced, application software built around the original data may have to be discarded or redeveloped for the new system. Porting to a new system is therefore not a practical solution.

The second approach is to integrate heterogeneous databases through middleware, which leaves the storage and management of the raw data unchanged. The middleware sits between the heterogeneous database systems (the data layer) and the application programs (the application layer). Downward, it coordinates the database systems; upward, it provides a unified data model and a common data access interface to applications that access the integrated data, while each database's own applications continue to perform their tasks as before. The middleware system mainly provides a high-level retrieval service over the heterogeneous data sources. The middleware model is therefore an ideal solution for heterogeneous data integration.
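The layering described above is often realized as a wrapper/mediator design: each source gets a wrapper that hides its access method behind a common interface, and a mediator exposes a single query interface to applications. The sketch below is one minimal version of that idea, with a relational source (SQLite) and a flat-file source (CSV text); the class names and the `rows()` interface are assumptions, not the API of any specific middleware product.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, Protocol


class Wrapper(Protocol):
    """Common interface each source adapter offers to the middleware (assumed design)."""
    def rows(self) -> Iterable[dict]: ...


class SQLiteWrapper:
    """Adapter for a relational source; the SQL stays inside the wrapper."""
    def __init__(self, conn: sqlite3.Connection, query: str):
        self.conn, self.query = conn, query

    def rows(self):
        cur = self.conn.execute(self.query)
        cols = [d[0] for d in cur.description]
        for row in cur:
            yield dict(zip(cols, row))


class CSVWrapper:
    """Adapter for a flat-file source; `types` converts text fields (default: keep as str)."""
    def __init__(self, text: str, types: Dict[str, type] = None):
        self.text, self.types = text, types or {}

    def rows(self):
        for r in csv.DictReader(io.StringIO(self.text)):
            yield {k: self.types.get(k, str)(v) for k, v in r.items()}


class Mediator:
    """Sits between applications and wrappers and exposes one query interface."""
    def __init__(self, wrappers: Dict[str, Wrapper]):
        self.wrappers = wrappers

    def query(self, predicate=lambda r: True):
        for name, wrapper in self.wrappers.items():
            for r in wrapper.rows():
                if predicate(r):
                    yield {"source": name, **r}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tools (tool_id INTEGER, name TEXT)")
conn.execute("INSERT INTO tools VALUES (1, 'fixture A')")
mediator = Mediator({
    "production": SQLiteWrapper(conn, "SELECT tool_id, name FROM tools"),
    "workshop": CSVWrapper("tool_id,name\n1,fixture A\n2,die B\n", {"tool_id": int}),
})
```

The application only calls `mediator.query(...)`; adding a new source means writing one more wrapper, which also serves the lightweight, quickly adaptable deployment the performance requirement asks for.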

Introduction of XML Technology

Once middleware is chosen to integrate heterogeneous data sources, a global data model must be selected, because the middleware responsible for integration must provide a global model that unifies the heterogeneous source schemas. Earlier integration systems for heterogeneous data sources, such as multidatabase systems (for example, those used in CIMS) and federated database systems, usually adopted the relational or object data model as the global model. However, these cannot meet the higher demands of intranet and Internet applications. In general, the global model for heterogeneous data integration must meet the following requirements: (1) it must be able to describe data in various formats, whether structured or semi-structured, and whether the source supports a full query language or only simple text queries; (2) it must make data easy to publish and exchange, so that the integrated data can be published in multiple formats and exchanged easily between applications.

With the development of XML and its related technologies and applications, XML has become not only a standard for data exchange between applications but also an important standard for information exchange and representation on the World Wide Web. In fact, several draft industry standards based on XML DTDs already exist. The emergence of XML has had a profound impact on unifying different information formats: for the first time, it provides an information exchange format that is editable, easy to parse, and able to represent any kind of structured or semi-structured information.

XML is now widely supported, and its adaptability allows resources to be quickly wrapped, integrated, and published. Introducing XML technology and combining it with the global data model can therefore help heterogeneous data source integration middleware adapt better to data integration in open, evolving environments (such as an enterprise's dynamic alliance environment). Many well-known heterogeneous data source integration projects have introduced XML-related technologies, such as TSIMMIS (a Stanford-IBM project), IBM's Garlic, SIMS, and MOMIS.
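As a small illustration of XML serving as the global model, the sketch below publishes integrated records as one XML document using Python's standard `xml.etree.ElementTree`. Flat relational rows and a nested, semi-structured record fit into the same document because nested dicts and lists simply become nested or repeated elements. The element names (`tooling_view`, `tool`) and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET


def _add(parent, key, value):
    """Nested dicts become child elements and lists become repeated elements,
    so semi-structured data fits alongside flat rows."""
    if isinstance(value, dict):
        child = ET.SubElement(parent, key)
        for k, v in value.items():
            _add(child, k, v)
    elif isinstance(value, list):
        for v in value:
            _add(parent, key, v)
    else:
        ET.SubElement(parent, key).text = str(value)


def records_to_xml(records, root_tag="tooling_view"):
    """Publish integrated records as one XML document (element names are hypothetical)."""
    root = ET.Element(root_tag)
    for rec in records:
        item = ET.SubElement(root, "tool", source=rec["source"])
        for key, value in rec.items():
            if key != "source":
                _add(item, key, value)
    return ET.tostring(root, encoding="unicode")


doc = records_to_xml([
    {"source": "production", "tool_id": 1, "name": "fixture A"},
    {"source": "workshop", "tool_id": 1,
     "notes": {"shift": "night", "remarks": ["worn edge", "re-ground"]}},
])
```

Because the output is plain XML text, the same integrated view can be handed to other applications or transformed into other publishing formats without knowing which source each fragment came from.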

Conclusion

Integrating heterogeneous data sources in enterprises is not a new topic, but it continues to evolve as the environment in which enterprises operate keeps changing. How to look ahead and use appropriate technologies to integrate enterprise data in the network age remains an ongoing question. As the basis for enterprise application integration and enterprise service integration, enterprise heterogeneous data source integration will have a profound impact on enterprise informatization.
