What is data integration?

Source: Internet
Author: User

Data integration combines data from different sources, in different formats and with different characteristics, either logically or physically, so that an enterprise can share data comprehensively. Many mature frameworks exist for enterprise data integration. At present, integrated systems are usually built using federated databases, middleware models, or data warehouses. Each of these approaches addresses data sharing and decision support with a different emphasis and for different applications.

Over the past few decades, the rapid development of science and technology and the advance of information technology have pushed the amount of data accumulated by human society beyond the total accumulated in earlier periods, and the amount of data collected, stored, processed, and disseminated keeps growing. By sharing data, enterprises let more people make full use of existing data resources and reduce duplicated work and costs such as repeated data collection. In practice, however, the data provided by different users may come from different channels, so its content, format, and quality vary greatly. Sometimes a format cannot be converted at all, or information is lost during conversion, which seriously hinders the flow and sharing of data across departments and software systems. Integrating and managing data effectively has therefore become essential to an enterprise's competitiveness.

Modern enterprises have developed rapidly from isolated nodes into entities that constantly exchange information and conduct business transactions over networks, so data exchange has moved from within the enterprise to between enterprises. At the same time, data is uncertain and changes frequently, and existing integrated systems are tightly coupled to their implementation technology and physical data. As a result, once an application or the physical data changes, the entire system has to be modified accordingly. The problem we face is therefore how to adapt to these complex needs, extend the range of applications, separate implementation technology from application requirements, fully describe the formats of the various data sources, and publish and exchange data.

1. Data integration model classification

The following is a basic analysis of these data integration models.

A federated database system (FDBS) is composed of semi-autonomous database systems that share data with each other. The data sources in the federation provide access interfaces to one another, and each member can be a centralized database system or a distributed database system. Federated systems are commonly divided into tightly coupled and loosely coupled variants. A tightly coupled federation provides a unified access schema. A loosely coupled federation does not provide a unified interface, but its data sources can be accessed through a unified language; the core problem in that case is resolving the semantic differences among all the data sources.

The middleware mode accesses heterogeneous databases, legacy systems, and web resources through a unified global data model. The middleware sits between the heterogeneous data source systems (the data layer) and the applications (the application layer): downward it coordinates the data source systems, and upward it provides a unified data schema and a common data access interface to applications that access the integrated data. The applications of each data source continue to perform their own tasks; the middleware system mainly provides a high-level retrieval service over the heterogeneous sources.

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports enterprise management and decision-making. Its data is organized into broadly defined, functionally independent, non-overlapping subjects.

These methods solve the problem of data sharing and interoperation between applications to a certain extent, but they differ in the following ways. The federated database system is mainly designed for integrating multiple database systems, and each data source may need to be mapped to the schema of every other data source. When the integrated system is large, this creates great difficulties for actual development.
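The scaling concern for federated systems can be made concrete with a little arithmetic. The sketch below is illustrative only and assumes the worst case, in which every source needs a directed mapping to every other source; for comparison it also counts the mappings needed when each source maps once to a single shared schema. The function names are hypothetical.

```python
def pairwise_mappings(n_sources: int) -> int:
    """Directed schema mappings if every source maps to every other source."""
    return n_sources * (n_sources - 1)


def shared_schema_mappings(n_sources: int) -> int:
    """Mappings if every source maps once to one shared schema."""
    return n_sources


for n in (3, 10, 50):
    print(n, pairwise_mappings(n), shared_schema_mappings(n))
# prints:
# 3 6 3
# 10 90 10
# 50 2450 50
```

Under this assumption the pairwise count grows quadratically with the number of sources, which is one way to read the development difficulty mentioned above.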

The middleware mode is a popular data integration method. It provides a unified logical view of the data in the middle layer and hides the details of the underlying data, so that users can treat the integrated data sources as a single whole. The key issue in this model is how to construct the logical view and how to map the different data sources onto this middle layer.
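As a rough illustration of mapping different sources onto one logical view, here is a minimal sketch. The `Wrapper` and `Mediator` classes, the two in-memory sources, and all field names are assumptions made for this example, not part of any particular middleware product.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


class Wrapper:
    """Adapts one data source to the global schema through a field mapping."""

    def __init__(self, name: str, fetch: Callable[[], Iterable[Record]],
                 field_map: Dict[str, str]) -> None:
        self.name = name
        self._fetch = fetch
        self._field_map = field_map  # source field name -> global field name

    def rows(self) -> List[Record]:
        out = []
        for raw in self._fetch():
            row = {glob: raw[src] for src, glob in self._field_map.items()}
            row["source"] = self.name  # remember where the row came from
            out.append(row)
        return out


class Mediator:
    """Presents all wrapped sources as one logical collection of rows."""

    def __init__(self, wrappers: Iterable[Wrapper]) -> None:
        self._wrappers = list(wrappers)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        return [r for w in self._wrappers for r in w.rows() if predicate(r)]


# Two hypothetical sources with different field names for the same concepts.
crm_rows = [{"cust_id": 1, "full_name": "Ada"}, {"cust_id": 2, "full_name": "Lin"}]
erp_rows = [{"KUNNR": 2, "NAME1": "Lin"}, {"KUNNR": 3, "NAME1": "Grace"}]

mediator = Mediator([
    Wrapper("crm", lambda: crm_rows, {"cust_id": "customer_id", "full_name": "name"}),
    Wrapper("erp", lambda: erp_rows, {"KUNNR": "customer_id", "NAME1": "name"}),
])

# The caller only sees the global field names.
matches = mediator.query(lambda r: r["customer_id"] == 2)
print(matches)
```

The application queries `customer_id` and `name` without knowing how each source names those fields; adding a source means adding one wrapper and one field mapping.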

Data warehouse technology addresses data sharing at another level. It is mainly a data integration method aimed at a particular application area of the enterprise, namely the subject-oriented data collection described above, which supports data mining and decision-making.
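The properties of a data warehouse can be sketched with Python's standard `sqlite3` module. This is a toy example under stated assumptions: the `sales_fact` table, its columns, and the sample figures are all invented for illustration. Rows from several sources land in one subject table (integrated), each row carries a load date (time-related), and rows are only appended, never updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        load_date TEXT NOT NULL,  -- snapshot date
        source    TEXT NOT NULL,  -- originating system
        product   TEXT NOT NULL,
        amount    REAL NOT NULL
    )
""")


def load_snapshot(load_date, source, rows):
    """Append one dated snapshot; existing rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [(load_date, source, product, amount) for product, amount in rows],
    )
    conn.commit()


load_snapshot("2024-01-31", "web_shop", [("widget", 120.0), ("gadget", 80.0)])
load_snapshot("2024-01-31", "retail", [("widget", 200.0)])
load_snapshot("2024-02-29", "web_shop", [("widget", 150.0)])

# A decision-support query over the integrated, dated data.
totals = conn.execute(
    "SELECT load_date, product, SUM(amount) FROM sales_fact "
    "GROUP BY load_date, product ORDER BY load_date, product"
).fetchall()
print(totals)
```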

2. Data Cache is the key

For a data integration architecture, the key is a data cache that covers the target schema, source-to-target mapping, data acquisition, hierarchical extraction, error recovery, and security conversion. In addition, the data cache contains predefined data extraction tasks that are deployed automatically against an enterprise's backend systems and data warehouse.

As the single integration point for enterprise and e-commerce data, the cache minimizes the need for direct access to backend systems and for complex real-time integration. It takes a large number of unnecessary data requests off the backend systems, so that an e-commerce company can add more users while the backend systems do their designated work.
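The effect on backend load can be shown with a minimal read-through cache. The `Backend` and `ReadThroughCache` classes and the order data are hypothetical stand-ins, and the sketch ignores invalidation entirely.

```python
class Backend:
    """Stand-in for a slow backend system; counts how often it is hit."""

    def __init__(self, orders):
        self._orders = dict(orders)
        self.requests = 0

    def order_status(self, order_id):
        self.requests += 1
        return self._orders[order_id]


class ReadThroughCache:
    """Answers from memory when possible and asks the backend only on a miss."""

    def __init__(self, backend):
        self._backend = backend
        self._store = {}

    def order_status(self, order_id):
        if order_id not in self._store:
            self._store[order_id] = self._backend.order_status(order_id)
        return self._store[order_id]


backend = Backend({"A1": "shipped", "B2": "packing"})
cache = ReadThroughCache(backend)

statuses = [cache.order_status(oid) for oid in ["A1", "A1", "B2", "A1", "B2"]]
print(statuses)          # ['shipped', 'shipped', 'packing', 'shipped', 'packing']
print(backend.requests)  # 2
```

Five customer queries reach the backend only twice; the other three are served from the cache.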

Data integration software complements Enterprise Application Integration (EAI) vendors and system integrators rather than replacing them. Indeed, as data integration software is increasingly used as a tool for B2B integration, it will substantially change the way B2B integrators work together and the way enterprises move to the Internet.

3. Functions of data integration for enterprise information systems

The emergence of data integration enables enterprises to bring backend ERP information to the Internet. Data integration products provide a cache, or data classification layer, between a company's Internet-facing computers and backend systems such as SAP, Oracle, and PeopleSoft.

Data integration provides a mirror of the backend information stored on the enterprise's master computer. When an Internet customer needs to check the status of an order, the query is passed to the data integration software, so it is not always necessary to access the master computer. The software is intelligent enough to know when to synchronize with the master computer so that its data stays up to date. ERP data is integrated for e-commerce applications through a combination of data classification and direct access to ERP data, using a data server and a number of data caches. The data integration software blends direct real-time access and batch access to extract data from the ERP system.
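The text does not say how the software decides when to synchronize. One simple possibility, shown here purely as an assumption, is a maximum-age rule: the mirror is refreshed in one batch whenever it is older than a configured limit. The `SyncingMirror` class and the injected clock are hypothetical and exist only to keep the example deterministic.

```python
class SyncingMirror:
    """Local mirror of master data, refreshed in batch once it is too old."""

    def __init__(self, fetch_all, max_age_seconds, clock):
        self._fetch_all = fetch_all      # callable returning the master data
        self._max_age = max_age_seconds  # assumed staleness limit
        self._clock = clock              # callable returning the current time
        self._data = {}
        self._synced_at = None
        self.sync_count = 0

    def _sync_if_stale(self):
        now = self._clock()
        if self._synced_at is None or now - self._synced_at >= self._max_age:
            self._data = dict(self._fetch_all())  # batch copy from the master
            self._synced_at = now
            self.sync_count += 1

    def get(self, key):
        self._sync_if_stale()
        return self._data.get(key)


master = {"order-7": "packing"}
now = [0.0]  # fake clock so the run is repeatable
mirror = SyncingMirror(lambda: master, max_age_seconds=60, clock=lambda: now[0])

first = mirror.get("order-7")    # first access triggers the initial sync
master["order-7"] = "shipped"    # the master changes afterwards
now[0] = 30.0
second = mirror.get("order-7")   # mirror is 30 s old: served locally
now[0] = 61.0
third = mirror.get("order-7")    # mirror is 61 s old: re-synced first

print(first, second, third, mirror.sync_count)  # packing packing shipped 2
```

The second read returns slightly stale data without touching the master; a real product would presumably combine a rule like this with direct real-time access for data that must be current.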

Data moves from one or more sources to one or more target tables or information types (such as XML). Defining a data movement involves determining the source from which the data should be extracted, how the data should be converted, and where it should be sent. A graphical user interface can be used to specify the data mapping and conversion.
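The three parts of a data movement (where to extract from, how to convert, where to send) can be sketched as a small structure. The `DataMovement` class, the source rows, and the field names are assumptions for illustration; a real tool would generate something equivalent from its graphical mapping editor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class DataMovement:
    extract: Callable[[], Iterable[Row]]  # where the data comes from
    transform: Callable[[Row], Row]       # how each row is converted
    load: Callable[[Row], None]           # where each row is sent

    def run(self) -> int:
        """Move every row from source to target and return the row count."""
        count = 0
        for row in self.extract():
            self.load(self.transform(row))
            count += 1
        return count


# Hypothetical source rows with text fields, and an in-memory target table.
source_rows = [{"ORDER_NO": "0001", "AMT": "19.90"},
               {"ORDER_NO": "0002", "AMT": "5.00"}]
target_table: List[Row] = []


def to_target(row: Row) -> Row:
    """Source-to-target mapping: rename the fields and convert their types."""
    return {"order_id": int(str(row["ORDER_NO"])), "amount": float(str(row["AMT"]))}


movement = DataMovement(lambda: source_rows, to_target, target_table.append)
moved = movement.run()
print(moved, target_table)
```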

A user-defined program controls the movement of each piece of data and determines the dependencies between such movements. For example, if one target table depends on the values of other target tables, a program specifies the order in which the data server should carry out the individual data movements for those tables. Data movements can be designed to run in batches or in real time, and they are created and managed by the administrator to control data movement between ERP, e-commerce, customer relationship management, supply chain management, and communication applications. Data movement uses distributed query optimization, multithreading, in-memory data conversion, and parallel pipeline operations to provide high throughput and scalability. For example, to manage extraction programs and run batch extraction from SAP software, you can use optimized ABAP code (ABAP is SAP's proprietary programming language) without having to develop and maintain custom ABAP code yourself.
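Ordering data movements by table dependency is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the table names and their dependencies are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical target tables: each key depends on the tables in its set.
depends_on = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_lines": {"orders", "products"},
}

# static_order() yields every table after all the tables it depends on.
load_order = list(TopologicalSorter(depends_on).static_order())
print(load_order)
```

Several valid orders exist, so the exact output may vary, but `customers` always precedes `orders`, and both `orders` and `products` precede `order_lines`. A cyclic dependency would raise `graphlib.CycleError`.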

Data integration is a problem that enterprises must solve in order to develop further. Data modeling and related application technologies are used to analyze enterprise information integration applications. When developing and applying an effective model design, we should focus on the following points:

(1) Model timeliness: this covers both the development-time model and the runtime model; the runtime model embodies the core idea of a model-driven system.

(2) Model evolution: this describes whether the model can change itself as the application changes.

(3) Model hierarchy: as the complexity of the system increases, the model can be made up of multiple layers.
