Big Data Processing Technology: Data Integration

Source: Internet
Author: User
Keywords: data integration, data integration techniques, data integration meaning
Big data processing is a lot like cooking. Before cooking, we wash the vegetables so that the finished dishes are safe to eat and good for our health. Preparing data is the equivalent of washing the vegetables: only after the data has been cleaned and organized can the analysis produce accurate results. There are many big data processing techniques, and one of the most common is data integration. So what is data integration? This article gives an introduction.

Anyone studying big data will run into the problem of data quality. Because data sources are so diverse, the collected data sets vary in quality as a result of noise, redundancy, and inconsistency. On the demand side, some data analysis tools and applications have strict requirements for data quality. Data preprocessing techniques are therefore needed to improve the quality of data in big data systems, and data integration is one of the more important of these techniques.

Generally speaking, data integration technology logically or physically centralizes data from different data sources to provide users with a unified view. Data integration is a mature research field in traditional database research, with two established approaches: the data warehouse method and the data federation method. The data warehouse method relies on a process known as ETL, which consists of three steps: extraction, transformation, and loading. Extraction connects to the source systems and selects and collects the data needed for subsequent analysis and processing. Transformation converts the extracted data into a standard format by applying a series of rules. Loading imports the extracted and transformed data into the target storage infrastructure. Data federation, by contrast, creates a virtual database that queries and merges data from separate data sources. The virtual database does not contain the data itself; it stores only information about the real data and its storage location, that is, metadata. However, neither method can meet the high performance requirements of streaming and search applications, because the data in these applications is highly dynamic and must be processed in real time. For such applications, data integration is generally best combined with a stream processing engine or a search engine.
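To make the two approaches more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a production pipeline: the two sources (`CRM_CSV` and `SHOP_ROWS`), their field names, the `customers` table, and all function names are hypothetical and were invented for this example. The first part follows the three ETL steps and loads the result into an in-memory SQLite database; the second part imitates data federation by keeping only a catalog of how to reach each source and reading the sources at query time.

```python
import csv
import io
import sqlite3

# Hypothetical sources: two systems that describe customers in different formats.
CRM_CSV = "id,name,signup\n1, Alice ,2021-03-05\n2,Bob,2021-04-01\n"
SHOP_ROWS = [
    {"customer_id": "2", "full_name": "BOB", "joined": "01/04/2021"},
    {"customer_id": "3", "full_name": "carol", "joined": "15/06/2021"},
]


def extract():
    """Extraction: read each source and collect the records needed later."""
    crm = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm, SHOP_ROWS


def transform(crm, shop):
    """Transformation: map both sources to one (id, name, signup_date) format."""
    unified = {}
    for row in crm:
        unified[int(row["id"])] = (row["name"].strip().title(), row["signup"])
    for row in shop:
        day, month, year = row["joined"].split("/")  # DD/MM/YYYY -> ISO date
        record = (row["full_name"].strip().title(), f"{year}-{month}-{day}")
        # Keep the first record seen for an id, which drops redundant duplicates.
        unified.setdefault(int(row["customer_id"]), record)
    return [(cid, name, date) for cid, (name, date) in sorted(unified.items())]


def load(rows, conn):
    """Loading: write the standardized rows into the target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract -> transform -> load and return the warehouse contents."""
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    query = "SELECT id, name, signup_date FROM customers ORDER BY id"
    return conn.execute(query).fetchall()


def federated_lookup(customer_id):
    """Federation: the catalog holds only metadata (how to reach each source).

    No data is copied ahead of time; every source is read when the query runs.
    """
    catalog = {
        "crm": lambda: [
            (int(r["id"]), r["name"].strip())
            for r in csv.DictReader(io.StringIO(CRM_CSV))
        ],
        "shop": lambda: [(int(r["customer_id"]), r["full_name"]) for r in SHOP_ROWS],
    }
    return {
        source: name
        for source, fetch in catalog.items()
        for cid, name in fetch()
        if cid == customer_id
    }


if __name__ == "__main__":
    print(run_etl())
    print(federated_lookup(2))
```

In this sketch the ETL path pays the cleaning cost once and then serves queries from a single consistent table, while the federated path always sees the current state of each source but returns the values unreconciled (for example, both spellings of the same customer name). Neither path is designed for continuously arriving data, which is the limitation noted above for streaming and search applications.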

It is worth noting that there is no single, unified preprocessing procedure or technique that works for every kind of data set. When dealing with a specific problem, you must consider the characteristics of the data set, the problem to be solved, the performance requirements, and other factors in order to select a suitable data preprocessing scheme. Choosing well saves time and improves efficiency.

This article has introduced the basics of data integration. With this understanding, the importance of data processing to data analysis should become clearer. I hope this article is helpful.