It is worth distinguishing big data applications from BI (business intelligence) because many people still lack a reasonably complete understanding of how big data applications relate to BI, data mining, and similar concepts.
BI (business intelligence) is a complete solution for effectively consolidating an enterprise's existing data and delivering fast, accurate reports and a basis for decisions, helping the enterprise make informed business decisions.
As BI developed, concepts such as ETL and the data integration platform were introduced. ETL (extraction, transformation, loading) covers extracting, converting, and loading data. Its main function is to extract and transform data from various business systems so that the data meets the format and content requirements of BI, the data warehouse, and data mining.
The basic work of a data integration platform is very similar to ETL: its main function is to extract data in different formats from different systems and transform it into the format the target requires. Data integration began as point-to-point connections. Enterprises gradually found that this model makes it hard to manage data flows across systems and across data owners, and hard to enforce data standards. This created the need for a unified enterprise data platform that handles data exchange at the enterprise level.
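The extract, transform, load flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source formats, the field names, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source data: two systems export the same kind of record
# in different formats and with different field names.
CRM_CSV = "customer_id,amount_usd\n1,19.50\n2,5.00\n"
SHOP_JSON = '[{"cust": 2, "total_cents": 1250}, {"cust": 3, "total_cents": 300}]'


def extract_csv(text):
    """Extract: read rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extract: read rows from a JSON export."""
    return json.loads(text)


def transform(csv_rows, json_rows):
    """Transform: map both sources onto one target schema (customer_id, amount_cents)."""
    unified = []
    for row in csv_rows:
        unified.append((int(row["customer_id"]), round(float(row["amount_usd"]) * 100)))
    for row in json_rows:
        unified.append((int(row["cust"]), int(row["total_cents"])))
    return unified


def load(rows, conn):
    """Load: write the unified rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, amount_cents INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract_csv(CRM_CSV), extract_json(SHOP_JSON)), conn)

# Once the data shares one schema, a BI-style report is a simple query.
totals = conn.execute(
    "SELECT customer_id, SUM(amount_cents) FROM sales "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(totals)
```

The point of the sketch is that every source needs its own extract step, while the transform step is the only place that knows the target schema.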
The data integration platform, like a hub in a network, connects all application systems and enables data exchange among them. Although it grew out of the needs of BI and data warehousing, it has now gone beyond those initial requirements and moved to a higher stage.
Today, discussion of big data applications focuses mostly on unstructured data, such as data from the Internet, Twitter, Facebook, and blogs. Understanding big data applications only in this way is clearly somewhat one-sided. Structured data also belongs to big data and shows the same characteristics: large volume, rapid growth, and high demands on data processing.
Within big data in the broad sense, structured data is the part with the highest "gold content", that is, the highest value density, whereas unstructured data has low value density. Before the advent of the Hadoop platform, hardly anyone talked about big data. Data applications dealt mainly with structured data and mostly ran on minicomputers or servers from established vendors such as IBM and HP.
Processing this low-value-density unstructured data with traditional approaches was considered not worthwhile because the return was limited. The Hadoop platform changed this by providing an open, inexpensive platform built on commodity hardware, with distributed, massively parallel processing at its core, which made unstructured data processing feasible.
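The idea behind distributed parallel processing can be illustrated with a map/reduce style word count. The sketch below is plain Python running in a single process, not the Hadoop API; the function names and sample lines are made up for illustration. On a real cluster each chunk would be handled by a different node.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(lines, n_chunks):
    """Partition the input so that each chunk could go to a different worker."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_phase(chunk):
    """Map: count words in one chunk, independently of every other chunk."""
    return Counter(word.lower() for line in chunk for word in line.split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


lines = [
    "big data needs cheap hardware",
    "cheap hardware scales out",
    "big clusters process big data",
]

# Each partial result depends only on its own chunk, so the map calls
# could run in parallel on separate machines.
partials = [map_phase(chunk) for chunk in split_into_chunks(lines, 2)]
word_counts = reduce(reduce_phase, partials, Counter())
print(word_counts.most_common(3))
```

Because the map step never looks outside its own chunk, adding more inexpensive machines lets the same job handle more data.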
The data sources for big data applications should first include structured data, such as databases, various structured files, message queues, and application system data. Next comes unstructured data, which can be further divided into two parts. The first is data generated by social media such as Twitter, Facebook, and blogs, including users' click habits and characteristics, the comments they post, the characteristics of those comments, and the relationships among users; all of these are big data sources. The second part, also very large in volume, is data generated by machines, equipment, and sensors. In the telecommunications industry, for example, CDRs (call detail records) belong to this kind of raw sensor data and come mainly from routers or base stations. In addition, mobile phone sensors, handheld devices of all kinds, access control systems, cameras, ATMs, and so on also produce very large volumes of data.
As for tools that analyze big data, today's analytical tools all focus on structured analysis, that is, on reducing raw data to structured results. For example, an analysis of the direction of social media commentary uses specific word frequencies or semantics and judges the overall character of the commentary by counting the ratio of positive to negative comments. If an application that accepts structured data, such as an analysis system, receives these results, it can analyze them easily.
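A minimal sketch of the word-frequency approach to judging commentary might look like the following. The word lists, the sample comments, and the function names are hypothetical; a real system would use a far larger lexicon or a semantic model.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "slow", "hate", "broken"}


def classify(comment):
    """Label one comment by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(comments):
    """Turn free-text comments into a structured summary record."""
    labels = [classify(c) for c in comments]
    pos = labels.count("positive")
    neg = labels.count("negative")
    return {
        "positive": pos,
        "negative": neg,
        "neutral": labels.count("neutral"),
        # The ratio is undefined when there are no negative comments.
        "pos_neg_ratio": pos / neg if neg else None,
    }


comments = [
    "Great service, love it",
    "Slow and broken app",
    "Delivery was fast but support was bad",
    "Good value",
]
summary = summarize(comments)
print(summary)
```

The output is an ordinary structured record, which is the kind of input a conventional analysis system can consume directly.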
The key to putting big data into practice is deep integration with industry applications.
Video and image processing in the public security industry is one specific application area. Traditional BI and ETL tools cannot handle this kind of data, whereas distributed processing on Hadoop can deliver real benefits because Hadoop can handle sufficiently large data volumes. The public security industry has in fact collected a large amount of video and image data, which can be used to track a suspect's whereabouts and determine in which areas of the country the suspect has been. These applications cannot rely on manpower alone. They need face recognition, image recognition, pattern processing, data compression, and similar technologies, together with large-scale processing software that can capture the relevant features, to help the police work more efficiently.
In the telecommunications industry, the billing system is in fact the result of consolidating a variety of data, so the data it holds is already condensed. With big data applications, operators can analyze the raw data directly, for example examining sensor data for anomalies in order to detect equipment faults. None of this is achievable with traditional BI tools, and it often produces unexpected findings that help operators improve service levels and user satisfaction.
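One simple way to flag anomalies in raw sensor readings is the modified z-score, which is based on the median and the median absolute deviation (MAD). The text does not name a specific method, so this choice is an assumption made for illustration; the readings, the function name, and the conventional 3.5 threshold are likewise illustrative.

```python
import statistics


def find_anomalies(readings, threshold=3.5):
    """Return the indices of readings whose modified z-score exceeds the threshold."""
    median = statistics.median(readings)
    mad = statistics.median([abs(x - median) for x in readings])
    if mad == 0:
        # All readings are (nearly) identical; nothing can be scored.
        return []
    anomalies = []
    for index, value in enumerate(readings):
        # 0.6745 scales the MAD so the score is comparable to a standard z-score.
        score = 0.6745 * (value - median) / mad
        if abs(score) > threshold:
            anomalies.append(index)
    return anomalies


# Hypothetical temperature readings from one device; index 4 is a spike.
readings = [20.1, 19.8, 20.3, 20.0, 35.7, 19.9, 20.2]
anomalous = find_anomalies(readings)
print(anomalous)
```

Median-based statistics are used here instead of the mean and standard deviation because a single large spike would otherwise distort the baseline it is being compared against.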
In the Internet industry, analyzing mobile Internet traces makes it possible to analyze the customer base and understand user preferences. Access to geographic information also has its own specific value.
Looking at these industry applications of big data, one is video and image processing, one is log analysis, and another is the analysis of specific file formats. They clearly share no common functional features. What they have in common is that they all take advantage of an inexpensive platform for processing large volumes of data.
Understanding the difference between BI (business intelligence) and big data applications