Just a few weeks ago, the release of Apache Hadoop 2.0 marked a major milestone for the Hadoop ecosystem, significantly changing the way data can be stored and processed. Hadoop remains the foundational "big data" technology, but how does it fit alongside existing databases and data warehouses? And are there common usage patterns that can actually reduce its inherent complexity?
Common Hadoop usage patterns
Hadoop was born at web companies such as Yahoo, Google, and Facebook, which needed to solve massive data storage problems at very low cost. It is now increasingly being introduced into enterprise environments to handle new and different data types. Machine-generated data, sensor data, social data, and web log data are growing exponentially, and this data is often (but not always) unstructured. It is precisely this kind of data that moves the conversation from "data analysis" to "big data analysis," because mining it can yield a commercial advantage.
Analytics applications take many forms, most importantly addressing the needs of specific vertical industries. At first glance these applications seem to have little in common across industries or verticals, but viewed at the infrastructure level, some very clear patterns emerge. There are three such patterns.
Pattern 1: Data refinery
The data refinery pattern uses Hadoop to incorporate new data sources into an organization's core BI and analytics applications. For example, I might have an application that views the customer data held in ERP and CRM systems, but how do I learn what those customers are interested in from their web sessions on our own site? The "data refinery" pattern answers exactly this kind of question.
The key concept here is that Hadoop is used to distill large volumes of raw data into something easier to manage. The resulting data is then loaded into existing data systems, where it can be accessed with traditional tools, but now built on a much richer dataset. In some ways this is the simplest use case, because the enterprise clearly benefits from Hadoop without making major changes to its traditional approach.

The refinery concept applies across verticals. In financial services, organizations refine transaction data to better understand markets, analyzing complex portfolios in search of value. Energy companies analyze consumption levels across regions to better forecast production. Retailers, and consumer-facing organizations generally, use refineries to gain insight into internet sentiment. Telecommunications companies refine call records to extract the details needed to optimize billing. Finally, wherever expensive, mission-critical equipment is involved, we often find Hadoop used for predictive analytics and proactive fault identification: in telecommunications that might be a network base station; for a franchise restaurant chain, the refrigeration units whose data it monitors.
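To make the refinery flow concrete, here is a minimal sketch using Hadoop Streaming with Python. The input layout (tab-separated customer_id, url, seconds_on_page), the script name, and the HDFS paths are hypothetical examples, not a real schema; the idea is simply to boil raw clickstream logs down to one summary row per customer.

```python
#!/usr/bin/env python
"""refinery.py -- a minimal data-refinery sketch for Hadoop Streaming.

Assumes raw web logs with tab-separated fields: customer_id, url,
seconds_on_page (a hypothetical layout). Output: one row per customer
with total page views and total time on site, ready to load into an
existing warehouse.

Run:  hadoop jar hadoop-streaming.jar \
        -input /logs/web -output /refined/sessions \
        -mapper "python refinery.py map" \
        -reducer "python refinery.py reduce" -file refinery.py
"""
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue                      # raw logs are dirty; skip bad rows
        customer_id, _url, seconds = fields[0], fields[1], fields[2]
        print("%s\t1\t%s" % (customer_id, seconds))

def reducer():
    current, views, seconds = None, 0, 0.0
    for line in sys.stdin:                # input arrives sorted by key
        cust, count, secs = line.rstrip("\n").split("\t")
        if cust != current:
            if current is not None:
                print("%s\t%d\t%.0f" % (current, views, seconds))
            current, views, seconds = cust, 0, 0.0
        views += int(count)
        seconds += float(secs)
    if current is not None:
        print("%s\t%d\t%.0f" % (current, views, seconds))

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

The refined output is small and relational in shape, so a tool such as Sqoop can export it into the existing warehouse, where the usual BI tools pick it up unchanged.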
Pattern 2: Exploring data with Apache Hadoop
We call the second most common use case "data exploration." In this case, the organization captures and stores large amounts of new data in Hadoop and then explores that data directly. So instead of using Hadoop as a staging area whose output is transferred into an enterprise data warehouse (as in the refinery use case), the data stays in Hadoop and is explored in place.
The data exploration use case typically begins when an enterprise starts examining data it previously discarded (such as web logs, social media data, and so on) and builds new analytics applications directly on that data. Almost every vertical can benefit from exploration. In financial services, exploratory use cases support forensics and fraud detection. Professional sports teams use data science to analyze trades and annual drafts, as we saw in the film "Moneyball." In short, data science and exploration can uncover new business opportunities and insights that simply were not reachable before Hadoop.
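As an illustration, here is a minimal exploration sketch in the same Hadoop Streaming style: it scans raw JSON social posts sitting in HDFS for mentions of a brand and counts them per day. The field names ("created_at", "text"), the keyword list, and the paths are assumptions for the example; the point is that the raw data is queried in place, with no modeling or warehouse load step up front.

```python
#!/usr/bin/env python
"""explore_mentions.py -- an ad-hoc exploration sketch for Hadoop Streaming.

Counts daily mentions of brand keywords in raw JSON posts kept on HDFS.
Field names ("created_at", "text") and keywords are hypothetical.

Run:  hadoop jar hadoop-streaming.jar \
        -input /raw/social -output /tmp/mentions \
        -mapper "python explore_mentions.py" -reducer aggregate \
        -file explore_mentions.py
"""
import json
import sys

KEYWORDS = ("acme", "acme phone")        # hypothetical brand terms

for line in sys.stdin:
    try:
        post = json.loads(line)
    except ValueError:
        continue                         # raw feeds contain junk; skip it
    text = post.get("text", "").lower()
    day = post.get("created_at", "")[:10]
    for kw in KEYWORDS:
        if kw in text:
            # The "LongValueSum:" prefix tells Streaming's built-in
            # "aggregate" reducer to sum the counts per (day, keyword).
            print("LongValueSum:%s,%s\t1" % (day, kw))
```

Because nothing has to be loaded or modeled in advance, a question like this can be asked and answered in a single pass, and simply thrown away if it leads nowhere.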
Pattern 3: Mining applications
The third and final use case is the mining application, in which the data stored in Hadoop directly powers the application itself. For example, by mining all of the stored web session data, we can deliver a personalized experience when a user returns to the site. Digging through the session history kept in Hadoop yields a great deal of value, such as the ability to serve timely, relevant responses based on a user's past behavior.
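Here is a sketch of the idea, again with Hadoop Streaming: reduce each user's stored session history to their most-viewed content category, producing a small lookup table the website can consult to personalize a returning visitor's page. The input layout (tab-separated user_id, category per page view) and the paths are hypothetical examples.

```python
#!/usr/bin/env python
"""personalize.py -- an application-mining sketch for Hadoop Streaming.

From session history lines (tab-separated user_id, category -- a
hypothetical layout), emit each user's most-viewed category. The web
tier can load the result as a lookup table to tailor the returning
user's experience.

Run:  hadoop jar hadoop-streaming.jar \
        -input /sessions/history -output /serving/user_prefs \
        -mapper "python personalize.py map" \
        -reducer "python personalize.py reduce" -file personalize.py
"""
from collections import Counter
import sys

def mapper():
    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            print("%s\t%s" % (parts[0], parts[1]))   # user_id -> category

def emit_top(user, counts):
    if user is not None and counts:
        print("%s\t%s" % (user, counts.most_common(1)[0][0]))

def reducer():
    current, counts = None, Counter()
    for line in sys.stdin:                            # sorted by user_id
        user, category = line.rstrip("\n").split("\t")
        if user != current:
            emit_top(current, counts)
            current, counts = user, Counter()
        counts[category] += 1
    emit_top(current, counts)

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

Note that the output is deliberately tiny relative to the input: Hadoop does the heavy mining offline, while the live site only performs a cheap key lookup.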
This use case underpins many of the world's largest websites, such as Yahoo and Facebook. By customizing the user experience, they effectively differentiate themselves from competitors. This was Yahoo's second major use case for Hadoop, once the company realized Hadoop could help improve ad placement. The same concept that transformed large websites is also helping traditional businesses sell more effectively, and some smaller organizations even use it to price their retail outlets dynamically.
As you would expect, this pattern is typically adopted as organizations become familiar with refining and exploring data in Hadoop. It also hints at what Hadoop may do in the future: over time, workloads served by traditional database applications may gradually give way to Hadoop-based applications.
Of course, bringing any new platform technology into an enterprise IT environment involves a degree of complexity, and Hadoop is no exception. Whether you use Hadoop to refine, explore, or enrich your data, compatibility with the existing IT infrastructure is critical. This is why the Hadoop ecosystem, and the integration solutions offered by different vendors, have grown so significantly. Hadoop has the potential for a profound impact on enterprise data, and understanding these common usage patterns goes a long way toward reducing its complexity.
Original article: The three most common ways data junkies are using Hadoop (compiled by Wei; reviewed by Zhonghao)