Internet giants Google and Facebook have created enormous value by managing and analyzing big data, prompting CIOs to ask whether these emerging technologies can produce similarly impressive results in their own businesses. The idea is partly encouraged by industry analysts predicting that big data will grow at breakneck speed. Wikibon predicts that the big data market will jump from $5 billion in 2012 to more than $30 billion in 2015, reaching $53.4 billion by 2017 (free reports are available on its site). IDC is more conservative, expecting the big data market to reach $16.9 billion by 2015. Many IBM customers are starting their big data journey from their data warehouses, the largest collections of managed data. This article outlines how enterprises achieve real business value after bringing various technologies under the big data management umbrella.
Deploying big data management technology
The technologies deployed for big data management include:
Enterprise Data Warehouse
Data warehouse appliances
Apache Hadoop clusters
Log analysis and complex event processing of streaming data
Software for moving and integrating data between these platforms
For readers new to these technologies, the IBM authoring team provides detailed and in-depth guidance in a free IBM e-book titled "Big Data".
Managing big data is only the first step. Enterprises create value by applying analytic algorithms to that data and gradually uncovering the relationships within it. These algorithms are computationally demanding and work best with data management technologies running on massively parallel processing hardware architectures.
Modernizing the data warehouse
Many first-generation data warehouses, deployed on general-purpose computing architectures, cannot meet the requirements of big data analysis. Hundreds of enterprises have now started on big data by modernizing their data warehouses, replacing older database management systems running on symmetric multiprocessing hardware with IBM Netezza data warehouse appliances.
In the mobile communications era, continuously delivering high-quality network service is the foundation of customer satisfaction; dissatisfied customers turn to competitors. T's first-generation data warehouse was not large enough to aggregate the company's data, so it could not provide a full picture of network events. The terabyte-scale, Oracle-based warehouse had reached its limits and could no longer help the company understand its quality of service. By modernizing the warehouse with Netezza appliances, T can now load up to 17 billion network records per day and analyze them to drill into quality of service and customer satisfaction. The data warehouse currently manages 2 PB of data and supports analysis for 1,300 enterprise users. It has been so successful that its user base has expanded beyond the original network operations group into revenue assurance, billing, marketing, and customer service. An exclusive video is available here in which Christine Twiford, manager of T's Network Technology Solutions department, describes the company's big data journey in detail.
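The kind of quality-of-service analysis described above can be illustrated with a small sketch. The table and column names (call_records, cell_id, dropped) are hypothetical, and SQLite stands in for the warehouse appliance only so the example is self-contained; in a real deployment the aggregation would run inside Netezza against the daily network records.

```python
import sqlite3

# Stand-in for a warehouse table of network call records.
# Column names (cell_id, dropped, duration_s) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_records (cell_id TEXT, dropped INTEGER, duration_s REAL)")
conn.executemany(
    "INSERT INTO call_records VALUES (?, ?, ?)",
    [("cell-001", 0, 182.0), ("cell-001", 1, 12.5),
     ("cell-002", 0, 240.0), ("cell-002", 0, 95.0), ("cell-001", 0, 60.0)],
)

# Per-cell dropped-call rate: the sort of quality-of-service rollup
# an analyst might run against a day's worth of network records.
query = """
SELECT cell_id,
       COUNT(*)             AS calls,
       AVG(dropped) * 100.0 AS dropped_pct
FROM call_records
GROUP BY cell_id
ORDER BY dropped_pct DESC
"""
for cell_id, calls, dropped_pct in conn.execute(query):
    print(f"{cell_id}: {calls} calls, {dropped_pct:.1f}% dropped")
```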
Using appliances to enhance data warehousing
Banks and other financial firms must strictly control their computing assets to comply with industry regulations. The distributed, networked nature of modern computer systems, and the need to understand the data those systems hold, makes this already daunting task even more complex. Like other industries, banking has turned to computing technology to improve the effectiveness of inventory control and reduce costs, but often only piecemeal. One bank, for example, relied on more than 40 systems to manage hundreds of terabytes of data. This "system of systems" approach was not only unwieldy but also extremely inefficient: answering a seemingly simple question about how a computer was configured and what data it managed could take weeks.
After rethinking its data management strategy, the bank decided to build a new master data center infrastructure combining IBM DB2 with Netezza appliances. Using DB2's massive scalability, the bank consolidated the hundreds of terabytes of data previously spread across more than 40 systems into a single integrated database with a common data model. Beyond consolidation, DB2 also serves as an operational data store (ODS), answering short queries arriving at very high rates. Integration software rapidly moves data from the ODS into Netezza for reporting and advanced analytics. The bank's data assets are now under a single governance model, and the business has near-real-time visibility into its asset inventory.
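The ODS-to-warehouse hand-off described above follows a common incremental-replication pattern, sketched minimally below. Two SQLite databases stand in for the DB2 operational data store and the Netezza warehouse, and the table, column, and watermark names are invented for illustration; a real integration tool would move the data in bulk rather than row by row.

```python
import sqlite3

# Stand-ins: 'ods' plays the role of the DB2 operational data store,
# 'dw' the analytics warehouse. Table and column names are hypothetical.
ods = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")

ods.execute("CREATE TABLE transactions (txn_id INTEGER, account TEXT, amount REAL, loaded_at INTEGER)")
ods.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)",
                [(1, "A-100", 250.0, 1000), (2, "A-200", 75.5, 1001), (3, "A-100", 19.9, 1002)])

dw.execute("CREATE TABLE transactions_fact (txn_id INTEGER, account TEXT, amount REAL)")

def sync_increment(last_watermark: int) -> int:
    """Copy rows newer than the watermark from the ODS to the warehouse.

    A real integration tool would do this in bulk (for example via a
    dedicated loader); the loop here only illustrates the pattern.
    """
    rows = ods.execute(
        "SELECT txn_id, account, amount, loaded_at FROM transactions WHERE loaded_at > ?",
        (last_watermark,),
    ).fetchall()
    if rows:
        dw.executemany("INSERT INTO transactions_fact VALUES (?, ?, ?)",
                       [(r[0], r[1], r[2]) for r in rows])
        last_watermark = max(r[3] for r in rows)
    return last_watermark

watermark = sync_increment(0)
print(dw.execute("SELECT COUNT(*) FROM transactions_fact").fetchone()[0], "rows replicated")
```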
Using Hadoop with the data warehouse
Hadoop is a highly reliable and scalable data-processing system. Its advantage is that data can be loaded without a schema, and such unstructured (or multi-structured) data can be processed and analyzed at massive scale on inexpensive hardware. Hadoop processes data in batches; it has no query optimizer and does not support random access or interactive queries. Those are the strengths of database systems such as Netezza.
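The "load without a schema" idea, often called schema-on-read, can be sketched as follows: raw, multi-structured records are stored exactly as they arrive, and a structure is imposed only when a batch pass reads them. The record formats and field names here are invented for illustration; a real Hadoop job would express the same logic as map and reduce tasks over files in HDFS.

```python
import json
import re

# Raw input kept exactly as it arrived: a mix of JSON events and
# free-text log lines. No schema was applied at load time.
raw_records = [
    '{"user": "u1", "event": "page_view", "ms": 120}',
    'ERROR 2024-01-01T10:00:00 checkout timeout user=u2',
    '{"user": "u3", "event": "purchase", "ms": 340}',
]

LOG_LINE = re.compile(r"(?P<level>\w+) (?P<ts>\S+) (?P<msg>.*) user=(?P<user>\S+)")

def parse(record: str) -> dict:
    """Impose a structure at read time (schema-on-read)."""
    try:
        return json.loads(record)
    except json.JSONDecodeError:
        m = LOG_LINE.match(record)
        return m.groupdict() if m else {"unparsed": record}

# A batch pass over everything: the whole data set is scanned,
# with no index or optimizer to narrow the work.
structured = [parse(r) for r in raw_records]
users_seen = {rec.get("user") for rec in structured if rec.get("user")}
print(sorted(users_seen))
```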
Edmunds.com, an online car-shopping company, uses Hadoop as a data-ingestion engine alongside its Netezza warehouse, with the Netezza Hadoop adapter moving data between the two systems. Hadoop analyzes massive volumes of unstructured data outside the data warehouse, including text, voice, and tweets, converts it into relational format, and transfers the structured results into Netezza so that the Edmunds.com analytics team can integrate social media and consumer feedback into all aspects of the business. A slideshow describing the details is available here.
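A minimal sketch of the hand-off step in a pipeline like the one above: unstructured text is reduced to flat relational rows and written to a delimited file that a warehouse loader (in the actual case, the Netezza Hadoop adapter) could then ingest. The sample feedback, the toy sentiment scoring, and the field names are all invented for illustration.

```python
import csv
import io

# Invented sample of consumer feedback text; in the real pipeline this
# would come from Hadoop jobs over text, voice transcripts, and tweets.
feedback = [
    ("2024-01-05", "sedan-x", "love the fuel economy, great value"),
    ("2024-01-06", "sedan-x", "terrible infotainment, would not buy again"),
]

POSITIVE = {"love", "great", "value"}
NEGATIVE = {"terrible", "not"}

def score(text: str) -> int:
    """Toy sentiment score: count of positive words minus negative words."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Flatten to relational rows and emit a delimited file; a warehouse
# bulk loader would pick this file up in the real pipeline.
out = io.StringIO()
writer = csv.writer(out, delimiter="|")
writer.writerow(["feedback_date", "model", "sentiment_score"])
for date, model, text in feedback:
    writer.writerow([date, model, score(text)])

print(out.getvalue())
```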
Extending the data warehouse with real-time complex event processing
The Pacific Northwest National Laboratory (PNNL) Smart Grid Demonstration Project is the largest regional smart grid collaboration in the United States, involving 60,000 customers from 11 power companies across five states. The project manages big data using log analysis and real-time complex event processing (IBM InfoSphere Streams) together with data warehouse appliances (IBM Netezza). InfoSphere Streams analyzes the millions of messages cascading from the grid control systems, including every communication state and event, and detects problems that could lead to power outages. The data is then sent to Netezza, which manages the event history and runs deeper analysis to identify subtle trends and other patterns that cannot be spotted in real time. These analytic results improve the reliability of the power network and reduce the operating cost of this dynamic new data management platform. Netezza feeds the analysis back to InfoSphere Streams, refining the real-time analysis of control system data. Ron Melton, director of the Smart Grid Demonstration Project, describes this in detail in a video you can watch here.
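The division of labor described above, with immediate checks on in-flight events and a buffer of history for deeper warehouse analysis, can be sketched with a simple sliding-window detector. The thresholds, event fields, and anomaly rule below are invented for illustration and do not represent the InfoSphere Streams API; they only show the pattern.

```python
from collections import deque
from statistics import mean

WINDOW = 5          # number of recent readings kept per feeder (hypothetical)
THRESHOLD = 1.25    # flag readings 25% above the recent average (hypothetical)

recent = {}             # feeder id -> deque of recent readings
warehouse_buffer = []   # events accumulated for a later bulk load

def on_event(feeder, load_mw):
    """Real-time path: compare each reading with its recent window."""
    window = recent.setdefault(feeder, deque(maxlen=WINDOW))
    if len(window) == WINDOW and load_mw > THRESHOLD * mean(window):
        print(f"ALERT {feeder}: load {load_mw} MW exceeds recent average")
    window.append(load_mw)
    warehouse_buffer.append((feeder, load_mw))   # batch path: kept for deep analysis

# Simulated stream of grid readings (invented values).
for reading in [("feeder-7", 10.0), ("feeder-7", 10.2), ("feeder-7", 10.1),
                ("feeder-7", 9.9), ("feeder-7", 10.3), ("feeder-7", 13.5)]:
    on_event(*reading)

print(len(warehouse_buffer), "events buffered for warehouse load")
```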
As a demonstration project, the PNNL Smart Grid project operates at a scale beyond the capability of many enterprises, but its data management platform is instructive for CIOs studying the value of real-time complex event processing. Combining InfoSphere Streams and Netezza creates a data management platform for new classes of applications, including financial-market trade analysis, fraud detection, network service-quality analysis, network threat detection, asset monitoring, and marketing campaign management. These and other applications will become more widespread as inventories and assets become IP-enabled.
Conclusion
A new type of enterprise data platform is emerging. By extending structured data stores with technologies designed to manage and analyze massive volumes of data, both in motion and at rest, organizations can equip themselves to take advantage of all available data. This new big data management platform is distributed rather than monolithic: it consists of different platforms, each specialized and optimized for its own task, working in tandem and sharing data and analytic results. Starting the big data journey with the data warehouse is effective because the warehouse represents the largest store of managed data and is the focal point of the enterprise's data integration, security, and governance technologies and experience.