As data grows into the hundreds of terabytes, organizations need new technologies to address this unprecedented challenge.
Big data analysis ushers in a big era
Organizations around the world have realized that the most accurate business decisions come from facts, not from intuition. This means that, beyond the historical records of their internal transaction systems, they need decision models and technical support grounded in data analysis. Web click data, sensor data, log files, mobile data rich in geospatial information, and the many kinds of comments posted across the network have all become forms of mass information.
The challenge is that traditional database deployments cannot handle terabytes of data or support advanced data analysis. Over the past ten-plus years, massively parallel processing (MPP) platforms and column-store databases have opened a new chapter in the history of data analysis. And as the technology has evolved, we are beginning to see a blurring of the boundaries between these once-distinct architectures. More importantly, NoSQL platforms, which handle semi-structured and unstructured information, are gradually beginning to emerge.
In this article, we introduce the big data analysis products available to date, including EMC's Greenplum along with Hadoop and MapReduce; Vertica, the real-time analysis platform HP recently acquired; IBM's DB2-based Smart Analytic System and its Netezza products; Microsoft's Parallel Data Warehouse; SAP's Sybase IQ data warehouse analysis tool; and more. Let's take a look at these 12 big data analysis products from across the industry:
1. Modular EMC Appliance handles a variety of data types
EMC acquired Greenplum in 2010 and, combining EMC's own storage hardware with Greenplum's massively parallel processing (MPP) database, which supports replication and backup, launched the EMC Greenplum Data Computing Appliance (DCA). Through partners such as SAS and MapR, DCA extends Greenplum's database support.
EMC appliance supporting large data analysis
This May, EMC launched its own Hadoop software distribution, and the company promised that the modular DCA to be released this fall would support Greenplum SQL/relational databases and Hadoop deployments on the same appliance. With the help of Hadoop, EMC can tackle genuinely hard big data analysis problems such as web click data and unstructured data. The modular DCA can also keep high-capacity archival enclosures on the same appliance for long-term retention, meeting monitoring requirements.
2. Hadoop and MapReduce refine large data
Hadoop is an open source distributed data processing architecture focused on storing and processing structured, semi-structured, and unstructured, truly large data (typically hundreds of terabytes or even petabytes). Web click and social media analytics applications are driving demand enormously. Hadoop provides MapReduce (among other environments) as an ideal solution for processing large datasets.
MapReduce decomposes a large data problem into multiple sub-problems, distributes them across hundreds of processing nodes, and then assembles the results into a small dataset that is easier to analyze.
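The decomposition described above can be sketched in miniature with a word count, the canonical MapReduce example. This single-process Python sketch only illustrates the map, shuffle, and reduce phases; real Hadoop distributes them across many nodes:

```python
from collections import defaultdict

# Minimal single-machine sketch of the MapReduce idea.
# Hadoop would run map and reduce tasks on hundreds of nodes;
# here the phases are simulated in-process for illustration only.

def map_phase(documents):
    """Map: break the big problem into (key, value) pairs."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: assemble per-key results into a small final dataset."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data analysis", "big data platform"]
result = reduce_phase(shuffle(map_phase(docs)))
print(result["big"])  # 2
```

The same three-phase structure applies whether the input is two strings or hundreds of terabytes of click logs; only the distribution of work changes.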
MapReduce Structure Chart
Hadoop runs on low-cost commodity hardware and, by scaling out, can serve as an alternative to commercial storage and data analysis systems. It has become the main big data analysis solution for many internet giants, such as AOL, eHarmony (an American online dating site), eBay, Facebook, Twitter, and Netflix. More traditional giants, such as JPMorgan Chase, are also considering adopting it.
3. HP Vertica e-commerce analysis
Vertica, which HP acquired this February, is a real-time analysis platform built on a column-store database that offers efficient storage and fast queries. It can be deployed, operated, and maintained faster, and at lower cost, than traditional relational databases. The database also supports massively parallel processing (MPP). After the acquisition, HP launched HP Vertica on x86 hardware. Through MPP scalability, Vertica lets high-end digital marketing and e-commerce customers (such as AOL, Twitter, and Groupon) analyze data up to the petabyte level.
HP Vertica Real-time analysis platform
In fact, even before the HP acquisition, Vertica had introduced a series of innovative products for fast in-memory and flash-based analysis. It was one of the first products to add a Hadoop connector, letting customers manage relational data alongside Hadoop, and one of the first analysis platforms offered via cloud deployment. Currently, Vertica supports HP's cloud services automation solution.
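To illustrate why the column-store layout used by products like Vertica speeds analytic queries, here is a toy Python sketch (not Vertica's actual implementation): when each column is stored contiguously, an aggregate query scans only the values it needs rather than every whole row.

```python
# Toy illustration of row vs. column layout for an analytic query
# (summing one column). Illustration only, not any vendor's engine.

rows = [
    {"order_id": 1, "customer": "AOL", "amount": 120.0},
    {"order_id": 2, "customer": "Twitter", "amount": 75.5},
    {"order_id": 3, "customer": "Groupon", "amount": 240.0},
]

# Row store: every query touches entire rows, even unused fields.
total_row_store = sum(r["amount"] for r in rows)

# Column store: each column lives in its own contiguous array,
# so an aggregate reads just that one array.
columns = {
    "order_id": [1, 2, 3],
    "customer": ["AOL", "Twitter", "Groupon"],
    "amount": [120.0, 75.5, 240.0],
}
total_column_store = sum(columns["amount"])

assert total_row_store == total_column_store == 435.5
```

At terabyte scale, reading one column instead of every row translates into far less I/O per query, which is the core advantage these analytic databases advertise.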
4. IBM offers data warehouses for both operations and analytics
Last year, IBM launched the DB2 Smart Analytic System, so why did it then buy the Netezza platform as well? Because the former is a highly scalable enterprise data warehouse platform that can support thousands of users and many types of application workloads. For example, a call center typically has a large number of employees who need to quickly retrieve customers' historical call records. Smart Analytic System provides a consolidated DB2 database with preconfigured Cognos BI software modules, and can run on IBM Power Systems (RISC) or x86 architectures.
Smart Analytic System and Netezza
Netezza, by contrast, is committed to providing highly scalable analytics for digital marketing companies, telecoms, and other organizations mining hundreds of terabytes or even petabytes of data. IBM's Netezza TwinFin data warehouse appliance, which supports massively parallel processing, can be deployed in a single day. Netezza supports multiple languages and methods for in-database analytics, including Java, C, C++, Python, and MapReduce. It also supports tools such as SAS and IBM SPSS and the R programming language for matrix manipulation. IBM Netezza recently added a high-capacity long-term archiving appliance to meet additional requirements.
5. Infobright reduces DBA effort and query time
The Infobright column-store database is designed to provide analysis services for dozens of terabytes of data, one of the core markets for Oracle and Microsoft SQL Server. Infobright says its MySQL-based database offers another option: designed specifically for analytical applications, it reduces administrative work at low cost while delivering high performance.
(Responsible editor: Lu Guang)