This is an era of information flooding: big data volumes are now common, and enterprises increasingly need to handle them. This article describes solutions for big data.
First, relational databases and desktop analysis or visualization packages cannot process big data. Instead, massively parallel software running on thousands of servers is required.
Many organizations turn to open-source tools, such as Apache Hadoop, to handle big data. Twitter, for example, sends log data to Hadoop, writing it directly to HDFS, the Hadoop Distributed File System.
Hadoop supports deploying data-intensive applications across thousands of nodes and several petabytes of data, said David Hill of the Mesabi Group.
However, no single approach to big data suits every type of application. Hadoop, for example, is not necessarily the right fit for all cases, Hill warned.
Hill emphasized that how big data is captured, stored, and analyzed depends on the characteristics of the specific application. For example, scale-out NAS storage such as EMC Isilon or IBM SONAS may be better suited to unstructured data such as photos or videos.
Big Data Processing Types
Big data processing can be classified into three basic types: information management, business intelligence (BI), and intelligent analytics, said Mike Minelli, executive vice president of Revolution Analytics.
Information management captures and stores information; BI analyzes data to understand what happened in the past; intelligent analytics uses data to make predictions, Minelli said.
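To make the backward-looking/forward-looking distinction concrete, here is a minimal sketch in pure Python (all names and data are hypothetical, not from any vendor's product): BI would summarize the historical points, while a predictive step fits a least-squares line to them and extrapolates the next period.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Past situations (what BI reports on): period number vs. metric value.
history = [(1, 10.0), (2, 12.0), (3, 14.0), (4, 16.0)]

slope, intercept = fit_line(history)
# Intelligent analytics: predict the value for the next (fifth) period.
forecast = slope * 5 + intercept  # 18.0 for this perfectly linear data
```

Real predictive analytics on terabytes of data would of course use R or a distributed framework, but the shape of the task, fit on history and extrapolate forward, is the same.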
Revolution Analytics provides the open-source R language and Revolution R Enterprise, which offers advanced analysis of terabytes of data. The company is also developing a Hadoop connector so that R can run on Google's MapReduce framework.
Tools for Processing Big Data
Proprietary software for big data analytics includes AsterData, IBM's Netezza, Datameer (proprietary software built on Apache Hadoop), and ParAccel.
IBM's Netezza is part of its InfoSphere product line. Oracle Exadata and EMC's Greenplum are also proprietary tools for processing large data volumes.
EMC has introduced the free Greenplum Database Community Edition. The Community Edition is software-only and includes three components: Greenplum DB, MADlib, and Alpine Miner.
Open-source tools for processing large data volumes include Hadoop, MapReduce, and Jaspersoft's BI tools.
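As a sketch of the MapReduce model mentioned above, here is a single-process Python simulation of a word count, the canonical MapReduce example; this is not actual Hadoop code, just an illustration of the pattern: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "data tools for big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
# counts["big"] == 3, counts["data"] == 3
```

In a real Hadoop job, the map and reduce functions run in parallel across many nodes, and the shuffle moves data between them over the network; the logic per function stays this simple.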
Jaspersoft's BI tools provide reporting, analysis, and ETL (extract, transform, and load) for massively parallel analytic databases, including EMC Greenplum and HP Vertica. Jaspersoft also provides native reporting through open-source connectors to Hadoop and various NoSQL databases, including MongoDB, Riak, CouchDB, and Infinispan.
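For readers unfamiliar with the ETL pattern named above, here is a minimal pure-Python illustration with hypothetical field names; real ETL tools such as Jaspersoft's operate between databases, not on in-memory lists.

```python
# Extract: raw records as they might arrive from a source system (CSV rows).
raw_rows = [
    "2011-06-01,WIDGET, 19.99",
    "2011-06-02,gadget,5.50",
]

def transform(row):
    """Transform: parse, trim, and normalize one raw CSV row."""
    date, product, price = (field.strip() for field in row.split(","))
    return {"date": date, "product": product.lower(), "price": float(price)}

warehouse = []  # Load target: a stand-in for an analytic database table.

def load(records, target):
    """Load: append the cleaned records to the target store."""
    target.extend(records)

load([transform(r) for r in raw_rows], warehouse)
# warehouse now holds normalized records ready for reporting and analysis.
```

The value of the pattern is that each stage is independent: the extract can switch sources and the load can switch targets without touching the transformation logic.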
Open-Source Tools vs. Proprietary Tools
With open-source tools, developers can inspect the code, so they can see exactly what happens during integration. In almost all cases, open-source analytics is more cost-effective and flexible, said Minelli of Revolution Analytics.
As data volumes continue to grow, companies will be forced to expand their infrastructure. Licensing fees for proprietary tools will keep rising, while open-source technologies avoid those ongoing fees. A key reason Twitter chose Hadoop was the high cost of proprietary tools.
In the long run, open-source tools let enterprises build new analysis techniques to better handle unstructured data, rather than relying on techniques developed by traditional vendors. Open-source tools give enterprises the opportunity to innovate.
Another trend is the hybrid use of open-source and proprietary tools.
In the short term, open-source analytics will be adopted more widely and grow rapidly. In the long run, hybrid approaches will emerge in a highly competitive market, and demand for both will be strong.