BigInsights: Interpreting IBM's Hadoop-Based Data Analysis Platform


There is no doubt that big data has become a buzzword of 2012. According to reports from foreign statistical agencies, the big data processing market has reached $70 billion this year and is growing at an annual rate of 15–20%. Almost all major tech companies are interested in big data and have invested heavily in products and services in this area. They include IBM, Oracle, EMC, HP, Dell, SGI, Hitachi, and Yahoo, and the list goes on.

IBM also released a big data processing and analytics technology in mid-2011: InfoSphere BigInsights, new Apache Hadoop-based analytics software offered as a service on the SmartCloud platform. At a recent media event for the "IBM DB2 Migration Star Competition" for Chinese programmers and database engineers, Zhu, general manager of information management at the IBM China Development Center, and Lu Weihuan, general manager of information management software for IBM Software Group Greater China, shared their views on related topics.

Hadoop Research and Development Began Three Years Ago

IBM's research on Hadoop began two to three years ago. Its results so far relate to job scheduling, query languages, and other areas. A typical outcome is the IBM InfoSphere big data analysis platform, which comprises two complementary products, BigInsights and Streams. BigInsights analyzes large-scale static data; it provides multi-node distributed computing, and nodes can be added at any time to increase data processing capacity. Streams uses in-memory computing to analyze data in real time. The InfoSphere big data analysis platform also integrates data warehouse, database, data integration, business process management, and other components.
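The contrast between the two engines can be sketched in a few lines of Python. This is a toy illustration only, not BigInsights or Streams code: `batch_total`, `RunningAverage`, and the node counts are hypothetical names and values chosen for the example. The batch function splits a static data set into partitions, as if each went to one node, while the streaming class keeps only a small in-memory window and updates its answer as each record arrives.

```python
from collections import deque


def batch_total(records, num_nodes):
    """Split a static data set into num_nodes partitions and sum each one.

    Each partition stands in for the work a single node would do; adding
    nodes shrinks the partitions without changing the final answer.
    """
    partitions = [records[i::num_nodes] for i in range(num_nodes)]
    partial_sums = [sum(part) for part in partitions]
    return sum(partial_sums)


class RunningAverage:
    """Keep the last `window` values in memory and average them on arrival."""

    def __init__(self, window):
        self.values = deque(maxlen=window)

    def push(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)


if __name__ == "__main__":
    data = list(range(1, 101))
    print(batch_total(data, num_nodes=2))
    print(batch_total(data, num_nodes=8))  # same total, more partitions

    stream = RunningAverage(window=3)
    for reading in [10, 20, 30, 40]:
        print(stream.push(reading))
```

In the batch case the whole data set must exist before the job starts; in the streaming case a result is available after every record, at the cost of only seeing the most recent window.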

BigInsights overall architecture diagram

Both the Basic and Enterprise Editions of BigInsights contain Apache Hadoop and a large number of other open source technologies, including the following open source projects:

  • Apache Hadoop, which includes the Hadoop Distributed File System (HDFS), the MapReduce framework, and common utilities, is a software framework for data-intensive applications in distributed computing environments.
  • Pig is a high-level programming language and runtime environment for Hadoop.
  • Jaql is a high-level query language based on JavaScript Object Notation (JSON) that also supports SQL.
  • Hive is a data warehouse infrastructure designed to support batch querying and analysis of files managed by Hadoop.
  • HBase is a column-oriented data storage environment designed to support large, sparsely populated tables in Hadoop.
  • Flume is a tool for collecting data and loading it into Hadoop.
  • Lucene is a text search and indexing technology.
  • Avro is a data serialization technology.
  • ZooKeeper is a coordination service for distributed applications.
  • Oozie is a workflow and job orchestration technology.
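To make the MapReduce framework mentioned above more concrete, here is a minimal in-process sketch of the map, shuffle, and reduce steps, using the classic word count example. It does not use Hadoop's API and runs on a single machine; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big insights"]))
```

In a real cluster the map and reduce calls would run on different nodes and the shuffle would move data over the network; the structure of the computation is the same.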

In addition to open source technology, BigInsights includes custom technologies developed by IBM: a text analysis engine, a data mining tool for business analytics, integration with enterprise software, and enhancements to Hadoop.

Zhu, general manager of information management, IBM China Development Center

In Zhu's view, BigInsights does not replace OLAP (online analytical processing) or OLTP (online transaction processing) applications. Instead, it can work alongside them to "filter large amounts of raw data, merge the results, and store them as structured data in a DBMS or data warehouse." IBM's Hadoop solution is already available, and customers can test it.
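The filter, merge, and store pattern Zhu describes can be sketched as follows. This is an assumption-laden illustration, not IBM's implementation: the raw record format (`name, amount`), the `user_totals` table, and the helper names `parse_line` and `load_summary` are all hypothetical, and Python's built-in `sqlite3` module stands in for the DBMS or data warehouse.

```python
import sqlite3


def parse_line(line):
    """Return (user_name, amount) for a well-formed line, or None to filter it out."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    user_name, amount = parts
    try:
        return user_name.strip(), float(amount)
    except ValueError:
        return None


def load_summary(raw_lines, conn):
    """Filter raw lines, merge amounts per user, and store the totals as rows."""
    totals = {}
    for line in raw_lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # drop malformed raw data
        user_name, amount = parsed
        totals[user_name] = totals.get(user_name, 0.0) + amount

    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_totals "
        "(user_name TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO user_totals (user_name, total) VALUES (?, ?)",
        list(totals.items()),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_summary(["alice, 10.5", "garbage", "bob, 2", "alice, 4.5"], conn)
    for row in conn.execute("SELECT user_name, total FROM user_totals ORDER BY user_name"):
        print(row)
```

Only the small, structured summary reaches the database, which is why this kind of processing complements OLAP and OLTP systems rather than replacing them.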

Hadoop Alone Cannot Solve Big Data Problems

Zhu also believes that the industry needs a comprehensive solution to the problem of big data analysis and processing. "No single product at present can solve the problems and challenges of big data. Hadoop is the name heard most often in the industry today, but I don't think that a single product, such as Hadoop, can solve the current problem. The traditional data warehouse still plays a very important role here, at the very least as a huge source of data."

According to Wang Yuanhong, senior manager of big data development at IBM, research and development staff at the IBM CDL (China Development Laboratory) have taken part in the global development of the BigInsights project and are actively helping domestic customers validate projects on IBM's Hadoop-based data analysis platform.

Lu Weihuan, general manager of information management software, IBM Software Group Greater China

At the event, Lu Weihuan introduced the "IBM DB2 Migration Star Competition" for Chinese programmers and database engineers. The contest formally opened in Beijing on September 20, 2011, and is divided into three stages: preliminaries, semi-finals, and finals. The preliminaries take the form of an online quiz, from which the best 100 contestants advance to the semi-finals. Semi-finalists are grouped into teams by region and interest, and each team submits a proposal to the organizing committee in the direction and application area the committee specifies. The judges then select 10 teams to take part in the finals, which will be held in Beijing on March 14. In addition to prize money, certificates, and other rewards, the winning team will be given the opportunity to visit IBM's laboratories in the United States.

Oracle announced earlier that its big data system, the Big Data Appliance, would support Hadoop, and Microsoft has hinted that it will support Hadoop on the Azure cloud platform and Windows Server. Amazon's Elastic MapReduce cloud service is also based on Hadoop. Big data solutions look set to attract a great deal of attention from the industry.

