EMC Greenplum adds analysis capabilities for large unstructured data

Source: Internet
Author: User

EMC today announced a new feature for its Hadoop Data Computing Appliance (DCA) that lets users combine unstructured and structured data analysis platforms.

EMC also unveiled the Greenplum Analytics Workbench, a 1,000-node test bed for integration testing of Apache Hadoop software.

The test bed gives the Hadoop open source community testing resources to quickly identify bugs, stabilize new releases, and optimize hardware configurations, speeding up Hadoop innovation. All tests and results will be contributed back to the Apache Software Foundation and the open source community. EMC's testing will be coordinated with the Apache Hadoop project.

EMC is introducing a modular Data Computing Appliance in its Greenplum appliance line, allowing users to combine a massively parallel processing (MPP) relational database with enterprise Apache Hadoop in a single unified appliance that handles both structured and unstructured data.

Greenplum launched the Data Computing Appliance last October and released an upgraded DCA this May, including a Hadoop appliance.

The Greenplum HD (Hadoop) DCA is built on Intel x86 servers, using the structured database built by Greenplum (which EMC acquired last year) and the Apache open source version of Hadoop. The older version of the appliance is based on Sun Fire x64 servers.

According to Scott Yara, EMC's vice president of data computing and a Greenplum co-founder, administrators can read and write files to HDFS (Hadoop Distributed File System) from Greenplum in parallel to achieve fast data sharing. By using Greenplum SQL and advanced analytics to read data on HDFS, users can run analyses across both platforms.

The new modular DCA uses analytics software from SAS Cato to add high-performance computing modules that can serve both structured data, such as databases, and unstructured data.

"The main challenge is that it leverages server memory to perform parallel processing using business analytics software from SAS Cato," says Yara. "We want to provide a framework that resembles Lego building blocks."

With the SAS analytics software, structured and unstructured data can reside across multiple x86 hosts, letting users run computations in the memory of each server node in a cluster.

"The strength of this appliance is that it works on all these complex problems in parallel," says Yara. "The new modular DCA is undergoing product testing and is expected to be available by the end of this year," he said.

(Responsible editor: The good of the Legacy)
