The R language brings revolutionary change to statistical analysis on Hadoop clusters

Source: Internet
Author: User

R, an open source language for statistical data analysis, is quietly expanding its influence in the enterprise. Freely available extensions now allow the R engine to run on a Hadoop cluster.

R is a language and environment used mainly for statistical analysis and graphics. It was originally developed by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand (hence the name R), and is now developed by the R Development Core Team. R is a GNU project and an implementation of the S language, so most code written in S runs unmodified in the R environment. R's scoping semantics are drawn from Scheme.

R's source code is free to download, and precompiled binaries are available for a variety of platforms, including UNIX (FreeBSD and Linux among them), Windows, and Mac OS. R is operated primarily from the command line, although several graphical user interfaces have been developed. (Source: Wikipedia)

Google pioneered MapReduce as a way to process unstructured data held in large-scale storage. Although Google never released its MapReduce implementation for outside use, it published details of the design, and the Nutch project drew on them to build an open source implementation. Nutch's creator was later hired by Yahoo, where that work was spun out as the Apache Hadoop project.

MapReduce works by splitting unstructured data and distributing it across the nodes of a server cluster. It hides parallelization, fault tolerance, data distribution, and load balancing inside a library, and reduces every operation on the data to two steps, map and reduce. Through these two steps, tasks are scheduled and distributed across a large number of compute nodes.
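The two steps can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code: the function names `map_reduce`, `word_mapper`, and `count_reducer` are hypothetical, and the grouping that a real cluster performs across nodes is done here in one in-memory dictionary.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run a map step, group the emitted values by key, then run a reduce step."""
    groups = defaultdict(list)
    for record in records:
        # Map: each input record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce: collapse the values of each key into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1

def count_reducer(key, values):
    """Add up the counts emitted for one word."""
    return sum(values)

lines = ["R on Hadoop", "Hadoop runs MapReduce", "R runs on Hadoop"]
counts = map_reduce(lines, word_mapper, count_reducer)
print(counts)
```

In a real cluster the mapper runs on the node that holds each block of data and only the grouped intermediate pairs travel over the network, which is what lets the library take care of distribution and fault tolerance.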

R language combined with Hadoop

Statisticians can now use the R language, which excels at statistical analysis, on unstructured data stored in the Hadoop Distributed File System (HDFS). R can also run against HBase, a non-relational, column-oriented distributed data store modeled on Google's Bigtable. HBase is a subproject of the Apache Software Foundation's Hadoop project, and using it is essentially equivalent to using Hadoop to host a database of structured data.

Revolution Analytics provides commercial extensions and support for the open source R language, helping statisticians and scientists extract meaningful information from large volumes of data in a short time. David Champagne, chief technology officer at Revolution Analytics, says the R engine can be deployed on every node in a Hadoop cluster. Instead of programming a reduction algorithm in Java, you set up an R algorithm on a workstation where R is deployed. Hadoop's map function then distributes it to the nodes, where the statistical analysis runs in parallel on the data stored in HDFS.

If you do not want to use MapReduce, R can extract the data from HDFS and pull it back to the workstation for analysis. In most cases, though, you will want to process the data out in the cluster and aggregate the results there. In essence, R acts as a grid controller that uses Hadoop to manage the running of algorithms on the data.
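The idea of processing data in the cluster and aggregating only the results can be sketched as follows. This is an illustrative Python analogue, not R code and not the Revolution Analytics API: the partitions stand in for blocks of data held on different nodes, and the names `summarize` and `merge` are hypothetical. Each partition is reduced to a small summary (count, mean, sum of squared deviations), and the summaries are merged pairwise, so the raw values never have to leave their node. The sketch assumes every partition is non-empty.

```python
from functools import reduce

def summarize(partition):
    """Map step: reduce one non-empty partition to (count, mean, sum of squared deviations)."""
    n = len(partition)
    mean = sum(partition) / n
    m2 = sum((x - mean) ** 2 for x in partition)
    return n, mean, m2

def merge(a, b):
    """Reduce step: combine two partition summaries into one."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # The cross term accounts for the gap between the two partition means.
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

# Hypothetical data, split as it might be across three nodes.
partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]

total_n, overall_mean, total_m2 = reduce(merge, map(summarize, partitions))
population_variance = total_m2 / total_n
print(total_n, overall_mean, population_variance)
```

Only three numbers per partition are exchanged, which is the practical reason for aggregating in the cluster instead of pulling every value back to the workstation.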

R language provides more business opportunities for enterprises

This week, Revolution Analytics and Cloudera announced a new partnership, along with the integration of Cloudera's distribution of Apache Hadoop (CDH3) with Revolution Analytics' enterprise R platform. The new product is called "RevoConnectR for Apache Hadoop".

Oracle, in fact, stepped up its support for the open source R language as early as last year, and has revealed that it will use R for statistics and analysis in the interface of its data mining software. Other mainstream data analysis and database vendors, such as IBM and SAS, have also begun to support R.

Seven excellent graphical user interfaces for the R language

Graphical user interfaces for R are also in use, and they help beginners get started with the R environment quickly. They include the integrated development environment RStudio, the GNOME-based data mining tool Rattle, the visual programming interface Red-R, Deducer, and others.

The R and Hadoop connectors are already available for download on GitHub.

(Responsible editor: Lu Guang)
