LinkedIn Open-Sources Big Data Computing Engine Cubert and Creates a New Language for It

Source: Internet
Author: User
Keywords: big data, LinkedIn, Hadoop, Cubert, Cubert Script
Tags: Apache, big data, coding, computing, custom, data computing

[Editor's note] On Tuesday, LinkedIn announced that it has open-sourced its big data computing engine Cubert, whose name is derived from the Rubik's Cube. To make it easier for developers to use Cubert without any form of custom coding, LinkedIn has also developed a new programming language for it, Cubert Script.

The following is the translation:

On Tuesday, LinkedIn announced the open-sourcing of its big data computing engine, Cubert, which uses special algorithms to organize data so that queries run more easily without overloading the system or wasting CPU resources.

Cubert, whose name comes from the Rubik's Cube, is a Java application that engineers can pick up easily. It includes a "script-like user interface," so engineers can run queries using algorithms such as MeshJoin and CUBE, which save system resources when organizing data.
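The CUBE operation mentioned above is the classic OLAP-style aggregation (as in SQL's GROUP BY CUBE): every record contributes to an aggregate for every subset of the grouping dimensions. As a rough conceptual sketch (not Cubert's actual implementation or API), counting over all dimension subsets might look like this:

```python
# Conceptual CUBE-style aggregation: each record is counted under every
# subset of the grouping dimensions, including the empty (grand-total) subset.
from itertools import combinations
from collections import Counter

def cube_counts(records, dims):
    """Return counts keyed by ((dim, value), ...) for every subset of dims."""
    counts = Counter()
    for rec in records:
        for r in range(len(dims) + 1):
            for subset in combinations(dims, r):
                key = tuple((d, rec[d]) for d in subset)
                counts[key] += 1
    return counts

records = [
    {"country": "US", "device": "mobile"},
    {"country": "US", "device": "desktop"},
    {"country": "DE", "device": "mobile"},
]
result = cube_counts(records, ["country", "device"])
print(result[()])                    # grand total: 3
print(result[(("country", "US"),)])  # rows with country=US: 2
```

The resource cost Cubert targets comes from exactly this blow-up: a record with d dimensions lands in 2^d aggregates, which is why a cube-aware execution strategy matters at LinkedIn's scale.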

From LinkedIn's blog:

Existing engines such as Apache Pig, Hive, and Shark provide a logical declarative language that is then translated into a physical plan. This plan is executed on a distributed engine (MapReduce, Tez, or Spark), where physical operators run against data partitions. Finally, the data partitions are managed by the file system abstraction provided by HDFS.


Cubert Architecture

Cubert runs on Hadoop. The new framework abstracts all storage into data blocks, which, besides helping operators manage data better, makes it easier to run its resource-saving algorithms. For example, the COMBINE operator can merge multiple data blocks together, and the PIVOT operator can create sub-blocks within a data block.
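To make the block-level operators concrete, here is a minimal illustrative sketch (this is not Cubert's actual API; the function names and the "block as sorted list of records" representation are assumptions for illustration). COMBINE merges several sorted blocks into one while preserving order; PIVOT splits one block into sub-blocks by key:

```python
# Illustrative block operators (hypothetical, not Cubert's real API).
# A "block" here is modeled as a list of records sorted on a key.
import heapq
from itertools import groupby

def combine(blocks, key):
    """Merge several sorted blocks into one sorted block (like COMBINE)."""
    return list(heapq.merge(*blocks, key=key))

def pivot(block, key):
    """Split one sorted block into sub-blocks, one per key value (like PIVOT)."""
    return {k: list(g) for k, g in groupby(block, key=key)}

b1 = [{"id": 1, "v": "a"}, {"id": 3, "v": "c"}]
b2 = [{"id": 1, "v": "b"}, {"id": 2, "v": "d"}]
merged = combine([b1, b2], key=lambda r: r["id"])
subs = pivot(merged, key=lambda r: r["id"])
print([r["id"] for r in merged])  # [1, 1, 2, 3]
print(sorted(subs))               # [1, 2, 3]
```

Because blocks arrive already sorted and partitioned, merging and splitting are streaming operations, which is the kind of property that lets algorithms such as MeshJoin avoid shuffling and re-sorting data on every query.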

LinkedIn has also created a new language called Cubert Script, designed to make it easier for developers to use Cubert without having to do any form of custom coding.

LinkedIn now uses Cubert as a key component of its data processing pipeline. The Kafka real-time messaging system collects information from LinkedIn's many applications and sends it to Hadoop; Cubert then processes the data, ensuring that it does not monopolize system resources and helping engineers solve "a variety of statistical, analytical, and graph computation problems."

After processing, the data flows into LinkedIn's Pinot real-time data analysis system, which the company uses to power many of its data-tracking features, such as showing who recently viewed a user's profile.


LinkedIn Data Pipeline

Now that Cubert is integrated into LinkedIn's infrastructure, the company no longer needs to worry that Hadoop scripts will end up "consuming too many resources on the cluster" or wasting time that engineers should spend on real work.

Original link: LinkedIn open sources Cubert, a big data computation engine that saves CPU resources (Editor: Wei)

