9 Committers Gather at Hadoop China Technology Summit

For an open-source community, the committer role is very important: a committer has the right to modify the source code of a particular open-source project. As Baidu Encyclopedia explains it, the committer mechanism means that a group of technical experts who are deeply familiar with the system and its code (the committers) personally develop the core modules and the system architecture, lead the design and development of the non-core parts, and act as the sole gate through which code enters the codebase, providing quality assurance. Its goals are: expert responsibility, strict control of what gets merged, guaranteed quality, and improved developer skills.

The committer title is a great honor for software engineers devoted to open-source technology. For the companies involved, the number of committers on staff also reflects, to some extent, a company's technical strength. Take Hadoop as an example: Hortonworks has 22 Hadoop committers, Yahoo! has 10, and Cloudera has 8; see http://hadoop.apache.org/who.html#Hadoop+Committers

How many committers has the Hadoop China Technology Summit (http://www.chinahadoop.com), held November 22-23 in Beijing, invited to share their technology? The answer is nine. They are active in Hadoop, HBase, Mesos, Thrift, Azkaban, Hama, Spark, and other projects.

Now let's take a look at each of them:

Benjamin Hindman

▲ Benjamin Hindman

Ben is the founder of the Apache Mesos project, which he started while still a Ph.D. student; he later brought Mesos to Twitter, which now runs it on thousands of machines. Besides continuing to lead Apache Mesos, Ben is a technical lead at Twitter and one of the members of its company-level architecture review group. The topic he will share at the summit is how Mesos makes it easy to build distributed systems at Twitter. Mesos's biggest selling point is that it manages the compute resources underlying Hadoop and other systems, making it possible to provide a unified resource-management platform in a cluster environment where multiple computing frameworks coexist.
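The resource-sharing idea described above can be sketched in a few lines of plain Python. This is a toy model with invented names, not the real Mesos API: the master offers its free resources to each registered framework, and each framework decides how much of the offer to accept.

```python
# Toy sketch of Mesos-style two-level scheduling (illustrative only;
# class and method names are made up, not the real Mesos API).
class Master:
    def __init__(self, cpus):
        self.free_cpus = cpus
        self.frameworks = []

    def register(self, fw):
        self.frameworks.append(fw)

    def schedule(self):
        # Offer all currently free resources to each framework in turn;
        # the framework, not the master, decides what to take.
        for fw in self.frameworks:
            accepted = fw.on_offer(self.free_cpus)
            self.free_cpus -= accepted

class Framework:
    def __init__(self, name, demand):
        self.name, self.demand, self.got = name, demand, 0

    def on_offer(self, cpus):
        # Accept only as much of the offer as we still need.
        take = min(self.demand - self.got, cpus)
        self.got += take
        return take

m = Master(cpus=10)
spark = Framework("spark", demand=6)
mr = Framework("mapreduce", demand=8)
m.register(spark)
m.register(mr)
m.schedule()
```

Here the two frameworks end up sharing the same 10-CPU pool without knowing about each other, which is the point of a unified resource-management layer.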

Todd Lipcon

▲ Todd Lipcon

Todd Lipcon is a PMC (Project Management Committee) member and committer of the Hadoop, HBase, and Thrift projects, as well as a star engineer at Cloudera. He will share "New features in Hadoop & HBase, exciting features in Impala" at the summit. Cloudera Impala, an open-source project for real-time queries on Hadoop, is said to be 3 to 90 times faster than Hive SQL queries, which originally ran on MapReduce.

Ted Yu

▲ Ted Yu

Ted Yu, who works at Hortonworks, is an Apache HBase committer; the Apache HBase project team currently has only 33 such members. He will present the latest progress on HBase. HBase is a distributed, column-oriented open-source database: just as BigTable leverages the distributed data storage provided by the Google File System, HBase provides BigTable-like capabilities on top of Hadoop. HBase is a subproject of Apache Hadoop. Unlike a typical relational database, HBase is suited to storing unstructured data; another difference is that it stores data by column rather than by row.
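The column-oriented model can be illustrated with plain Python dicts. This is a conceptual sketch, not the real HBase client API: cells are addressed by a row key plus a "family:qualifier" column, and different rows can carry entirely different columns, which is what makes the model fit sparse, unstructured data.

```python
# Conceptual sketch of HBase-style storage (not the real HBase API).
class MiniColumnStore:
    def __init__(self):
        self.rows = {}  # row_key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column):
        # Missing cells simply don't exist; no NULL padding as in a
        # fixed-schema relational table.
        return self.rows.get(row_key, {}).get(column)

t = MiniColumnStore()
t.put("user1", "info:name", "Alice")
t.put("user1", "info:email", "alice@example.com")
t.put("user2", "info:name", "Bob")      # no email cell at all
t.put("user2", "stats:logins", 42)      # a column user1 doesn't have
```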

Chenjie Yu

Chenjie Yu is one of the major members of the LinkedIn Hadoop group and one of the main creators of the open-source job-flow scheduler Azkaban. Before joining LinkedIn, he built large-scale data-processing pipelines on Hadoop in Yahoo!'s data platform group. Azkaban (details: http://data.linkedin.com/opensource/azkaban) is a batch job scheduler for Hadoop, used to build and run Hadoop jobs and other offline processes. He will share how Hadoop is applied at LinkedIn.
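Azkaban describes each job with a simple properties file and builds flows out of `dependencies` entries. A minimal sketch of a two-job flow follows; the job names and commands are illustrative, not from the source:

```properties
# rawdata.job -- first step of the flow
type=command
command=hadoop fs -get /data/raw ./raw

# process.job -- runs only after rawdata succeeds
type=command
command=hadoop jar analytics.jar DailyReport ./raw
dependencies=rawdata
```

In a real deployment each job lives in its own `.job` file inside an archive uploaded to Azkaban; they are shown together here only for brevity.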

Yong Maoyuan

Sogou senior engineer Yong Maoyuan is a committer on the Apache Hama project; he will introduce how the Hadoop subproject Hama (http://hama.apache.org/) is used at Sogou. Apache Hama is an open-source implementation of Google's Pregel. Where Hadoop suits distributed processing of large data sets in general, Hama is aimed mainly at distributed computations over matrices, graphs, and networks. Simply put, Hama is a BSP (Bulk Synchronous Parallel) computing framework implemented on top of HDFS, making up for Hadoop's weaknesses in these kinds of computation.
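The BSP model that Hama implements alternates local computation, message exchange, and a global barrier between supersteps. The following self-contained Python sketch (not the Hama API) runs the classic Pregel example of propagating the maximum value through a graph:

```python
# BSP/Pregel-style superstep loop in plain Python (illustrative only).
def bsp_max(graph, values):
    """graph: vertex -> list of neighbors; values: vertex -> int.
    Each superstep, every vertex sends its value to its neighbors,
    then adopts the largest value it has seen. A global barrier
    separates supersteps; we stop when no value changes."""
    values = dict(values)
    while True:
        # Communication phase: collect the messages for this superstep.
        inbox = {v: [] for v in graph}
        for v, neighbors in graph.items():
            for n in neighbors:
                inbox[n].append(values[v])
        # Barrier, then compute phase.
        changed = False
        for v, msgs in inbox.items():
            best = max([values[v]] + msgs)
            if best != values[v]:
                values[v] = best
                changed = True
        if not changed:
            return values

ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
result = bsp_max(ring, {"a": 3, "b": 7, "c": 1})
```

After enough supersteps every vertex holds the global maximum, 7; iterative graph algorithms like this are exactly where a BSP engine beats chained MapReduce jobs.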

Jason Dai

Jason Dai is an engineering director and chief engineer at Intel's Software and Services Group, responsible for leading the development of Intel's big-data technology. He is a Spark committer. Spark is a general-purpose parallel computing framework developed by UC Berkeley's AMP Lab: an efficient distributed computing system whose performance can be up to 100 times that of Hadoop MapReduce. Spark also provides a higher-level API than Hadoop, so the same algorithm implemented in Spark is often 1/10 to 1/100 the length of its Hadoop version. Shark, roughly "SQL on Spark", is a data warehouse implemented on Spark; while remaining compatible with Hive, it can run up to 100 times faster than Hive.
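The brevity claim comes from Spark's functional, chained RDD operations. The toy class below mimics that style in plain Python for a word count; it is a sketch of the idea, not the real Spark API:

```python
# Minimal RDD-like wrapper showing the functional chaining style
# that keeps Spark programs short (illustrative; NOT real Spark).
class MiniRDD:
    def __init__(self, data):
        self.data = list(data)

    def flatMap(self, f):
        return MiniRDD(x for item in self.data for x in f(item))

    def map(self, f):
        return MiniRDD(f(x) for x in self.data)

    def reduceByKey(self, f):
        acc = {}
        for k, v in self.data:
            acc[k] = f(acc[k], v) if k in acc else v
        return MiniRDD(acc.items())

    def collect(self):
        return list(self.data)

lines = MiniRDD(["to be or", "not to be"])
counts = dict(
    lines.flatMap(str.split)
         .map(lambda w: (w, 1))
         .reduceByKey(lambda a, b: a + b)
         .collect()
)
```

The whole pipeline is four chained calls; the equivalent hand-written Hadoop MapReduce job needs separate mapper, reducer, and driver classes, which is where the 1/10-to-1/100 code-size figure comes from.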

Huang

Huang also comes from Intel's Software and Services Group and is likewise a Spark committer. Together with Jason Dai, he will present "Mining Web-scale Social Graphs with GraphX" at the summit. GraphX is a new, flexible distributed graph-processing framework on the Spark platform that can greatly improve efficiency in social-network analysis, machine learning, and data mining for precision advertising.

Li Haoyuan (Haoyuan Li)

Haoyuan Li is a Ph.D. student in UC Berkeley's AMP Lab and a core Spark developer, focusing on computer systems and big-data research. He studies under professors Scott Shenker and Ion Stoica. During his studies he created the Tachyon system and worked on the Spark Streaming system; he is an Apache Spark committer, a Shark committer, and one of the main developers of the Berkeley Data Analytics Stack (BDAS). Previously, he worked on large-scale data-mining R&D at Google and Conviva, and the PFP algorithm he developed was adopted by Apache Mahout. He holds a bachelor's degree from Peking University and a master's degree from Cornell University. The topic of his talk is "Tachyon: a distributed memory cache 100 times faster than HDFS." Tachyon is a highly fault-tolerant distributed file system that allows files to be reliably shared at memory speed across cluster frameworks such as Spark and MapReduce.

Reynold Xin

Reynold Xin is one of the leading figures in the Apache Spark open-source community. He joined Spark development during his doctoral studies at UC Berkeley's AMP Lab and wrote two open-source frameworks on top of Spark, Shark and GraphX. In the middle of this year, he and his AMP Lab colleagues co-founded Databricks. The topic of his talk has not yet been disclosed by the conference organizers and is said to be a mystery.
