BDTC PPT Collection (II): Big Data Architectures Shared by Facebook, LinkedIn, and Others

Source: Internet
Author: User
Keywords: big data, BDTC, BDTC 2014, cloud, PPT collection

From the 60-person "Hadoop in China" technology salon of 2008 to today's industry gathering of thousands, the Big Data Technology Conference (BDTC) has, over seven years, witnessed the transformation of big data technology and its applications in China, faithfully recording the field's technical hot spots and accumulating a wealth of industry experience. From December 12 to 14, 2014, China's largest big data technology event will once again track the hottest technologies in the field and share hard-won industry experience.

To better understand industry trends and the technical challenges enterprises face, on the eve of BDTC 2014 we are mining the knowledge accumulated at past conferences and sharing how the IT giants have explored the big data field.

Big data brings enormous business opportunities to enterprises, but it also poses serious challenges to their architectures. Here we present the big data architecture and systems PPTs from previous China Big Data Technology Conferences (part two).

The following are the big data architecture and systems PPTs from the China Big Data Technology Conference (II):

Apache Tez committer Bikas Saha: Next-Generation Hadoop

PPT Download -- BDTC 2013 (the seventh conference)

Bikas Saha explained that while the YARN architecture looks very similar to Hadoop 1.x, it differs significantly in its logic. YARN's advantages over Hadoop 1.x lie mainly in support for new applications and services, improved cluster utilization, greater scale, agility for experimentation, and shared services, each of which he covered in detail. He also shared the vision behind YARN: it lets you store all of your data in one place and interact with it in different ways, with predictable performance. Just as Windows or another operating system allocates and manages the different resources inside a single machine, YARN provides that kind of centralized resource management for the cluster.
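The operating-system analogy above can be made concrete with a toy sketch: one central scheduler tracks every node's capacity and grants "containers" to applications. This is purely illustrative (the class and method names are invented), not YARN's actual ResourceManager API, and real YARN schedulers such as the Capacity and Fair schedulers use far richer policies.

```python
# Minimal sketch of centralized cluster resource management, YARN-style.
# All names here are hypothetical; this is not the Hadoop API.

class Node:
    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb  # remaining allocatable memory

class CentralScheduler:
    """Hypothetical stand-in for YARN's ResourceManager."""
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, memory_mb):
        # Pick the node with the most free memory (illustrative policy only).
        node = max(self.nodes, key=lambda n: n.free_mb)
        if node.free_mb < memory_mb:
            return None  # the cluster cannot satisfy the request right now
        node.free_mb -= memory_mb
        return (app_id, node.name, memory_mb)

nodes = [Node("n1", 8192), Node("n2", 4096)]
rm = CentralScheduler(nodes)
grant = rm.allocate("app-1", 2048)
```

Because every request flows through one scheduler, utilization and sharing policies live in a single place, which is the centralization the talk emphasizes.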

Hortonworks technical lead Gunther Hagleitner: Apache Hive & Stinger

PPT Download -- BDTC 2013 (the seventh conference)

Gunther Hagleitner first described the background of the Stinger initiative: driving Hive's development through the community, speeding up Hive queries by 100x, supporting interactive queries, and improving Hive's scalability. Next, Gunther detailed INSERT, UPDATE, and DELETE operations. For Hive this amounts to practical transaction support: a customer's table may be updated or deleted every hour. Each update stores a new delta file that records all the changes; at query time the list of transactions is read and the deltas are consolidated. Finally, Gunther talked about Tez, which he said replaces MapReduce: work that would have been submitted as multiple chained MapReduce jobs can be submitted as a single Tez job.
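The read-time consolidation described above can be sketched in a few lines: a base snapshot plus per-transaction deltas, merged in transaction order. This is a hedged illustration of the idea only; Hive's actual ACID implementation uses ORC base and delta files with its own record layout, not the in-memory dictionaries shown here.

```python
# Sketch of Hive-ACID-style read-time merging: apply delta files to a base
# snapshot in transaction order. Data layout and names are illustrative.

def merge_deltas(base, deltas):
    """base: {row_key: value}; deltas: list of (txn_id, op, row_key, value).
    Returns the consolidated view a reader would see."""
    rows = dict(base)  # never mutate the base file itself
    for txn_id, op, key, value in sorted(deltas, key=lambda d: d[0]):
        if op == "delete":
            rows.pop(key, None)
        else:  # "insert" and "update" both write the new value
            rows[key] = value
    return rows

base = {"u1": "alice", "u2": "bob"}
deltas = [
    (2, "delete", "u2", None),
    (1, "update", "u1", "alice2"),
    (3, "insert", "u3", "carol"),
]
snapshot = merge_deltas(base, deltas)
```

Sorting by transaction id is the key point: each hourly update only appends a small delta, and readers reconstruct the current table state on the fly.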

Hadoop PMC member Tsz-Wo (Nicholas) Sze: HDFS Innovations in Hadoop 2.0

PPT Download -- BDTC 2013 (the seventh conference)

Nicholas described how NameNode Federation addresses the NameNode single point of failure: a federation contains multiple NameNodes, each of them independent. The HA support in version 2.0 includes a hot spare (the standby NameNode maintains the same data structures in memory) and supports both manual and automatic failover. Automatic failover relies on an active-NameNode election mechanism, using ZooKeeper for failure detection, periodic NameNode health checks, and a retry/replay cache. He also noted that without file-system snapshots, deleted files cannot be recovered, the system cannot be restored to a given point in time, and there is no way to recover on a recurring basis.
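The retry/replay cache mentioned above deserves a one-screen illustration: after a failover, a client may retry a non-idempotent RPC, and the newly active NameNode must return the cached result rather than execute the operation twice. The sketch below is a generic version of that idea under assumed names; it is not HDFS's actual `RetryCache` implementation.

```python
# Hedged sketch of a replay cache: responses are remembered by
# (client_id, call_id) so a retried RPC is not re-executed.

class ReplayCache:
    def __init__(self):
        self._seen = {}

    def execute(self, client_id, call_id, operation):
        key = (client_id, call_id)
        if key in self._seen:        # a retry: serve the cached response
            return self._seen[key]
        result = operation()         # first attempt: actually run it
        self._seen[key] = result
        return result

cache = ReplayCache()
counter = {"renames": 0}

def do_rename():
    # Stand-in for a non-idempotent namespace operation.
    counter["renames"] += 1
    return "ok"

first = cache.execute("client-a", 7, do_rename)
retry = cache.execute("client-a", 7, do_rename)  # retried after a failover
```

The operation runs exactly once even though the client called twice, which is what makes failover transparent for non-idempotent requests.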

Facebook Data Infrastructure team software engineer Siying Dong: New Developments in HDFS and HBase at Facebook

PPT Download -- BDTC 2012 (the sixth conference)

Siying Dong described in detail how Facebook's NameNode and DataNodes handle incremental data reporting, vividly comparing the "full report + increment" exchange between the two to a "census + birth reports + death reports". For the industry question of how to upgrade a NameNode, he presented Facebook's approach. In Facebook's view, HDFS and HBase are critical pieces of infrastructure usable across many products. For both, Facebook has made many updates as its workloads grew over a long period, from database-style workloads to real-time random reads and writes to real-time sequential reads and writes, and it continues to make improvements to help HDFS become a more versatile and stable data platform.
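The "census + birth/death reports" scheme can be sketched directly: the NameNode holds a full block set from an occasional full report and applies small incremental reports in between. Everything below is an illustration of the reporting pattern, with invented names, not Facebook's or HDFS's actual block-report code.

```python
# Sketch of "full report + increment" block reporting: a full census
# establishes the baseline, then births (new blocks) and deaths
# (deleted blocks) are applied incrementally.

def apply_reports(full_report, increments):
    """full_report: set of block ids (the census).
    increments: list of ('born'|'died', block_id) events."""
    blocks = set(full_report)
    for kind, block_id in increments:
        if kind == "born":
            blocks.add(block_id)
        elif kind == "died":
            blocks.discard(block_id)
    return blocks

census = {"blk_1", "blk_2", "blk_3"}
events = [("born", "blk_4"), ("died", "blk_2")]
current = apply_reports(census, events)
```

The appeal of the scheme is that full reports are expensive and rare, while the cheap incremental reports keep the NameNode's view fresh between them.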

LinkedIn Hadoop core team member Hu Chen (Jay): LinkedIn's Big Data Applications and Azkaban

PPT Download -- BDTC 2013 (the seventh conference)

Hu Chen first introduced LinkedIn's big data applications on the Hadoop platform, including its data products and recommendation platform, and then presented its workflow scheduling platform, Azkaban, detailing how Azkaban was designed to meet the needs of big data products and engineers. Hu said Azkaban's biggest feature is its emphasis on visualization, which is critical to improving productivity across the company. Its other strength, he noted, is broad support for a wide variety of big data platforms, with very good compatibility: it supports Hadoop 0.20, 1.x, and 2.x and multiple Hadoop configurations such as Hadoop security; it supports Pig and is compatible with old and new versions of SQL engines such as Hive; and finally it supports some non-Hadoop platforms, such as Teradata.
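At its core, a workflow scheduler like Azkaban takes jobs with declared dependencies and runs them in a valid order. The sketch below is a generic topological-sort illustration of that idea, not Azkaban's implementation or its job-file format; the flow and job names are invented.

```python
# Generic sketch of dependency-ordered workflow scheduling, the problem
# Azkaban solves. Cycle detection raises instead of looping forever.

def schedule(jobs):
    """jobs: {job_name: set of prerequisite job names}. Returns a run order."""
    order, done = [], set()

    def visit(name, path=()):
        if name in done:
            return
        if name in path:
            raise ValueError("dependency cycle at " + name)
        for dep in sorted(jobs.get(name, ())):  # run prerequisites first
            visit(dep, path + (name,))
        done.add(name)
        order.append(name)

    for name in sorted(jobs):
        visit(name)
    return order

flow = {"load": set(), "clean": {"load"}, "report": {"clean", "load"}}
run_order = schedule(flow)
```

A scheduler that computes this ordering can also render it as the dependency graph Azkaban visualizes, which is the feature Hu highlights.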

Alibaba Data Platform Division big data expert Rolly: Building a Cross-Datacenter Hadoop Cluster

PPT Download -- BDTC 2013 (the seventh conference)

Rolly introduced the status of Alibaba's "Yunti" (Cloud Ladder) Hadoop cluster and the background of its cross-datacenter deployment. Alibaba began building the Hadoop cluster in 2008 and brought it online in 2009; since then the cluster's code has been maintained in-house. As the cluster's scale increases, it must face the problems of deploying and scaling Hadoop across datacenters. He said storage utilization above 80% is a very dangerous signal, especially when some machines are extremely full, with even two or three thousand nodes reaching 98%, and compute utilization is close to 100%. Achieving a cross-datacenter deployment is genuinely difficult: the NameNode does not scale out, bandwidth must be solved, data must be distributed, and finally 90% of one datacenter's data must be moved out; at a data volume of more than 50 PB, migration is very slow.
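The danger thresholds Rolly cites (over 80% as a warning, individual nodes at 98%) lend themselves to a trivial cluster health check. The sketch below uses exactly those thresholds from the talk; everything else, including the function and field names, is an illustrative assumption.

```python
# Toy storage-utilization check using the thresholds cited in the talk:
# >= 80% is a warning, >= 98% is critical. Names are hypothetical.

def utilization_alerts(nodes, warn=0.80, critical=0.98):
    """nodes: {name: (used_bytes, capacity_bytes)} -> {name: alert level}."""
    alerts = {}
    for name, (used, capacity) in nodes.items():
        ratio = used / capacity
        if ratio >= critical:
            alerts[name] = "critical"
        elif ratio >= warn:
            alerts[name] = "warn"
    return alerts

cluster = {"dn1": (98, 100), "dn2": (85, 100), "dn3": (40, 100)}
alerts = utilization_alerts(cluster)
```

Flagging per-node (not just cluster-average) utilization matters because, as the talk notes, a healthy-looking average can hide individual machines that are already at 98%.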

Tencent senior engineer Zhao: Hive in Practice in Tencent's Distributed Data Warehouse

PPT Download -- BDTC 2012 (the sixth conference)

Tencent senior engineer Zhao introduced the core architecture of the company's TDW, which is composed of Hive, MapReduce, HDFS, and PostgreSQL, and shared practical experience with the core Hive module in TDW. Hive is software that builds a data warehouse on Hadoop and supports manipulating structured data through HQL, a SQL-like language. Early on, Hive's functionality had certain limitations: the barrier to use was too high, localization was difficult, performance was low, and it was not stable enough. To address these deficiencies, TDW made extensive customizations and optimizations to Hive: feature extensions, ease-of-use improvements, performance optimization, and stability optimization. This practical work significantly improved Hive's functionality, efficiency, performance, and stability. Going forward, Tencent will make further efforts to advance Hive.

VMware product line manager Dompo: Three Stages of VMware-Powered Enterprise Hadoop

PPT Download -- BDTC 2012 (the sixth conference)

Dompo argued that the use of Hadoop within the enterprise can be divided into three phases. The pilot (POC) is the first phase: starting from a single line of business, 1-2 use cases are used to validate Hadoop's value, typically on fewer than 20 nodes. Hadoop in production is the second phase: it serves a department, with more use cases, core Hadoop plus related software, and a typical scale of dozens to hundreds of nodes. Big data in production is the third phase: it serves many departments, often supports a subset of mission-critical processes, and is consolidated with other big data services such as MPP databases and NoSQL. In all three phases, VMware virtualization can help make Hadoop simpler, more elastic, and more highly available.

Xingcloud CTO Murrisen: A Real-Time Game Data Analysis System Based on Drill

PPT Download -- BDTC 2013 (the seventh conference)

Murrisen explained that Xingcloud, as a data analysis platform, needs to mine conclusions from data: how many people logged in today, what the revenue is, and so on. The model for these questions is a table that describes who did what, so the questions can be translated into SQL for execution. Planners and operations staff then use the conclusions to understand how the business is running and can dig into the information behind metrics such as DAU. This introduces the concept of a user: a user has attributes, and filtering on attribute values can answer these questions effectively. In his talk, Murrisen revealed that Xingcloud currently handles about 2 billion inserts/updates and 200K+ aggregations per day, with an average query response time of about 10 seconds; their Drill deployment has added distributed execution, and the storage engine has gained a write interface.
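The "who did what" model above maps directly onto aggregation queries. A minimal sketch of one such question, daily active users computed over in-memory event rows, follows; the column names and data are illustrative, not Xingcloud's schema.

```python
# Sketch of the "who did what" event model: DAU is a distinct-user count
# per day, i.e. the Python equivalent of
#   SELECT date, COUNT(DISTINCT user_id) FROM events GROUP BY date;
# Column names here are assumptions for illustration.

def daily_active_users(events):
    """events: list of (date, user_id, action) -> {date: distinct users}."""
    users_by_day = {}
    for date, user_id, _action in events:
        users_by_day.setdefault(date, set()).add(user_id)
    return {date: len(users) for date, users in users_by_day.items()}

events = [
    ("2013-12-01", "u1", "login"),
    ("2013-12-01", "u1", "purchase"),  # same user twice, counted once
    ("2013-12-01", "u2", "login"),
    ("2013-12-02", "u1", "login"),
]
dau = daily_active_users(events)
```

Adding user attributes (country, channel, level) to the rows turns the same pattern into the attribute-filtered drill-downs the talk describes.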

Guo Leitao, cloud computing researcher at the China Mobile Research Institute: HBase Coprocessor Optimization and Experiments

PPT Download -- BDTC 2012 (the sixth conference)

Guo Leitao described HBase, a non-relational, column-oriented, open-source distributed structured storage system built on top of Hadoop. HBase stores all of its data on HDFS, and its design closely resembles BigTable's, including a three-tier index structure: the META table, the ROOT table, and a ZooKeeper file. Guo Leitao also used an example to explain how HBase coprocessors are implemented through the two mechanisms of observers and endpoints. Application development runs into problems such as disordered region distribution, client network bottlenecks, and coprocessor instability; localizing region data and collecting coprocessor results locally can improve efficiency, together with careful configuration tuning.
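The observer half of the coprocessor model is easy to illustrate: hooks run before and after region operations, server-side. The sketch below captures only that hook pattern in Python with invented names; real HBase coprocessors are Java classes (e.g. implementations of `RegionObserver`) loaded into the RegionServer, and endpoints are a separate RPC mechanism not shown here.

```python
# Hedged sketch of the observer-style coprocessor pattern: pre/post hooks
# wrap a region write, so logic runs on the server next to the data.

class Region:
    def __init__(self, observers=()):
        self.rows = {}
        self.observers = list(observers)

    def put(self, key, value):
        for obs in self.observers:
            obs.pre_put(key, value)    # analogous to a prePut hook
        self.rows[key] = value
        for obs in self.observers:
            obs.post_put(key, value)   # analogous to a postPut hook

class CountingObserver:
    """Illustrative observer that counts writes server-side."""
    def __init__(self):
        self.puts = 0
    def pre_put(self, key, value):
        pass
    def post_put(self, key, value):
        self.puts += 1

obs = CountingObserver()
region = Region([obs])
region.put("row1", "v1")
region.put("row2", "v2")
```

Running such logic inside the region, rather than pulling rows to the client, is precisely how coprocessors relieve the client network bottleneck the talk mentions.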

From December 12 to 14, 2014, the China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014), sponsored by the China Computer Federation (CCF), organized by the CCF Big Data Experts Committee, and co-hosted by the Chinese Academy of Sciences and CSDN, will be held at the New Yunnan Crowne Plaza Hotel in Beijing. The conference will focus on topics such as "big data infrastructure", "the big data ecosystem", "core big data technologies", "big data in internet technology practice", and "big data in traditional enterprises"; nearly a hundred experts will be on site to share their hands-on technical experience. More discounts are available; register soon!

China Big Data Technology Conference PPT collection series:

BDTC PPT Collection (I): Big Data Architectures Shared by BAT, Huawei, NetEase, and Others

BDTC PPT Collection (II): Big Data Architectures Shared by Facebook, LinkedIn, and Others

Subscribe for free to the "CSDN Cloud Computing" (left) and "CSDN Big Data" (right) WeChat public accounts to get first-hand cloud news in real time and follow the latest progress in big data!

CSDN publishes cloud computing news on topics such as virtualization, Docker, OpenStack, CloudStack, and data centers; shares big data perspectives on Hadoop, Spark, NoSQL/NewSQL, HBase, Impala, in-memory computing, stream computing, machine learning, and intelligent algorithms; and provides information services covering cloud computing and big data technologies, platforms, practices, and the industry.
