BDTC PPT Collection (III): Big Data Technologies Shared by BAT, IBM, and Intel

From the 60-person "Hadoop in China" technology salon in 2008 to today's industry gathering of several thousand people, the seven-year-old BDTC (Big Data Technology Conference) has witnessed the transformation of China's big data technology and applications, faithfully charted the field's technical hotspots, and accumulated a wealth of valuable industry experience. From December 12 to 14, 2014, China's largest big data technology event will once again track the field's current hotspots and share industry experience.

To better understand industry trends and the technical challenges enterprises face, on the eve of BDTC 2014 we revisit the knowledge accumulated at past conferences and share how the IT giants have explored the big data field.

Big data brings enterprises great business opportunities, but it also poses serious challenges to big data technology. The following are big data technology highlights from the slides (PPT) of previous China Big Data Technology Conferences:

Wang Feng, senior technical expert, Alibaba Search Division: Alibaba Search's real-time stream computing technology

PPT download: 7th BDTC (2013)

Wang Feng introduced the business background behind IStream, Alibaba Search's stream computing technology, and then explained the IStream computing model in detail from five angles: basic concepts, topology, message management, progress management, and programming interface. This was the first public presentation of the model. IStream's design follows two ideas: "layer computation and storage, and decouple computations from one another" and "pass messages between computing layers through persistent distributed message queues". Decoupling upstream and downstream computation means that one stage does not block another's progress and that new businesses can be plugged in more easily; persisting the message stream also makes it easier to share data and trace problems.
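The decoupling idea can be illustrated with a minimal sketch. This is not IStream code: the `MessageLog` class, its methods, and the consumer names are hypothetical, and an in-memory list stands in for a persistent distributed queue. It only shows how per-consumer offsets over a retained log keep a slow downstream stage from blocking others and let a newly added business replay the stream from the start.

```python
class MessageLog:
    """In-memory stand-in for a persistent message queue (hypothetical)."""

    def __init__(self):
        self._messages = []   # retained messages, never removed on read
        self._offsets = {}    # consumer name -> next position to read

    def append(self, message):
        """Upstream stage writes a message and returns its position."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, consumer, max_messages=10):
        """Return the next batch for one consumer and advance only its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch

    def lag(self, consumer):
        """Number of messages this consumer has not read yet."""
        return len(self._messages) - self._offsets.get(consumer, 0)


log = MessageLog()
for doc_id in range(5):
    log.append(doc_id)                     # upstream never waits for readers

slow_batch = log.poll("indexer", max_messages=2)   # slow stage reads a little
new_batch = log.poll("new_business")               # late joiner replays all
```

Because reads do not remove messages, the same retained stream can also be re-read when tracing a problem, which matches the sharing and tracing benefits described above.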

George Lapis, IBM Big Data FX project director: Extracting effective insights from big data

PPT download: 6th BDTC (2012)

George Lapis began with the present state of the information society: by 2020 the total amount of information will reach 35 ZB. Big data is not aimed at one fixed group of staff; it can serve customer service, marketing, analysts, and others. He pointed out that IBM's big data technology improves on traditional databases, analyzing and solving unstructured-data problems that traditional databases cannot handle. IBM looks for the competitive advantage of big data through information analysis. According to an IBM survey, between 2010 and 2012, 28% of companies had started big data work, 47% had started planning it, and only 24% had no big data work at all. As for how big data is used, 49% of companies apply it to customer management, 18% to operational optimization, and the remaining 33% to areas such as risk and financial management and staff collaboration.

Xiajun, Big Data Department, Intel Asia-Pacific Research and Development Co., Ltd.: Spark, a next-generation in-memory big data analysis framework

PPT download: 7th BDTC (2013)

Xiajun said that Intel began contributing to the Spark project as early as 2011; Intel China now has 3 Spark committers and 7 contributors, who have contributed more than 70 patches. Before introducing Spark in detail, he offered an interesting analogy: if a big data system were a mobile phone, MapReduce would be only a feature phone, and the later Drill, Impala, S4, and Storm are assorted enhancements to MapReduce. He focused on the aspects users most often care about when adopting Spark: performance, learning cost, stability, behavior when memory is insufficient, fault tolerance, and compatibility. In the closing Q&A, asked why Intel invests so actively in Spark, Xiajun explained that Intel selects promising open source projects and joins them, which lets it keep a voice in future competition.

Liu Liping, data platform technology manager, Baidu Infrastructure Department: An introduction to Baidu's big data platform

PPT download: 6th BDTC (2012)

Liu Liping mainly introduced the technology and applications of Baidu's big data platform, focusing on the data warehouse itself and its multiple analysis engines. For the Hive-based data warehouse, he described the work that currently matters most. First, on the view that data content matters more than the platform, the whole model has to be established (such as themes, bare metal, and physical storage), considering in which situations, and how, these details can be reduced or shielded. Second, beyond the warehouse itself, work is needed at the data-content level: improving data coverage, storing reference data for the whole company, continuously refining the data model, and building data models oriented to application scenarios.

Zhao Jianbo, technical manager, Qihoo 360: Qihoo 360's enhancements and improvements to ultra-large-scale HBase clusters

PPT download: 7th BDTC (2013)

Zhao Jianbo detailed seven areas in which Qihoo 360 improved HBase over the past year: a dedicated meta server, startup optimization, scan, compaction, protection mode, client timeout guarantees, and index preloading. Drawing on Qihoo 360's HBase experience, he offered four suggestions: create regions in advance; control the number and size of regions; control the timing and data volume of compaction, running it at off-peak times and avoiding repeated I/O; and monitor region health in real time, keeping the information in meta consistent with that on the servers. Going forward, they will keep improving HBase in line with business needs, in terms of reducing the number of regions, optimizing random reads (reducing the amount of data read), secondary indexes, and service availability.
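As an illustration of the first suggestion, creating regions in advance, the sketch below computes evenly spaced split keys that a table could be pre-split on. It is not taken from the talk and calls no HBase API: the function names are hypothetical, and it assumes row keys begin with a fixed-width lowercase hexadecimal prefix (for example, a hash of the original key) so that writes spread evenly.

```python
import bisect


def hex_split_keys(num_regions: int, width: int = 8) -> list[str]:
    """Return num_regions - 1 split keys dividing the hex keyspace evenly."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    keyspace = 16 ** width
    return [
        format(i * keyspace // num_regions, f"0{width}x")
        for i in range(1, num_regions)
    ]


def region_for(row_key: str, splits: list[str]) -> int:
    """Index of the region holding row_key; a split key starts a new region."""
    return bisect.bisect_right(splits, row_key)


splits = hex_split_keys(4, width=2)   # four regions over keys "00".."ff"
first = region_for("3f", splits)      # falls in the first region
last = region_for("ff", splits)       # falls in the last region
```

Choosing the region count up front is also one way to act on the second suggestion, since the number and rough size of regions are then decided by design rather than by automatic splitting under load.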

Cai Yudong, senior manager, NetEase: Large-scale content recommendation systems in practice

PPT download: 6th BDTC (2012)

Cai Yudong explained that NetEase weighed two main technical options for large-scale content recommendation. One is a content-based recommendation system: model users and items, compute the similarity between item models and user models, and recommend to each user the items most similar to that user's model. The other is a recommendation system based on collaborative filtering, which is independent of the business domain and mines similarity from users' access records. After weighing the options, the company chose content-based recommendation for news, and collaborative filtering for photo galleries and video. Cai Yudong also described the implementation in detail: mining user interests from the portal's access logs, building user interest models, and using Hadoop and Hive as the data mining tools.
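The content-based option can be sketched in a few lines. This is a minimal illustration under stated assumptions, not NetEase's implementation: user and item models are assumed to be sparse topic-weight dictionaries, cosine similarity is assumed as the similarity measure, and all names and sample data are hypothetical.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse topic-weight vectors."""
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user_model: dict[str, float],
              item_models: dict[str, dict[str, float]],
              k: int = 2) -> list[str]:
    """Return the k item ids whose models are most similar to the user model."""
    ranked = sorted(
        item_models,
        key=lambda item_id: cosine(user_model, item_models[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical interest model, e.g. built from portal access logs.
user = {"sports": 1.0, "tech": 0.5}
news = {
    "match_report": {"sports": 1.0},
    "market_wrap": {"finance": 1.0},
    "phone_review": {"tech": 1.0},
}
top = recommend(user, news, k=2)
```

A collaborative-filtering variant would ignore the topic vectors entirely and derive similarity from which users accessed which items, which is why the article describes it as unrelated to the business domain.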

Liu Chengzhong, technical manager, Miaozhen Systems: Running Cloudera Impala on PostgreSQL

PPT download: 7th BDTC (2013)

Liu Chengzhong explained why Miaozhen focuses on and explores big data. He said the company chose Cloudera Impala as the basis for developing Project Camaro mainly because of the following advantages: good code style, clear modules, easy customization, faster than Hadoop, a distributed execution tree, and so on. He then introduced Camaro's front end and back end, and presented data on Camaro's performance, indexing, multi-user queries, and more. Finally, he outlined Camaro's planned features, such as YARN integration and UDFs.

Wang Tao, CTO, SequoiaDB: A SQL execution engine for non-relational databases based on Cloudera Impala

PPT download: 7th BDTC (2013)

Wang Tao shared their practice with Impala: SequoiaDB is a document-oriented NoSQL database, and SequoiaSQL is a SQL execution engine developed on the open source Cloudera Impala project. SequoiaSQL is more than "SQL-on-Hadoop": it supports JDBC and is compatible with the Hive driver. Compared with Cloudera Impala, it is enhanced in four ways: read/write interfaces for SequoiaDB and relational databases; the Metastore embedded into SequoiaDB; UPDATE/DELETE/MERGE statements; and query predicate pushdown, which uses database indexes to improve performance. Test data show that SequoiaSQL can perform up to 10 times better than Hive. Finally, he shared the product roadmap: support for aggregation pushdown, sort pushdown, nested types, and array types, and eventually cost-based performance optimization.
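Predicate pushdown can be illustrated with a toy sketch. It is not SequoiaSQL or Impala code: the in-memory `ROWS` table, the function names, and the predicate are all hypothetical. The point it shows is that evaluating the filter inside the storage layer returns the same result while handing far fewer rows to the query engine; a real store could additionally use an index instead of scanning.

```python
# Hypothetical table held by the storage layer.
ROWS = [
    {"id": i, "city": "Beijing" if i % 4 == 0 else "Other"}
    for i in range(12)
]


def storage_scan(predicate=None):
    """Storage layer: yield rows, filtering locally if given a predicate."""
    for row in ROWS:
        if predicate is None or predicate(row):
            yield row


def query_without_pushdown(predicate):
    """Engine fetches every row, then filters on its own side."""
    fetched = list(storage_scan())
    result = [row for row in fetched if predicate(row)]
    return result, len(fetched)


def query_with_pushdown(predicate):
    """Engine hands the predicate to storage and receives only matches."""
    fetched = list(storage_scan(predicate))
    return fetched, len(fetched)


def in_beijing(row):
    return row["city"] == "Beijing"


plain_result, plain_transferred = query_without_pushdown(in_beijing)
pushed_result, pushed_transferred = query_with_pushdown(in_beijing)
```

The aggregation and sort pushdown items on the roadmap follow the same pattern: more of the work moves to where the data lives, so less data crosses the boundary to the engine.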

Zheng, senior director of product management for core technology, Informatica: Data integration safeguards Hadoop

PPT download: 6th BDTC (2012)

Zheng said that big data can do two things: drive innovation and reduce costs. Innovation means enabling different enterprises to apply big data to life and work, for example in fraud detection, risk and portfolio analysis, investment recommendations, real-time data auditing, predictive maintenance, gene sequencing, and connected vehicles. Costs can be reduced in several ways: stage raw data temporarily on low-cost commodity hardware; move ETL/ELT processing to low-cost commodity hardware; use real-time data integration to smooth ETL processing; use high-speed data replication to offload processing from source systems; raise productivity by up to two times; let developers develop once and deploy anywhere; and eliminate data copies and extend data warehouse capacity through data virtualization, lowering data management costs. Zheng pointed to a trade-off between big data innovation and cost reduction.

Pan Juting, vice president and chief strategy officer, Venustech: Attacking big data

PPT download: 6th BDTC (2012)

Security and privacy are perennial issues for big data. As data grows, organizations face increasingly complex threats, and traditional data protection methods often cannot satisfy the growing set of compliance requirements. We have to learn to apply security thinking to big data security. That thinking covers the three elements (assets, threats, and security measures); stance (game, confrontation, and cooperation; value and attribution, intention, and random disturbance); space-time and knowledge (distribution and hierarchy, lifecycle, flows and use cases, and knowledge dimension clusters); and classical means (authentication and encryption, attack detection, and systematic risk management). Pan Juting held that attacks on big data come mainly from three planes, the system plane, the service plane, and the data plane, and described each in detail.

From December 12 to 14, 2014, the 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014), sponsored by the China Computer Federation (CCF) and jointly hosted by the CCF Big Data Experts Committee, the Chinese Academy of Sciences, and CSDN, will be held at the New Yunnan Crowne Plaza Hotel in Beijing. The conference will focus on topics including "big data infrastructure", "big data ecosystem", "big data core technology", "big data applications in Internet technology practice", and "big data applications in traditional enterprises". Nearly a hundred experts will attend to share their hands-on technical experience. Discounts are available, so register soon!

China Big Data Technology Conference PPT Collection series:

BDTC PPT Collection (I): Big data architectures shared by BAT, Huawei, NetEase, and others

BDTC PPT Collection (II): Big data architectures shared by Facebook, LinkedIn, and others

BDTC PPT Collection (III): Big data technologies shared by BAT, IBM, and Intel
