Baidu Open Cloud Chief Architect Xu String: An Architect's Understanding of Architecture

At the 2016 China Cloud Computing Technology Conference (CCTC 2016), Baidu Open Cloud chief architect Xu String delivered a keynote titled "How Enterprise IT Infrastructure Changes in the Cloud". He also gave an interview to CSDN reporters in which he discussed his understanding of architecture and design, his view of the architect's work and skills, and how Baidu Open Cloud's architecture meets the needs of different applications such as big data and artificial intelligence.

Beyond the requirements of high throughput, scalability, and stability, the way an architecture is implemented in a cloud computing environment also matters, Xu said. The architect's job is to hold firm or compromise between conflicting goals, such as high throughput versus low latency, or an elegant architecture versus urgent requirements. Making sure business needs are met is the basic principle of architecture design. To become a good architect, you must learn to understand the business, talk with front-line product managers, and find the core requirement to solve. An architect also needs a broad technical perspective to keep up with the latest technology, and must go deep into the lower layers to understand programmers' work and pain points in order to make choices that programmers can accept.

Photo: Baidu Open Cloud chief architect Xu String

Architecture design: persistence and compromise under contradiction

Since joining Baidu in 2008, Xu String has worked on the underlying distributed storage for web search, then on distributed computing and Hadoop-related areas, and then on the underlying large-scale cluster management system, spending nearly six years on low-level distributed systems. In 2014, when Baidu decided to invest in open cloud products, he began working on the design and development of distributed systems for the public cloud.

When he first worked on distributed architecture, Xu String usually considered two factors:

    1. High throughput. The system has to store and process a large amount of data, so it is designed for maximum throughput.

    2. Scalability. Baidu's data grows very quickly, nearly doubling every year, so the system has to scale as far as possible.

After moving to cloud products, Xu String paid more attention to keeping the architecture flexible while it remains stable, because requirements change constantly and a rich set of functional components is needed to support rapid change.

One thing has stayed the same, though: trade-offs. An architect's work is essentially to keep making the right choice among conflicting goals. In distributed systems, high throughput and low latency conflict: it is hard to respond in real time at high throughput, so you have to choose what the business wants more. In the public cloud, an elegant architecture design conflicts with especially urgent requirements. The question is how to control the pace, compromising at some points and holding firm at others, so that the overall architecture can keep adapting to business development.
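The throughput versus latency tension can be illustrated with a toy batching model. This is only a sketch under an assumed cost model (a fixed overhead per batch plus a per-item cost); the class name, function names, and numbers are hypothetical and are not taken from any Baidu system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchCost:
    """Hypothetical cost model: each batch pays a fixed overhead plus a per-item cost."""
    fixed_overhead_ms: float  # e.g. one network round trip or disk seek per batch
    per_item_ms: float        # marginal cost of handling one request


def batch_latency_ms(cost: BatchCost, batch_size: int) -> float:
    """Time until a full batch completes; every request in the batch waits this long."""
    return cost.fixed_overhead_ms + cost.per_item_ms * batch_size


def throughput_per_second(cost: BatchCost, batch_size: int) -> float:
    """Requests completed per second when requests are processed in batches."""
    return batch_size / batch_latency_ms(cost, batch_size) * 1000.0


if __name__ == "__main__":
    cost = BatchCost(fixed_overhead_ms=10.0, per_item_ms=0.1)
    for size in (1, 10, 100, 1000):
        print(size, throughput_per_second(cost, size), batch_latency_ms(cost, size))
```

Under this model, larger batches spread the fixed overhead over more requests, so throughput rises, but each request waits for the whole batch, so latency rises too. Which batch size is right depends on what the business values more, which is the choice described above.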

On deciding when to hold firm and when to compromise, Xu String shared his core principle: first make sure the business need is met.

Architects often complain that "our customers' requirements are too strange" or "our product managers always force us to do something very dirty". Xu String's experience, however, is that customer requirements can often be changed. The key is that the architect must find out what the most basic need is. Customers usually say many things, but only one or two points are their core demand and the rest is secondary, and meeting the core demand does not take much. The point is that people who build the underlying architecture cannot detach from the business. They must go to the front line, talk things through with product managers and customers, and discover the core need, so that the architecture stays elegant while that need is met.

Baidu Practice: The impact of open source, containers, big data and AI

Baidu Open Cloud is the externally opened version of Baidu's internal technology, so its architecture is consistent with that of Baidu's private cloud, which is a complex system. At the bottom of Baidu's private cloud is the IDC layer, which provides the main hardware infrastructure. Above it, a complete cluster operating system manages all machines and provides overall resource scheduling. Above that are the distributed systems, including a wide variety of distributed storage and distributed computing, and then a data processing layer that manages big data, including data warehousing, data interfaces, and BI. Next is a PaaS layer that provides middleware services for internal businesses, and on top are Baidu's own applications. Xu String said that each business and each product chooses its own suitable technology stack, but all of them sit on this underlying framework. Baidu did not have all of this from the start; overall, it went through three relatively large transformations.

  1. In 2008, there was only a very thin layer above the IDC, with Baidu's applications directly on top. Baidu found that its business and its data were both growing rapidly and that demand could not be supported without big data systems, so it built a distributed storage system (including a distributed file system, a distributed table system, and distributed object storage) and distributed computing systems (including a high-throughput offline computing platform, a large-scale machine learning platform, and a real-time stream computing platform).

  2. Even with distributed storage and distributed computing in place, data processing across the company was still disorganized. Each product line basically had its own approach, which created large barriers to data management and to interaction between businesses. This drove Baidu to build a complete data processing layer, managed uniformly across Baidu, that provides an easily managed specification for handling all kinds of data.

  3. After these two iterations, Baidu found that, with data volume growing rapidly, IDC resources were being seriously wasted without a system that scheduled them for full utilization, so it developed a cluster operating system for resource scheduling.
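To make the resource-waste point in the third transformation concrete, here is a minimal sketch of one classic placement heuristic, first-fit decreasing. It is an illustration only: the article does not say which scheduling algorithm Baidu's cluster operating system uses, and the function and task names are hypothetical.

```python
from typing import Dict, List


def first_fit_decreasing(task_cpu: Dict[str, int], machine_cpu: int) -> List[List[str]]:
    """Place tasks on identical machines with the first-fit-decreasing heuristic.

    task_cpu maps a task name to the CPU cores it needs; machine_cpu is the
    number of cores per machine. Returns one list of task names per machine used.
    """
    machines: List[List[str]] = []
    free: List[int] = []  # free[i] is the remaining capacity of machines[i]
    # Place the largest tasks first so smaller ones can fill the leftover gaps.
    for name, need in sorted(task_cpu.items(), key=lambda kv: kv[1], reverse=True):
        if need > machine_cpu:
            raise ValueError(f"task {name} needs more cores than one machine has")
        for i, capacity in enumerate(free):
            if need <= capacity:
                machines[i].append(name)
                free[i] -= need
                break
        else:
            # No machine in use has room, so start using another one.
            machines.append([name])
            free.append(machine_cpu - need)
    return machines


if __name__ == "__main__":
    # Hypothetical workloads, invented for this illustration.
    tasks = {"search": 6, "ads": 5, "logs": 4, "index": 3, "batch": 2}
    print(first_fit_decreasing(tasks, machine_cpu=10))
```

Giving each of these five tasks its own 10-core machine would leave half of the cores idle. Packing them through a shared scheduler needs fewer machines, which is the kind of saving that motivates a cluster-wide resource scheduler.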

    Figure: Baidu private cloud architecture

Container and micro-service architecture

Container technology is now widely used in private clouds. Xu String said that Baidu started working on containers early, and all of its applications now run in containers. In 2012, container technology was nowhere near as popular as it is today; only the most basic kernel technology, cgroup, existed, and it was immature. Baidu did a great deal of work and has built up its own technology rather than using a mature container solution such as the now-popular Docker. Container management was also developed in-house because Baidu started relatively early, but Baidu follows Kubernetes, Apache Mesos, and other recent industry directions, hoping to find advanced ideas it can learn from and bring into its container system. (Baidu Open Cloud has not yet fully opened its container technology. The bottom layer of the open cloud simply runs on Baidu's own container technology, including the cloud hosts, which are built on it. When the time is right, Baidu will also offer containers as a service.)

As for the currently popular micro-service architecture, Xu String said that micro-services are hard to define. Baidu does have many low-level distributed services, and because the business is so complex, a request may pass through five or six layers from top to bottom. Whether this can be called micro-services is open to discussion. In the ideal micro-service case, each working module is split to its minimum size and run as a separate service. But splitting that finely poses great challenges for the architecture. First, a change in functional requirements may cut through many layers, and changing service interfaces is a major matter that requires complete testing. Second, if the split is too fine, QA will often say that the environment is too cumbersome to deploy: they only want to test one thing, but have to deploy the entire set of services to do it. So exactly what granularity micro-services should be kept at remains an open question.

Xu String believes that a good architect, even when designing architecture before the micro-service concept appeared, would already be aware of bottlenecks or parts that need to scale heavily. But the granularity has to be controlled so that it does not create uncontrollable complexity. Baidu's principle for splitting modules is to start from the simplest form and evolve iteratively, with an emphasis on not over-designing. A service starts small, perhaps as a single-machine system with everything together. When some part of the code turns out to be upgraded often or becomes a bottleneck, it is refactored immediately and split out into an independent service. Do not get caught up in concepts such as services or SOA at the outset; what matters is choosing the level that suits the stage of business development and evolving iteratively.

PaaS and DevOps

PaaS as a whole has not been promoted very successfully in China. The main reason is that PaaS originally suited only a single technology and only companies at an early stage. As any company's business grows, a single technology can no longer meet demand, and enterprises worry about being tied to one PaaS platform and unable to grow quickly, so even those that start on PaaS end up wanting to build their own environment. Baidu Open Cloud PaaS therefore supports PHP, Java, Python, Node.js, and others, provides MySQL, MongoDB, Redis, Port, and cache services, and offers a plan designed for growing enterprises that lets them upgrade step by step.

Operations automation is another point that PaaS emphasizes. Xu String said that programmers abroad understand DevOps more thoroughly: the trend at large foreign companies is toward full-stack engineers who do not only development but also operations and testing. In China the DevOps trend has only just started. In the traditional model, development, operations, and testing are still clearly separated, and developers often assume that operations work can be left to operations staff and need not concern them. But Baidu found in practice that once a system goes online, operating it takes a lot of manpower because operability was not designed in well. How to upgrade and how to run small-flow tests are examples; handled poorly, they often have a huge impact on overall stability. Baidu therefore gradually came to require that these questions be considered in the design phase: how the system will be upgraded in the future, how to carry out such operations without affecting the business, how to make deployment easy, and how to expose more information through external interfaces so that the system's internal state is easy to observe and potential problems are easy to find later. On the whole, the DevOps trend is absolutely correct. If operations are not fully considered during design and development, reliability will be a great challenge.

Baidu's automated operations gradually moved from day-to-day business operations toward automation. Monitoring and deployment, for example, can be platform-based and standardized, so that every platform design integrates with automated management. After code is submitted to the SVN code host, the subsequent CI integration, online release, and small-flow control are all fully automated. As for tools, Baidu basically uses ones developed in-house, including the monitoring platform, the log collection system, and the deployment, small-batch upgrade, and process control functions of the cluster operating system. These borrow ideas from open source but do not use open-source software directly, because at the time Baidu needed to solve these problems, the community often had no mature open-source product.
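The small-flow testing mentioned above, in which a new version receives only a small share of traffic before a full release, can be sketched as hash-based routing. This is a generic illustration and not Baidu's release tooling; the function names, the salt value, and the bucket count of 100 are assumptions made for the example.

```python
import hashlib


def in_small_flow(user_id: str, percent: int, salt: str = "release-1") -> bool:
    """Return True if this user falls inside the small-flow slice.

    The user id is hashed into one of 100 buckets, and buckets below `percent`
    receive the new version. Hashing keeps the decision stable for each user.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def pick_version(user_id: str, percent: int) -> str:
    """Route a request to the new build or the stable build."""
    return "new" if in_small_flow(user_id, percent) else "stable"


if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, pick_version(uid, percent=5))
```

Because a user's bucket does not change, raising `percent` step by step only ever adds users to the new version, so a rollout can grow gradually and be rolled back by setting the percentage to zero.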

Learning from open source

Baidu has drawn on open-source technology in a number of technical fields. Xu String said that Baidu always pays close attention to the development of open-source technology and thinks about what role a given technology can play in Baidu's business, and which should be introduced and which should not. The most recent introduction is some work on Spark.

    • Hadoop came first. Baidu's distributed storage and distributed computing started in 2008 from Hadoop. By 2009, Baidu's needs had gone beyond what the community targeted: the community mainly served small and medium-sized enterprises with fewer than 1,000 machines, while Baidu quickly hit a bottleneck at 3,000 machines and had no choice but to optimize the Hadoop kernel. By the time it reached 10,000 machines, Baidu and the community were working on the same problems in parallel and had diverged greatly.

    • At the data warehouse level, Baidu draws on open-source Hive and Impala to build its own products and services, adopting important ideas such as column storage and the MPP architecture.

    • In container cluster management, Baidu did not lag far behind Google, and when it started there was no open-source Kubernetes. A strength of Kubernetes is that it standardizes some advanced concepts. Baidu watches what Kubernetes standardizes and what it can use in its own container management and scheduling, as a reference for opening up its own container service later.
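The column storage and MPP ideas mentioned in the data warehouse item can be sketched in a few lines. This is a toy illustration of the two concepts, not how Hive, Impala, or Baidu's products are implemented; the function names and sample data are hypothetical.

```python
from typing import Dict, List, Sequence


def to_columns(records: Sequence[Dict[str, object]]) -> Dict[str, List[object]]:
    """Turn row-oriented records into one list per field (column layout).

    Assumes every record has the same fields.
    """
    columns: Dict[str, List[object]] = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def partitioned_sum(values: Sequence[int], partitions: int) -> int:
    """Sum a column MPP-style: each partition is summed on its own, then combined.

    In a real MPP engine each partition would live on a separate node; here the
    partial sums are computed one after another in a single process.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    chunks = [values[i::partitions] for i in range(partitions)]
    partial_sums = [sum(chunk) for chunk in chunks]
    return sum(partial_sums)


if __name__ == "__main__":
    # Hypothetical sample data, invented for this illustration.
    rows = [
        {"day": "mon", "product": "search", "clicks": 120},
        {"day": "mon", "product": "maps", "clicks": 30},
        {"day": "tue", "product": "search", "clicks": 150},
    ]
    columns = to_columns(rows)
    # Only the "clicks" column is read; "day" and "product" are never touched.
    print(partitioned_sum(columns["clicks"], partitions=2))
```

An aggregate over one field reads a single contiguous column instead of every full row, and the per-partition partial sums are independent of each other, which is what lets an MPP engine spread them across nodes.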

From big data to AI

Big data work covers data collection, storage, statistical analysis, and application. Baidu focuses mainly on efficient data transmission, mass data storage, large-scale processing, and data warehouse construction and management, all built on distributed storage and distributed computing systems. Distributed storage is not used only for big data; it also supports pictures, video, mobile app distribution, and logs, and some genetic testing companies put their low-cost storage needs on Baidu Open Cloud. For computing, Baidu has opened BMR, a managed cloud service for Hadoop and Spark, which integrates the fully open-source ecosystem into Baidu Open Cloud and adds Baidu's operations and management experience and kernel optimizations to provide better service to enterprises. In addition, Baidu's big data team built Palo, an OLAP engine for reporting and multidimensional analysis.

High-end applications of big data are supporting the development of AI technology, and AI affects the Baidu Open Cloud architecture at two levels: hardware and software.

    1. Hardware level. CPUs are built for general-purpose computing, and their performance does not meet the needs of an intelligent platform well. Baidu is trying two options:

    • GPU acceleration. Large-scale machine learning requires many matrix and vector operations, which GPUs are good at, so GPU acceleration has become common industry practice, especially for distributed deep learning training. Baidu has built large-scale GPU clusters, and its number of GPUs has grown by leaps over the past two years to support a variety of machine learning tasks. Large GPU clusters also create demand for a high-performance network, because as single-machine performance improves, each machine can handle more data. Baidu has fully upgraded to 10-gigabit networking, but as AI develops, nodes interact with each other more and more frequently, so Baidu is also doing pilot work on higher-speed networks.

    • FPGA acceleration. GPUs offer good computational performance, but their energy consumption is relatively high, which affects IDC cost. For the same energy, an FPGA can provide better performance than a GPU, and future AI algorithms may run on dedicated FPGAs. Baidu is also exploring FPGA use at scale; basically every machine that supports online advertising has an FPGA card plugged in. The biggest obstacle is that an FPGA retains some characteristics of hardware, involving routing, power optimization, and chip tape-out, so the iteration cycle of an FPGA program is much longer than that of ordinary software. If future FPGA development gains tools comparable to today's software compilers, which tune power intelligently and reduce the chip-engineering skill needed to write code, the FPGA platform will iterate much faster, including for power testing. This will take a long time and requires the industry to work together. For now, Baidu is exploring by letting usage grow naturally according to business needs.

    2. Software level. In its AI engineering work, Baidu found that some of the more important parts are actually general-purpose processes, such as optimizing the underlying network communication. These have nothing to do with the algorithm itself, yet they are problems that inevitably come up in real applications. Drawing on its experience with large-scale machine learning, Baidu built a unified platform and packaged it as the BML product, so that algorithm engineers only need to focus on their main algorithms and on designing better AI strategies, without worrying about the design of the underlying large-scale platform. BML now basically provides fully managed end-to-end workflows and includes more than 20 of the most commonly used machine learning algorithms. Baidu has found that the existing classic algorithms rarely need changes and that much of the work lies in data processing and feature extraction, so users can customize their own platform on top of the standard workflow.

The architect must be a programmer

Baidu actively uses AI technology, and Baidu's architects work to provide a better platform for AI development and advance the technology. Asked whether AI can simplify some of the architect's work, however, Xu String thinks this is still difficult. The biggest problem an architect faces is making choices, which is very complex. If AI could understand the business and its requirements, this would become a trend, but that will take a long time. If AI could do that, most of today's work could be replaced by AI.

On how architects should develop themselves, Xu String says that a good architect should have a broad technical perspective and understand business requirements.

    1. An architect must be a programmer. If you do not understand the pain of program development, you cannot understand why programmers are so sensitive to changes in requirements, and when a decision reaches the code behind the architecture, it is hard to choose between satisfying programmers and meeting business needs. So the architect must first go down to the lower layers to understand what programmers are doing, what the development frameworks are, and how to keep up with the latest technology.

    2. What distinguishes an architect from a programmer is the ability, without leaving low-level programming work behind, to look at problems from a higher level. When a system is complex, a low-level programmer often sees only the lower part. What matters more is how business requirements should be broken down and how an overly complex system should be analyzed, so the architect needs to raise their vantage point and see what the business really requires of the architecture.

There are two ways in which an architect should follow new technology:

    1. Follow the top international conferences. The latest ideas and research directions are published as papers at top international academic conferences. By the time books appear, the technology is already mature, so books suit programmers who are just entering a new field and want a few classics to quickly learn what the field contains.

    2. Follow the entire open-source community. Open source is now a major trend, and much new technology is open-sourced, so architects need to follow it while using their business experience to make judgments. Many open-source technologies flourish, but many others are only a flash in the pan, so an architect can judge correctly only by understanding the essence of a technology and what kind of business problem it ultimately solves.

This article is from the "djh01" blog; please keep this source link: http://djh01.blog.51cto.com/10177066/1825690
