Super Cloud Product Manager Liang Explains "Yun Hui"

Source: Internet
Author: User
Keywords: big data

Liang: Good afternoon. I am Leung Yiu-chung, product manager at Super Cloud. Thank you to my two leaders for the wonderful and vivid speeches just now. I have half an hour to introduce Yun Hui to you.




Before I introduce this product, or rather this solution, let me first explain our company's view of big data and the solutions we build around it. For big data there are three questions we need to settle, three questions we ask the customer. First, what are the customer's actual requirements: what big data environment is he in, what are his main problems, and what scale of growth does he expect over the next three to five years? Second, we tune our integrated hardware-and-software solution at the software level and the hardware level separately, so that no resource is wasted and no unnecessary cost is spent. Finally, we provide one-stop hardware and software together with correspondingly close support and services for the user. So these are the three dimensions along which we communicate with customers.




I expect everyone here, and everyone in the industry, already has some understanding of and contact with the concept of big data, but let me give you a rough description of what it is. When we say big data, in what respects is it "big"? There are three main areas. The first, as the name implies, is sheer volume. How large? IDC has forecast that by 2020 data will have grown to 44 times today's amount, reaching 35.2 ZB. The second is variety: this data is widely distributed and of many types, accumulated by today's applications and their different processing methods. Take the currently very popular microblogs and video sites, or the Internet of Things that some industries are now deploying, with sensors producing their own data; all of this falls within the scope of big data. Beyond the numeric data we handled before, there are video types, text types, and signal types, the file formats in which sensor output is saved. It also includes the navigation systems on some mid-range cars, which are basically interactive: not just an output signal, but a way for the driver to query the GPS and interact with the navigation information through remote assistance. This stage is still fairly preliminary, of course; a future stage of development may answer questions and plan routes. The signals such systems generate may make the stored data and applications more complex to handle.




What is the last "big"? It is the response speed and processing speed that customers and the various industries now demand of big data; in this dimension too it is big. As for processing speed, our current relational databases have already shown they cannot process or store big data, and whether you look at the development of the technology or its present level, it shows a relatively slow growth trend. So the point of advancing big data technology is not to replace the relational database; big data is a good supplement to current database technology, a complementary role. It would be wrong to say that all the data we produce in the future is big data: we still produce very simple things, like the text documents of everyday work, and those should be stored in our relational databases. But for the more high-end or industry-leading technologies, and for future business demands on storage and computing, the processing capability of the relational database and its path of development already weigh heavily on the processing of big data. In traditional BI, once the data reaches the TB level, performance lags noticeably; data volumes keep increasing, yet traditional database processing keeps slowing down. For the customer, what is the most direct effect? "I cannot see my results, even though my software has been upgraded to the current version." That is the most intuitive experience for the end user.




If our customers, or the industries now dealing with big data, handle it securely and correctly, and pick an appropriate entry point, then the benefits of big data relative to its cost, for the current stage and for the enterprise over the next three to five years, produce a very impressive figure, which you can see in this chart. Take retail, aviation, manufacturing, food, steel, and so on: the volumes of data these industries generate, the data types, and their demands on big data are very complex, but once they apply the right methods to processing it, the gains in production efficiency are, as you can see, basically maintained above 20%. For an enterprise or an industry, that productivity converts directly into profit: it can use less money to produce greater social or industrial value. Reading the chart from left to right, the corresponding profit figure is also considerable. That is the potentially huge market we are talking about, because right now most of our industries, companies, and customers understand big data only as how to use more storage to save the data, rather than how to mine it, use it, and transform it into better productivity and greater profit.




What are some typical industry applications of the data we just talked about? There are of course many, but for Super Cloud we focus on five areas. The first is the internet industry, currently our largest customer base. The second is e-commerce. The third is business intelligence, that is, BI. Then comes education and research, that is, universities and research institutes. The last is the Internet of Things. IoT is a very hot topic right now, everyone is talking about it, but what is its standard, what is its definition, and from what angle can one best approach the concept and take it from an idea to a field? I think big data is such an entry point. From Super Cloud's perspective, we want to enter the Internet of Things from the angle of big data, to broaden our understanding of the IoT concept, including its infrastructure.




As I just said, the internet industry is our biggest. How is big data used there, what are its fields of application? One aspect is massively concurrent access to data. Take Weibo: when a sensitive event suddenly occurs, we all rush to refresh the pages around that topic, refresh Weibo, and look at what the more authoritative accounts are saying about the event and what the truth is; everyone is curious. So at one point in time, a huge number of users accessing a page generate a huge amount of data, and that data, whether uploaded or accessed, images, text, today's animations and so on, all belongs to the category of big data. How should this be handled, or rather, can the internet industry's existing architecture support such a huge number of users at one point in time? Millions of users refreshing one page may produce gigabytes, but something like Weibo traffic around a sensitive event may reach the petabyte class. There are also classic big data applications such as web server logs; log retention is a comparatively traditional use of big data. Everyone knows what a log contains: each record states a fact, but once the records are strung together by time they can reveal a trend, and that trend is good corroboration for explaining a particular problem; that is the latent value in the data itself. The logs of a network device let you examine its different states before and after an outage and analyze the cause. And relational database logs are, of course, also a kind of log.
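
To make the point about time-strung records concrete, here is a minimal sketch that buckets error records by hour; the log format ("2012-06-01T14:03:22 ERROR ...") is a hypothetical example, not one from the talk. A jump between adjacent buckets is exactly the kind of trend that helps locate an outage window.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Count ERROR records per hour so that a time-ordered series (the "trend"
// mentioned above) emerges from individual log records.
public class LogTrend {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> errorsPerHour = new TreeMap<>();
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            if (line.length() < 13 || !line.contains(" ERROR ")) continue;
            String hour = line.substring(0, 13);   // e.g. "2012-06-01T14"
            errorsPerHour.merge(hour, 1, Integer::sum);
        }
        // A sudden jump between adjacent hours hints at the outage window.
        errorsPerHour.forEach((hour, n) -> System.out.println(hour + "  " + n));
    }
}
```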




E-commerce is also a broad field of data application. Merchants like Taobao and network operators like Baidu have their own reading of big data and their own frameworks adapted to it; about two to three years ago Taobao's back-end technical framework had already shifted toward being data-driven, with big data applications at the core of the framework, handling a huge volume of transaction information and data. You know how it goes: we buy something on the web, say a mouse, and after buying it a comparison of products with the same configuration appears. How is that done? As users tick different specifications, the big data application in the background continuously searches and compares its stored data and produces a corresponding recommendation, possibly more than one: maybe Microsoft, maybe other vendors, and so on. That is a direct application of big data, namely data comparison. Then there is non-transactional data, such as device applications and log documents, which serve the vendors' data centers and engineers. The third kind is the mass of user information. What is it for? A user's trajectory through the site, which shop he visited, what his consumption behavior was: the stored value and the mining value of that data play a very important role in future data deployment and business distribution, including the business distribution of the entire network of shops.
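
As an illustration of that "data comparison" recommendation, here is a minimal co-occurrence sketch: count which items appear alongside a given product across orders and recommend the most frequent companions. The item names are made up, and this is not Taobao's or Super Cloud's actual algorithm, only the simplest version of the idea.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// "Customers who bought this also bought...": count co-occurring items.
public class CoOccurrence {
    public static void main(String[] args) {
        List<List<String>> orders = List.of(
                List.of("mouse-A", "pad"),
                List.of("mouse-A", "keyboard"),
                List.of("mouse-A", "pad", "keyboard"));

        // Count how often every other item appears next to "mouse-A".
        Map<String, Integer> alsoBought = new HashMap<>();
        for (List<String> order : orders) {
            if (!order.contains("mouse-A")) continue;
            for (String item : order) {
                if (!item.equals("mouse-A")) alsoBought.merge(item, 1, Integer::sum);
            }
        }
        // Recommend the most frequent companions first: pad=2, keyboard=2.
        alsoBought.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}
```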




In business intelligence, what most customers now want is real time, which current BI technology does not achieve: real-time query. That is the biggest shake-up big data technology brings to BI.




Education and research, as I understand it, drive the advancement of big data technology itself. In educational institutions and research institutes, the interest in big data is not at the application layer but in the algorithms themselves, in how big data matches the hardware, in its future direction and its technical details, for example how each parameter of a big data algorithm is applied. That, I think, is the topic research institutes care about most. So we list this area in the hope of cooperating with them, cooperation that in fact advances the algorithms and the technology itself. As for the Internet of Things, as I just said, Super Cloud hopes to approach it through the entry point of big data: by providing our own infrastructure we can serve users and partners quite completely and comprehensively, and we hope the IoT concept, like the cloud, can genuinely land in practice. We want to do something in the Internet of Things, and big data is the point from which we cut in.




In essence there are two things to do: one is partitioning, the other is redundancy. What is partitioning? Partitioning is how to process data efficiently, in a word: fast. And what is redundancy? Redundancy means reliable: I want my service to run 7x24 without interruption, even 7x24x365 without interruption, and today the way that reliability is achieved is through redundancy.
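
Here is a minimal sketch of those two ideas, assuming a fixed, hypothetical cluster of four nodes: partitioning sends each key to a deterministic primary node so work spreads out, and redundancy writes the same record to additional nodes so one failure does not interrupt service. Real systems such as HDFS use more elaborate placement policies; this only illustrates the principle.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionAndReplicate {
    static final String[] NODES = {"node0", "node1", "node2", "node3"};
    static final int REPLICAS = 3;

    // Partitioning: a key always maps to the same primary node (fast lookup).
    static int partition(String key) {
        return Math.floorMod(key.hashCode(), NODES.length);
    }

    // Redundancy: the record also goes to the next REPLICAS-1 nodes (reliable).
    static List<String> placement(String key) {
        int primary = partition(key);
        List<String> copies = new ArrayList<>();
        for (int i = 0; i < REPLICAS; i++) {
            copies.add(NODES[(primary + i) % NODES.length]);
        }
        return copies;
    }

    public static void main(String[] args) {
        for (String key : new String[]{"order-17", "order-18", "user-9"}) {
            System.out.println(key + " -> " + placement(key));
        }
    }
}
```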




Back to the first point, real time, that is, fast. Since a large quantity of data of different types has already been stored and accumulated, customers need the analysis, mining, extraction, and feedback on that data to respond in real time: what I type, I see. Of course the customer's expectations keep rising, and for us technology and equipment providers, satisfying the customer is our only mission. A single server node has reached its limit, say 80%-95% utilization; if that is the ceiling, then even deployed linearly the big data problem remains unsolved, because a single node's capability is very limited: against the algorithm itself it addresses perhaps 40% of the big data problem. Hence the concept of scale-out: where a single node fails, we group nodes together. For one and the same problem, we break the big into the small, scatter the small pieces across different servers, and through a mechanism summarize the partial results and feed the final answer back to the customer. It is like an ordinary juicer: we want a juice with banana, apple, and pear, three fruits together, so how do we squeeze them? They must first be cut; the different fruits are cut up and squeezed together. Extending the image, different types of data are like those three fruits: we cut them small enough that the juicer runs at top speed, say within 15 seconds, press them at that intensity, and finally combine everything into one result fed back to the customer, achieving real time. So a big data technical framework need only do these two things: scatter one big problem across different compute nodes, and provide a more reliable computing environment.
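
This cut-scatter-summarize pattern is exactly what Hadoop's MapReduce does, and stock Hadoop's canonical word count shows it in miniature (the standard example, not Yun Hui's tuned code): the input is split across data nodes, each mapper processes its slice in parallel, and the reducers summarize the partial counts into one result fed back to the client.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                ctx.write(word, ONE);   // scatter: emit (word, 1) on each node
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) sum += c.get();
            ctx.write(key, new IntWritable(sum));   // gather: summarize per word
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);   // pre-summarize on each node
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```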




Super Cloud's solution for big data is built on the method most prevalent and popular in the industry, Hadoop. First of all it is open source and anyone can use it; it provides a reference technical framework, but different companies use different versions of Hadoop and tune different functional modules within them. For Super Cloud, we work with different partners to dig deeply into the Hadoop algorithms. Why choose different partners? Because customer bases differ and customer demands vary; it is difficult, indeed unrealistic, to hand a customer a single technical solution or product that meets all of his needs, and for open source that is even more challenging. So what is our plan, our strategy? Based on different customers' needs, we take the Hadoop technology of different partners, couple it with our high-density, low-power servers, and integrate it into an appliance, providing customers with a one-stop series of big data solutions.




I was just talking about Hadoop. Looking at Super Cloud, we have published a version before, a big data appliance built with our sister company Cloud Trend; what I am introducing today is the all-in-one Hadoop machine our software department developed with Intel. What characterizes it? We have tuned the Hadoop algorithms to, you could say, fit Super Cloud hardware. As you know, Intel is a hardware manufacturer with a very sharp and distinctive understanding of CPU technology, motherboard technology, even power technology, and we are Intel's partner, so the cooperation between the two sides is all the closer.
Back to this Hadoop version: what is its biggest advantage? It targets two fields, one being BI and the other e-commerce. The BI field demands real time, that is, the documents and transactions processed for the customer must be fed back at the fastest possible speed, which goes through the real-time database module. Then there is the Hive data warehouse. These two modules have been tuned, and in implementing them Intel and Super Cloud have done a great deal of work in the Hadoop open source community; you could call it quite distinctive. So the real-time database and the data warehouse, these two modules, are the biggest difference between this version and the previous one.
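
As a point of reference for the data warehouse module, here is a minimal sketch of how a BI client would query Hive through the stock HiveServer2 JDBC interface. The host name and the `orders` table are hypothetical, and the Intel/Super Cloud tuning itself is not public, so this shows only the standard access path.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://namenode-host:10000/default", "bi_user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT category, COUNT(*) FROM orders GROUP BY category")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```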




Let me describe how the appliance is actually composed. On the left you can see a cabinet; the size of the cabinet can be deployed according to our analysis of the customer's actual business needs and his current business development. Inside the cabinet there are three parts. The first part is the network. Another part is what we call the name node. What is it for? It is the entrance for all the appliance's data, the headquarters of the whole system. It does not concentrate all the data here; rather it holds the index of the distributed system: it knows where every piece of data lives and where every backup of that data lives. This is called the name node; you can picture it as the commander.




The last part is the data nodes, where the data actually lives; all the data is stored here. In actual operation, a data request goes first to the name node. For one and the same piece of data, our implementation of the Hadoop algorithm keeps three copies: a piece of data entering our system is stored in triplicate, which is not merely redundant but safer.
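
In stock HDFS this three-copy behaviour is the block replication factor. Here is a minimal sketch, assuming Yun Hui exposes the standard `dfs.replication` knob (that assumption, and the file path, are mine, not from the talk):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 3);          // three copies per block
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/orders/2012-06-01.log");  // hypothetical
        System.out.println("replication = "
                + fs.getFileStatus(file).getReplication());

        // The factor can also be changed per file after the fact:
        fs.setReplication(file, (short) 3);
    }
}
```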




On the right you can see the architecture of the integrated hardware-software scheme: the upper layer is our specific implementation of the Hadoop algorithms with the corresponding customization and tuning, and the lower layer is provided by customized Super Cloud servers.




This is the actual unit, a 14U cabinet composed of servers from the R6000 series, all dual-socket. For the headquarters we use two dual-socket servers; for the data nodes that store and regularly back up data we use 8 nodes, each dual-socket. Disk storage reaches 96 TB, with a 1.2 TB cache used mainly for staging data during computation, reducing the I/O bottleneck.




I will not walk through every key feature of the product; let me mention two more distinctive ones. The first is unique hardware technology. We are Intel's partner, so we developed this product jointly with Intel, and on our hardware architecture, in particular the underlying hardware support, Intel made unique, rather special hardware optimizations, including instruction-level optimization, extending the instruction set accordingly. With multiple cores and multiple threads, from a hardware engineer's point of view it is easiest to understand as doubling the computing capacity per unit; and to raise the data I/O throughput, the DCA technology has been optimized in a way that ordinary server hardware does not separately develop. So our 6000-series servers have this capability of improved I/O throughput.




Then there are Intel's high-speed SSDs; these drives give the BI system just mentioned its improvement in real-time storage and real-time processing, as well as capabilities like the virtual warehouse and heavy background analysis. Another feature is out-of-the-box readiness: for typical customer needs we recommend the standard configuration I just showed, and the customer needs to do only two things, plug in the power and plug in the network cable and boot. After a simple deployment by engineers, what remains is not further tuning of our software but tuning the interface between our software and the customer's industry software, which takes three days on average. In other words, you need about three days from receiving the cabinet to actual deployment. For a typical BI application, in our experience, the time is not three days; whether even three months would suffice deserves a question mark. So for rapid deployment and application of big data in the industry this is a great advantage.




What can we offer customers who buy this product? What we provide is a solution, not only the product itself but the value around it; what does the customer actually get? Several things. First, cluster configuration and platform recommendations: our engineers will analyze and summarize the customer's entire requirements, including his needs over the next three to five years, compare them against the characteristics of our offering, and tune accordingly, covering analysis of the customer's data volumes, hardware selection, operating system recommendations, software installation, tuning of the industry software already in place, interface design, and so on. Then there is support for the operating environment of the deployed appliance, the integrated environment, including a dedicated method and dedicated tools for importing customer data, and whatever software-level or hardware-level errors surface during deployment, we will detect, check, and troubleshoot. For the cluster the customer has deployed, we monitor every hardware level, whether power monitoring, usage monitoring, or fault monitoring of a node, and so on; we have platform-level support for all of this.
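
The dedicated import tools are not described in the talk, so as a point of comparison here is only the plain stock-Hadoop way to load customer data into the cluster; the paths are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ImportData {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        fs.copyFromLocalFile(new Path("/export/customer_dump.csv"),        // local source
                             new Path("/data/imports/customer_dump.csv")); // HDFS target
        System.out.println("import finished");
    }
}
```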




One last aspect: Hadoop is open source, and as everyone knows this open source project moves very fast; its update cycle is measured not in weeks but, for Hadoop, an official release comes out on average every three days. So we will keep the product's Hadoop correspondingly upgraded, a service provided by our engineers. After you purchase our product we provide the whole series of services above, and we also offer professional training on Hadoop itself. For example, after you buy the appliance, you may not yet have technical staff who understand Hadoop technology; from the moment of purchase we can give your technical staff the corresponding Hadoop training, so that your company's staff can get started immediately after the purchase and be guided through installation, maintenance, and so on of the product.




As for after-sales service on this product, including on-site support, we have an after-sales team that provides support on both the software and the hardware side; remotely there are of course telephone and email, all achievable. Finally, installation and debugging: as I said just now, the appliance concept has only two parts, one being plugging in the power and network cables, the other being the tuning of the application software interface, and in that process a series of problems will certainly arise, especially in the second part, problems with the existing application software. These may involve further tuning of the code layer and the interface layer by software engineers, so this part of the work is included in the whole package.




Most of this page I have already covered: why an enterprise should deploy Hadoop, why it should take on the big data problem and use a corresponding big data solution. From Super Cloud's side, our big data deployment, as I said before, targets two fields, especially in this version: one is BI and the other is e-commerce. Through our further engagement with big data and our further cooperation with Intel, we pursue two goals. One is to integrate hardware and software technology better, pushing the Hadoop algorithms themselves and their performance to a new height; the other is to gain a new point of view on the big data problem itself, providing a new understanding of the combination of IoT and cloud computing. These are the two goals we hope to achieve through this product.




As for the three customer benefits, I mentioned them earlier. First, out-of-the-box: a great many intermediate steps are eliminated; there is no need to lay out and configure the network or to debug the hardware, it is already debugged. Second, integrated hardware and software: it saves the customer a great deal of operating cost, computing cost, and so on. And third, the Super Cloud server is low-power; in power consumption management and cost savings we have some unique technology. In these three respects our appliance delivers three clear benefits to the customer.




We also remain constantly aware of the problems of big data itself. We keep in mind that big data is not immutable: data volumes change, data types change, processing speed requirements change; and beyond the problems big data itself faces, the business environment may grow more complex, and with it the difficulty of deploying big data. We will keep deepening our understanding of big data and of the corresponding solutions. We are also thinking: we are not releasing one fixed product for big data and leaving it at that; we will keep considering what kind of cooperation and what kind of processing can help customers solve the ever harder problems of growing data volume and real-time requirements. At the same time we want to help push big data and the Hadoop algorithm technology itself forward in the industry. From Super Cloud's perspective as a hardware provider, we hope to bundle the corresponding software solutions and strengthen our role as an infrastructure provider. We therefore hope that different channels and different partners will give Super Cloud more opinions, whether constructive or destructive, constructive of course being better; as long as they push things forward and help the whole program, they are welcome. Thank you.
