Cloud Practice and Big Data Outlook

Source: Internet
Author: User

Zhang Fubo: The next part of the forum features four guests who will talk about cloud practice. Beijing First Letter Group is the Beijing government's integration company, mainly responsible for building the "Window of the Capital", and it was one of the earlier companies in China to work in the government sector. Zhang Ninglai, General Manager of the First Letter Group Technical Support Center, will give us the first report.

Zhang: Good afternoon. As just introduced, I am from Beijing First Letter Development Co., Ltd. What I bring today is the result of our practice with cloud computing technology over the past few years.

Today's talk is divided into three parts. Our main field is e-government applications, and our main work is the Beijing e-government cloud platform. First I will introduce the planning of this platform, then what we have done so far, and then what kinds of services we offer.

The overall framework of the e-government cloud can be described by the structure 1+N+16. The 1 is the municipal e-government cloud, which is in effect the city's IT infrastructure: it pools computing, storage, and network resources and uses virtualization to form an IaaS service platform that supports the professional applications of the various application fields in Beijing. The N layer provides SaaS services for different applications. These two layers together form the cloud's application environment. The experience gained from building this government cloud, including norms and standards, will directly guide the construction of the county clouds, for which we are also responsible. This is the framework we are considering for the Beijing government cloud.

Returning to the municipal government cloud: internally it is divided into two parts, the government Internet cloud and the external network cloud. This follows the network structure, which is split into the Internet and the e-government network. The government Internet cloud mainly hosts the "Window of the Capital", the Beijing municipal government's website, and its applications are mainly those of the municipal bureaus. The external network cloud mainly serves Beijing's civil servants. Between the two clouds there is an isolation zone through which data exchange and communication between the two networks take place.

Besides building the cloud infrastructure, including the entire IaaS service, we are also working with the Window of the Capital and the Commission to develop cloud management norms for the platform. There are now four specifications. The first is the government cloud applicability evaluation specification, which is meant to assess applications that want to enter the cloud platform. We now know that not all government applications are suitable for the cloud, so we evaluate them against cloud-entry criteria. Standards have also been set for the overall operation service of the cloud platform and for the evaluation of service quality. And as a cloud infrastructure platform, we provide a complete cloud service catalog.

As for this year's construction: under the overall framework, the cloud has a five-year construction cycle. What did we do in the first year? We worked mainly at the IaaS level, using mainly virtualization technology, to form an elastic computing environment with unified cloud computing resources. A unified cloud computing resource management platform schedules these resources, including storage resources. At the same time we established matching operating norms and an operations team for the e-government cloud platform.

This is the current platform structure of the e-government cloud on the Internet side. At the bottom we use x86 servers with storage and network equipment. Virtualization forms them into a unified resource pool, which the cloud management platform schedules flexibly. After encapsulating the IaaS we did some partitioning. This is just the beginning, and as capacity expands the partitions will be subdivided further. The platform also gives users a cloud service portal where they can view their own resource usage. For the upper layers, everything is offered in multi-tenant form, and tenants are completely isolated.

This is the resource situation of the cloud platform built this year. The year really had two stages. In the first half, when the entire platform went live in April, there were 52 devices in total, able to host around 500 virtual machines. Toward the end of this year, as the business kept expanding and storage kept being consumed, we expanded the platform, and it can now host around 800 virtual machines.

This is the current resource management of the cloud platform. We also thank Tian Yun Technology for its strong support: our entire virtualized computing resource management platform is now built on Tian Yun's product.

After completing the platform we sorted out the entire service catalog, and we discussed with the leaders of the Commission how to classify it. It is divided roughly into four categories: storage services, network security services, operation and maintenance, and inspection. At this stage the cloud platform mainly manifests as IaaS services. In the overall framework the PaaS layer is being added continuously, and we are using commonly used government services as the applications the platform runs.

We have a little more than two years until the end of the Twelfth Five-Year Plan, and in that time we will make some improvements to this platform and its cloud environment.

First, we hope to bring in a number of third-party software vendors who package their software as services in cloud form. This is the ecosystem of the e-government cloud.

Second, we will introduce desktop cloud services, which can first support the Commission's call business.

Third, cloud storage. In operating the cloud platform this year we found that storage consumption is still very large, so we need to introduce distributed file systems, or some other means, into the platform to reduce cost and storage consumption, and in turn to improve the users' use of space.

Fourth, the operating service platform. This is mainly about putting the standards into practice: moving the service handling process online, adding metering and billing functions, and communicating with customers online in real time. At present e-government service handling is done offline through signed contracts and has not yet moved online.

Fifth, hybrid cloud mode. Some clients find that they have temporary resource needs and hope to combine our cloud platform with their existing IT architecture. This is our direction.

A brief introduction to First Letter: we mainly undertake the construction of Beijing's major e-government projects. With the smart city concept in mind, our software services and basic support platforms cover the smart city field completely. That is my introduction today to the Beijing e-government cloud. Thank you.

Zhang Fubo: Thank you very much. As you can see, it is actually very simple: First Letter's government cloud does not have too many abstract things in it. We see plain facts: the servers are increasing and the load they carry is increasing. Moving forward step by step in this down-to-earth way is what it really means for a cloud to land.

Next, Zhang Yi, director at Tian Yun Technology, will introduce Skyform to us.

Zhang Yi: Let me introduce Skyform, the cloud management platform, a software product developed by Tian Yun Technology. Let us first look at which IT management problems it helps us solve.

Problem one: initial investment. In traditional IT the initial input is very large, and it is out of proportion to the equipment's lifetime and to the initial return. The investment is large but the benefit is small.

Problem two: operating costs. There are several issues here. System deployment and maintenance require fairly specialized people. There is also the resource sharing problem: resource utilization across systems is not effectively improved, resources cannot be shared effectively, and resources are wasted. If one system cannot fully use its resources, other systems have almost no way to use them. There is also a scale effect: the smaller the data center, the higher its unit operating cost. This is our summary of problem two.

Problem three: resource utilization. This is in tension with high availability. As we all know, to ensure high reliability of the business we usually do HA with two machines. What does that mean? We keep a backup system identical to the production system standing by on the side. This does guarantee availability, but it wastes about half of our resources.
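The cost of pairing every production machine with an idle standby can be put in numbers. The sketch below is an illustration only: the function name and the shared-spare layout in the second call are assumptions, not something described in the talk.

```python
def usable_fraction(active: int, standby: int) -> float:
    """Share of purchased machines doing production work when
    `standby` idle machines back up `active` working ones."""
    if active <= 0 or standby < 0:
        raise ValueError("need at least one active machine and a non-negative standby count")
    return active / (active + standby)


# One idle backup per production machine: half the hardware sits unused.
print(usable_fraction(1, 1))
# Hypothetical layout in which four active machines share one standby.
print(usable_fraction(4, 1))
```

The first call corresponds to the two-machine HA setup described above. The second shows why sharing spare capacity across systems raises utilization.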

Problem four: power costs. The cost of electricity is now basically far greater than the value of the server equipment. It includes the power consumed by servers and network equipment, by refrigeration and air conditioning, and the losses from AC power conversion, all of which drive the electricity cost up. This is the fourth IT problem we summed up.

We also need to look at the application-related IT area. This is based on the systems inside a mobile operator, where we summed up the problems in order to help it optimize its IT systems. First, getting a business online takes a long time: for a new system, the path from procurement to installation and commissioning, plus a number of ancillary steps, to finally going live is very long. Second, there are many small business platforms, each purchased, designed, deployed, and operated on its own, and idle resources cannot be shared among them. Third, the equipment is very dispersed: a company may have 10 to 20 business units, each with ten to several dozen different businesses, scattered across many places, floors, and data centers. This requires many professional maintenance staff, so maintenance costs are high, and even so the maintenance results are not very good. Fourth, the systems are heterogeneous: there is a variety of heterogeneous equipment, including x86 rack and blade servers and minicomputers, and this heterogeneous environment creates great difficulty for our operators. These are the problems we summed up in existing IT systems.

Let us see what the cloud solves. First, it does not require a huge one-time input. It relies on the concept of a resource pool that can expand dynamically and elastically, so at business start-up there is no need to procure all IT equipment at once. We can add the required resources to the resource pool as the business actually develops. This addresses the first problem by reducing the initial investment.

Second, automated management reduces operating costs. Automated management has several levels of meaning. The first is application deployment: applications used to be installed manually, so can we rely on the machine to deploy the business automatically? That is one kind of automation. Another is process automation: management processes, including event management and change management, also need automated handling. Following traditional ITSM practice, every event flow and failure flow needs standardized process management, and this can also be brought within the scope of automated management. Third, resource sharing improves resource utilization without affecting the high availability of the business. The simplest way, as we all know, is virtualization: the mainstream virtualization vendors on the market can use live migration of virtual machines to ensure high availability. In addition, centralized deployment of hardware lowers the PUE value and saves power costs. These are some of the benefits cloud computing can bring to the current IT environment.
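PUE (power usage effectiveness) is the ratio of total facility power to the power drawn by the IT equipment itself, so cooling and conversion losses push it above 1. A minimal sketch of the calculation follows; all kilowatt figures are hypothetical and chosen only to illustrate the comparison, not taken from the talk.

```python
def pue(it_kw: float, cooling_kw: float, conversion_loss_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    return (it_kw + cooling_kw + conversion_loss_kw) / it_kw


# Hypothetical figures: the same 100 kW IT load in scattered machine rooms
# versus one centralized deployment with shared cooling and power conversion.
scattered = pue(it_kw=100.0, cooling_kw=120.0, conversion_loss_kw=30.0)
centralized = pue(it_kw=100.0, cooling_kw=50.0, conversion_loss_kw=10.0)
print(scattered, centralized)
```

A lower value means a larger share of the electricity bill goes to the servers and network equipment rather than to overhead.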

Let us look at the evolution of traditional IT from application islands to the cloud. Each application has its own matching hardware and software infrastructure, and applications exist independently of each other. We describe this as application islands, and it is the first problem traditional IT systems run into. Moving one step further, clustering turns resources into homogeneous resources, so we can support applications in resource pool mode. One step further still is heterogeneous resources, and eventually IT is delivered as a service. To give a simple example: our IT department, the operations department, is basically a cost center. It is usually responsible for IT planning and for operation and maintenance services. If you do the job well, no one praises you, because they think that is how it should be, but if something goes wrong, it is your problem. Now we want to turn the IT department into a profit center that sells something. IT as a service can take two forms. One is the public cloud, typically an IaaS provider like Amazon, which sells IT as a service. The other is the private cloud inside the enterprise, which is not sold externally but ultimately also provides IT services. This is what we consider the most important goal of cloud computing and the purpose of our cloud platform development: turning IT into a service.

The development path of cloud computing platforms is familiar to all of us: from IaaS to PaaS to SaaS. Although this is the development trend, there is no necessary dependency between the layers, and none of them must sit on top of another. Many of our customers first built an IaaS platform, moved their applications into the resource pool, and offered the applications to their industry as an Internet business, thereby implementing SaaS without going through PaaS.

Our cloud platform is built around one center and four aspects. The central point is reducing TCO. The four aspects are: first, solving resource sharing; second, solving automated management; third, integrating data and information; and fourth, integrating business logic. These are the four directions of our cloud platform efforts.

In terms of the cloud platform's development, we currently have the basic IaaS capabilities, including not only virtualization but also heterogeneous resource pool management, multi-tenancy, and security. At the PaaS level we provide automated application deployment: as just mentioned, the machine completes the entire configuration of a business and its online trial run, and this process falls within our PaaS platform. As for SaaS, Tian Yun Technology does not develop it itself, but we can support SaaS applications on our cloud platform. This is our overall development direction.

This is a simple architecture diagram of the Skyform cloud platform. It contains resource management: resources are allocated and scheduled through the resource pool and managed across their life cycle, and they also need monitoring and operations and maintenance, all of which belong to the resource management layer. The second part is operation management. We want to sell IT as a service, and the process of selling is the process of operation. It may include service templates, service catalogs, orders, billing review, billing, and so on. These functions are basically operation management. There are also portals, which differ according to who uses them. End users order IT services through our platform in a self-service way via the service portal. Operation managers have an interface for operation management. Resource managers use the resource management portal to operate the underlying system.

Resource-oriented or service-oriented? We have talked about resources for a long time, but services are the part that is often overlooked. Many vendors in the market mention only resources. We think services deserve more attention. A service is itself a demand for resources. The purpose of building a cloud platform is to support the services running on it, not to manage resources for the sake of managing resources. How are resources managed? How are they allocated? How are they turned into services? We need to define our services and determine which kinds of resources each one needs.

Defining a service is in fact the process of creating a service template. Services may have things in common and can be categorized: we group identical or similar services into one class as a template, and within a service template we encapsulate the resources for the differentiated services.

The characteristics of our Tian Yun platform are as follows. First, we have a rich set of computing, network, and storage services. We have a flexible resource domain mechanism: the large resource pool can be divided by business type, or within an enterprise by department or application system, so that some resources are set aside for dedicated use. These dedicated business domains are distinguished only logically. At the resource pool level all businesses still share the same pool. The logical domain concept defines a quota for each application, and resources can be shared within that quota. This is the thinking behind our resource domains. There are also automated application deployment, user self-service capabilities, and heterogeneous resource management, together with traditional monitoring, reporting, security management, and so on. These main features make up Tian Yun's cloud platform.
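The resource domain idea, one physical pool with a logical quota per domain, can be sketched in a few lines. This is an illustration under stated assumptions, not Skyform's implementation: the class, the method names, the domain names, and the choice of vCPUs as the unit are all hypothetical.

```python
class ResourcePool:
    """One shared pool of vCPUs with logical per-domain quotas."""

    def __init__(self, total_vcpus: int):
        self.total = total_vcpus
        self.quota = {}  # domain name -> maximum vCPUs it may hold
        self.used = {}   # domain name -> vCPUs currently held

    def add_domain(self, name: str, quota: int) -> None:
        self.quota[name] = quota
        self.used[name] = 0

    def free(self) -> int:
        return self.total - sum(self.used.values())

    def allocate(self, domain: str, vcpus: int) -> bool:
        """Grant the request only if both the domain quota and the
        physical pool can cover it."""
        within_quota = self.used[domain] + vcpus <= self.quota[domain]
        if within_quota and vcpus <= self.free():
            self.used[domain] += vcpus
            return True
        return False


pool = ResourcePool(total_vcpus=100)
pool.add_domain("tax", quota=60)
pool.add_domain("portal", quota=60)  # quotas may add up to more than the pool
print(pool.allocate("tax", 50))      # fits quota and pool
print(pool.allocate("tax", 20))      # refused: would exceed the tax quota
print(pool.allocate("portal", 60))   # refused: only 50 vCPUs remain in the pool
print(pool.allocate("portal", 50))   # fits
```

Because the domains are only bookkeeping over one pool, capacity that one domain leaves idle stays available to the others, up to their own quotas.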

The service catalog is basically the display of the services on offer in IT as a service, which can be chosen like dishes from an a la carte menu. This is the entire virtual machine service process: the user enters the service portal, buys the IT product he needs through the service catalog, and in the end applies for, provisions, and uses this IT service from our platform.

We can also use several virtual machines to build a complete application operating environment. It does not have to be a single machine.

For application container services, we have defined a number of service templates for several sets of currently popular applications. Users choose an application template as needed, and it is finally packaged into a service.

The following are some management-level functions, including operation management, resource management, and the management of ordinary users' services. These are end-to-end processes. The process differs with the user's role, and each process can be defined, tracked, executed, and modified, so the processes are flexible rather than rigid.

Heterogeneous management covers not only heterogeneity at the equipment level, meaning different brands and different hardware devices. For virtualization we support the mainstream x86 virtualization architectures, and we can also support traditional minicomputer partitioning technology.

This is our management of virtual machines and minicomputers. This is storage resources: we provide unified storage management for both array storage and distributed storage. On the traditional interface side we support both Sun Storage and NFS storage, all allocated and managed by the unified management platform.

This is resource scheduling. We have a good scheduling mechanism with vertical-priority and horizontal-priority strategies, and new allocation strategies can be added through the open allocation policy interface.
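The talk does not define the two priorities, so the sketch below rests on an assumed reading: "vertical" fills one host before moving to the next, and "horizontal" spreads load across hosts. The registry decorator stands in for an open allocation policy interface. Every name here is hypothetical and none of it is Skyform's actual API.

```python
from typing import Callable, Dict, Optional

# A policy maps (free capacity per host, requested size) to a host name.
Policy = Callable[[Dict[str, int], int], Optional[str]]
POLICIES: Dict[str, Policy] = {}


def register(name: str):
    """Decorator that adds a placement policy to the registry."""
    def wrap(fn: Policy) -> Policy:
        POLICIES[name] = fn
        return fn
    return wrap


@register("vertical")
def pack(free: Dict[str, int], need: int) -> Optional[str]:
    # Choose the fullest host that still fits, keeping other hosts empty.
    fits = [h for h in free if free[h] >= need]
    return min(fits, key=lambda h: (free[h], h)) if fits else None


@register("horizontal")
def spread(free: Dict[str, int], need: int) -> Optional[str]:
    # Choose the emptiest host, balancing load across the pool.
    fits = [h for h in free if free[h] >= need]
    return max(fits, key=lambda h: (free[h], h)) if fits else None


def place(policy: str, free: Dict[str, int], need: int) -> Optional[str]:
    """Pick a host with the named policy and reserve the capacity on it."""
    host = POLICIES[policy](free, need)
    if host is not None:
        free[host] -= need
    return host


capacity = {"host-a": 8, "host-b": 16}
print(place("vertical", dict(capacity), 4))    # picks host-a
print(place("horizontal", dict(capacity), 4))  # picks host-b
```

Adding a new strategy means writing one more function and registering it under a new name. The `place` function does not change.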

Network resource management works the same way: by supporting third-party network devices, such as those from Cisco and Huawei, and integrating with them, we realize network virtualization in the cloud platform.

This is the large-scale automatic deployment capability, which reduces people's workload. This is the ITSM operation and maintenance process management software. Across our operation management and our operations and maintenance management, wherever a process is involved, we refer to the relevant ITSM provisions to do the specific process management.

For report statistics we support two approaches. One is the set of useful reports built into the current system. The other is letting users customize their own reports, which is the direction of our next efforts.

On cloud platform security we also have a variety of solutions. Of course we do not rule out traditional security measures, including firewalls, IDS, and IPS, and we keep them. What cloud computing changes is that data is stored centrally, resources are shared, and there is the concept of multi-tenancy. Here we focus on preventing conflicts between shared resources, including the users' own security management, virtual machine isolation, and network and application isolation. These are also considered in our cloud platform security system.

Finally, to sum up the advantages of the Tian Yun platform. First, we have complete products and solutions. Second, we improve resource management capability by optimizing the infrastructure. Third, we design our resource management around the users' applications, so it is tied closely to the needs of those applications. Last, we have many successful cases, and we can pool ideas with you. Thank you.

Zhang Fubo: Thank you, Zhang. We invite the next speaker, Zhang Yi, COO of Friends.

Zhang Yi: Good afternoon, ladies and gentlemen. I sense that people are rather tired from listening to speeches, so I will try to keep this light. First, let me repeat what many guests have mentioned: from an enterprise perspective, IT has developed several specific trends, from the data perspective, from the computing perspective, and in the IT department's gradual shift from a cost center to a core service provider. So from the perspective of the enterprise IT department we see three obvious changes. First, data moves to the core position. Second, as a consequence, enterprise IT architecture design must address how to process data quickly, so as to cope with data growth and realize the real value of stored data. Third, IT is gradually developing from a cost center into something that truly provides business value to the enterprise through data analysis and data mining.

Since 2007, Friends has devoted itself to developing the basic software of cloud computing. The word "cloud" as everyone hears it in the market covers two aspects. The first, perhaps the more publicized, is the field of cloud computing services: the IaaS and PaaS just mentioned ultimately come down to services. This is more about the mode of interaction, that is, how you obtain the resources, capabilities, or services you want.

The other part we call technical development, and this is the direction Friends focuses on more. The evolution of cloud computing is fundamentally technical. From our point of view it is clearly the evolution of distributed technology: from distributed computing to clustering to grid computing and eventually to cloud computing, a noticeable upward trend. This is the area Friends focuses on as a software company.

Why is cloud computing a real change, and why is it necessary? Is it just a term? We think there are three very obvious drivers. The first is users. The growth in the number of users and the spread of the network mean that more and more people, organizations, and devices can easily get the information or services they need from the IT system. We attribute this to growth in the number of users, which began slowly and then became very large as networks developed. And the more users there are, the more diverse their needs, which adds to the challenges for the service providers behind them.

The second is data volume. Data grows very fast. Without a network, the data each person or device generates grows linearly. With a network, every point is related to every other, and the amount of data multiplies.

Third, because user volume and data volume increase, a direct result is that the systems supporting those users and that data must become very powerful.

This brings us to the idea behind Friends' core product. If the number of users keeps increasing and the amount of data we handle keeps increasing, the most logical way for the back-end support system to solve this is to add resources: a linear increase in resources should solve the problem. So we think the core goal of cloud computing, or of the underlying platform, is to expand capacity by adding machines rather than people. With more machines and more resources, the first requirement is to put these machines together to do one thing: how can the machines be organized well so that they collaborate to complete a task?

Two other abilities are needed once the scale of the underlying system grows. The first is transparency: an increase in underlying capacity or volume should not affect the upper-level services. The upper tier is only a user of the underlying resources, so adding underlying resources must not affect its architecture. The second is elasticity: it should be ensured that each time a resource is added, the processing capacity of the underlying system increases accordingly.

How is this achieved? There are three principles. The first is the principle of synergy: resources are organized to serve. The second is the dynamic principle: processes are organized to adapt. The third is the scale principle: the structure is organized to expand. In a distributed system each additional node brings a certain loss, whether in networking or in communication and coordination. The system architecture must ensure that, as capacity is added, the capability of the entire system grows linearly.
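One common way to check how close a system comes to the linear growth described here is to compare measured throughput with the ideal of nodes times single-node throughput. The sketch below does that arithmetic. The throughput figures are hypothetical and the function name is an assumption.

```python
def scaling_efficiency(throughput_by_nodes: dict) -> dict:
    """Efficiency = measured throughput / (nodes * single-node throughput).
    1.0 means perfectly linear growth; lower values show coordination loss."""
    base = throughput_by_nodes[1]
    return {n: t / (n * base) for n, t in sorted(throughput_by_nodes.items())}


# Hypothetical measurements: requests per second at each cluster size.
measured = {1: 1000, 2: 1900, 4: 3600, 8: 6400}
for nodes, efficiency in scaling_efficiency(measured).items():
    print(nodes, round(efficiency, 2))
```

An efficiency that keeps falling as nodes are added is the per-node loss from networking and coordination that the architecture has to contain.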

On big data for businesses: data demands inside the enterprise are diverse, across different application systems and application scenarios. To sum up, we divide enterprise data into three types. The first is transaction data, which is closely tied to your core business, such as trading data or bank deposits and withdrawals. The first requirement for this data is that it be real-time. In terms of storage and access volume it is usually not very large, because the number of valid transactions occurring in real time within a given period is limited. The second is flowing business data, generated as transactions take place or in support of your core business. It often involves a lot of data, including office documents, historical data used for operational support, network optimization, and CDI. The third is archived data: data you do not need to keep live, but whose volume is very large, especially after a certain period of time, and which you have to mine in depth. The distributed architecture we are talking about, or the architecture of cloud computing, is very well suited to the latter two categories, that is, the processing of flowing business data and of archived data.

There are several points we want to share with you today. The first concerns the large volume of business data being produced. At the point of generation, key structured data accounts for the vast majority and holds a very important position: the most valuable data in an enterprise's business systems is often generated by transaction systems. This means that in traditional enterprise business systems the key database is often the first to become a performance and load bottleneck. So at this stage we think that, in moving enterprise information systems to the cloud or adopting a new architecture to solve real problems, perhaps the most important part is how to use new technology and new ideas to relieve the traffic and performance pressure that the key databases in enterprise business systems face directly.

Following from this point there are three typical scenarios. The first is the aggregation of cross-domain, heterogeneous, massive data. This applies to large enterprises and groups with many branches, or to state organs and ministries. The information systems of these branches were originally built independently. With centralization requirements and organizational changes, the question is often how to bring the data spread across all regions together effectively, manage and use it in a unified way, and support centralized operations. This is a real problem.

Existing solutions cannot meet current needs. A data warehouse depends on the key database, so capacity scalability is a problem. In addition, data extraction is essentially a batch process: each piece of data must go through export, cleaning, and import, and this is usually done in batches. That affects data freshness: when something happens in a branch, there is a long time lag before it is mapped and synchronized back, which affects the timeliness of data synchronization. The end result is that, whichever warehouse is used, the cost is very high.

In the new solution, our product makes it easy to build a multi-active key database cluster. We deploy our distributed product in this middle layer to solve the intermediate capacity problem. The second part of our solution is the process of replicating data from the remote sites to the center. The technology we use here is not batch import and export. Instead it works incrementally: as data changes, the incremental data is captured in real time and transferred to the central distributed database cluster. There is one more very important point: the central database cluster is a multi-active architecture that stores heterogeneous data. That is, data moves from the remote sites to central storage without a special cleaning step, and the remote data format is mapped to the central one in real time.
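The difference from batch export and import can be illustrated with a toy change log: each branch records its writes in order, and the center pulls only entries it has not yet applied, mapping the branch format to the central one as it goes. This is a minimal sketch under assumptions. The classes, the field names, and the pull-based design are hypothetical and do not describe the product's actual mechanism.

```python
class Branch:
    """Remote site: every write is also appended to a change log."""

    def __init__(self, name):
        self.name = name
        self.log = []  # list of (sequence number, key, row)

    def write(self, key, row):
        self.log.append((len(self.log) + 1, key, row))


class Central:
    """Central store that pulls only the changes it has not yet seen."""

    def __init__(self):
        self.rows = {}    # (branch name, key) -> row in the central format
        self.offset = {}  # branch name -> last applied sequence number

    def sync(self, branch, mapper):
        start = self.offset.get(branch.name, 0)
        changes = branch.log[start:]
        for seq, key, row in changes:
            # Map the branch format to the central format on the fly.
            self.rows[(branch.name, key)] = mapper(row)
            self.offset[branch.name] = seq
        return len(changes)


def to_central(row):
    # Hypothetical branch whose field names differ from the central ones.
    return {"customer": row["cust_nm"], "amount": row["amt"]}


shanghai = Branch("shanghai")
central = Central()
shanghai.write("t1", {"cust_nm": "Li", "amt": 10})
print(central.sync(shanghai, to_central))  # ships the one new change
shanghai.write("t2", {"cust_nm": "Wang", "amt": 25})
print(central.sync(shanghai, to_central))  # ships only the second row
print(central.sync(shanghai, to_central))  # nothing new to ship
```

Each sync moves only what changed since the last one, so the lag between a branch write and its arrival at the center depends on how often sync runs, not on the size of the full data set.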

The second challenge is one that many Internet companies, and telecom companies as well, often encounter: for core data storage there are often businesses that require very high throughput, and often not just reads but also modifications. This puts great pressure on key databases.

The most common existing solution, used when building high-performance websites, is an architecture that combines application servers, data cache servers, and relational database servers. The data cache server keeps hotspot data in memory, which can greatly improve the speed and throughput of the application server's data access.

The solution we propose keeps the data in memory and uses different replication mechanisms to ensure that it persists. We call this structure a combination of integrated storage and caching. It is also distributed, which is how it meets throughput, capacity, and reliability requirements.
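As a rough illustration of "data in memory, durability through replication", here is a minimal sketch. The class names are hypothetical, and the replicas are plain in-process objects; synchronous replication is only one of the "different replication mechanisms" the speaker mentions, chosen here because it is the simplest to show.

```python
from typing import Dict, List, Optional


class Replica:
    """Stand-in for a copy of the data held on another node."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value


class ReplicatedMemoryStore:
    """Serves reads and writes from memory; replicas provide durability."""

    def __init__(self, replicas: List[Replica]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self.memory: Dict[str, str] = {}
        self.replicas = replicas

    def put(self, key: str, value: str) -> None:
        # Synchronous replication: the write is acknowledged only after
        # every replica has applied it.
        for replica in self.replicas:
            replica.apply(key, value)
        self.memory[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.memory.get(key)

    def recover(self) -> None:
        # After losing the in-memory copy, rebuild it from a replica.
        self.memory = dict(self.replicas[0].data)


store = ReplicatedMemoryStore([Replica(), Replica()])
store.put("session:42", "active")
store.memory.clear()   # simulate the node losing its memory
store.recover()
print(store.get("session:42"))
```

Unlike the cache-plus-database architecture, there is no separate cache to keep consistent with the database: the in-memory copy is the store, and the replicas take the place of the disk write.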

The third challenge is what we call the unified data support platform. The best thing to do now is to manage the data of every business system in unified storage, rather than leaving each business system as an island, which hampers data discovery.

Unified data access, or unified data management, is in fact a long-standing demand. In the past we used the EAI model, enterprise application integration: although the data stays in the various business systems, standards can be defined so that the business systems can easily exchange it. To some extent this solved the problem and relieved the pressure. But it is a very challenging and complex process, and because new business systems are constantly being added, EAI standards are difficult to sustain and things eventually degrade back into information islands.

In the solution we propose, we split the massive data out into a distributed storage system, which can also support near-line or offline storage through third-party plug-ins. Once the underlying storage is built, we build a unified data access layer on top of it. This layer can plan and abstract the underlying data, and it provides a relatively transparent access mechanism for the upper-layer applications.
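One way to picture such an access layer is the sketch below. It is an assumption-laden illustration, not the product's interface: `StorageBackend`, `DictBackend`, `UnifiedDataAccess`, and the tier names `online` and `nearline` are all invented for the example, and each tier is an in-memory dictionary standing in for a real storage system or plug-in.

```python
from typing import Dict, Optional, Protocol


class StorageBackend(Protocol):
    """What a storage plug-in must offer (hypothetical interface)."""

    def read(self, key: str) -> Optional[bytes]: ...
    def write(self, key: str, value: bytes) -> None: ...
    def delete(self, key: str) -> None: ...


class DictBackend:
    """In-memory stand-in for a real storage tier."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def read(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def write(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class UnifiedDataAccess:
    """Applications call read/write here and never name a tier when reading."""

    def __init__(self) -> None:
        self._tiers: Dict[str, StorageBackend] = {}

    def register(self, tier: str, backend: StorageBackend) -> None:
        self._tiers[tier] = backend

    def write(self, key: str, value: bytes, tier: str = "online") -> None:
        self._tiers[tier].write(key, value)

    def read(self, key: str) -> Optional[bytes]:
        # Tiers are searched in registration order, hottest first.
        for backend in self._tiers.values():
            value = backend.read(key)
            if value is not None:
                return value
        return None

    def move(self, key: str, src: str, dst: str) -> None:
        value = self._tiers[src].read(key)
        if value is None:
            raise KeyError(key)
        self._tiers[dst].write(key, value)
        self._tiers[src].delete(key)


access = UnifiedDataAccess()
access.register("online", DictBackend())
access.register("nearline", DictBackend())
access.write("report-2012", b"quarterly figures")
access.move("report-2012", "online", "nearline")
print(access.read("report-2012"))  # still readable after being moved
```

The point of the design is that moving data to a colder tier changes nothing for the application: it keeps calling `read` with the same key.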

That covers three data pressure scenarios that are common in enterprises. Summing them up, we proposed a general cloud-based big data platform architecture, which we call 123.

The 1 is the core system, which includes three parts: the storage system, the access system, and the data analysis system.

The 2 is the two management mechanisms. The first is a data bus, which ensures that data flows well within the distributed system. The second is the transfer system and workflow system.

The 3 is the three support frameworks. The first is data acquisition, which is often overlooked but is a very important part: how can the data be collected, and under such heavy data pressure, how can it be written in reliably and at high speed? Acquisition has two key points: collecting the data and storing it. The other two are an operations framework and a management delivery framework.

That is all I have to share with you today. Thank you.

Zhang Fubo: Thank you, Mr. Zhang. Friends was in fact working on cloud very early and is also one of the older software companies, with relatively advanced and distinctive technology. Today Mr. Zhang took specific big data problems one by one and gave a specific solution for each. For our last guest, please welcome Wang, senior consultant at Sky Cloud Science and Technology, to tell us about their big data products and ideas.

Wang: Hello, I would like to introduce one of Sky Cloud Technology's product lines: big data. Not bigdata but Beagledat. The beagle is a very smart dog, and the name means that we do very intelligent, deep knowledge mining and acquisition.

Everyone has already taken in a lot of information on this topic. We have just heard a wonderful speech from Friends, and some of what I cover will overlap with it. What we want customers to understand is that Sky Cloud big data offers products that can help them achieve these things. We have long talked about distributed computing, cloud computing, and virtualization. The internet industry was the first to start thinking about big data and to work out new technologies for big data scenarios. We see them dealing with very complex data types, and with both scale and timeliness. The demand is there, but faced with many new technologies, new architectures, and new products, we do not know how to use them. We accept the distributed architecture, but people with the right skills are scarce. Few people can develop applications on top of a distributed architecture. And experts in statistics and mathematical algorithms are not necessarily available to help with every job, so we need something that is very simple to use.

So on the Sky Cloud big data side we have built a product series. Looking from the bottom up, the lowest level is BDP, a data platform software product. It helps users store large amounts of data at low cost and with high efficiency. In projects we have delivered, a few x86 PC servers achieved a data load rate of a few terabytes per hour. The data can be brought in cheaply and quickly, and after all, everything we do later depends on data that has first been taken in.

Next is BDF. If you want a comparison, think of it as something like the ETL process we used to run for a data warehouse. Its function is to help users bring together a variety of data sources: different business units, historical legacy files, data across industries and fields, whatever data can be obtained, and possibly even large amounts of variously structured data from the internet. We hope to integrate all of these and achieve more comprehensive and deeper data integration. Many enterprises consider data preparation to be very basic work, but cutting corners on any part of it may make later mining and analysis inaccurate.

Once data preparation is complete, we can provide BDA, which is about acceleration. It integrates many complex algorithm combinations so that customers can use them conveniently. When it comes to algorithms, many people have a misconception: that an algorithm is a ready-made tool or a sub-function. In fact, a single algorithm, or a simple combination of a few algorithms, is not enough; an algorithm has to be trained. So I want to correct a common user error here, the belief that the algorithm is ready-made. In fact we need to help customers understand the business in depth, combine some algorithms into an initial model, and then train, adjust, and optimize it continuously, retraining and adjusting again, until it finally becomes a truly usable and efficient model that helps us mine real knowledge from information.

So we provide three categories of products. Based on this, I will briefly describe which points we focused on when building them.

People always talk about cloud computing and big data, and many feel lost in the fog, unable to put it into practice. As I just said, many users want something that is convenient to use and convenient to deploy, so Sky Cloud Technology needs to do several things. First of all is automation. A distributed framework has countless x86 machines and a great many data nodes. We want to use them, with large-scale parallelism, to process massive data, so what we deploy at the bottom layer must let us quickly and easily deploy hundreds or thousands of servers. So in our development we focus heavily on automation, ease of operation, and friendliness. Let me show you its characteristics.

First, in BDP, starting from the operating system level, we let users automate deployment and configure the various parameters in a unified way. Different nodes take on different roles: which serve as data nodes, which serve as management nodes, and which kind of node stores which kind of data. There are many parameters, and the configuration may differ for each node. Then there is event-based visual management. Problems may occur during installation, so all events are recorded, and a deployment can be rolled back or reinstalled.

Then, once the operating system is installed, I naturally want to install the BDP software environment. Countless nodes need the software, and different nodes have different roles, so here we provide a good graphical interface that lets users define a template configuration and deploy it quickly to the remaining dozens or hundreds of nodes.
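The idea of defining a template once and applying it to many nodes can be sketched as follows. Everything here is hypothetical: the role names, the parameter names, and the host names are made up for illustration and do not describe BDP's actual configuration format.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional

# One template per role (hypothetical services and parameters).
ROLE_TEMPLATES: Dict[str, Dict[str, Any]] = {
    "management": {"services": ["scheduler", "monitor"],
                   "params": {"heap_mb": 2048}},
    "data": {"services": ["storage", "worker"],
             "params": {"heap_mb": 4096, "data_dirs": ["/data1"]}},
}


def render_configs(
    hosts_by_role: Dict[str, List[str]],
    templates: Dict[str, Dict[str, Any]],
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Expand role templates into one config per host, with per-host overrides."""
    overrides = overrides or {}
    configs: Dict[str, Dict[str, Any]] = {}
    for role, hosts in hosts_by_role.items():
        for host in hosts:
            cfg = deepcopy(templates[role])  # never mutate the shared template
            cfg["role"] = role
            cfg["params"].update(overrides.get(host, {}))
            configs[host] = cfg
    return configs


configs = render_configs(
    {"management": ["mgmt-01"], "data": ["data-01", "data-02", "data-03"]},
    ROLE_TEMPLATES,
    overrides={"data-02": {"data_dirs": ["/data1", "/data2"]}},
)
for host, cfg in configs.items():
    print(host, cfg["role"], cfg["params"])
```

Keeping per-host differences in a small override table means that adding another hundred data nodes is a matter of extending the host list, not of editing a hundred files.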

Then, once this environment is ready, I need to monitor its health. In fact, many open source products can also achieve distributed deployment and a distributed architecture and support big data applications, but they do not have such a function. They have only the core modules and core functions, and the other features that enterprise-class users need are not available. So when we build our products, we always make sure users can operate them in a foolproof way.

There is also a unified performance monitoring view that presents everything I need in one place. How much disk I am using and whether I need to add disks is something users are told in advance; after all, isn't forecasting the most advanced form of information mining? There is alarm information as well.
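Telling users in advance that disks will be needed amounts to a small forecast. The sketch below assumes one usage sample per day and a straight-line trend fitted by least squares; the function name and the sample figures are invented for the example, and a real monitoring system would likely use a more careful model.

```python
from typing import List, Optional


def days_until_full(usage_gb: List[float], capacity_gb: float) -> Optional[float]:
    """Estimate days until the disk is full from daily usage samples.

    Fits a least-squares line through the samples and extrapolates from
    the latest one. Returns None when usage is flat or shrinking.
    """
    n = len(usage_gb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(usage_gb) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(usage_gb))
    den = sum((i - mean_x) ** 2 for i in range(n))
    slope = num / den  # growth in GB per day
    if slope <= 0:
        return None
    return (capacity_gb - usage_gb[-1]) / slope


# Four daily samples growing by 10 GB a day on a 200 GB disk.
print(days_until_full([100, 110, 120, 130], 200))
```

An alarm can then be raised when the estimate drops below the lead time needed to procure and install new disks.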

We have also done a lot of development work on business friendliness to meet enterprise-class needs. First, with many product architectures, we all say that in a cloud computing environment you do not need to consider where computing resources come from or where the data comes from. But traditional operations staff are not at ease with that; they very much want to know what their things actually look like. So we did a lot of work here, including file browsing and a distributed database, and we let users import data quickly. Then there is data compression, which is very critical. On the Sky Cloud big data side we can easily apply several compression algorithms, and working on compressed data can bring a good performance improvement. We all know that compression consumes system resources and may well cost performance, but we can make the compression faster; of course, the result varies with the scenario. Management of the metadata dictionary is also integrated into BDP's enterprise friendliness. This tool provides a lot of data storage, processing, and query capability, but as technology changes, many traditional approaches can no longer fully meet the requirement. So we provide a query interface that lets users enter the SQL they are familiar with, and we handle the data processing underneath. That raises another problem: the translation also consumes system resources. So Sky Cloud did a lot of work and eventually formed a task set. We took many commonly used key SQL statements and embedded them in advance, so they do not need to be translated every time, which shortens the response time as much as possible.
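The "embed common SQL in advance so it need not be translated every time" idea can be sketched as a preloaded translation cache. This is only an illustration under assumptions: the actual translation step is replaced by a placeholder that returns a tuple, the function names are hypothetical, and the text normalization is deliberately naive.

```python
from functools import lru_cache
from typing import Iterable, Tuple

stats = {"translations": 0}  # counts how often the costly step really runs


def normalize(sql: str) -> str:
    # Simplification: collapse whitespace and drop a trailing semicolon.
    # A real system would parse the statement instead of editing its text.
    return " ".join(sql.strip().rstrip(";").split())


@lru_cache(maxsize=1024)
def _translate(normalized_sql: str) -> Tuple[str, str]:
    """Placeholder for the expensive SQL-to-distributed-job translation."""
    stats["translations"] += 1
    return ("JOB", normalized_sql)


def plan_for(sql: str) -> Tuple[str, str]:
    return _translate(normalize(sql))


def preload(common_statements: Iterable[str]) -> None:
    """Translate frequently used statements ahead of time."""
    for sql in common_statements:
        plan_for(sql)


preload(["SELECT count(*) FROM access_log"])
before = stats["translations"]
plan_for("  SELECT count(*)   FROM access_log; ")  # served from the cache
print(stats["translations"] == before)
```

The cache key is the normalized text, so statements that differ only in spacing share one translation; statements that differ in literal values would still be translated separately in this simple form.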

Also under business friendliness, we did a lot of work on the design of data manipulation. My data may come from a variety of places, scenarios, or systems, and it needs to be integrated quickly into our BDP platform for use. So here we designed many interfaces, including process definition and process monitoring. Just as with ETL tools, you design the links of a data flow. Each link can be defined, and each link supports secondary development by the user. Users can string these links together into a complete data manipulation process that covers all three parts of ETL, which ultimately helps us achieve data preparation. We also provide log monitoring, so you can see how far a task has progressed and whether there is any error information. Then there is monitoring and configuration management for the entire task, where we can form templates. There is also monitoring, database conversion, and so on.
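A data flow made of user-definable links with a monitoring log might look like the following minimal sketch. The `Pipeline` class and the three example links are hypothetical and stand in for whatever the graphical tool generates; the records are plain dictionaries.

```python
from typing import Callable, Dict, List, Tuple

Records = List[Dict[str, object]]
Link = Callable[[Records], Records]


class Pipeline:
    """A chain of named links; each link is an ordinary user-written function."""

    def __init__(self) -> None:
        self.links: List[Tuple[str, Link]] = []
        self.log: List[Tuple[str, str, int]] = []  # (link name, status, record count)

    def add(self, name: str, link: Link) -> "Pipeline":
        self.links.append((name, link))
        return self

    def run(self, records: Records) -> Records:
        for name, link in self.links:
            try:
                records = link(records)
            except Exception as exc:
                # Record where the task stopped, then let the error surface.
                self.log.append((name, f"error: {exc}", 0))
                raise
            self.log.append((name, "ok", len(records)))
        return records


def extract(_: Records) -> Records:
    return [{"name": " li ", "amount": "10"}, {"name": "wang", "amount": "x"}]


def transform(records: Records) -> Records:
    # Clean names and keep only rows whose amount is numeric.
    return [{"name": str(r["name"]).strip().title(), "amount": int(str(r["amount"]))}
            for r in records if str(r["amount"]).isdigit()]


warehouse: Records = []


def load(records: Records) -> Records:
    warehouse.extend(records)
    return records


pipeline = Pipeline().add("extract", extract).add("transform", transform).add("load", load)
pipeline.run([])
print(pipeline.log)
print(warehouse)
```

Because every link has the same signature, a user can replace or insert a link without touching the others, which is the sense of "each link supports secondary development".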

And then there is BDA. When all the data is ready, we will definitely want to use it. As I just said, the algorithms have to be worked out, and we tell users how to do behavioral analysis or fraud detection based on experience in real scenarios, such as the State Grid's statistical analysis and mining of smart meter data, where all kinds of algorithms can be used. We can even help users crawl data so they can explore data more broadly. We have done data warehousing for ten years and have always said we want the data to be as complete as possible, but the completeness we used to mean was the completeness of internal enterprise data, not external data. We know that external data is larger in volume and often explains more. Some of our customers who do public opinion analysis and stability maintenance analysis use a lot of internet data. This is the BDA module, which helps users eventually get the knowledge they need.

That is today's short presentation on how Sky Cloud helps users find data, store data, and use data, truly turning cloud into rain. Thank you.

Zhang Fubo: Thank you all very much for attending today. Thank you.

(Responsible editor: The good of the Legacy)
