The cloud computing era has brought many opportunities and just as many challenges. Some people even believe that as the cloud becomes ubiquitous, operations engineers will eventually disappear. That view is somewhat extreme, but the cloud era has clearly changed how operations work is done and has prompted operations engineers to rethink many questions. At the recently held China Operations and Security Conference, we were delighted to see many attendees willing to take on the challenge, and many experts sharing their experience and insights.
Ji Xinhua, one of China's first generation of hackers and now CEO of UCloud, analyzed the challenges and opportunities that the cloud computing era brings to operations and security. First, operations engineers need a set of basic skills. They need to understand "feng shui": when choosing a data center site, they must consider whether it sits in an earthquake zone, what the prevailing winds are, and what local electricity costs. They need to understand networks, including the north-south differences in China's particular network environment. They need physical stamina, to move servers in the data center when necessary. They also need to understand operating systems, network attack and defense, and so on.
Yet most operations engineers do not hold a high position in their companies, and their pay is relatively low within the industry. The reason is that the barrier to entry for operations is low and general awareness of operations work is limited. Ji Xinhua therefore argued that, beyond the basics above, operations engineers also need to build strength in the following three areas:
Understand the business. For example, know whether the product's users are in first-tier or second-tier cities and whether they are on PC or mobile. With enough business knowledge, your work becomes something that leadership cares about.
Operate as a process. Turn incident handling into process management that is continuously improved and optimized. Operations should achieve four "firsts": find the problem at the first moment, locate it at the first moment, solve it at the first moment, and report it at the first moment.
Systematize. Use a variety of systems to support operations work, and even develop your own operations systems.
Everyone now faces a few bottlenecks. First, room for growth is limited: the position within the company is not high, and visibility in the industry is low. Second, cloud computing may eliminate many operations jobs, since many small start-ups no longer need operations staff at all. Third, career transitions are difficult.
Of course, there are also many opportunities. The Internet is rapidly changing traditional industries, and the earlier O2O wave is a good example of how operations engineers can help traditional industries grow quickly. The arrival of big data opens another window. In cloud computing, a niche done well enough can be developed into an industry of its own: cloud services, DNSPod, Jiankongbao (monitoring), and Anquanbao (security) are good examples.
Ji Xinhua suggested that when you use free operations services, you pay for them if you can, so that companies learn that operations work has value. When developers asked how they could help operations engineers, several speakers said that adopting DevOps would be best, so as to avoid this situation:
Gaps in the product are patched by development, gaps in development are patched by operations, and gaps in operations are patched by customer service.
Since the cloud was a major theme of the conference, cloud storage was naturally on the agenda. Han Tuo of Qiniu introduced some of Qiniu's practices in building cloud storage. His talk had two parts, the underlying storage and the cloud storage built on top of it, which are designed in very different ways.
The underlying storage has the following difficulties:
Control of redundancy (the balance point between the number of replicas and the cost)
Repair speed (it directly affects the reliability of the storage system; at Qiniu, repair is a cluster-wide task because the replicas of a disk's data are scattered loosely across the cluster, and 2 to 3 TB of data can currently be repaired in somewhere between a dozen and a few dozen minutes)
Coping with the growth of capacity
Acceptable speed of access
A reasonable and efficient cache
For its network, Qiniu uses conventional gigabit LAN, chosen for its maturity and cost. Between cabinets, there is no guarantee that any two points have gigabit bandwidth at all times, or even full connectivity. Between data centers, bandwidth is expensive, and neither speed nor connectivity can be guaranteed. Data placement therefore has to strike a balance: keeping replicas in the same cabinet or in different cabinets each has advantages and disadvantages, and the same holds across data centers.
When it comes to failures, besides treating failure as the normal case, you also need to know clearly which failures you face, along with their causes, probabilities, and impact range.
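To see why repair has to be a cluster-wide task, a rough back-of-the-envelope estimate helps. The sketch below is not Qiniu's method; the function name, the figure of about 100 MB/s of usable throughput per node on gigabit Ethernet, and the node count of 30 are all illustrative assumptions.

```python
def repair_minutes(data_tb: float, nodes: int, mb_per_s_per_node: float) -> float:
    """Estimate the minutes needed to re-replicate data_tb terabytes
    when `nodes` machines copy in parallel at the given per-node rate."""
    total_mb = data_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    seconds = total_mb / (nodes * mb_per_s_per_node)
    return seconds / 60


# Assumption: about 100 MB/s usable per node on a gigabit link.
single = repair_minutes(3, nodes=1, mb_per_s_per_node=100)
cluster = repair_minutes(3, nodes=30, mb_per_s_per_node=100)
print(f"1 node: {single:.0f} min, 30 nodes: {cluster:.1f} min")
```

Under these assumptions a single machine would need more than eight hours to copy 3 TB, while thirty machines working in parallel finish in well under half an hour, which is the order of magnitude quoted above.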
For example, common failures are:
Faults inside the data center
Network card (disconnection, reduced speed)
Network cable (disconnection, reduced speed)
Switches (overall failure, a-fault, VLAN failure)
Cabinet cascading failure
Data center failure
Regional network fault (the data center's uplink is cut off)
DNS resolution failure (DNS between servers)
For faults inside the data center, there is no need to spend too many resources on additional high-availability schemes.
For network security, beyond the necessary basic defenses, protection at the business level matters more. The basic principle of a public cloud is openness: any service may be exposed to the public network unconditionally, and traffic between data centers is treated no differently from customer traffic, with no VPN set up between them.
Cloud storage is built on top of the underlying storage. It can provide very high upload and download speeds, very high availability, very high reliability, rich additional features (thumbnails, watermarks, and so on), and convenient network access.
Its difficulty lies in:
Cloud storage sits at the edge of the network and faces users directly, so conditions are complex. It is the outermost access point, with nothing in front of it to shield it, so the requirements on every metric are high.
Wide area network infrastructure is generally of low quality, so a service with 99.999% availability has to be built on infrastructure that is only 99% available.
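The gap between 99% and 99.999% can be made concrete with a small calculation. The sketch below assumes that replicas fail independently, which real data centers and links only approximate; the function names are illustrative and do not come from the talk.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of a service that is up as long as at least one of
    `replicas` independent copies is up."""
    return 1 - (1 - single) ** replicas


def replicas_needed(single: float, target: float) -> int:
    """Smallest number of independent copies that reaches `target`."""
    n = 1
    while combined_availability(single, n) < target:
        n += 1
    return n


# Assumption: failures are independent, which is optimistic in practice.
print(replicas_needed(0.99, 0.99999))
```

Two independent 99% copies only reach 99.99%, so under this idealized model at least three are needed to reach five nines. Correlated failures push the real requirement higher.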
On the infrastructure side, data center networking is a big problem. Network latency ranges from milliseconds to thousands of milliseconds, throughput ranges from tens of Mbps down to a few kbps, and average bandwidth cost is not low. Data center availability is also far from ideal: link failures are frequent, and large-area or regional outages and slowdowns occur. The problems are not limited to the links between data centers; data centers themselves also fail frequently, and users in small cities or on small carriers may be unable to reach the service at all. (Qiniu provides users with a download SDK: when an app or web client cannot connect to the local node and download content, the SDK can connect to alternate domain names and IP addresses.)
Qiniu keeps cross-data-center redundancy for its data, less for reliability than for availability. Synchronization is asynchronous: the hottest data is synchronized asynchronously, and cold data is synchronized in batches. In terms of cost, the added redundancy does not raise cost linearly, and asynchronous synchronization also makes intelligent use of expensive bandwidth.
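The fallback behavior described for the download SDK can be sketched as a simple loop over candidate endpoints. This is not Qiniu's SDK API: the function name, the endpoint strings, and the injected `fetch` callable are hypothetical and exist only to illustrate the idea.

```python
from typing import Callable, Iterable


def download_with_fallback(
    path: str,
    endpoints: Iterable[str],
    fetch: Callable[[str, str], bytes],
) -> bytes:
    """Try each endpoint (domain name or IP) in order and return the
    first successful response. `fetch` performs the actual transfer."""
    last_error = None
    for endpoint in endpoints:
        try:
            return fetch(endpoint, path)
        except OSError as exc:  # ConnectionError and TimeoutError are OSErrors
            last_error = exc
    raise ConnectionError(f"all endpoints failed for {path}") from last_error


def fake_fetch(endpoint: str, path: str) -> bytes:
    """Stand-in transport for the demo: the local node is unreachable."""
    if endpoint == "local-node.example":
        raise ConnectionError("local node unreachable")
    return f"{endpoint}{path}".encode()


# Hypothetical endpoints: local node first, then a backup domain and an IP.
data = download_with_fallback(
    "/photo.jpg",
    ["local-node.example", "backup.example", "203.0.113.10"],
    fake_fetch,
)
print(data)
```

Passing the transport in as a parameter keeps the retry logic independent of any particular HTTP client, so the same loop works in an app or a test.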
UPYUN, which also provides cloud storage, shared some of its experience with CDN and DDoS defense. Shao Haiyang introduced the two main types of DDoS attack, slow CC attacks and crippling traffic floods. In his day-to-day work he encounters the latter more often: they come fast and go fast, and well-funded attackers often choose this approach. He noted that:
Be sure to spot the signs of an attack at the earliest moment and respond promptly.
Huang Dong once said that to defend against DDoS, you can simply put the site behind a CDN. Shao Haiyang sees it differently; a self-built CDN involves the following considerations:
Hardware costs (a 1U chassis holding multiple motherboards costs about 15,000 to 20,000)
Bandwidth cost (dual-line bandwidth is expensive; CDN acceleration does not need dual-line bandwidth, only single-line data centers, which can cost only about 1 per megabyte)
He compared the strengths of Squid, Varnish, Nginx, Apache Traffic Server (ATS), and HAProxy, and now runs ATS at large scale, with a cluster of more than 200 machines. ATS's clustering features are not yet mature, but the problem can be worked around by placing Nginx in front to do a layer of consistent-hash forwarding. He also stressed that HAProxy's powerful HTTP header parsing makes it a suitable choice for the defensive layer. You can choose according to the specific purpose:
Reverse proxy (route acceleration, hiding the origin node): HAProxy > Nginx > Varnish > ATS > Squid
Cache acceleration (static acceleration, bandwidth savings, edge push): ATS > Varnish > Squid > Nginx > HAProxy
Defensive function (fast parsing, filtering and matching): HAProxy > Nginx > ATS > Squid > Varnish
In addition, the chosen system should ideally support reading and matching against files, hot reloading of configuration, and flexible combination of pluggable cache components.
Architecture is a process of continuous improvement, and UPYUN's CDN went through these stages:
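The consistent-hash forwarding layer mentioned earlier, placed in front of the cache cluster, can be illustrated with a minimal hash ring. This is a generic sketch of the technique, not UPYUN's configuration; the class name, the node names, and the choice of 100 virtual nodes per server are assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps each URL to a cache node so that adding or removing a node
    only remaps the URLs that belonged to that node."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, url: str) -> str:
        # The first ring point clockwise from the URL's hash owns the URL.
        index = bisect.bisect_right(self._keys, self._hash(url)) % len(self._keys)
        return self._ring[index][1]


# Hypothetical cache nodes behind the forwarding layer.
ring = ConsistentHashRing(["ats-1", "ats-2", "ats-3"])
print(ring.node_for("/img/logo.png"))
```

Because the same URL always lands on the same node, each object is cached once in the cluster rather than once per machine, which is what the forwarding layer is meant to achieve when the cache software's own clustering is not relied on.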
Intelligent DNS by region (UPYUN deploys the nodes and uses DNSPod for intelligent node selection, automatically choosing the node nearest to the user in order to accelerate the whole network)
Large-scale log analysis (how do you extract malicious code from logs for analysis? UPYUN added a module to Nginx that keeps the most recent URLs in memory for real-time analysis, and also runs a Hadoop cluster to analyze logs)
Back-end management that is not intuitive (addressed by using OpenCDN as a multi-node CDN management platform)
CC and DDoS attacks may overlap. HAProxy plus back-end storage is meant to handle small-traffic attacks: if the attack is within capacity, you can choose not to take the node out of rotation, but if you face a large-traffic DDoS attack, you can cut the node immediately. Shao Haiyang stressed that defending against DDoS attacks relies on technology and on business measures, and also requires support from senior management.
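The idea of keeping the most recent URLs in memory for real-time analysis can be sketched as a fixed-size sliding window with running counts. The sketch is plain Python rather than an Nginx module, and the class name, window size, and threshold are illustrative assumptions, not details from the talk.

```python
from collections import Counter, deque


class RecentUrlWindow:
    """Keeps the last `size` requested URLs and their counts in memory."""

    def __init__(self, size: int):
        self._size = size
        self._window = deque()
        self._counts = Counter()

    def record(self, url: str) -> None:
        self._window.append(url)
        self._counts[url] += 1
        if len(self._window) > self._size:
            # Evict the oldest request so memory use stays bounded.
            oldest = self._window.popleft()
            self._counts[oldest] -= 1
            if self._counts[oldest] == 0:
                del self._counts[oldest]

    def suspicious(self, threshold: int) -> list:
        """URLs requested at least `threshold` times within the window."""
        return [url for url, count in self._counts.items() if count >= threshold]


window = RecentUrlWindow(size=1000)
for _ in range(600):
    window.record("/login.php")
window.record("/index.html")
print(window.suspicious(threshold=500))
```

A window like this answers "what is being hammered right now" cheaply, while heavier offline analysis of the full logs is left to a batch system.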
After much discussion of public cloud technology, Zhang Han of Alipay presented material on Alipay's private cloud environment, introducing the business-centric monitoring products in Alipay's private cloud.
In Alipay, in addition to conventional operational monitoring and application monitoring, there are many more requirements, such as business monitoring, partner monitoring, and SOA environment monitoring.
In particular, Zhang Han emphasized one concept, business analysis, which plays a vital role in Alipay's monitoring system:
Real-time BI: sometimes the goal is not troubleshooting but confirming that nothing is wrong
Determining the fault range: different business characteristics imply different impact ranges, and emergency responders use different strategies for different impact ranges
Business and partners: take banks as an example; if an individual bank goes down, the problem is probably on the bank's side, but if all banks go down, the problem is probably Alipay's
The relationship between business and applications: monitoring different lines of business makes it possible to locate a fault quickly
The relationship between business and business: even when systems have no direct relationship, the businesses behind them may well interact directly
The relationship between business and operations strategy: for example, deciding how traffic is diverted and distributed across data centers
The relationship between business and control strategy: there are many control strategies, such as grouping, degradation, rate limiting, and traffic diversion, and formulating them is closely tied to the business
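The partner rule in the list above (one bank down points at the bank, all banks down points at the platform) can be written as a small classifier. This is only an illustration of the reasoning; the function name, the return strings, and the bank names are hypothetical.

```python
from typing import Dict


def classify_outage(bank_up: Dict[str, bool]) -> str:
    """Guess where a payment fault lies from per-bank channel health."""
    down = sorted(bank for bank, is_up in bank_up.items() if not is_up)
    if not down:
        return "healthy"
    if len(down) == len(bank_up):
        # Every partner failing at once is unlikely to be their fault.
        return "suspect own platform"
    return "suspect partner: " + ", ".join(down)


# Hypothetical partner names used only for the demo.
print(classify_outage({"bank_a": True, "bank_b": False, "bank_c": True}))
print(classify_outage({"bank_a": False, "bank_b": False, "bank_c": False}))
```

A production system would weigh partial degradation and traffic volume rather than a boolean per bank, but the same comparison across partners is what narrows the fault range.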
Many companies instrument their systems with tracking points for monitoring, whereas Alipay handles real-time fault emergencies by combining business analysis with symptom analysis. Zhang Han pointed out:
Instrumentation means placing and checking tracking points on every server, and the possible causes of failure are endless, so the cause often has to be inferred from the symptoms.
He then gave a brief introduction to Alipay's internal XFlush monitoring solution, which draws on many ideas from Percolator, Storm, Spark, Haystack, GFS, and RDDs. XFlush aims for low intrusiveness, incremental computation, no retention of raw data, guaranteed timeliness, guaranteed data accuracy, guaranteed scalability, avoidance of redundancy, and extensible computation logic. To achieve this, the team even implemented a customized distributed file system, Xstore. It scales without limit, is tailored purely to periodic statistical computation and fixed monitoring points, achieves very low IO, and provides high-speed metadata retrieval with no IO.
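Two of the goals listed for XFlush, incremental computation and not retaining raw data, can be illustrated with a tiny per-period aggregator. This is not XFlush's implementation, whose internals the talk summary does not describe; the class name, the 60-second period, and the metric key are assumptions made for the sketch.

```python
from collections import defaultdict


class PeriodAggregator:
    """Folds each data point into per-period running totals and then
    discards it, so no raw samples are stored."""

    def __init__(self, period_seconds: int):
        self.period = period_seconds
        # (key, period start) -> [count, sum]
        self._stats = defaultdict(lambda: [0, 0.0])

    def add(self, key: str, timestamp: float, value: float) -> None:
        bucket = int(timestamp // self.period) * self.period
        cell = self._stats[(key, bucket)]
        cell[0] += 1
        cell[1] += value

    def summary(self, key: str, bucket: int) -> dict:
        count, total = self._stats.get((key, bucket), (0, 0.0))
        average = total / count if count else 0.0
        return {"count": count, "sum": total, "avg": average}


agg = PeriodAggregator(period_seconds=60)
agg.add("pay.latency_ms", timestamp=0, value=10)
agg.add("pay.latency_ms", timestamp=59, value=30)
agg.add("pay.latency_ms", timestamp=60, value=5)
print(agg.summary("pay.latency_ms", 0))
```

Memory grows with the number of keys and periods rather than with the number of samples, which is the property that makes this style of monitoring cheap on IO.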
Database operations are also an important part of operations work, so the conference naturally included database content. Thinkinlamp founder Ma and MySQL technical expert Kinguandin shared a great deal of experience in operating MySQL databases. Zhao, a security expert from Kingsoft Network, told many stories about Android security, each of which made it clear that mobile security is also an important area; the Jinshan burner system deserves attention.