Chen Hao (@Left Ear Mouse) on Cloud Computing: The Competition Is in Operations


This article is based on a conversation between Chen Hao (@Left Ear Mouse) and InfoQ China in March 2014. In it, Chen Hao shares his understanding of cloud computing, including why cloud computing is divided into three layers, the difficulties of implementing a cloud platform, the importance of operations in cloud computing, and the value of the e-commerce cloud.

Guest profile

Chen Hao (@Left Ear Mouse) is the blogger behind coolshell.cn. He has 15 years of experience in software development and more than 8 years of project and team management experience. He is skilled in underlying technology architecture, team building, software engineering, software development consulting, and the collaborative management of global software teams. He has worked on high-performance, high-availability, distributed, high-concurrency, and large-scale data processing systems, and likes to focus on underlying technology platforms and Internet industry applications. His technical strengths are C/C++/Java and Unix/Linux/Windows. He was an R&D manager at Amazon China, responsible for developing the global e-commerce business (global store opening) and the worldwide inventory forecasting system. He later held a senior expert position in the business division of Alibaba's Beijing R&D center, responsible for the e-commerce cloud platform, the open platform, cloud monitoring, and the e-commerce multimedia platform. He now works in Alibaba's core systems expert group on the development of Alibaba's core systems and the virtualization platform behind Aliyun ECS.

The definition of cloud computing

Cloud computing is conceptually the same as a PC: it has a CPU, a hard disk, an operating system, and application software. A cloud compute node (a virtual machine) is the PC's CPU. The data caching service is the PC's memory. The storage node is the PC's hard disk: it provides data services so that data is not lost and stays highly available. The PC's controller corresponds to the cloud's control system. PC hardware must have an operating system on top of it. The operating system is a large piece of the whole: it provides developers with APIs, provides system monitoring so you can see how things are running, and provides system management such as user accounts, permission management, and backup and recovery. On top of the operating system there must be application software that serves end users. Application software is where the business actually lands: it brings users, and only with users does the whole system come alive.

This is what engineers call the stack, the three layers we keep hearing about: IaaS, PaaS, and SaaS. The IaaS layer is like a PC's basic hardware plus drivers. The PaaS layer is like the PC's operating system: it abstracts and wraps the underlying hardware, hides the details of hardware and drivers, and schedules the underlying hardware. The SaaS layer is the PC's application software. In addition, developers need development frameworks, class libraries, and development environments. That is why AWS also offers notification, messaging, and workflow services: they glue the operating system layer and the business layer together, for example by making it easy to scale horizontally and go distributed. Cloud computing, like a PC, naturally has control and management systems on all three layers. That is why cloud computing is built this way: the development of computing has never escaped this pattern.

In fact, end users mostly do not care what CPU you use, what storage you use, or what framework you develop with. They care about what problems you can solve and what kind of user experience they get. Windows used to offer a better user experience than Linux because its application layer was comfortable to use, while Linux offered developers a better experience than Windows because it was open and gave them more flexibility and freedom. On the SaaS layer we can see services such as Salesforce, Dropbox, Evernote, and Netflix, which are aimed at end users and businesses.

In the end, the last "S" in IaaS, PaaS, and SaaS stands for "service". That is, whatever your cloud looks like, you have to provide services to users, not just hardware, software, and resources.

Technical difficulties in cloud computing

Today, the engineering implementation of cloud computing is not too difficult. The open-source projects KVM and Xen basically take care of virtualization, and OpenStack, which handles management and control, is also mature. PaaS has open-source counterparts as well, such as OpenShift, and Java has countless middleware frameworks and technologies. In addition, with distributed file systems such as GFS/TFS and distributed computing systems such as Hadoop/HBase, distributed technology is no longer mysterious. Implementing the technology may have been a problem in the past, but it is not one now.

For cloud computing as an engineering effort, the hardest part now is operations. Running 100 machines, 10,000 machines, and 1 million machines are completely different problems. A small number of machines can be managed by hand, but a large number cannot depend on people. Operations is a non-functional part of the system that users cannot see, so it is seriously underestimated. Once you reach a large scale, you have to invest heavily in your operations systems. Data centers and cloud computing compete on operations capability.

There are several reasons why I say operations is the more complex part.

On the one hand, cloud computing replaces expensive solutions with cheap equipment. Internet culture is an underdog culture, and underdogs are cheap: the Internet uses cheap things to build high-quality things. Hardware and resources will not take the high-end route, such as EMC storage, IBM minicomputers, or SGI supercomputers, because building a cloud on them costs too much. Replacing expensive solutions with cheap ones is the one thing that has never changed in the history of computing. So if you want an economy car to perform like a Mercedes, you have to do a lot of work yourself and build an intelligent system. Using cheap components to build a high-quality service is the biggest challenge in cloud computing.

On the other hand, because you have many machines and they are not expensive hardware, failure becomes the norm: hard disks, motherboards, and networks break every day. So there is no way around it: operations must keep up. The goal of cloud computing is to guarantee high availability when failure is normal, that is, whether your service's availability is three nines, four nines, or five nines.
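
To make the "nines" concrete, the following is a small arithmetic sketch that converts a number of nines into the downtime it allows per year. The function names are made up for illustration, and a 365-day year is assumed.

```python
# Illustrative arithmetic only: converts "N nines" of availability
# into the downtime that level allows per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def availability_from_nines(nines: int) -> float:
    """Return availability as a fraction, e.g. 3 nines -> 0.999."""
    return 1 - 1 / 10 ** nines


def downtime_minutes_per_year(nines: int) -> float:
    """Return the maximum downtime per year, in minutes, for the given nines."""
    return MINUTES_PER_YEAR / 10 ** nines


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines allows about "
              f"{downtime_minutes_per_year(n):.1f} minutes of downtime per year")
```

Under these assumptions, three nines allows roughly 8.8 hours of downtime per year, four nines under an hour, and five nines a little over five minutes, which is why each extra nine demands far more automation.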

Finally, when this many machines and devices are put together, keeping them safe is a challenge, in the sense of both security and safety. Protecting tens or hundreds of devices is manageable, but at the scale of tens or hundreds of thousands it is not so simple.

Therefore, faced with problems this hard, people alone cannot cope. You can only rely on technology to manage and operate the entire platform. For example, there must be a monitoring system. Just as in an operating system, state such as resource usage, network traffic, CPU utilization, processes, and memory must be collected. Collecting the state of every node in the cluster is something every cloud inevitably does, and they all do it in much the same way.
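
As a rough illustration of collecting node state into a cluster-wide view, here is a minimal sketch. The `NodeSample` fields, the `summarize_cluster` function, and the 90% alert thresholds are all assumptions made for this example, not part of any real monitoring product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NodeSample:
    """One monitoring sample reported by a node (hypothetical schema)."""
    node_id: str
    cpu_percent: float
    mem_percent: float
    net_mbps: float


def summarize_cluster(samples, cpu_alert=90.0, mem_alert=90.0):
    """Aggregate per-node samples and list nodes at or above an alert threshold."""
    if not samples:
        return {"nodes": 0, "avg_cpu": 0.0, "avg_mem": 0.0,
                "total_net_mbps": 0.0, "hot_nodes": []}
    hot = sorted(s.node_id for s in samples
                 if s.cpu_percent >= cpu_alert or s.mem_percent >= mem_alert)
    return {
        "nodes": len(samples),
        "avg_cpu": mean(s.cpu_percent for s in samples),
        "avg_mem": mean(s.mem_percent for s in samples),
        "total_net_mbps": sum(s.net_mbps for s in samples),
        "hot_nodes": hot,
    }


if __name__ == "__main__":
    cluster = [
        NodeSample("node-1", cpu_percent=20.0, mem_percent=30.0, net_mbps=100.0),
        NodeSample("node-2", cpu_percent=95.0, mem_percent=40.0, net_mbps=50.0),
    ]
    print(summarize_cluster(cluster))
```

A real system would receive these samples from agents on each node over the network. The sketch only shows the aggregation step that turns per-node state into something an operator or an automated controller can act on.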

Then you have to work out which nodes are healthy enough to use, which requires some self-check functions. For example, Aliyun has run into disks that become inexplicably unstable after a certain period of use, with I/O on some of them slowing down. The slowness comes from the disk itself: the disk controller may have to read the data several times because of CRC errors, much as a TCP packet that arrives corrupted has to be retransmitted. When a disk is dying like this, you need an automated discovery program to watch for it. When the disk is likely to fail, mark it as a bad disk, stop using it, and copy its data to another disk. We need fault detection and predictive measures for failures rather than passive responses to them, so that the user experience stays good. In other words, we need automated, proactive operations.
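
The kind of proactive disk check described above can be sketched as follows. This is only an illustration: the `DiskStats` fields, the thresholds (1% retry rate, three times a 10 ms baseline latency), and the function names are assumptions, not Aliyun's actual detection logic.

```python
from dataclasses import dataclass


@dataclass
class DiskStats:
    """Counters for one disk over a sampling window (hypothetical fields)."""
    disk_id: str
    reads: int
    retried_reads: int      # reads that needed a retry, e.g. after a CRC error
    avg_latency_ms: float


def classify_disk(stats, baseline_latency_ms=10.0, retry_limit=0.01, slow_factor=3.0):
    """Return 'healthy', 'suspect', or 'bad' based on retry rate and latency."""
    retry_rate = stats.retried_reads / stats.reads if stats.reads else 0.0
    too_many_retries = retry_rate > retry_limit
    too_slow = stats.avg_latency_ms > slow_factor * baseline_latency_ms
    if too_many_retries and too_slow:
        return "bad"
    if too_many_retries or too_slow:
        return "suspect"
    return "healthy"


def plan_evacuations(all_stats):
    """Pair each bad disk with a healthy disk that should receive a copy of its data."""
    status = {s.disk_id: classify_disk(s) for s in all_stats}
    healthy = [d for d, st in status.items() if st == "healthy"]
    bad = [d for d, st in status.items() if st == "bad"]
    if not healthy:
        return []
    # Round-robin over the healthy disks so no single target takes every copy.
    return [(b, healthy[i % len(healthy)]) for i, b in enumerate(bad)]


if __name__ == "__main__":
    disks = [
        DiskStats("d1", reads=1000, retried_reads=1, avg_latency_ms=8.0),
        DiskStats("d2", reads=1000, retried_reads=50, avg_latency_ms=45.0),
    ]
    print(plan_evacuations(disks))
```

The design point is that the disk is flagged while it is still readable, so its data can be copied away before the failure, instead of being rebuilt from replicas afterwards.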

For high availability of data, the only option is redundancy: write multiple copies to different nodes. The industry convention is that three copies are safe. However, once you have redundancy you also have a data consistency problem. To solve the consistency problem among the copies, there is the Paxos voting approach, in which the nodes vote on whether a change may be made. So you need a strong control system to coordinate all of this.
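
The idea of writing three copies and deciding by vote can be illustrated with a simplified majority-quorum write. This is deliberately not Paxos, which also handles competing proposers and recovery. The `Replica` class and `quorum_write` function are hypothetical in-memory stand-ins, and the sketch does not roll back a write that fails to reach quorum.

```python
class Replica:
    """An in-memory stand-in for a storage node (hypothetical)."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

    def write(self, key, value):
        """Store the value and acknowledge, or refuse if the node is down."""
        if not self.available:
            return False
        self.data[key] = value
        return True


def quorum_write(replicas, key, value):
    """Write to every replica; succeed only if a strict majority acknowledged."""
    acks = sum(1 for r in replicas if r.write(key, value))
    return acks > len(replicas) // 2


if __name__ == "__main__":
    nodes = [Replica("a"), Replica("b"), Replica("c", available=False)]
    # Two of three replicas acknowledge, so the write is accepted.
    print(quorum_write(nodes, "user:42", "hello"))
```

With three replicas, a write survives the loss of any single node, which is one reason three copies is a common choice. Keeping the copies consistent under concurrent writers and node recovery is where a real consensus protocol is needed.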

In addition, on a public cloud users come and go, and resources and services in use today may not be in use tomorrow. Resources are allocated and released, and sometimes frozen, so you also need a resource management system to manage their lifecycle state. There is also permission management, like AWS's IAM. Without a permission management system like IAM, AWS might not be usable by as many large companies as it is today. An enterprise-grade cloud platform needs enterprise-grade operations and management capabilities.
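
A resource lifecycle of this kind is often modeled as a small state machine. The sketch below assumes three states (allocated, frozen, released) and a particular set of allowed transitions. Both are illustrative assumptions, not the rules of any specific cloud.

```python
from enum import Enum


class State(Enum):
    ALLOCATED = "allocated"
    FROZEN = "frozen"
    RELEASED = "released"


# Allowed transitions (an assumption for illustration, not any vendor's actual rules).
ALLOWED = {
    State.ALLOCATED: {State.FROZEN, State.RELEASED},
    State.FROZEN: {State.ALLOCATED, State.RELEASED},
    State.RELEASED: set(),  # terminal: a released resource cannot come back
}


class Resource:
    """A cloud resource whose lifecycle state is tracked explicitly."""

    def __init__(self, resource_id):
        self.resource_id = resource_id
        self.state = State.ALLOCATED

    def transition(self, new_state):
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.resource_id}: cannot go from "
                f"{self.state.value} to {new_state.value}")
        self.state = new_state


if __name__ == "__main__":
    vm = Resource("vm-1")
    vm.transition(State.FROZEN)    # e.g. frozen while the account is suspended
    vm.transition(State.RELEASED)  # finally reclaimed
    print(vm.state.value)
```

Making the transitions explicit means every component that touches a resource agrees on what can happen to it next, which matters when allocation and release run automatically across many machines.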
