Kai-Fu Lee: The Power of Algorithms

Source: Internet
Author: User
Algorithms are one of the most important cornerstones of computer science, yet some programmers in China neglect them. Seeing that companies list a wide variety of programming languages in their job requirements, many students come away with a misunderstanding: they think that studying computing simply means learning programming languages, or that picking up the latest languages, technologies, and standards is the best way to prepare for a career. In fact, these job postings have misled them. Programming languages are worth learning, but computer algorithms and theory matter more: languages and development platforms change with each passing day, while the fundamentals stay the same, such as data structures, algorithms, compiler principles, computer architecture, and relational database principles. On the "Kaifu student network", a student vividly compared these foundational courses to "internal skills" and new languages, technologies, and standards to "external skills". Those who chase fashion all day learn only the moves; without inner strength they can never become masters.

Algorithms and Me

When I transferred to the Computer Science Department in 1980, few people specialized in computer science. Many people in other departments laughed at us and said: "Do you know why your department has to add 'science' to its name, while there is no 'Physics Science Department' or 'Chemistry Science Department'? Because those fields really are scientific and have no need to gild the lily, whereas you are insecure and afraid of not being 'scientific'." In fact, they were completely mistaken. People who truly understand computer science (not just programmers) are well grounded in mathematics. They can prove results with the rigorous thinking of a scientist and also solve problems with the pragmatic methods of an engineer. The best embodiment of this way of thinking and working is the "algorithm".

I remember that the Othello-playing software I wrote as a doctoral student won the world championship. The runner-up believed I had beaten him by luck and asked how many moves my program could search per second on average. When he learned that my software searched more than 60 times faster than his, he was completely convinced. How could I do 60 times more work on the same machine? I had used a new algorithm that converted an exponential function into four approximation tables, so an approximate answer could be obtained in constant time. In this example, the choice of algorithm was the key to winning the world championship.

I still remember that in 1988, the vice president of Bell Labs personally visited my school to find out why their speech recognition system was dozens of times slower than mine, and why the gap grew to several hundred times once the system was scaled up to a large vocabulary. They had bought several supercomputers and could barely run the system, but their product department balked at the expensive computing resources, because such "expensive" technology had no prospect of practical use. I was surprised to find that they had turned an O(n * m) dynamic programming algorithm into an O(n * n * m) one. Even more surprisingly, they had published many papers about it, given their algorithm its own name, and nominated it at a scientific conference, hoping to win the grand prize. The researchers at Bell Labs were of course very smart, but they came from mathematics, physics, or electrical engineering and had never studied computer science or algorithms, which is why they made such a basic mistake. I think those people will never laugh at computer science again!

Algorithms in the Internet Age

Some people may ask: "Computers are so fast today; do algorithms still matter?" In fact, computers will never be too fast, because we will always come up with new applications. Thanks to Moore's Law, computing power grows rapidly every year while prices keep falling. But we should not forget that the amount of information to be processed is growing exponentially. Every day, each of us creates a large amount of data (photos, videos, voice recordings, text, and so on). Increasingly advanced recording and storage technology has caused the amount of information belonging to each of us to explode. Internet traffic and log volumes are also growing rapidly. In scientific research, advances in research methods have pushed data volumes to unprecedented levels. 3D graphics, massive data processing, machine learning, and speech recognition all require enormous amounts of computation. In the Internet era, more and more challenges can only be solved by superior algorithms.

Let's look at another example from the Internet age. In web and mobile search, suppose you want to find a nearby coffee shop: how should the search engine handle the request? The simplest way is to find all the coffee shops in the city, calculate the distance between each of them and you, sort them, and return the nearest ones. But how do we calculate the distance? Graph theory offers many algorithms for this problem.

This approach may be the most intuitive, but it is definitely not the fastest. If a city has only a few coffee shops, it works fine, since the computation is small. However, if a city has many coffee shops and many users issue similar searches, the servers will come under heavy load. In that case, how can we optimize the algorithm?
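The simplest approach described above can be sketched as follows. This is only an illustration: the shop names and coordinates are made up, and straight-line distance stands in for whatever road-network distance a real search engine would compute.

```python
import math

def distance(a, b):
    """Straight-line distance between two (x, y) points (an assumption;
    a real system might use shortest paths on a road graph instead)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_shops_naive(user, shops, k=3):
    """Compute the distance to every shop, sort, and return the k nearest names."""
    return sorted(shops, key=lambda name: distance(user, shops[name]))[:k]

# Hypothetical data: shop name -> (x, y) position.
shops = {"A": (1.0, 1.0), "B": (5.0, 5.0), "C": (2.0, 0.5), "D": (9.0, 1.0)}
closest = nearest_shops_naive((0.0, 0.0), shops, k=2)
print(closest)
```

Every query touches every shop and then sorts them all, which is why the cost grows with both the number of shops and the number of users.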

First, we can "preprocess" all the coffee shops in the city. For example, divide the city into a number of grid cells, place the user in a cell according to the user's location, and sort by distance only the coffee shops in that cell.
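A minimal sketch of this preprocessing step, assuming square cells of a fixed size and straight-line distance. The cell size, function names, and sample data are all hypothetical.

```python
import math
from collections import defaultdict

CELL = 1.0  # hypothetical side length of one grid cell

def cell_of(point, cell=CELL):
    """Map an (x, y) position to the integer coordinates of its grid cell."""
    return (math.floor(point[0] / cell), math.floor(point[1] / cell))

def build_grid(shops, cell=CELL):
    """Preprocessing: bucket every shop by the cell it falls into."""
    grid = defaultdict(list)
    for name, pos in shops.items():
        grid[cell_of(pos, cell)].append(name)
    return grid

def nearest_in_cell(user, shops, grid, k=3, cell=CELL):
    """Sort only the shops that share the user's cell."""
    candidates = grid.get(cell_of(user, cell), [])
    return sorted(
        candidates,
        key=lambda n: math.hypot(shops[n][0] - user[0], shops[n][1] - user[1]),
    )[:k]

shops = {"A": (0.2, 0.3), "B": (0.9, 0.8), "C": (1.5, 0.5), "D": (3.2, 3.9)}
grid = build_grid(shops)
in_cell = nearest_in_cell((0.5, 0.5), shops, grid)
print(in_cell)
```

The grid is built once and reused for every query. Note that a shop just across a cell border (such as C here) is ignored even if it is close, so the result is only as good as the cell layout.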

The problem is that if all cells are the same size, most of the results will fall into a single downtown cell, while a suburban cell will hold only a few. In that case, the downtown area should be divided into more cells. Going further, the grid should form a "tree structure": the top level is one big cell covering the whole city, and the cells get smaller and smaller level by level, which makes precise searches easier. If the lowest-level cell contains too few results, the search range can be widened step by step.
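One common way to realize such a tree of shrinking cells is a quadtree, sketched below. The article does not name a specific structure, so the quadtree, the capacity of two shops per cell, and the sample city are assumptions made for illustration.

```python
import math

class QuadTree:
    """Square cell that splits into four smaller cells when it gets crowded."""

    def __init__(self, x, y, size, capacity=2):
        self.x, self.y, self.size = x, y, size  # lower-left corner, side length
        self.capacity = capacity                # max shops in a leaf (assumed)
        self.points = []                        # (name, (px, py)) in a leaf
        self.children = None                    # four sub-cells after a split

    def _child_for(self, pos):
        half = self.size / 2
        col = 1 if pos[0] >= self.x + half else 0
        row = 1 if pos[1] >= self.y + half else 0
        return self.children[row * 2 + col]

    def insert(self, name, pos):
        if self.children is not None:
            self._child_for(pos).insert(name, pos)
            return
        self.points.append((name, pos))
        if len(self.points) > self.capacity and self.size > 1e-6:
            half = self.size / 2
            self.children = [
                QuadTree(self.x + dx * half, self.y + dy * half, half, self.capacity)
                for dy in (0, 1) for dx in (0, 1)
            ]
            pending, self.points = self.points, []
            for n, p in pending:
                self._child_for(p).insert(n, p)

    def all_points(self):
        if self.children is None:
            return list(self.points)
        return [p for child in self.children for p in child.all_points()]

    def nearest(self, user, k=2):
        """Start in the smallest cell containing the user, widen until k shops are found."""
        path, node = [self], self
        while node.children is not None:
            node = node._child_for(user)
            path.append(node)
        for cell in reversed(path):
            found = cell.all_points()
            if len(found) >= k or cell is self:
                found.sort(key=lambda item: math.hypot(item[1][0] - user[0],
                                                       item[1][1] - user[1]))
                return [name for name, _ in found[:k]]

# Hypothetical 8 x 8 city: four downtown shops close together, two in the suburbs.
city = QuadTree(0.0, 0.0, 8.0)
for name, pos in {"A": (1.0, 1.0), "B": (1.5, 1.2), "C": (0.5, 1.8),
                  "D": (1.2, 0.4), "E": (6.0, 6.0), "F": (7.0, 2.0)}.items():
    city.insert(name, pos)
print(city.nearest((1.1, 1.1), k=2), city.nearest((7.0, 7.0), k=2))
```

Downtown ends up with small cells and the suburbs with large ones. The search is still approximate, because a closer shop may sit just across a cell border; an exact search would also have to inspect neighboring cells.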

The algorithm above works well for coffee shops, but is it universal? The answer is no. In the abstract, a coffee shop is a "point". What if we want to search for an "area" instead? For example, if a user wants to go to a reservoir that has several entrances, which entrance is closest to the user? Here the "tree structure" above should be replaced by an "R-tree", because each intermediate node of the tree is a range, a bounded region (see: http://www.cs.umd.edu/~hjs/rtrees/index.html).
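A full R-tree is too long to show here, but its central idea, that each node stores a bounding rectangle which can rule out whole groups of objects at once, can be sketched in a few lines. The code below is not an R-tree: it uses a single flat level of bounding boxes, and the reservoir names and coordinates are invented.

```python
import math

def bounding_box(points):
    """Smallest axis-aligned rectangle (x0, y0, x1, y1) containing all points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def min_dist_to_box(p, box):
    """Lower bound on the distance from p to anything inside the box."""
    x0, y0, x1, y1 = box
    dx = max(x0 - p[0], 0.0, p[0] - x1)
    dy = max(y0 - p[1], 0.0, p[1] - y1)
    return math.hypot(dx, dy)

def nearest_entrance(user, places):
    """places maps a name to its list of entrance points.
    Returns (name, entrance, distance) for the closest entrance."""
    bounds = sorted(
        (min_dist_to_box(user, bounding_box(pts)), name)
        for name, pts in places.items()
    )
    best = (None, None, math.inf)
    for bound, name in bounds:
        if bound >= best[2]:
            break  # no remaining box can contain a closer entrance
        for entrance in places[name]:
            d = math.hypot(entrance[0] - user[0], entrance[1] - user[1])
            if d < best[2]:
                best = (name, entrance, d)
    return best

# Hypothetical reservoirs, each with several entrances.
places = {
    "North Reservoir": [(2.0, 8.0), (4.0, 9.0), (3.0, 7.0)],
    "South Reservoir": [(6.0, 1.0), (8.0, 2.0)],
}
print(nearest_entrance((3.0, 5.0), places))
```

In this run the South Reservoir's entrances are never examined, because its bounding box is already farther away than the best entrance found. An R-tree applies the same pruning recursively, nesting rectangles inside larger rectangles.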

Through this small example, we can see that the requirements of applications are ever-changing. In many cases, we need to break down a complex problem into several simple small problems, and then select appropriate algorithms and data structures.

Parallel Algorithms: Google's Core Advantage

The example above is just a small case at Google! Every day, Google's website handles more than one billion searches, Gmail stores 2 GB mailboxes for tens of millions of users, and Google Earth lets hundreds of thousands of users "travel" across the globe at the same time, delivering the right images to each of them over the Internet. Without good algorithms, none of these applications could become a reality.

In these applications, even the most basic problems pose great challenges to traditional computing. For example, more than one billion users visit Google's website every day, and their use of Google's services generates a great many logs. Because the logs grow rapidly every second, we need a smart way to process them. In interviews I have asked candidates how they would analyze and process these logs. Many gave answers that were logically correct but almost impossible to apply in practice: with their algorithms, even tens of thousands of machines could not process the data as fast as it is generated.

So how does Google solve these problems?

First, in the Internet era, even the best algorithms must be able to run in a parallel computing environment. Google's data centers use very large parallel computers. However, with traditional parallel algorithms, efficiency drops quickly as the number of machines increases. That is, if ten machines give a fivefold speedup, a thousand machines may give only a few dozen times. No company can afford such diminishing returns. In addition, in many parallel algorithms, a single faulty node is enough to waste the entire computation.

So how does Google develop efficient and fault-tolerant parallel computing?

Jeff Dean, Google's most senior computer scientist, realized that most of the data processing Google needs can be reduced to one simple parallel algorithm: Map and Reduce (http://labs.google.com/papers/mapreduce.html). This algorithm achieves high efficiency and scalability across many kinds of computation (that is, even if a thousand machines do not deliver a thousandfold speedup, they deliver at least several hundredfold). Another major feature of Map and Reduce is that it can combine a large number of cheap machines into a powerful server farm. Finally, its fault tolerance is exceptionally good: even if half of a server farm goes down, the whole farm can keep running. It is thanks to this stroke of genius that we have the Map and Reduce algorithm. With it, Google can scale its computing capacity almost without limit and keep pace with ever-changing Internet applications.
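The shape of a Map and Reduce computation can be sketched in a single process. This is not Google's implementation: the log format ("timestamp, tab, query") and all function names are hypothetical, and everything runs on one machine, whereas the real system spreads the map and reduce steps across many machines.

```python
from collections import defaultdict

def map_phase(log_line):
    """Map: turn one log line into (key, value) pairs.
    Assumes a made-up format: 'timestamp<TAB>query'."""
    _, query = log_line.split("\t", 1)
    yield query.strip().lower(), 1

def shuffle(pairs):
    """Group all values that share a key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

logs = ["09:00\tcoffee shop", "09:01\tweather", "09:02\tCoffee Shop"]
counts = map_reduce(logs)
print(counts)
```

Because each call to map_phase depends on only one line and each call to reduce_phase on only one key, the calls can run on different machines in any order, and a failed call can simply be run again elsewhere. That independence is what makes the scalability and fault tolerance described above possible.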

Algorithms are not limited to computers and networks.

An example from outside computing: in high-energy physics research, many experiments produce several terabytes of data per second. Because processing and storage capacity are insufficient, scientists have to discard most of the data unprocessed. Yet information about new elements may well be hidden in the data we cannot process. Likewise, algorithms can change human life in any other field. For example, in human genome research, algorithms may lead to new medical treatments. In national security, effective algorithms may prevent the next 9/11. In meteorology, algorithms can better predict future natural disasters and so save lives.

Therefore, if you view the development of computers against the backdrop of rapidly growing applications and data, you will surely find that the importance of algorithms is not decreasing but increasing.
