The Power of Algorithms (by Kai-Fu Lee)


Algorithms are one of the most important cornerstones of computer science, yet they are often given the cold shoulder by programmers. Seeing the bewildering variety of programming languages that companies ask for when recruiting, many students fall into the misunderstanding that learning computer science means learning the newest languages, technologies, and standards. In fact, they are being misled. Programming languages are certainly worth learning, but learning computer algorithms and theory is more important, because languages and development platforms change from day to day, while the algorithms and theory endure: data structures, algorithms, compiler principles, computer architecture, relational database principles, and so on. On the "Kaifu Student Network", a student vividly compared these basic courses to a martial artist's "internal strength", and the new languages, technologies, and standards to "external moves". People who chase fashion all day end up with nothing but flashy forms; without internal strength, they can never become masters.


Algorithms and Me


When I transferred into the Computer Science Department in 1980, not many people had computer science as their specialty. People in other departments teased us: "Do you know why only your department has to add the word 'science' to its name, while there is no 'Physical Science Department' or 'Chemical Science Department'? Because those fields are genuinely scientific and need no embellishment, whereas you are so afraid of not being 'scientific' that you give yourselves away by insisting on it." In fact, they were completely mistaken. People who truly understand computer science (and not merely programming) have considerable attainments in mathematics: they can use the rigorous reasoning of a scientist to prove things, and the pragmatic methods of an engineer to solve problems. The best embodiment of this way of thinking and working is the "algorithm".

I remember that the Othello program I wrote during my doctoral studies won the world championship. At the time, the runner-up believed I had beaten him only by luck, and he asked me how many positions my program could search per second on average. When he found that my software searched more than 60 times faster than his, he was completely convinced. Why could I do 60 times more work on the same machine? Because I had used a new algorithm that converted an exponential function into four approximate lookup tables, so that an approximate answer could be obtained in constant time. In this example, the choice of algorithm was the key to winning the world championship.
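
The article does not spell out the transformation the championship program actually used, so the following is only a rough sketch of the general trick it alludes to: replacing repeated evaluations of an expensive exponential with a precomputed table, trading a little accuracy for constant-time lookups. The class name, input range, and interpolation scheme are all my own assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: replace repeated exp() evaluations over a known input
// range [lo, hi] with a precomputed table, so each query costs constant time.
class ExpTable {
public:
    ExpTable(double lo, double hi, int steps)
        : lo_(lo), step_((hi - lo) / steps), table_(steps + 1) {
        for (int i = 0; i <= steps; ++i)
            table_[i] = std::exp(lo_ + i * step_);   // pay the full cost once, up front
    }

    // Approximate exp(x) by linear interpolation between two neighboring entries.
    double operator()(double x) const {
        if (x <= lo_) return table_.front();
        double pos = (x - lo_) / step_;
        std::size_t i = static_cast<std::size_t>(pos);
        if (i + 1 >= table_.size()) return table_.back();
        double frac = pos - static_cast<double>(i);
        return table_[i] * (1.0 - frac) + table_[i + 1] * frac;
    }

private:
    double lo_, step_;
    std::vector<double> table_;
};
```

In an evaluation function called millions of times per second, a table like this turns a costly transcendental call into an array lookup and a multiply, which is exactly the sort of constant-time shortcut described above.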

I also remember that in 1988 a vice president of Bell Labs personally visited my school to find out why their speech recognition system ran dozens of times slower than mine, and why, after scaling up to a large-vocabulary system, the gap grew to several hundred times. They had bought several supercomputers and barely managed to keep the system running, but their product division resented such expensive computing resources, because an "expensive" technology has no prospect of practical application. While discussing the problem with them, I was surprised to find that they had implemented a dynamic-programming step that should have been O(n*m) as O(n*n*m). What surprised me even more was that they had published many papers about it, had even given their algorithm a special name, and had nominated it for a prize at a scientific conference. The researchers at Bell Labs were of course extremely smart, but they had all come from mathematics, physics, or electrical engineering and had never studied computer science or algorithms, which is the only reason they made such a basic mistake. I imagine those people will never again laugh at those who study computer science!
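
The article does not describe the recurrence Bell Labs actually implemented, so the toy below is only an illustration, under made-up assumptions, of how an extra factor of n typically creeps into a dynamic program: the slow version re-scans every earlier stage inside the inner loop, while the fast version carries the needed summary forward and stays at O(n*m).

```cpp
#include <algorithm>
#include <vector>
using std::max;
using std::vector;

// Toy recurrence (not the Bell Labs one): with n stages and m choices per stage,
//   score(i, j) = gain[i][j] + best score reached at ANY earlier stage.
// Gains are assumed nonnegative so that "no earlier stage" counts as 0.

// O(n^2 * m): re-scans every earlier stage for every cell.
double slow(const vector<vector<double>>& gain) {
    int n = gain.size(), m = gain[0].size();
    vector<double> stage_best(n, 0.0);
    double answer = 0.0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            double prev = 0.0;
            for (int k = 0; k < i; ++k)              // the extra factor of n lives here
                prev = max(prev, stage_best[k]);
            stage_best[i] = max(stage_best[i], gain[i][j] + prev);
        }
        answer = max(answer, stage_best[i]);
    }
    return answer;
}

// O(n * m): keeps the best score seen so far instead of re-scanning.
double fast(const vector<vector<double>>& gain) {
    int n = gain.size(), m = gain[0].size();
    double best_before = 0.0, answer = 0.0;
    for (int i = 0; i < n; ++i) {
        double stage_best = 0.0;
        for (int j = 0; j < m; ++j)
            stage_best = max(stage_best, gain[i][j] + best_before);
        best_before = max(best_before, stage_best);
        answer = max(answer, stage_best);
    }
    return answer;
}
```

Both functions return the same value; the only difference is whether the "best so far" is recomputed or maintained, and for a large-vocabulary system that difference can be the gap between a workstation and a supercomputer.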


Algorithms in the Internet Age


Some people may ask, "Computers are so fast today; are algorithms still important?" In fact, there will never be a computer that is too fast, because we will always come up with new applications. Thanks to Moore's Law, the computing power of computers grows rapidly every year while prices keep falling. But we should not forget that the amount of information to be processed grows exponentially. Everyone now creates a large amount of data every day (photos, video, voice, text, and so on), and ever more advanced means of recording and storage have made the volume of information about each of us explode. The Internet's traffic and logs are also growing rapidly. In scientific research, as methods advance, the volume of data has reached unprecedented levels. 3D graphics, massive data processing, machine learning, and speech recognition all demand enormous amounts of computation. In the Internet era, more and more challenges can only be met with excellent algorithms.

Let us look at another example from the Internet age. In a web or mobile search, if a user wants to find a nearby coffee shop, how should the search engine handle the request?

The simplest way is to find all the coffee shops in the city, compute the distance from each one to the user, sort by distance, and return the nearest results. But how do we compute the distance? Graph theory offers several classic algorithms for computing distances over a road network.
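
A minimal sketch of this brute-force approach, using straight-line distance as a stand-in for real travel distance (a production system would use road-network distances computed with the shortest-path algorithms mentioned above):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

struct Shop { std::string name; double x, y; };   // coordinates in some planar projection

// Brute force: measure the distance from the user to every shop in the city,
// sort by that distance, and keep the k nearest. O(n log n) work per query.
std::vector<Shop> nearest(std::vector<Shop> shops, double ux, double uy, std::size_t k) {
    std::sort(shops.begin(), shops.end(), [&](const Shop& a, const Shop& b) {
        return std::hypot(a.x - ux, a.y - uy) < std::hypot(b.x - ux, b.y - uy);
    });
    if (shops.size() > k) shops.resize(k);
    return shops;
}
```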

This may be the most intuitive method, but it is certainly not the fastest. If the city has only a few coffee shops, it works fine; the amount of computation is small. But if the city has many coffee shops and many users are issuing similar searches, the servers come under heavy pressure. In that case, how do we optimize the algorithm?

First, we can "pre-process" all the coffee shops in the city. For example, divide the city into a number of grid cells, place the user into one cell according to his location, and sort by distance only the coffee shops inside that cell.
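
A sketch of this preprocessing idea, assuming shop locations are given in a planar coordinate system: every shop is bucketed into a fixed-size cell keyed by integer cell coordinates, and a query then only sorts the shops in the user's own cell.

```cpp
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct Shop { double x, y; };                      // a shop is just a point here

// Fixed-size grid over the city: each shop goes into exactly one cell.
struct Grid {
    double cell;                                   // edge length of one cell
    std::map<std::pair<long, long>, std::vector<Shop>> buckets;

    explicit Grid(double cell_size) : cell(cell_size) {}

    std::pair<long, long> key(double x, double y) const {
        return { static_cast<long>(std::floor(x / cell)),
                 static_cast<long>(std::floor(y / cell)) };
    }

    void insert(const Shop& s) { buckets[key(s.x, s.y)].push_back(s); }

    // Only the handful of shops sharing the user's cell need to be sorted,
    // instead of every shop in the city.
    const std::vector<Shop>& candidates(double ux, double uy) {
        return buckets[key(ux, uy)];
    }
};
```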

The problem is that if every cell is the same size, the vast majority of results will fall into a handful of cells in the city center, while the suburban cells contain almost nothing. So the cells in the city center should be subdivided further. Taking this further, the grid should be organized as a "tree structure": at the top is one large cell covering the whole city, and the cells become smaller and smaller level by level, which makes precise search possible. If the bottom-level cell does not contain enough results, the search scope can be widened step by step, one level at a time.
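
The article describes this hierarchy only informally. One simple way to realize a "tree of grids" (my own assumption, not necessarily what a real search engine does) is a multi-resolution grid in which each level doubles the cell size; the query starts at the finest level and widens one level at a time until it has enough candidates.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

struct Shop { double x, y; };

// Level 0 is the finest grid; every level above it doubles the cell size, so the
// top level behaves like the single big cell covering the whole city.
class MultiGrid {
public:
    MultiGrid(double base_cell, int levels) : base_(base_cell), levels_(levels) {}

    void insert(const Shop& s) {
        for (int L = 0; L < levels_; ++L)           // a real tree would store the point once
            buckets_[key(L, s.x, s.y)].push_back(s);
    }

    // Walk up the levels until at least `want` candidates are found.
    std::vector<Shop> candidates(double ux, double uy, std::size_t want) const {
        std::vector<Shop> found;
        for (int L = 0; L < levels_; ++L) {
            auto it = buckets_.find(key(L, ux, uy));
            if (it == buckets_.end()) continue;
            found = it->second;                     // widest (largest) hit so far
            if (found.size() >= want) break;        // enough results: stop widening
        }
        return found;
    }

private:
    std::tuple<int, long, long> key(int L, double x, double y) const {
        double cell = base_ * std::pow(2.0, L);
        return { L, static_cast<long>(std::floor(x / cell)),
                    static_cast<long>(std::floor(y / cell)) };
    }

    double base_;
    int levels_;
    std::map<std::tuple<int, long, long>, std::vector<Shop>> buckets_;
};
```

Dense cells in the city center can simply be subdivided into more levels; sparse suburban queries climb higher before they find enough shops.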

The algorithm above works well for coffee shops, but is it universal? The answer is no. A coffee shop can be abstracted as a "point"; what if we need to search for an "area"? For example, a user wants to go to a reservoir, and the reservoir has several entrances; which one is closest to the user? In that case, the tree structure above becomes an "R-tree", because each internal node of the tree stores a range, that is, a bounding rectangle (see: http://www.cs.umd.edu/~hjs/rtrees/index.html).
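
A complete R-tree is too long to show here, but the primitive its nearest-neighbor search relies on is easy to illustrate: the minimum possible distance from the query point to a node's bounding rectangle. During the search, any subtree whose rectangle is already farther away than the best answer found so far can be skipped entirely. A minimal sketch:

```cpp
#include <algorithm>
#include <cmath>

// Axis-aligned bounding rectangle, the kind of range stored in an R-tree node.
struct Rect { double xmin, ymin, xmax, ymax; };

// Smallest distance from point (px, py) to any point inside r; zero if the
// point already lies inside the rectangle. R-tree search uses this value to
// prune subtrees that cannot possibly contain a closer entrance.
double min_dist(double px, double py, const Rect& r) {
    double dx = std::max({r.xmin - px, 0.0, px - r.xmax});
    double dy = std::max({r.ymin - py, 0.0, py - r.ymax});
    return std::hypot(dx, dy);
}
```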

From this small example we can see that application requirements vary endlessly. In many situations we need to break a complex problem into several simpler subproblems, and then choose the appropriate algorithms and data structures for each.


Parallel Algorithms: Google's core advantages


The example above is just a small case at Google! Every day Google's site handles more than a billion search queries, Gmail stores the 2 GB mailboxes of tens of millions of users, and Google Earth lets hundreds of thousands of users tour the globe at the same time, delivering suitable imagery to each of them over the Internet. Without good algorithms, none of these applications could become reality.

In these applications, even the most basic problems pose great challenges to traditional computing. For example, more than a billion users visit Google's site every day, and simply using Google's services generates a huge volume of logs. Because the logs grow rapidly every second, we must have clever ways of handling them. I have asked interview candidates how they would analyze and process the logs; many gave answers that were logically correct but almost impossible to apply in practice. With their algorithms, even tens of thousands of machines could not process the data as fast as it is generated.

So how does Google solve these problems?

First, in the Internet era, even the best algorithms must be executed in a parallel computing environment. In Google's data centers we use very large parallel clusters of machines. But when traditional parallel algorithms run on them, efficiency drops off quickly as the number of machines grows. That is to say, if ten machines give a five-fold speed-up, a thousand machines may give only a few dozen times. No company can afford such waste. Moreover, with many parallel algorithms, as soon as a single node fails, the entire computation is lost.

So how does Google develop efficient and fault-tolerant parallel computing?

Jeff Dean, one of Google's most senior computer scientists, recognized that the vast majority of Google's data processing can be reduced to a single simple parallel pattern: map and reduce (http://labs.google.com/papers/mapreduce.html). This pattern achieves high efficiency and high scalability across many different kinds of computation (that is, even if a thousand machines cannot deliver a thousand-fold speed-up, they deliver at least several hundred-fold). Another major feature of MapReduce is that it can combine a large number of cheap machines into a powerful server farm. Finally, its fault tolerance is exceptionally good: even if half of the machines in the farm go down, the whole farm can still run. It was precisely this stroke of genius that produced the MapReduce algorithm. With it, Google can grow its computing capacity almost without limit and keep pace with ever-changing Internet applications.
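
The canonical example from the MapReduce paper is counting word occurrences across a large collection of documents. The sketch below shows only the programming model, running on a single machine: the user writes a map function that emits (word, 1) pairs and a reduce function that sums the counts for each word. The framework's real work, not shown here, is to spread the map calls across thousands of machines, group the intermediate pairs by key, and re-execute the tasks of any machine that fails.

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, long>;

// Map phase: called once per input document, emits (word, 1) for every word.
std::vector<KV> map_fn(const std::string& doc) {
    std::vector<KV> out;
    std::istringstream in(doc);
    std::string word;
    while (in >> word) out.push_back({word, 1});
    return out;
}

// Reduce phase: called once per distinct key, combines all values for that key.
long reduce_fn(const std::string& /*word*/, const std::vector<long>& counts) {
    long total = 0;
    for (long c : counts) total += c;
    return total;
}

int main() {
    std::vector<std::string> docs = {"the quick brown fox", "the lazy dog", "the fox"};

    // "Shuffle" step: group every emitted value by its key.
    std::map<std::string, std::vector<long>> grouped;
    for (const auto& doc : docs)
        for (const auto& kv : map_fn(doc))
            grouped[kv.first].push_back(kv.second);

    for (const auto& entry : grouped)
        std::cout << entry.first << " " << reduce_fn(entry.first, entry.second) << "\n";
}
```

Because the map calls are independent of one another and each reduce call sees only one key, both phases can be distributed and restarted freely, which is where the scalability and fault tolerance described above come from.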


Algorithms are not limited to computers and networks


Here is an example from outside the computer field: in high-energy physics research, many experiments generate several terabytes of data per second. But because processing and storage capacity cannot keep up, scientists have to discard most of the data unprocessed. Keep in mind, though, that information about new elements may well be hidden in exactly the data we fail to process. Likewise, algorithms can change human life in every other field: in the study of the human genome, algorithms may lead to the invention of new methods of treatment; in national security, effective algorithms may prevent the next 9/11; in meteorology, algorithms can better predict future natural disasters and save lives.

Therefore, if you view the development of computers against this backdrop of rapidly growing applications and data, you will surely conclude that the importance of algorithms is not decreasing but increasing.


Seven suggestions for programmers


(1) Build your "internal strength". Do not spend all your time chasing the fashionable programming languages and tools, or the specific skills listed in companies' recruiting advertisements. Learn the basic courses well: data structures, algorithms, databases, operating system principles, computer architecture, computer networks, discrete mathematics, and so on. You might try the exercises in The Art of Computer Programming by Donald Knuth; if you can solve most of them, your algorithmic "internal strength" is already considerable.

(2) Practice more. Accumulate experience and consolidate knowledge through real programming. Many Chinese university graduates lack programming and debugging experience: in coursework they learn C just well enough to pass the exam, and in projects a program counts as finished as long as it compiles, runs, and its input and output meet the requirements. That is not enough. When writing a program, you should keep asking how to make it more concise, more efficient, and of higher quality. I suggest that over four years of university you accumulate the experience of writing on the order of 100,000 lines of code. One thing must be understood: good programmers are made by writing code, not just by studying.

(3) Stay down to earth. Do not look down on any practical work, even seemingly simple coding or testing. We should persistently pursue meticulousness and professionalism in the details. I have found that many programmers have only a superficial grasp of what they have learned and never dig deeper. For example, after learning C++, do you know how an object is initialized in the compiled assembly code? How are the object's members laid out in memory? When a member function is called, what extra work does the compiler insert into the generated code? How is a virtual function call dispatched? These details are not covered thoroughly in programming-language or compiler courses; they can only be mastered through hands-on work.
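
A concrete way to dig into these questions is to compile a tiny class hierarchy and read the generated assembly, for example with `g++ -S` or an online compiler explorer. The example below is just a starting point for such an experiment; the comments describe what typical implementations do, not what the language standard mandates.

```cpp
#include <iostream>

class Shape {
public:
    Shape(double s) : scale(s) {}      // typical compilers store the vtable pointer
                                       // into the object before members like `scale`
    virtual ~Shape() = default;
    virtual double area() const { return 0.0; }            // dispatched through the vtable
    double scaled_area() const { return scale * area(); }  // non-virtual, but still receives
                                                            // a hidden `this` argument
private:
    double scale;
};

class Circle : public Shape {
public:
    Circle(double s, double r) : Shape(s), radius(r) {}
    double area() const override { return 3.14159265358979 * radius * radius; }
private:
    double radius;
};

int main() {
    Shape* p = new Circle(2.0, 1.0);
    // This call cannot be resolved at compile time: the generated code loads the
    // vtable pointer stored in the object, then the function pointer from the table.
    std::cout << p->area() << " " << p->scaled_area() << "\n";
    delete p;
}
```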

(4) Pay attention to mathematics. Mathematics is the gymnastics of thought, and it is everywhere. To study computing well you should at least learn discrete mathematics, probability theory, Boolean algebra, set theory, and mathematical logic. None of this is especially difficult, but it will be of great help in your future work, especially if you become interested in "math-intensive" fields such as video and image processing, where this knowledge becomes a powerful tool in your hands.

(5) Cultivate team spirit and learn to cooperate with others. Software engineering today is no longer a one-person effort; only teamwork leads to success. People who cannot work with others will never accomplish much. Look for opportunities to work with others on projects.

(6) Encourage innovation and cultivate curiosity. Without mastering the fundamental principles behind a technology, you cannot adapt it flexibly or improve on it. To become a good programmer (and this is true in any industry), it is important to develop the habits of inquiry, curiosity, innovation, hands-on practice, and cooperation: do not be satisfied with spoon-fed instruction, with merely passing exams, or with surface-level understanding. This cannot be achieved overnight.

(7) "work" strategically ". Find a meaningful summer job or part-time job without affecting your studies. Find a technology-oriented company and complete the programs that will truly be used by users under the guidance of a good "Boss. Don't rush to a place where you want to be "head" and stand alone, because learning from others is your purpose. The same is true for finding a job. You should not only look at the treatment and title, but also select an environment where you can learn, an enterprise that is willing to train employees, or a professional company that attaches importance to you. Finally, we need to pick a good boss.

I hope everyone will seize the opportunity, develop good learning habits, study algorithms thoroughly, and have a bright future!

This article is from a CSDN blog. When reproducing it, please indicate the source: http://blog.csdn.net/livelylittlefish/archive/2008/05/04/2386904.aspx
