Must programmers know algorithms?

Last Update: 2015-05-28 Source: Internet

Author: User


Since this chapter is titled "Programmers and Algorithms", we must first deal with a basic question: must a programmer know algorithms? This is a controversial issue. It is not as weighty as "to be or not to be", but it is by no means an easy topic. The comments and replies that friends have left on my "Algorithm Series" blog column have not all been the praise and encouragement I hoped for; there has often been some cold water, such as "Is this even an algorithm?" or "Please explain what role this algorithm can play in system XX."

Once, a reader emailed me: "What you write is child's play that can be done in a few dozen lines of code. Can you do some advanced algorithms?" I asked him what he considered an advanced algorithm, and he replied, "Things like genetic algorithms, ant colony algorithms, and so on." So I gave him an example of a genetic algorithm solving the 0-1 knapsack problem (see Chapter 16) and told him that this algorithm is only a few dozen lines of code; how could it count as advanced? He refused to accept that it was a genetic algorithm until I pointed him to the genetic algorithm source code that Denis Cormier had published on a North Carolina State University server. Only then did he believe that the genetic algorithm, which he had always thought unfathomable, rests on such simple principles.

Another reader told me bluntly that the problem I wrote about, "dividing 8 liters of water equally using three buckets", does not count as an algorithm; in his view only artificial intelligence like "Deep Blue" counts. I told him that the basic theory behind computer chess is the game tree, possibly combined with an expert system. But he thought the game tree was also a very advanced algorithm, so I gave him a tic-tac-toe program (see Chapter 23) and told him that it uses a game-tree search algorithm and is so smart that you can never beat it (because tic-tac-toe is simple enough for the algorithm to search every possible state). I believe he was surprised, because the algorithm is less than 100 lines of code.
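To make the "under 100 lines" point concrete, here is a minimal sketch of an exhaustive game-tree (minimax) search for tic-tac-toe. It is not the program from Chapter 23; the board encoding (0 empty, 1 for X, -1 for O) and the function names are my own assumptions. Because the game is so small, the search can afford to visit every reachable position without any pruning.

```c
/* Board: 9 cells, 0 = empty, 1 = X, -1 = O (an assumed encoding). */
static const int kLines[8][3] = {
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   /* rows */
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   /* columns */
    {0, 4, 8}, {2, 4, 6}               /* diagonals */
};

/* Returns 1 if X has three in a row, -1 if O does, 0 otherwise. */
static int winner(const int b[9])
{
    for (int i = 0; i < 8; i++) {
        int s = b[kLines[i][0]];
        if (s != 0 && s == b[kLines[i][1]] && s == b[kLines[i][2]])
            return s;
    }
    return 0;
}

/* Exhaustive minimax: value of the position from X's point of view
   (+1 X wins, 0 draw, -1 O wins), with `player` (1 or -1) to move. */
int minimax(int b[9], int player)
{
    int w = winner(b);
    if (w != 0)
        return w;
    int best = -2 * player;   /* worse than any real outcome for the mover */
    int moved = 0;
    for (int i = 0; i < 9; i++) {
        if (b[i] != 0)
            continue;
        b[i] = player;                    /* try the move */
        int v = minimax(b, -player);
        b[i] = 0;                         /* undo it */
        moved = 1;
        if (player == 1 ? v > best : v < best)
            best = v;
    }
    return moved ? best : 0;              /* full board, no winner: draw */
}

/* Index of the best move for `player`, or -1 if the board is full. */
int best_move(int b[9], int player)
{
    int best = -2 * player, move = -1;
    for (int i = 0; i < 9; i++) {
        if (b[i] != 0)
            continue;
        b[i] = player;
        int v = minimax(b, -player);
        b[i] = 0;
        if (move < 0 || (player == 1 ? v > best : v < best)) {
            best = v;
            move = i;
        }
    }
    return move;
}
```

Evaluating `minimax` on the empty board returns 0, the well-known result that perfect play by both sides ends in a draw; a program that always plays `best_move` therefore never loses.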

I think the main reason behind the examples above is that people understand "algorithm" differently. Many people's understanding of algorithms is too narrow: they think only things whose names contain "XX algorithm" are algorithms. I believe the essence of an algorithm is solving problems, and any code that solves a problem is an algorithm. Before discussing programmers and algorithms, let's explore the most basic question: what is an algorithm?

1.1 What is an algorithm

Introduction to Algorithms describes an algorithm as any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output. In The Art of Computer Programming, Knuth describes an algorithm as a process that starts from an initial step, executes the steps in a prescribed order, and finally terminates with a result. In Data Structures and Algorithm Analysis, Weiss describes an algorithm as a series of computational steps that transform input data into output results.

Although there is no generally accepted formal definition of "algorithm", its basic elements or characteristics are well established. Knuth summarizes four characteristics of an algorithm:

Definiteness. Each step of the algorithm is precisely defined, and its expected result is determined.
Finiteness. An algorithm must consist of a finite number of steps, which may be a few or millions, but it must have a definite termination condition.
Effectiveness. In general we expect an algorithm to end with a correct result, which means every step of the algorithm must be feasible. If even one step cannot be carried out, the algorithm fails, or rather it cannot be called an algorithm at all.
Input and output. An algorithm always solves a specific problem; the data describing the problem is the algorithm's input, and the expected result is its output. An algorithm without input is pointless, and an algorithm without output is useless.
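Knuth opens The Art of Computer Programming with Euclid's algorithm for the greatest common divisor (his Algorithm E), and it shows all four characteristics in a few lines. The following is a minimal C sketch of it, not code from this book; the comments map each part to one of the characteristics listed above.

```c
/* Euclid's algorithm: greatest common divisor of m and n (not both zero). */
unsigned int gcd(unsigned int m, unsigned int n)   /* input: two integers */
{
    while (n != 0) {              /* finiteness: n strictly decreases, so the loop ends */
        unsigned int r = m % n;   /* definiteness, effectiveness: one exact, doable step */
        m = n;
        n = r;
    }
    return m;                     /* output: the greatest common divisor */
}
```

For example, `gcd(119, 544)` takes the steps 544 mod 119 = 68, 119 mod 68 = 51, 68 mod 51 = 17, 51 mod 17 = 0, and returns 17.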

Algorithms require some mathematical foundation, but nothing restricts algorithms to solving mathematical problems only. Some people regard greedy methods, divide and conquer, dynamic programming, linear programming, search, and enumeration (including exhaustive enumeration) as algorithms; in fact these are only patterns commonly used in designing algorithms (Knuth calls them design paradigms). Likewise, a computer program is only one way of expressing an algorithm; pseudocode, flowcharts, various notations, and control tables are also common forms. Sequential execution, parallel execution (including distributed computing), recursion, and iteration are common ways of implementing an algorithm.
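To illustrate that recursion and iteration are ways of implementing an algorithm rather than different algorithms, here is a minimal sketch of binary search written both ways. The function names are my own; both versions inspect the same sequence of midpoints and therefore return the same result.

```c
/* Recursive form: search the sorted range a[lo..hi) for key.
   Returns the index of key, or -1 if it is absent. */
int bsearch_rec(const int a[], int lo, int hi, int key)
{
    if (lo >= hi)
        return -1;                        /* empty range: not found */
    int mid = lo + (hi - lo) / 2;         /* avoids overflow of lo + hi */
    if (a[mid] == key)
        return mid;
    if (a[mid] < key)
        return bsearch_rec(a, mid + 1, hi, key);
    return bsearch_rec(a, lo, mid, key);
}

/* Iterative form of the same algorithm over a[0..n). */
int bsearch_iter(const int a[], int n, int key)
{
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key)
            return mid;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

The iterative form simply replaces the tail calls with updates to `lo` and `hi`; the sequence of comparisons is unchanged.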

Based on the analysis and quotations above, I define an algorithm as follows: an algorithm is an elaborately designed mathematical model for solving a specific problem, together with a series of operational steps on that model that process and transform the input data describing the problem and finally produce a definite result. I use the word "elaborately" because I see algorithm design as a process in which knowledge and experience collide intensely in the human mind, and I regard an algorithm as the intellectual result that usually comes out of such a "burst of inner cosmos", an intense flash of insight.

1.2 Must programmers know algorithms?

Many people, perhaps influenced by Hollywood blockbusters, think computers are all-powerful, but that is not the case. A computer is really a dumb tool, so dumb that it has almost no intelligence (at least for now). It can do the same thing for years without complaint, but if you don't tell it what to do, it does nothing. The truly creative work is done by people called programmers; the computer just does the tedious labor that humans don't want to do. Image recognition, for example, requires processing the data byte by byte, extracting feature values, and then comparing and matching those feature values against a huge amount of data until your eyes glaze over; humans would not do such tedious work. A computer is willing to do it, but only if you tell it what to do. An algorithm can be understood as the technique of telling the computer what to do. Some people think programming is like building with blocks: directly using components, libraries, or even classes and APIs developed by others, under the flattering name of "don't reinvent the wheel". I think this is really system integration. If a programmer's daily work were just stacking blocks, that would be an enviable job, but I know that is not the reality. Block-stacking programming like this could be done by a computer, and there would be no need for a person to do it, because human labor costs more than computing. I have often met people posting in forums for help, such as "Please give me code to read a fixed-format text file into memory" or "Can anyone help me sort this array of structures? The examples in the book only sort arrays of integers." They are so helpless that, if replies in the forum did not earn reward points, I am afraid nobody would bother with them.
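The second forum question shows how little separates the two cases: sorting an array of structures uses the same algorithm as sorting integers, and only the comparison changes. A minimal sketch using the C standard library's `qsort`; the `Student` type and its fields are hypothetical, chosen only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for illustration. */
typedef struct {
    char name[16];
    int score;
} Student;

/* qsort comparator: ascending by score, ties broken by name. */
static int cmp_student(const void *pa, const void *pb)
{
    const Student *a = pa;
    const Student *b = pb;
    if (a->score != b->score)
        return (a->score > b->score) - (a->score < b->score);
    return strcmp(a->name, b->name);
}

/* Sort n students in place; the sorting algorithm itself is qsort's. */
void sort_students(Student *s, size_t n)
{
    qsort(s, n, sizeof *s, cmp_student);
}
```

The comparator returns a negative, zero, or positive value, which is the contract `qsort` expects; the `(a > b) - (a < b)` idiom avoids the overflow that `a->score - b->score` could cause.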

I would say that most programmers do not need to know the algorithms of every specialized field, but they should be able to design algorithms to solve the problems they face. Some classic problems in certain fields already have efficient algorithms thanks to the work of our predecessors, and many chapters of this book introduce such algorithms, for example the stable matching problem and the A* algorithm. More often, however, the problem you face has no ready-made algorithm, and the programmer must be creative. Algorithm design requires a good mathematical foundation, but mathematics is not the only knowledge needed; computer technology and some foundational subjects (such as data structures) are also necessary. Some people say "program = algorithm + data structure". Although this is not entirely accurate, it points out the two most important aspects of a computer program: algorithms and data structures. The two are always closely linked. The algorithm can be seen as the idea for solving the problem; it is the most creative part of a program and the key thing that distinguishes one program from another. The data structure is the carrier of this idea.

Again, like most people, I am not asking every programmer to master a wide variety of algorithms. Most programmers may never, in their entire careers, encounter the kinds of problems posed in the International Collegiate Programming Contest organized by the ACM (Association for Computing Machinery), but it is unthinkable to say that they never use data structures and algorithms. People who say data structures and algorithms are useless say so because they don't use them; they don't use them because they don't think of them; and they don't think of them because they don't know them. Please don't take offense; I don't mean to attack anyone. In many cases people don't use algorithms simply because they don't know them. Here is a typical example.

1.2.1 A tragedy caused by a queue

My team was responsible for developing and maintaining the "EPON service management module" of an optical access network product. It is carrier-grade network equipment, so the performance requirements in every respect are very high. One day, the engineer in charge of integration testing came to me and said that today's daily build was abnormal: all line cards (the boards that carry data services) took up to 4 minutes longer to come online than with yesterday's version. I was alarmed, because for carrier-grade equipment the time a line card takes to come up after power-on is the service recovery time, and such a delay in service recovery is unacceptable. So I checked the previous day's code check-in records and quickly found the problem. The task list for the current version included a feature that records a data change log for line cards. The requirement said that each line card maintains a log buffer, and whenever a user operation causes a data change, a change record is written. The batch data synchronization performed when a line card comes online also counts as a data change and is logged as well. Because this is an embedded device, the log buffer on a line card is limited to at most 1000 records; once 1000 records have been written, each new record overwrites an old one, which means the buffer retains only the 1000 most recently written records. A new colleague took the task and checked the code in just before leaving work the previous day (programmers, remember: never check in code right before you leave work). His implementation was roughly as follows (comments added by me):

```c
#include <string.h>   /* memmove, memset */

/* INT32U and EPON_SYNC_LOG_DATA come from the product's own headers (not shown). */
#define SYNC_LOG_CNT          1000   /* buffer capacity: at most 1000 records */
#define SYNC_LOG_MEMOVER_CNT  50     /* slots freed each time the buffer fills up */

typedef struct
{
    INT32U logCnt;                               /* number of valid records */
    EPON_SYNC_LOG_DATA syncLogs[SYNC_LOG_CNT];
} EPON_SYNC_LOG;

EPON_SYNC_LOG s_EponSyncLog;

void Epon_Sync_Log_Add(EPON_SYNC_LOG_DATA *pLogData)
{
    INT32U syncLogCnt = s_EponSyncLog.logCnt;

    if (syncLogCnt >= SYNC_LOG_CNT)
    {
        /* Buffer full: move the newest 950 records forward, discarding the
           oldest 50 and freeing 50 slots for new records */
        memmove(s_EponSyncLog.syncLogs,
                s_EponSyncLog.syncLogs + SYNC_LOG_MEMOVER_CNT,
                (SYNC_LOG_CNT - SYNC_LOG_MEMOVER_CNT) * sizeof(EPON_SYNC_LOG_DATA));
        /* Clear the newly freed space */
        memset(s_EponSyncLog.syncLogs + (SYNC_LOG_CNT - SYNC_LOG_MEMOVER_CNT), 0,
               SYNC_LOG_MEMOVER_CNT * sizeof(EPON_SYNC_LOG_DATA));
        /* Write the current record */
        memmove(s_EponSyncLog.syncLogs + (SYNC_LOG_CNT - SYNC_LOG_MEMOVER_CNT),
                pLogData, sizeof(EPON_SYNC_LOG_DATA));
        s_EponSyncLog.logCnt = SYNC_LOG_CNT - SYNC_LOG_MEMOVER_CNT + 1;
        return;
    }

    /* If the buffer has space, write the current record directly */
    memmove(s_EponSyncLog.syncLogs + syncLogCnt, pLogData, sizeof(EPON_SYNC_LOG_DATA));
    s_EponSyncLog.logCnt++;
}
```

This scheme stores logs in an array of 1000 records, with a counter recording how many valid entries have been written. The data structure itself is fine, but once the buffer is full, old records must be overwritten. Done naively, every new record would require moving the other 999 records forward to make room, which would make Epon_Sync_Log_Add() dramatically slower. Aware of this, the colleague added a threshold to his scheme: the constant SYNC_LOG_MEMOVER_CNT, defined as 50. When the buffer is full, the function moves 950 records forward at once, freeing 50 slots, so that not every new record triggers a move of all the data. He clearly put some thought into it: as long as Epon_Sync_Log_Add() is not called very often, this is a reasonable compromise between functionality and performance. His own tests looked acceptable, and in his rush to check in before leaving work there was no time for a code walkthrough or peer review. However, he had not considered the bulk data synchronization that happens when a line card comes online; in that case Epon_Sync_Log_Add() is called far more often than this threshold can tolerate. Profiling the task showed that much of the time was spent moving records inside Epon_Sync_Log_Add(); even with the SYNC_LOG_MEMOVER_CNT threshold, performance was still poor.

In fact, a ring queue (circular buffer) is usually the best choice for reading and writing a fixed-length buffer like this one. Let's look at the ring queue shown in Figure 1-1.

Figure 1-1 Ring Queue

There is no ring structure in computer memory, so a ring queue is simulated with a linear table: when the data pointer reaches the end of the table, it wraps around and starts again from position 0. In practice there is no need to check explicitly whether the pointer has reached the end of the table; the modulo operation is usually used to handle the wraparound uniformly. Let the length of the linear table simulating the ring queue be N, the head pointer be head, and the tail pointer be tail. Each time a record is added, the new tail pointer is computed as follows:

tail = (tail + 1) % N

For the requirements in this example, when (tail + 1) % N equals head, the queue is full; simply move the head pointer forward one position, and the new record can then be written at position tail. With a ring queue, no records ever need to be moved, and the performance problem described at the beginning of this section disappears. To borrow a slogan: "Nothing is impossible; there are only things you haven't thought of." See, wasn't I right?
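A minimal C sketch of a log ring buffer following the head/tail rule above. It is not the product code: the record type `LogRecord` and the names `LogRing`, `ring_add`, `ring_count`, and `ring_at` are hypothetical, since the real EPON record type is not shown. Because "full" is detected as `(tail + 1) % N == head`, one slot always stays empty, so the array has `LOG_CAPACITY + 1` slots to hold 1000 records.

```c
/* Hypothetical record type; the real EPON_SYNC_LOG_DATA is not shown. */
typedef struct {
    unsigned int seq;                  /* illustrative payload */
} LogRecord;

#define LOG_CAPACITY 1000              /* keep the most recent 1000 records */
#define RING_SIZE (LOG_CAPACITY + 1)   /* N: one slot stays empty to tell full from empty */

typedef struct {
    unsigned int head;                 /* index of the oldest record */
    unsigned int tail;                 /* index where the next record is written */
    LogRecord slots[RING_SIZE];
} LogRing;

void ring_init(LogRing *r)
{
    r->head = 0;
    r->tail = 0;
}

/* Number of records currently held. */
unsigned int ring_count(const LogRing *r)
{
    return (r->tail + RING_SIZE - r->head) % RING_SIZE;
}

/* Constant-time append: when full, advance head so the oldest record is
   dropped, instead of moving any data. */
void ring_add(LogRing *r, const LogRecord *rec)
{
    if ((r->tail + 1) % RING_SIZE == r->head)
        r->head = (r->head + 1) % RING_SIZE;   /* full: discard the oldest */
    r->slots[r->tail] = *rec;
    r->tail = (r->tail + 1) % RING_SIZE;       /* tail = (tail + 1) % N */
}

/* The i-th oldest record, for 0 <= i < ring_count(r). */
const LogRecord *ring_at(const LogRing *r, unsigned int i)
{
    return &r->slots[(r->head + i) % RING_SIZE];
}
```

After 2500 calls to `ring_add` with sequence numbers 1 to 2500, the buffer holds exactly the last 1000 records, 1501 through 2500, and each call did a constant amount of work. A variant that keeps an explicit record count can use all N slots; either design avoids the bulk `memmove`.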

1.2.2 My first algorithm

My first job was writing the image preprocessing system for raster-image vectorization software, which recognizes bitmap drawings scanned from paper engineering drawings and converts them into vector graphics files that various CAD programs can process. One function of the preprocessing system is removing stains from a binarized raster bitmap (a black-and-white bitmap). A stain on a raster bitmap may be a speck that was already on the original paper before scanning, or noise introduced by the scanner. Stains can disturb the vectorization process and cause graphics and symbols to be misrecognized, so they must be removed beforehand.

At the time I didn't know about wavelet algorithms or the various image filtering algorithms; I simply worked out a solution from my own understanding of the problem. First, I observed that in drawing files, meaningful figures such as lines, circles, and arcs consist of at least 5 connected points, whereas stains generally consist of fewer than 5 connected points (larger stains are removed by other methods). So I defined a stain as follows: if the total number of points connected to a point is less than 5, the points connected together form a stain. Based on this definition, I devised my algorithm. Search the bitmap starting from its first point. If a point is 1 (1 is black, a point on the drawing; 0 is white, the background color), increment the connected-point counter by 1, then continue searching from that point in each of the 8 directions; if the adjacent point in a direction is 0, stop searching in that direction. If more than 4 connected points are found, the point lies on a figure, and the search for that point ends. If the search completes with no more than 4 connected points, the point is part of a stain and must be set to 0 (clearing the stain).

The implementation first defines a data structure that stores information about connected points during the search:

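```c
/* POINT: a structure with x and y members, such as the Win32 POINT */
#define MAX_DIRTY_POINT 5   /* number of point positions kept per search */

typedef struct tagRESULT
{
    POINT pts[MAX_DIRTY_POINT];   /* positions of the first 5 points found by the search */
    int count;                    /* number of connected points found */
} RESULT;
```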

This data structure has two members: count is the number of connected points found during the search, and pts is a linear table recording the positions of those points. The positions are recorded so that, if the points turn out to be a stain once the search is complete, their colors can be cleared directly using the recorded positions.

```c
/* The 8 search directions */
POINT dir[] = { {-1, 0}, {-1, -1}, {0, -1}, {1, -1}, {1, 0}, {1, 1}, {0, 1}, {-1, 1} };

void SearchDirty(char bmp[MAX_BMP_WIDTH][MAX_BMP_HEIGHT], int x, int y, RESULT *result)
{
    for (int i = 0; i < (int)(sizeof(dir) / sizeof(dir[0])); i++)
    {
        int nx = x + dir[i].x;
        int ny = y + dir[i].y;
        if ((nx >= 0 && nx < MAX_BMP_WIDTH) && (ny >= 0 && ny < MAX_BMP_HEIGHT)
            && (bmp[nx][ny] == 1))
        {
            if (result->count < MAX_DIRTY_POINT)
            {
                /* Record the positions of the first MAX_DIRTY_POINT points */
                result->pts[result->count].x = nx;
                result->pts[result->count].y = ny;
            }
            result->count++;
            if (result->count > MAX_DIRTY_POINT)
                break;   /* too many connected points: not a stain, stop */

            /* Visited points are not marked, so the search can step back to a
               point it has already counted and count it again. */
            SearchDirty(bmp, nx, ny, result);
        }
    }
}
```

Searching in 8 directions with a preset direction-vector array dir is a common pattern in maze and board-game searches, and algorithms in this book use this pattern many times. SearchDirty() calls itself recursively to search for connected points in 8 directions and stores the final result in result. If count is greater than 4, the point at [x, y] is a normal figure point; if count is less than or equal to 4, the points connected to [x, y] form a stain. Their positions are recorded in pts, and setting the bitmap data at those positions to 0 removes the stain. The algorithm had no optimizations at all, but fortunately most of a drawing is white background, so there are not many points to search. I opened a test drawing to try it: the speed was acceptable and the result looked very good. The stains I had deliberately dotted on for testing were gone, the small noise points were gone too, and the drawing came out clean. However, this code never made it into the final software. Anyone who has studied mechanical drafting may have guessed why: the algorithm also wipes out the short dashes and dots of dashed and dash-dot lines.
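The text describes, but does not show, the scan that drives the search and clears the stains. Below is a self-contained sketch of that whole step under my own assumptions: a tiny W×H bitmap of `unsigned char`, a hypothetical `Pt` type in place of `POINT`, and hypothetical names `search` and `remove_stains`. It differs from the `SearchDirty()` code above in one deliberate way: each visited point is temporarily marked 2 so it is never counted twice, and the recorded positions are then used either to erase a stain (set to 0) or to restore a figure (set back to 1).

```c
#define W 8           /* hypothetical bitmap size for the sketch */
#define H 8
#define STAIN_MAX 4   /* components of at most 4 connected black points are stains */

typedef struct { int x, y; } Pt;

static const Pt kDir[8] = {
    {-1, 0}, {-1, -1}, {0, -1}, {1, -1}, {1, 0}, {1, 1}, {0, 1}, {-1, 1}
};

/* Depth-first search of the 8-connected component containing (x, y).
   Each visited point is marked 2 and recorded in pts; the search stops
   growing once more than STAIN_MAX points have been found. */
static void search(unsigned char bmp[W][H], int x, int y, Pt pts[], int *count)
{
    if (*count > STAIN_MAX)
        return;                       /* already known to be a figure */
    bmp[x][y] = 2;                    /* mark as visited */
    pts[*count].x = x;
    pts[*count].y = y;
    (*count)++;
    for (int i = 0; i < 8; i++) {
        int nx = x + kDir[i].x, ny = y + kDir[i].y;
        if (nx >= 0 && nx < W && ny >= 0 && ny < H && bmp[nx][ny] == 1)
            search(bmp, nx, ny, pts, count);
    }
}

/* Scan the whole bitmap (1 = black, 0 = white) and erase every stain. */
void remove_stains(unsigned char bmp[W][H])
{
    for (int x = 0; x < W; x++) {
        for (int y = 0; y < H; y++) {
            if (bmp[x][y] != 1)
                continue;
            Pt pts[STAIN_MAX + 1];    /* every marked point is recorded here */
            int count = 0;
            search(bmp, x, y, pts, &count);
            /* Stain: clear it. Figure: restore the marked points to 1. */
            unsigned char v = (count <= STAIN_MAX) ? 0 : 1;
            for (int k = 0; k < count; k++)
                bmp[pts[k].x][pts[k].y] = v;
        }
    }
}
```

Because a search stops after STAIN_MAX + 1 points, the work per starting point stays small even inside large figures. As the text notes, this rule still cannot tell a stain from a short dash, so dashes of fewer than 5 points are removed too.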

It was a trivial problem, but it was the first time I had designed an algorithm to solve (or at least attempt to solve) a problem and implemented it in a program. It made me realize that software is written to solve problems, and the programmer's task is to design algorithms that solve them. Success brings joy, and failure costs nothing; you can always try again. Don't underestimate such things, and don't think that only programs in specialized fields use algorithms. Every small design is an expression of creative algorithmic thinking; even if it fails, it is better than giving up.

1.3 Where is the fun in algorithms?

Algorithms exist in many forms; writing a computer program is just the form programmers are used to, and this book is about studying algorithms in the form of computer programs. The two examples in Section 1.2 are things I experienced myself. Programmers spend most of their time on mundane, trivial programs, but sometimes they need to do creative work. Remember, the programmer is the computer's "God": the computer solves problems because its "God" tells it what to do. So when a problem arises, should "God" go around the forums posting requests for code, or solve the problem on his own?

If you want to solve a problem yourself, how should you go about it? And why should you solve it yourself? Let's answer the first question first: how do you design an algorithm to solve a problem? When people encounter a problem, they first search their existing knowledge and experience for anything related, transform the unknown problem appropriately into one or more known problems, solve those, and finally combine the results into a solution to the original problem. Writing a program that implements an algorithm, so that the computer helps us solve a problem, is no exception; it also requires knowledge and experience. For the computer to help us, we must design an algorithmic program the computer can understand. The first step is to make the computer understand what the problem is, which requires building a mathematical model of the real-world problem. Modeling is a process of abstraction: using logical thinking to grasp the main factors of the problem and ignore the secondary ones. Once the mathematical model is established, the second issue is input and output. Input converts the problem, described in natural language or another form humans understand, into data in the mathematical model; output converts the results computed by the model back into natural language or another form humans understand. Finally comes the design of the algorithm itself, which means designing a sequence of steps that manipulate and transform the model's data until the final result emerges.

Mathematical models, input and output methods, and algorithm steps are the three key elements of designing computer algorithms. For very complex problems, building a mathematical model is extremely difficult, for example the Big Bang model studied by astrophysicists or thermodynamic cooling models of complex geometries. Such problems are beyond the scope of this book. Programmers mostly face not complex theoretical problems but the common, everyday problems of software development; these problems are simple, but not boring. For a simple computer algorithm, building the mathematical model is really a matter of designing an appropriate data structure. This brings us back to the point made earlier: data structures play a very important role in algorithm design. The input and output methods and the algorithm steps are all designed on top of the chosen data structure. A suitable data structure makes it easy to map the original problem onto its properties, makes it easy to output the results it holds in a form people can understand, and gives the most convenient support to every step of the algorithm's transformations. Whether to use a linear table or an associative structure, a tree or a graph, is a question to consider when designing the input, output, and algorithm steps.

Why solve the problem yourself? Einstein said, "Interest is the best teacher." That is, as long as you are interested in something, you will take the initiative to learn and study it, and you will enjoy the process. I divide the fun of algorithms into three levels. At the basic level, you find a specific algorithm to solve a specific practical problem, and the fun is the sense of accomplishment after solving it. At the intermediate level, some algorithms are fun in themselves: understanding how such an algorithm works and writing its code makes your future work easier. At the advanced level, you design algorithms to solve problems yourself, so that others can use your algorithms and enjoy the basic-level fun. Sometimes a problem is one nobody has encountered before and has no known solution; then you can only solve it yourself. This is why this book keeps emphasizing the fun of algorithms. Only by experiencing the fun will you have the motivation to study and research, and the results of that study in turn reward you and make your future work easier. Recall the example in Section 1.2.1: ring-queue algorithms are a common pattern for reading and writing fixed-length buffers, and had the developer known this, the problem would never have occurred.

1.4 Algorithms and Code

The algorithms in this book use computer programs as their carrier, so their basic form is program code. As a software developer, what kind of code would you like to see? Code like this:

```c
double kg = gScale * 102.1 + 55.3;
NotifyModule1(kg);
double kl1 = kg / l_mask;
NotifyModule2(kl1);
double kl2 = kg * 1.25 / l_mask;
NotifyModule2(kl2);
```

Or code like this:

```c
double globalKerp = GetGlobalKerp();
NotifyGlobalModule(globalKerp);
double localKrep = globalKerp / localMask;
NotifyLocalModule(localKrep);
double localKrepBoost = globalKerp * 1.25 / localMask;
NotifyLocalModule(localKrepBoost);
```

Programmers have an intuition that code they can understand is good code. But "understandable" is a very subjective notion: the same code can be more or less understandable to different people. The authors of Refactoring catalogued 22 "bad smells" in code, hoping they could be used to identify bad code. But these rules are still too subjective, so people have also defined many quantitative code metrics, such as comment density (abandoned by many organizations as meaningless), average source file length, average function length, average code dependency, nesting depth, and test coverage. The purpose of all this work is that people want to see beautiful code, which is not only a subjective aesthetic need but also an objective, relentless pursuit of software quality. That beautiful code helps improve software quality is widely recognized, because as programmers make their code beautiful, they improve its quality in small but important ways, including but not limited to better design, testability, and maintainability.

Assuming the software behaves correctly, what words do people use to describe good code? Good-looking, beautiful, neat, elegant, artistic, poetic? I have read a great deal of code, both open source and commercial. Good code gives me exactly the feeling these adjectives describe; of course I have also seen bad code, which just felt like "a pile of code". When writing my "Algorithm Series" blog column I paid special attention to this: even when someone had already published an implementation of a similar algorithm, I wanted my implementation to look completely different. Whether designing algorithms or designing software, the code should be beautiful. If hundreds of lines of code are piled together with no structure and tangled relationships, even if the pile eventually produces a correct result, that is not the code I want; it torments both its readers and its author. Most people read your blog because they want to understand it. While preparing this book, I rewrote many algorithms; not only are the algorithms interesting, but studying the code is a pleasure too. If an algorithm is interesting but its implementation is an unattractive "pile of code", that is truly disappointing.

1.5 Summary

This chapter borrows definitions of algorithms from several well-known works simply to give everyone a broad, "tolerant" understanding of algorithms. Through two examples from my own experience, it shows the inextricable relationship between programmers and algorithms ("cut it, and it will not sever; sort it, and it is still tangled"). It also briefly discusses where the fun of algorithms comes from, the relationship between algorithms and code, and the pleasure of studying code itself.

If you agree with my point of view, read on. The chapters of this book are independent of one another, so you can go directly to whichever chapters interest you. I hope this book brings you something and lets you experience the fun of algorithms.

1.6 References

[1] Cormen T H, et al. Introduction to Algorithms (Second Edition). The MIT Press, 2001

[2] Knuth D E. The Art of Computer Programming (Third Edition), Vol. 1. Addison-Wesley, 1997

[3] Weiss M A. Data Structures and Algorithm Analysis (Second Edition). Addison-Wesley, 2001

[4] Oram A, Wilson G. Beautiful Code. O'Reilly Media, Inc., 2007

[5] Fowler M, et al. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999

This excerpt is from The Fun of Algorithms.


This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
