Making Wrong Code Look Wrong

From the Joel on Software translation project (local.joelonsoftware.com/wiki)

By Joel Spolsky
Translated by Paul May (Mei Puhua)
Wednesday, May 11, 2005
Part of Joel on Software, http://www.joelonsoftware.com

Back in September 1983, I got my first real job, at Oranim, a large bread factory in Israel that baked hundreds of thousands of loaves every night in six giant ovens.

The first time I walked into the bakery, I thought it was filthy. The machines were rusty and there was oil everywhere.

"Is it always this dirty in here?" I asked.

"What? What are you talking about?" the manager replied. "We just finished cleaning. This is the cleanest it has been in weeks."

Oh, great.

It took me a couple of months of cleaning the bakery every morning before I understood what they meant. In a bread factory, clean means that no dough is stuck to the machines, no dough is fermenting in the trash, and no dough is piled up on the floor.

Clean does not mean that the ovens are painted a bright white. An oven gets painted about once every ten years, not once a day. Clean does not mean free of oil, either. In fact, many of the machines have to be oiled regularly, and a thin layer of oil usually means that a machine has just been cleaned and serviced.

The whole concept of clean in a bread factory is something you have to learn. An outsider cannot walk in and tell what is clean and what is dirty. An outsider would never think to check whether the inside walls of the dough rounder (the machine that rolls dough into balls) have been scraped. An outsider would fixate on the discolored outer panels of an old oven, because those panels are very large and very conspicuous. The bakers, however, could not care less about the paint on the oven, because the bread tastes just as good.

After two months in the bread factory, you learn how to "see" clean.

The same is true of code.

When you first start programming, or first try to read code in a new language, all of it looks equally mysterious. Until you understand the language itself, you cannot even spot obvious syntax errors.

In the first stage of learning, you start to recognize what we usually call coding style. You begin to notice code that does not follow the indentation standard and variables with oddly capitalized names.

At this stage you typically say, "Damn it, we really do need some consistent coding conventions around here!" You spend the next day writing up coding conventions for your team, the next six days arguing about the One True Brace Style, and the next three weeks rewriting old code to conform to it, until a manager catches you and scolds you for wasting time on something that will never make money. You decide that it is fine to reformat code only when you happen to touch it anyway, so about half the code ends up in the One True Brace Style, and before long you forget all about it. Then you start obsessing over something else that has nothing to do with making money, such as replacing one string class with another.

As you become more proficient at writing code in a particular environment, you start to see other things: things that are perfectly legal and perfectly in line with the coding conventions, but that still make you uneasy.

For example, in C:

char* dest, src;

This is legal code. It may conform to your coding conventions, and it may even be exactly what was intended. But once you have enough experience writing C, you notice that it declares dest as a pointer to char while declaring src as merely a char. That might be what you meant, but it probably is not. The code smells a little bit wrong.
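A minimal sketch of that claim, written in C++ only so that the compiler can confirm it (the static_assert lines and the names dest2 and src2 are additions for illustration, not part of the original example):

```cpp
#include <type_traits>

// The '*' attaches to the name that follows it, not to the type on the left,
// so only dest becomes a pointer.
char* dest, src;

static_assert(std::is_same<decltype(dest), char*>::value, "dest is a pointer to char");
static_assert(std::is_same<decltype(src), char>::value, "src is a plain char");

// One declaration per line removes the ambiguity.
char* dest2;
char* src2;

static_assert(std::is_same<decltype(src2), char*>::value, "src2 is a pointer to char");
```

If the file compiles, both readings have been checked by the compiler rather than by eye.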

Let's look at a more subtle example:

if (i != 0)
    foo(i);

This code is completely correct and conforms to most coding conventions, but the fact that the body of the if statement is not enclosed in braces may bother you, because in the back of your mind you worry that someone might insert another line of code:

if (i != 0)
    bar(i);
    foo(i);

... and forget to add the braces, so that foo(i) is now always executed! So when you see a block of code that is not wrapped in curly braces, you may feel a tiny bit uneasy.
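The effect can be checked with a small C++ sketch. The Calls counter and the two function names are hypothetical; they simply record which of foo and bar would have run:

```cpp
#include <cassert>

// Counts how often the stand-ins for foo(i) and bar(i) are reached.
struct Calls {
    int foo = 0;
    int bar = 0;
};

// The edit the text worries about: a line was inserted, braces were not.
Calls without_braces(int i) {
    Calls calls;
    if (i != 0)
        calls.bar++;
        calls.foo++;  // indented like the body, but runs unconditionally
    return calls;
}

// With braces, the inserted line and the original line stay guarded together.
Calls with_braces(int i) {
    Calls calls;
    if (i != 0) {
        calls.bar++;
        calls.foo++;
    }
    return calls;
}

// Small self-check of the behavior described above.
void check_braces() {
    assert(without_braces(0).foo == 1);  // foo ran even though i == 0
    assert(with_braces(0).foo == 0);     // foo was correctly skipped
}
```

Some compilers can warn about the misleading indentation, but the code is legal either way, which is why the habit of looking for braces is useful.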

OK, so far I have mentioned three levels of achievement as a programmer:

1. You don't know clean from unclean.

2. You have a superficial idea of cleanliness, mostly at the level of conformance to coding conventions.

3. You start to smell subtle hints of uncleanliness beneath the surface, and they bother you enough that you reach in and fix the code.

But there is an even higher level, and it is the one I really want to talk about:

4. You deliberately architect your code so that your nose for uncleanliness makes the code more likely to be correct.

This is the real art: building robust code by deliberately designing coding conventions that make errors stand out.

So now I will walk you through a small example and then show you a general rule that you can use to design coding conventions that make code more robust. In the end, this will lead to a defense of a certain kind of Hungarian notation (probably not the kind that makes people queasy) and a criticism of exceptions in certain circumstances (probably not the circumstances you find yourself in most of the time).

However, if you are convinced that Hungarian notation is a bad thing and that exceptions are the best invention since the chocolate milkshake, and you don't want to hear any other opinion, then head over to Rory's and read the excellent comics instead. You are not missing much here. In fact, in a minute I am going to show actual code samples that are likely to put you to sleep before you even get a chance to be angry. Yes, I think that is the plan: lull you to sleep, and then, when you are drowsy and unable to resist, sneak the idea "Hungarian notation = good, exceptions = bad" into your head.

An Example

Right. On with the example. Let's pretend that you are building some kind of web application, since that seems to be what all the kids are doing these days.

There is a security vulnerability called the cross-site scripting vulnerability, abbreviated XSS. I won't go into the details here: all you need to know is that when you build a web application, you must be careful never to send back, unmodified, any string that the user typed into a form.

For example, suppose you have a web page that lets users type their name into an edit box, and submitting it leads to another page that says "Hello, Michael!" (assuming the user's name is Michael). Well, that is a security vulnerability, because the user could type in all kinds of weird HTML and JavaScript instead of "Michael", and that JavaScript could do nasty things, such as reading the cookies that you set and forwarding them to a malicious website. And those nasty things now appear to come from you.

Let's put it in pseudocode. Imagine that

s = Request("name")

reads the user's input (a POST argument) from the HTML form. If you ever write this code:

Write "Hello, " & Request("name")

your site is already vulnerable to XSS attacks. That is all it takes.

Instead, you have to encode the string before you copy it back into the HTML. Encoding means replacing " with &quot;, replacing > with &gt;, and so on. So

Write "Hello, " & Encode(Request("name"))

is perfectly safe.

All strings that come from the user are unsafe. Any unsafe string must be encoded before it is output.
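As a rough sketch of what such an Encode function might do, here is a C++ version. The function names encode and hello_page are hypothetical stand-ins for the pseudocode, and the set of replaced characters is an assumption of this sketch; a real application would normally rely on its framework's own encoder.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Encode: replaces the characters that HTML
// treats specially with the corresponding character entities.
std::string encode(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (char ch : raw) {
        switch (ch) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&#39;";  break;
            default:   out += ch;       break;
        }
    }
    return out;
}

// Builds the greeting the way the safe line of pseudocode does.
std::string hello_page(const std::string& name_from_form) {
    return "Hello, " + encode(name_from_form);
}

// Small self-check: markup typed by the user comes out as inert text.
void check_encode() {
    assert(hello_page("<b>Michael</b>") == "Hello, &lt;b&gt;Michael&lt;/b&gt;");
}
```

An ordinary name passes through unchanged, while anything that looks like markup is turned into text the browser will display rather than interpret.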

Let's try to come up with a coding convention that ensures that if you ever make this mistake, the code will look wrong. If wrong code at least looks wrong, whoever is writing or reviewing it has a chance of catching the mistake.

Possible Solution 1

One solution is to encode every string right away, the moment it comes in from the user:

s = Encode(Request("name"))

So our convention says: if you ever see a Request that is not wrapped in Encode, the code must be wrong.

You start to train your eyes to look for naked Requests, because they violate the convention.

This works, in the sense that if you follow the convention you will never have an XSS bug. However, it is not the best architecture. For example, you may want to store these user strings in a database, and it does not make sense to store them HTML-encoded, because they may be needed somewhere other than an HTML page. A credit card processing application, for instance, could be confused by HTML-encoded data. Most web applications are developed on the principle that all strings are kept unencoded internally until the very last moment before they are sent to an HTML page, so encoding on input is probably not the right architecture.

We really need to be able to keep strings around in unsafe form for a while.

OK. Let me try again.

Possible Solution 2

What if we made a coding convention that says that whenever you Write a string, you must encode it?

s = Request("name")

// much later:
Write Encode(s)

Now, whenever you see a naked Write without an Encode, you know something is amiss.

Alas, that doesn't quite work either... Sometimes your code contains little bits of HTML, and those must not be encoded:

If mode = "linebreak" Then prefix = "<br>"
// much later:
Write prefix

This looks wrong according to our convention, which requires us to encode strings on the way out:

Write Encode(prefix)

But now the "<br>" that was supposed to start a new line gets encoded as &lt;br&gt;, and the user sees a literal <br> on the page. That is not right either.

So sometimes you cannot encode a string when you read it in, and sometimes you cannot encode it when you write it out. Neither proposal works. And without a convention, we still run the risk that you write something like this:

s = Request("name")
... pages later ...
name = s
... pages later ...
recordset("name") = name // store the name in the "name" column of the database
... days later ...
theName = recordset("name")
... pages or even months later ...
Write theName

Did we remember to encode the string? There is no single place where you can look and see the bug. There is nothing to sniff. If you have a lot of code like this, it takes a ton of detective work to trace the origin of every string that is ever written out and make sure that it has been encoded.

The Real Solution

So let me suggest a coding convention that works. We will have just one rule:

All strings that come from the user must be stored in variables (or database columns) whose names start with the prefix "us" (for unsafe string). All strings that have been HTML-encoded or that come from a known-safe source must be stored in variables whose names start with the prefix "s" (for safe string).

Let me rewrite the same code, changing nothing but the variable names to match the new convention.

us = Request("name")
... pages later ...
usName = us
... pages later ...
recordset("usName") = usName
... days later ...
sName = Encode(recordset("usName"))
... pages or even months later ...
Write sName

The thing to notice is that, as long as the convention is followed, any mistake with an unsafe string can always be seen on a single line of code:

s = Request("name")

is wrong on its face, because you can see the result of Request being assigned to a variable whose name begins with s, which is against the rules. The result of Request is always unsafe, so it must always be assigned to a variable whose name begins with "us".

us = Request("name")

is always OK.

usName = us

is always OK.

sName = us

is certainly wrong.

sName = Encode(us)

is certainly correct.

Write usName

is certainly wrong.

Write sName

is OK, as is

Write Encode(usName)

Every line of code can be checked by itself, and if every line is correct, the program as a whole is correct.

Eventually, with this coding convention, your eyes learn to see Write usXXX and know that it is wrong, and you instantly know how to fix it, too. I know it is a little hard to see the wrong code at first, but after three weeks your eyes adapt, just like the bakery workers who can look at a giant bread factory and instantly say, "Nobody cleaned in here! What kind of bread factory is this?"

In fact, we can extend the rule a bit and rename (or wrap) the Request and Encode functions as UsRequest and SEncode... In other words, functions that return an unsafe string or a safe string start with Us and S, just like variables. Now look at the code:

us = UsRequest("name")
usName = us
recordset("usName") = usName
sName = SEncode(recordset("usName"))
Write sName

See what I did? Now you can spot mistakes by checking that both sides of the equal sign start with the same prefix.

us = UsRequest("name") // OK, both sides start with us
s = UsRequest("name") // bug
usName = us // OK
sName = us // certainly wrong
sName = SEncode(us) // certainly correct

I can take it one step further by renaming Write to WriteS and SEncode to SFromUs:

us = UsRequest("name")
usName = us
recordset("usName") = usName
sName = SFromUs(recordset("usName"))
WriteS sName

This makes mistakes even more visible. Your eyes learn to "see" suspicious code, and that helps you find obscure security bugs through the ordinary process of writing and reading code.
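The renamed pseudocode can be sketched in C++ as follows. Everything here is hypothetical scaffolding: UsRequest returns a canned value instead of reading a form, SFromUs encodes only a few characters, and WriteS appends to a string rather than sending a response. The point is only that the prefixes line up on every statement.

```cpp
#include <cassert>
#include <string>

// Us...: returns an unsafe string. In this sketch the "form" is hard-coded.
std::string UsRequest(const std::string& /*usField*/) {
    return "<b>Bob</b>";
}

// SFromUs: the only way to turn an unsafe string into a safe one.
std::string SFromUs(const std::string& us) {
    std::string s;
    for (char ch : us) {
        switch (ch) {
            case '&': s += "&amp;";  break;
            case '<': s += "&lt;";   break;
            case '>': s += "&gt;";   break;
            case '"': s += "&quot;"; break;
            default:  s += ch;       break;
        }
    }
    return s;
}

// WriteS: by convention it only ever receives safe strings.
void WriteS(std::string& sPage, const std::string& s) {
    sPage += s;
}

std::string SRenderGreeting() {
    std::string sPage;
    std::string usName = UsRequest("name");  // us = Us...: prefixes match
    std::string sName = SFromUs(usName);     // s = S...: prefixes match
    WriteS(sPage, "Hello, ");                // literal written by us: safe
    WriteS(sPage, sName);                    // WriteS with an s variable: fine
    // WriteS(sPage, usName);  // compiles, yet looks wrong: WriteS fed a us variable
    return sPage;
}

// Small self-check of the rendered output.
void check_greeting() {
    assert(SRenderGreeting() == "Hello, &lt;b&gt;Bob&lt;/b&gt;");
}
```

Note that the compiler accepts the commented-out line just as happily as the correct one; both arguments are plain strings. The convention does not enforce anything, it only makes the mistake visible on the line where it happens.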

Making wrong code look wrong is nice, but it is not necessarily the best solution to every security problem. It does not catch every possible bug or mistake, because you might not look at every line of code. But it is certainly a lot better than nothing, and I would much rather have a coding convention under which wrong code at least looks wrong. You gain the benefit immediately: every time a programmer's eyes pass over a line of code, that particular bug is checked for and prevented.

A General Rule

Making wrong code look wrong depends on getting the right things close together in one place on the screen. When I look at a string, in order to decide whether the code is correct I need to know, everywhere I see that string, whether it is safe or unsafe. I don't want that information to live in another file or on another page that I have to scroll to. I have to be able to see it right there, on the spot, and that means a variable naming convention.

There are plenty of other examples where you can improve code by moving things next to each other. Most coding conventions include rules like:

  1. Keep functions short.
  2. Declare variables as close as possible to the place where they are used.
  3. Do not use macros to create your own personal programming language.
  4. Do not use goto.
  5. Do not put a closing brace more than one screen away from its matching opening brace.

What all these rules have in common is that they try to get the relevant information about what a line of code really does as physically close together as possible. That improves the chances that your eyes will be able to figure out what is really going on.

In general, I have to admit that I am a little bit scared of language features that hide things. When you see the code

i = j * 5;

... in C you know, at least, that j is being multiplied by 5 and the result is stored in i.

But if you see the same snippet in C++, you don't know anything. In C++, the only way to know what really happens is to find out what types i and j are, and those may be declared somewhere else entirely. That is because j may be of a type that overloads operator* and does something terribly clever when you try to multiply it. And i may be of a type that overloads operator=, and the two types may not be compatible, so an automatic type conversion function may end up being called. Checking the types of the variables is not enough: you have to find the code that implements those types. Heaven help you if there is inheritance involved, because then you have to climb all the way up the class hierarchy to find where the code really lives. And if there is polymorphism somewhere, you are really in trouble, because it is not enough to know the types that i and j were declared with. You need to know what types they are right now, which may mean inspecting an arbitrary amount of code, and thanks to the halting problem of computation theory you can never be sure that you have looked everywhere (phew!).

When you see i = j * 5 in C++, you are really on your own, friend, and that, to my mind, reduces your ability to spot problems just by looking at the code.

Of course, in theory none of this is supposed to matter. When you do clever things like overloading operator*, the idea is that you are providing a nice watertight abstraction. Oh look, j is actually a Unicode string type, and multiplying a Unicode string by an integer is obviously a good abstraction for converting Traditional Chinese to Simplified Chinese, right?
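To see how little the line itself reveals, consider this C++ sketch. The Text and Length types are invented for illustration and deliberately do something unexpected; nothing like them appears in the original article.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical string-like type.
struct Text {
    std::string chars;
};

// Overloaded operator*: "multiplying" a Text repeats it.
Text operator*(const Text& t, int times) {
    Text out;
    for (int k = 0; k < times; ++k) {
        out.chars += t.chars;
    }
    return out;
}

// Hypothetical numeric-looking type with an implicit conversion from Text.
struct Length {
    std::size_t value = 0;
    Length() = default;
    Length(const Text& t) : value(t.chars.size()) {}  // silently measures the text
};

std::size_t surprising() {
    Text j{"ab"};
    Length i;
    i = j * 5;  // reads like integer arithmetic; actually repeats a string,
                // converts it to a Length, and then assigns
    return i.value;
}

// Small self-check: "ab" repeated five times is ten characters long.
void check_surprising() {
    assert(surprising() == 10u);
}
```

Reading only the line i = j * 5, there is no way to tell that a string is being repeated and measured; that knowledge lives in the declarations of Text and Length, wherever they happen to be.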

The trouble, of course, is that watertight abstractions aren't. I have already discussed this at length in The Law of Leaky Abstractions, so I won't repeat myself here.

Scott Meyers has built a whole career on showing all the ways that abstractions fail and bite you, at least in C++. (By the way, a new edition of Scott's book Effective C++ has just come out; it has been completely rewritten; buy a copy today!)

Okay.

I am getting off track. I had better summarize the story so far:

Look for coding conventions that make wrong code look wrong. Getting the right information together in one place on the screen lets you see certain kinds of problems and fix them right away.

I'm Hungary

So now we get back to the infamous Hungarian notation.

Hungarian notation was invented by Microsoft programmer Charles Simonyi. One of Simonyi's major projects at Microsoft was Word; in fact, he had earlier led the project that created the world's first WYSIWYG word processor, a program called Bravo at Xerox PARC.

In WYSIWYG word processing you have scrollable windows, so every coordinate has one of two meanings: it is either relative to the window or relative to the page. The two are very different, and keeping them straight is very important.

I suspect this is one of the reasons Simonyi started using what later came to be called Hungarian notation. It looked like Hungarian, and Simonyi was from Hungary, hence the name. In Simonyi's version of Hungarian notation, every variable name was prefixed with a lowercase tag that indicated the kind of thing the variable contained.

For example, if the variable name is rwCol, rw is the prefix.

I am using the word kind on purpose, because Simonyi mistakenly used the word type in his paper, and generations of programmers misunderstood what he meant.

If you read Simonyi's paper closely, you will find that the naming convention he describes is the same kind I used in the example above, where us meant unsafe string and s meant safe string. Both are of type string. The compiler will not warn you if you assign one to the other, and IntelliSense will not say a word. But they are semantically different: they have to be interpreted and treated differently, and a conversion function has to be called when you assign one to the other, or you will have a runtime bug. If you are lucky.

Inside Microsoft, Simonyi's original concept of Hungarian notation was called Apps Hungarian, because it was used in the Applications Division, that is, Word and Excel. In the Excel source code you see a lot of rw and col, and when you see those prefixes you know they refer to rows and columns. Yes, they are both integers, but it never makes sense to assign one to the other. I am told that the Word source code contains a lot of xl and xw, where xl means a horizontal coordinate relative to the layout page and xw means a horizontal coordinate relative to the window. Both are integers, but they are not interchangeable. There is also a lot of cb, meaning count of bytes. Yes, that is an integer too, but you learn much more just by looking at the variable name: it is a number of bytes, that is, a buffer size. And if you see xl = cb, you can sound the alarm. That is obviously wrong code, because even though xl and cb are both integers, it is completely crazy to set a horizontal offset in pixels to a count of bytes.

In Apps Hungarian, prefixes are used for functions as well as variables. So, although I have never seen the Word source code, I would bet that it contains a function called YlFromYw that converts vertical window coordinates to vertical layout coordinates. Apps Hungarian requires the form TypeFromType instead of the more traditional TypeToType, so that every function name begins with the kind of thing it returns, just as I did earlier when I renamed Encode to SFromUs. In fact, in proper Apps Hungarian the Encode function would have to be named SFromUs. Apps Hungarian gives you no choice about what to call this function. That is a good thing, because it is one less thing to remember, and you don't have to wonder what kind of encoding the word Encode refers to: you have something much more precise.
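A minimal C++ sketch of the idea, assuming that a window coordinate becomes a layout coordinate by adding the layout position of the window's top edge. The function body, the ylWindowTop parameter and the sample values are guesses made for illustration; only the name YlFromYw comes from the text.

```cpp
#include <cassert>

// TypeFromType naming: the name starts with the kind of value returned (yl)
// and ends with the kind of value accepted (yw).
constexpr int YlFromYw(int yw, int ylWindowTop) {
    return ylWindowTop + yw;
}

int YlClickExample() {
    int ylWindowTop = 1200;  // the window is scrolled 1200 units down the page
    int ywClick = 35;        // click position relative to the window
    int cbBuffer = 512;      // count of bytes, unrelated to any coordinate

    int ylClick = YlFromYw(ywClick, ylWindowTop);  // yl = Yl...: prefixes match
    // int ylWrong = ywClick;   // yl = yw: both are ints, but the prefixes disagree
    // int ylWorse = cbBuffer;  // yl = cb: a coordinate set to a byte count
    (void)cbBuffer;  // only present to illustrate the commented-out mistake
    return ylClick;
}

// Small self-check of the conversion.
void check_yl() {
    assert(YlClickExample() == 1235);
}
```

Both commented-out assignments would compile, since everything is an int. They are detectable only because the prefix on the left does not match the prefix on the right.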

Apps Hungarian was extremely valuable, especially in the days when C was prevalent and the compiler did not provide a very useful type system.

But then something went wrong.

The dark side took over Hungarian notation.

Nobody seems to know why or how it happened, but it appears that the documentation writers on the Windows team inadvertently invented what later came to be known as Systems Hungarian.

Somebody read Simonyi's paper, saw the word "type" in it, and assumed he meant type as in class, as in a type system, as in the type checking that the compiler does. He did not. He explained carefully and precisely what he meant by the word "type", but it did not help. The damage was done.

Apps Hungarian had useful, meaningful prefixes: "ix" meant an index into an array, "c" meant a count, "d" meant the difference between two numbers (for example, "dx" meant "width"), and so on.

Systems Hungarian had far less useful prefixes: "l" for long, "ul" for unsigned long, and "dw" for double word (which is, er, actually an unsigned long). In Systems Hungarian, the only thing the prefix tells you is the actual data type of the variable.

This was a subtle but complete misunderstanding of Simonyi's intention and practice. The lesson is that if you write dense, convoluted academic prose that nobody can understand, your ideas will be misinterpreted, and the misinterpreted ideas will then be ridiculed even though they were never your ideas in the first place. So in Systems Hungarian you get a lot of dwFoo, meaning "double word foo", and the worst part is that knowing a variable is a double word tells you almost nothing useful. No wonder people hated Systems Hungarian.

Systems Hungarian spread far and wide. It is the standard throughout the Windows programming documentation, and it was spread by books such as Charles Petzold's Programming Windows, the bible for learning Windows programming. It quickly became the dominant form of Hungarian notation, even inside Microsoft, where very few programmers outside the Word and Excel teams understood what a mistake had been made.

And then came the Great Rebellion. Programmers who had never understood Hungarian notation in the first place noticed that the misunderstood subset they were using was annoying and nearly useless, so they revolted against it. Still, Systems Hungarian has some good qualities that help you see bugs. At the very least, if you use it, you know the type of a variable at the spot where you are using it. But it is nowhere near as valuable as Apps Hungarian.

The Great Rebellion reached its peak with the first release of .NET, when Microsoft finally started telling people, "Hungarian notation is not recommended." There was much rejoicing. I don't think Microsoft even bothered to explain why. They simply went through the naming guidelines section of the documentation and added the words "Do not use Hungarian notation." Hungarian notation was so unpopular by then that nobody complained, and everybody outside Excel and Word was relieved not to have to use an awkward naming convention that, they believed, was unnecessary in an age of strong type checking and IntelliSense.

But Apps Hungarian is still tremendously valuable. It brings related information together in the code, which makes the code easier to read, write, debug, and maintain. Most importantly, it makes wrong code look wrong.

Before I go, there is one more thing I promised to do, which is to bash exceptions one more time. The last time I did that, I got into a lot of trouble. In an off-the-cuff remark on the Joel on Software home page, I wrote that I don't like exceptions because they are, in effect, an invisible goto, which I think is even worse than a goto you can see. Of course, millions of people jumped down my throat. The only person in the world who came to my defense was, of course, Raymond Chen, who is, by the way, the best programmer in the world, so that has to count for something, right?

Here is the problem with exceptions, in the context of this article. Your eyes learn to see wrong code, and that prevents bugs. To make code truly robust, you need coding conventions that keep information together so that it is all in front of you when you review the code. In other words, the more information about what the code is doing is located right in front of your eyes, the better you will be at finding mistakes. When you see code that says

dosomething();
cleanup();

... your eyes tell you that nothing is wrong. We always clean up! But dosomething might throw an exception, in which case cleanup might never be called. That is easy to fix with finally, but that is not my point. My point is that the only way to know that cleanup is definitely called is to investigate the entire call tree of dosomething to see whether anything in it can throw an exception. That is fine, and there are things like checked exceptions to make it less painful, but the real point is that exceptions move the information somewhere else. You have to look in other places to know whether the code does the right thing, so you cannot take advantage of your eyes' learned ability to see wrong code, because there is nothing to see.
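The following C++ sketch shows both halves of the argument. C++ has no finally, so the fix shown here uses a guard object whose destructor does the cleanup, which is a different mechanism with the same effect. The names dosomething, CleanupGuard and the boolean flag standing in for cleanup() are all hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for dosomething(); it throws on request.
void dosomething(bool shouldThrow) {
    if (shouldThrow) {
        throw std::runtime_error("dosomething failed");
    }
}

// The two-line pattern from the text: cleanup is simply the next statement.
bool cleanup_ran_plain(bool shouldThrow) {
    bool cleanedUp = false;
    try {
        dosomething(shouldThrow);
        cleanedUp = true;  // stands in for cleanup()
    } catch (const std::runtime_error&) {
        // the exception jumped straight past the cleanup line
    }
    return cleanedUp;
}

// Guard whose destructor performs the cleanup on every exit path.
class CleanupGuard {
public:
    explicit CleanupGuard(bool& flag) : flag_(flag) {}
    ~CleanupGuard() { flag_ = true; }  // stands in for cleanup()
    CleanupGuard(const CleanupGuard&) = delete;
    CleanupGuard& operator=(const CleanupGuard&) = delete;

private:
    bool& flag_;
};

bool cleanup_ran_guarded(bool shouldThrow) {
    bool cleanedUp = false;
    try {
        CleanupGuard guard(cleanedUp);
        dosomething(shouldThrow);
    } catch (const std::runtime_error&) {
        // the guard's destructor has already run by the time we get here
    }
    return cleanedUp;
}

// Small self-check of both variants.
void check_cleanup() {
    assert(!cleanup_ran_plain(true));   // cleanup silently skipped
    assert(cleanup_ran_guarded(true));  // cleanup still performed
}
```

Nothing in the body of cleanup_ran_plain hints that the cleanup line can be skipped; that fact is only discoverable by reading dosomething, which is exactly the complaint made above.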

Now, if I am writing a little script that gathers up some data once a day and prints it, then sure, exceptions are wonderful. I like nothing better than to ignore everything that could possibly go wrong and wrap the whole program in one big try/catch that emails me if anything fails. Exceptions are fine for quick-and-dirty code, for scripts, and for code that is neither mission critical nor a matter of life and death. But if you are writing an operating system, or the software for a nuclear power plant, or the software that controls a high-speed saw used in open-heart surgery, exceptions are extremely dangerous.

I know people will assume that I am a lame programmer who fails to understand exceptions properly, and who fails to see how they would improve my life if only I let them into my heart. Too bad. The way to write truly reliable code is to use simple tools that take ordinary human weakness into account, not complex tools with hidden side effects and leaky abstractions that assume the programmer never makes a mistake.

Further Reading

If you are still enthusiastic about exceptions, read Raymond Chen's essay "Cleaner, more elegant, and harder to recognize." "It is extremely hard to tell bad exception-based code from exception-based code that is not bad... exceptions are too hard for me to handle."

Raymond's rant about deadly macros, "A rant against flow control macros," discusses another case in which scattered information makes code unmaintainable. "When you see code that uses [macros], you have to go dig through header files to figure out what they do."

For background on the history of Hungarian notation, start with Simonyi's original paper, Hungarian Notation. Doug Klunder introduced it to the Excel team in another, somewhat clearer paper. For more about Hungarian notation and how it was ruined by documentation writers, read Larry Osterman's post, especially Scott Ludwig's comment, or Rick Schaut's post.

When I am not writing about software, I work on FogBugz, a ridiculously smart project management system. Take a look now (there is a free online trial). We have just launched a major upgrade, FogBugz 4.0!

The contents of these pages represent the opinions of one person.
All contents copyright 1999-2005 by Joel Spolsky. All rights reserved.

 
