CGI Program Development Basics

Source: Internet
Author: User

1. CGI script structure

When the server invokes a script, it usually passes information to the script in one of two ways: GET or POST. These two ways are called request methods. The server tells the script which method was used through the REQUEST_METHOD environment variable. (Two other request methods, HEAD and PUT, are also defined, but they are not generally used with CGI and their use is discouraged.)

1) GET is a request for data; the same method is used to obtain static documents. The GET method sends the request information as parameters appended to the URL. These parameters are passed to the CGI program in the QUERY_STRING environment variable. For example, suppose a script named myprog.exe is started from the following link:

<A href="cgi-bin/myprog.exe?lname=blow&fname=Joe">
REQUEST_METHOD is GET, and QUERY_STRING contains lname=blow&fname=Joe. The format of QUERY_STRING is discussed under "URL encoding".

A question mark separates the script name from the start of QUERY_STRING. On some servers, the question mark is mandatory even if no query string follows it. Other servers allow a forward slash instead of, or in addition to, the question mark. If a slash is used, the server passes the information to the script in the PATH_INFO variable instead of the QUERY_STRING variable.

2) A POST operation occurs when the browser sends data from a filled-in form to the server. With POST, the query string may or may not be empty, depending on the server. If it contains information, that information is formatted and passed just as it is with GET.

Data from a POST request is passed from the server to the script through stdin. Because stdin is a stream, the script needs to know how much valid data is waiting. The server therefore provides another variable, CONTENT_LENGTH, which gives the number of bytes of incoming data. The format of POST data is:

variable1=value1&variable2=value2&etc

Your program must check the REQUEST_METHOD environment variable to know whether to read stdin. The CONTENT_LENGTH variable is generally useful only when REQUEST_METHOD is POST.
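
The two checks described above can be sketched in C as follows. This is only an illustration under stated assumptions: the names cgi_method, classify_method, and read_post_body are hypothetical helpers, not part of any CGI library, and the max_len limit is an added safeguard that the text does not mention.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum cgi_method { CGI_GET, CGI_POST, CGI_UNSUPPORTED };

/* Map the REQUEST_METHOD value to one of the cases the script handles.
   A missing variable or any other method is reported as unsupported. */
enum cgi_method classify_method(const char *method)
{
    if (method == NULL)
        return CGI_UNSUPPORTED;
    if (strcmp(method, "GET") == 0)
        return CGI_GET;
    if (strcmp(method, "POST") == 0)
        return CGI_POST;
    return CGI_UNSUPPORTED;
}

/* Read exactly CONTENT_LENGTH bytes from `in` (normally stdin).
   Returns a NUL-terminated buffer that the caller must free, or NULL if
   the length is missing or malformed, exceeds max_len, or the read
   comes up short. */
char *read_post_body(FILE *in, const char *content_length, size_t max_len)
{
    char *end = NULL;
    unsigned long len;
    char *body;

    assert(in != NULL);
    if (content_length == NULL ||
        *content_length < '0' || *content_length > '9')
        return NULL;
    len = strtoul(content_length, &end, 10);
    if (*end != '\0' || len > max_len)
        return NULL;

    body = malloc(len + 1);
    if (body == NULL)
        return NULL;
    /* Read only the announced byte count; do not wait for end-of-file. */
    if (fread(body, 1, len, in) != len) {
        free(body);
        return NULL;
    }
    body[len] = '\0';
    return body;
}
```

A script would typically call classify_method(getenv("REQUEST_METHOD")) first and, only for CGI_POST, call read_post_body(stdin, getenv("CONTENT_LENGTH"), 65536), where 65536 is an arbitrary example limit.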

The basic structure of a CGI application is simple and straightforward: initialization, processing, output, and termination. Because this section discusses concepts, data sources, and programming rules, the example uses pseudocode rather than a specific language.

Ideally, a script has the following form (do-initialize, do-process, and do-output stand for the appropriate subroutines):
Program start
Call do-initialize
Call do-process
Call do-output
Program end

The actual situation is not that simple.

1.1 Initialization

The first thing a script must do after it starts is determine its input, environment, and state. Basic operating system environment information can be obtained in the usual way: from the system registry on Windows NT or Windows 95, from the standard environment variables on Unix systems, from INI files on other Windows versions, and so on.

State information comes from the input, not from the operating environment or static variables. Remember: every time a CGI script is invoked, it is as if it had never been invoked before. The script does not keep running between calls. Everything must be initialized from scratch, as follows:

1. Determine how the script was invoked
In typical cases, this involves reading the REQUEST_METHOD environment variable and checking it for the word GET or POST.

Note:
Although the current definition applies only GET and POST operations to CGI, you may encounter PUT or HEAD from time to time. This can happen if your server supports them and a browser or robot uses them. PUT was offered as an alternative to POST, but it never received approved RFC status and is generally not used. HEAD is used by some browser robots (automated browsers) to retrieve only the header of an HTML file, and it does not apply to CGI programming. Other unusual request methods exist as well. Your code should check whether the method is GET or POST and reject anything else. Do not assume that the method is POST just because it is not GET, or vice versa.

2. Retrieve the input data
If the method is GET, you must retrieve, parse, and decode the QUERY_STRING environment variable. If the method is POST, you must check QUERY_STRING and also parse stdin. If the CONTENT_TYPE environment variable is set to application/x-www-form-urlencoded, the data from stdin must also be decoded.
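
The parse-and-decode step can be sketched in C as follows. The sketch assumes the usual form encoding, in which pairs are separated by &, names and values by =, spaces are sent as +, and other bytes as %XX. The names hex_value, url_decode, cgi_pair, and parse_form are hypothetical. Splitting happens before decoding so that an encoded & or = inside a value is not mistaken for a separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a URL-encoded string in place: '+' becomes a space and %XX
   becomes the byte with hex value XX. A malformed % sequence is copied
   through unchanged. Returns the decoded length. Note that %00 yields
   an embedded NUL byte, which ends the string for C string functions. */
size_t url_decode(char *s)
{
    char *src = s, *dst = s;

    assert(s != NULL);
    while (*src != '\0') {
        if (*src == '+') {
            *dst++ = ' ';
            src++;
        } else if (*src == '%' && hex_value(src[1]) >= 0 &&
                   hex_value(src[2]) >= 0) {
            *dst++ = (char)(hex_value(src[1]) * 16 + hex_value(src[2]));
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}

struct cgi_pair { char *name; char *value; };

/* Split a writable "name=value&name=value" buffer in place, decode each
   half, and store up to max_pairs pointers into the buffer. Empty
   fields (as in "a=1&&b=2") are skipped. Returns the number stored. */
size_t parse_form(char *data, struct cgi_pair *out, size_t max_pairs)
{
    size_t count = 0;
    char *field = data;

    while (field != NULL && *field != '\0' && count < max_pairs) {
        char *next = strchr(field, '&');
        char *eq;

        if (next != NULL)
            *next++ = '\0';               /* terminate this field */
        if (*field != '\0') {
            eq = strchr(field, '=');
            if (eq != NULL)
                *eq++ = '\0';             /* terminate the name */
            else
                eq = field + strlen(field);   /* no '=': empty value */
            url_decode(field);
            url_decode(eq);
            out[count].name = field;
            out[count].value = eq;
            count++;
        }
        field = next;
    }
    return count;
}
```

Because parse_form modifies its argument, the caller passes a private copy of QUERY_STRING or the buffer read from stdin, never the pointer returned by getenv.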

1.2 Processing

Once the script has initialized its environment by reading and parsing its input, it is ready to do its work. What happens in this phase is far less rigidly defined than in the initialization phase. During initialization, the parameters are known (or can be discovered), and the tasks are more or less the same for every script. The processing phase, however, is the heart of the script, and what you do here depends almost entirely on the script's purpose.

1. Process the input data
What you do depends on the script. For example, you might ignore all the input and just output data, you might send the input back out as neatly formatted HTML, you might retrieve information from a database and display it, or you might do something nobody has thought of before. Processing data generally means transforming it in some way. In traditional data-processing terms, this is called the transform step, because in batch-oriented processing a program reads a record, applies some rules to it (transforms it), and writes it back out. CGI programs are rarely thought of as traditional data processing, but the idea is the same: in the processing stage, you take the input and make something new from it.

2. Output the results
In a simple CGI script, the output is often just a header and some HTML. More complex scripts may output graphics, a mix of graphics and text, or all the information needed to call the script again with some additional information. A common and rather elegant technique is to call the script once with GET, which can be done with a standard <a href> tag. The script detects that it was called with GET and dynamically creates an HTML form, including hidden variables and the code needed to call the script again, this time with POST.
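
As a rough illustration of the GET-then-POST technique, the following C sketch writes a header, a blank line, and a form that posts back to the script with one hidden variable. The function name write_form_page, the field names token and lname, and the page layout are all made up for the example. The sketch does no HTML escaping, so it simply refuses values that contain a double quote.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a CGI response to `out` (normally stdout): a Content-Type
   header, the blank line that ends the headers, and an HTML form that
   calls script_url again with POST, carrying one hidden field.
   Returns 0 on success, -1 on a rejected value or a write error. */
int write_form_page(FILE *out, const char *script_url, const char *token)
{
    assert(out != NULL);
    /* No escaping is done here, so reject values that would break the
       quoted attributes. */
    if (script_url == NULL || token == NULL ||
        strchr(script_url, '"') != NULL || strchr(token, '"') != NULL)
        return -1;

    if (fprintf(out,
                "Content-Type: text/html\n"
                "\n"
                "<html><body>\n"
                "<form method=\"post\" action=\"%s\">\n"
                "<input type=\"hidden\" name=\"token\" value=\"%s\">\n"
                "<input type=\"text\" name=\"lname\">\n"
                "<input type=\"submit\">\n"
                "</form>\n"
                "</body></html>\n",
                script_url, token) < 0)
        return -1;
    return 0;
}
```

When the form is submitted, the same script runs again with REQUEST_METHOD set to POST and finds token and lname in the data on stdin.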

Compatibility problems

In the Unix world, a character stream is a special kind of file. By default, stdin and stdout are character streams. The operating system helpfully parses the stream for you and makes sure that everything passing through it is valid 7-bit ASCII or an approved control code.

Seven bits? Yes. For HTML, this is no problem. However, if your script sends graphical data, using a character-oriented stream means instant failure. The solution is to switch the stream to binary mode. In C, you can do this with the setmode function: setmode(fileno(stdout), O_BINARY). Use setmode(fileno(stdout), O_TEXT) to switch back. A typical graphics script outputs the header in character mode and then switches to binary mode for the graphical data.
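
A minimal sketch of that header-then-binary sequence follows. It assumes Microsoft's C runtime on Windows, where the call is spelled _setmode with _fileno and _O_BINARY. On POSIX systems the C library makes no distinction between text and binary streams, so the switch is a no-op there. The helper names set_binary_mode and send_image are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _setmode */
#endif

/* Switch a stream to binary mode so that image bytes pass through
   untranslated. Returns 0 on success, -1 on failure. */
int set_binary_mode(FILE *stream)
{
    assert(stream != NULL);
    /* Flush any header text still held in the stdio buffer before the
       mode changes. */
    if (fflush(stream) != 0)
        return -1;
#ifdef _WIN32
    return _setmode(_fileno(stream), _O_BINARY) == -1 ? -1 : 0;
#else
    return 0;   /* no text/binary distinction to switch */
#endif
}

/* Send the header in character mode, then the image bytes in binary
   mode. Returns 0 on success, -1 on any write error. */
int send_image(FILE *out, const char *mime_type,
               const unsigned char *data, size_t len)
{
    if (fprintf(out, "Content-Type: %s\n\n", mime_type) < 0)
        return -1;
    if (set_binary_mode(out) != 0)
        return -1;
    if (fwrite(data, 1, len, out) != len)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

A script would call send_image(stdout, "image/gif", bytes, count) after loading or generating the image data.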

In the Windows NT world, streams behave the same way for compatibility reasons. A simple \n in the output is converted to \r\n when it is written to stdout. Regular Windows NT system calls, such as WriteFile(), do not perform this conversion. If you want a carriage return and a line feed together when using them, you must explicitly write \r\n.

Character mode and binary mode are also known as cooked and raw. People who know these two terms may use them rather than the more common ones. Whatever the terminology and whatever the platform, there is another problem with streams: by default they are buffered, which means that the data is held back until a line terminator is seen, the buffer is full, or the stream is closed. This means that if you mix buffered stdio calls such as printf() or fwrite() with unbuffered low-level calls such as write() or WriteFile(), the output may come out jumbled, even though it is all written to stdout. printf() writes to the stream through a buffer, while the low-level file-oriented routines output their data directly. The result is output in the wrong order.

You can blame this on backward compatibility. Apart from the many old programs that depend on it, there is no reason for streams to default to buffered and cooked. These should be options that you turn on when you need them, not options that you turn off when you do not. Fortunately, you can get around the problem with setvbuf(stdout, NULL, _IONBF, 0). This call turns off all buffering for the stdout stream.
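
Wrapped in a small helper, the call might look like the sketch below. The name make_unbuffered is hypothetical. One detail that the C standard adds is that setvbuf should be called before any other operation on the stream, so a script would do this at the very start of initialization.

```c
#include <assert.h>
#include <stdio.h>

/* Turn off buffering for a stream so that every write is passed on
   immediately. Call this before the first read or write on the stream.
   Returns 0 on success, -1 if the request could not be honored. */
int make_unbuffered(FILE *stream)
{
    assert(stream != NULL);
    return setvbuf(stream, NULL, _IONBF, 0) == 0 ? 0 : -1;
}
```

A typical call is make_unbuffered(stdout) as the first statement of the program, before any header is printed.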

Another solution is to avoid mixing different types of output statements. Even so, that does not turn cooked output into raw output, so it is best to turn off buffering anyway. Many servers and browsers are picky about the form in which they receive input.

Note:
Those who speak Unix may frown at the term CRLF (carriage return and line feed), while those who program on other platforms may not know \n or \r\n. CRLF is equivalent to \r\n. C programmers use \r to represent a carriage return (CR) and \n to represent a line feed (LF). (For BASIC programmers, LF is CHR$(10) and CR is CHR$(13).)

1.3 Termination

Termination means cleaning up and exiting. If you locked any files, you must release them before the program ends. If you allocated memory, semaphores, or other objects, you must free them. If you do not do this properly, the script may turn out to be a "one-shot wonder": it works the first time it is called and breaks on every later call. Worse, because the script does not properly release its resources and locks, it may hinder or even break other scripts or the server itself.

On some platforms, most notably Windows NT and to a lesser extent Unix, file handles and memory objects are closed and reclaimed when the process terminates. Even so, it is unwise to rely on the operating system to clean up after you. For example, on Windows NT, if a program locks all or part of a file and then terminates without releasing the lock, the behavior of the file system is undefined.

Make sure that you know which resources your script uses and that your exit routine, if there is one, cleans them up thoroughly.
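
One common way to make sure every resource is released on every path is a single cleanup point at the end of the function. The sketch below applies that pattern to a memory buffer and a file handle. The function name append_log_line and the log file are hypothetical, and the sketch does not take a file lock, because the locking calls differ between platforms. If the script did take a lock, it would release it in the same cleanup section.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append one line to a log file. Every exit path goes through the
   cleanup label, so the buffer is always freed and the file is always
   closed. Returns 0 on success, -1 on failure. */
int append_log_line(const char *log_path, const char *message)
{
    int result = -1;
    char *line = NULL;
    FILE *log = NULL;
    size_t size;

    assert(log_path != NULL && message != NULL);
    size = strlen(message) + 2;          /* message + '\n' + NUL */

    line = malloc(size);
    if (line == NULL)
        goto cleanup;
    snprintf(line, size, "%s\n", message);

    log = fopen(log_path, "a");
    if (log == NULL)
        goto cleanup;
    if (fputs(line, log) == EOF)
        goto cleanup;
    result = 0;

cleanup:
    /* fclose can fail when buffered data cannot be written out. */
    if (log != NULL && fclose(log) != 0)
        result = -1;
    free(line);                          /* free(NULL) is a no-op */
    return result;
}
```

The same shape extends to semaphores or other objects: acquire them in order, jump to the cleanup label on any failure, and release them there.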

2. Planning a script

Now that you have seen the basic structure of a script, the next step is to learn how to plan one from scratch. Follow these steps:
Take some time to define the program's task
