[TSE Study Notes: Peking University Tianwang search engine] Section 6: Getting user input

Source: Internet
Author: User
This section describes step 2 of tsesearch.cpp, the entry program of the search function: getting user input.
(1)

To obtain the query that the user entered in the browser, the program has to interact with the web server, which is done through the CGI mechanism mentioned earlier. If you are not familiar with CGI programs, please read up on them first (recommended: http://blog.csdn.net/lewsn2008/article/details/8519908). Here we only briefly describe how a CGI program obtains the data entered by the user in the browser.

First, note that the method attribute of the form defined in index.html is "get". The browser therefore sends the data to the server in GET mode, and the CGI program on the server must also read the data in GET mode. After you enter the search string and click the search button, the browser sends the server a URL with the form data merged in. In my TSE installation, entering "Tsinghua University" and clicking the search button produces this URL:

http://localhost/YC-cgi-bin/index/tsesearch?word=%C7%E5%BB%AA%B4%F3%D1%A7&www=%CB%D1%CB%F7&cdtype=GB

A question mark in the middle splits the URL into two parts. The part before it is the path of the CGI program, which is the path specified by the action attribute of the form in index.html. The part after it is the data submitted by the user: key-value pairs separated by the & symbol, in the form ?key1=value1&key2=value2. Looking closer, there are three key-value pairs, with the keys word, www and cdtype. Comparing them with the <form> definition in index.html shows that the names of its three <input> tags correspond one to one to these three keys.

So GET mode sends the data in the <form> to the server by appending it to the URL as key-value pairs. The string "%C7%E5%BB%AA%B4%F3%D1%A7" after "word=" is the encoded form of the Chinese characters of the query string "Tsinghua University" that I entered, and the string "%CB%D1%CB%F7" after "www=" is the encoded form of the Chinese characters for "Search", the text of the button. If you change the text of the button in index.html, you will see the change reflected here.
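The %XX sequences in the URL are percent-encoded bytes. As a minimal sketch of how such a value can be turned back into raw bytes (urlDecode and hexValue are illustrative names, not functions from the TSE source, and treating '+' as a space follows the usual form-encoding convention rather than anything shown above):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Convert one hexadecimal digit to its numeric value.
// The caller must already have checked the character with isxdigit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Decode a percent-encoded query component: "%XX" becomes the byte 0xXX
// and '+' becomes a space. Malformed escapes are copied through unchanged.
std::string urlDecode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool escape =
            in[i] == '%' && i + 2 < in.size() &&
            std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 2]));
        if (escape) {
            out.push_back(static_cast<char>(hexValue(in[i + 1]) * 16 +
                                            hexValue(in[i + 2])));
            i += 2;  // skip the two hex digits just consumed
        } else if (in[i] == '+') {
            out.push_back(' ');
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

Decoding "%C7%E5%BB%AA%B4%F3%D1%A7" this way yields eight bytes, two per Chinese character, presumably in the encoding named by the cdtype key.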

How does the CGI program on the server obtain the data appended to the URL? After receiving the user's request, the server stores the data in environment variables, and the CGI program reads it from there. We will not go through all the environment variables here, only the two important ones used in the program: REQUEST_METHOD, which holds the value of the method attribute given in the <form> definition, and QUERY_STRING, which holds the data string appended to the URL. The QUERY_STRING corresponding to the URL above is:
word=%C7%E5%BB%AA%B4%F3%D1%A7&www=%CB%D1%CB%F7&cdtype=GB
The CGI program has to parse this string to extract the data it needs (the key-value pairs).
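The parsing step just described can be sketched as follows. This is only an illustration under assumptions, not the TSE code: parseQueryString, readCgiInputs and KeyValue are hypothetical names, and the values are left percent-encoded.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// Split "key1=value1&key2=value2" into (key, value) pairs, preserving order.
// An item without '=' is stored with an empty value; empty items are skipped.
std::vector<KeyValue> parseQueryString(const std::string& query) {
    std::vector<KeyValue> pairs;
    std::size_t pos = 0;
    while (pos < query.size()) {
        std::size_t amp = query.find('&', pos);
        if (amp == std::string::npos) amp = query.size();
        const std::string item = query.substr(pos, amp - pos);
        if (!item.empty()) {
            const std::size_t eq = item.find('=');
            if (eq == std::string::npos) {
                pairs.emplace_back(item, "");
            } else {
                pairs.emplace_back(item.substr(0, eq), item.substr(eq + 1));
            }
        }
        pos = amp + 1;
    }
    return pairs;
}

// Read the query string that the web server placed in the environment.
// Returns an empty list when QUERY_STRING is not set.
std::vector<KeyValue> readCgiInputs() {
    const char* raw = std::getenv("QUERY_STRING");
    return parseQueryString(raw ? raw : "");
}
```

For the QUERY_STRING above, parseQueryString returns three pairs in the order word, www, cdtype.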

(2)

As mentioned in section 4, to obtain the user input the main function first defines an object of the CQuery class. CQuery is the implementation class of the search function and provides interfaces for loading data files, obtaining user input, Chinese word segmentation, and keyword search. The main function then calls the GetInputs function of CQuery, which parses QUERY_STRING and extracts the key-value pairs. Before looking at the source of that function, let's look at the definition of the CQuery class, mainly its data members.

// LB_c: the HtmlInput_Struct structure records one key-value pair
typedef struct {
    char Name[MAXNAMELENGTH];    // LB_c: key name
    char Value[MAXVALUELENGTH];  // LB_c: key value
} HtmlInput_Struct;

class CQuery {
public:
    string m_sQuery;     // LB_c: stores the search string entered by the user
    string m_sSegQuery;  // LB_c: stores the search string after Chinese word segmentation, with the words separated by "/"
    unsigned m_iStart;   // LB_c: records the page number of the result set selected by the user; the default is page 1
    ...
}

The following is the source code of the GetInputs function. Detailed comments have been added to explain the code, so it should be easy to follow.

/**
 * Get form information through environment variable.
 * return 0 if succeed, otherwise exit.
 */
int CQuery::GetInputs()
{
    int i, j;
    // LB_c: get the environment variable REQUEST_METHOD, which tells whether the browser
    // submitted the HTTP request to the server in GET mode or POST mode; the TSE system uses GET mode
    char *mode = getenv("REQUEST_METHOD");
    char *tempstr;
    char *in_line;
    int length;

    // LB_c: output the page returned to the browser (standard output is sent back to the browser)
    // LB_c: the first line of the returned page must be this header
    cout << "Content-Type: text/html\n";
    // LB_c: the HTML of the page starts here
    cout << "<HTML>\n";
    cout << "

The GetInputs function parses QUERY_STRING, extracts the key-value pairs and stores them in order in the HtmlInputs array. Taking the QUERY_STRING above as an example, the content of HtmlInputs after parsing is:

HtmlInputs[0].Name = "word"      HtmlInputs[0].Value = "Tsinghua University"

HtmlInputs[1].Name = "www"       HtmlInputs[1].Value = "Search"

HtmlInputs[2].Name = "cdtype"    HtmlInputs[2].Value = "GB"

(3)

After obtaining the user input, the main function calls two functions, SetQuery and SetStart, which are explained below.

void CQuery::SetQuery()
{
    string q = HtmlInputs[0].Value;
    CStrFun::Str2Lower(q, q.size());
    m_sQuery = q;
}

void CQuery::SetStart()
{
    m_iStart = atoi(HtmlInputs[1].Value);
}

First, the SetQuery function takes the search string entered by the user (HtmlInputs[0].Value) from the key-value pair array, converts it to lowercase, and saves it in m_sQuery.

Second, the SetStart function sets the page number of the search result set to display. There may be many results, too many to show on one page, so the results are paginated and the user can choose which page to view, as shown in Figure 1. This function stores the page number selected by the user in the data member m_iStart, which is used when the search results are displayed.

However, there is an obvious problem here: the result page shown when the user searches from the home page is wrong. As shown in Figure 1, 114 results are found in total, but the result list is empty. The reason is that HtmlInputs[1].Value is not a page number selected by the user. The analysis above shows that HtmlInputs[1] corresponds to the search button, so its value is "Search", and m_iStart is assigned atoi("Search"). Because atoi returns 0 for a string that does not begin with a number, m_iStart ends up with an incorrect value. If you start a new search from the search results page, or select a page of the results, the display is normal because m_iStart is then set correctly; the analysis continues later, when "show search results" is introduced.

Figure 1
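The effect of the conversion in SetStart can be reproduced in isolation. In this small sketch, startFromValue is a hypothetical helper that does what the original SetStart does with HtmlInputs[1].Value; it is not part of the TSE source.

```cpp
#include <cstdlib>

// Hypothetical helper mirroring the original SetStart: it converts whatever
// string it is given with atoi, without checking which key the value came from.
unsigned startFromValue(const char* value) {
    return static_cast<unsigned>(std::atoi(value));
}
```

Passing the button text, whether as "Search" or as its encoded Chinese bytes, gives 0 because neither begins with a digit, while a real page number such as "3" gives 3.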

Why did the original author write the code this way? There is a reason. When you start a new search from the search results page, or select a page of the results, the second key-value pair of the submitted URL really is the page number (described in detail later in the "show search results" section). The key of that pair is "start", and in this case HtmlInputs[1].Value can be assigned to m_iStart. So before setting m_iStart, the code must check whether the key name is "start". I modified the function as follows: it checks whether the pair stored in HtmlInputs[1] is the "start" pair; if so, it sets m_iStart from it, otherwise it sets m_iStart to the default value of 1.

void CQuery::SetStart()
{
    if (strcmp(HtmlInputs[1].Name, "start") == 0) {
        m_iStart = atoi(HtmlInputs[1].Value);
    } else {
        m_iStart = 1;  // LB_c: if the URL contains no page number, use the default (show the first page of results)
    }
}
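The modified function still assumes that the page number, when present, is the second pair. A slightly more defensive variant, sketched here under assumptions, looks the key up by name wherever it appears. Input, findValue and startPage are illustrative names, the array sizes are made up, and falling back to page 1 for non-positive values is an assumption rather than something the source specifies.

```cpp
#include <cstdlib>
#include <cstring>

// Stand-in for the key-value record used by CQuery; the sizes are assumptions.
struct Input {
    char Name[64];
    char Value[256];
};

// Return the value stored under `name`, or nullptr if no such key exists.
const char* findValue(const Input* inputs, int count, const char* name) {
    for (int i = 0; i < count; ++i) {
        if (std::strcmp(inputs[i].Name, name) == 0) return inputs[i].Value;
    }
    return nullptr;
}

// Page number taken from the "start" key. Falls back to page 1 when the key
// is missing or its value does not parse as a positive number.
unsigned startPage(const Input* inputs, int count) {
    const char* value = findValue(inputs, count, "start");
    if (value == nullptr) return 1;
    const long page = std::strtol(value, nullptr, 10);
    return page > 0 ? static_cast<unsigned>(page) : 1;
}
```

With this lookup, a request from the home page (keys word, www, cdtype) gives page 1, and a request that carries a "start" pair gives that page number regardless of its position in the URL.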
