Baidu's Advanced Search Method (preliminary round of 2007)
Problem description:
Have you ever tried the site/inurl query syntax on Baidu? If not, give it a try.
For example, enter site:www.baidu.com inurl:news
and all URLs on www.baidu.com that contain the substring "news" are found.
Now we have two data files: site_inurl.txt and url.txt.
Each line of site_inurl.txt is a query string in the site/inurl syntax, and url.txt holds a list of URLs.
Can you find all the URLs in that list that the query strings in site_inurl.txt would retrieve?
For example, suppose site_inurl.txt contains:
site:www.baidu.com inurl:/more
site:zhidao.baidu.com inurl:/browse/
site:www.sina.com.cn inurl:www20041223am
and url.txt contains:
http://www.baidu.com/more/
http://www.baidu.com/guding/more.html
http://www.baidu.com/events/20060105/photomore.html
http://hi.baidu.com/browse/
http://hi.baidu.com/baidu/
http://www.sina.com.cn/head/www20021123am.shtml
http://www.sina.com.cn/head/www20041223am.shtml
Your program should then output:
http://www.baidu.com/more/
http://www.baidu.com/guding/more.html
http://www.sina.com.cn/head/www20041223am.shtml
The program takes the two file names on the command line. The first argument is the name of the site_inurl file, and the second is the name of the URL list file. Write the results to standard output.
The source code follows. The problem is fairly simple: read the two input files and store their contents. While reading site_inurl.txt, discard everything that is not needed and keep only the string that will be searched for later. Once both files are loaded, compare each stored query string against every URL, and print each URL that matches to standard output.
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
using namespace std;
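// Reads the query patterns from file1 into input and the URLs from file2 into data.
void inputall(vector<string> &input, vector<string> &data, const char *file1, const char *file2)
{
    ifstream in(file1);
    ifstream store(file2);
    string str;
    // Each query line looks like "site:HOST inurl:PATTERN". Skip the site part
    // (everything up to the first space) and the 6 characters of "inurl:",
    // then keep PATTERN. Note that the site is discarded, not checked.
    while (in.ignore(100, ' ') && in.ignore(6) && getline(in, str))
        input.push_back(str);
    // Every line of the URL file is stored as is.
    while (getline(store, str))
        data.push_back(str);
}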
// Prints every URL in data that contains str as a substring.
void getresult(const string &str, const vector<string> &data)
{
    for (vector<string>::size_type i = 0; i < data.size(); i++)
    {
        if (data[i].find(str) != string::npos)
            cout << data[i] << endl;
    }
}
// Runs every query pattern against the URL list.
void process(const vector<string> &input, const vector<string> &data)
{
    for (vector<string>::size_type i = 0; i < input.size(); i++)
        getresult(input[i], data);
}
int main(int argc, char *argv[])
{
    if (argc < 3)
    {
        cerr << "usage: " << argv[0] << " site_inurl_file url_file" << endl;
        return 1;
    }
    vector<string> input;
    vector<string> data;
    inputall(input, data, argv[1], argv[2]);
    process(input, data);
    return 0;
}