Web page information capture (PHP regular expressions, PHP Excel operations)
1. problem description
Capture the information you need from a fixed web page and store it as a table. I practiced on a ranking list from wustoj. Address: wustoj
2. ideas
I had just learned some basic PHP and wanted to put it to use. My approach is as follows:
(1) View the source code of the web page and save it in a file.
(2) Write regular expressions for the required information, read the file, and extract the information with those expressions. It is best to use groups when writing the regular expressions, which makes extraction much easier.
(3) Write the extracted information out as an Excel file.
A good open-source PHP library for working with Excel files is PHPExcel.
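The first two steps can be sketched roughly as follows. This is only an illustration: the sample HTML, the temporary file, and the single combined pattern are my own assumptions, and the real markup on wustoj may look different.

```php
<?php
// Hypothetical sample standing in for a saved page source.
$html = <<<HTML
<tr><td>1</td><td><a href="#">team30_NAME</a></td><td><a href="#">7</a></td></tr>
<tr><td>2</td><td><a href="#">team12_OTHER</a></td><td><a href="#">5</a></td></tr>
HTML;

// Step (1): save the page source in a file.
$path = tempnam(sys_get_temp_dir(), 'rank');
file_put_contents($path, $html);

// Step (2): read the file line by line and pull the fields out through groups.
// Groups: 1 = rank, 2 = team id, 3 = team name, 4 = solved count.
$pattern = '/<td>([0-9]+)<\/td>.*?>(team[0-9]+)_([^<]+)<\/a>.*?>([0-9]+)<\/a>/';
$rows = array();
$file = fopen($path, 'r');
while (($line = fgets($file)) !== false) {
    if (preg_match($pattern, $line, $m)) {
        $rows[] = array(
            'rank'   => $m[1],
            'team'   => $m[2] . '_' . $m[3],
            'solved' => $m[4],
        );
    }
}
fclose($file);
unlink($path);

// Step (3) would write $rows to a spreadsheet; here they are only printed.
foreach ($rows as $r) {
    echo $r['rank'], "\t", $r['team'], "\t", $r['solved'], "\n";
}
```

Because every field sits in its own group, the loop never has to cut strings by hand; it just reads the numbered entries of $m.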
3. experience
^ marks the start of the subject string, and $ marks its end.
Blank (whitespace) characters are not necessarily spaces; they can also be tabs or line breaks.
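A small sketch of both points, using made-up subject strings (the variable names are only for illustration):

```php
<?php
// ^ and $ pin the pattern to the start and end of the subject.
$whole      = preg_match('/^team[0-9]+$/', 'team30');     // whole string matches
$prefixed   = preg_match('/^team[0-9]+$/', 'my team30');  // fails: text before "team"
$unanchored = preg_match('/team[0-9]+/', 'my team30');    // no anchors: matches inside

// A tab is whitespace but not a space character.
$subject    = "rank\t1";
$onlySpaces = preg_match('/^rank +1$/', $subject);    // literal spaces only
$anyBlank   = preg_match('/^rank\s+1$/', $subject);   // \s covers spaces, tabs, newlines

var_dump($whole, $prefixed, $unanchored, $onlySpaces, $anyBlank);
```

When a pattern that looks right fails on scraped HTML, swapping a literal space for \s is often the first thing worth trying.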
Grouping with () is a good technique, for example preg_match_all("/$pattern/", $subject, $matches).
$matches is then a two-dimensional array. Without _all (that is, with preg_match), only the first match is found and $matches is a one-dimensional array.
$matches[0] holds all matches of the full pattern. $matches[1] holds all matches of the first subgroup, that is, the first part of every match.
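To make the difference concrete, here is a minimal comparison on an invented subject string; the pattern and data are assumptions for illustration only.

```php
<?php
$subject = 'team30_A team31_B';
$pattern = '/(team[0-9]+)_([A-Z]+)/';

// preg_match stops at the first match: one-dimensional result.
preg_match($pattern, $subject, $first);
// $first[0] is the full match, $first[1] and $first[2] are the groups.

// preg_match_all collects every match: two-dimensional result,
// grouped by subpattern (the default PREG_PATTERN_ORDER).
$count = preg_match_all($pattern, $subject, $all);
// $all[0] lists the full matches, $all[1] the first group, $all[2] the second.

print_r($first);
print_r($all);
```

If one array per match is more convenient than one array per group, the PREG_SET_ORDER flag of preg_match_all flips the layout.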
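$patt_ch = chr(0x80) . "-" . chr(0xff); is used as a character range for matching Chinese text. Placed inside [], it matches any byte from 0x80 to 0xFF, which covers the bytes of multibyte-encoded Chinese characters.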
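A short sketch of the byte range at work. The sample line is invented, and the Chinese name is written as UTF-8 byte escapes so the result does not depend on how this file is saved. The second pattern, which uses the /u modifier and a code-point range, is an alternative I am assuming would suit UTF-8 pages; it is not what the original script uses.

```php
<?php
$patt_ch = chr(0x80) . "-" . chr(0xff);

// "team30_" followed by two Chinese characters encoded as UTF-8 (3 bytes each).
$line = "team30_\xE6\xAD\xA6\xE6\xB1\x89</a>";

// Byte-based match: group 3 collects every byte in 0x80-0xFF.
preg_match("/(team[0-9]+)(_)([$patt_ch]+)/", $line, $match);

// Alternative for UTF-8 input: match by code point with the /u modifier.
preg_match('/team[0-9]+_([\x{4e00}-\x{9fff}]+)/u', $line, $byCodePoint);

echo strlen($match[3]), " bytes captured\n";
```

The byte range works regardless of encoding because it never looks at whole characters, which also means it will happily capture any other non-ASCII bytes.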
4. code
<?php
// $file (a handle to the saved page source) and $rankpatt (the pattern for the
// rank column) are assumed to be defined earlier in the script.
$patt_ch = chr(0x80) . "-" . chr(0xff); // byte range for Chinese characters

// Sample text to match: team30_NAME
// Groups 2 and 4 hold the team id and the team name.
$namepatt = "()(\*{0,1}team[0-9]+)(_)([$patt_ch]+)(<\/a>)";
// $namepatt = "(team[0-9]+)(_)([$patt_ch]+)"; // also works, matching "team_name" directly
// Sample text to match: 7
$problempatt = "()([0-9]+)(<\/a>)";

// Include the PHPExcel classes
require_once('Classes/PHPExcel.php');
require_once('Classes/PHPExcel/Writer/Excel2007.php');
$objPHPExcel = new PHPExcel();

// Set the file properties
$objPHPExcel->getProperties()->setCreator("Maarten Balliauw");
$objPHPExcel->getProperties()->setLastModifiedBy("Maarten Balliauw");
$objPHPExcel->getProperties()->setTitle("Office 2007 XLSX Test Document");
$objPHPExcel->getProperties()->setSubject("Office 2007 XLSX Test Document");
$objPHPExcel->getProperties()->setDescription("Test document for Office 2007 XLSX, generated using PHP classes.");
$objPHPExcel->getProperties()->setKeywords("office 2007 openxml php");
$objPHPExcel->getProperties()->setCategory("Test result file");

// Header row
$row = 1;
$objPHPExcel->getActiveSheet()->setCellValue('A' . $row, 'rank');
$objPHPExcel->getActiveSheet()->setCellValue('B' . $row, 'team');
$objPHPExcel->getActiveSheet()->setCellValue('C' . $row, 'solved');

while (!feof($file)) {
    $line = fgets($file);
    // A rank line starts a new spreadsheet row.
    if (preg_match("/$rankpatt/", $line, $match)) {
        $row++;
        // print_r($match);
        $objPHPExcel->getActiveSheet()->setCellValue('A' . $row, $match[2]);
        $objPHPExcel->getActiveSheet()->getStyle('A' . $row)->getAlignment()->setHorizontal(PHPExcel_Style_Alignment::HORIZONTAL_LEFT);
    }
    // Team id and team name go into column B of the current row.
    if (preg_match("/$namepatt/", $line, $match)) {
        // print_r($match);
        $objPHPExcel->getActiveSheet()->setCellValue('B' . $row, $match[2] . $match[4]);
    }
    // Number of solved problems goes into column C.
    if (preg_match("/$problempatt/", $line, $match)) {
        // print_r($match);
        $objPHPExcel->getActiveSheet()->setCellValue('C' . $row, $match[2]);
        $objPHPExcel->getActiveSheet()->getStyle('C' . $row)->getAlignment()->setHorizontal(PHPExcel_Style_Alignment::HORIZONTAL_LEFT);
    }
}

// Save once, after all lines are processed, next to this script with an .xlsx extension.
$objWriter = new PHPExcel_Writer_Excel2007($objPHPExcel);
$objWriter->save(str_replace('.php', '.xlsx', __FILE__));
echo "well done :)";
?>
5. running result
Running the script saves the rank, team, and solved columns to an .xlsx file named after the script and prints "well done :)".