This class is used for collecting (scraping) website content and is easy to use: it supports paginated collection, image download, filtering, and so on. Documentation is sparse, so it is mainly suited to developers who want to build on it in PHP. The code snippets posted earlier have been removed; please download the attachment directly. If you need a collection service, contact me.
PHP
<?php
/**
 * Capture script
 * @author Administrator
 * @example
 * $config = array(
 *     'host' => 'server address',
 *     'list' => array(
 *         'items' => array(regular expression group),
 *         'page_url' => 'regular expression for the paging links; $1 is the link, $2 is the displayed number',
 *         'page_size' => 'page size',
 *         'page_url_rule' => 'regular expression that extracts the page number; $1 must be a number',
 *         'page_limit' => number, the maximum number of pages to scan; if not specified, only the visible range of page numbers is scanned,
 *         'this_detail_callback' => 'callback for the detail page data',
 *         'list_detail_url' => 'which entry in the list items holds the detail page address'
 *     ),
 *     'details' => array(
 *         all rules for the detail page; see the items structure description
 *     ),
 *     'time_limit' => array('rule' => corresponding group name, 'start' => start time, 'end' => end time),
 *     'num_limit' => how many records to fetch
 * )
 *
 * items structure:
 * array(
 *     'attribute name' => array(
 *         'rule' => regular expression (an array when there are several),
 *         'type' => 1 - text, 2 - remote request, 3 - sub-rule list items, 4 - sub-config configuration,
 *         'replace' => replacement applied to the result, either a callback function or array('from' => 'regular expression', 'to' => replacement string),
 *         'multi' => whether to collect multiple records
 *     ),
 * )
 */
set_time_limit(0);
define('IN_WEB', true);
date_default_timezone_set('PRC');
include('collector/init.php');

// Strips unwanted markup from the detail content.
// "…" marks tag names that did not survive in the published listing.
$htmlFilter = '/<…[^>]*\/>|(onclick|onmouseover|onmouseout|onblur)=\"[^\"]+\"|<…>'
    . '|<p[^>]*>|<\/p>'
    . '|<style[^>]*>(.+?)<\/style>'
    . '|<embed[^>]*>(.+?)<\/embed>'
    . '|<object[^>]*>(.+?)<\/object>'
    . '|<script[^>]*>(.*?)<\/script>'
    . '|<noscript[^>]*>(.+?)<\/noscript>'
    . '|<a[^>]*>|<\/a>/is';

$config = array(
    'host' => 'http://news.wto168.net/zixun/',
    'list' => array(
        'items' => array(
            'time' => array(
                'rule'  => '/>.*([0-9]{4}-[0-9]{1,2}-[0-9]{1,2}\s*[0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})<\/li>/i',
                'multi' => true,
            ),
            'link' => array(
                'rule'  => '/…/i',
                'multi' => true,
            ),
            'title' => array(
                'rule'    => '/<a…>([^>]+)<\/a>/i',
                'multi'   => true,
                'replace' => array('from' => '/【.+】/', 'to' => ''),
            ),
        ),
        'list_detail_url' => 'link',
        'page_url'        => '/<option…[^>]*>\d+<\/option>/i',
        'page_url_rule'   => '/_(\d+)\.html/',
        'page_limit'      => 10,
    ),
    'details' => array(
        'content' => array(
            'rule'      => '/…(.+?)…/is',
            'keep_html' => true,
            'replace'   => array('from' => $htmlFilter, 'to' => ''),
        ),
    ),
    'list_url'   => '/^http:\/\/news\.wto168\.net\/zixun\/list/',
    'detail_url' => '/^http:\/\/news\.wto168\.net\/zixun\/.*\.html/i',
    'time_limit' => array('rule' => 'time', 'start' => date('Y-m-d')),
);

$c = new collector($config);
$url = 'http://news.wto168.net/zixun/list_56_1.html';
$res = $c->collect($url);
print_r($res);
?>
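
The collector class itself ships only in the attachment, so the following is a minimal sketch of one plausible reading of the `rule`, `multi`, `replace`, and `page_url_rule` settings described in the comment block. The helper names `applyRule` and `pageNumber` and the sample markup are hypothetical and are not part of the collector; the real class may behave differently.

```php
<?php
// Hypothetical helper (not from the collector class): applies one "items"
// rule to a chunk of HTML and returns the captured values.
function applyRule(string $html, array $rule): array
{
    $values = [];
    // 'rule' may be a single regular expression or an array of them.
    foreach ((array) $rule['rule'] as $pattern) {
        if (preg_match_all($pattern, $html, $m) && isset($m[1])) {
            $values = array_merge($values, $m[1]); // keep capture group 1
        }
    }

    if (isset($rule['replace'])) {
        $replace = $rule['replace'];
        foreach ($values as $i => $value) {
            if (is_array($replace) && isset($replace['from'], $replace['to'])) {
                // array('from' => regex, 'to' => replacement) form
                $values[$i] = preg_replace($replace['from'], $replace['to'], $value);
            } elseif (is_callable($replace)) {
                // callback form
                $values[$i] = $replace($value);
            }
        }
    }

    // Assumption: without 'multi', only the first match is kept.
    return empty($rule['multi']) ? array_slice($values, 0, 1) : $values;
}

// Hypothetical helper: extracts the page number from a paging URL using a
// 'page_url_rule' style expression, where $1 must be a number.
function pageNumber(string $url, string $pageUrlRule): ?int
{
    return preg_match($pageUrlRule, $url, $m) ? (int) $m[1] : null;
}

// Invented sample markup, for illustration only.
$html = '<li><a href="/zixun/1.html">[News]First</a></li>'
      . '<li><a href="/zixun/2.html">[News]Second</a></li>';

$titles = applyRule($html, [
    'rule'    => '/<a[^>]*>([^<]+)<\/a>/i',
    'multi'   => true,
    'replace' => ['from' => '/\[.+?\]/', 'to' => ''],
]);
print_r($titles); // $titles is ['First', 'Second']

$page = pageNumber('http://news.wto168.net/zixun/list_56_3.html', '/_(\d+)\.html/');
print_r($page); // $page is 3
```

Keeping rule evaluation in a small pure function like this makes each regular expression easy to test against saved HTML before running a full collection.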