Webpage table information capture

Source: Internet
Author: User
The source code is as follows:
 
 

Assume the webpage is test.html. The content under Part Information in the last table is not fixed; it may be one row or several.
How can I capture the text shown in blue? Please suggest a solution.


Reply to discussion (solution)

Loop over the table's tr rows and read each td value directly.
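
A minimal sketch of that row-by-row approach, using PHP's DOM extension rather than a regular expression. The markup in $html is a hypothetical stand-in, since the real test.html is not reproduced in this thread, and the variable names are assumptions for illustration only.

```php
<?php
// Hypothetical sample markup; the real page is not shown in the thread.
$html = '<table>
  <tr><td>1</td><td>12647212</td><td>60</td></tr>
  <tr><td>2</td><td>12654172</td><td>615</td></tr>
</table>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$rows = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = [];
    // Note: this also picks up td elements of nested tables, if any.
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
    if ($cells) {
        $rows[] = $cells;
    }
}
print_r($rows);
```

Each entry of $rows is then one table row, so a variable number of part rows needs no special handling.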

Does the data the page returns actually carry the blue marker? If so, then:

               
 
 
Aaaaaa
Aaaaaa
Aaaaaa xxxx (aaaaaa)
Aaaaaa xxxx
Adress aaaaaa adress
Delivery Schedule
Planned arrival time
PUS No. 770266110 version 00
Customer
* DYNP-770266110-00 *

Delivery Information
Factory
Plant
Xxxxxx
Pickup time
Pick Up Time
Supplier Feedback required
Need Duns Response
N
Delivery date
Delivery Date
2013-09-16 Window time
Window Time
16: 30
Unloading port
Dock
CC-70D Unloading port owner
Dock Incharger
Kkk
Unloading port number
Dock Tel
011-1111 Unloading port address
Dock Address
Adress
Delivery location
Delivery Place
Scheduler tracker
Follow Up
Kkkk Planned tracker Phone/Fax
FollowUp Tel/Fax
011-1111
Delivery instructions
Delivery Note

Part Information Part list
Serial Number Part number Part description Demand quantity Promised quantity Number of received instances Number of packages Number of bins Bin No. Real-Time bin No. Number of real-time bins Actual receiving bin No. Number of Real receiving bins Remarks
1 12647212 60 60 15 4 P000000D
2 12654172 615 615 15 41 P000000D

';
$result = array();
preg_match_all('#(.*)#iUus', $string, $result);
print_r($result[1]);


If nothing marks the blue text (an id, a class, and so on), the only option is to match every cell with a regular expression and pick out the values according to the page structure.

The page has no color distinction; I added the blue marking myself.

$s is your page content.

preg_match_all('#
 
  #isU', $s, $r);
$r = array_map('trim', array_map('strip_tags', $r[0]));
print_r($r);
 
Array ([0] => [1] => aaaaaa [2] => aaaaaa xxxx (aaaaaa) [3] => aaaaaa xxxx [4] => adress aaaaaa adress [5] => [6] => delivery schedule [7] => planned arrival time [8] => PUS No. 770266110 00 [9] => Customer [10] => * DYNP-770266110-00 * [11] => [12] => Delivery Information [13] => factory Plant [14] => xxxxxx [15] => pickup Time Pick Up Time [16] => 2013-09-09 [17] => supplier feedback Need Duns Response [18] => N [19] => Delivery Date [20] => 2013-09-16 [21] => Window Time [22] => [23] => unloading port dock [24] => CC-70D [25] => unloading port owner Dock Incharger [26] => kkk [27] => unloading port telephone Dock Tel [28] => 011- 1111 [29] => unloading port Address Dock Address [30] => adress [31] => Delivery location Delivery Place [32] => [33] => scheduler tracker Follow up [34] => KKKKK [35] => scheduler tracker Phone/Fax FollowUp Tel/Fax [36] => 011-1111 [37] => Delivery instructions Delivery Note [38] => [39] => Part Information Part list [40] => No. [41] => Part No. [42] => Part description [43] => required quantity [44] => promised quantity [45] => actual quantity [46] => Number of packages [47] => Number of bins [48] => Number of bins [49] => real-Time bin No. [50] => Number of real-time bins [51] => Number of real-time bins [52] => Number of real-time bins [53] => remarks [54] => 1 [55] => 12647212 [56] => [57] => 60 [58] => 60 [59] => [60] => 15 [61] => 4 [62] => P000000D [63] => [64] => [65] => [66] => [67] => [68] => 2 [69] => 12654172 [70] => [71] => 615 [72] => 615 [73] => [74] => 15 [75] => 41 [76] => P000000D [77] => [78] => [79] => [80] => [81] =>)
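
The pattern in the snippet above lost its markup when the page was extracted. A sketch of what such a cell-matching call might look like follows. It assumes the pattern matched every td element, which is a guess based on the strip_tags call and the output, not something the thread confirms. The fragment in $s and the name $cells are hypothetical.

```php
<?php
// Hypothetical page fragment standing in for the real page content.
$s = '<table><tr><td class="k">Customer</td>
<td><font color="blue"> * DYNP-770266110-00 * </font></td></tr></table>';

// Assumed pattern: each <td ...>...</td> cell.
//   i = case-insensitive, s = "." also matches newlines,
//   U = quantifiers are lazy, so each match stops at the first </td>.
preg_match_all('#<td\b.*</td>#isU', $s, $r);

// Drop the tags inside each cell and trim the surrounding whitespace.
$cells = array_map('trim', array_map('strip_tags', $r[0]));
print_r($cells);
```

A lazy pattern like this would stop too early on nested tables, so it only suits pages whose cells do not contain other tables.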
Reading a particular item out of this is not difficult, is it?
// The second table starts at index 40 and has 14 columns
$t = array_chunk(array_slice($r, 40), 14);
for ($i = 1; $i
   
  
Array ([0] => Array ([serial number] => 1 [part number] => 12647212 [part description] => [quantity required] => 60 [quantity promised] => 60 [quantity received] => [number of packages] => 15 [number of bins] => 4 [number of bins] => P000000D [number of bins] => [number of bins actually delivered] => [actual receiving bin number] => [number of actual receiving bins] => [remarks] =>) [1] => Array ([serial number] => 2 [part number] => 12654172 [part description] => [quantity required] => 615 [quantity promised] => 615 [quantity received] => [number of packages] => 15 [number of bins] => 41 [number of bins] => P000000D [number of pallets] => [number of pallets] => [actual receiving bin number] => [number of actual receiving bins] => [remarks] => ))
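
The loop in the snippet above is cut off in this copy of the thread. One way such a loop could be completed is sketched below, assuming it paired the header row with each data row via array_combine. The shortened cell list, $offset, $columns, and $parts are hypothetical; in the thread the offset is 40 and the column count is 14.

```php
<?php
// Hypothetical flat cell list: 3 header cells, two data rows, then one
// stray cell that belongs to a following table.
$r = ['No.', 'Part No.', 'Qty',
      '1', '12647212', '60',
      '2', '12654172', '615',
      'Supplier Signature'];
$offset  = 0; // index of the first header cell (40 in the thread)
$columns = 3; // number of columns (14 in the thread)

$t = array_chunk(array_slice($r, $offset), $columns);
$header = $t[0];

$parts = [];
for ($i = 1; $i < count($t); $i++) {
    // array_combine needs both arrays to have the same length, so skip
    // incomplete chunks such as leftovers from another table.
    if (count($t[$i]) !== count($header)) {
        continue;
    }
    $parts[] = array_combine($header, $t[$i]);
}
print_r($parts);
```

The length check matters because array_combine emits a warning and returns false on mismatched lengths in PHP 7, and throws a ValueError in PHP 8.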

preg_match_all('# #isU', $s, $r);

What does this regular expression do? Thank you!

preg_match_all('# #isU', $s, $r);
If the values sit at different positions on some pages, how can those items be found?
For example, [10] => * DYNP-770266110-00 * is sometimes [12] => * DYNP-770266110-00 *.
The item just before it always has the same value, though, and only its key differs. For example, [9] => Customer.

That part is for you to work out.
In general, label and data always come in pairs: the descriptive text comes first and the data follows it.
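
A small sketch of that pairing idea: look up the label and take the cell after it, so the numeric index no longer matters. The helper name valueAfterLabel and the two sample arrays are hypothetical, and the sketch assumes the cells are held in a plain zero-based list with an exact label match.

```php
<?php
// Hypothetical helper: return the cell that follows a given label,
// or null when the label is missing or is the last cell.
function valueAfterLabel(array $cells, string $label): ?string
{
    $i = array_search($label, $cells, true);
    if ($i === false || !isset($cells[$i + 1])) {
        return null;
    }
    return $cells[$i + 1];
}

// The same label and value at different positions on two pages.
$pageA = ['Customer', '* DYNP-770266110-00 *'];
$pageB = ['', '', 'Customer', '* DYNP-770266110-00 *'];

var_dump(valueAfterLabel($pageA, 'Customer'));
var_dump(valueAfterLabel($pageB, 'Customer'));
```

If a label is stored together with other text in one cell, an exact match will miss it, and a substring test such as strpos would be needed instead.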


If, as in the first post, the page contains another table as well, array_combine will raise a warning.

      
   
  
Supplier Signature Carrier Signature
Supplier Signature
_____________ CarrierSignature _____________
Supplier Confirm Time
Supplier confirmation time 13-09-10
Receiver Signature
Receiver signature
_______________

Date
______________
*** End of page ***

How can I filter out the table information?
