Extracting information from a web page table

Source: Internet
Author: User
The page source code is as follows:
 
  
 
  

Assume the page is test.html and that the last table (the part information) does not have a fixed size: it may contain one row or several.
How can I extract the section shown in blue? I am looking for a solution.


Reply to discussion (solution)

Loop over the table's TR rows and read the value of each TD directly.
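
The reply gives no code for this, so the following is only a minimal sketch of the row-by-row idea using PHP's DOM extension. The sample markup in `$html` is hypothetical; for the real page it would be replaced by something like `file_get_contents('test.html')`.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = '<table>
  <tr><td>1</td><td>12647212</td><td>60</td></tr>
  <tr><td>2</td><td>12654172</td><td>615</td></tr>
</table>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parse warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$rows = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        // textContent drops any nested tags such as <font> or <b>.
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```

Each entry of `$rows` is one table row, so a table with a variable number of rows needs no special handling.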

Is the blue text actually marked as blue in the data the page returns? If so, then:

               
 
 
aaaaaa
aaaaaa
aaaaaa xxxx (AAAAAA)
aaaaaa xxxx
Adress aaaaaa adress
Delivery Planning Form
Planned arrival time 2013-09-16
Pus number 770266110 version 00
Customer Customers
*dynp-770266110-00*

Delivery Information Delivery information
Factory
Plant
Xxxxxx
Pickup time
Pick
up time
2013-09-09 16:30 Need supplier Feedback
Need Duns Response
N
Delivery date
Delivery Date
2013-09-16 Window time
Window
Time
16:30
Discharge port
Dock
cc-70d Head of unloading port
Dock Incharger
Kkk
Discharge Port Telephone
Dock Tel
011-1111 Unloading port Address
Dock Address
Adress
Place of delivery
Delivery
Place
Program Tracker
Follow
up
Kkkk Program Tracker phone/Fax
Followup Tel/fax
011-1111
Delivery Instructions
Delivery Note

Part Information Parts List
Part number part description demand quantity commitment quantity Amount paid package number Bin number Bin number real hair bin number actual number of bins Real receipt Bin number actual number of bins remarks
1 12647212 60 60 15 4 p000000d
2 12654172 615 615 15 41 p000000d

'; $result = array(); preg_match_all('#(.*)#iUus', $string, $result); print_r($result[1]);
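
The tags inside the posted pattern did not survive, so the exact markup it matched is unknown. As a sketch only, assuming the wanted values are wrapped in `<font color="blue">…</font>` (an assumption, not something the thread confirms), the same idea could look like this:

```php
<?php
// Assumption: the wanted values are wrapped in <font color="blue">...</font>.
$string = '<tr><td>Part number</td><td><font color="blue">12647212</font></td></tr>'
        . '<tr><td>Part number</td><td><font color=blue>12654172</font></td></tr>';

// i = case-insensitive, s = dot also matches newlines, U = quantifiers are lazy by default.
$pattern = '#<font[^>]*color=["\']?blue["\']?[^>]*>(.*)</font>#isU';

$result = [];
preg_match_all($pattern, $string, $result);
print_r($result[1]); // the captured text of each blue span
```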


If nothing marks the blue text (an id, a class, or the like), the only option is to match every cell and then pick out the ones you need according to the page structure.

The page itself has no color difference; the blue is only my own marking.

$s is the content of the page you provided.

preg_match_all('# #isU', $s, $r);
$r = array_map('trim', array_map('strip_tags', $r[0]));
print_r($r);
  
Array
(
    [0] => 
    [1] => AAAAAA aaaaaa
    [2] => aaaaaa xxxx (AAAAAA)
    [3] => aaaaaa xxxx
    [4] => Adress aaaaaa adress
    [5] => 
    [6] => Delivery Planning Form
    [7] => Planned arrival time 2013-09-16
    [8] => Pus number 770266110 version 00
    [9] => Customer Customers
    [10] => *dynp-770266110-00*
    [11] => 
    [12] => Delivery Information Delivery information
    [13] => Factory Plant
    [14] => xxxxxx
    [15] => Pickup time Pick up time
    [16] => 2013-09-09 16:30
    [17] => Need supplier Feedback Need Duns Response
    [18] => N
    [19] => Delivery date Delivery Date
    [20] => 2013-09-16
    [21] => Window time Window Time
    [22] => 16:30
    [23] => Discharge port Dock
    [24] => cc-70d
    [25] => Head of unloading port Dock Incharger
    [26] => KKK
    [27] => Discharge Port Telephone Dock Tel
    [28] => 011-1111
    [29] => Unloading port Address Dock Address
    [30] => adress
    [31] => Place of delivery Delivery Place
    [32] => 
    [33] => Program Tracker Follow up
    [34] => KKKK
    [35] => Program Tracker phone/Fax Followup Tel/fax
    [36] => 011-1111
    [37] => Delivery Instructions Delivery Note
    [38] => 
    [39] => Part Information Parts List
    [40] => Serial number
    [41] => Part number
    [42] => Part description
    [43] => Demand quantity
    [44] => Commitment quantity
    [45] => Quantity received
    [46] => Package quantity
    [47] => Number of bins
    [48] => Bin number
    [49] => Shipped bin number
    [50] => Shipped bin count
    [51] => Received bin number
    [52] => Received bin count
    [53] => Remarks
    [54] => 1
    [55] => 12647212
    [56] => 
    [57] => 60
    [58] => 60
    [59] => 
    [60] => 15
    [61] => 4
    [62] => p000000d
    [63] => 
    [64] => 
    [65] => 
    [66] => 
    [67] => 
    [68] => 2
    [69] => 12654172
    [70] => 
    [71] => 615
    [72] => 615
    [73] => 
    [74] => 15
    [75] => 41
    [76] => p000000d
    [77] => 
    [78] => 
    [79] => 
    [80] => 
    [81] => 
)
Picking values out of this is not difficult, is it?
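
The tag part of the pattern above was lost from the post. A plausible reading, given that `strip_tags` is applied to whole matches and the output is one entry per cell, is that it matched complete `<td>` elements. The sketch below makes that assumption explicit; the pattern and the sample markup in `$s` are illustrative, not the poster's originals.

```php
<?php
// Assumption: cells are plain <td ...>...</td> elements without nested tables.
$s = '<table><tr><td align="middle"> <b>Part number</b> </td><td>12647212</td></tr>
<tr><td></td><td>60<br></td></tr></table>';

// i = case-insensitive, s = dot also matches newlines, U = lazy quantifiers,
// so each match ends at the first closing </td>.
preg_match_all('#<td[^>]*>.*</td>#isU', $s, $r);

// Strip the markup from every matched cell, then trim surrounding whitespace.
$r = array_map('trim', array_map('strip_tags', $r[0]));
print_r($r);
```

Empty cells are kept as empty strings, which preserves the position of every other cell in the flat list.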
The second table starts at index 40 and has 14 columns:
$t = array_chunk(array_slice($r, 40), 14);
for ($i = 1; $i
   
  
Array
(
    [0] => Array
        (
            [Serial number] => 1
            [Part number] => 12647212
            [Part description] => 
            [Demand quantity] => 60
            [Commitment quantity] => 60
            [Quantity received] => 
            [Package quantity] => 15
            [Number of bins] => 4
            [Bin number] => p000000d
            [Shipped bin number] => 
            [Shipped bin count] => 
            [Received bin number] => 
            [Received bin count] => 
            [Remarks] => 
        )

    [1] => Array
        (
            [Serial number] => 2
            [Part number] => 12654172
            [Part description] => 
            [Demand quantity] => 615
            [Commitment quantity] => 615
            [Quantity received] => 
            [Package quantity] => 15
            [Number of bins] => 41
            [Bin number] => p000000d
            [Shipped bin number] => 
            [Shipped bin count] => 
            [Received bin number] => 
            [Received bin count] => 
            [Remarks] => 
        )

)
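
The posted loop is cut off, so the following is a sketch of how the slice, chunk and combine steps could fit together, not the poster's exact code. It uses a small hypothetical cell list with an offset of 2 and 3 columns; for the data in this thread the values would be 40 and 14.

```php
<?php
// Hypothetical flat cell list: 2 leading cells, then a 3-column table.
$r = ['Customer', '*dynp-770266110-00*',
      'Serial number', 'Part number', 'Demand quantity',
      '1', '12647212', '60',
      '2', '12654172', '615'];
$offset  = 2; // index of the table's first header cell
$columns = 3; // number of columns in the table

// Drop the cells before the table, then cut the rest into rows.
$t = array_chunk(array_slice($r, $offset), $columns);

// Row 0 holds the column headers; use them as keys for every data row.
$header = $t[0];
$res = [];
for ($i = 1; $i < count($t); $i++) {
    $res[] = array_combine($header, $t[$i]);
}
print_r($res);
```

Because the rows are produced by chunking, the number of data rows does not need to be known in advance.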

preg_match_all('# #isU', $s, $r);

Is this how it is used? Thanks!

preg_match_all('# #isU', $s, $r);
If the items on the page vary, how do I get a particular one?
For example, *dynp-770266110-00* is sometimes at [10] and sometimes at a different index.
The item before it always has the same value, though; only its key changes. Example: the item [9] => Customer.

That part is for you to work out.
In general, descriptive labels and data always come in pairs, with the label first and the data right after it.
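
One way to use that pairing, sketched here under the assumption that the label text is stable even when its index is not: search for the label and take the next cell. The helper name `valueAfterLabel` is hypothetical.

```php
<?php
// Hypothetical helper: return the cell that follows the first cell equal to $label,
// or null when the label is missing or is the last cell.
function valueAfterLabel(array $cells, string $label): ?string
{
    $i = array_search($label, $cells, true); // strict comparison
    if ($i === false || !isset($cells[$i + 1])) {
        return null;
    }
    return $cells[$i + 1];
}

$r = ['', 'Pus number 770266110 version 00', 'Customer', '*dynp-770266110-00*', ''];
var_dump(valueAfterLabel($r, 'Customer'));
```

The lookup no longer depends on whether the value sits at index 10 or elsewhere, only on the label preceding it.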


If the page from the first post contains one more table, such as the following, array_combine emits a warning.

      
 
    
   
Supplier Signature Carrier Signature
Supplier Signature
_____________ carrier signature _____________
Supplier Confirm Time
Supplier Confirmation Time 13-09-10 09:01
Receiver Signature
Consignee signature
_______________

Date
______________
END of PAGE * * *

How do I filter out this table's information?
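
The thread ends without an answer. One possible approach, offered only as a sketch: cut the flat cell list at the first cell of the unwanted table, and skip any chunk whose size differs from the header, since `array_combine` requires both arrays to have the same number of elements (it warns on older PHP versions and throws a `ValueError` on PHP 8). The stop label and the sample data are assumptions.

```php
<?php
// Hypothetical cell list: a 3-column parts table followed by an unrelated signature table.
$r = ['Serial number', 'Part number', 'Demand quantity',
      '1', '12647212', '60',
      '2', '12654172', '615',
      'Supplier Signature', '_____________'];
$columns   = 3;
$stopLabel = 'Supplier Signature'; // assumed first cell of the trailing table

// Keep only the cells before the trailing table, if its first cell is present.
$end   = array_search($stopLabel, $r, true);
$cells = $end === false ? $r : array_slice($r, 0, $end);

$t = array_chunk($cells, $columns);
$header = array_shift($t);
$res = [];
foreach ($t as $row) {
    if (count($row) !== count($header)) {
        continue; // incomplete chunk: array_combine would fail on it
    }
    $res[] = array_combine($header, $row);
}
print_r($res);
```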