Processing a large amount of retrieved HTML data

Source: Internet
Author: User
I fetch a large chunk of HTML data, about 30,000 characters in size. The front part is useless; only the last few thousand characters are useful, and I want to extract data from them. Running a regular expression over the whole string seems to waste a lot of resources. Is there a way to make the regular expression start from the end of the string and stop once it reaches a certain point?
At first I tried the simple_html_dom class, but I got stuck later ...
Because the length of what I fetch is not the same each time, cutting off a fixed number of characters is not a good approach.


Replies (solutions)

Since you don't know exactly where the useful part starts, there is no real question of wasting resources.

Get the HTML with one of:
file_get_contents
curl

Then parse the HTML with a regular expression to get the text you want.
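
The two steps above (fetch, then match) could look roughly like the sketch below. The function names, the timeout value, and the `<li class="item">` pattern are assumptions made up for illustration; the real pattern depends on the page being parsed. Only one of the two fetch functions is needed, and the curl version requires the curl extension.

```php
<?php
// Fetch a page with file_get_contents; returns null on failure.
function fetch_with_file_get_contents(string $url): ?string
{
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// Fetch a page with the curl extension; returns null on failure.
function fetch_with_curl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout in seconds
    $html = curl_exec($ch);
    return is_string($html) ? $html : null;
}

// Hypothetical pattern: collect the text of every <li class="item"> element.
function extract_items(string $html): array
{
    preg_match_all('~<li class="item">(.*?)</li>~s', $html, $matches);
    return array_map('trim', $matches[1]);
}

// Demo on a literal string so the sketch runs without network access.
$sample = '<ul><li class="item"> first </li><li class="item">second</li></ul>';
print_r(extract_items($sample));
```

In real use, the string returned by one of the fetch functions would be passed to `extract_items` after checking that it is not null.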

There is no better way: to filter the data, you either use regular expression matching or the class you are already using.
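
If the useful part happens to begin at a recognizable marker, one possible way to keep the regular expression off the front of the string is to locate the last occurrence of that marker with `strrpos` and match only on the tail. This is a sketch under that assumption, not something the replies above propose; the marker `<div id="result"` and the `<span>` pattern are hypothetical. Because the marker is searched for rather than assumed to sit at a fixed offset, it does not matter that the total length changes between fetches.

```php
<?php
// Return the part of $html that starts at the LAST occurrence of $marker,
// or the whole string when the marker is not found.
function tail_from_marker(string $html, string $marker): string
{
    $pos = strrpos($html, $marker); // position of the last occurrence, or false
    return $pos === false ? $html : substr($html, $pos);
}

// Hypothetical input: a long useless front part followed by the useful block.
$html = str_repeat('<div>noise</div>', 1000)
      . '<div id="result"><span>42</span><span>7</span></div>';

// Cut the string down to the tail, then run the regular expression on it only.
$tail = tail_from_marker($html, '<div id="result"');
preg_match_all('~<span>(\d+)</span>~', $tail, $m);
print_r($m[1]);
```

Falling back to the whole string when the marker is missing keeps the behaviour the same as matching on the full page, so the slicing step is only an optimization.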
