HTML Parser project progress and new ideas

Source: Internet
Author: User
This parser is a personal hobby research project. It has been going on for a long time, and a lot has happened along the way, including a long stay in Shanghai. The project is named wittiness.

Project purpose: Build a web information excavator that captures the required information efficiently and conveniently.

Build idea: parse the HTML tags --> build a hierarchy of objects --> query for the object that carries the required information --> output the result at the object level. The difficult parts are parsing the tags and querying for the object. To parse the tags I tried regular expressions, string matching, and SgmlReader; of these, I found SgmlReader the easiest to use and the most efficient. For querying, objects are looked up by index number in a table. This step takes less time.

New ideas: Split the HTML and read it as a string[], where each element is one HTML tag. Two array pointers then determine the part to be extracted: one searches from the beginning and the other from the end, which can effectively solve the tag-matching problem. Only array pointers are used when constructing the object hierarchy, so I think this should improve efficiency and save memory. Later, this method could be extended to stream processing, which would make reading large files more efficient. A GUI that lets users select the content to capture would make the tool more convenient to operate. I don't know how editors like Dreamweaver are developed.
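
The "new idea" above (a flat array of tags plus two pointers, one moving from the start and one from the end) can be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual code: Python is used for convenience, the function names split_tokens, tag_info and outer_span are hypothetical, and the regular expressions are a simplification that ignores comments, scripts and attribute values containing ">".

```python
import re

# Assumption: one array entry per tag, with text runs kept as separate entries.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")
# Captures an optional "/" (closing tag) and the tag name.
NAME_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)")


def split_tokens(html):
    """Split HTML into a flat list: one entry per tag or text run."""
    return [t.strip() for t in TOKEN_RE.findall(html) if t.strip()]


def tag_info(token):
    """Return (is_closing, lowercase_name), or None for text and comments."""
    m = NAME_RE.match(token)
    if m is None:
        return None
    return (m.group(1) == "/", m.group(2).lower())


def outer_span(tokens, name):
    """Return (head, tail) indexes of the outermost <name>...</name> region.

    The head pointer walks forward to the first opening tag and the tail
    pointer walks backward to the last closing tag. Returns None if either
    one is missing.
    """
    head, tail = 0, len(tokens) - 1
    while head <= tail and tag_info(tokens[head]) != (False, name):
        head += 1
    while tail >= head and tag_info(tokens[tail]) != (True, name):
        tail -= 1
    return (head, tail) if head < tail else None


if __name__ == "__main__":
    html = ('<html><body><div class="a"><p>one</p><p>two</p></div>'
            '<span>x</span></body></html>')
    tokens = split_tokens(html)
    span = outer_span(tokens, "div")
    print(span)                            # (2, 9)
    print(tokens[span[0]:span[1] + 1])     # the tokens of the <div> region
```

Only the two indexes are stored for the region, so no substrings are copied until the result is output. Note one limitation of this simple form: with sibling elements of the same name (the two <p> tags in the example), the pointers bracket everything from the first opening tag to the last closing tag, so building a full hierarchy would still need depth tracking inside the span.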
