.NET implementation (WebBrowser data collection: basics)

I always like to start a blog post with a few opening remarks, and friends who read my articles carefully will understand why. I want more people to understand the purpose of my writing, so I try to present complex, abstract concepts in a simple, easy-to-understand way. The real goal is not to write articles but to help you learn the technology.

The .NET field covers a great many technologies, and one person's energy is limited, so it is normal for an explanation of a particular technical point to be incomplete. When I read articles by those who came before me, I can sense how much they want to pass on their hard-won experience and skills to everyone willing to learn. They deserve our admiration and respect. As I have said before, technology is not for showing off. It is how we make a living, and it is also our interest. People who work in technology tend to be deep, careful, and sharp. Behind the unkempt appearance, the stubble, and the weathered faces are the "scars" left by painstaking study of technical details. We should read technical articles with a modest, respectful, and grateful attitude and try to learn what they have to teach. That is also the ultimate goal of every writer.

I would like to thank those who have contributed selflessly. Through your hard work you have reached a level that is perhaps not the highest in technical terms but reflects what matters most in life. Reading between your lines, I can feel your mood as you wrote. Perhaps you were writing in difficult conditions and under physical strain, which is something everyone who writes a blog can learn from.

That is enough for opening remarks; let's get to the topic. Today I want to talk about using a simple .NET WinForms control to capture the HTML code of a page. Requirements like this are actually common, and I was fortunate to develop an automatic data collection program in the course of my work. The general goal is to analyze the HTML code and capture well-structured, correct data. Along the way we may run into complicating factors such as page jumps (redirects), IFRAME-based page layouts, and asynchronous Ajax. Some readers have asked me how to implement these things, but they cannot be explained in a few words, so the answer has been delayed for a long time, for which I apologize again. I plan to write it all down so that anyone who needs to learn has reference material.

Let's first go over the general approach. To capture the data on a page, we actually need to obtain its HTML code, analyze it, and read the data out of it. Anyone who has written WinForms programs will know that the WinForms control library includes a WebBrowser control. This control is a wrapper around the browser's COM component, so we do not need to deal with COM and .NET interoperability ourselves; interested readers can study that topic, along with how to inject data into HTML DOM objects. After opening a page in the WebBrowser control, we can get all the HTML code on the page through the Document property of the WebBrowser object. We then analyze that HTML with a third-party HTML parsing component. I recommend HtmlAgilityPack.dll, whose usage is similar to the XML DOM. Figure 1 gives an overall view of the technologies involved. [Copyright Wang Qingpei. All rights reserved. Please credit the author when reposting.]
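
The approach described above might look roughly like this in C#. This is only a minimal sketch under stated assumptions, not the author's program: the class name CaptureForm and the address passed to Navigate are placeholders invented for illustration. It opens a page in a WebBrowser control and reads the HTML through the Document property once loading has finished.

```csharp
using System;
using System.Windows.Forms;

// Minimal sketch: a form hosting a WebBrowser control that reads the page HTML
// after loading completes. "CaptureForm" and the URL are hypothetical placeholders.
public class CaptureForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    public CaptureForm()
    {
        browser.Dock = DockStyle.Fill;
        browser.ScriptErrorsSuppressed = true;            // hide script error dialogs while collecting
        browser.DocumentCompleted += OnDocumentCompleted; // raised when a document has finished loading
        Controls.Add(browser);
        browser.Navigate("http://example.com/");          // placeholder address
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event is typically raised for frames as well; comparing URLs is a
        // common heuristic for reacting only to the top-level page.
        if (e.Url != browser.Url) return;

        HtmlDocument doc = browser.Document;   // managed wrapper over the HTML DOM
        if (doc == null || doc.Body == null) return;

        string html = doc.Body.OuterHtml;      // markup of the current <body> element
        Text = html.Length + " characters of HTML captured";
    }

    [STAThread] // the underlying COM component needs a single-threaded apartment
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new CaptureForm());
    }
}
```

The string in html is what would then be handed to a parser such as HtmlAgilityPack. The DocumentText property is an alternative that returns the page source as text. Content that a page loads asynchronously via Ajax may not be present yet when DocumentCompleted is raised, so this sketch does not cover that case.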

Figure 1: where the WebBrowser control comes from [image not preserved]

This figure shows clearly where the WebBrowser control actually comes from. .NET wraps a series of COM components so that we can easily use the browser's core functions, including obtaining and manipulating HTML DOM objects. This keeps beginners from getting confused, and there is in fact much more we can do. If the managed WebBrowser cannot meet your needs, you can use the COM components directly, though they involve complex object inheritance relationships. The ability to inject data dynamically into the HTML DOM is also a security issue to consider in our own web development projects.

Now that we have seen the overall structure, I should say that I do not plan to cover data capture in a single article. I want to explain it thoroughly over two or three articles, describing the whole data collection process from start to finish in detail for beginners. The focus of this article is to give beginners a solid understanding of the WebBrowser control, which will help a great deal with data collection later. Starting with the next article, we will work through a specific example.

Developing a successful data collection system involves more than an article can cover, so you will have to explore the details yourself. The well-known "Locomotive Data Collector" strikes me as quite powerful, and you can borrow ideas from it. In many cases, though, we do not need such a powerful, customizable collection system; we need targeted data capture software, which comes down to analyzing HTML code. Different development platforms such as J2EE and .NET process things differently on the server side, but what reaches the browser is the same: HTML code. As long as we analyze it carefully, we can find the patterns in the HTML, traverse it, and extract the data. I hope this article gives you a basic understanding of how WebBrowser works.
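
To make the idea of finding a pattern in the HTML and traversing it more concrete, here is a small sketch using HtmlAgilityPack (a third-party NuGet package). Everything in the sample markup is invented for illustration, including the table id "prices" and the class name TableReader. In a real collector the HTML string would come from the WebBrowser control instead.

```csharp
using System;
using HtmlAgilityPack; // third-party package, installed separately

// Minimal sketch: treat the HTML like an XML DOM and pull out rows by rule.
// "TableReader", the id "prices", and the sample data are hypothetical.
public static class TableReader
{
    public static void Main()
    {
        // Stand-in for HTML captured from a page.
        string html =
            "<table id='prices'>" +
            "<tr><td>Apple</td><td>3.50</td></tr>" +
            "<tr><td>Pear</td><td>2.10</td></tr>" +
            "</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The "rule" here: every row inside the table whose id is 'prices'.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table[@id='prices']//tr");
        if (rows == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("./td");
            if (cells == null || cells.Count < 2) continue; // skip rows that do not fit the pattern

            string name = cells[0].InnerText.Trim();
            string price = cells[1].InnerText.Trim();
            Console.WriteLine(name + " = " + price);
        }
    }
}
```

One practical note: System.Windows.Forms and HtmlAgilityPack each define a type named HtmlDocument, so a file that imports both namespaces needs to qualify the name.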
