Using Embedded Web Browser and MSHTML to replace human operations on webpages

Source: Internet
Author: User

First of all, the software described here has actually been implemented, but I avoid technical details and describe only the basic ideas. I hope it will interest people doing related work.

 

Embedded Web Browser (EmbeddedWB for short) is a third-party browser component for Delphi. It lets you embed an IE-kernel browser in a Delphi program and carry out various webpage operations programmatically. MSHTML is Microsoft's ...... I am not clear on exactly what it is, but it is a core part of how IE parses webpages. It defines a large number of interfaces that represent the parts of a webpage: for example, IHTMLDocument2 represents the entire webpage document, IHTMLElement represents a single element on the page, and so on.

 

Question 1: What is the relationship between the two?

For example, suppose I have defined an object named browser of the EmbeddedWB type. Calling browser.Doc2.activeElement() then returns the active element of the webpage currently open in the browser. The type of that active element is IHTMLElement, which is defined in MSHTML.

In other words, EmbeddedWB operates on webpage elements through the element types defined in MSHTML.

 

############### The main text follows (^ω^) ###############

 

Now we have a programmable browser and a set of interfaces that can represent webpage elements. We want this browser to record the operations a person performs on a webpage and then replay them on the webpage any number of times. Human operations on a webpage are really a sequence of operations on individual elements. The kinds of operation are basically fixed: clicking, typing input, and selecting. Each of them can be carried out by directly calling the methods that MSHTML defines for the element. The real problem is how to record, and later find again, the elements to be operated on.

 

Question 2: What information does one need to uniquely identify an element on a webpage?

At first, for the sake of generality, we located an element by its tag name plus its ordinal position among all elements with the same tag name on the current webpage. This works in principle because every element is an HTML tag with a tag name, such as <HTML>, <body>, <A>, or <input>. In practice, however, most webpages today are dynamic, and the relative positions of their elements may shift slightly each time the page is opened. To improve accuracy, we added the element's attributes such as ID and name, as well as the element's complete HTML code, and finally located an element based on the following information:

1. The value of the element's ID attribute

2. The value of the element's name attribute

3. The element's HTML code, plus its ordinal position among all elements with the same HTML code on the current webpage

4. The element's tag name, plus its ordinal position among all elements with the same tag name on the current webpage

The software tries these items in order when searching for a specific element on the webpage.
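
The ordered lookup described above can be illustrated with a minimal Python sketch. It is not the original Delphi implementation: Element, Locator, record_locator and find_element are hypothetical names, the page is modelled as a flat list rather than through MSHTML interfaces, and taking the first match for a shared ID or name is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a page element (not the real IHTMLElement)."""
    tag_name: str
    outer_html: str
    id: str = ""
    name: str = ""


@dataclass
class Locator:
    """Information recorded for one element, mirroring the four items above."""
    id: str
    name: str
    outer_html: str
    html_index: int  # ordinal among elements with identical HTML code
    tag_name: str
    tag_index: int   # ordinal among elements with the same tag name


def record_locator(page: List[Element], target: Element) -> Locator:
    """Capture everything needed to find `target` again later."""
    same_html = [e for e in page if e.outer_html == target.outer_html]
    same_tag = [e for e in page if e.tag_name == target.tag_name]
    return Locator(
        id=target.id,
        name=target.name,
        outer_html=target.outer_html,
        html_index=next(i for i, e in enumerate(same_html) if e is target),
        tag_name=target.tag_name,
        tag_index=next(i for i, e in enumerate(same_tag) if e is target),
    )


def find_element(page: List[Element], loc: Locator) -> Optional[Element]:
    """Try each item in order; return None when nothing matches."""
    if loc.id:  # 1. ID attribute
        for e in page:
            if e.id == loc.id:
                return e
    if loc.name:  # 2. name attribute (assumption: first match wins)
        for e in page:
            if e.name == loc.name:
                return e
    # 3. identical HTML code plus ordinal position
    same_html = [e for e in page if e.outer_html == loc.outer_html]
    if loc.html_index < len(same_html):
        return same_html[loc.html_index]
    # 4. tag name plus ordinal position
    same_tag = [e for e in page if e.tag_name == loc.tag_name]
    if loc.tag_index < len(same_tag):
        return same_tag[loc.tag_index]
    return None
```

Putting the most stable items first means a shifted layout only matters for elements that have neither an ID nor a name.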

In a real implementation you also need to account for frames and iframes, which affect how an element is located on the webpage: the location information must also include the number of the frame that contains the element.
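
One possible way to number frames is sketched below in Python. The text does not say how the original software numbers them, so the depth-first order used here is an assumption, and Document, number_frames and locate are hypothetical names rather than MSHTML calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Hypothetical document model: its own elements plus child frame documents."""
    elements: List[str]
    frames: List["Document"] = field(default_factory=list)


def number_frames(top: Document) -> List[Document]:
    """Return all documents depth-first; the list index serves as the frame number."""
    ordered = [top]
    for child in top.frames:
        ordered.extend(number_frames(child))
    return ordered


def locate(top: Document, frame_number: int, element_index: int) -> str:
    """Resolve (frame number, element index); raises IndexError if out of range."""
    docs = number_frames(top)
    return docs[frame_number].elements[element_index]
```

With this numbering, frame 0 is always the top-level document, so pages without frames need no special case.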

 

With a method for locating elements in hand, the software can go on to replace a person's operations on the webpage. Specifically, the user first performs one complete run of the operation on the webpage in the browser embedded in the software, while the software records the element location information and the sequence of operations (these can be stored in a database or a script file, much like a macro). The software can then repeat the person's operations to complete a large volume of semi-repetitive work, such as posting advertisements.
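
The record-and-replay idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the JSON script format, the FakeElement class, and the names OPERATIONS, save_script and replay are all hypothetical, and a dictionary keyed by a locator string stands in for the real element search.

```python
import json
from typing import Callable, Dict, List


class FakeElement:
    """Hypothetical element that only remembers what was done to it."""

    def __init__(self) -> None:
        self.value = ""
        self.selected = ""
        self.clicks = 0

    def click(self) -> None:
        self.clicks += 1


# The three basic operation types, each mapped to an action on an element.
OPERATIONS: Dict[str, Callable[[FakeElement, str], None]] = {
    "click": lambda el, arg: el.click(),
    "input": lambda el, arg: setattr(el, "value", arg),
    "select": lambda el, arg: setattr(el, "selected", arg),
}


def save_script(steps: List[dict]) -> str:
    """Serialize the recorded steps, e.g. for a script file or database row."""
    return json.dumps(steps, indent=2)


def replay(script: str, page: Dict[str, FakeElement]) -> None:
    """Run every recorded step; KeyError signals a missing element or unknown operation."""
    for step in json.loads(script):
        element = page[step["locator"]]
        OPERATIONS[step["op"]](element, step.get("arg", ""))
```

Because the script is plain data, the same recording can be replayed as many times as needed, or edited by hand to change the text being entered.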

In effect, this method lets a computer simulate human behavior and perform any operation that a person can perform on a webpage. Software tools based on it have already been developed to publish articles automatically, in batches, to various websites. Other applications remain to be developed.

 

Open problems:

1. How to detect the various errors that may occur while the software processes a webpage automatically, such as navigation to the wrong page, elements that cannot be found, and changes to a website's overall layout.

2. How to design the interface so that users can add new websites easily and quickly.
