Extracting the HTML Document from IE with C#
You need to extract the HTML from the page currently displayed in IE. This article details how to do that.
- Create a console application in any version of Visual Studio, targeting .NET 1.x, 2.0, 3.0, or 3.5.
- Add two COM object references that allow us to manipulate IE: Microsoft Internet Controls (SHDocVw) and Microsoft HTML Object Library (mshtml).
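- Note that the code sample below refers to the COM types by their fully qualified names (the SHDocVw and mshtml namespaces), so it needs no using directives for them; just add the code as is. It does still rely on using System; and using System.IO; for Console and Path.
- Then find the running instances of IE and extract the document: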
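// ShellWindows enumerates Windows Explorer windows as well as IE windows.
SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();
string filename;
foreach (SHDocVw.InternetExplorer ie in shellWindows)
{
    // Keep only the windows hosted by iexplore.exe.
    filename = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();
    if (filename.Equals("iexplore"))
    {
        Console.WriteLine("Web Site : {0}", ie.LocationURL);
        mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;
        string snippet = "***Failed***";
        if (htmlDoc != null && htmlDoc.body != null)
        {
            // Cap at 40 characters; Substring(0, 40) alone throws on shorter markup.
            string html = htmlDoc.body.outerHTML;
            snippet = html.Substring(0, Math.Min(40, html.Length));
        }
        Console.WriteLine(" Document Snippet: {0}", snippet);
        Console.WriteLine("{0}{0}", Environment.NewLine);
    }
}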
[Screen-shot: console output of the program]
Complete code:
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // ShellWindows enumerates Windows Explorer windows as well as IE windows.
            SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();
            string filename;
            foreach (SHDocVw.InternetExplorer ie in shellWindows)
            {
                // Keep only the windows hosted by iexplore.exe.
                filename = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();
                if (filename.Equals("iexplore"))
                {
                    Console.WriteLine("Web Site : {0}", ie.LocationURL);
                    mshtml.IHTMLDocument2 htmlDoc = ie.Document as mshtml.IHTMLDocument2;
                    string snippet = "***Failed***";
                    if (htmlDoc != null && htmlDoc.body != null)
                    {
                        // Cap at 40 characters; Substring(0, 40) alone throws on shorter markup.
                        string html = htmlDoc.body.outerHTML;
                        snippet = html.Substring(0, Math.Min(40, html.Length));
                    }
                    Console.WriteLine(" Document Snippet: {0}", snippet);
                    Console.WriteLine("{0}{0}", Environment.NewLine);
                }
            }
        }
    }
}
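
The two pure-string steps in the program, recognizing an IE window by its executable name and trimming the markup to a short snippet, can be checked without IE or the COM references. The following is only a minimal sketch: the class name IeSnippetHelpers and both method names are hypothetical and not part of the original article, and the path is split by hand rather than with Path so that the check behaves the same on any host OS.

```csharp
using System;

// Hypothetical helpers (not from the original article) that isolate the
// string handling so it can be exercised without a running IE instance.
static class IeSnippetHelpers
{
    // Returns true when the executable path names iexplore.exe, in any letter case.
    public static bool IsInternetExplorer(string fullName)
    {
        if (string.IsNullOrEmpty(fullName)) return false;
        // Accept both separators; LastIndexOfAny returns -1 when there is none,
        // in which case Substring(0) keeps the whole string.
        int slash = fullName.LastIndexOfAny(new char[] { '\\', '/' });
        string file = fullName.Substring(slash + 1);
        return string.Equals(file, "iexplore.exe", StringComparison.OrdinalIgnoreCase);
    }

    // Returns at most maxLength leading characters of html (maxLength is assumed
    // to be non-negative), or the failure marker when html is null.
    public static string Snippet(string html, int maxLength)
    {
        if (html == null) return "***Failed***";
        return html.Substring(0, Math.Min(maxLength, html.Length));
    }
}

class Demo
{
    static void Main()
    {
        Console.WriteLine(IeSnippetHelpers.IsInternetExplorer(
            @"C:\Program Files\Internet Explorer\IEXPLORE.EXE"));
        Console.WriteLine(IeSnippetHelpers.Snippet("<body></body>", 40));
    }
}
```

In the main loop, these helpers would take ie.FullName and htmlDoc.body.outerHTML as their inputs. Unlike the program above, the sketch compares the file name including its .exe extension.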