Article title: Introduction to offline browsers in Linux. This article describes how to implement an offline browser that downloads online resources so that they can be browsed locally.
Mirror directory structure
When you browse downloaded webpages offline, a key problem is how to follow the hyperlinks in one webpage to reach other webpages correctly. A simple method is to create a complete or partial mirror of the target website under the local directory specified by the user. That is, each downloaded file is saved according to the directory structure it has on the server. If a hyperlink in a webpage is given as a relative path, the browser can follow that relative path directly in the local file system. If a hyperlink is given as an absolute URL, the URL must be converted to an absolute local path before the webpage is saved.
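The distinction between relative and absolute hyperlinks can be sketched as follows. This is a minimal illustration, not the article's implementation: the names isAbsoluteUrl and rewriteLink are hypothetical, the test for "absolute" is a simple heuristic (a scheme followed by "://"), and the function that maps a URL to a local path is supplied by the caller.

```cpp
#include <cctype>
#include <string>

// Heuristic (an assumption of this sketch): a link is absolute if it starts
// with a scheme made of letters, digits, '+', '-' or '.', followed by "://".
bool isAbsoluteUrl(const std::string& href) {
    const std::string::size_type pos = href.find("://");
    if (pos == std::string::npos || pos == 0) return false;
    for (std::string::size_type i = 0; i < pos; ++i) {
        const unsigned char c = static_cast<unsigned char>(href[i]);
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.') return false;
    }
    return true;
}

// Relative links are kept as they are; absolute URLs are passed to a
// caller-supplied mapping function (hypothetical) that returns a local path.
std::string rewriteLink(const std::string& href,
                        std::string (*toLocalPath)(const std::string&)) {
    return isAbsoluteUrl(href) ? toLocalPath(href) : href;
}
```

Other link forms, such as links without "://" after the scheme, are not considered in this sketch.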
On the network, a valid URL should correspond to exactly one network file. Therefore, as long as the hierarchy expressed by the URL is converted to a hierarchy of directory paths in the local file system, a complete or partial mirror of the website can be created locally. The following describes how to create such a mirror.
Mirror path algorithm
First, split the URL of the webpage being downloaded into the protocol name (protocol), host address (ipaddr), directory name (directory), and file name (file).
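The split described above can be sketched in plain C++ without KDE classes. The struct UrlParts and the function splitUrl are hypothetical names introduced for illustration, and the sketch assumes a URL of the form protocol://host/directory/file. Ports, query strings, and fragments are not handled.

```cpp
#include <string>

// Parts of a URL as named in the text; the struct itself is hypothetical.
struct UrlParts {
    std::string protocol;   // e.g. "http"
    std::string host;       // domain name or IP address (ipaddr in the text)
    std::string directory;  // starts and ends with '/', e.g. "/webfile/relax/"
    std::string file;       // e.g. "index.html"; may be empty
};

// Splits "protocol://host/dir/.../file" into its parts.
// Returns false if "://" is missing or the host is empty.
bool splitUrl(const std::string& url, UrlParts& out) {
    const std::string sep = "://";
    const std::string::size_type schemeEnd = url.find(sep);
    if (schemeEnd == std::string::npos || schemeEnd == 0) return false;
    out.protocol = url.substr(0, schemeEnd);

    const std::string::size_type hostStart = schemeEnd + sep.size();
    const std::string::size_type pathStart = url.find('/', hostStart);
    if (pathStart == std::string::npos) {
        // No path at all: treat it as the root directory with no file name.
        out.host = url.substr(hostStart);
        out.directory = "/";
        out.file.clear();
        return !out.host.empty();
    }
    out.host = url.substr(hostStart, pathStart - hostStart);

    // Everything up to and including the last '/' is the directory;
    // whatever follows it is the file name.
    const std::string::size_type lastSlash = url.rfind('/');
    out.directory = url.substr(pathStart, lastSlash - pathStart + 1);
    out.file = url.substr(lastSlash + 1);
    return !out.host.empty();
}
```

For the URL http://11.171.38.32/webfile/relax/index.html this yields the protocol http, the host 11.171.38.32, the directory /webfile/relax/, and the file index.html.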
The KDE environment provides a class, KURL, for URL parsing. You only need to define an object, KURL u((const char *)URL), and you can then use the member functions of this class to split the URL into the desired parts. However, this class does not support ASP positioning statements. You can therefore write your own parsing functions on top of KURL to extend the program.
Note that the host part of the URL of the same network file may be given either as a domain name or as an IP address. To avoid mirroring the same file into different directories, use the socket function gethostbyname() to convert the host part to an IP address when it is given as a domain name.
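A minimal sketch of this normalization step, assuming a POSIX system and IPv4 only. The function name hostToIp is hypothetical. It returns the first address reported for the host, so a domain name that resolves to several addresses may need extra handling that is not shown here.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cstring>
#include <string>

// Returns the first IPv4 address of `host` in dotted-decimal form,
// or an empty string if the lookup fails or yields no IPv4 address.
std::string hostToIp(const std::string& host) {
    struct hostent* he = gethostbyname(host.c_str());
    if (he == nullptr || he->h_addrtype != AF_INET ||
        he->h_addr_list[0] == nullptr) {
        return "";
    }
    struct in_addr addr;
    std::memcpy(&addr, he->h_addr_list[0], sizeof(addr));
    return inet_ntoa(addr);
}
```

gethostbyname() is the call named in the text. It is marked obsolete in current POSIX, and new code would normally use getaddrinfo() instead.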
Next, determine the local mirror path of the network file. If the local directory specified by the user is stored in the character array LDir, the code is as follows:
// LDir is a char array, so wrap it in a QString before concatenating.
QString LocalDir = QString(LDir) + "/" + protocol + "_" + ipaddr + directory;
QString LocalPath = LocalDir + file;
In this way, if the URL of a network file is http://11.171.38.32/webfile/relax/index.html and the local directory specified by the user is /home/yangjx/web, the mirror path for this webpage file is /home/yangjx/web/http_11.171.38.32/webfile/relax/index.html.
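The same path construction can be written with std::string instead of QString, which makes it easy to check against the example above. The function names mirrorDir and mirrorPath are hypothetical. The sketch assumes that directory begins and ends with "/" and that ipaddr has already been normalized to an IP address.

```cpp
#include <string>

// Local directory that mirrors the server directory of a network file.
std::string mirrorDir(const std::string& localRoot,
                      const std::string& protocol,
                      const std::string& ipaddr,
                      const std::string& directory) {
    // <localRoot>/<protocol>_<ipaddr><directory>
    return localRoot + "/" + protocol + "_" + ipaddr + directory;
}

// Full local path of the mirrored file.
std::string mirrorPath(const std::string& localRoot,
                       const std::string& protocol,
                       const std::string& ipaddr,
                       const std::string& directory,
                       const std::string& file) {
    return mirrorDir(localRoot, protocol, ipaddr, directory) + file;
}
```

Calling mirrorPath("/home/yangjx/web", "http", "11.171.38.32", "/webfile/relax/", "index.html") reproduces the path given in the example. A URL with an empty file name would need a default file name, which this sketch does not choose.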