Google periodically releases new services designed to help Web site administrators either manage the flood of data about site access or attract more traffic. One of Google's recent offerings, Google Sitemaps, makes it possible for Web crawlers to discover new content on a site and make it available quickly in Google searches.
In this article, I'll give you the specifics of what Google Sitemaps does and provide examples of how to generate a site map and submit it. I used http://www.allinvites.com as the example site; it is my wife's site (I asked for her permission beforehand). Normally I use a hypothetical site, but for the purposes of this article I use a small, live one.
Specific details
I've already mentioned the goal of Google Sitemaps above, but the service also comes with a number of benefits and caveats. First, Google points out that a site map neither harms nor helps a site's ranking in Google. In fact, the Sitemaps FAQ makes it clear that using a site map does not affect PageRank (the measure Google uses to evaluate the importance of Web pages in search results) and has no effect on how a page's ranking is calculated. A page can still rank better than before, however, simply because it was not indexed previously and is now indexed by Google. My feeling is that you shouldn't use a site map just to raise a page's ranking. If Google ever states officially that a site map can be used for that purpose, or if you want the other benefits a site map offers, then give it a try.
Google is under no obligation to index every page you submit. For example, if you submit a URL on your own site that is blocked in the robots.txt file, Google's crawlers will respect the robots.txt settings and ignore that entry in the submitted site map. Second, Google does not guarantee that all submitted pages will be processed, but the submitted site map will still be used by Google's crawlers to learn more about the site. As Google puts it, submitting a site map can only help you; it will not harm you.
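Because URLs blocked by robots.txt are ignored anyway, it can be useful to screen a candidate list before building a site map. The following is a minimal sketch using Python's standard urllib.robotparser module; the robots.txt rules and the /private/ path are hypothetical and only illustrate the idea.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed_urls(urls, robots_txt, user_agent="*"):
    """Return the URLs that the given robots.txt rules do not block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]

candidates = [
    "http://www.allinvites.com/index.php",
    "http://www.allinvites.com/private/notes.html",  # hypothetical page
]
print(allowed_urls(candidates, ROBOTS_TXT))
```

Only the first URL survives the check, since the second falls under the disallowed path.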
Google Sitemaps is a good service with very little potential downside. First of all, it is free, even for commercial use. That is never a bad thing, especially for those on a tight budget. Second, one of the main points of a site map is to help get a site indexed faster. With a site map, changes to your site and new content can be discovered and processed more quickly than if you submitted the content manually.
Last but not least, Google provides Sitemaps-related reporting tools that collect the following information:
- Query stats: provides information about the queries that Google users run and that return your site.
- Crawl stats: provides information about successes or failures in crawling your pages, along with PageRank information.
- Page analysis: provides information about the content types and encodings of the pages on your site.
- Index stats: shows how your site is indexed, for example a list of the site's indexed pages, a list of pages that link to the site, and Google's cached information about your site.
Using Google Sitemaps
Now that you have a better understanding of what Google Sitemaps can and cannot do for you, let's look at how to use the service.
There are 3 steps to taking full advantage of the site map:
- Create a site map for your site.
- Add the created site map to your Google account.
- Use Google's reporting and statistics tools.
Create a site map
The Google Sitemaps service reads site maps written in the "Sitemap Protocol", an open XML-based format defined by Google, which gives the service information about the layout of your site. Google even provides a Google Sitemap Generator, which can create a ready-made Google site map for you.
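To give a feel for what a Sitemap Protocol file looks like, here is a minimal sketch that builds one with Python's standard library. It assumes the sitemaps.org 0.9 namespace; the Google-specific schema URL in use when this article was written may differ, and the function name build_sitemap is my own.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace, not necessarily the one
# Google's original schema used.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a Sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for href in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = href
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap(["http://www.allinvites.com/"]))
```

Each page becomes one url element with a loc child; the generator described next produces the same kind of file automatically.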
If you meet the following requirements, the Google Sitemap Generator is your best choice:
- You can run Python 2.2+ scripts on your Web server.
- You have some way to upload files to the Web server.
- If you want to use access logs to generate the site map, you know the encoding of the logs.
To begin, you need the Google Sitemap Generator. Because my sample site is hosted on a Linux server, I downloaded the "tar.gz" version of the generator. My host supports running Python scripts, and I'm using Python 2.2.3.
Place the downloaded file, named sitemap_gen-x.x.tar.gz, somewhere on the server. I put it in the root directory of the sample site. Next, decompress it with the "gunzip" command:
gunzip -dc sitemap_gen-x.x.tar.gz | tar xvf -
The contents are extracted into a folder with the same name as the archive, minus the ".tar.gz" suffix.
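If gunzip and tar are not available on your host, the same extraction can be sketched with Python's standard tarfile module. This is only an illustration for a modern Python 3 interpreter (not the Python 2.2 the generator itself targets); the function name unpack_generator is my own, and the archive name keeps the x.x placeholder from above.

```python
import tarfile
from pathlib import Path

def unpack_generator(archive, dest="."):
    """Extract a .tar.gz archive into dest and return the extracted folder."""
    archive = Path(archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    # The folder name is the archive name minus the ".tar.gz" suffix.
    return Path(dest) / archive.name[: -len(".tar.gz")]

# Example call (replace x.x with the version you downloaded):
# folder = unpack_generator("sitemap_gen-x.x.tar.gz")
```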
In this folder, locate the example_config.xml file and copy it to config.xml. Edit this file, paying attention to the following parameters (some required, some optional).
- base_url (required): The top-level URL of your site; in my case, http://www.allinvites.com.
- store_into (required): The path and filename to which the site map will be written; in my case, "/home/alowe/www/sitemap.xml.gz". You do not need to create this file in advance.
- default_encoding (optional): The default value is UTF-8. If the URLs and file paths on your system require a different encoding, change this value.
- verbose (optional): The default value is "1", and it can be set to any number from 0 to 3. A value of 0 produces no diagnostic output, and 3 produces the most output.
- url or urllist (optional): Use one of these two directives to tell the Sitemap Generator which URLs should be included in the site map. Each URL can be listed individually with a url directive in the config file, or you can use the urllist directive to point to a separate text file that lists all the required URLs. If you use the urllist directive, the URLs go in that separate text file rather than in url directives in the config file. My example configuration below relies on the directory directive instead of listing URLs individually.
- The url directive takes one required attribute: href. The href attribute, as you would expect, is the full URL, including your domain, of the page you want to include. You can also use the optional attributes changefreq, lastmod, and priority.
- changefreq (never, yearly, monthly, weekly, daily, hourly) indicates how often the content at the URL is expected to change.
- lastmod (an ISO 8601 timestamp) identifies when the content was last changed.
- priority tells search engines the relative importance of a particular URL compared with the other URLs in the site map. For example, a value of 0.5 marks a URL as half as important as a URL with a value of 1. The priority value can affect the order in which search engines crawl your pages, but don't assume it helps to set every URL to the maximum value. That only tells the search engine that all the URLs on your site are equally important. Priority is not used to compare the importance of your content with content on other sites.
- directory (optional): Use this directive to include the URLs of the files found in a particular directory. It takes three attributes: path, url, and default_file. The path attribute is the full filesystem path of the directory (for example, /home/alowe/www), and the url attribute provides the Web URL that corresponds to that directory. The default_file attribute tells the Sitemap Generator the name of your server's default file (for example, index.php or index.html).
- accesslog: Takes two attributes, path and encoding, and lets the Sitemap Generator parse a log file for URLs.
- filter: Use this directive to pass (include) or drop (exclude) specified files. The www.allinvites.com configuration file example shows how it is used.
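Before listing URLs by hand, it can help to sanity-check the optional attributes. The sketch below is my own illustration, not part of the generator: the function name is hypothetical, the 0.0 to 1.0 priority range and the extra "always" value come from the Sitemap protocol, and datetime.fromisoformat accepts only the common ISO 8601 forms.

```python
from datetime import datetime

# The six values listed in the article, plus "always", which the Sitemap
# protocol also defines.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly",
                     "monthly", "yearly", "never"}

def check_url_attributes(href, changefreq=None, lastmod=None, priority=None):
    """Return a list of problems found in one url entry (empty if none)."""
    problems = []
    if not href.startswith(("http://", "https://")):
        problems.append("href must be a full URL including the domain")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("unknown changefreq: %s" % changefreq)
    if lastmod is not None:
        try:
            datetime.fromisoformat(lastmod)  # common ISO 8601 forms only
        except ValueError:
            problems.append("lastmod is not an ISO 8601 timestamp")
    if priority is not None and not 0.0 <= float(priority) <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems

print(check_url_attributes("http://www.allinvites.com/", "weekly", "2005-06-03", 0.5))
```

An empty list means the entry passed every check in this sketch; it says nothing about whether Google will actually crawl the URL.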
Example
As mentioned above, this is the sample configuration file I submitted to the Google site map for www.allinvites.com :
<?xml version="1.0" encoding="UTF-8"?>
<!-- SITE INFORMATION -->
<site base_url="http://www.allinvites.com/" store_into="/home/alowe/www/sitemap.xml.gz" verbose="1">
  <!-- INPUTS -->
  <directory path="/home/alowe/www/images" url="http://www.allinvites.com/images/"/>
  <directory path="/home/alowe/www" url="http://www.allinvites.com/" default_file="index.php"/>
  <!-- FILTERS -->
  <!-- Exclude URLs that end with a '~' -->
  <filter action="drop" type="wildcard" pattern="*~"/>
  <!-- Exclude URLs within UNIX hidden files or directories -->
  <filter action="drop" type="regexp" pattern="/\.[^/]*"/>
</site>
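To see what the two filters in this configuration do, here is a rough Python sketch of the matching logic. It assumes that filters are tried in order, that the first match decides, and that unmatched URLs are kept; it also assumes the regular expression is /\.[^/]*. The generator's real implementation may differ, and the .htaccess URL is a hypothetical example.

```python
import fnmatch
import re

# (action, type, pattern) tuples mirroring the two filters in the config.
FILTERS = [
    ("drop", "wildcard", "*~"),
    ("drop", "regexp", r"/\.[^/]*"),
]

def keep_url(url, filters=FILTERS):
    """Apply filters in order; the first match decides, otherwise keep."""
    for action, kind, pattern in filters:
        if kind == "wildcard":
            matched = fnmatch.fnmatchcase(url, pattern)
        else:
            matched = re.search(pattern, url) is not None
        if matched:
            return action == "pass"
    return True

for url in ("http://www.allinvites.com/index.php",
            "http://www.allinvites.com/index.php~",
            "http://www.allinvites.com/.htaccess"):
    print(url, keep_url(url))
```

The first URL is kept; the editor backup file and the hidden file are both dropped.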
Check the configuration file before continuing, using the following command:
python sitemap_gen.py --config=config.xml --testing
With verbose set to 1, you get only a summary of what the script is doing; if you need more information, increase the verbosity level. Here is the example output:
-bash-2.05b$ python sitemap_gen.py --config=config.xml --testing
Reading configuration file: config.xml
Walking Directory "/home/alowe/www/images/"
Walking Directory "/home/alowe/www/"
Sorting and Normalizing collected URLs.
Writing Sitemap file "/home/alowe/www/sitemap.xml.gz" with 77 URLs
Search engine notification is suppressed.
Count of file extensions on URLs:
5 (no extension)
1 .css
8 .gif
1 .gz
27 .jpg
1 .old
22 .php
3 .py
2 .txt
2 .xml
5 /
Number of errors: 0
Number of warnings: 0
Note: The Python script accepts no parameters other than "config", "testing", and "help". The "config" parameter tells the script the name of the configuration file, and the "testing" parameter performs a trial run so you can catch errors before running the script for real.
As you can see, there are no errors or warnings. If you receive an error message, correct the configuration file and test again. Once the test completes without errors, remove the "testing" parameter and run the script.
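If you run the test regularly, for example from a scheduled job, you may want to check the two summary lines automatically. The sketch below simply parses captured output text; the function name is hypothetical, and it assumes the summary lines keep the "Number of errors" and "Number of warnings" wording shown above.

```python
import re

def parse_counts(output):
    """Extract the error and warning counts from the generator's output."""
    counts = {}
    for name in ("errors", "warnings"):
        match = re.search(r"number of %s:\s*(\d+)" % name, output, re.IGNORECASE)
        counts[name] = int(match.group(1)) if match else None
    return counts

sample = "Number of errors: 0\nNumber of warnings: 0\n"
print(parse_counts(sample))
```

A count of None means the corresponding summary line was not found, which is itself worth investigating.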
Add the new site map to your Google Sitemaps account
Before you can add a new site map to your Sitemaps account, you must register an account with Google. Once registration succeeds, go to the Sitemaps login page and log in. Figure A shows the plain but functional page that appears after you log in successfully. Click the "Continue" button under "Get started with Google Sitemaps" to begin using the service.
Figure A: Successful login to the Sitemaps service
In the Site Overview screen, select the "Add" button.
Figure B: Select the "Add" button
The Add a Sitemap page lets you add a general or a mobile site map (I don't cover mobile sites in this article), and even provides a way to use some Sitemaps features without creating a full site map. For my example, as for most Web sites, select "General Web Sitemap" and click the "Next" button.
Figure C: Choose which kind of site map to add
In the configuration file, the "store_into" directive told the Sitemap Generator where to write the site map file. On the "Add a General Web Sitemap" page, enter the full URL of that file and then click the "Add Web Sitemap" button.
Figure D: Telling Google Sitemaps where the site map file is located
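The URL to enter is the public address of the file named in store_into. As a small sketch, assuming the filesystem directory /home/alowe/www is served as http://www.allinvites.com/ (as the directory directive in my configuration indicates), the mapping can be computed like this; the function name is my own.

```python
import posixpath
from urllib.parse import urljoin

def sitemap_url(store_into, document_root, base_url):
    """Map the store_into filesystem path to the URL where it is served."""
    relative = posixpath.relpath(store_into, document_root)
    if relative.startswith(".."):
        raise ValueError("store_into is outside the document root")
    return urljoin(base_url, relative)

print(sitemap_url("/home/alowe/www/sitemap.xml.gz",
                  "/home/alowe/www",
                  "http://www.allinvites.com/"))
```

For my site this yields http://www.allinvites.com/sitemap.xml.gz.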
If you entered a correct URL for the site map file, Google Sitemaps returns a success message on the "Site Overview" page.
Figure E: Site successfully added
Site verification
You might wonder how Google protects your site from someone who maliciously uses one of the alternative submission methods (explained below) to submit a site map for a site that does not belong to them. To prevent this kind of abuse, Google requires that a site be verified before it displays any statistics for a newly added site. In Figure E, note the "Verify" link next to the newly added site. Click this link in your Sitemaps account to start the verification process.
Google's assumption is that if you can create files in the root of a Web site, you are the owner of the site. Google gives you a long, unique filename and asks you to create an empty file with that name in the root directory of the Web server.
Figure F: Google requires a verification file to confirm ownership of the submitted site
Create the requested file with a text editor (on Linux I like nano, and on Windows I like Notepad++). When the file is in place, click the "Verify" button in the lower-right corner of the page. If the file was created correctly, the page shown in Figure G is returned.
Figure G: Verification succeeded
Do not delete this file from the Web site. Google checks periodically to make sure the file is still there. If it is deleted, Google will require you to verify ownership of the site again.
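If you prefer to script this step, creating the empty file takes only a few lines of Python. This is a sketch only: the function name is my own, and the filename in the example call is a placeholder for whatever name Google gives you.

```python
from pathlib import Path

def create_verification_file(web_root, filename):
    """Create an empty file with the given name in the Web root, if absent."""
    target = Path(web_root) / filename
    target.touch(exist_ok=True)
    return target

# Example call (use the unique filename Google provides):
# create_verification_file("/home/alowe/www", "NAME-GIVEN-BY-GOOGLE.html")
```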
Other ways to submit a site map
In this article I have described the Sitemap Protocol submission method. As you might expect, Google recommends using its own Sitemap Protocol to create site maps. However, Google also knows that users' needs vary, so it provides some other ways to build a site map and submit it to the Sitemaps service. These methods include:
- RSS 2.0 and Atom 0.3: If your site already has a syndication feed, why not use it to keep Google informed of changes to the site? This option requires a few special steps; see the Google documentation for more information.
- Text file: If you have a relatively small site that does not change often, there is no reason to set up a whole mechanism for automatic updates. Instead, just place a text file at the top level of the Web site that lists all the URLs, and then tell Google Sitemaps where to find the file.
- OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting): OAI-PMH is an open protocol designed to enable metadata sharing between disparate servers. See the OAI documentation for more information about OAI and OAI-PMH.
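For the text file option, a short script is enough to produce the list. The sketch below assumes the file holds one full URL per line in UTF-8; the function name and the urllist.txt filename are hypothetical.

```python
def write_url_list(urls, path):
    """Write one URL per line to a plain-text site map file."""
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")

# Example call (hypothetical filename in the site's top-level directory):
# write_url_list(["http://www.allinvites.com/"], "/home/alowe/www/urllist.txt")
```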