Create your own 404 error handler to provide useful links and redirects for site content. Use metaphone matching and a simple weighted score file to generate redirect suggestions for typos, misspellings, and bad links. Customize the suggestions based on your site's content and your preferred redirect locations. Catch a variety of errors in incoming URL requests and resolve them to the correct directory, script, and HTML page names.
Tutorials on how to build a well-formatted 404 page are everywhere. Most of them recommend placing static suggestion links on the 404 page that point to common areas of the site, such as the home page, the download page, and the site search engine, assuming the site has these pages. The common problem with 404 pages is that they do not reflect why the user came to the site. This article describes how to build a suggestion generator that provides more useful redirect links based on the content of your Web site.
Current 404 handlers let you offer suggested links for a variety of errors, such as pointing users to a site directory. Some spelling correction programs (such as mod_speling; yes, with only one "l") can correct errors in dictionary words and direct users to the correct page. The code in this article helps you build a suggestion engine that, based on the content of your Web site, also handles directory links and words that are not found in a dictionary.
Consider this scenario: you hear a Web page name mentioned on a conference call and try to open the link blegs/DavSmath.html. Current spelling correction modules cannot offer a useful link in this situation. With the code in this article, you can generate a 404 page that suggests the valid page /blogs/DaveSmith.html.
Requirements
Any modern PC manufactured in this century should be sufficient to write and run the code in this article. If your Web site contains more than 10,000 distinct pages, you may need more memory, faster hardware, or patience.
The provided Perl and CGI scripts run on a variety of UNIX® and Windows® platforms (see the download section). Although this article uses Apache and a CGI script as the suggestion engine, the tools built here should work on most Web servers. This article relies on the Text::Metaphone module written by Michael Schwern. Install Text::Metaphone from your preferred CPAN mirror before you start. See the references for download information.
Web server pages and metaphone codes
The primary method for offering alternatives to typos and misspellings is metaphone matching. Like Soundex and several other algorithms, Metaphone uses an alphanumeric code to represent the pronunciation of a word. Unlike Soundex, however, metaphone codes are built to match the linguistic variability of English pronunciation. Metaphone codes therefore represent a given word more accurately and provide a sound basis for building the suggestion library.
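The matching idea can be tried directly. The following is a minimal sketch, assuming Text::Metaphone is installed from CPAN and exports its Metaphone function by default; the sounds_alike helper is a hypothetical name introduced here for illustration and is not part of the article's scripts. It compares the mistyped names from the scenario above with the real page names.

```perl
#!/usr/bin/perl
# Sketch: check whether a mistyped name and a real page name share a
# metaphone code. Assumes Text::Metaphone is installed from CPAN.
use strict;
use warnings;
use Text::Metaphone;

# Hypothetical helper: true when both words produce the same metaphone code.
sub sounds_alike {
    my ($left, $right) = @_;
    return Metaphone($left) eq Metaphone($right);
}

for my $pair ( [ 'blegs', 'blogs' ], [ 'DavSmath', 'DaveSmith' ] ) {
    my ($typed, $actual) = @$pair;
    printf "%-10s %-10s %s\n", $typed, $actual,
        sounds_alike($typed, $actual) ? 'same code' : 'different code';
}
```

Comparing whole codes for equality is the simplest possible test; a real suggestion engine would also need to rank several candidates that share a code.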
Consider the following files in the Web server directory.
Listing 1. Web server files
    ./index.html
    ./survey.html
    ./search_tips.html
    ./about.html
    ./how.html
    ./why.html
    ./who.html
    ./NathanHarrington.html
    ./blogs/NathanHarrington.html
    ./blogs/DaveSmith.html
    ./blogs/MarkCappel.html
For these static HTML files, we will use the buildMetaphoneList.pl program to create metaphone codes for every file with the .html extension.
Listing 2. buildMetaphoneList.pl
    #!/usr/bin/perl -w
    # buildMetaphoneList.pl - / split filename, 0 score, metaphones
    use strict;
    use File::Find;
    use Text::Metaphone;

    # Walk the tree below the current directory, calling htmlOnly per entry.
    find(\&htmlOnly, ".");

    sub htmlOnly
    {
      if( $File::Find::name =~ /\.html/ )
      {
        # Strip the extension, then split the path into its parts.
        my $clipFname = $File::Find::name;
        $clipFname =~ s/\.html//g;
        my @slParts = split '/', $clipFname;
        shift(@slParts);    # drop the leading "."

        # Emit: path ### default score ### one metaphone code per path part
        print "$File::Find::name ### 0 ### ";
        for( @slParts ){ print Metaphone($_) . " " }
        print "\n";
      }#if a matching .html file
    }#htmlOnly sub
The buildMetaphoneList.pl program processes only files with the .html extension. It strips .html from the file name and generates a metaphone code for each part of the full path name. Copy buildMetaphoneList.pl to the root directory of the Web server and run the command perl buildMetaphoneList.pl > metaphonesScore.txt. For the files in Listing 1, the contents of the resulting metaphonesScore.txt file are shown in Listing 3.
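The path handling described above can be looked at in isolation. This is a small sketch using only core Perl; path_parts is a hypothetical helper name, and it mirrors the strip-and-split steps of the list builder without calling Metaphone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reduce a path as reported by File::Find to the
# name parts that would each receive a metaphone code.
sub path_parts {
    my ($path) = @_;
    (my $clipped = $path) =~ s/\.html//g;   # drop the extension
    my @parts = split '/', $clipped;        # ".", directories, file name
    shift @parts;                           # discard the leading "."
    return @parts;
}

print join(' | ', path_parts('./blogs/DaveSmith.html')), "\n";
# prints: blogs | DaveSmith
```

Each returned part is what the list builder would pass to Metaphone, which is why a page inside a directory ends up with one code per path component.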
Listing 3. metaphonesScore.txt
    ./index.html ### 0 ### INTKS
    ./survey.html ### 0 ### SRF
    ./search_tips.html ### 0 ### SRXTPS
    ./about.html ### 0 ### ABT
    ./how.html ### 0 ### H
    ./why.html ### 0 ### H
    ./who.html ### 0 ### H
    ./NathanHarrington.html ### 0 ### N0NHRNKTN
    ./blogs/NathanHarrington.html ### 0 ### BLKS N0NHRNKTN
    ./blogs/DaveSmith.html ### 0 ### BLKS TFSM0
    ./blogs/MarkCappel.html ### 0 ### BLKS MRKKPL
Each line in Listing 3 shows the actual link under the Web server's root directory, the default score, and the metaphone codes. Note that how.html, why.html, and who.html all resolve to the same metaphone code. To resolve this ambiguity, modify the score field so that the suggestion program offers links to these pages in a specific order. For example, modify the "H" metaphone entries as follows:
    ./how.html ### 100 ### H
    ./why.html ### 50 ### H
    ./who.html ### 0 ### H
This creates an intuitive re-ordering of the links and leaves room for further score adjustments. Large, widely spaced score values make it easy to insert files later that share a metaphone code but have a different score. For example, adding an entry for hoo.html with a score of 25 places it below the why.html entry and above the who.html entry.
You can also use the score field to distinguish files that have the same name but live in different directories. For example, if you change the score on the ./NathanHarrington.html line to 100, a request such as nathenHorrington.html lists the ./NathanHarrington.html link before the ./blogs/NathanHarrington.html page.
When choosing file scores, consider both the Web site statistics and the logical layout of the site. The log files may show that users request why.html most often, but if you believe how.html is more important to users, you only need to change the corresponding score values to correct the ordering.
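The ordering rule can be sketched in a few lines of core Perl. This is an illustration only: parse_entry and ranked_matches are hypothetical names, and the assumption that entries with the same metaphone code are listed highest score first is inferred from the examples above rather than taken from the article's own suggestion script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one "path ### score ### codes" line into a hash reference.
sub parse_entry {
    my ($line) = @_;
    my ($path, $score, $codes) = split /\s*###\s*/, $line, 3;
    $codes =~ s/\s+$//;    # trailing space left by the list builder
    return { path => $path, score => $score, codes => $codes };
}

# Return the paths whose codes equal $wanted, highest score first
# (assumed ordering, see the note above).
sub ranked_matches {
    my ($wanted, @lines) = @_;
    my @hits = grep { $_->{codes} eq $wanted } map { parse_entry($_) } @lines;
    return map { $_->{path} }
           sort { $b->{score} <=> $a->{score} } @hits;
}

my @lines = (
    './who.html ### 0 ### H ',
    './how.html ### 100 ### H ',
    './hoo.html ### 25 ### H ',
    './why.html ### 50 ### H ',
    './about.html ### 0 ### ABT ',
);

print "$_\n" for ranked_matches('H', @lines);
```

With these sample lines, the hoo.html entry from the example above lands between why.html and who.html, and about.html is ignored because its code differs.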
Build a CGI 404 handler
We have now generated the metaphone codes and assigned scores to them. The next step is to build the actual suggestion generator. A 404 error is usually caused by a mistyped link or a bad link. The generator applies three main tests: a match based on the directory structure, a match on the combined metaphone codes, and a "contains" match when the other methods fail. Together, these three tests are designed to handle most 404 errors. The MetaphoneSuggest CGI Perl script begins as follows.
Listing 4. MetaphoneSuggest CGI, part 1
    #!/usr/bin/perl -w
    # MetaphoneSuggest - suggest links for typographical and other errors from 404s
    use s