ASP.NET summary: search engine keyword encoding

Source: Internet
Author: User
ASP.NET 2.0: unified search engine keyword encoding
Phase II development of View has officially started, and it was troublesome from the beginning. The statistics module needs the keywords from the referring search link, so at first I used a regular expression to match the keyword and then the built-in Uri.UnescapeDataString() to convert the URL-encoded value to text.
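The first approach described above can be sketched roughly as follows. This is only an illustration: the class name KeywordSketch, the method ExtractUtf8Keyword, and the sample URL are hypothetical and not part of the original project, and the pattern is simpler than the one in the full listing further down.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordSketch
{
    // Pull the value of one query parameter out of a referrer URL and
    // decode it as UTF-8 percent-encoding. Returns "" when nothing matches.
    public static string ExtractUtf8Keyword(string referrer, string paramName)
    {
        // [?&] anchors the parameter name at the start of a query field.
        Regex re = new Regex(@"[?&]" + Regex.Escape(paramName) + @"=(?<key>[^&#]*)",
                             RegexOptions.IgnoreCase);
        Match m = re.Match(referrer);
        if (!m.Success)
        {
            return "";
        }
        // In a query string '+' stands for a space; UnescapeDataString leaves it alone.
        string raw = m.Groups["key"].Value.Replace("+", " ");
        // UnescapeDataString interprets the %XX bytes as UTF-8.
        return Uri.UnescapeDataString(raw);
    }
}
```

For example, KeywordSketch.ExtractUtf8Keyword("http://www.google.com/search?hl=en&q=%E4%B8%AD%E6%96%87+test", "q") returns "中文 test". This works only because the keyword is UTF-8 encoded.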
Everything went smoothly until I added Baidu and NetEase search to the rules, at which point an error was reported. I guessed it was related to encoding: Google has always used UTF-8, while most Chinese websites prefer GB2312. So I was not very worried about this problem.
The severity of the problem almost made me lose confidence in studying program algorithms. Today I heard someone say that I was bold to change all the links, and that a serious problem could occur if the same event already existed. I have always considered myself a careful person. Before doing that bold thing, I checked carefully whether the same event existed, and I started only after receiving a definite answer. Although there is not much code, I dare say I have handled every error I could think of, and some I could not.
Back to the topic. I searched the Internet and found very little information for ASP.NET. I then looked at how ASP and PHP handle it and found the practice complicated and error-prone; some solutions even require building a mapping table from GB2312 to UTF-8.
I firmly believed that Microsoft would provide a complete solution, so I checked MSDN, starting with the Encoding class, and found a method that converts between encodings. But another problem appeared: what I receive is a URL-encoded string, and I do not know in advance whether it is UTF-8 or GB2312. So I cannot use UnescapeDataString to convert it to text before transcoding, because an error is reported immediately.
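One way around the "which encoding is it?" question, different from the per-engine table used in the listing further down, is to decode the escapes to raw bytes first and try a strict UTF-8 decode before falling back to GB2312. The sketch below assumes a modern .NET runtime, where the GB2312 code page has to be registered through CodePagesEncodingProvider (on the .NET Framework that line is not needed). The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EncodingGuessSketch
{
    static EncodingGuessSketch()
    {
        // Assumption: modern .NET, where GB2312 is not available until registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Turn "%D6%D0abc" into raw bytes without assuming any text encoding.
    public static byte[] PercentDecodeToBytes(string s)
    {
        List<byte> bytes = new List<byte>();
        for (int i = 0; i < s.Length; i++)
        {
            bool isEscape = s[i] == '%' && i + 2 < s.Length
                            && Uri.IsHexDigit(s[i + 1]) && Uri.IsHexDigit(s[i + 2]);
            if (isEscape)
            {
                bytes.Add(Convert.ToByte(s.Substring(i + 1, 2), 16));
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                // URLs are ASCII, so a plain cast is enough; '+' means space in a query.
                bytes.Add(s[i] == '+' ? (byte)' ' : (byte)s[i]);
            }
        }
        return bytes.ToArray();
    }

    // Try strict UTF-8 first; if the bytes are not valid UTF-8, read them as GB2312.
    public static string DecodeKeyword(string escaped)
    {
        byte[] raw = PercentDecodeToBytes(escaped);
        try
        {
            // Second argument (throwOnInvalidBytes) makes invalid input throw.
            return new UTF8Encoding(false, true).GetString(raw);
        }
        catch (DecoderFallbackException)
        {
            return Encoding.GetEncoding("gb2312").GetString(raw);
        }
    }
}
```

With this sketch, both DecodeKeyword("%E4%B8%AD%E6%96%87") and DecodeKeyword("%D6%D0%CE%C4") return "中文". The check is a heuristic: a short GB2312 byte sequence can happen to be valid UTF-8 as well, so a table of known engines is still the more reliable signal when the engine is recognized.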
So I began to analyze how GB2312 and UTF-8 encode text and found that .NET can convert between them. GB2312 uses two bytes for a Chinese character, while UTF-8 uses three. I therefore break the GB2312 URL encoding into groups of two escapes, parse each group as two hexadecimal bytes into a byte array, convert them to UTF-8 bytes through the normal transcoding call, read the result into a char array, and finally convert that into normal text.
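The byte-level steps in the paragraph above can be sketched in isolation like this. The names TranscodeSketch, ParseEscapedPair, and Gb2312ToUtf8Bytes are hypothetical, and the provider registration again assumes a modern .NET runtime (it is unnecessary on the .NET Framework).

```csharp
using System;
using System.Text;

public static class TranscodeSketch
{
    static TranscodeSketch()
    {
        // Assumption: modern .NET, where GB2312 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Parse "%D6%D0" into the two GB2312 bytes it stands for.
    public static byte[] ParseEscapedPair(string pair)
    {
        string[] parts = pair.Split('%'); // "%D6%D0" -> ["", "D6", "D0"]
        return new byte[] { Convert.ToByte(parts[1], 16), Convert.ToByte(parts[2], 16) };
    }

    // Transcode GB2312 bytes into the UTF-8 bytes of the same text.
    public static byte[] Gb2312ToUtf8Bytes(byte[] gbBytes)
    {
        return Encoding.Convert(Encoding.GetEncoding("gb2312"), Encoding.UTF8, gbBytes);
    }
}
```

For the character 中, ParseEscapedPair("%D6%D0") gives the two bytes D6 D0, Gb2312ToUtf8Bytes turns them into the three bytes E4 B8 AD, and Encoding.UTF8.GetString on that result yields "中".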
 
It worked right away. Encouraged, I went on to write a method for identifying the search engine type, and with that everything was solved.
The following code is provided for reference:
using System;
using System.Data;
using System.Configuration;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using System.Text.RegularExpressions;
using System.Text;
/// <summary>
/// Search engine processing
/// </summary>
public class exjudgesystem
{
    public exjudgesystem()
    {
    }
    #region Initialization variables
    // Search engine features: name, encoding, keyword query parameter.
    // Names are lowercase because they are matched against a lowercased URL.
    private string[][] _enginers = new string[][] {
        new string[] { "google", "utf8", "q" },
        new string[] { "baidu", "gb2312", "wd" },
        new string[] { "yahoo", "utf8", "p" },
        new string[] { "yisou", "utf8", "search" },
        new string[] { "live", "utf8", "q" },
        new string[] { "tom", "gb2312", "word" },
        new string[] { "163", "gb2312", "q" },
        new string[] { "iask", "gb2312", "k" },
        new string[] { "soso", "gb2312", "w" },
        new string[] { "sogou", "gb2312", "query" },
        new string[] { "zhongsou", "gb2312", "w" },
        new string[] { "3721", "gb2312", "p" },
        new string[] { "openfind", "utf8", "q" },
        new string[] { "alltheweb", "utf8", "q" },
        new string[] { "lycos", "utf8", "query" },
        new string[] { "onseek", "utf8", "q" }
    };
    // Search engine name
    private string _enginename = "";
    public string enginename
    {
        get
        {
            return _enginename;
        }
    }
    // Search engine encoding
    private string _coding = "utf8";
    public string Coding
    {
        get
        {
            return _coding;
        }
    }
    // Search engine keyword query parameter name
    private string _regexword = "";
    public string regexword
    {
        get
        {
            return _regexword;
        }
    }
    private string _regex = @"(";
    #endregion
    #region Search engine keywords
    // Build the regular expression for the search keyword
    public void engineregex(string mystring)
    {
        // Clear the previous result so a non-matching URL is not treated as a hit
        _enginename = "";
        for (int i = 0, j = _enginers.Length; i < j; i++)
        {
            if (mystring.Contains(_enginers[i][0]))
            {
                _enginename = _enginers[i][0];
                _coding = _enginers[i][1];
                _regexword = _enginers[i][2];
                // Assign instead of append, so repeated calls do not stack patterns
                _regex = @"(" + _enginename + @".+.*[?/&]" + _regexword + @"[=:])(?<key>[^&]*)";
                break;
            }
        }
    }
    // Get the search engine keyword
    public string searchkey(string mystring)
    {
        engineregex(mystring.ToLower());
        if (_enginename != "")
        {
            Regex myreg = new Regex(_regex, RegexOptions.IgnoreCase);
            Match matche = myreg.Match(mystring);
            mystring = matche.Groups["key"].Value;
            // Replace + with a space
            mystring = mystring.Replace("+", " ");
            if (_coding == "gb2312")
            {
                mystring = getutf8string(mystring);
            }
            else
            {
                mystring = Uri.UnescapeDataString(mystring);
            }
        }
        return mystring;
    }
    // Complete the transcoding
    public string getutf8string(string mystring)
    {
        // Each GB2312 character appears as two consecutive %XX escapes
        Regex myreg = new Regex("(?<key>%..%..)", RegexOptions.IgnoreCase);
        MatchCollection matches = myreg.Matches(mystring);
        string myword;
        for (int i = 0, j = matches.Count; i < j; i++)
        {
            myword = matches[i].Groups["key"].Value.ToString();
            mystring = mystring.Replace(myword, gb2312toutf8(myword));
        }
        return mystring;
    }
    // Convert a single GB2312 character (two %XX escapes) to text via UTF-8
    public string gb2312toutf8(string mystring)
    {
        string[] myword = mystring.Split('%');
        byte[] mybyte = new byte[] { Convert.ToByte(myword[1], 16), Convert.ToByte(myword[2], 16) };
        Encoding gb = Encoding.GetEncoding("gb2312");
        Encoding u8 = Encoding.UTF8;
        mybyte = Encoding.Convert(gb, u8, mybyte);
        char[] chars = new char[u8.GetCharCount(mybyte, 0, mybyte.Length)];
        u8.GetChars(mybyte, 0, mybyte.Length, chars, 0);
        return new string(chars);
    }
    #endregion
    // Check whether the client is a search engine crawler and return its type
    public string iscrawler(string systeminfo)
    {
        string[] botlist = new string[] { "google", "Baidu", "MSN", "yahoo", "tmcrawler", "iask", "sogou" };
        foreach (string bot in botlist)
        {
            if (systeminfo.ToLower().Contains(bot.ToLower()))
            {
                return bot;
            }
        }
        return "null";
    }
}
The code above was reposted from another user. If any part of it can be optimized, please join the discussion. Thank you!
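As one possible simplification, not the method used in the class above: HttpUtility.UrlDecode has an overload that takes the target Encoding, so the whole keyword can be decoded in one call once the engine's encoding is known. The sketch below assumes the "utf8" / "gb2312" names from the engine table, uses a hypothetical class name, and registers the code page provider for modern .NET (not needed on the .NET Framework).

```csharp
using System.Text;
using System.Web;

public static class UrlDecodeSketch
{
    static UrlDecodeSketch()
    {
        // Assumption: modern .NET, where GB2312 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // codingName follows the engine table: "gb2312" or anything else for UTF-8.
    public static string DecodeKeyword(string escaped, string codingName)
    {
        Encoding enc = codingName == "gb2312"
            ? Encoding.GetEncoding("gb2312")
            : Encoding.UTF8;
        // UrlDecode turns '+' into a space and decodes runs of %XX bytes with enc.
        return HttpUtility.UrlDecode(escaped, enc);
    }
}
```

For example, UrlDecodeSketch.DecodeKeyword("%D6%D0%CE%C4+abc", "gb2312") returns "中文 abc". Unlike the pairwise replacement above, this also handles percent-encoded ASCII characters mixed in with the Chinese text.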
