A Python translation script can translate text between languages, and writing one is an interesting exercise. This article walks through the process in detail; I hope you find it useful. Today I suddenly wanted to write a translation script of my own. Unfortunately, the APIs Google provides are aimed at web applications.
The book Dive Into Python shows how to extract the desired content from an HTML document. So could we simulate a browser: send the sentence to be translated, receive the HTML source of the result page, and then extract the translation from it?
Yes, we can. Python can simulate a browser and submit the sentence to the Google Translate page. The code is as follows:
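
    import urllib, urllib2

    text = 'Hello, world'  # the sentence to translate (example value)
    # langpair is "source|target"; here English to Simplified Chinese
    values = {'hl': 'zh-CN', 'ie': 'utf8', 'text': text, 'langpair': 'en|zh-CN'}
    url = 'http://translate.google.cn/translate_t'
    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)  # supplying data makes this a POST
    # Send a browser-like User-Agent string with the request
    req.add_header('User-Agent',
                   'Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)')
    response = urllib2.urlopen(req)
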
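The snippet above relies on urllib2, which exists only in Python 2. As a rough sketch, assuming Python 3, the same request could be built with urllib.parse and urllib.request. The helper name build_request and the constant TRANSLATE_URL are made up for illustration; the URL and the form fields are the ones used in this article, and nothing here is actually sent over the network.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Endpoint taken from the article; whether it still answers is not checked here.
TRANSLATE_URL = 'http://translate.google.cn/translate_t'


def build_request(text, langpair='en|zh-CN'):
    """Build (but do not send) the POST request for one sentence."""
    values = {'hl': 'zh-CN', 'ie': 'utf8', 'text': text, 'langpair': langpair}
    # In Python 3 the POST body must be bytes, not str.
    data = urlencode(values).encode('ascii')
    req = Request(TRANSLATE_URL, data)
    req.add_header(
        'User-Agent',
        'Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)')
    return req


req = build_request('good morning')
print(req.get_method())                      # POST, because data is supplied
print(parse_qs(req.data.decode('ascii')))    # the form fields, decoded again
```

Keeping the request construction in its own function makes it possible to inspect the encoded form fields without touching the network; passing the result to urllib.request.urlopen would be the Python 3 counterpart of the last line above.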
The key item is the text variable, whose value is the sentence to be translated. The value of langpair is the language pair; here it is en|zh-CN, that is, English to Simplified Chinese, and it can be changed freely. Next, we need a class that retrieves the translation result from the returned page. This class should be derived from SGMLParser, which is defined in sgmllib.py.

    from sgmllib import SGMLParser

    class URLLister(SGMLParser):
        def reset(self):
            SGMLParser.reset(self)
            self.result = []    # collected translation text
            self.open = False   # True right after <div id="result_box"> is seen

        def start_div(self, attrs):
            # attrs is a list of (name, value) pairs for the <div> tag
            ids = [v for k, v in attrs if k == 'id']
            if 'result_box' in ids:
                self.open = True

        def handle_data(self, text):
            # Keep only the first text chunk that follows the opening tag
            if self.open:
                self.result.append(text)
                self.open = False

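The sgmllib module was removed in Python 3. A minimal sketch of the same idea, assuming Python 3 and the standard html.parser module, might look like the following. The class name ResultBoxParser and the sample HTML string are hypothetical and only serve the illustration; the id result_box is the one used in this article.

```python
from html.parser import HTMLParser


class ResultBoxParser(HTMLParser):
    """Collect the first text chunk inside <div id="result_box">."""

    def __init__(self):
        super().__init__()
        self.result = []
        self.open = False

    def handle_starttag(self, tag, attrs):
        # html.parser has no per-tag start_div hook, so the tag name is
        # checked here; attrs is a list of (name, value) pairs.
        if tag == 'div' and ('id', 'result_box') in attrs:
            self.open = True

    def handle_data(self, data):
        if self.open:
            self.result.append(data)
            self.open = False


# Made-up page fragment standing in for the returned HTML.
sample = ('<html><body><div id="other">skip</div>'
          '<div id="result_box">你好</div></body></html>')
parser = ResultBoxParser()
parser.feed(sample)
parser.close()
print(parser.result)
```

Feeding a fixed string like this is a convenient way to check the extraction logic before pointing the parser at a live response.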
When the feed method is called, the parser scans the document for tags. Whenever it meets a div tag it calls start_div, and for text it calls handle_data; together these two methods pick out the desired translation result. The complete code is as follows:

    import urllib, urllib2
    from sgmllib import SGMLParser

    class URLLister(SGMLParser):
        def reset(self):
            SGMLParser.reset(self)
            self.result = []
            self.open = False

        def start_div(self, attrs):
            ids = [v for k, v in attrs if k == 'id']
            if 'result_box' in ids:
                self.open = True

        def handle_data(self, text):
            if self.open:
                self.result.append(text)
                self.open = False

    while True:
        text = raw_input("Enter English text to translate (q to quit): ")
        if text == 'q':
            break
        values = {'hl': 'zh-CN', 'ie': 'utf8', 'text': text, 'langpair': 'en|zh-CN'}
        url = 'http://translate.google.cn/translate_t'
        data = urllib.urlencode(values)
        req = urllib2.Request(url, data)
        req.add_header('User-Agent',
                       'Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)')
        response = urllib2.urlopen(req)
        # Parse the returned page and collect the translation
        parser = URLLister()
        parser.feed(response.read())
        parser.close()
        print "Translation result:"
        for i in parser.result:
            # Re-encode the UTF-8 bytes as GBK before printing (e.g. for a GBK console)
            i = unicode(i, 'utf-8').encode('gbk')
            print i

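The last two lines of the loop convert each result from UTF-8 to GBK before printing. A small sketch of that step, assuming Python 3 (where bytes and text are separate types), is given here. The helper name to_console_bytes is hypothetical, and the errors='replace' setting is an added assumption, not something the script above does.

```python
def to_console_bytes(raw, source='utf-8', console='gbk'):
    """Decode bytes from the page, then re-encode them for the console."""
    text = raw.decode(source)
    # errors='replace' substitutes '?' for characters the console
    # encoding cannot represent instead of raising UnicodeEncodeError.
    return text.encode(console, errors='replace')


raw = '你好'.encode('utf-8')   # stands in for one item of parser.result
out = to_console_bytes(raw)
print(len(raw), len(out))      # 6 4: three bytes per character in UTF-8, two in GBK
```

In Python 3, print accepts text directly and encodes it for the terminal, so an explicit step like this is mainly useful when writing raw bytes to a file or pipe that expects GBK.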
That completes the walkthrough of the Python translation script.