04. Weibo Message Language Detection
Zheng Yi, 201010. Category: 02. Data Parsing
The general idea is to wrap Google's AJAX Language Detect web service: you pass in a piece of text and get back its language. The approach is borrowed from RssMeme.com. In testing it works well and can tell apart the languages of Weibo messages, such as Chinese, Japanese, and Korean. However, Google resets the connection when it receives too many requests, so this approach is not suitable for sending large numbers of requests in quick succession.
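Because the service drops connections under heavy load, a caller that must process a batch of messages can at least space its requests out. The following is a minimal sketch, assuming a fixed minimum gap between calls is enough; the function name `detect_all` and the 1-second default are hypothetical choices, not limits published by Google, and `detect` can be any single-text detector.

```python
import time
from typing import Callable, Iterable, List


def detect_all(texts: Iterable[str],
               detect: Callable[[str], str],
               min_interval: float = 1.0,  # hypothetical gap, tune as needed
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Call detect() on each text, waiting at least min_interval seconds
    between the start of consecutive calls."""
    results = []
    last_call = None
    for text in texts:
        if last_call is not None:
            # Sleep only for whatever part of the interval has not yet passed.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(detect(text))
    return results
```

The `sleep` and `clock` parameters are injectable so the pacing can be checked without real delays or network access.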
1. Simple demonstration
Open the following link:
http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=hello+world
The response is a JSON string:
{"responseData": {"language": "en", "isReliable": false, "confidence": 0.114892714}, "responseDetails": null, "responseStatus": 200}
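The request above is just the endpoint plus a URL-encoded query string, and the reply is plain JSON. A minimal sketch of both steps with the Python standard library follows; the helper names `build_detect_url` and `parse_response` are hypothetical, and the sample body simply mirrors the response shown above.

```python
import json
from urllib.parse import urlencode

API = 'http://ajax.googleapis.com/ajax/services/language/detect'


def build_detect_url(text: str) -> str:
    # urlencode turns spaces into '+' and percent-encodes everything else.
    return API + '?' + urlencode({'v': '1.0', 'q': text})


def parse_response(body: str) -> dict:
    # The service answers with a JSON object; json.loads gives a plain dict.
    return json.loads(body)


sample = ('{"responseData": {"language": "en", "isReliable": false, '
          '"confidence": 0.114892714}, "responseDetails": null, '
          '"responseStatus": 200}')
result = parse_response(sample)
```

Note that `isReliable` is false for this short English phrase, so a caller may want to look at `confidence` before trusting the code.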
Remember to include the version parameter v=1.0; otherwise the following JSON is returned:
{"responseData": null, "responseDetails": "invalid version", "responseStatus": 400}
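Since an error still arrives as well-formed JSON, with `responseData` set to null, a caller should check `responseStatus` before reading the language. A minimal sketch, assuming that any status other than 200 means failure; `DetectError` and `language_of` are hypothetical names.

```python
import json


class DetectError(Exception):
    """Hypothetical exception raised for non-200 detect responses."""


def language_of(body: str) -> str:
    """Return the language code, or raise DetectError carrying
    responseDetails when responseStatus is not 200."""
    result = json.loads(body)
    if result.get('responseStatus') != 200:
        raise DetectError(result.get('responseDetails'))
    return result['responseData']['language']
```

Raising here keeps the `None` in `responseData` from surfacing later as a confusing `TypeError`.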
2. What about a Japanese Weibo message?
For example, submit the following message for detection:
RT @ufotable: at the 22th of this day, Xinghai Social Security News Agency announced the "most recent" shopping holiday, and the "Monthly monthly calendar month" shopping holiday. too many requests. During the second night, the secondary school will perform the Secondary School's secondary school's primary school's role... http://goo.gl/brJE
After URL-encoding the message and submitting it to Google, the result is:
{"responseData": {"language": "ja", "isReliable": true, "confidence": 0.88555187}, "responseDetails": null, "responseStatus": 200}
You can then read the language code with result['responseData']['language'].
Here the code is "ja", not "zh-CN", so the message is not Chinese.
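Two details matter in this step: non-ASCII text has to be percent-encoded as UTF-8 before it goes into the query string, and the Chinese check should compare against the exact code the service reports, "zh-CN". A minimal sketch of both, assuming the response shape shown above; `is_zh_cn` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode


def is_zh_cn(body: str) -> bool:
    """True only if the service reported exactly 'zh-CN'."""
    data = json.loads(body).get('responseData')
    return bool(data) and data.get('language') == 'zh-CN'


# Japanese text is encoded byte by byte from its UTF-8 form.
encoded = urlencode({'q': 'こんにちは'})
```

An exact, case-sensitive match is used deliberately: a lowercase comparison such as `'zh-cn'` would never match what the service returns.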
4. Wrapping the Google Language Detect AJAX Web Service
Example:
import json
import urllib.parse

import httplib2


class Detect:
    google_api_prefix = 'http://ajax.googleapis.com/ajax/services/language/detect'

    def __init__(self, httplib2_inst=None):
        """An httplib2.Http instance can be passed in from outside,
        so that a proxy can be configured externally."""
        self.http = httplib2_inst or httplib2.Http()

    def post_sentence(self, q):
        # Despite the name, the service is queried with GET; v=1.0 is required.
        return self._fetch(self.google_api_prefix, {'v': '1.0', 'q': q})

    def _fetch(self, url, params):
        # urlencode percent-encodes non-ASCII text (Chinese, Japanese, ...) as UTF-8.
        request = url + '?' + urllib.parse.urlencode(params)
        resp, content = self.http.request(request, 'GET')
        return json.loads(content)

    def detectZHCN(self, text):
        """Return True if the input text is detected as zh-CN, otherwise False."""
        data = self.post_sentence(text)['responseData']
        if data:
            # The service reports simplified Chinese as 'zh-CN' (mixed case).
            if data['language'] == 'zh-CN':
                return True
        return False