04. Weibo Message Language Detection

Source: Internet
Author: User
Zheng Yi, 201010. This article belongs to section 02, Data Parsing.

The general idea is to wrap Google's language-detection Ajax web service interface: pass in a passage of text and get back its language. The approach is borrowed from RssMeme.com. It tests well and can be used to detect the language of microblog messages, such as Chinese, Japanese, and Korean. However, Google resets the connection when it receives too many requests, so the method is not suitable for submitting a large volume of requests in rapid succession.

1. Simple demonstration

Open the link
http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=hello+world
and you will see that the returned result is a JSON string:
{"responseData": {"language": "en", "isReliable": false, "confidence": 0.114892714}, "responseDetails": null, "responseStatus": 200}

Remember to add the version parameter v=1.0. Otherwise, the following JSON is returned:
{"responseData": null, "responseDetails": "invalid version", "responseStatus": 400}
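
The two responses above could be handled along these lines. This is a minimal Python 3 sketch using only the standard json module; the function name extract_language and the choice to raise ValueError on a non-200 status are assumptions made for illustration, not part of the original wrapper.

```python
import json


def extract_language(raw):
    """Return (language, is_reliable, confidence) from a detect response.

    Raises ValueError when responseStatus is not 200, for example when
    the v parameter is missing from the request.
    """
    result = json.loads(raw)
    if result.get("responseStatus") != 200 or not result.get("responseData"):
        # responseDetails carries the error text, e.g. "invalid version"
        raise ValueError(result.get("responseDetails") or "language detection failed")
    data = result["responseData"]
    return data["language"], data["isReliable"], data["confidence"]


# The two sample responses shown in this section
ok = '{"responseData": {"language": "en", "isReliable": false, "confidence": 0.114892714}, "responseDetails": null, "responseStatus": 200}'
bad = '{"responseData": null, "responseDetails": "invalid version", "responseStatus": 400}'

print(extract_language(ok))
try:
    extract_language(bad)
except ValueError as err:
    print("error:", err)
```

Note that for "hello world" the service reports isReliable as false with a low confidence, so a caller may want to look at those fields before trusting the language code.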

2. What about a message from a Japanese microblog?

For example, the microblog message sent for detection is:

RT @ufotable: at the 22th of this day, Xinghai Social Security News Agency announced the "most recent" shopping holiday, and the "Monthly monthly calendar month" shopping holiday. too many requests. During the second night, the secondary school will perform the Secondary School's secondary school's primary school's role... http://goo.gl/brJE

After the message is URL-encoded and submitted to Google, the returned result is:

{"responseData": {"language": "ja", "isReliable": true, "confidence": 0.88555187}, "responseDetails": null, "responseStatus": 200}

In this way, you can use result['responseData']['language'] to obtain the language code.
Since this code is not "zh-CN", the message is not in Chinese.
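
The URL-encoding step mentioned above might look like this in Python 3, where urlencode lives in urllib.parse. This is only a sketch: the helper name build_detect_url and the short Japanese sample string are placeholders made up for illustration and are not the message quoted in this section.

```python
from urllib.parse import urlencode

DETECT_URL = "http://ajax.googleapis.com/ajax/services/language/detect"


def build_detect_url(text, version="1.0"):
    """Build the GET URL for a detect request.

    urlencode percent-encodes non-ASCII text as UTF-8 and turns
    spaces into '+', so the message can be sent safely in the query.
    """
    return DETECT_URL + "?" + urlencode({"v": version, "q": text})


# Hypothetical sample text, used here only to show the encoding
url = build_detect_url("こんにちは 世界")
print(url)
```

Building the query with urlencode rather than by string concatenation avoids broken requests when a message contains characters such as '&', '#', or '@'.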

3. Encapsulating the Google Language Detect Ajax web service

Example:
    import urllib
    import httplib2
    try:
        # easyjson is the author's own JSON helper module
        from base import easyjson
    except ImportError:
        pass

    class Detect():
        google_api_prefix = 'http://ajax.googleapis.com/ajax/services/language/detect'

        def __init__(self, httplib2_inst=None):
            """An httplib2 instance can be passed in from outside, so that a proxy can be configured externally."""
            self.http = httplib2_inst or httplib2.Http()

        def post_sentence(self, q):
            # The version parameter v is mandatory; q is the text to detect
            return self._fetch(
                self.google_api_prefix,
                {'v': "1.0", 'q': q}
            )

        def _fetch(self, url, params):
            # urllib.urlencode is the Python 2 location of urlencode
            request = url + "?" + urllib.urlencode(params)
            resp, content = self.http.request(request, "GET")
            return easyjson.parse_json_func(content)

        def detectZHCN(self, text):
            """Return True if the input text is detected as zh-CN; otherwise return False."""
            data = self.post_sentence(text)['responseData']
            if data:
                language = data['language']
                if language == 'zh-CN':
                    return True
            return False
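
Because Google resets connections when requests arrive too quickly, one option is to space out calls to the detector. The following is a minimal Python 3 sketch of such a throttle. The class name Throttle, the injectable clock and sleep parameters, and any interval value are assumptions for illustration; the source does not state what request rate Google tolerates.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to a function."""

    def __init__(self, func, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.func = func
        self.min_interval = min_interval  # seconds between calls (assumed value)
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def __call__(self, *args, **kwargs):
        now = self.clock()
        if self._last is not None:
            # Sleep for whatever remains of the interval since the last call
            wait = self.min_interval - (now - self._last)
            if wait > 0:
                self.sleep(wait)
        self._last = self.clock()
        return self.func(*args, **kwargs)


# Stand-in for a real detection call, so the sketch runs without network access
def fake_detect(text):
    return "zh-CN" if text == "你好" else "unknown"


throttled_detect = Throttle(fake_detect, min_interval=0.01)
for message in ["你好", "hello"]:
    print(message, throttled_detect(message))
```

In real use the wrapped callable would be a method such as detectZHCN, and the interval would have to be tuned by experiment.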
