A Python script that counts Chinese characters and English words


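• Use the regular expression "(?x) (?: [\w-]+ | [\x80-\xff]{3} )" to get a list of the English words and Chinese characters in a UTF-8 document. [\w-]+ matches a run of letters, digits, underscores and hyphens; [\x80-\xff]{3} matches three non-ASCII bytes, which is how a common Chinese character is encoded in UTF-8.
• Use a dictionary to record the frequency of each word or character: if it is already a key, add 1 to its count; otherwise set its count to 1.
• Sort the dictionary entries by value (frequency), highest first, and print them.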
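To see why [\x80-\xff]{3} stands for one Chinese character, here is a small Python 3 check. It is a sketch, not part of the original article: it encodes one character to UTF-8 and then runs the pattern, compiled as a bytes pattern, over a made-up sample sentence.

```python
import re

# The pattern from the list above, compiled as a *bytes* pattern so it sees
# the raw UTF-8 bytes instead of decoded characters. For bytes patterns,
# \w only matches ASCII letters, digits and underscore.
TOKEN = re.compile(rb"(?x) (?: [\w-]+ | [\x80-\xff]{3} )")

# A common Chinese character is encoded in UTF-8 as three bytes,
# each in the range 0x80-0xFF.
encoded = "中".encode("utf-8")
print(encoded, len(encoded))  # b'\xe4\xb8\xad' 3

# Made-up sample text mixing English words and Chinese characters
sample = "Hello, 世界! well-known words".encode("utf-8")
tokens = TOKEN.findall(sample)
print([t.decode("utf-8") for t in tokens])
# ['Hello', '世', '界', 'well-known', 'words']
```

The byte-level trick assumes every non-ASCII character is three bytes long. A character such as "é" (two bytes in UTF-8) or an emoji (four bytes) can be grouped with the wrong neighbouring bytes, so the counts are only reliable for text made of ASCII and common Chinese characters.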

The code is as follows:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# author: Rex
# blog: http://iregex.org
# filename: counter.py
# created: Mon Sep 20 21:00:52 2010
# desc: count English words and Chinese characters in UTF-8 files

import sys
import re
from operator import itemgetter

def readfile(f):
    # read raw bytes: the regex in main() matches UTF-8 byte sequences
    with open(f, "rb") as pfile:
        return pfile.read()

def divide(c, regex):
    # the regex is only valid for UTF-8 encoded input
    return regex.findall(c)


def update_dict(di, li):
    # add 1 to the count of a known token, start a new token at 1
    for i in li:
        if i in di:
            di[i] += 1
        else:
            di[i] = 1
    return di

def main():

    # receive file names from the command line (bash)
    files = sys.argv[1:]

    # compile the regex only once; it is a bytes pattern
    # because readfile() returns bytes
    regex = re.compile(rb"(?x) (?: [\w-]+ | [\x80-\xff]{3} )")

    counts = {}

    # get all words from the files
    for f in files:
        words = divide(readfile(f), regex)
        counts = update_dict(counts, words)

    # sort by value (frequency), highest first;
    # the result is a list of (token, count) pairs
    items = sorted(counts.items(), key=itemgetter(1), reverse=True)

    # output to standard output, decoding the bytes for display
    for word, count in items:
        print(word.decode("utf-8", errors="replace"), count)


if __name__ == '__main__':
    main()
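On Python 3 the same three steps can also be written on decoded text with collections.Counter, which takes the place of update_dict() and the sorted() call. This is a sketch of a swapped-in approach, not the author's code. It assumes that "Chinese character" means the CJK Unified Ideographs block U+4E00 to U+9FFF, and it uses ASCII-only word characters to mirror what [\w-]+ matches on bytes; count_tokens() is a hypothetical helper name.

```python
import re
import sys
from collections import Counter

# Assumption: English words are runs of ASCII letters, digits, underscores
# and hyphens; Chinese characters are code points in U+4E00..U+9FFF.
TOKEN = re.compile(r"[A-Za-z0-9_-]+|[\u4e00-\u9fff]")

def count_tokens(text):
    """Return a Counter of English words and Chinese characters in text."""
    return Counter(TOKEN.findall(text))

def main(paths):
    counts = Counter()
    for path in paths:
        # decode each file as UTF-8 text instead of matching raw bytes
        with open(path, encoding="utf-8") as f:
            counts.update(count_tokens(f.read()))
    # most_common() returns (token, count) pairs, highest count first
    for token, n in counts.most_common():
        print(token, n)

if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it with file names as arguments, like counter.py. Unlike the byte pattern, this version does not misgroup two-byte or four-byte characters, but it skips Chinese characters that fall outside the U+4E00 to U+9FFF block.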

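You can customize the program. For example, you can wrap the pattern in a capturing group and split each line with regex.split() instead of calling findall(). Because the group captures, the result contains the matched words and characters together with the text between them (spaces, punctuation); the list comprehension only drops the empty strings: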

The code is as follows:

# line is one line of a file, read as bytes
regex = re.compile(rb"(?x) ( [\w-]+ | [\x80-\xff]{3} )")
words = [w for w in regex.split(line) if w]
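The difference between the two patterns is easiest to see on a short line. This sketch uses a made-up sample input: it runs the non-capturing pattern from counter.py with findall() and the capturing version with split().

```python
import re

# Non-capturing pattern, as used in counter.py
FIND_TOKEN = re.compile(rb"(?x) (?: [\w-]+ | [\x80-\xff]{3} )")
# Capturing pattern, as in the customization above
SPLIT_TOKEN = re.compile(rb"(?x) ( [\w-]+ | [\x80-\xff]{3} )")

line = "Hi, 你好".encode("utf-8")

# findall() returns only the matched tokens
found = FIND_TOKEN.findall(line)
# split() with a capturing group also returns the text between matches;
# the comprehension removes only the empty strings
pieces = [w for w in SPLIT_TOKEN.split(line) if w]

print([w.decode("utf-8") for w in found])   # ['Hi', '你', '好']
print([w.decode("utf-8") for w in pieces])  # ['Hi', ', ', '你', '好']
```

So the split() variant keeps the separators (here ", ") alongside the words and characters, which is handy if you also want to look at punctuation; findall() is enough when you only want words and characters.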
