How to calculate the frequency of occurrence of words in a text string in Python
This example describes how to calculate the frequency of occurrence of words in a Python text string. It is shared here for your reference. The specific implementation method is as follows:
This article mainly introduces how to count the frequency of occurrence of words in a Python text string, along with the related Python string-handling techniques involved. See below for the details.
1. English Document Frequency Statistics
Taking the English original of Alice in Wonderland as an example, count the frequency of every word in the whole novel and sort the words from most frequent to least frequent. Because the whole book contains m…
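The excerpt above is cut off, but the approach it describes is standard: tokenize the text, count every word, and sort by descending count. A minimal sketch with the standard library follows; the file name alice.txt is a placeholder, not something given by the original article.

# Descending word-frequency statistics for an English text.
# "alice.txt" is a hypothetical file name.
import re
from collections import Counter

with open("alice.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-z']+", f.read().lower())

counts = Counter(words)
for word, freq in counts.most_common(10):   # the ten most frequent words
    print(word, freq)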
Python is used to measure the frequency of occurrence of words in a text string.
For example, given the list L = [1, 1, -1, 2, 3, 22, 34, 32, 2, -3, 34, 22, -5], count how many times each element appears in the list. Way one: turn the list into a dictionary whose keys are the elements of the list and whose values are the numbers of occurrences. D = dict.fromkeys(L, 0) takes two parameters: the first is the source list, and the second sets the default value 0 for every key in the dict. Then traverse each element in the list, and each time an element is encountered…
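The sentence is truncated, but the method is clear from what remains; a minimal sketch of this dict.fromkeys approach:

# Way one: count list elements via dict.fromkeys, as described above.
L = [1, 1, -1, 2, 3, 22, 34, 32, 2, -3, 34, 22, -5]
D = dict.fromkeys(L, 0)   # every distinct element starts at a count of 0
for element in L:
    D[element] += 1       # bump the count each time an element is encountered
print(D)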
Results and follow-up thinking: the code is very simple; once you have mastered it, you will know how to extend it. The data has been crawled down, but it is messy and still needs human analysis. I call such data naked data; the ideal data is readable and relevant, and that I call gold data. This conversion and analysis involves two questions: 1. How do you achieve readability? You can delete bad data with the dictionary del statement. 2. How do you achieve relevance between the data? The bare data is…
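As a one-line illustration of the del cleanup just mentioned (the dictionary and the bad key are hypothetical):

counts = {"python": 120, "bad_key": 3}
del counts["bad_key"]   # drop an unreadable or unwanted entry from the frequency dict
print(counts)           # {'python': 120}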
In the folder you need to copy the text file and jiebacmd.py; remember that the text needs to be saved with UTF-8 encoding. Then, in Cygwin, switch the working directory to the new folder with the cd command and enter the following command:
cat abc.txt | python jiebacmd.py | sort | uniq -c | sort -nr | head -100
Code:
#encoding=utf-8
#usage example (find top words in abc.txt)
#purpose: find the 100 most frequent words in the abc.txt file
# Copy the following comman…
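The body of jiebacmd.py is cut off above. A minimal sketch of what such a filter plausibly looks like, assuming only that it reads text from stdin and prints one segmented word per line so that sort | uniq -c can do the counting:

# Sketch of a jiebacmd.py-style filter: stdin in, one word per line out.
import sys
import jieba

for line in sys.stdin:
    for word in jieba.cut(line.strip()):
        word = word.strip()
        if word:          # skip empty tokens and bare whitespace
            print(word)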
A Gripe
The word counters I usually write have always felt clunky; recently I came across a more elegant way of writing one, recorded here.
Requirement
You want to run frequency statistics on the word list returned by jieba.cut.
Code
Before optimization
def gen_counter_dict(type_list):
    type_dict = {}
    for t in type_list:
        if t in type_dict:     # no need for .keys(); membership on the dict is O(1)
            type_dict[t] += 1
        else:
            type_dict[t] = 1
    return type_dict
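The "after optimization" version does not survive in this excerpt; the idiomatic elegant rewrite in Python is collections.Counter, which is presumably what it used (an assumption, since the original code is truncated):

# Assumed "after optimization" version: collections.Counter counts in one step.
from collections import Counter

def gen_counter_dict(type_list):
    return Counter(type_list)   # a dict subclass mapping element -> count

Counter also consumes generators directly, so Counter(jieba.cut(text)) counts the segmented words without building an intermediate list.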
Python applet for counting morphemes/word frequencies in a document (Python version 2.7). The program is as follows; test files and the complete program are on my GitHub.

# count the number of spaces and the number of words; this function returns
# only the number of spaces (returning multiple values is left to the reader)
def count_space(path):
    number_counts = 0
    space_counts = 0
    number_list = []

    with open(path, 'r') as f:
        for line…
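The applet breaks off mid-loop. A hypothetical minimal completion, keeping only the space count that the comment says the function returns ('test.txt' is a placeholder):

# Hypothetical completion of the truncated applet above.
def count_space(path):
    space_counts = 0
    with open(path, 'r') as f:
        for line in f:
            space_counts += line.count(' ')   # tally the spaces on each line
    return space_counts

print(count_space('test.txt'))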
def frequence_statistic(file_name):
    frequence = {}
    for line in open(file_name, 'r').readlines():
        words = line.strip().split(" ")
        for word in words:
            # keep only alphabetic characters, lower-cased
            word = ''.join(filter(str.isalpha, word)).lower()
            if frequence.get(word) is None:
                frequence[word] = 1
            else:
                frequence[word] += 1
    return frequence
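To turn that dictionary into the frequency-ordered listing the statistic is for, a short follow-up sketch (the file name is a placeholder):

# Print the ten most frequent words; 'abc.txt' is a hypothetical input file.
frequence = frequence_statistic('abc.txt')
for word, freq in sorted(frequence.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(word, freq)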
Python generates a Chinese word cloud
What is a word cloud?
First, what is a word cloud? A word cloud, also called a text cloud, gives visual prominence to the high-frequency "keywords" that appear in the text data…
VI. Summary: basically, at this point a usable Chinese text-clustering tool is complete (see the GitHub project for the address). How well does it work? I have 182 of my own unclassified articles in the 人生感悟 ("life reflections", blush) category; after segmentation and stop-word removal they contain 13,202 words in total. I set k=10 and, well, the results are not too good. Of course, there may be reasons:
Basic usage of the Python jieba word segmentation module
jieba is a powerful word segmentation module that supports Chinese. This article briefly summarizes its basic usage.
Features
Three word segmentation modes are supported:
Accurate mode, which tries to cut the sentence most accurately; suitable for text analysis…
…course, suitable for users with basic Python knowledge, to consolidate and deepen that foundation.
1.5 Code acquisition
You can download the code into the lab environment with the following commands, as a reference for comparison:
$ wget http://labfile.oss.aliyuncs.com/courses/756/simple.py
$ wget http://labfile.oss.aliyuncs.com/courses/756/my_word_cloud.py
Question No. 0006: You have a directory holding one month of diaries, all .txt files. To avoid word segmentation problems, assume the content is English; count the words you consider most important in each diary. Idea: switch to the target directory, traverse the .txt files in that directory, match the words with regular expressions and count them, then…
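A minimal sketch of that idea, treating "most important" as simply "most frequent" (the directory name is a placeholder):

# Report the most frequent words in each .txt diary in a directory.
import os
import re
from collections import Counter

diary_dir = "diaries"   # hypothetical path to the month of diaries
for name in sorted(os.listdir(diary_dir)):
    if not name.endswith(".txt"):
        continue
    with open(os.path.join(diary_dir, name), encoding="utf-8") as f:
        words = re.findall(r"[A-Za-z']+", f.read().lower())
    print(name, Counter(words).most_common(5))   # five most frequent words per diary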
Jieba"Stuttering" Chinese participle: do the best Python Chinese sub-phrase pieces. : Https://github.com/fxsjy/jiebaCharacteristics
Three word segmentation modes are supported:
Accurate mode, which tries to cut the sentence most accurately; suitable for text analysis;
Full mode, which scans out all the words in the sentence that can form words; it is very fast, but cannot resolve ambiguity;
This article mainly introduces an approach to counting words in Python; it also provides a solution that uses no third-party modules. Interested readers, take a look.
Problem Description:
Implement the function count_words() in Python, which takes a string s and a number n and returns the n most frequently occurring words in s. The return value i…
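The statement is cut off before it specifies the return value, so the format below is an assumption: (word, count) pairs sorted by descending count, using no third-party modules, as the article promises.

def count_words(s, n):
    # Count word frequencies without any third-party modules.
    counts = {}
    for word in s.lower().split():
        counts[word] = counts.get(word, 0) + 1
    # n most frequent words; ties broken alphabetically (an assumption)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

print(count_words("betty bought a bit of butter but the butter was bitter", 3))
# [('butter', 2), ('a', 1), ('betty', 1)]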
How do you feel after seeing those? Do you want to make one yourself? If your answer is yes, let's not delay: today we will build a word cloud analysis diagram step by step, from scratch. Of course, as a first word cloud it certainly won't compare with those two cool infographics. But that doesn't matter; a good start is half the success. Once you have had a taste of it, you can upgrade your skills…
Source address: https://github.com/fxsjy/jieba
Demo address: http://jiebademo.ap01.aws.af.cm/
Features
1. Three word segmentation modes are supported:
a. Accurate mode, which tries to cut the sentence most accurately; suitable for text analysis;
b. Full mode, which scans out all the words in the sentence that can form words; it is very fast, but cannot resolve ambiguity;
c. Search engine mode, which, on the basis of accurate mode, re-segments the long…
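A minimal sketch of the three modes in action, assuming jieba is installed (the sample sentence is arbitrary):

# Demonstrate jieba's three segmentation modes on one sentence.
import jieba

text = "我来到北京清华大学"
print("/".join(jieba.cut(text)))                # accurate mode (the default)
print("/".join(jieba.cut(text, cut_all=True)))  # full mode: every possible word
print("/".join(jieba.cut_for_search(text)))     # search engine mode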
Python makes word clouds (wordcloud)
1. Installation
One tutorial's method is to download the appropriate wordcloud package from [here][1] and then pip install it from that directory. In fact, a direct pip install wordcloud is fine. Go into Python; import wordcloud succeeds.
2. Brief description of the documentation
Three main functions can be seen in the documentation; it mainly introduces the WordCloud module and related functions.
WordCloud()
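A minimal usage sketch, assuming the wordcloud and matplotlib packages are installed (the input file name is a placeholder):

# Generate and display a word cloud from a text file.
from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = open("abc.txt", encoding="utf-8").read()   # 'abc.txt' is hypothetical
wc = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()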